C++ for Scientists - Technische Universität Dresden

Technische Universität Dresden
Faculty of Mathematics and Natural Sciences
Institute of Scientific Computing
01062 Dresden
http://www.math.tu-dresden.de/~pgottsch/script/cpp for scientists.pdf

Peter Gottschling

C++ for Scientists

based on a joint lecture with Karl Meerbergen,
with the help of Andrey Chesnokov, Yvette Vanberghen, Kris Demarsin, and Yao Yue,
and with contributions by René Heinzl and Philipp Schwaha

As of 16 January 2012


Copyright © 2010 Peter Gottschling, René Heinzl, Karl Meerbergen, and Philipp Schwaha


Contents

Part I: Understanding C++  7

Introduction  9
0.1 Programming languages for scientific programming  9
0.2 Outline  10

1 Good and Bad Scientific Software  11

2 C++ Basics  19
2.1 Our First Program  19
2.2 Variables  20
2.3 Operators  25
2.4 Expressions and Statements  31
2.5 Control statements  32
2.6 Functions  39
2.7 Input and output  45
2.8 Structuring Software Projects  47
2.9 Arrays  49
2.10 Pointers and References  50
2.11 Real-world example: matrix inversion  53
2.12 Exercises  62
2.13 Operator Precedence  64

3 Classes  65
3.1 Program for universal meaning not for technical details  65
3.2 Class members  66
3.3 Constructors  68
3.4 Destructors  74
3.5 Assignment  75
3.6 Automatically Generated Operators  76
3.7 Accessing object members  78
3.8 Other Operators  88

4 Generic programming  89
4.1 Templates  89
4.2 Generic functions  89
4.3 Generic classes  95
4.4 Concepts and Modeling  97
4.5 Inheritance or Generics?  99
4.6 Template Specialization  101
4.7 Non-Type Parameters for Templates  109
4.8 Functors  111
4.9 STL — The Mother of All Generic Libraries  121
4.10 Cursors and Property Maps  123
4.11 Exercises  128

5 Meta-programming  133
5.1 Let the Compiler Compute  134
5.2 Providing Type Information  135
5.3 Expression Templates  150
5.4 Meta-Tuning: Write Your Own Compiler Optimization  156
5.5 Exercises  185

6 Inheritance  187
6.1 Basic Principles  187
6.2 Dynamic Selection by Sub-typing  187
6.3 Remove Redundancy With Base Classes  189
6.4 Casting Up and Down and Elsewhere  189
6.5 Barton-Nackman Trick  193

7 Effective Programming: The Polymorphic Way  199
7.1 Imperative Programming  201
7.2 Generic Programming  203
7.3 Programming with Objects  206
7.4 Functional Programming  209
7.5 From Monomorphic to Polymorphic Behavior  212
7.6 Best of Both Worlds  221

Part II: Using C++  223

8 Finite World of Computers  225
8.1 Mathematical Objects inside the Computer  225
8.2 More Numbers and Basic Structure  226
8.3 A Loop and More  230
8.4 The Other Way Around  231

9 How to Handle Physics on the Computer  233
9.1 Finite Elements  233
9.2 Again, Integrators  234

10 Programming tools  235
10.1 GCC  235
10.2 Debugging  236
10.3 Valgrind  239
10.4 Gnuplot  239
10.5 Unix and Linux  240

11 C++ Libraries for Scientific Computing  243
11.1 GLAS: Generic Linear Algebra Software  243
11.2 Boost  244
11.3 Boost.Bindings  245
11.4 Matrix Template Library  249
11.5 Blitz++  249
11.6 Graph Libraries  249
11.7 Geometric Libraries  249

12 Real-World Programming  253
12.1 Transcending Legacy Applications  253

13 Parallelism  259
13.1 Multi-Threading  259
13.2 Message Passing  259

14 Numerical exercises  263
14.1 Computing an eigenfunction of the Poisson equation  263
14.2 The 2D Poisson equation  272
14.3 The solution of a system of differential equations  272
14.4 Google's Page rank  274
14.5 The bisection method for finding the zero of a function in an interval  276
14.6 The Newton-Raphson method for finding the minimum of a convex function  278
14.7 Sequential noise reduction of real-time measurements by least squares  281

15 Programming Projects  285
15.1 Exponentiation of Matrices  285
15.2 Exponentiation of Matrices  286
15.3 LU Factorization for m × n Matrices  286
15.4 Bunch-Kaufman Factorization  286
15.5 Condition Number (Reciprocal)  286
15.6 Matrix Scaling  287
15.7 QR with Overwriting  287
15.8 Direct Solver for Sparse Matrices  287
15.9 Applying MTL4 to Interval-Arithmetic Types  288
15.10 Applying MTL4 to Higher-Precision Types  289
15.11 Applying MTL4 to AD Types  289

16 Acknowledgement  291


Part I

Understanding C++


Introduction

"It would be nice if every kind of numeric software could be written in C++ without loss of efficiency, but unless something can be found that achieves this without compromising the C++ type system it may be preferable to rely on Fortran, assembler or architecture-specific extensions."
— Bjarne Stroustrup

The purpose of this script is to do you this favor, Bjarne. Amongst others. Conversely, the reader of this book shall learn the best way to benefit from C++ features for writing scientific software. It is not our goal to explain all C++ features in a well-balanced manner. We rather aim for an application-driven illustration of features that are valuable for writing

• Well-structured;
• Readable;
• Maintainable;
• Extensible;
• Type-safe;
• Reliable;
• Portable; and last but not least
• Highly performing

software.

0.1 Programming languages for scientific programming

Scientific programming is an old discipline in computer science. The first applications on computers were indeed computations. In the early decades, ALGOL was a relatively popular programming language, competing with FORTRAN. FORTRAN 77 became a standard in scientific programming because of its efficiency and portability. Other computer languages were developed in computer science but not frequently used in scientific computing: C, Ada, Java, C++. They were merely used in universities and labs for research purposes.


C++ was not a reliable computer language in the nineties: code was not portable, and object code was inefficient and large. This made C++ unpopular in scientific computing. This changed at the end of the nineties: compilers produced more efficient code, and the standard was supported more and more completely. Especially the ability to inline small functions and the introduction of complex numbers in the C99 standard made C++ more attractive to scientific programmers.

Together with the development of compilers, numerical libraries are being developed in C++ that offer great flexibility together with efficiency. This work is still ongoing, and more and more software is being written in C++. Currently, other languages used for numerics are FORTRAN 77 (even for new codes!), Fortran 95, and Matlab. Python is becoming more and more popular. The nice thing about Python is that it is relatively easy to link C++ functions and classes into Python scripts. Writing such interfaces is not a subject of this course.

The goal of this course is to introduce students to the exciting world of C++ programming for scientific applications. The course does not offer a deep study of the programming language itself but rather focuses on those aspects that make C++ suitable for scientific programming. Language concepts are introduced and applied to numerical programming, together with the STL and Boost.

Starting C++ programmers often adopt a Java programming style: both languages are object-oriented, but there are subtle differences that allow C++ to produce more compact expressions. For example, C++ classes typically do not have getters and setters, as is often the case in Java classes. This will be discussed in more detail in the course. We use the following convention, which is also used by Boost, one of the good examples of C++ software: classes and variables are denoted by lower-case characters, and underscores are used as separators in symbols. An exception are matrices, which are written as single capitals for similarity with the mathematical notation. Mixed upper- and lower-case characters (CamelCase) are typically used for concepts. Constants are often (as in C) written in capitals.

0.2 Outline

The topics that will be discussed are several aspects of the syntax of C++, illustrated by small numerical programs, an introduction to meta-programming, expression templates, the STL, Boost, MTL4, and GLAS. We will also discuss interoperability with other languages. The first three chapters discuss basic language aspects, such as functions, types, and classes, inheritance and generic programming, including examples from the STL. The remaining chapters discuss topics that are of great importance for numerical applications: functors, expression templates, and interoperability with FORTRAN and C.


Chapter 1

Good and Bad Scientific Software

This chapter will give you an idea of what we consider good scientific software and what not. If you have never programmed before in your life, you might wish to skip the entire chapter. This is okay, because if you have had no contact with the program sources of bad software, you can learn programming with a pure mind.

If you have some software knowledge, there might still be some details you will not understand right now, but this is no reason to worry. If you do not understand them after reading this script, then you can start worrying, or we as authors could. This chapter is only about getting a feeling for what distinguishes good from bad software in science.

As foundation of our discussion — and to not start the book with hello world — we consider an iterative method to solve a system of linear equations Ax = b, where A is a symmetric positive-definite (SPD) matrix, x and b are vectors, and x is sought. The method is called ‘Conjugate Gradients’ (CG) and was introduced by Magnus R. Hestenes and Eduard Stiefel [?].

The mathematical details do not matter here, but the different styles of implementation do. The algorithm can be written in the following form:¹

Algorithm 1: Conjugate Gradient Method.

Input: SPD matrix A, vector b, left preconditioner L, termination criterion ε.
Output: Vector x such that Ax ≈ b.

 1  r = b − Ax
 2  while |r| ≥ ε do
 3      z = L⁻¹ r
 4      ρ = ⟨r, z⟩
 5      if first iteration then
 6          p = z
 7      else
 8          p = z + (ρ/ρ′) p
 9      q = A p
10      α = ρ/⟨p, q⟩
11      x = x + α p
12      r = r − α q
13      ρ′ = ρ

¹ This is not precisely the original notation but a slightly adapted version that introduces some extra variables to avoid redundant calculations.



Programmers transform this mathematical notation into a form that a compiler understands, using operations from the language. The result could look like Listing 1.1. Do not read it in detail; just skim it.

#include <stdlib.h>
#include <math.h>

double one_norm(int size, double *vp)
{
    int i;
    double sum= 0;
    for (i= 0; i < size; i++)
        sum+= fabs(vp[i]);
    return sum;
}

double dot(int size, double *vp, double *wp)
{
    int i;
    double sum= 0;
    for (i= 0; i < size; i++)
        sum+= vp[i] * wp[i];
    return sum;
}

int cg(int size, int nnz, int* aip, int* ajp, double* avp,
       double *x, double *b, void (*lpre)(int, double*, double*), double eps)
{
    int i, iter= 0;
    double rho, rho_1, alpha;
    double *p= (double*) malloc(size * sizeof(double));
    double *q= (double*) malloc(size * sizeof(double));
    double *r= (double*) malloc(size * sizeof(double));
    double *z= (double*) malloc(size * sizeof(double));

    // r= b;
    for (i= 0; i < size; i++)
        r[i]= b[i];
    // r-= A*x;
    for (i= 0; i < nnz; i++)
        r[aip[i]]-= avp[i] * x[ajp[i]];

    while (one_norm(size, r) >= eps) {
        // z= solve(L, r);
        (*lpre)(size, z, r);               // function pointer call
        rho= dot(size, r, z);
        if (!iter) {
            for (i= 0; i < size; i++)
                p[i]= z[i];
        } else {
            for (i= 0; i < size; i++)
                p[i]= z[i] + rho / rho_1 * p[i];
        }
        // q= A * p;
        for (i= 0; i < size; i++)
            q[i]= 0;
        for (i= 0; i < nnz; i++)
            q[aip[i]]+= avp[i] * p[ajp[i]];
        alpha= rho / dot(size, p, q);
        // x+= alpha * p; r-= alpha * q;
        for (i= 0; i < size; i++) {
            x[i]+= alpha * p[i];
            r[i]-= alpha * q[i];
        }
        rho_1= rho;
        iter++;
    }
    free(q); free(p); free(r); free(z);
    return iter;
}

void ic_0(int size, double* out, double* in) { /* .. */ }

int main (int argc, char* argv[])
{
    int nnz, size;
    // set nnz and size
    int *aip= (int*) malloc(nnz * sizeof(int));
    int *ajp= (int*) malloc(nnz * sizeof(int));
    double *avp= (double*) malloc(nnz * sizeof(double));
    double *x= (double*) malloc(size * sizeof(double));
    double *b= (double*) malloc(size * sizeof(double));
    // set A and b
    cg(size, nnz, aip, ajp, avp, x, b, ic_0, 1e-9);
    return 0;
}

Listing 1.1: Low Abstraction Implementation of CG

As said before, the details do not matter here, only the principal approach. The good thing about this code is that it is self-contained. But this is about its only advantage. The problem with this implementation is its low abstraction level. This creates three major disadvantages:

• Bad readability;
• No flexibility; and
• High error-proneness.

The bad readability manifests in the fact that almost every operation is implemented in one or multiple loops. For instance, would we have found the matrix-vector multiplication q = Ap without the comments? We would easily catch where the variables representing q, A, and p are used, but to see that this is a matrix-vector product takes a closer look and a good understanding of how the matrix is stored.

This leads us to the second problem: the implementation commits to many technical details and only works in precisely this context. Algorithm 1 only requires that matrix A is symmetric positive-definite; it does not demand a certain storage scheme. There are many other sparse matrix formats that we could all use in the CG method, but not with this implementation. The matrix format is not the only detail the code commits to. What if we want to compute in lower (float) or higher precision (long double)? Or solve a complex linear system? For every such new CG application, we need a new implementation. Needless to say, running on parallel computers or exploring GPGPU (General-Purpose Graphics Processing Unit) acceleration needs reimplementations as well. Much worse, every combination of the above needs a new implementation.

Some readers might think: “It is only one function of 20–30 lines. How much work can rewriting this little function be? And we do not introduce new matrix formats or computer architectures every month.” Certainly true, but in some sense it is putting the cart before the horse. Because of such an inflexible and detail-obsessed programming style, many scientific applications grew into the 100,000s and millions of lines of code. Once an application or library has reached such a monstrous size, modifying features of the software is very arduous and only rarely done. The road to success is starting scientific software from a higher level of abstraction from the beginning, even if it is more work initially.

The last major disadvantage is how error-prone the code is. All arguments are given as pointers, and the size of the underlying arrays is given as an extra argument. We as programmers of the function cg can only hope that the caller did everything right, because we have no way to verify it. If the user does not allocate enough memory (or does not allocate at all), the execution will crash at some more or less random position or, even worse, will generate nonsensical results because data and software can be randomly overwritten. Good programmers must avoid such fragile interfaces because the slightest mistake can have catastrophic consequences and the program errors are extremely difficult to find. Unfortunately, even recently released and widely used software is written in this manner, either for backward compatibility with C and Fortran or because it is written in one of these two languages. In fact, the implementation above is C and not C++. If this is the way you love software, you probably will not like this script.

So much for software we do not like. In Listing 1.2 we show how scientific software could look.

// This source is part of MTL4

#include <boost/numeric/mtl/mtl.hpp>
#include <boost/numeric/itl/itl.hpp>

template < typename LinearOperator, typename HilbertSpaceX, typename HilbertSpaceB,
           typename Preconditioner, typename Iteration >
int conjugate_gradient(const LinearOperator& A, HilbertSpaceX& x, const HilbertSpaceB& b,
                       const Preconditioner& L, Iteration& iter)
{
    typedef HilbertSpaceX                                  Vector;
    typedef typename mtl::Collection<Vector>::value_type   Scalar;
    Scalar rho(0), rho_1(0), alpha(0);
    Vector p(resource(x)), q(resource(x)), r(resource(x)), z(resource(x));

    r = b - A*x;
    while (! iter.finished(r)) {
        z = solve(L, r);
        rho = dot(r, z);
        if (iter.first())
            p = z;
        else
            p = z + (rho / rho_1) * p;
        q = A * p;
        alpha = rho / dot(p, q);
        x += alpha * p;
        r -= alpha * q;
        rho_1 = rho;
        ++iter;
    }
    return iter;
}

int main (int argc, char* argv[])
{
    int size;
    // set size
    mtl::compressed2D<double>   A(size, size);
    mtl::dense_vector<double>   x(size), b(size);
    // set A and b

    // Create preconditioner
    itl::pc::ic_0<mtl::compressed2D<double> > L(A);
    // Object that controls the iteration: terminate if the residual is below 10^-9 or
    // decreased by 6 orders of magnitude; abort after 30 iterations if not converged
    itl::basic_iteration<double> iter(b, 30, 1.e-6, 1.e-9);

    conjugate_gradient(A, x, b, L, iter);
    return 0;
}

Listing 1.2: High Abstraction Implementation of CG

The first thing you might realize is that the CG implementation is readable without comments. As a rule of thumb: if other people's comments look like your program sources, then you are a really good programmer. If you compare the mathematical notation in Algorithm 1 with Listing 1.2, you will realize that — except for the type and variable declarations at the beginning — they are identical. Some readers might think that it looks more like Matlab or Mathematica than C++. Yes, C++ can look like this if one puts enough effort into good software. Evidently, it is also much easier to write algorithms at this abstraction level than to express them with low-level operations.

The Purpose of Scientific Software

Scientists shall do science.

Excellent scientific software is expressed only in mathematical and domain-specific operations, without any technical detail exposed.

At this abstraction level, scientists can focus on models and algorithms, being much more productive and advancing scientific discovery.

Nobody knows how many scientists waste how much time every year dwelling on the small technical details of bad software like Listing 1.1. Of course, the technical details have to be realized somewhere, but a scientific application is the worst possible location. Use a two-level approach: write your applications in terms of expressive mathematical operations, and if they do not exist, implement them separately. These mathematical operations must be carefully implemented for maximal performance, or be composed of other operations with maximal performance. Investing time in the performance of these fundamental operations pays off handsomely because the functions will be reused very often.

Advice

Use the right abstractions!
If they do not exist, implement them.

Speaking of abstractions, the CG implementation in Listing 1.2 does not commit to any technical detail. Nowhere is the function restricted to a numerical type like double. It works just as well for float, GNU's multi-precision numbers, complex, interval arithmetic, quaternions, . . .

The matrix A can have any internal format; as long as it can be multiplied with a vector, it can be used in the function. In fact, it does not even need to be a matrix but can be any linear operator. For instance, an object that performs a Fast Fourier Transformation (FFT) on a vector can be used as A when the FFT is expressed by a product of A with the vector. Similarly, the vectors do not need to be represented by finite-dimensional arrays but can be elements of any vector space that is somehow computer-representable, as long as all operations in the algorithm can be performed.

We are also open to other computer architectures. If the matrix and the vectors are distributed over the nodes of a parallel supercomputer and corresponding parallel operations are available, the function runs in parallel without changing a single line. (GP)GPU acceleration can also be realized within the data structures and their operations, without changing the algorithm. In general, any existing or new platform that is supported by the operations of the matrix and vector types is also supported by our ‘generic’ conjugate gradient function. As mentioned before, we do not even need to change it. If we have a sophisticated scientific application of several thousand lines (not 100,000s) written with appropriate abstractions, we do not need to modify it either.

Starting with the next chapter, we will explain how to write good scientific software.





Chapter 2

C++ Basics

In this first chapter we will briefly introduce some basic knowledge about C++. A useful site with a reference manual for C++ is http://www.cplusplus.com/.

2.1 Our First Program

As an introduction to the C++ language, let us look at the following example:

#include <iostream>

int main ()
{
    std::cout << "Answer to the Ultimate Question of Life, the Universe, and Everything is "
              << 6 * 7 << std::endl;
    return 0;
}

according to Douglas Adams’ “Hitchhiker’s Guide to the Galaxy.” This short example shows<br />

already many things about C ++:<br />

• The first line includes a file name “iostream.” Whatever is defined in this file will be<br />

defined in our program as well. The file “iostream” contains the standard I/O of C ++.<br />

Input and output is not part of the core language in C ++ but part of the standard libraries.<br />

This means that we cannot program I/O commands without including “iostream” (or<br />

something similar). But it also means that this file must exist in every compiler because<br />

it is part of the standard. Include commands should be at the beginning of the file if<br />

possible.<br />

• the main program is called main and has an integer return value, which is set to 0 by the<br />

return command. The caller of a program (usually the operating system) knows that it<br />

finished successfully when a 0 is returned. A return code other than 0 symbolizes that<br />

something went wrong and often the return code also says something about what went<br />

wrong.<br />

• Braces “{ }” denote a block/group of code (also called a compound statement). Variables<br />

declared within “{ }” groups are only accessible within this block.<br />

19


20 CHAPTER 2. <strong>C++</strong> BASICS<br />

• std::cout and std::endl are defined in “iostream.” The <strong>for</strong>mer is an output stream that prints<br />

text on the screen (unless it is redirected). With std::endl a line is terminated.<br />

• The special operator ≪ is used to pass objects to the output stream std::cout, which<br />

then prints them.<br />

• The double quotes surround string constants, more precisely string literals. This is the<br />

same as in C. For string manipulation, however, one should use C ++’s string class instead<br />

of C’s cumbersome and error-prone functions.<br />

• The expression 6 ∗ 7 is evaluated and a temporary integer is passed to std::cout. In C ++<br />

everything has a type. Sometimes we as programmers have to declare the type and<br />

sometimes the compiler deduces it <strong>for</strong> us. 6 and 7 are literal constants that have type int<br />

and so does their product.<br />

This was a lot of in<strong>for</strong>mation <strong>for</strong> such a short program. So let us start step by step.<br />

TODO: A little explanation how to compile and run it. For g++ and Visual Studio.<br />

2.2 Variables<br />

In contrast to most scripting languages C ++ is strongly typed, that is, every variable has a type<br />

and this type never changes. A variable is declared by a statement TYPE varname. 1 Basic types<br />

are int, unsigned int, long, float, double, char, and bool.<br />

int integer1 = 2;<br />

int integer2, integer3;<br />

float pi = 3.14159;<br />

char mycharacter = ’a’;<br />

bool cmp = integer1 < pi;<br />

Each statement has to be terminated by a “;”. In the following section, we show operations<br />

that are often applied to integer and float types. In contrast to other languages like Python,<br />

where ’ and ” are used <strong>for</strong> both characters and strings, C ++ distinguishes between the two of<br />

them. The C ++ compiler considers ’a’ as the character ‘a’ (it has type char) and ”a” as the string<br />

containing ‘a’ (it has type const char[2], including the terminating null character). If you are used to Python please pay attention to this.<br />

Advice<br />

Define variables right be<strong>for</strong>e using them the first time. This makes your<br />

programs more readable when they grow long. It also allows the compiler to<br />

use your memory more efficiently when you have nested scopes (more details<br />

later). Old C versions required all variables to be defined at the beginning of a<br />

function, and some people stick to this even today. However, in C ++ it generally leads<br />

to higher efficiency and, more importantly, to higher readability to<br />

define variables as late as possible.<br />

1 TODO: too simple, variable lists and in-place initialization is missing


2.2. VARIABLES 21<br />

2.2.1 Constants<br />

Syntactically, constants are like special variables in C ++ with the additional attribute of immutability.<br />

const int integer1 = 2;<br />

const int integer3; // Error<br />

const float pi = 3.14159;<br />

const char mycharacter = ’a’;<br />

const bool cmp = integer1 < pi;<br />

As they cannot be changed, it is mandatory to set the value in the definition. The second<br />

constant definition violates this rule and the compiler will complain about it.<br />

Constants can be used wherever variables are allowed — as long as they are not modified, of<br />

course. On the other hand, constants like those above are already known during compilation.<br />

This enables many kinds of optimizations and the constants can even be used as arguments of<br />

types (we will come back to this later).<br />

2.2.2 Literals<br />

Literals like “2” or “3.14” have types as well. Simply put, integral numbers are treated as<br />

int, long or unsigned long depending on the number of digits. Every number with a dot or an<br />

exponent (e.g. 3e12 ≡ 3 · 10 12 ) is considered a double.<br />

Usually this does not matter much in practice since C ++ has implicit conversion between<br />

built-in numeric types and most programs work well without explicitly specifying the type of<br />

the literals. There are, however, three major reasons <strong>for</strong> paying attention to the types of literals:<br />

• Availability;<br />

• Ambiguity and<br />

• Accuracy.<br />

Without going into detail here, the implicit conversion is not used with template functions<br />

(<strong>for</strong> good reasons). The standard library provides a type <strong>for</strong> complex numbers where the type<br />

<strong>for</strong> the real and imaginary part can be parametrized by the user:<br />

std::complex&lt;float&gt; z(1.3, 2.4), z2;<br />

These complex numbers provide of course the common operations. However, when we write:<br />

z2= 2 ∗ z; // error<br />

z2= 2.0 ∗ z; // error<br />

we will get an error message that the multiplication is not available. More specifically, the<br />

compiler will tell us that there is no operator∗() <strong>for</strong> int and std::complex&lt;float&gt; respectively <strong>for</strong><br />

double and std::complex&lt;float&gt;. 2 The library provides a multiplication <strong>for</strong> the type that we use<br />

<strong>for</strong> the real and imaginary part, here float. There are two ways to ascertain that “2” is float:<br />

z2= float(2) ∗ z;<br />

z2= 2.0f ∗ z;<br />

2 It is however possible to implement std::complex in a fashion such that these expressions work [Got11].



In the first case, we have an int literal that is converted into float and in the second case, the<br />

literal is float from the beginning. For the sake of clarity, the float literal is preferable.<br />

Later in this book we will introduce function overloading, that is, functions with different<br />

implementations <strong>for</strong> different argument types (or argument tuples). The compiler selects the<br />

function overload that fits best. Sometimes the best fit is not clear, <strong>for</strong> instance if function f<br />

accepts an unsigned or a pointer and we call:<br />

f(0);<br />

“0” is considered as int and can be implicitly converted into unsigned or any pointer type. None<br />

of the conversions is prioritized. As be<strong>for</strong>e we can address the issue by explicit conversion and<br />

by a literal of the desired type:<br />

f(unsigned(0));<br />

f(0u);<br />

Again, we prefer the second version because it is more direct (and shorter).<br />

The accuracy issue comes up when working with long double. On the author’s computer, the <strong>for</strong>mat<br />

can handle at least 19 digits. Let us define one third with 20 digits and print out 19 of them:<br />

long double third= 0.3333333333333333333;<br />

cout.precision(19);<br />

cout ≪ ”One third is ” ≪ third ≪ ”.\n”;<br />

The result is:<br />

One third is 0.3333333333333333148.<br />

The program behavior is more satisfying if we append an “l” to the number:<br />

long double third= 0.3333333333333333333l;<br />

yielding the print-out that we hoped <strong>for</strong>:<br />

One third is 0.3333333333333333333.<br />

The following table gives examples of literals and their type:<br />

Literal Type<br />

2 int<br />

2u unsigned<br />

2l long<br />

2ul unsigned long<br />

2.0 double<br />

2.0f float<br />

2.0l long double<br />

For more details, see <strong>for</strong> instance [Str97, § 4.4f, § C.4]. There you also find a description of how to<br />

define literals in octal or hexadecimal notation.



2.2.3 Scope of variables<br />

Global definition: Every variable that we intend to use in a program must have been declared<br />

with its type specifier at an earlier point in the code. A variable can be either of global or local<br />

scope. A global variable is a variable that has been declared in the main body of the source<br />

code, outside all functions. After declaration, global variables can be referred to from anywhere in<br />

the code, even inside functions. This sounds very handy because it is easily available but when<br />

your software grows it becomes more difficult and painful to keep track of the global variables’<br />

modifications. At some point, every code change bears the potential of triggering an avalanche<br />

of errors. Just do not use global variables; sooner or later you will regret it. Believe us.<br />

Global constants like<br />

const double pi= 3.14159265358979323846264338327950288419716939;<br />

are fine because they cannot cause side effects.<br />

Local definition: In contrast, a local variable is declared within the body of a function<br />

or a block. Its visibility/availability is limited to the block enclosed in curly braces { } where<br />

it is declared. More precisely, the scope of a variable is from its definition to the end of the<br />

enclosing braces. Recalling the example of output streams<br />

int main ()<br />

{<br />

std::ofstream myfile(”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

return 0;<br />

}<br />

the scope of myfile is from its definition to the end of the function main. If we wrote:<br />

int main ()<br />

{<br />

int a= 5;<br />

{<br />

std::ofstream myfile(”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

}<br />

myfile ≪ ”a is ” ≪ a ≪ std::endl; // error<br />

return 0;<br />

}<br />

then the second output is not valid because myfile is out of scope. The program would not<br />

compile and the compiler would tell you something like “myfile is not defined in this<br />

scope”.<br />

Hiding: If variables with the same name exist in different scopes then only one variable is visible;<br />

the others are hidden. A variable in an inner scope hides all variables of the same name in outer scopes. For<br />

instance: 3<br />

3 TODO: Picture would be nice.



int main ()<br />

{<br />

int a= 5; // define #1<br />

{<br />

a= 3; // assign #1, #2 is not defined yet<br />

int a; // define #2<br />

a= 8; // assign #2, #1 is hidden<br />

{<br />

a= 7; // #2<br />

}<br />

} // end of #2’s scope<br />

a= 11; // #1, #2 is now out of scope<br />

return 0;<br />

}<br />

Defining the same variable name twice in the same scope is an error.<br />

The advantage of scopes is that you do not need to worry whether a variable (or something<br />

else) is already defined outside the scope. It is just hidden but does not create a conflict. 4<br />

Un<strong>for</strong>tunately, the hiding makes the variables of the same name in the outer scopes inaccessible.<br />

The best thing you can do is to rename the variable in the inner scope (and possibly in the next-outer<br />

scope(s) to access more of those variables). Renaming the outermost variable also solves<br />

the problem of accessibility but tends to be more work because it is probably used more often<br />

due to its longer lifetime. A better solution to manage nesting and accessibility is namespaces,<br />

see the next section.<br />

Scopes also have the advantage of allowing memory reuse, e.g.:<br />

int main ()<br />

{<br />

int x, y;<br />

float z;<br />

cin ≫x;<br />

if (x < 4) {<br />

y= x ∗ x;<br />

// something with y<br />

} else {<br />

z= 2.5 ∗ float(x);<br />

// something with z<br />

}<br />

}<br />

The example uses three variables. However, they are never used at the same time. y is only<br />

used in the first branch and z only in the second one.<br />

Thus, we rewrite the program as follows<br />

int main ()<br />

{<br />

int x;<br />

cin ≫x;<br />

4 As opposed to macros, an obsolete and reckless legacy feature from C that should be avoided at any price<br />

because it undermines all structure and reliability of the language.


2.3. OPERATORS 25<br />

if (x < 4) {<br />

int y= x ∗ x;<br />

// something with y<br />

} else {<br />

float z= 2.5 ∗ float(x);<br />

// something with z<br />

}<br />

}<br />

then y exists only in the first branch and z only in the second one. In general, it helps<br />

us save memory to let variables live only as long as necessary, especially when we have very<br />

large objects. That is, define variables as late as possible — ideally directly be<strong>for</strong>e using them — then<br />

they are implicitly in the innermost possible scope, e.g. in the branches in the previous example<br />

instead of the main function. The reduced code complexity of having fewer active variables at<br />

any point in your program also simplifies your life if the program does not do what it should (in very<br />

rare cases, of course) and you have to debug it.<br />

For all those reasons, it is also preferable to define loop indices directly in the loop:<br />

<strong>for</strong> (int i= 0; i < n; i++) { ... }<br />

If you need the loop index afterwards you must define it outside; otherwise you run into an error:<br />

cin ≫x;<br />

<strong>for</strong> (int i= 0; abs(x) > 0.001 && i < 100; i++)<br />

x= f(x);<br />

cout ≪ ”Did ” ≪ i ≪ ” iterations.\n”; // error, which i????<br />

The example is some kind of (probably useless) fixed-point calculation. It stops when |x| ≤<br />

0.001 or 100 iterations were per<strong>for</strong>med (remember the second term is not a termination but a<br />

continuation criterion). When we have finished the loop we want to know how many iterations we<br />

per<strong>for</strong>med. But our loop index has already died. Let’s try again:<br />

cin ≫x;<br />

int i;<br />

<strong>for</strong> (i= 0; abs(x) > 0.001 && i < 100; i++)<br />

x= f(x);<br />

cout ≪ ”Did ” ≪ i ≪ ” iterations.\n”;<br />

Now it works.<br />

2.3 Operators<br />

C ++ is rich in built-in operators. An operator is a symbol that tells the compiler to per<strong>for</strong>m<br />

specific mathematical or logical manipulations. C ++ has three general classes of operators,<br />

arithmetic, boolean, and bitwise. This section gives a short overview of the different operators<br />

and their meaning.<br />

2.3.1 Arithmetic operators<br />

The following table lists the arithmetic operators allowed in C ++:



Operator Action<br />

− subtraction, also unary minus<br />

+ addition<br />

∗ multiplication<br />

/ division<br />

% modulus<br />

−− decrement<br />

++ increment<br />

The modulus operator yields the remainder of the integer division. The ++ operator adds one<br />

to its operand and −− subtracts one. Both can precede or follow the operand. When they<br />

precede the operand, the corresponding operation will be per<strong>for</strong>med be<strong>for</strong>e using the operand’s<br />

value to evaluate the rest of the expression. If the operator follows its operand, C ++ will use<br />

the operand’s value be<strong>for</strong>e incrementing or decrementing it. Consider the following example:<br />

x = 1;<br />

y = ++x;<br />

x = 1;<br />

z = x++;<br />

As a result of executing these four lines of code, y will be set to 2, x will be set to 2 and z will<br />

be set to 1.<br />

The priority and associativity of binary arithmetic operators is the same as we know it from<br />

math: multiplication and division precedes addition and subtraction. Thus, x + y ∗ z is evaluated<br />

as x + (y ∗ z). Operations of the same priority are left-associative, i.e. x / y ∗ z is<br />

equivalent to (x / y) ∗ z. Unary operators have precedence over binary: x ∗ y++ / −z means<br />

(x ∗ (y++)) / (−z). Nevertheless, as long as you are still learning C ++ and not entirely sure<br />

about the precedences, you might want to add redundant parenthesis instead of wasting hours<br />

debugging your program.<br />

With these operators we can write our first numeric program:<br />

#include &lt;iostream&gt;<br />

int main ()<br />

{<br />

float r1 = 3.5, r2 = 7.3, pi = 3.14159;<br />

float area1 = pi ∗ r1∗r1;<br />

std::cout ≪ ”A circle of radius ” ≪ r1 ≪ ” has area ”<br />

≪ area1 ≪ ”.” ≪ std::endl;<br />

std::cout ≪ ”The average of ” ≪ r1 ≪ ” and ” ≪ r2 ≪ ” is ”<br />

≪ (r1+r2)/2 ≪ ”.” ≪ std::endl;<br />

return 0 ;<br />

}<br />

2.3.2 Boolean operators<br />

Boolean operators are logical and relational operators. Both return boolean values, there<strong>for</strong>e<br />

the name. Operators and their significations are:



Operator Meaning<br />

&gt; greater than<br />

&gt;= greater than or equal to<br />

&lt; less than<br />

&lt;= less than or equal to<br />

== equal to<br />

!= not equal to<br />

&amp;&amp; logical AND<br />

|| logical OR<br />

! logical NOT<br />

Arithmetic operators bind more strongly than relational ones, so 4 &gt;= 1 + 7 is evaluated as if it were written 4 &gt;= (1 + 7).<br />

Advise<br />

Integer values can be treated in C ++ as boolean. For the sake of clarity it is<br />

always better to use bool <strong>for</strong> all logical expressions.<br />

This is a legacy of C where bool does not exist. Almost all techniques from C work also in<br />

C ++— as the language name suggests — but using the new features of C ++ allows you to write<br />

programs with better structure. For instance, if you want to store the result of a comparison<br />

do not use an integer variable but a bool.<br />

bool out_of_bound = x &lt; min || x &gt; max;<br />

2.3.3 Bitwise operators<br />

Bitwise operators allow you to test or change the bits of integers. 5 There are the following<br />

operations:<br />

Operator Action<br />

& AND<br />

| OR<br />

ˆ exclusive OR<br />

∼ one’s complement (NOT)<br />

≫ shift right<br />

≪ shift left<br />

The shift operators bitwise shift the value on their left by the number of bits on their right:<br />

• ≪ shifts left and adds zeros at the right end.<br />

• ≫ shifts right and adds either 0s, if the value is an unsigned type, or extends the top bit (to<br />

preserve the sign) if it is a signed type.<br />

5 The bitwise operators work also on bool but it is favorable to use the logical operators from the previous<br />

section. Especially the shift operators are rather silly <strong>for</strong> bool.



The bitwise operations can be used to characterize properties in a very compact <strong>for</strong>m as in the<br />

following example:<br />

#include &lt;iostream&gt;<br />

int main ()<br />

{<br />

int concave = 1, monotone = 2, continuous = 4;<br />

int f_is = concave | continuous;<br />

std::cout ≪ ”f is ” ≪ f_is ≪ std::endl;<br />

std::cout ≪ ”Is f concave? (0 means no, 1 means yes) ”<br />

≪ (f_is &amp; concave) ≪ std::endl;<br />

f_is = f_is | monotone;<br />

f_is = f_is ˆ concave;<br />

std::cout ≪ ”f is now ” ≪ f_is ≪ std::endl;<br />

return 0 ;<br />

}<br />

The first statement in main introduces three properties that can be combined arbitrarily. The numbers are powers<br />

of two so that their binary representations contain a single 1-bit each. In the definition of f_is we used<br />

bitwise OR to combine two properties. Bitwise AND allows <strong>for</strong> masking single or multiple bits,<br />

as shown in the second output statement. Afterwards an additional property is set with bitwise OR, and bitwise exclusive<br />

OR (XOR) allows <strong>for</strong> toggling a property. Operating systems and hardware drivers<br />

use this style of operations extensively. But it needs some practice to get used to it.<br />

Shift operations provide an efficient way to multiply with or divide by powers of 2 as shown in<br />

the following code:<br />

int i = 78;<br />

std::cout ≪ ”i ∗ 8 is ” ≪ (i ≪ 3)<br />

≪ ”, i / 4 is ” ≪ (i ≫2) ≪ std::endl;<br />

Obviously, that needs some familiarization as well.<br />

On the per<strong>for</strong>mance side, today’s processors are quite fast at multiplying integers so that you<br />

will not see a big per<strong>for</strong>mance boost when replacing your products by left shifts. Division is<br />

still a bit slow and a right shift can make a difference. Even then the price of this source code<br />

obfuscation is only justified if the operation is critical <strong>for</strong> the overall per<strong>for</strong>mance of your entire<br />

application.<br />

2.3.4 Compound assignment<br />

The compound assignment operators apply an arithmetic operation to the left and right-hand<br />

side and store the result in the left-hand side.<br />

These operators are +=, −=, ∗=, /=, %=, ≫=, ≪=, &amp;=, ˆ=, and |=.<br />

The statement a += b is equivalent to the statement a = a + b.<br />



2.3.5 Bracket operators<br />

The operator [] is used to access elements of arrays (see § 2.9), and () <strong>for</strong> function calls.<br />

2.3.6 All operators<br />

We haven’t introduced all operators yet. They will be shown in an appropriate context. For<br />

now, we only list the entire operator set with their precedences and associativity. The table is<br />

taken from [?] (by courtesy of Bjarne Stroustrup). For more details about specific operators<br />

see there. The operators on top have the highest priorities. 6<br />

Operator Summary<br />

scope resolution class name :: member<br />

scope resolution namespace name :: member<br />

global :: name<br />

global :: qualified-name<br />

member selection object . member<br />

member selection pointer → member<br />

subscripting expr[ expr ]<br />

subscripting (user-defined) object [ expr ] 7<br />

function call expr ( expr list )<br />

value construction type ( expr list )<br />

post increment lvalue ++<br />

post decrement lvalue −−<br />

type identification typeid ( type )<br />

run-time type identification typeid ( expr )<br />

run-time checked conversion dynamic_cast &lt; type &gt; ( expr )<br />

compile-time checked conversion static_cast &lt; type &gt; ( expr )<br />

unchecked conversion reinterpret_cast &lt; type &gt; ( expr )<br />

cast conversion const_cast &lt; type &gt; ( expr )<br />

size of object sizeof expr<br />

size of type sizeof ( type )<br />

pre increment ++ lvalue<br />

pre decrement −− lvalue<br />

complement ∼ expr<br />

not ! expr<br />

unary minus − expr<br />

unary plus + expr<br />

address of & lvalue<br />

dereference ∗ lvalue<br />

create (allocate) new type<br />

create (allocate and initialize) new type( expr list )<br />

create (place) new ( expr list ) type<br />

create (place and initialize) new ( expr list ) type( expr list )<br />

destroy (deallocate) delete pointer<br />

destroy array delete [ ] pointer<br />

6 TODO: If possible references<br />

7 Not in [?].



cast (type conversion) ( type ) expr<br />

member selection object.∗ pointer to member<br />

member selection pointer → ∗ pointer to member<br />

multiply expr ∗ expr<br />

divide expr / expr<br />

modulo (remainder) expr % expr<br />

add (plus) expr + expr<br />

subtract (minus) expr − expr<br />

shift left expr ≪ expr<br />

shift right expr ≫ expr<br />

less than expr < expr<br />

less than or equal expr &lt;= expr<br />

greater than expr &gt; expr<br />

greater than or equal expr >= expr<br />

equal expr == expr<br />

not equal expr != expr<br />

bitwise AND expr & expr<br />

bitwise exclusive OR (XOR) expr ˆ expr<br />

bitwise inclusive OR expr | expr<br />

logical AND expr && expr<br />

logical OR expr || expr<br />

conditional expression expr ? expr: expr<br />

simple assignment lvalue = expr<br />

multiply and assignment lvalue ∗= expr<br />

divide and assignment lvalue /= expr<br />

modulo and assignment lvalue %= expr<br />

add and assignment lvalue += expr<br />

subtract and assignment lvalue −= expr<br />

shift left and assignment lvalue ≪= expr<br />

shift right and assignment lvalue ≫= expr<br />

AND and assignment lvalue &amp;= expr<br />

inclusive OR and assignment lvalue |= expr<br />

exclusive OR and assignment lvalue ˆ= expr<br />

throw exception throw expr<br />

comma (sequencing) expr , expr<br />

To see the operator precedences at one glance, use Table 2.13 on page 64. 8<br />

2.3.7 Overloading<br />

A very powerful aspect of C ++ is that the programmer can define operators <strong>for</strong> new types. This<br />

will be explained in section ??. Operators of built-in types cannot be changed. New operators<br />

cannot be added as in some other languages. If you redefine operators make sure that the<br />

expected priority of the operation corresponds to the operator precedence. For instance, you<br />

might have the idea of using the LaTeX notation <strong>for</strong> exponentiation of matrices:<br />

8 TODO: Associativity?


2.4. EXPRESSIONS AND STATEMENTS 31<br />

A= Bˆ2;<br />

A is B squared. So far so good. That the original meaning of ˆ is a bitwise XOR does not<br />

worry us because we do not plan to implement bitwise operations on matrices.<br />

Now we add C:<br />

A= Bˆ2 + C;<br />

Looks nice, but it does not work (or does something weird). — Why?<br />

Because + has a higher priority than ˆ. Thus, the compiler understands our expression as:<br />

A= B ˆ (2 + C);<br />

Oops. That looks wrong. 9 The operator gives a concise and intuitive interface but its priority<br />

would cause a lot of confusion. Thus, it is advisable to refrain from this overloading.<br />

2.4 Expressions and Statements<br />

C and C ++ distinguish between expressions and statements. Casually speaking, one could<br />

just say that every expression becomes a statement if a semicolon is appended. However, we<br />

would like to discuss this topic a bit more.<br />

Let us build this recursively from bottom up. Any variable name (x, y, z, . . . ), constant or<br />

literal is an expression. One or more expressions with an operator is an expression, e.g. x + y or<br />

x ∗ y + z. In several languages, e.g. Pascal, the assignment is a statement. In C and C ++ it is an<br />

expression, e.g. x= y + z. As a consequence it can be used in another assignment: x2= x= y + z.<br />

Assignments are evaluated from right to left. Input and output operations as<br />

std::cout ≪ ”x is ” ≪ x ≪ ”\n”;<br />

are also expressions.<br />

A function call with expressions as arguments is an expression, e.g. abs(x), abs(x ∗ y + z). There<strong>for</strong>e,<br />

function calls can be nested: pow(abs(x), y). In languages where a function call is a statement<br />

this would not be possible. As the assignment is an expression, it can be used as argument of a<br />

function: abs(x= y), or I/O operations as those above. Needless to say, this is quite bad programming<br />

style. An expression surrounded by parentheses is an expression as well, e.g. (x + y).<br />

This allows us to change the order of evaluation, e.g. x ∗ (y + z) computes the addition first<br />

although the multiplication has the higher priority.<br />

A very special operator in C ++ is the ‘comma operator’ that provides a sequential evaluation.<br />

The meaning is simply evaluating first the sub-expression left of the comma and then that right<br />

of it. The value of the whole expression is that of the right sub-expression. The sub-expressions<br />

can contain the comma operator as well so that arbitrarily long sequences can be defined. With<br />

the help of the comma operator, one can evaluate multiple expressions in program locations<br />

where only one expression is allowed. If used as a function argument, the comma expression<br />

needs surrounding parentheses; otherwise the comma is interpreted as separation of function<br />

arguments. The comma operator can be overloaded with a user-defined semantics. This can<br />

9 The precise interpretation is A.operator=(operatorˆ(B, operator+(2, C)));



complicate the understanding of the program behavior dramatically and has to be used with<br />

utter care. In general it is advisable to use it sparingly.<br />

Any of the above expressions followed by a semicolon 10 is a statement, e.g.:<br />

x= y + z;<br />

y= f(x + z) ∗ 3.5;<br />

A statement like y + z; is allowed although it is most likely useless. During program execution<br />

the sum of y and z would be computed and then thrown away. Decent compilers would optimize<br />

out this useless computation. However, it is not guaranteed that this statement can always be<br />

omitted. If y or z is an object of a user type then the addition is also user-defined and might<br />

change y or z or something else. This is obviously bad programming style but legitimate in<br />

C ++.<br />

A single semicolon is an empty statement. There<strong>for</strong>e, one can put as many semicolons after an<br />

expression as wanted. Some statements do not end with a semicolon, e.g. function definitions.<br />

If a semicolon is appended to such a statement it is not an error but just an extra empty<br />

statement. 11 Any sequence of statements surrounded by curly braces is a statement — called a<br />

compound statement.<br />

The variable and constant declarations we have seen be<strong>for</strong>e are also statements. As initial<br />

value of a variable or constant, one can use any of the expressions mentioned be<strong>for</strong>e (however<br />

involving the assignment or comma operator is probably rather confusing). Other statements — to<br />

be discussed later — are function and class definitions, as well as control statements that we<br />

will introduce in the next section.<br />

2.5 Control statements<br />

Control statements allow us to steer the program execution by means of branching and repetition.<br />

2.5.1 If-statement<br />

This is the simplest <strong>for</strong>m of control and its meaning is intuitively clear, <strong>for</strong> instance in:<br />

if (weight > 100.0)<br />

cout ≪ ”This is quite heavy.\n”;<br />

else<br />

cout ≪ ”I can carry this.\n”;<br />

Often, the else branch is not needed and can be omitted. Say we have some value in variable x<br />

and compute something on its magnitude:<br />

if (x < 0.0)<br />

x= −x;<br />

// Now we know that x &gt;= 0.0<br />

10 The usage of the semicolon in Pascal looks similar at the first glance. However, in Pascal the semicolon has<br />

a slightly different purpose which is separating statements. Thus, the semicolon can be omitted when only one<br />

statement exist in a line. Coming from Pascal, it takes some time to get used to this difference.<br />

11 Nonetheless some compilers print a warning in pedantic mode.


2.5. CONTROL STATEMENTS 33<br />

The expression in the parentheses must be a logical expression or something convertible to bool.<br />

For instance, one can write:<br />

int i;<br />

// ...<br />

if (i) // bad style<br />

do_something();<br />

In the example, do_something is called if i is different from 0. Experienced C and C ++ programmers<br />

know that by heart, but the intentions of the developer are better communicated if<br />

this is stated explicitly:<br />

int i;<br />

// ...<br />

if (i != 0) // much better<br />

do_something();<br />

Each branch of an if consists of a single statement. To per<strong>for</strong>m multiple operations one can<br />

use braces: 12<br />

int nr_then= 0, nr_else= 0;<br />

// ...<br />

if (...) {<br />

nr_then++;<br />

cout ≪ ”In then−branch\n”;<br />

} else {<br />

nr_else++;<br />

cout ≪ ”In else−branch\n”;<br />

}<br />

In the beginning, it is helpful to always write the braces. With more experience, most developers<br />

only write the braces where necessary. At any rate it is highly advisable to indent the branches<br />

<strong>for</strong> better readability, whatever your degree of experience.<br />

An if statement can contain other if-statements:<br />

if (weight > 100.0) {<br />

if (weight > 200.0)<br />

cout ≪ ”This is extremely heavy.\n”;<br />

else<br />

cout ≪ ”This is quite heavy.\n”;<br />

} else {<br />

if (weight < 50.0)<br />

cout ≪ ”A child can carry this.\n”;<br />

else<br />

cout ≪ ”I can carry this.\n”;<br />

}<br />

In the above example, the braces could be omitted without changing the behavior, but it<br />

is clearer to have them. The example is more readable if we reorganize the nesting:<br />

if (weight < 50.0) {<br />

cout ≪ ”A child can carry this.\n”;<br />

} else if (weight <= 100.0) {<br />

cout ≪ ”I can carry this.\n”;<br />

} else if (weight <= 200.0) {<br />

cout ≪ ”This is quite heavy.\n”;<br />

} else {<br />

cout ≪ ”This is extremely heavy.\n”;<br />

}<br />

Without braces, nested if-statements raise the question which if an else belongs to:<br />

if (weight > 100.0)<br />

if (weight > 200.0)<br />

cout ≪ ”This is extremely heavy.\n”;<br />

else<br />

cout ≪ ”This is quite heavy.\n”;<br />

It looks like the last line is executed when weight is between 100 and 200 assuming the first if<br />

has no else-branch. But we could also assume the second if comes without else-branch and the<br />

last line is executed when weight is less than or equal to 100. Fortunately, the C ++ standard specifies<br />

that an else-branch always belongs to the innermost possible if. So, we can count on our first<br />

interpretation. If the else-branch should belong to the first if, we need braces:<br />

if (weight > 100.0) {<br />

if (weight > 200.0)<br />

cout ≪ ”This is extremely heavy.\n”;<br />

} else<br />

cout ≪ ”This is not so heavy.\n”;<br />

Maybe these examples convinced you that it is more productive to write more braces and save<br />

the time of guessing which if the branches belong to.<br />

Advice<br />

If you use an editor that understands C ++ (like the IDE from Visual Studio<br />

or emacs in C ++ mode) then automatic indentation is a great help with<br />

structured programming. Whenever a line is not indented as you expected,<br />

something is most likely not nested as you intended.<br />

2.5.2 Conditional Expression<br />

Although this section describes statements, we would like to discuss the conditional expression<br />

here because of its proximity to the if-statement. The semantics of<br />

condition ? result_for_true : result_for_false<br />

is that if the condition (the first sub-expression) evaluates to true, then the entire expression is the<br />

second sub-expression, otherwise the third one. For instance, we can compute the minimum of two<br />

values with either if-then-else or the conditional expression:


if (x <= y)<br />

min= x;<br />

else<br />

min= y;<br />

min= x <= y ? x : y;<br />

The conditional expression is more concise and can be used directly within larger expressions.<br />

2.5.3 While and Do-While Loops<br />

A do-while-loop executes its body at least once and tests the continuation condition afterward, e.g.:<br />

double eps= 1.0;<br />

do {<br />

eps/= 2.0;<br />

} while (eps > 0.0001);<br />

The loop is performed at least once, even with an extremely small value for eps in our<br />

example. The difference between a while-loop and a do-while-loop is irrelevant for most scientific<br />

software. It only matters for loops with very few iterations and an extremely strong impact on the overall<br />

performance, because a do-while-loop performs one comparison and one jump less.<br />

2.5.4 For Loop<br />

The most common loop in C ++ is the for-loop. As a simple example, we would like to add two vectors 15<br />

and print the result afterward:<br />

double v[3], w[]= {2., 4., 6.}, x[]= {6., 5., 4};<br />

for (int i= 0; i < 3; i++)<br />

v[i] = w[i] + x[i];<br />

for (int i= 0; i < 3; i++)<br />

cout ≪ ”v[” ≪ i ≪ ”] = ” ≪ v[i] ≪ ’\n’;<br />

The loop head consists of three components:<br />

• The initialization;<br />

• A continuation criterion; and<br />

• A step operation.<br />

The example above is typical for a for-loop. In the initialization, one typically declares a new<br />

variable and initializes it to 0 because this is the start index of most indexed data structures.<br />

The condition usually tests if the loop index is smaller than a certain size and the last operation<br />

typically increments the loop index.<br />

It is a very popular beginners’ mistake to write conditions like “i <= size(v)” although the valid indices end at size(v) − 1; such an off-by-one error accesses one element past the end of the container.<br />



Here it was simpler to take out term 0 and start with term 1. We also used less-equal to assure<br />

that the term x^10/10! is considered.<br />

The for-loop in C ++ is very flexible. The initialization part can be any expression, a variable<br />

declaration or empty. It is possible to introduce multiple new variables of the same type. This<br />

can be used to avoid repeating the same operation in the condition, e.g.:<br />

for (int i= 0, end= xyz.size(); i < end; i++) ...<br />

Variables declared in the initialization are only visible within the loop and hide variables of the<br />

same names from outside the loop.<br />

The condition can be any expression that can be converted to a bool. An empty condition is<br />

always true and the loop is repeated infinitely unless it is terminated from inside the body, as we will discuss<br />

in the next section. We said that loop indices are typically incremented in the head’s third<br />

part. In principle, one can modify it within the loop body but programs are much clearer if it<br />

is done in the loop head. On the other hand, there is no limitation that only one variable is<br />

increased by 1. One can modify as many variables as desired, using the comma operator and<br />

any modification we like, such as:<br />

for (int i= 0, j= 0, p= 1; ...; i++, j+= 4, p∗= 2) ...<br />

This is of course more complex than having just one loop index but still more readable than<br />

declaring/modifying indices before the loop or inside the loop body.<br />

In fact, the for-loop in C and C ++ is just another notation for a while-loop. Any for-loop:<br />

for (init; cond; incr) {<br />

st1; st2; ... stn;<br />

}<br />

can be written with a while-loop:<br />

{<br />

init;<br />

while (cond) {<br />

st1; st2; ... stn;<br />

incr;<br />

}<br />

}<br />

Conversely, any while-loop can evidently be written as a for-loop. We do not know of<br />

a design guideline from a software engineering guru on when to use while or for, but for is more<br />

concise if there is a local initialization or some incremental operation.<br />

2.5.5 Loop Control<br />

There are two statements to deviate from the regular loop evaluation:<br />

• break and<br />

• continue.<br />

A break terminates the loop entirely and continue ends only the current iteration and continues<br />

the loop with the next iteration, <strong>for</strong> instance:



for (...; ...; ...) {<br />

...<br />

if (dx == 0.0) continue;<br />

x+= dx;<br />

...<br />

if (r < eps) break;<br />

...<br />

}<br />

In the example above we assumed that the remainder of the iteration is not needed when<br />

dx == 0.0. In some iterative computations it might be clear in the middle of an iteration (here<br />

when r < eps) that the work is already done.<br />

Understanding the program behavior becomes more difficult the more breaks and continues<br />

are used. One should always aim for moving as much loop control as possible into the loop<br />

head. However, avoiding breaks and continues by excessive if-then-else branches is even less<br />

comprehensible.<br />

Sometimes, one might prefer performing some surplus operations inside a loop (if this has no<br />

perceivable impact on the overall performance) to keep the program simpler. Simpler programs,<br />

on the other hand, have a better chance of being optimized by the compiler. There is certainly<br />

no golden rule, but as a practical approach one should implement software first for maximal<br />

clarity and simplicity (while using efficient algorithms as early as possible). Once the software is<br />

working correctly, one can try variations to investigate the impact of implementation details on<br />

performance.<br />

2.5.6 Switch Statement<br />

A switch is like a special kind of if. It provides a concise notation when different computations<br />

are performed for different cases of a given integral value:<br />

switch(op_code) {<br />

case 0: z= x + y; break;<br />

case 1: z= x − y; cout ≪ ”compute diff\n”; break;<br />

case 2:<br />

case 3: z= x ∗ y; break;<br />

default: z= x / y;<br />

}<br />

When people see the switch statement for the first time, they are usually surprised that one<br />

needs to state at the end of each case that the case is terminated. Otherwise the statements of<br />

the next case are executed as well. This can be used to perform the same operation for different<br />

cases, e.g. for 2 and 3 in the example above.<br />

This fall-through also allows us to implement short loops without the termination test after<br />

each iteration. Say we have vectors with dimension ≤ 5. Then we could implement a vector<br />

addition without a loop:<br />

assert(size(v) <= 5);<br />

int i= 0;<br />

switch (size(v)) {<br />

case 5: v[i] = w[i] + x[i]; i++;<br />

case 4: v[i] = w[i] + x[i]; i++;<br />

case 3: v[i] = w[i] + x[i]; i++;<br />

case 2: v[i] = w[i] + x[i]; i++;<br />

case 1: v[i] = w[i] + x[i];<br />

case 0: ;<br />

}<br />

This technique is called Duff’s device. Although this is an interesting technique to realize an<br />

iterative computation without a loop, the performance impact is probably limited in practice.<br />

Such techniques should only be considered in program parts that account for a significant fraction of the<br />

overall run time; otherwise the readability of the sources is more important.<br />

2.5.7 Goto<br />

DO NOT USE IT. NEVER! EVER!<br />

2.6 Functions<br />

Functions are important building blocks of C ++ programs. The first example we have seen is<br />

the main function in the hello-world program. main must be present in every executable and is<br />

called when the program starts. Other than that, there is nothing special about main.<br />

The general <strong>for</strong>m of a C ++ function is:<br />

[inline] return_type function_name (argument_list)<br />

{<br />

body of the function<br />

}<br />

For instance, one can implement a very simple function to square a value:<br />

double square(double x)<br />

{<br />

return x ∗ x;<br />

}<br />

In C and C ++ each function has a return type. A function that does not return a value has the<br />

pseudo-return-type “void”:<br />

void print(double x)<br />

{<br />

std::cout ≪ ”x is ” ≪ x ≪ ’\n’;<br />

}<br />

void is not a real type but rather a placeholder that enables us to omit returning a value.<br />

We cannot define objects of it:<br />

void nothing; // error



2.6.1 Inline Functions<br />

Calling a function requires a fair number of activities:<br />

• The arguments (or at least their addresses) must be copied on the stack;<br />

• The current program counter must be copied on the stack to continue the execution at<br />

this point when the function is finished;<br />

• Registers must be saved so that the function can use them;<br />

• Jump to the code of the function;<br />

• Execute the function;<br />

• Clean the arguments from the stack;<br />

• Copy the result on the stack;<br />

• Jump back to the calling code;<br />

• Store back registers.<br />

What happens exactly depends on the hardware. The good news is that the function call<br />

overhead is dramatically lower than in the past. Furthermore, the compiler can optimize out<br />

those activities not needed in a specific call.<br />

Nonetheless, for small functions like square above, the effort for calling the function is still<br />

significantly higher than the work the function actually does. C programmers avoid the function-call<br />

overhead with macros. Macros create so many problems in software development that they<br />

must only be used when there is absolutely no alternative whatsoever. Bjarne Stroustrup<br />

says “Almost every macro demonstrates a flaw in the programming language, in the program,<br />

or in the programmer.” We would like to add a flaw “in the compiler optimization”. 16<br />

Fortunately, we have an excellent alternative to macros: inline functions. The programmer just<br />

adds the keyword inline to the function definition:<br />

inline double square(double x)<br />

{<br />

return x ∗ x;<br />

}<br />

and all the overhead of the function call vanishes into thin air.<br />

An excessive use of inline can have a negative effect on performance. When many large functions<br />

are inlined, the binary executable becomes very large. The consequence is that a lot of<br />

time is spent loading the binary from memory, and a lot of cache memory is wasted on it as<br />

well. This decreases the memory bandwidth and the cache available for data, causing more slowdown<br />

than is saved on function calls.<br />

16 Advanced: Compilers today are really smart at eliminating unused code. However, we experienced that<br />

arguments of inline functions might be constructed although they are not used. These are usually only a few<br />

machine instructions. But when this happens extremely frequently, as in an index range check that should<br />

disappear in release mode, it can ruin the overall performance. We hope that further compiler improvements can<br />

rescue us from this kind of macro usage.



It should be mentioned here that the inline keyword is not mandatory. The compiler can decide<br />

against inlining <strong>for</strong> the reasons given in the previous paragraph. On the other hand, the compiler<br />

is free to inline functions without the inline keyword.<br />

For obvious reasons, the definition of an inline function must be visible in every compile unit<br />

where it is called. In contrast to other functions, it cannot be compiled separately. Conversely,<br />

a non-inline function must not be defined in multiple compile units because the definitions collide when the<br />

compiled parts are ‘linked’ together. Thus, there are two ways to avoid such collisions: assuring<br />

that the function definition is only present in one compile unit or declaring the function as<br />

inline.<br />

2.6.2 Function Arguments<br />

If we pass an argument to a function, a copy is created by default. For instance, the following<br />

would not work (as expected):<br />

void increment(int x)<br />

{<br />

x++;<br />

}<br />

int main()<br />

{<br />

int i= 4;<br />

increment(i);<br />

cout ≪ ”i is ” ≪ i ≪ ’\n’;<br />

}<br />

The output would be 4. The operation x++ in the function body only increments a local copy and<br />

not the original value. This kind of argument transfer is called ‘call-by-value’ or ‘pass-by-value’.<br />

To modify the value itself we have to ‘pass-by-reference’ the variable:<br />

void increment(int& x)<br />

{<br />

x++;<br />

}<br />

Now the variable itself is incremented and the output will be 5 as expected. We will discuss<br />

references in more detail in § 2.10.2.<br />

Temporary variables — like the result of an operation — cannot be passed by reference:<br />

increment(i + 9); // error<br />

We could not compute (i + 9)++ anyway. In order to call such a function with some temporary<br />

value one needs to store it first in a variable and pass this variable to the function.<br />

Larger data structures like vectors and matrices are almost always passed by reference to<br />

avoid expensive copy operations:<br />

double two_norm(vector& v) { ... }<br />

An operation like a norm should not change its argument. But passing the vector by reference<br />

bears the risk of accidentally overwriting it.



To make sure that our vector is not changed (and not copied either), we pass it as constant<br />

reference:<br />

double two_norm(const vector& v) { ... }<br />

If we changed v in this function, the compiler would emit an error. Both call-by-value and<br />

constant references ascertain that the argument is not altered, but by different means:<br />

• Arguments that are passed by value can be changed in the function since the function<br />

works with a copy. 17<br />

• With const references one works on the passed argument directly but all operations that<br />

might change the argument are forbidden. In particular, const-referred arguments cannot<br />

appear on the left-hand side of an assignment or be passed as non-const references to other functions<br />

(in fact, the LHS of an assignment is also a non-const reference).<br />

In contrast to mutable references, constant ones allow for passing temporaries:<br />

alpha= two_norm(v + w);<br />

This is admittedly not entirely consistent in terms of language design, but it makes the life of<br />

programmers much easier.<br />

Argument values that occur very frequently can be declared as defaults. Say we implement a<br />

function that computes the n-th root, and mostly the square root; then we can write:<br />

double root(double x, int degree= 2) { ... }<br />

This function can be called with one or two arguments:<br />

x= root(3.5, 3);<br />

y= root(7.0);<br />

One can declare multiple default arguments but only at the end. In other words, after an<br />

argument with a default value one cannot have one without.<br />

2.6.3 Returning Results<br />

In the examples before, we only returned double or int. These are the nice ones. Functions that<br />

compute new values of large data structures are more difficult.<br />

Default arguments<br />

Sometimes functions have arguments that are used very infrequently. To address this, you can<br />

give a parameter a default value that is automatically used when no argument corresponding<br />

to that parameter is specified. In this way the caller only needs to specify those arguments that<br />

are meaningful at a particular instance. Consider the following example:<br />

void foo( int a = 5, char ch =’A’ )<br />

{ std::cout ≪ a ≪ ” ” ≪ ch ≪ std::endl ;}<br />

17 This assumes that the argument is properly copied. For user-defined types, one can implement one’s own copy<br />

operation with aliasing effects (on purpose or by accident). Then modifications of the copy also affect the original<br />

object.



foo takes one integer argument with default value 5 and one character argument with a default<br />

value of ‘A’. Now this function can be called by one of the three methods shown here:<br />

foo( 1, ’J’ );<br />

foo(24);<br />

foo();<br />

Which results in the following output:<br />

1 J<br />

24 A<br />

5 A<br />

Void functions<br />

When the result type of a function is void, we do not return a result. For example<br />

void foo( int i ) {<br />

std::cout ≪ ”My value is ” ≪ i ≪ std::endl ;<br />

}<br />

Constant arguments<br />

We can use const objects as arguments in functions to protect them from being changed. For<br />

example :<br />

bool bar( int const& x, int y ) {<br />

y = y+2;<br />

return y == x;<br />

}<br />

Since we do not want to modify x, we can add the keyword const. Note that const can be put<br />

before or after the type, but the authors of this course recommend putting it after.<br />

2.6.4 Overloading<br />

In C ++, functions can share the same name as long as their parameter declarations are different.<br />

More precisely, the functions should differ in the number or the type of their parameters.<br />

The compiler can then use the number/type of the arguments to determine which version of<br />

the overloaded function should be used. Note that although overloaded functions may have<br />

different return types, a difference in return type alone is not sufficient to distinguish between<br />

two versions of a function.<br />

Consider the following example:<br />

#include <iostream><br />

#include <cmath><br />

int divide (int a, int b){<br />

return a / b ;<br />

}



float divide (float a, float b){<br />

return std::floor( a / b ) ;<br />

}<br />

int main (){<br />

int x=5,y=2;<br />

float n=5.0,m=2.0;<br />

std::cout ≪ divide (x,y) ≪ std::endl;<br />

std::cout ≪ divide (n,m) ≪ std::endl;<br />

return 0;<br />

}<br />

In this case we have defined two functions with the same name, divide, but one of them accepts<br />

two parameters of type int and the other one accepts them of type float. In the first call to<br />

divide, the two arguments passed are of type int; therefore, the function with the first prototype<br />

is called. This function returns the result of dividing one parameter by the other. The second<br />

call passes two arguments of type float, so the function with the second prototype is called.<br />

This one executes a similar division and rounds the result down.<br />

2.6.5 Assertions<br />

assert is a special kind of function (actually a preprocessor macro from cassert) with the following interface:<br />

void assert (int expression);<br />

If the argument expression evaluates to 0, this causes an assertion failure that terminates the<br />

program. A message is written to the standard error device and abort is called, terminating<br />

the program execution.<br />

The specifics of the message shown depend on the specific implementation in the compiler, but<br />

it shall include: the expression whose assertion failed, the name of the source file, and the line<br />

number where it happened. A usual message format is:<br />

Assertion failed: expression, file filename, line linenumber<br />

This allows a programmer to include many assert calls in a source code while debugging<br />

the program. The many assert calls may reduce the performance of the code, so it can<br />

be desirable to disable asserts for high-performance libraries. Asserts are disabled by including<br />

the following line<br />

#define NDEBUG<br />

at the beginning of the code, before the inclusion of cassert, or by defining the macro in the<br />

compiler, e.g.<br />

g++ -DNDEBUG foo.cpp<br />

Example:<br />

#include <cassert><br />

#include <fstream><br />

int main ()<br />

{



std::ifstream datafile( ”file.dat” ) ;<br />

assert( datafile.is_open() );<br />

datafile.close();<br />

return 0;<br />

}<br />

In this example, assert is used to abort the program execution if datafile.is_open() returns false,<br />

which happens when the opening of the file was unsuccessful.<br />

2.7 Input and output<br />

C ++ uses a convenient abstraction called streams to perform input and output operations on<br />

sequential media such as the screen or the keyboard. A stream is an object into which a program<br />

can insert characters or from which it can extract them. The standard C ++ library includes the header<br />

file iostream, where the standard input and output stream objects are declared.<br />

2.7.1 Standard Output (cout)<br />

By default, the standard output of a program is the screen, and the C ++ stream object defined<br />

to access it is cout.<br />

cout is used in conjunction with the insertion operator, which is written as ≪ . It may be used<br />

more than once in a single statement. This is especially useful if we want to print a combination<br />

of variables and constants or more than one variable. Consider this example:<br />

std::cout ≪ ”Hello World, my name is ” ≪ name ≪ std::endl ;<br />

std::cout ≪ ”I am ” ≪ age ≪ ” years old.” ≪ std::endl ;<br />

If we assume the name variable contains the value Jane and the age variable contains 25,<br />

the output of the previous statements would be:<br />

Hello World, my name is Jane<br />

I am 25 years old.<br />

The endl manipulator produces a newline character (and flushes the stream). An alternative<br />

representation of the newline is the character ’\n’.<br />

2.7.2 Standard Input (cin)<br />

The standard input device is usually the keyboard. Handling the standard input in C ++ is done<br />

by applying the overloaded operator of extraction ≫ on the cin stream. The operator must be<br />

followed by the variable that will store the data that is going to be extracted from the stream.<br />

For example:<br />

int age;<br />

std::cin ≫age;



The first statement declares a variable of type int called age, and the second one waits for an<br />

input from cin (the keyboard) in order to store it in this integer variable. The input from the<br />

keyboard is processed once the RETURN key has been pressed.<br />

You can also use cin to request more than one datum input from the user:<br />

std::cin ≫a ≫b;<br />

is equivalent to:<br />

std::cin ≫a;<br />

std::cin ≫b;<br />

In both cases the user must provide two values, one for variable a and another for variable b, which<br />

may be separated by any valid blank separator: a space, a tab character or a newline.<br />

2.7.3 Input/Output with files<br />

C ++ provides the following classes to perform output and input of characters to/from files:<br />

• std::ofstream: used to write to files<br />

• std::ifstream: used to read from files<br />

• std::fstream: used to both read and write from/to files.<br />

We can use file streams the same way we already use cin and cout, with the only difference<br />

that we have to associate these streams with physical files. Here is an example:<br />

#include <iostream><br />

#include <fstream><br />

int main () {<br />

std::ofstream myfile;<br />

myfile.open (”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

myfile.close();<br />

return 0;<br />

}<br />

This code creates a file called example.txt (or overwrites it if it already exists) and inserts a<br />

sentence into it in a way that is similar to the use of cout. C ++ has the concept of an output<br />

stream that is satisfied by an output file as well as by std::cout. That means that everything<br />

that can be written to std::cout can also be written to a file, and vice versa. If you define the<br />

operator ≪ yourself for a new type, you do not need to program it for each output type but<br />

only once for a general output stream. 18<br />

Alternatively, one can give the file stream object the file name as an argument. This opens the file<br />

implicitly. The file is also implicitly closed when myfile goes out of scope, in this case at the end<br />

of the main function. The mechanisms that control such implicit actions will become clear in<br />

§ 2.2.3. The bottom line is that you must close your files explicitly only in a few cases. The short<br />

version of the previous listing is<br />

18 TODO: Where? New section needed.



#include <iostream><br />

#include <fstream><br />

int main () {<br />

std::ofstream myfile(”example.txt”);<br />

myfile ≪ ”Writing this to a file. ” ≪ std::endl;<br />

return 0;<br />

}<br />

2.8 Structuring Software Projects<br />

2.8.1 Namespaces<br />

In the last section we mentioned that equal names in different scopes hide the variables (or<br />

functions, types, . . . ) of the outer scopes, while defining the same name twice in one scope is an error.<br />

Common function names like min, max or abs already exist, and if you write a function with<br />

the same name (and same argument types) the compiler will tell you that the name already<br />

exists. But this does not only concern common names; you must be sure that every name you<br />

use is not already used in some other library. This really can be a hassle because you might<br />

add more libraries later, creating new potential for conflicts. Then you have to rename some<br />

of your functions and inform everybody who uses your software. Or one of your software users<br />

includes a library that you do not know and gets a name conflict. This can grow into a serious<br />

problem, and it happens in C all the time.<br />

One possibility to deal with this is using different names like max_, my_abs, or library_name_abs.<br />

This is in fact what is done in C. Main libraries have short function names, user libraries longer<br />

names, and OS-related internals typically start with an underscore. This decreases the probability of<br />

conflicts but does not eliminate it entirely.<br />

Remark: Particularly annoying are macros. This is an old technique of code reuse by expanding<br />

macro names to their text definition, potentially with arguments. This gives a lot of possibilities<br />

to empower your program, but many more to ruin it. Macros are resistant to namespaces<br />

because they are reckless text substitutions without any notion of types, scopes or any other<br />

language feature. Unfortunately, some libraries define macros with common names like major.<br />

We uncompromisingly undefine such macros, e.g. #undef major, without mercy for people who<br />

might want to use those macros. Visual Studio defines — till today!!! — min and max as macros<br />

and we advise you to disable this by compiling with /DNOMINMAX. Almost all macros can be<br />

replaced by other techniques (constants, templates, inline functions). But if you really do not<br />

find another way of implementing something, use LONG_AND_UGLY_NAMES_IN_CAPITALS like the library<br />

collection Boost does.<br />

2.8.2 Header and implementation<br />

It is usual to split class (Chapter 3) and function definition and implementation into different<br />

files. Classes and functions are typically defined in a header file (.hpp), and implemented in a<br />

cpp file, which is then compiled and added to a library. For example, the header file foo.hpp<br />

could be:<br />

foo.hpp:



#ifndef athens_foo_hpp<br />

#define athens_foo_hpp<br />

double foo (double a, double b);<br />

#endif<br />

Note the ifndef and define C-preprocessor commands. These commands are called include guards<br />

and prevent the file from being included several times. The use of such guards in header files is<br />

quite common.<br />

The source file in the library would be contained in the file foo.cpp.<br />

#include ”foo.hpp”<br />

double foo (double a, double b)<br />

{ return a+b; }<br />

The main program file is contained in the file bar.cpp:<br />

#include <iostream><br />

#include ”foo.hpp”<br />

int main() {<br />

double a = 2.1;<br />

double b = 3.9;<br />

std::cout ≪ foo(a,b) ≪ std::endl ;<br />

}<br />

Include files usually contain the interface of software packages and are stored somewhere on<br />

disk. The compiler is told where to look for the include files. The programmer can partially<br />

control this as follows:<br />

• #include ”foo.hpp”: the compiler looks in the directory of the including file and the list of<br />

directories it is given.<br />

• #include <foo.hpp>: the compiler only looks in the list of directories it is given.<br />

Frequently used include files<br />

The types and functions defined in the following include files are in the namespace std.<br />

• <iostream>: input and output streams, e.g. std::cin and std::cout<br />

• <fstream>: file input and output<br />

• <cassert>: for assertions, see § 2.6.5.<br />

• <cmath>: headers for the C functions from math.h, among others: abs, fabs, pow, acos,<br />

asin, atan, atan2, ceil, floor, cos, cosh, sin, sinh, exp, fmod (floating point mod), modf (split in<br />

integer and fractional part (< 1)), log, log10, sqrt, tan, tanh<br />

And other useful functions such as isnan.<br />

• : String operations


2.9. ARRAYS 49<br />

• : Complex numbers<br />

• , , , , ...: STL, see Section 4.9<br />

Inline keyword

Instead of creating a library as described at the beginning of this section, we can also store the implementation in the header file. We then have to add the keyword inline, for two reasons. The code will not be stored in a library but inlined into the calling functions: this may lead to more efficient code when the functions are small. And if we do not use the inline keyword, we may end up with multiply defined functions, since the compiler creates the function in every source file in which it is used.

Consider for example the following header file sqr.hpp:

#ifndef athens_sqr_hpp
#define athens_sqr_hpp

inline double sqr(double a)
{ return a * a; }

#endif

2.9 Arrays

C-based programming languages are not very good at working with arrays. In this section, we discuss the language concepts for arrays. In Section 4.9, we will present more practical software for arrays and other complicated mass data structures.

An array is created as follows:

int x[10];

The variable x is a constant-size array. It allows for fast creation (it is typically stored on the stack).

Arrays are accessed by square brackets: x[i] is a reference to the ith element. The first element is x[0], the last one is x[9]. Arrays can be initialized at the definition:

float v[]= {1.0, 2.0, 3.0}, w[]= {7.0, 8.0, 9.0};

In this case, the array size is deduced.

Operations on arrays are typically performed in loops; e.g., the vector operation x = v − 3w is realized by

float x[3];
for (int i= 0; i < 3; i++)
    x[i]= v[i] - 3.0 * w[i];

One can also define arrays of higher dimension:

float A[7][9];     // a 7 by 9 matrix
int q[3][2][3];    // a 3 by 2 by 3 array

The language does not provide linear algebra operations on these arrays. Therefore we will build our own linear algebra and look forward to future C++ standards coming with intrinsic higher math.

Arrays have the following two disadvantages:

• Indices are not checked before accessing an array. One can thus access elements outside the array, and the program crashes with a segmentation fault/violation. This is not even the worst case: if the program crashes, at least you see that things went wrong. The false access can also silently corrupt your own data; the program keeps running and produces entirely wrong results, with whatever consequences you can imagine.

• The size of the array must be known at compile time. 19 For instance, suppose we have an array stored in a file and need to read it back into memory:

ifstream ifs("some_array.dat");
ifs >> size;
float v[size]; // error, size not known at compile time

This does not work because we need the size already when the program is compiled.

The first problem can only be solved with new array types and the second one with dynamic allocation. This leads us to pointers.

2.10 Pointers and References

2.10.1 Pointers

A pointer is a variable that contains a memory address. This address can be that of another variable or of dynamically allocated memory. Let us start with the latter, as we were looking for arrays of dynamic size.

int* y = new int[10];

This allocates an array of 10 int. The size can now be chosen at run time. We can also implement the vector-reading example from the previous section:

ifstream ifs("some_array.dat");
int size;
ifs >> size;
float* v= new float[size];
for (int i= 0; i < size; i++)
    ifs >> v[i];

Pointers bear the same danger as arrays: the risk of accessing out-of-range data, with program crashes or data corruption. It is also the programmer's responsibility to keep track of the array size.

19 Some compilers support run-time values as array sizes. Since this is not guaranteed to work with other compilers, one should avoid this in portable software.


Furthermore, the programmer is responsible for releasing the memory when it is not needed anymore. This is done by

delete[] v;

As we came from arrays, we made the second step before the first one regarding pointer usage. The simplest use of pointers is allocating one single data item:

int* ip = new int;

Releasing such memory is performed by

delete ip;

Note the duality of allocation and release: the single-object allocation requires a single-object release and the array allocation demands an array release. 20

Pointers can also refer to other variables:

int i= 3;
int* ip2= &i;

The operator & takes an object and returns its address. The reverse operator is *, which takes an address and returns the object:

int j= *ip2;

This is called dereferencing. It is clear from the context whether the symbol * represents a dereference or a multiplication.

A danger of pointers are memory leaks. For instance, suppose our array y became too small and we want to assign a new array:

int* y = new int[15];

We can now use more space in y. Nice. But what happened to the memory that we allocated before? It is still there, but we have no access to it anymore and cannot release it. This memory is lost for the rest of our program execution. Only when the program is finished will the operating system be able to free it. In the example it is only 40 bytes out of however many gigabytes you might have. But if this happens with larger data in an iterative process, the dead memory grows, and at some point the program crashes when all memory is used up.

The warnings above are not intended as fun killers, and we do not discourage the use of pointers. Many things can only be achieved with pointers: lists, queues, trees, graphs, . . . But pointers must be used with utter care to avoid all the really serious problems mentioned above. There are two strategies to minimize pointer-related errors:

Use standard implementations from the standard library or other validated libraries. std::vector from the standard library provides all the functionality of dynamic arrays, including resizing and range checks, and the memory is released automatically, see § 4.9. Smart pointers from Boost provide automatic resource management: dynamically allocated memory that is no longer referred to by any smart pointer is released automatically, see § 11.2.

20 Mixing the two forms, e.g. releasing memory allocated with new[] by a plain delete, causes undefined behavior.


Encapsulate your dynamic memory management in classes. Then you have to deal with it only once per class. 21 If all memory allocated by an object is released when the object is destroyed, then it does not matter how much memory you allocate. If you have 738 objects with dynamic memory, then it will be released 738 times. If you have called new 738 times, partly in loops and branches, can you be sure that you have called delete 738 times? We know that there are tools for this, but these are errors you better prevent than fix. Even with encapsulation there is probably something to fix inside the classes, but this is orders of magnitude less work than having pointers spread all over your program.

We have shown two main purposes of pointers:

• Dynamic memory management; and

• Referring to other objects.

For the former there is no alternative to pointers; dynamic memory handling needs pointers, either directly or via classes that contain pointers. To refer to other objects, there exists another kind of type called a reference (surprise, surprise) that we will introduce in the next section.

2.10.2 References

The following code introduces a reference:

int i= 5;
int& j= i;
j= 4;
std::cout << "j = " << j << '\n';

The variable j is referring to i. Changing j will also alter i and vice versa, as in the example; i and j will always have the same value. One can think of a reference as an alias. Whenever one defines a reference, one must directly say what it refers to (unlike pointers). It is not possible to refer to another variable later.

So far, that does not sound extremely useful. References are, however, extremely useful for function arguments (§ 2.6), for referring to parts of other objects (e.g. the seventh entry of a vector), and for building views ( 22 ).

2.10.3 Comparison between pointers and references

The advantage of pointers over references is the ability to manage memory dynamically and to do address calculation. On the other hand, references refer to defined locations, 23 they always must refer to something, they do not leave memory leaks (unless you play really evil tricks), and they have the same notation in usage as the referred object.

21 It is safe to assume that there are many more objects than classes; otherwise there is something wrong with the program.

22 TODO: reref to a section when it is written

23 References can refer to arbitrary addresses, but one must work hard to achieve this. For your own safety we will not show you how to make references behave as badly as pointers.


Feature                          Pointers   References
Referring to a defined location     -           +
Mandatory initialization            -           +
Avoidance of memory leaks           -           +
Object-like notation                -           +
Memory management                   +           -
Address calculation                 +           -

Table 2.2: Comparison between pointers and references

In short, references are not idiot-proof, but they are much less error-prone than pointers. Pointers should only be used when dealing with dynamic memory, and even then one should do this via well-tested types or encapsulate the pointer within a class.

2.10.4 Do Not Refer to Outdated Data

Variables in functions are only valid within this function, for instance:

double& square_ref(double d) // DO NOT!
{
    double s= d * d;
    return s;
}

The variable s is not valid anymore after the function has finished. If you are lucky, the memory where s was stored has not been overwritten yet. But this is nothing one can count on. Good compilers will warn you that you are referring to a local variable. Sadly enough, we have seen examples in web tutorials that do this!

The same applies correspondingly to pointers:

double* square_ptr(double d) // DO NOT!
{
    double s= d * d;
    return &s;
}

This is as wrong as it is for references.

There are cases where functions, especially member functions, return references and addresses, and the destruction order of objects prevents the invalidation of references, 24 cf. § ??.

2.11 Real-world example: matrix inversion

24 Unfortunately there are ways to circumvent this, and there is an exception to this rule.


As a practical exercise, we now go step by step through the development process of a function for matrix inversion. This is easier than it seems. 25 For it, we use the Matrix Template Library 4 — see http://www.mtl4.org. It already provides most of the functionality we need. 26

In the program development, we follow some principles of Extreme Programming, especially writing tests first and implementing the functionality afterwards. This has two significant advantages:

• It prevents you as a programmer (to some extent) from featurism — the obsession to add more features instead of finishing one thing after another. If you write down what you want to achieve, you work more directly towards this goal and usually accomplish it much earlier. When writing the function call, you specify the interface of the function you plan to implement; when testing your results against expected values, you say something about the semantics of your function. Thus, tests are compilable documentation. The tests might not tell everything about the functions and classes you are going to implement, but what they say, they say very precisely. Documentation in text can be much more detailed and comprehensible, but also much vaguer, than tests.

• If you start writing tests only after you have finally finished the implementation — say on a late Friday afternoon — You Do Not Want To See It Failing. You will write the test with your nice data (whatever this means for the program in question) and minimize the risk that it fails. You might decide to go home and swear to God that you will test it on Monday.

For those reasons, you will be more honest if you write your tests first. Of course, you can modify your tests later if you realize that something does not work, or you changed the design of some item, or you want to test more details. It goes without saying that verifying partial implementations requires commenting out parts of your test temporarily.

Before we start implementing our inverse function, and even the tests, we have to choose an algorithm. We can use determinants of sub-matrices, block algorithms, Gauß-Jordan, or LU decomposition with or without pivoting. Let us say we prefer LU factorization with column pivoting, so that we have

LU = PA,

with a unit lower triangular matrix L, an upper triangular matrix U, and a permutation matrix P. Thus,

A = P^{-1} LU

and

A^{-1} = U^{-1} L^{-1} P.     (2.1)

We use the LU factorization from MTL4, implement the inversion of the lower and the upper triangular matrix, and compose them appropriately.

Now we start with our test by defining an invertible matrix and printing it out.

int main(int argc, char* argv[])
{
    const unsigned size= 3;
    typedef dense2D<double> Matrix;
    Matrix A(size, size);
    A= 4, 1, 2,
       1, 5, 3,
       2, 6, 9;
    cout << "A is:\n" << A;

25 At least with the implementations we already have.
26 It actually provides the inversion function inv already, but we want to learn how to get there.

For later abstraction, we define the type Matrix and the constant size. The LU factorization in MTL4 is performed in place. To not alter our original matrix, we copy it into a new one:

Matrix LU(A);

We also define a vector for the permutation computed in the factorization:

mtl::dense_vector<unsigned> Pv(size);

These are the two arguments for the LU factorization:

lu(LU, Pv);

For our purpose it is more convenient to represent the permutation as a matrix:

Matrix P(permutation(Pv));
cout << "Permutation vector is " << Pv << "\nPermutation matrix is\n" << P;

For instance, we can show A in its permuted form: 27

cout << "Permuted A is \n" << Matrix(P * A);

We now define an identity matrix of appropriate size and extract L and U from our in-place factorization:

Matrix I(matrix::identity(size, size)), L(I + strict_lower(LU)), U(upper(LU));

Note that the unit diagonal of L is not stored and needs to be added. It could also be treated implicitly, but we refrain from that for the sake of simplicity. We have now finished the preliminaries and come to our first test. If we have computed the inverse of U, say UI, the product UI * U must be the identity matrix, approximately:

Matrix UI(inverse_upper(U));
cout << "inverse(U) [permuted] is:\n" << UI << "UI * U is:\n" << Matrix(UI * U);
assert(one_norm(Matrix(UI * U - I)) < 0.1);

Testing results of non-trivial numeric calculations for equality is quite certain to fail. Therefore, we use the norm of the matrix difference as criterion. Likewise, the inversion of L (with a different function) is tested:

Matrix LI(inverse_lower(L));
cout << "inverse(L) [permuted] is:\n" << LI << "LI * L is:\n" << Matrix(LI * L);
assert(one_norm(Matrix(LI * L - I)) < 0.1);

This enables us to calculate the inverse of A itself and test its correctness:

Matrix AI(UI * LI * P);
cout << "inverse(A) [UI * LI * P] is \n" << AI << "A * AI is\n" << Matrix(AI * A);
assert(one_norm(Matrix(AI * A - I)) < 0.1);

27 If you wonder why we explicitly built a matrix for P * A, you have to wait until Section 5.3 to understand that some functions return special types that need special treatment. Future versions of MTL4 will minimize the need for such special treatments.


A function computing the inverse must return the same value and also pass the test against the identity:

Matrix A_inverse(inverse(A));
cout << "inverse(A) is \n" << A_inverse << "A * AI is\n" << Matrix(A_inverse * A);
assert(one_norm(Matrix(A_inverse * A - I)) < 0.1);

After establishing tests for all components of our calculation, we start with their implementations. The first function we program is the inversion of an upper triangular matrix. This function takes a dense matrix as argument and returns another matrix:

dense2D<double> inline inverse_upper(dense2D<double> const& A) {
}

Since we do not need another copy of the input matrix, we pass it as a reference. The argument shall not be changed, so we pass it as const. The constancy has several advantages:

• We improve the reliability of our program. Arguments passed as const are guaranteed not to change; if we accidentally modify them, the compiler will tell us and abort the compilation. There is a way to remove the constancy, but this should only be used as a last resort, e.g. for interfacing obsolete libraries written by others. Everything you write yourself can be realized without eliminating the constancy of arguments.

• Compilers can optimize better when the objects are guaranteed not to be altered.

• In the case of references, the function can be called with expressions. Non-const references require storing the expression in a variable and passing the variable to the function.

Another comment: people might tell you that it is too expensive to return containers as results and that it is more efficient to use references. This is true — in principle. For the moment we accept this extra cost and pay more attention to clarity and convenience. Later in this book we will introduce techniques to minimize the cost of returning containers from functions.

So much for the function signature; let us now turn our attention to the function body. The first thing we do is verify that our argument is valid. Obviously the matrix must be square:

const unsigned n= num_rows(A);
assert(num_cols(A) == n); // Matrix must be square

The number of rows is needed several times in this function and is therefore stored in a variable — well, a constant. Another prerequisite is that the matrix has no zero entries on the diagonal. We leave this test to the triangular solver.

Speaking of which, we can compute our inverse triangular matrix with a triangular solver of a linear system, which we find in MTL4. More precisely, the k-th column vector of U^{-1} is the solution of

U x = e_k,

where e_k is the k-th unit vector. First we define a temporary variable for the result:

dense2D<double> Inv(n, n);

Then we iterate over the columns of Inv:


for (unsigned k= 0; k < n; ++k) {
}

In each iteration we need the k-th unit vector:

dense_vector<double> e_k(n);
for (unsigned i= 0; i < n; ++i)
    if (i == k)
        e_k[i]= 1.0;
    else
        e_k[i]= 0.0;

The triangular solver returns a column vector. We could assign the entries of this vector directly to entries of the target matrix:

for (unsigned i= 0; i < n; ++i)
    Inv[i][k]= upper_trisolve(A, e_k)[i];

This is nicely short, but we would compute upper_trisolve n times! Although we said that performance is not our primary goal at this point, raising the overall complexity from order n^3 to n^4 is too much waste of resources. Therefore, we better store the vector and copy the entries from there:

dense_vector<double> res_k(n);
res_k= upper_trisolve(A, e_k);
for (unsigned i= 0; i < n; ++i)
    Inv[i][k]= res_k[i];

Returning our temporary matrix finishes the function, which we now give in its complete form:

dense2D<double> inverse_upper(dense2D<double> const& A)
{
    const unsigned n= num_rows(A);
    assert(num_cols(A) == n); // Matrix must be square

    dense2D<double> Inv(n, n);
    for (unsigned k= 0; k < n; ++k) {
        dense_vector<double> e_k(n);
        for (unsigned i= 0; i < n; ++i)
            if (i == k)
                e_k[i]= 1.0;
            else
                e_k[i]= 0.0;
        dense_vector<double> res_k(n);
        res_k= upper_trisolve(A, e_k);
        for (unsigned i= 0; i < n; ++i)
            Inv[i][k]= res_k[i];
    }
    return Inv;
}


Now that the function is complete, we first run our test. Evidently, we have to comment out parts of the test, because we have only implemented one function so far. But it is worth knowing whether this first function already behaves as expected. It does, and we could now be happy with it and turn our attention to the next task — there are still many. But we will not.

Well, at least we can be happy to have a correctly running function. Nevertheless, it is still worth spending some time to improve it. Such improvements are called refactoring. Experience from practice has shown that refactoring immediately after the implementation takes much less time than later modifications when bugs are discovered, the software is ported to other platforms, or it is extended for more usability. Obviously, it is much easier to simplify and structure our software now, while we still know what is going on, than in some weeks/months/years, or when somebody else has to refactor it.

The first thing we might dislike is that something as simple as the initialization of a unit vector takes five lines. This is rather verbose. Putting the if statement on one line

for (unsigned i= 0; i < n; ++i)
    if (i == k) e_k[i]= 1.0; else e_k[i]= 0.0;

is badly structured. C++ — and even good ole C — has a special operator for conditions:

for (unsigned i= 0; i < n; ++i)
    e_k[i]= i == k ? 1.0 : 0.0;

The conditional operator '?:' usually needs some time to get used to, but it results in a more concise representation. There are also situations where one cannot use an if but can use the ?: operator. Although we have not changed anything semantically in the program, and it seems obvious that the result will still be the same, it cannot harm to run our test again. You will see how often you are sure that your program changes could never possibly change the behavior — and they still do. The sooner you realize it the better. And with the test we have already written, it only takes a few seconds and makes you feel more confident.

If we would like to be really cool, we could exploit some insider know-how. The expression 'i == k' returns a boolean, and we know that bool can be converted implicitly into int. In this conversion, false yields 0 and true yields 1, according to the standard. These are precisely the values we want, as double:

e_k[i]= double(i == k);

In fact, the conversion from int to double is performed implicitly and can be omitted:

e_k[i]= i == k;

As cute as this looks, it is some stretch to assign a logical value to a floating-point number. It is well defined by the implicit conversion chain bool → int → double, but it will confuse potential readers, and you might end up explaining on a mailing list what is happening, or you add a comment to the program. In both cases you end up writing more for the explanation than you saved in the program.

Another thought that might occur to us is that this is probably not the last time we need a unit vector. So why not write a function for it?


dense_vector<double> inline unit_vector(unsigned k, unsigned n)
{
    dense_vector<double> v(n, 0.0);
    v[k]= 1;
    return v;
}

As the function returns the unit vector, we can just pass it as argument to the triangular solver:

res_k= upper_trisolve(A, unit_vector(k, n));

For a dense matrix, MTL4 allows us to access a matrix column as a column vector (instead of a sub-matrix). Then we can assign the result vector directly, without a loop:

Inv[irange(0, n)][k]= res_k;

As a short explanation: the bracket operator is implemented in such a manner that integer indices for both rows and columns return the matrix entry, while ranges for rows and columns return a sub-matrix. Likewise, a range of rows and a single column gives you a column of the according matrix — or part of this column. Vice versa, a row vector can be extracted from a matrix with an integer as row index and a range for the columns.

This is an interesting example of how to deal with the limitations as well as the possibilities of C++. Other languages have ranges as part of their intrinsic notation; e.g., Python has a symbol ':' for expressing ranges of indices. C++ does not have this symbol, but we can introduce a new type — like MTL4's irange — and define the behavior of operator[] for this type. This leads to an extremely powerful mechanism!

Extending Operator Functionality

Since we cannot introduce new operators into C++ — not now (in 2010), not in the next standard (C++0x), maybe in the one after that — we define new types and give operators the desired behavior when applied to those types. This technique allows us to provide a very broad functionality with a limited number of operators.

The operator semantics on user types shall be intuitive and must be consistent with the operator priority (see example in § 2.3.7).

Back to our algorithm. We store the result of the solver in a vector and then assign it to a matrix column. In fact, we can assign the triangular solver's result directly:

Inv[irange(0, n)][k]= upper_trisolve(A, unit_vector(k, n));

The range of all indices is predefined as iall:

Inv[iall][k]= upper_trisolve(A, unit_vector(k, n));

Next, we exploit some mathematical background. The inverse of an upper triangular matrix is also upper triangular. Thus, we only need to compute the upper part of the result and set the remainder to 0 — or set the whole matrix to zero before computing the upper part. Of course, we now need smaller unit vectors and only sub-matrices of A. This can nicely be expressed with ranges:

Inv= 0;
for (unsigned k= 0; k < n; ++k)
    Inv[irange(0, k+1)][k]= upper_trisolve(A[irange(0, k+1)][irange(0, k+1)], unit_vector(k, k+1));

Admittedly, the irange makes the expression hard to read. Although it looks like a function, irange is a type, and we just created objects on the fly and passed them to operator[]. As we use the same range three times, it is shorter to create a variable (or rather a constant):

for (unsigned k= 0; k < n; ++k) {
    const irange r(0, k+1);
    Inv[r][k]= upper_trisolve(A[r][r], unit_vector(k, k+1));
}

This not only makes the second line shorter, it also makes it easier to see that it is all the same range.

Another observation: after shortening the unit vectors, they all have their one in the last entry. Thus, we only need the size of the vector, and the position of the one is implied:

dense_vector<double> inline last_unit_vector(unsigned n)
{
    dense_vector<double> v(n, 0.0);
    v[n-1]= 1;
    return v;
}

We choose a different name to reflect the different meaning. Nonetheless, we wonder whether we really want such a function. What is the probability that we will ever need it again? Charles H. Moore, the creator of the programming language Forth, once said that "The purpose of functions is not to hash a program into tiny pieces but to create highly reusable entities." All this said, we prefer the more general function, which is much more likely to be useful later.

After all these modifications, we are now satisfied with the implementation and go on to the next function. We still might change something at a later point in time, but having made it clearer and better structured will make later modifications much easier for us or somebody else. The more experience you gain, the fewer steps you will need to reach the implementation that makes you happy. And it goes without saying that we tested inverse_upper repeatedly while modifying it.

Now that we know how to invert upper triangular matrices, we could do the same for the lower triangular ones accordingly. Alternatively, we can just transpose the input and the output:

dense2D<double> inline inverse_lower(dense2D<double> const& A)
{
    dense2D<double> T(trans(A));
    return dense2D<double>(trans(inverse_upper(T)));
}

Ideally, this implementation would look like this:

dense2D<double> inline inverse_lower(dense2D<double> const& A)
{
    return trans(inverse_upper(trans(A)));
}


2.11. REAL-WORLD EXAMPLE: MATRIX INVERSION 61

This does not work yet for technical reasons but will in the future.

You may argue that the transpositions and passing the matrix and the vector once more take extra time. More importantly, we know that the lower matrix has a unit diagonal and we did not exploit this property, e.g. for avoiding the divisions in the triangular solver. We could even ignore or omit the diagonal and treat it implicitly in the algorithms. This is all true. However, we prioritized the simplicity and clarity of the implementation and the reusability aspect higher than performance here. 28

We now have all we need to put the matrix inversion together. As above, we start by checking the squareness:

dense2D inline inverse(dense2D const& A)
{
    const unsigned n= num_rows(A);
    assert(num_cols(A) == n); // matrix must be square

Then we perform the LU factorization. For performance reasons this function does not return the result but takes its arguments as mutable references and factorizes in place. Thus, we need a copy of the matrix to pass and a permutation vector of appropriate size:

dense2D PLU(A);
dense_vector Pv(n);
lu(PLU, Pv);

The upper triangular factor PU of the permuted A is stored in the upper triangle of PLU. The lower triangular factor PL is partly stored in the strict lower triangle of PLU, while the unit diagonal is omitted. We therefore need to add it before the inversion (or, alternatively, handle the unit diagonal implicitly in the inversion):

dense2D PU(upper(PLU)), PL(strict_lower(PLU) + matrix::identity(n, n));

The inversion of a square matrix according to Equation (2.1) can then be performed in one single line: 29

return dense2D(inverse_upper(PU) * inverse_lower(PL) * permutation(Pv));

During this section you have seen that there are always alternative ways to implement the same behavior; most likely you have made this experience before. Although we suggested for every choice we made that it is the most appropriate one, there is not always THE single best solution, and even after weighing the pros and cons of the alternatives, one might not come to a final conclusion and just pick one. We also illustrated that the choices depend on the goals; for instance, the implementation would look different if performance were the primary goal.

The section shall also show that non-trivial programs are not written in a single sweep by an ingenious mind — exceptions might prove the rule — but are the result of gradual, improving development. Experience will make this journey shorter and more direct, but we will not write the perfect program at first glance.

28 People who care about performance do not use matrix inversion in the first place.
29 The explicit conversion can probably be omitted in later versions of MTL4.



2.12 Exercises

2.12.1 Age

Write a program that asks for input from the keyboard and prints the result on the screen and into a file. The question is: What is your age?

2.12.2 Exercise on include

We provide you with the following files: foo.hpp, included by bar1.hpp and bar2.hpp. The main program is in main.cpp.

Compile and try to link the program. It should not link. Correct the errors so that it links.

2.12.3 Arrays and pointers

1. Write the following declarations: pointer to a character, array of 10 integers, pointer to an array of 10 integers, pointer to an array of character strings, pointer to a pointer to a character, integer constant, pointer to an integer constant, constant pointer to an integer. Initialize all of the objects.

2. Read a sequence of doubles from an input stream. Let the value 0 define the end of the sequence. Print the values in input order. Remove duplicate values. Sort the values before printing.

3. Write a small program that creates arrays on the stack (fixed-size arrays) and arrays on the heap (using allocation, i.e. new). Use valgrind to check what happens when you do not use delete correctly.

2.12.4 Read the header of a Matrix-Market file

The Matrix Market data format is used to store dense and sparse matrices in ASCII format. The header contains some information about the type and the size of the matrix. For a sparse matrix, the data are stored in three columns: the first column is the row number, the second column the column number, and the third column the numerical value. If the matrix is complex, a fourth column is added for the imaginary part.

An example of a Matrix Market file is:

%%MatrixMarket matrix coordinate real general
%
% ATHENS course matrix
%
2025 2025 100015
1 1 .9273558001498543E-01
1 2 .3545880644900583E-01
...................



The first line that does not start with % contains the number of rows, the number of columns, and the number of non-zero elements of the sparse matrix.

Use fstream to read the header of a MatrixMarket file and print the number of rows, the number of columns, and the number of non-zeros on the screen.

2.12.5 String manipulation programs

There is a type string in the standard library. This type provides a large number of string operations, such as string concatenation, string comparison, etc. Note the include of the header file string.

#include <iostream>
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "World";
    std::string s3 = s1 + ", " + s2;
    std::cout << s3 << std::endl;
    return 0;
}

In this example we have concatenated the strings s1 and s2 together with a string constant. Perform the following exercises:

1. Write a function itoa(int i, std::string& b) that constructs a string representation of i in b and returns b.

2. Write a simple encryption program. It should read input from cin and write the encrypted symbols to cout. Use the following simple encryption scheme: the code for a symbol c is c ^ key[i], where key is a string given as a parameter to a function. The symbols from key are used in a cyclic way. (After repeated encryption with the same key you should get the source string back.)
should get the source string.)



2.13 Operator Precedence

The following table lists all operators on one page so that their priorities can be seen quickly; for their meaning see Table 2.3.6. Semicolons are only separators.

Operator Precedence

class_name :: member; namespace_name :: member; :: name; :: qualified-name
object . member; pointer -> member; expr [ expr ]
object [ expr ]; expr ( expr_list ); type ( expr_list ); lvalue ++; lvalue --
typeid ( type ); typeid ( expr ); dynamic_cast< type >( expr )
static_cast< type >( expr ); reinterpret_cast< type >( expr )
const_cast< type >( expr )
sizeof expr; sizeof ( type ); ++ lvalue; -- lvalue; ~ expr; ! expr; - expr
+ expr; & lvalue; * lvalue; new type; new type( expr_list )
new ( expr_list ) type; new ( expr_list ) type( expr_list )
delete pointer; delete [] pointer; ( type ) expr
object .* pointer_to_member; pointer ->* pointer_to_member
expr * expr; expr / expr; expr % expr
expr + expr; expr - expr
expr << expr; expr >> expr
expr < expr; expr <= expr; expr > expr; expr >= expr
expr == expr; expr != expr
expr & expr
expr ^ expr
expr | expr
expr && expr
expr || expr
expr ? expr : expr
lvalue = expr; lvalue *= expr; lvalue /= expr; lvalue %= expr; lvalue += expr
lvalue -= expr; lvalue <<= expr; lvalue >>= expr; lvalue &= expr
lvalue |= expr; lvalue ^= expr
throw expr
expr , expr


Chapter 3

Classes

"Computer science is no more about computers than astronomy is about telescopes."
— Edsger W. Dijkstra

"Accordingly, computer science is more than programming language details."

Good programming is more than drilling on small language details and more than cleverly manipulating specific bits on the latest and greatest computer hardware. Focusing primarily on technical details can lead to clever codes that perform a certain task in a certain context extremely efficiently. If one is good at this, one might even create the fastest solution for this task and gain the admiration of the geeks.

3.1 Program for universal meaning, not for technical details

Writing leading-edge scientific software with such an attitude is very painful and likely to fail. The most important tasks in scientific programming are:

• Identifying the mathematical abstractions that are important in the domain; and
• Representing these abstractions comprehensively and efficiently in software.

Common abstractions that appear in almost every scientific application are vector spaces and linear operators. A linear operator projects from one vector space to another one.

First we should decide how to represent this abstraction in a program. Let v be an element of a vector space and L a linear operator. Then C++ allows us to represent the application of L on v as

L(v)

or

L * v

Which one is better suited is not so easy to say. What is easy to say is that both are better than



apply_symm_blk2x2_rowmajor_dnsvec_multhr_athlon(L.data_addr, L.nrows, L.ncols,
                                                L.ldim, L.blksch, v.data_addr, v.size);

Developing software in this fashion is far from fun. It wastes so much of the programmer's energy. Getting such calls right is of course much more work than with the former notations. If one of the arguments is stored in a different format, the function call must be meticulously adapted. Remember, the person who implements the linear projection actually wanted to do science.

The cardinal error of scientific software providing such interfaces — there are even worse than our example — is to commit to too many technical details in the user interface. The reason lies partly in the usage of simplistic programming languages such as C and Fortran 77, or in the effort to interoperate with software in these languages.

Advice

If you are ever forced to write software that interoperates with C or Fortran, write your software first with a concise and intuitive interface in C++ for yourself and other C++ programmers, and add the C and Fortran interface on top of it.

The elegant way of writing scientific software is to use and to provide the best abstraction. A good implementation reduces the user interface to the essential behavior and omits all surplus commitments to technical details. Applications with a concise and intuitive interface can be as efficient as their ugly and detail-obsessed counterparts.

In our example, this is achieved by providing a class for every specific linear operator and implementing the projection type-dependently. 1 This way, we can apply the projection without giving all the details, and the user application is short and nice. This chapter will show the foundations of how to provide new abstractions in scientific software, and the following chapters will elaborate on this.

3.2 Class members

Object types are called classes in C++, defined by the class keyword. A class defines a new data type, which can be used to create objects. A class is a collection of:

• data;
• functions, which are also referred to as member functions or methods; and
• types.

Furthermore, class members can be public or private, and classes can inherit from each other.

Let us now give an example to illustrate the class concept. To have something tangible for scientists, we refrain from foo and bar examples but gradually implement a class complex (although this already exists). This class must contain variables to store the real and the imaginary part:

1 Specializations for specific platforms can also be handled with the type system.

class complex
{
    double r, i;
};

Variables within a class are called 'member variables'.

3.2.1 Access attributes

All items — variables, constants, functions, and types — of a class have access attributes. C++ provides the following three attributes:

• public: accessible from everywhere;
• private: accessible only within the class; and
• protected: accessible only within the class and in derived classes.

The access attributes give the class designer good control over how the class users can utilize the class. Defining more public members gives more freedom in usage but less control; vice versa, more private members establish a stricter user interface. Protected members are less restrictive than private ones and more restrictive than public ones. Since inheritance is not a major topic in this book, they are not very important in this context. All class members are by default private.

3.2.2 Member functions

It is common practice in object-oriented software to declare member variables as private and to access them with functions. We do this here in Java style:

class complex
{
  public:
    double get_r() { return r; }
    void set_r(double newr) { r = newr; }
    double get_i() { return i; }
    void set_i(double newi) { i = newi; }
  private:
    double r, i;
};

Functions in a class are called 'member functions'. Member functions are also private by default, i.e. they can only be called by functions within the class. This is evidently not particularly useful for our getters and setters.

Therefore we declared them public. Public member functions and variables can be accessed outside the class. So, we can write c.get_r() but not c.r. The class above can be used in the following way:



int main()
{
    complex c1, c2;
    // set c1
    c1.set_r(3.0);
    c1.set_i(2.0);
    // copy c1 to c2
    c2.set_r(c1.get_r());
    c2.set_i(c1.get_i());
    return 0;
}

First we created two objects of type complex. Then we set one of the objects and copied it to the other one. This works, but it is a bit clumsy, isn't it?

C++ provides another keyword for defining classes: struct. The only difference 2 is that members are public by default; therefore, the example above is equivalent to:

struct complex
{
    double get_r() { return r; }
    void set_r(double newr) { r = newr; }
    double get_i() { return i; }
    void set_i(double newi) { i = newi; }
  private:
    double r, i;
};

Our member variables can only be accessed via functions. This gives the class designer maximal control over the behavior. The setter could accept only values in a certain range. We could count how often the setter and getter are called for each complex number, or for all complex numbers in the execution. The functions could have additional print-outs for debugging. 3 We could even allow reading only at certain times of the day, or writing only if the program runs on a computer with a certain IP. We will most likely not do the latter, at least not for complex numbers, but we could. If the variables were public and accessed directly, such modifications would not be possible. Nevertheless, handling the real and imaginary part of a complex number this way is cumbersome, and we will discuss alternatives.

Most C++ programmers would not implement it this way. What would a C++ programmer do first, then? Write constructors.

3.3 Constructors

What are constructors? Constructors initialize objects of classes and create a working environment for the member functions. Sometimes such an environment includes resources like files, memory, or locks that have to be freed after use. We come back to this later.

To start with, let us define a constructor for complex:

2 There really is no other difference. One can define operators and virtual functions or derived classes in the same manner as with class. The performance of class and struct is also absolutely identical.
3 A debugger is usually a better alternative to putting print-outs into programs.



class complex
{
  public:
    complex(double rnew, double inew)
    {
        r= rnew; i= inew;
    }
    // ...
};

Thus, a constructor is a member function with the same name as the class itself. It can have an arbitrary number of arguments. In our case, two arguments are most suitable because we want to set two member variables. This constructor allows us to set c1's values directly in the definition:

complex c1(2.0, 3.0);

There is a special syntax for setting member variables in constructors:

class complex
{
  public:
    complex(double rnew, double inew) : r(rnew), i(inew) {}
    // ...
};

This is not only shorter but also has another advantage: it calls the constructors of the member variables in the class's constructor. For plain old data types (POD) this does not make a significant difference. The situation is different if the members are themselves classes.

Imagine you have a class that solves linear systems with the same matrix, and you store the matrix in your class:

class solver
{
  public:
    solver(int nrows, int ncols) // : A() #1 -> error
    {
        A(nrows, ncols); // this is not a constructor here #2 -> error
    }
    // ...
  private:
    matrix_type A;
};

Suppose our matrix class has a constructor setting the dimensions. This constructor cannot be called in the function body of the constructor (#2). The call in #2 is interpreted as A.operator()(nrows, ncols), see § 4.8.

All member variables of the class are constructed before the class constructor reaches the opening {. Those members — like A — that do not appear in the list after the colon are built by a constructor without arguments, called the default constructor. Correspondingly, classes that have such a constructor are called default-constructible. Our matrix class is not default-constructible, and the compiler will tell us something like "Operator matrix_type::matrix_type() not found". Thus, we need:



class solver
{
  public:
    solver(int nrows, int ncols) : A(nrows, ncols) {}
    // ...
  private:
    matrix_type A;
};

Often the matrix (or whatever other object) is already constructed, and we do not want to waste the memory for a copy. In this case we use a reference to the object. A reference must be set in the constructor because this is the only place to declare what it refers to. The solver shall not modify the matrix, so we write:

class solver
{
  public:
    solver(const matrix_type& A) : A(A) {}
    // ...
  private:
    const matrix_type& A;
};

The code also shows that we can give the constructor arguments the same names as the member variables. After the colon, which A is which? The rule is that names outside the parentheses refer to members, while inside the parentheses the constructor arguments hide the member variables. Some people are confused by this rule and use different names. What does A refer to inside the {}? To the constructor argument. Only names that do not exist as argument names are interpreted as member variables. In fact, this is pure scope resolution: the scope of the function — in this case the constructor — is inside the scope of the class, and thus the argument names hide the class member names.

Let us return to our complex example. So far, we have a constructor allowing us to set the real and the imaginary part. Often only the real part is set and the imaginary part is defaulted to 0:

class complex
{
  public:
    complex(double r, double i) : r(r), i(i) {}
    complex(double r) : r(r), i(0) {}
    // ...
};

We can also say that the number is 0 + 0i if no value is given, i.e. if the complex number is default-constructed:

complex() : r(0), i(0) {}



Advice

Define a default constructor wherever possible, even if it might not seem necessary when you implement the class.

For the complex class, we might think that we do not need a default constructor because we can delay a variable's declaration until we know its value. The absence of a default constructor creates (at least) two problems:

• We might need the variable outside the scope in which its value is computed. For instance, if the value depends on some condition and we declared the (complex) variable in the two branches of an if, the variable would not exist after the if.

• We build containers of the type, e.g. a matrix of complex values. Then the constructor of the matrix must call the constructor of complex for each entry, and the default constructor is the most convenient way to handle this.

For some classes it might be very difficult to define a default constructor, e.g. when some of the members are references. In those cases it can be easier to accept the aforementioned drawbacks instead of building a badly designed default constructor.

We can combine all three of them with default arguments:

class complex
{
  public:
    complex(double r= 0, double i= 0) : r(r), i(i) {}
    // ...
};

In the previous main function we defined two objects, one a copy of the other. We can write a constructor for this — called the copy constructor:

class complex
{
  public:
    complex(const complex& c) : r(c.r), i(c.i) {}
    // ...
};

But we do not have to: C++ does this itself. If we do not define a copy constructor, i.e. a constructor that has one argument which is a const reference to its own type, then the compiler creates this constructor implicitly. This automatically built constructor copies each member variable by calling the variables' copy constructors, which is exactly what we did by hand. In cases like this, where copying all members is precisely what you want from your copy constructor, you should use the default for the following reasons:

• It is less verbose;
• It is less error-prone;
• Other people know directly what your copy constructor does without reading your code; and
• Compilers might find more optimizations.

There are cases where the default copy constructor does not work, especially when the class contains pointers. Say we have a simple vector class with a copy constructor:

class vector
{
  public:
    vector(const vector& v)
      : size(v.size), data(new double[size])
    {
        for (unsigned i= 0; i < size; i++)
            data[i]= v.data[i];
    }
    // ...
  private:
    unsigned size;
    double *data;
};

If we omitted this copy constructor, the compiler would not complain but voluntarily build one for us. We would be glad that our program is shorter and sexier, but sooner or later we would find that it behaves bizarrely: changing one vector modifies another one as well, and when we observe this strange behavior we have to find the error in our program. This is particularly difficult because there is no error in what we have written, only in what we have omitted.

Another problem we can observe is that the run-time library will complain that we freed the same memory twice. 4 The reason for this is the way pointers are copied: only the address is copied, with the result that both pointers point to the same memory. This might be useful in some cases, but most of the time it is not, at least in our domain. Some pointer-addicted geeks might see this differently.

3.3.1 Explicit and implicit constructors

In C++ we distinguish between implicit and explicit constructors. Implicit constructors enable, in addition to object initialization, implicit conversions and an assignment-like notation for construction. Instead of:

complex c1(3.0);

we can also write:

complex c1= 3.0;

or

complex c1= pi*pi/6.0;

For many scientifically educated people this notation is more readable. Older compilers might generate more code for initializations using '=' (the object is first created with the default constructor and the value is copied afterwards), while current compilers generate the same code for both notations.

4 This is an error message every programmer experiences at least once in his/her life (or he/she is not doing serious business).



The implicit conversion kicks in when one type is needed and another one is given, e.g. a double instead of a complex. Assume we have a function: 5

double inline complex_abs(complex c)
{
    return std::sqrt(real(c) * real(c) + imag(c) * imag(c));
}

and call this with a double, e.g.:

cout << "|7| = " << complex_abs(7.0) << '\n';

The constant 7.0 is considered a double, but there is no function complex_abs for double. There is one for complex, and complex has a constructor that accepts a double. So the complex value is implicitly built from the double.

This can be forbidden by declaring the constructor as explicit:

class complex {
  public:
    explicit complex(double nr= 0.0, double i= 0.0) : r(nr), i(i) {}
};

Then complex_abs cannot be called with a double or any other type than complex. To call this function with a double, we can write an overload for double or construct a complex explicitly in the call:

cout << "|7| = " << complex_abs(complex(7.0)) << '\n';

The explicit attribute is really important for the vector class. There will be a constructor taking the size of the vector as its argument:

class vector
{
  public:
    vector(int n) : my_size(n), data(new double[my_size]) {}
};

A function computing a scalar product will expect two vectors as arguments:

double dot(const vector& v, const vector& w) { ... }

Calling this function with integer arguments

double d= dot(8, 8);

will compile. What happened? Two temporary vectors of size 8 are created with the implicit constructor and passed to the function dot. This nonsense can easily be avoided by declaring the constructor explicit.

Discussion 3.1 Which constructors shall be explicit is, in the end, the class designer's decision. It is pretty obvious in the vector example: no right-minded programmer wants the compiler to convert integers automatically into vectors.

Whether the constructor of the complex class should be explicit depends on the expected utilization. Since a complex number with a zero imaginary part is mathematically identical to a real number, the implicit conversion does not create semantic inconsistencies. An implicit constructor is more convenient because doubles and double literals can be given wherever a complex is expected. Functions that are not performance-critical can be implemented only once, for complex, and used for double. Vice versa, in performance-critical applications it might be preferable to use an explicit constructor because the compiler will refuse to call complex functions with double arguments. Then the programmer can implement overloads of those functions with double arguments that do not waste run time on null imaginaries.

5 The definitions of real and imag will be given soon.

That does not mean that high-performance implementations necessarily have to be realized with explicit constructors. The implicit conversion might happen in rarely called functions, and the impact on the overall performance might be negligible. The compiler cannot tell us this, but a profiling tool can. A function that consumes less than 1% of the execution time is not worth spending much tuning time on. All this considered, there are more reasons for an implicit constructor than for an explicit one, and so it is implemented in std::complex.

3.4 Destructors

A destructor is a function that is called every time an object of its class is destroyed, for example:

~complex()
{
    std::cout << "So long and thanks for the fish.\n";
}

Since the destructor is the complementary operation of the default constructor, it uses the complementary notation in the signature. As opposed to the constructor, there is only one single overload, and arguments are not allowed — what could they be good for anyway, as grave goods? There is no life after death in C++.

In our example, there is nothing to do when a complex number is destroyed, and we can omit the destructor. A destructor is needed when the object acquires resources, e.g. memory. In these cases the memory must be freed and any other resources released in the destructor:

class vector
{
  public:
    // ...
    ~vector()
    {
        if (data) // check whether the pointer was allocated
            delete[] data;
    }
    // ...
  private:
    unsigned my_size;
    double *data;
};

Files that are opened with std::ifstream or std::ofstream do not need to be closed explicitly; their<br />

destructors will do this if necessary. Files that are opened with old C handles require explicit<br />

closing, and this is only one reason for not using them.<br />



One must pay attention that the freed resources are not used or released again somewhere else in<br />

the program afterwards. C ++ generates a default destructor in the same way as the default<br />

constructor: calling the destructor of each member, but in reverse order. 6<br />

3.5 Assignment<br />

Assignment operators are used to enable expressions like the following for user-defined types:<br />

x= y;<br />

u= v= w= x;<br />

As usual we consider first the class complex. Assigning a complex to a complex requires an<br />

operator like:<br />

complex& operator=(const complex& src)<br />

{<br />

r= src.r; i= src.i;<br />

return ∗this;<br />

}<br />

Evidently, we copy the members ‘r’ and ‘i’. The operator returns a reference to the object<br />

to enable multiple assignments. ‘this’ is a pointer to the object itself, and since we need a<br />

reference for syntactic reasons, it is dereferenced. What happens if we assign a double?<br />

c= 7.5;<br />

It compiles without the definition of an assignment operator for double. Once again, we have an<br />

implicit conversion: the implicit constructor creates a complex on the fly and assigns this one.<br />

If this becomes a per<strong>for</strong>mance issue we can add an assignment <strong>for</strong> double:<br />

complex& operator=(double nr)<br />

{<br />

r= nr; i= 0;<br />

return ∗this;<br />

}<br />

An assignment operator like the first one that assigns an object of the same type is called<br />

Copy Assignment, and this operator is synthesized by the compiler. In the case of complex<br />

numbers, the generated copy assignment operator performs exactly what we need, copying all<br />

members.<br />

As for the vector, the synthesized operator is not satisfactory because it only copies the address<br />

of the data and not the data itself. The implementation is very similar to the copy constructor:<br />

vector& operator=(const vector& src)<br />

{<br />

if (this == &src)<br />

return ∗this;<br />

assert(my size == src.my size);<br />

for (int i= 0; i < my size; i++)<br />

data[i]= src.data[i];<br />

return ∗this;<br />

}<br />

6 TODO: Good and short explanation why. If possible with example.<br />

In fact, every class where the copy assignment and the copy constructor have<br />

essential differences in their implementation is very confusing in its behavior and should not<br />

be used, cf. [SA05, p. 94]. The two operations differ in the respect that a constructor creates<br />

content in a new object while an assignment replaces content in an existing object. However,<br />

both the creation and the replacement are performed with copy semantics, and the two<br />

operations should therefore behave consistently.<br />

An assignment of an object to itself (source and target have the same address) can be skipped,<br />

lines 3 and 4. In line 5 it is tested whether the assignment is a legal operation by checking<br />

the equality of the sizes. Alternatively, the assignment could resize the target if the sizes are<br />

different, but that does not correspond to the authors’ understanding of vector behavior — or<br />

can you think of a context in mathematics or physics where a vector space all of a sudden<br />

changes its dimension?<br />

3.6 Automatically Generated Operators<br />

If you define a class without operators C ++ will generate the following four:<br />

• Default constructor;<br />

• Copy constructor;<br />

• Destructor; and<br />

• Copy assignment.<br />

Assume you have a class without any function but with some member variables like this:<br />

class my class<br />

{<br />

type1 var1;<br />

type2 var2;<br />

// ...<br />

typen varn;<br />

};<br />

Then the compiler adds the four operators and your class behaves as you would have written:<br />

class my class<br />

{<br />

public:<br />

my class()<br />

: var1(),<br />

var2(),<br />

// ...<br />

varn()<br />

{}<br />

my class(const my class& that)<br />

: var1(that.var1),<br />

var2(that.var2),<br />

// ...<br />

varn(that.varn)<br />

{}<br />

∼my class()<br />

{<br />

varn.∼typen();<br />

// ...<br />

var2.∼type2();<br />

var1.∼type1();<br />

}<br />

my class& operator=(const my class& that)<br />

{<br />

var1= that.var1;<br />

var2= that.var2;<br />

// ...<br />

varn= that.varn;<br />

return ∗this;<br />

}<br />

private:<br />

type1 var1;<br />

type2 var2;<br />

// ...<br />

typen varn;<br />

};<br />

The generation is straightforward: each of the four operators is called on each member<br />

variable. The careful reader has noticed that the constructors and the assignment are performed<br />

in exactly the order in which the variables are defined. The destructors are called in reverse order.<br />

The generation of these operators will be disabled if you define your own. The rules <strong>for</strong> this<br />

are quite simple. The simplest is <strong>for</strong> the destructor: either you define it or the compiler does.<br />

There is only one destructor (because it has no arguments). The default constructor generation<br />

is disabled when any constructor is defined by the user — even a private constructor.<br />

The copy constructor and copy assignment operator are generated automatically unless there<br />

is a user-defined version <strong>for</strong> the class type or a reference of it. In detail, if the user defines one<br />

or two of the following:<br />

• return type operator=(my class that);<br />

• return type operator=(const my class& that); or<br />

• return type operator=(my class& that);<br />

then the compiler does not generate it. Typically, one defines only the second operator<br />

because the first one causes an extra copy 7 and the last one requires mutability, which is usually<br />

not necessary for the assignment. The copy constructor can only be defined for references<br />

because passing the argument by value would itself require the copy constructor. Defining a constructor or assignment for<br />

any other type does not disable the generation of the copy operators.<br />

7 An exception is user-defined move semantics.<br />



This mechanism applies recursively. For instance, if type1 is itself a class with an automatically<br />

generated default constructor, the default constructors of its members are called in the order<br />

of their definition. If those variables, or some of them, are also classes, then their default<br />

constructors are called, and so forth. If the type of a member variable is an intrinsic type like int<br />

or float, then there are evidently no such operators because these types are not classes. However,<br />

the behavior can be easily emulated: the “default constructor” just creates it with a random<br />

value (whatever bits were set at the corresponding memory position before determine its value),<br />

the “copy constructor” and the “copy assignment” copy the values, and the “destructor” does<br />

nothing.<br />

3.7 Accessing object members<br />

3.7.1 Access functions<br />

In § 3.2.2 we introduced getters and setters to access the variables of the class complex. This<br />

becomes cumbersome when we want, for instance, to increment the real part:<br />

c.set r(c.get r() + 5.);<br />

This does not really look like numeric operations and is not very readable either. A better way<br />

to deal with this is to write a member function that returns a reference:<br />

class complex { public:<br />

double& real() { return r; }<br />

};<br />

With this function we can write:<br />

c.real()+= 5.;<br />

This already looks much better but is still a little bit weird. Why not increment like this:<br />

real(c)+= 5.;<br />

To do this, we write a free function:<br />

inline double& real(complex& c) { return c.r; }<br />

But this function accesses the private member ‘r’. We can modify the free function to call the<br />

member function:<br />

inline double& real(complex& c) { return c.real(); }<br />

Or, alternatively, we can declare the free function as a friend of complex:<br />

class complex { public:<br />

friend double& real(complex& c);<br />

};<br />

Functions or classes that are friends can access private and protected data. A strange issue<br />

with this free function is that the inline attribute must be written be<strong>for</strong>e the reference type.<br />

Usually it does not matter whether the inline is written be<strong>for</strong>e or after the return type. 9<br />

9 TODO: Does anybody have a decent explanation for this?



This function works only if the complex number is not constant. So we also need a function that<br />

takes a constant reference as argument. In return, it can only provide a constant reference to<br />

the number’s real part.<br />

inline const double& real(const complex& c) { return c.r; }<br />

This function requires a friend declaration, too.<br />

The functions — in free as well as in member form — can evidently only be called after the object<br />

has been created. The references to the number’s real part that we use in the statement<br />

real(c)+= 5.;<br />

exist only until the end of the statement. The variable c lives longer. We can create a reference<br />

variable:<br />

double &rr= real(c);<br />

C ++ destroys objects in reverse order. That means that even if rr and c are in the same function<br />

or block, c lives longer than rr.<br />

The same is true for constant references if the referenced objects stem from variable declarations.<br />

Temporary objects can also be passed as constant references, enabling the definition of dangling<br />

references:<br />

const double &rr= real(complex()); // Bad thing!!!<br />

cout ≪ ”The real part is ” ≪ rr ≪ ’\n’;<br />

The complex variable is created temporarily and exists only until the end of the first statement.<br />

The reference to its real part lives till the end of the surrounding block.<br />

Advice<br />

Do Not Make Constant References Of Temporary Expressions!<br />

They are invalid be<strong>for</strong>e you use them the first time.<br />

3.7.2 Subscript operator<br />

A really stupid way to access vector entries would be to write a function for each one:<br />

class vector<br />

{<br />

public:<br />

double& zeroth() { return data[0]; }<br />

double& first() { return data[1]; }<br />

double& second() { return data[2]; }<br />

// ...<br />

int size() const { return my size; }<br />

};



One could not even write a loop over all elements.<br />

To enable such iteration, we need a function like:<br />

class vector<br />

{<br />

public:<br />

double at(int i)<br />

{<br />

assert(i >= 0 && i < my size);<br />

return data[i];<br />

}<br />

};<br />

Summing the entries of vector v reads:<br />

double sum= 0.0;<br />

<strong>for</strong> (int i= 0; i < v.size(); i++)<br />

sum+= v.at(i);<br />

C ++ and C access entries of (fixed-size) arrays with the subscript operator. It is, thus, only<br />

natural to do the same for (dynamically sized) vectors. Then we could rewrite the previous<br />

example as:<br />

double sum= 0.0;<br />

<strong>for</strong> (int i= 0; i < v.size(); i++)<br />

sum+= v[i];<br />

This is more concise and shows more clearly what we are doing.<br />

Overloading this operator has the same syntax as overloading the assignment operator, and the implementation<br />

is taken from the function at:<br />

class vector<br />

{<br />

public:<br />

double& operator[](int i)<br />

{<br />

assert(i >= 0 && i < my size);<br />

return data[i];<br />

}<br />

};<br />

With this operator we can access vector elements with brackets, but only if the vector is<br />

mutable.<br />

3.7.3 Constant member functions<br />

This raises the more general question: How can we write operators and member functions that<br />

accept constant objects? In fact, operators are a special <strong>for</strong>m of member functions and can be<br />

called like a member function:<br />

v[i]; // is syntactic sugar <strong>for</strong>:<br />

v.operator[](i);



Of course, the long <strong>for</strong>m is almost never called but it illustrates that operators are regular<br />

functions that only provide an extra syntax to call them.<br />

Free functions allow qualifying the const-ness of each argument. Member functions do not even<br />

mention the processed object in the signature. How can const-ness be specified then? There is<br />

a special notation that states the applicability of a member function to constant objects after<br />

the function header, e.g. our subscript operator:<br />

class vector<br />

{<br />

public:<br />

const double& operator[](int i) const<br />

{<br />

assert(i >= 0 && i < my size);<br />

return data[i];<br />

}<br />

};<br />

The const attribute is not just a casual gesture of the programmer that he/she does not mind<br />

calling this member function with a constant object. C ++ takes this constancy very seriously<br />

and will verify that the function does not modify the object, i.e. any of its members, that the<br />

object is only passed as const when free functions are called and that called member functions<br />

have the const attribute as well.<br />

This constancy guarantee also prevents returning non-constant pointers or references. One can<br />

return constant pointers or references as well as objects. A returned object does not need to<br />

be constant (but it could) because it is a copy of the object, of one of its member variables<br />

(or constants), or of a temporary variable; and because it is a copy the object is guaranteed to<br />

remain unchanged.<br />

Constant member functions can be called <strong>for</strong> non-constant objects (because C ++ implicitly<br />

converts non-constant references into constant references when necessary). There<strong>for</strong>e, it is<br />

often sufficient to provide only the constant member function. For instance a function that<br />

returns the size of the vector:<br />

class vector<br />

{<br />

public:<br />

int size() const { return my size; }<br />

// int size() { return my size; } // futile<br />

};<br />

The non-constant size function does the same as the constant one and is there<strong>for</strong>e useless.<br />

For our subscript operator we need both the constant and the mutable version. If we only<br />

had the constant member function, we could use it to read the elements of both constant and<br />

mutable vectors, but we could not modify the elements. By the way, our abandoned getters<br />

should have been const since they are only used to read values regardless of whether the object<br />

is constant or mutable.<br />

3.7.4 Accessing multi-dimensional arrays<br />

Let us assume that we have a simple matrix class like the following:



class matrix<br />

{<br />

public:<br />

matrix() : nrows(0), ncols(0), data(0) {}<br />

matrix(int nrows, int ncols)<br />

: nrows(nrows), ncols(ncols), data( new double[nrows ∗ ncols] ) {}<br />

matrix(const matrix& that)<br />

: nrows(that.nrows), ncols(that.ncols), data(new double[nrows ∗ ncols])<br />

{<br />

<strong>for</strong> (int i= 0, size= nrows∗ncols; i < size; ++i)<br />

data[i]= that.data[i];<br />

}<br />

∼matrix() { if (data) delete [] data; }<br />

void operator=(const matrix& that)<br />

{<br />

assert(nrows == that.nrows && ncols == that.ncols);<br />

<strong>for</strong> (int i= 0, size= nrows∗ncols; i < size; ++i)<br />

data[i]= that.data[i];<br />

}<br />

int num rows() const { return nrows; }<br />

int num cols() const { return ncols; }<br />

private:<br />

int nrows, ncols;<br />

double∗ data;<br />

};<br />

So far, the implementation is done in the same manner as be<strong>for</strong>e: variables are private, the<br />

constructors establish defined values <strong>for</strong> all members, the copy constructor and the assignment<br />

are consistent, and size information is provided by constant functions.<br />

What is still missing is the access to the matrix entries.<br />

Be aware!<br />

The bracket operator accepts only one argument.<br />

That means we cannot define<br />

double& operator[](int r, int c) { ... }<br />

Approach 1: Parenthesis<br />

The simplest way of handling multiple indices is to replace the square brackets with parentheses:<br />

double& operator()(int r, int c)<br />

{<br />

return data[r∗ncols + c];<br />

}<br />

Adding range checking — in a separate function for better reuse — can save us a lot of debugging<br />

time in the future. We also implement the constant access:<br />

private:<br />

void check(int r, int c) const { assert(0 <= r && r < nrows && 0 <= c && c < ncols); }<br />



Approach 3: Returning proxies<br />

Instead of returning a pointer we can build a specific type that keeps a reference to the matrix<br />

and the row index and that provides an operator[] for accessing matrix entries. This proxy must<br />

therefore be a friend of the matrix class to reach its private data. Alternatively, we can keep<br />

the operator with the parentheses and call this one from the proxy. In both cases, we encounter<br />

cyclic dependencies. 10<br />

If we have several matrix types, each of them would need its own proxy. We would also need<br />

different proxies <strong>for</strong> constant and mutable access respectively. In Section 6.5 we will show how<br />

to write a proxy that works <strong>for</strong> all matrix types. The same templated proxy will handle constant<br />

and mutable access. Fortunately, it even solves the problem of mutual dependencies. The only<br />

minor flaw is that potential errors can cause lengthy compiler messages.<br />

Approach 4: Multi-index type (advanced)<br />

Preliminary note: this approach contains several new language features and discusses some<br />

subtle details. If you do not understand it the first time, don’t worry. If you would like to skip it, do<br />

so. That will not be a problem for understanding the rest of the book. But please read the<br />

comparative discussion at the end.<br />

The fact that operator[] accepts only one argument does not necessarily mean that we cannot<br />

give two. But we need a tricky technique to build one object out of two, without explicitly<br />

constructing the object. The implementation is based on the matrix example from an online<br />

tutorial [Sch].<br />

First, we define a type:<br />

struct double index<br />

{<br />

double index (int i1, int i2): i1 (i1), i2 (i2) {}<br />

int i1, i2;<br />

};<br />

For this type we define the access operator:<br />

double& operator[](double index i) { return data[i.i1∗ncols + i.i2]; }<br />

const double& operator[](double index i) const { return data[i.i1∗ncols + i.i2]; }<br />

Now we can write:<br />

A[double index(1, 0)];<br />

This works but it was not the concise notation we were looking <strong>for</strong>.<br />

We introduce a second type:<br />

struct single index<br />

{<br />

single index (int i1): i1 (i1) {}<br />

double index operator, (single index j) const {<br />

return double index (i1, j.i1);<br />

}<br />

operator int() const { return i1; }<br />

single index& operator++ () {<br />

++i1; return ∗this;<br />

}<br />

int i1;<br />

};<br />

10 The dependencies cannot be resolved with forward declaration because we not only define references or pointers but call member functions in the matrix and in the proxy. We will explain this in § ??.<br />

This new type overloads the comma operator so that a second index creates a double index.<br />

The constructor is implicit, and the class contains a conversion operator to int. This enables the compiler<br />

to convert between single index and int in both directions.<br />

This allows us to write code like:<br />

single index i= 0, j= 1;<br />

std::cout ≪ ”A[0, 1] is ” ≪ A[i, j] ≪ ’\n’;<br />

or<br />

<strong>for</strong> (single index i= 0; i < A.num rows(); ++i)<br />

<strong>for</strong> (single index j= 0; j < A.num cols(); ++j)<br />

std::cout ≪ ”A[” ≪ i ≪ ”, ” ≪ j ≪ ”] is ” ≪ A[i, j] ≪ ’\n’;<br />

In the loop, a single index (i) is compared with an int (A.num rows()). This comparison operator<br />

is not defined. The compiler converts i implicitly to an int and compares the values as int.<br />

Thus, the conversion operator allows us to use all operations that are defined <strong>for</strong> int without<br />

implementing them.<br />

At this opportunity we can introduce another operator. C and C ++ provide a prefix and postfix<br />

increment/decrement. The difference only manifests if we read the incremented/decremented<br />

value, e.g., j= i++; differs from j= ++i; by having the old value of i in j (in the first statement)<br />

or the already incremented i (in the second statement). If the increment is the only expression<br />

in the statement, e.g., i++; or ++i;, there is no semantic difference. There<strong>for</strong>e, it does not<br />

matter <strong>for</strong> loops whether we use the postfix or prefix notation.<br />

<strong>for</strong> (single index i= 0; i < A.num rows(); ++i)<br />

is (semantically) equivalent to:<br />

<strong>for</strong> (single index i= 0; i < A.num rows(); i++)<br />

For C ++ integer types it really does not matter. For user-defined types, the compiler will tell<br />

us that this operation is not defined. The GNU Compiler emits the following error message:<br />

no ≫operator++(int)≪ <strong>for</strong> suffix ≫++≪ declared, instead prefix operator tried<br />

Fortunately, it already reveals the solution.<br />

The operator++ without arguments is understood as prefix operator. To define a postfix operator<br />

we must define it with a dummy int argument. This argument has no effect but we need<br />

a way to define the symbol ++ as prefix and postfix operator. Unary operators are defined<br />

as member functions without argument. This works for all other unary operators, but in the case<br />

of decrement/increment we have the same symbol for two operators that are<br />

distinguished only by their position.<br />
distinguished by the position.<br />

To make a long story short, if we write i++ we must define the postfix increment:<br />

single index operator++ (int)<br />

{<br />

single index tmp(∗this);<br />

++i1;<br />

return tmp;<br />

}<br />

We see that the operation requires an extra copy. The object itself must be incremented but<br />

the returned value must still be the old one. If we returned the object itself, i.e. ∗this, then we<br />

would have no possibility to increment it after the return. Therefore we need a copy before we modify<br />

the object. Alternatively we could omit the copy and return a new object with the old value:<br />

single index operator++ (int)<br />

{<br />

++i1;<br />

return single index(i1 − 1);<br />

}<br />

This avoids the copy at the beginning but we still create a new object. These implementations<br />

show that the postfix operators are somewhat more expensive than prefix operators, and this is<br />

true for all user-defined types. For built-in types, the compiler can generate efficient executables<br />

for both forms.<br />

The really sad part of the story is that we put so much effort into returning the old value of our<br />

index and do not even use it. Therefore, we give the following<br />

Advice<br />

If you increment or decrement user-defined types prefer the prefix notation,<br />

especially if the value of the changed variable is not used in the statement.<br />

In the examples, we declared both indices as single index. It is sufficient to do this for the first<br />

one and to let the implicit constructor convert the second one:<br />

A[single index(0), 1]<br />

Un<strong>for</strong>tunately, we cannot write<br />

A[0, 1]<br />

The compiler will give an error message 11 like:<br />

no match <strong>for</strong> ≫operator[]≪ in ≫A[(0, 0)]≪<br />

To call operator[], the compiler would need to per<strong>for</strong>m multiple steps that depend on each other:<br />

first the zeros that are considered int would need to be converted to single index and then the<br />

comma operator has to be applied on them. A language that would allow such dependent<br />

conversions would end up with extremely long compile times to consider all possibilities 12 and<br />

the probability of ambiguities would increase tremendously.<br />

11 This is the message from the GNU compiler.<br />

Instead the compiler considers ‘0, 0’ as a sequence of two expressions where each expression is<br />

an integer constant. The result of a sequence is the result of the last expression, i.e. the integer<br />

constant zero in our case. This cannot be converted into a double index.<br />

To throw in a really bad idea, we give the second constructor argument of double index a default<br />

value:<br />

struct double index<br />

{<br />

double index (int i1, int i2= 0) // Very bad<br />

: i1 (i1), i2 (i2) {}<br />

int i1, i2;<br />

};<br />

Then the expression A[0, 1] compiles, as well as A[0, 1, 2, 3, 4]. It evaluates the integer sequence<br />

and the result is the last expression. A single integer can be implicitly converted into double index.<br />

As a result, the last integer is considered the row and the column is zero.<br />

Comparing the approaches<br />

The previous implementations show that C ++ allows us to provide different notations for user-defined<br />

types, and we can implement them in the manner that seems most appropriate to us. The<br />

first approach was replacing square brackets by round parentheses to enable multiple arguments.<br />

This was the simplest solution, and if one is willing to accept this syntax, one can save oneself<br />

the lengths we went through to come up with a fancier notation. The technique of returning a<br />

pointer was not complicated either, but it relies too strongly on the internal representation. If<br />

we use some internal blocking or some other specialized internal storage scheme, we will need<br />

an entirely different technique. Another drawback was that we cannot test the range of the<br />

column index.<br />

The last approach introduced special types, and the fact that we must always specify the type<br />

of the index explicitly makes the notation for constant indices clumsier instead of clearer. It<br />

also introduced a lot of implicit conversions, and in a large code base we might have enormous<br />

trouble avoiding ambiguities. Another unfortunate aspect is the overloading of the comma<br />

operator. It makes the understanding of programs more difficult — because one has to pay a<br />

lot of attention to the types of expressions to distinguish it from non-overloaded sequences —<br />

and can cause weird effects. Thus, our first recommendation is to keep reading, since the proxy<br />

solution in § ?? is in our opinion preferable to the previous approaches (although not perfect<br />

either).<br />

To sum up, C ++ gives us the opportunity to handle programming tasks in different ways. Often,<br />

none of the solutions will be perfect. Even if one is satisfied with a solution,<br />

there will most certainly be some (allegedly) experienced C ++ programmer who finds a<br />

disadvantage.<br />

12 It might even become undecidable.



There are two lessons we can learn from this, firstly:<br />

Advice<br />

Don’t push C ++ too far! Avoid fragile features and minimize implicit conversions.<br />

C ++ enables many techniques, but that doesn’t mean one has to use them all. Especially the<br />

comma operator bears so much danger that its utilization must be limited to very rare cases<br />

or better avoided entirely. It is important to have an appropriate notation and time spent on<br />

syntactic sugar is really worthwhile <strong>for</strong> the sake of better usability of new classes. But some<br />

tricks provide a little improvement in the syntax and create large problems in the interplay with<br />

other techniques.<br />

Secondly:<br />

Advice<br />

If you can’t find a perfect solution, pick what serves you best and accept it.<br />

We dare the hypothesis that there is no single C ++ program that everybody is happy with. The<br />

attempt to come up with the world’s first perfect C ++ program will end in failure and bitterness.<br />

Of course, that does not mean always willingly accepting the first working implementation one<br />

comes up with. Software can always be improved and should be. As mentioned in § 2.11,<br />

experience has shown that it is more efficient to refactor software as early as possible than<br />

to retroactively fix issues when important applications crash, users are angry, and the program<br />

author(s) have forgotten the details or are already gone. On the other hand, by the time one reaches<br />

a really good implementation, one has certainly already spent much more time than initially<br />

planned.<br />

3.8 Other Operators


Chapter 4<br />

Generic Programming<br />

In this chapter we will explain the use of templates in C ++ to create generic functions and<br />

classes. We will also discuss metaprogramming and the Standard Template Library.<br />

4.1 Templates<br />

Templates are a feature of the C ++ programming language that allows creating functions and classes<br />

that operate with generic types — also called parametric types. As a result, a function or class<br />

can work with many different data types without being manually rewritten <strong>for</strong> each one.<br />

A template parameter is a special kind of parameter that can be used to pass a type as an<br />

argument: just like regular function parameters can be used to pass values to a function,<br />

template parameters also allow passing types to a function or a class. These generic functions<br />

can use these parameters as if they were any other regular type.<br />

4.2 Generic functions<br />

Generic functions — also called function templates — are in a sense generalizations of overloaded<br />

functions.<br />

Suppose we want to write the function max(x,y) where x and y are variables or expressions of<br />

some type. Using overloading, we can easily do this as follows:<br />

int inline max (int a, int b)<br />

{<br />

if (a > b)<br />

return a;<br />

else<br />

return b;<br />

}<br />

double inline max (double a, double b)<br />

{<br />

if (a > b)<br />

return a;<br />


}<br />

else<br />

return b;<br />

Note that the function body is exactly the same <strong>for</strong> both int and double.<br />

With the template mechanism we can write just one generic implementation:<br />

template <typename T><br />

T inline max (T a, T b)<br />

{<br />

if (a > b)<br />

return a;<br />

else<br />

return b;<br />

}<br />

The function can be used in the same way as the overloaded functions:<br />

std::cout << "The maximum of 3 and 5 is " << max(3, 5) << '\n';<br />

std::cout << "The maximum of 3l and 5l is " << max(3l, 5l) << '\n';<br />

std::cout << "The maximum of 3.0 and 5.0 is " << max(3.0, 5.0) << '\n';<br />

In the first case, ‘3’ and ‘5’ are literals of type int and the max function is instantiated to<br />

int inline max (int, int);<br />

Likewise the second and third call of max instantiate<br />

long inline max (long, long);<br />

double inline max (double, double);<br />

as the literals are interpreted as long and double.<br />

In the same way the template function can be called with variables and expressions:<br />

unsigned u1= 2, u2= 8;<br />

std::cout << "The maximum of u1 and u2 is " << max(u1, u2) << '\n';<br />

std::cout << "The maximum of u1*u2 and u1+u2 is " << max(u1*u2, u1+u2) << '\n';<br />

Here the function is instantiated for unsigned.<br />

Instead of typename one can also write class in this context, but we do not recommend this because typename expresses the intention of a generic function better.<br />

What does instantiation mean? When you write a non-generic function, the compiler reads<br />

its definition, checks <strong>for</strong> errors, and generates executable code. When the compiler processes<br />

a generic function’s definition it only checks certain errors (parsing errors) and generates no<br />

executable code. For instance:<br />

template <typename T><br />

T inline max (T a, T b)<br />

{<br />

if a > b // Error !<br />

return a;<br />

else<br />

return b;<br />

}



would not compile because an if statement without parentheses does not conform to the C ++ grammar. Meanwhile the following stupid implementation:<br />

template <typename T><br />

T inline max (T a, T b)<br />

{<br />

if (a > b)<br />

return max(a, b); // Infinite loop !<br />

else<br />

return max(b, a); // Infinite loop !<br />

}<br />

compiles because it does not violate any grammar rule. It obviously results in an infinite loop<br />

but this is beyond the compiler’s responsibility.<br />

So far, the compiler has only checked the grammatical correctness of the definition but has not generated code. If we never call the template function, the binary will contain no trace of our max function. What happens when we call the generic function and thereby cause its instantiation? The compiler first checks whether the function can be compiled with the given argument type. It can do so for int or double, as we have seen before. But what about types that have no operator>, for instance std::complex<double>? Let us try to compile:<br />

std::complex<double> z(3, 2), c(4, 8);<br />

std::cout << "The maximum of c and z is " << ::max(c, z) << '\n';<br />

The double colons in front of max avoid ambiguities with the standard library's max, which some compilers include implicitly (as g++ apparently does). Our compilation attempt will end in an error like:<br />

Error: no match <strong>for</strong> ≫operator>≪ in ≫a > b≪<br />

Obviously, we cannot call the max function with types that have no “greater than” operator.<br />

In fact, there is no maximum function <strong>for</strong> complex numbers.<br />

What happens when our template function calls another template function, which in turn calls yet another, and so on? Likewise, these functions are only completely checked at instantiation time. Let us look at the following program:<br />

#include <iostream><br />
#include <complex><br />
#include <vector><br />
#include <algorithm><br />

int main ()<br />

{<br />

using namespace std;<br />

vector<complex<double> > v;<br />

sort(v.begin(), v.end());<br />

return 0;<br />

}<br />

Without going into detail, the problem is the same as before: we cannot compare complex numbers and thus cannot sort arrays of them. This time the missing comparison is discovered in an indirectly called function, and the compiler provides the entire call stack so that you<br />



can trace back the error. Please try to compile this example on the different compilers at your disposal and see if you can make any sense of the error messages.<br />

If you run into such a lengthy error message1, DON'T PANIC! First, look at the error itself and extract what is useful for you, e.g. a missing "operator>", or something not assignable, i.e. a missing "operator=", or something const that should not be. Then find in the call stack the innermost code that is part of your own program, i.e. the place where you call somebody else's template function. Stare for a while at this line and its predecessors, because this is the most likely place of the error. Is a type among the template function's arguments missing an operator or function mentioned in the error? Do not get scared away; often the problem is much simpler than it seems from the never-ending error message. In our experience, most errors in template functions can be found faster than run-time errors.<br />

Another question we have not answered so far is what happens if we use two different types:<br />

unsigned u1= 2;<br />

int i= 3;<br />

std::cout << "The maximum of u1 and i is " << max(u1, i) << '\n';<br />

The compiler tells us, this time briefly, something like<br />

Error: no match <strong>for</strong> function call ≫max(unsigned int&, int)≪<br />

Indeed, we assumed that both types are the same. Now, can we write a template function with two template parameters? Of course we can. But that does not help us much here because we would not know what return type the function should have.<br />

There are different options. First we could add a non-templated function like:<br />

int inline max (int a, int b) { return a > b ? a : b; }<br />

This can be called with mixed types, and the unsigned argument would be implicitly converted into an int. But what would happen if we also added a function for unsigned?<br />

int max(unsigned a, unsigned b) { return a > b ? a : b; }<br />

Should the int be converted into an unsigned or vice versa? The compiler does not know and will complain about this ambiguity.<br />

At any rate, adding non-templated overloads to the templated implementation is neither elegant nor productive. So we remove all non-templated overloads and see what we can do in the function call. We can explicitly convert one argument to the type of the other:<br />

unsigned u1= 2;<br />

int i= 3;<br />

std::cout << "The maximum of u1 and i is " << max(int(u1), i) << '\n';<br />

Now max is called with two ints. Another option is specifying the template type explicitly in<br />

the function call:<br />

unsigned u1= 2;<br />

int i= 3;<br />

std::cout << "The maximum of u1 and i is " << max<int>(u1, i) << '\n';<br />

1 The longest we have heard of was 18 MB, which corresponds to about 9000 pages of text.



Then the arguments are converted to int. 2<br />

After these less pleasant details about templates, some really good news: template functions perform as efficiently as their non-templated counterparts! The reason is that C ++ generates new code for every type or type combination that the function is called with. Java, in contrast, compiles its generics only once and executes them for different types by casting them to the corresponding types. This results in faster compilation and shorter executables, but it is less efficient than non-templated implementations (which are already less efficient than C ++ programs).<br />

Another price we pay for the fast templates is longer executables, because of the multiple instantiations for each type (combination). In practice, however, the number of instances of a function will not be that large, and it only really matters for non-inline functions with long implementations (including called template functions). Inline functions' binary code is in any case inserted directly into the executable at the location of the function call, so the impact on the executable length is the same for template and non-template functions.<br />

4.2.1 The function accumulate<br />

TODO: An example on containers is much better than with ugly pointer arithmetic.<br />

Consider an array double a[n], which is described by its begin and end pointers a and a + n, respectively.3 We create a function for the sum of an array of doubles. The loop over the array uses pointers, as explained in Section 2.9. Figure 4.1 shows the positions of the begin pointer a and the end pointer a + n, which points directly past the end of the array.<br />

Figure 4.1: An array of length n with begin pointer a and end pointer a + n<br />

Thus, we specify the range of entries by a right-open interval of addresses.<br />

2 For complicated reasons of compiler internals, the explicit type parameter turns off argument-dependent name lookup (ADL).<br />

3 An array and a pointer are treated in much the same way in C/C ++: one can pass an array where a pointer is expected, and it decays to the address of the first entry &a[0]. For a pointer or array a and an integer n, a + n is equivalent to &a[n].<br />



Advice<br />

Unless you have strong reasons against it, use right-open intervals because:<br />

• It is easy to represent empty sets by two equal locations (pointers, iterators, . . . ).<br />

• It works on types without an ordering: to test whether the end is reached, inequality suffices, whereas specifying the end by the location of the last element requires an ordering operator to test whether you are already past it.<br />



The accumulate function applies the += operator to variables of type T. This operator is defined for built-in types such as float and int. This implies that the following main program will compile without the need for another definition of the accumulate function:<br />

int main()<br />

{<br />

const int n = 10;<br />

float a[n];<br />

int b[n];<br />

for (int i= 0; i < n; ++i) {<br />

a[i]= float(i) + 1.0f;<br />

b[i]= i + 1;<br />

}<br />

float s= accumulate(a, a + n);<br />

int r= accumulate(b, b + n);<br />

return 0;<br />

}<br />

As in the previous example, we do not need to indicate explicitly that T is float or int: the compiler deduces this for us from the function arguments. We can, however, state the type explicitly as follows:<br />

int r= accumulate<int>(b, b + n);<br />

If you fill in a wrong type, the compiler will give you a type error saying that no matching function exists.<br />

4.3 Generic classes<br />

In the previous section, we described the use of templates to create generic functions. Templates can also be used to create generic classes that define a certain behaviour independently of the types they operate on. Good candidates are, for example, container classes like vectors, matrices, and lists. We could also extend the complex class with a parametric value type, but we have already spent so much time with it that we will now look at something else.<br />

Let us write a generic vector class. 4 First we just implement a class with the most fundamental<br />

operators:<br />

template <typename T><br />
class vector<br />
{<br />
    void check_size(int that_size) const { assert(my_size == that_size); }<br />
    void check_index(int i) const { assert(i >= 0 && i < my_size); }<br />
  public:<br />
    explicit vector(int size)<br />
      : my_size(size), data( new T[my_size] )<br />
    {}<br />
    vector()<br />
      : my_size(0), data(0)<br />
    {}<br />

4 In the sense of linear algebra, not like the STL vector.



    vector( const vector& that )<br />
      : my_size(that.my_size), data( new T[my_size] )<br />
    {<br />
        for (int i= 0; i < my_size; ++i)<br />
            data[i]= that.data[i];<br />
    }<br />
    ~vector() { if (data) delete [] data; }<br />
    vector& operator=( const vector& that )<br />
    {<br />
        check_size(that.my_size);<br />
        for (int i= 0; i < my_size; ++i)<br />
            data[i]= that.data[i];<br />
        return *this;<br />
    }<br />
    int size() const { return my_size; }<br />
    const T& operator[]( int i ) const<br />
    {<br />
        check_index(i);<br />
        return data[i];<br />
    }<br />
    T& operator[]( int i )<br />
    {<br />
        check_index(i);<br />
        return data[i];<br />
    }<br />
    vector operator+( const vector& that ) const<br />
    {<br />
        check_size(that.my_size);<br />
        vector sum(my_size);<br />
        for (int i= 0; i < my_size; ++i)<br />
            sum[i]= data[i] + that[i];<br />
        return sum;<br />
    }<br />
  private:<br />
    int my_size;<br />
    T* data;<br />
};<br />

Listing 4.1: Template vector class<br />

The template class is not essentially different from a non-template class. There is only the extra parameter T as a placeholder for the type the class is used with. We have member variables like my_size and member functions like size() that are not affected by the template parameter. Other functions like the access operator or the first constructor are parametrized. However, the difference is minimal: wherever we had double (or another type) before, we now put the type parameter T, e.g. for return types or within new. Likewise, our member variables and constants can be parametrized by T, as for data. Even program parts that use generic functions or data can often be implemented without explicitly stating the



type parameters. For instance, the destructor uses the pointer data with a template type, but delete can deduce its type automatically, and for the null pointer test it does not matter either.<br />

Template arguments can have default values. Assume our vector class has, in addition to the value type, two parameters for the orientation and location (the parameter names here are reconstructed, since they did not survive extraction):<br />

template <typename T= double, typename Orientation= col_major, typename Where= heap><br />
class vector;<br />

The arguments of a vector can be fully declared:<br />

vector<float, row_major, heap> v;<br />

The last argument is equal to the default value and can be omitted:<br />

vector<float, row_major> v;<br />

As for default function arguments, only the trailing arguments can be omitted. For instance, if the second argument is the default and the last one is not, we must write them all:<br />

vector<float, col_major, stack> w;<br />

If all template arguments have their default values, we can of course omit them all. However, the type is still a template class, and the compiler gets confused if we skip the brackets entirely:<br />

vector x;   // wrong, it is considered a non-template class<br />

vector<> y; // looks a bit strange but is correct<br />

Unlike default function arguments, template defaults can refer to previous template arguments:<br />

template <typename T, typename U= T><br />
class pair;<br />

This is a class <strong>for</strong> two values that might have different types. If they do not we do not want to<br />

declare it twice:<br />

pair<int, float> p1; // object with an int and a float value<br />

pair<int> p2; // object with two int values<br />

The dependency on previous arguments can be more complex than just equality when using<br />

meta-functions that we will introduce in Chapter ??.<br />

TODO: transition to next section<br />

4.4 Concepts and Modeling<br />

In the previous sections one could get the impression that template parameters can be replaced by any type. This is in fact not entirely true. The programmer of templated classes and functions makes assumptions about the operations that can be performed on the templated variables. So it is very important to know which types may correctly be substituted for the formal template parameters, in C ++ lingo: which types the template function or class can be instantiated with.<br />

Clearly, accumulate can be instantiated with int or double. Types without addition like a solver



class (on page 70) cannot be used with accumulate. What should be accumulated from a set of solvers? All requirements for the template parameter T of the function accumulate can be summarized as follows:<br />

• T is CopyConstructible:<br />

– The copy constructor T::T(const T&) exists, so that 'T a(b);' compiles if b is of type T.<br />

• T is PlusAssignable:<br />

– The plus-assign operator T::operator+=(const T&) exists, so that 'a+= b;' compiles if b is of type T.<br />

• T is Constructible from int:<br />

– The constructor T::T(int) exists, so that 'T a(0);' compiles.<br />

Such a set of type requirements is called a 'Concept'. A concept CR that contains all requirements of a concept C plus additional ones is called a 'Refinement' of C. A type t that fulfills all requirements of a concept C is called a 'Model' of C.<br />

A complete definition of a template function or type should contain the list of required concepts, as is done for functions from the Standard Template Library; see http://www.sgi.com/tech/stl/.<br />

Today such requirements are mere documentation. However, there exists a prototype of a C ++ concept compiler [?] that checks:<br />

• whether a function can be called with a certain type (combination5);<br />

• whether a class can be instantiated with certain types (combination); and<br />

• whether a function's requirement list covers all used expressions, including those in sub-functions.<br />

This compiler generates short and comprehensible messages when template functions or classes are used erroneously. People interested in generic programming should try it; it helps to understand concepts better. However, the compiler really is a prototype and must not be used for production code. This functionality was even planned for the next language standard but, to make a (very) long story short, the committee could not reach a consensus on its details.<br />

Discussion 4.1 The most vulnerable aspect of generic programming is semantic conformance, that is, which Semantic Concepts are modelled. For instance, an algorithm might require that a binary operation is associative in order to calculate correctly. One can express this requirement in the function's documentation, but if someone calls the function with an operation that is not associative, the compiler has no idea about it. If one violates a syntactic requirement, the compiler will complain about the missing function or operator (often in a hardly readable form), but the error will be caught no matter what. If one violates a semantic requirement, the compiler generates erroneous executables, and the compilation does not give any warning because the compiler is entirely unaware of the semantics of user types. The only way to find such semantic errors in templates with today's compilers is careful documentation (and reading it, of course). Recent research gives hope that future C ++ standards and compilers will provide more reliable and elegant possibilities to ensure the semantic correctness of template programs.<br />

5 If you have multiple template arguments.



For illustration purposes, we would like to show the conceptualized declaration of a generic sorting function as used in the library of the concept compiler:<br />

template <typename Iter><br />
    requires LessThanComparable<typename Iter::value_type><br />
          && CopyAssignable<typename Iter::value_type><br />
          && Swappable<typename Iter::value_type><br />
          && CopyConstructible<typename Iter::value_type><br />
inline void sort(Iter first, Iter last);<br />

If the function is called erroneously, the compiler will detect this directly at the function call, not deep inside its implementation.<br />

4.5 Inheritance or Generics?<br />

In this section we will discuss the commonalities and difference of/between object-oriented<br />

programming (OOP) and generic programming. People that do not know OOP so far will not<br />

learn it in this section. The purpose of this section is to motivate why we pay more attention<br />

to generic than to object-oriented programming in this book. The short answer is per<strong>for</strong>mance<br />

and applicability. If this answer is good enough <strong>for</strong> you, you can skip this section and continue<br />

with the next one. Programmers that are used to OOP and think they can implement the<br />

functionality with inheritance instead of templates should take the time and read this section.<br />

Inheritance and generic programming are similar in the sense that most programming problems<br />

that can be solved by inheritance have a generic alternative solution and vice versa. The<br />

following table summarizes the basic components of inheritance and the corresponding building<br />

blocks of generic programming:<br />

Inheritance      Generic Programming<br />

base class       concept<br />

derived class    model<br />

In the remainder of this section we will discuss the differences between generic programming<br />

and inheritance.<br />

We will focus on functions, but similar arguments hold for templated classes. The advantage of using a base-class reference or pointer as the argument type of a function is that all derived classes can be used as arguments too, see § ??.6 Inheritance in C ++ and other OOP languages is designed such that a function in a derived class can substitute (hide) the one in the base class with the identical signature. Thus, calling the function for a base-class argument will either use the base class's implementation or that of the derived class (if the function is virtual). In both cases we can rely on its existence. We will explain OOP in more detail in Section ??. Here, we only name advantages and disadvantages of the two approaches regarding different aspects of programming.<br />

Compile time: With the OOP approach, the function is compiled only once; the distinction between the different calculations is made at run time. The generic implementation requires<br />

6 TODO: The OOP section is not written yet.



a new compilation for each combination of types. As a consequence, the sources must reside in header files and cannot be stored in libraries.7<br />

Executable size: As mentioned before, generic functions need multiple compilations, and as a result the generated executable contains code for each instantiation. A function programmed against an abstract interface exists only once. On the other hand, virtual functions introduce some additional memory need for storing the virtual function tables. Except for some pathological examples, one can expect this additional space to be less than the extra space needed for separate machine code for every instantiation of a generic function. In extreme cases, a very large executable can even impact performance negatively due to wasted cache memory.<br />

Performance: The higher compilation effort of generic programming has a twofold performance benefit. Functions do not need to be called indirectly via expensive function pointers but can be called directly. Whenever appropriate, they can even be inlined, saving the function call overhead entirely. We once measured the impact of the two approaches on the performance of an accumulate function (a more general one than in § 4.2.1) [?]. The generic version was in our case about 40 times faster than the inheritance-based implementation. This value varies from platform to platform, but for small functions one can expect an inlined template function to be 10-100 times faster than a virtual function. Conversely, for long calculations like solving a large linear system, the performance difference is imperceptible.<br />

Concept refinement: Adding (syntactic) requirements is feasible with the inheritance approach, but it is very tedious and obfuscates the program source; see [?] for details.<br />

Intrusiveness: The emulation of genericity by inheritance can induce a deep class hierarchy [?]. More critical for universal applicability is that the technique is intrusive: a type cannot be used as an argument of an OOP implementation if it is not derived from the according base class, even if it provides the correct interface! Thus, we would have to add additional base class(es) to the type. This is particularly problematic for types from third-party libraries or intrinsic types, because we cannot add base classes there. Generic functions have no such rigid constraints: we can even adapt a third-party or intrinsic type to meet a generic function's syntactic requirements without modifying third-party code.<br />

Time of selection: At least one advantage of OOP-style polymorphism should be mentioned at the end. The argument types of a generic function call must be known at compile time so that the compiler can instantiate the template function. The type of an OOP function argument can be chosen during the execution of the program and can therefore depend on preceding calculations or input data. For instance, one can define in a file which linear solver is used in an application.<br />

Résumé: It is not our goal to compare object-oriented and generic programming in general. The two approaches complement each other in many respects, and a full comparison is beyond the scope of this discussion. However, when only considering the aspect of maximal applicability with optimal performance, the generic approach is undoubtedly superior. Especially if functions of a library are used with types defined outside this library, any necessary interface adaption is quite easy without modifying the type definition, whereas the addition of extra base classes forces a change of the type definition, which is not always possible (or desirable). In contexts where functions are used with a limited number of types that are defined in the same library, derivation can be<br />

7 Libraries in the classical sense, i.e. linked with separately compiled sources, as opposed to template libraries.



an appropriate technique to achieve polymorphism.<br />

4.6 Template Specialization<br />

Although one of the advantages of a generic implementation is that the same code can be used<br />

<strong>for</strong> all objects that satisfy the corresponding concept, this is not always the best approach.<br />

Sometimes the same behavior can be implemented more efficiently <strong>for</strong> a specific type. In<br />

principle, one can even implement a different behaviour <strong>for</strong> a specific type but this is not<br />

advisable in general because the program becomes much more complicated to understand and<br />

using the specialized classes can require a whole chain of further specialization (bearing the<br />

danger of errors when imcompletely realized). C ++ provides an enormous flexibility and the<br />

programmer is in charge to use this flexibility responsibly and <strong>for</strong> being consistent to himself.<br />

4.6.1 Specializing a Class <strong>for</strong> One Type<br />

In the following, we want to specialize our vector example from page 96 for bool. Our goal is to save memory by packing 8 bools into one byte. Let us start with the class definition:<br />

template <><br />
class vector<bool><br />
{<br />
    // ..<br />
};<br />

Although our specialized class is not type-parametric, we still need the template keyword and the empty angle brackets. After the class name, the complete type list must be given. This syntax looks a bit cumbersome in this context but makes more sense for multiple template arguments where only some are specialized. For instance, if we had some container with 3 arguments and specialized the second one:<br />

template <typename T1, typename T3><br />
class some_container<T1, bool, T3><br />
{<br />
    // ..<br />
};<br />

Back to our boolean vector class. In the class, we define a default constructor for empty vectors, a constructor for vectors of size n, and a destructor. For the size of the array, we have to pay some attention if the vector size is not divisible by 8, because integer division simply cuts off the remainder.<br />

template <><br />
class vector<bool><br />
{<br />
  public:<br />
    explicit vector(int size)<br />
      : my_size(size), data( new unsigned char[(my_size + 7) / 8] )<br />
    {}<br />
    vector() : my_size(0), data(0) {}<br />



    ~vector() { if (data) delete [] data; }<br />
  private:<br />
    int my_size;<br />
    unsigned char* data;<br />
};<br />

One thing we notice is that the default constructor and the destructor are identical to those of the non-specialized version (in the following also referred to as the general version). Unfortunately, they are not 'inherited' by the specialization: if we write a specialization, we have to define everything from scratch. We are free to omit member functions or variables of the general version, but for the sake of consistency we should do this only for very good reasons. For instance, we might omit operator+ because we have no addition for bool. The constant access operator is implemented with shifting and bit masking:<br />

template <> class vector<bool><br />
{<br />
  public:<br />
    bool operator[](int i) const { return (data[i/8] >> i%8) & 1; }<br />
};<br />

The mutable access is trickier because we cannot refer to single bits. The trick is to return some helper type, called a 'Proxy', that can perform the assignment; it is constructed from a reference to the containing byte and the position within that byte.<br />

template <> class vector<bool><br />
{<br />
  public:<br />
    vector_bool_proxy operator[](int i)<br />
    {<br />
        return vector_bool_proxy(data[i/8], i%8);<br />
    }<br />
};<br />

Let us now implement our proxy:<br />

class vector_bool_proxy<br />
{<br />
  public:<br />
    vector_bool_proxy(unsigned char& byte, int p) : byte(byte), mask(1 << p) {}<br />
  private:<br />
    unsigned char& byte;<br />
    unsigned char mask;<br />
};<br />

To simplify further operations we create a mask that has 1 on the position in question and 0<br />

on all other positions.<br />

The reading access is implemented by simply masking in the conversion operator:<br />

class vector bool proxy<br />

{<br />

operator bool() const { return byte & mask; }<br />

};<br />

Setting a bit is realized by an assignment operator for bool:

class vector_bool_proxy
{
    vector_bool_proxy& operator=(bool b)
    {
        if (b)
            byte|= mask;
        else
            byte&= ~mask;
        return *this;
    }
};

If our argument is true, we ‘or’ it with the mask: at the considered position, the one bit in the mask turns on the bit in the byte reference, while at all other positions the zero bits in the mask leave the corresponding positions unchanged. Conversely, with a false argument, we first invert the mask and ‘and’ it with the byte reference, so that the mask’s zero bit at the active position turns the bit off, and at all other positions the ‘and’ with one bits preserves the old bit values. 8

4.6.2 Specializing a Function to a Specific Type

Functions can be specialized in the same manner as classes. Assume we have a generic function that computes the power x^y and we want to specialize it:

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y);

template <>
double inline power(const double& x, const double& y); // Do not use this

Unfortunately, many of such specializations are ignored. Therefore, we give the following

Advice

Do not use function template specialization!

To specialize a function for one specific type or type tuple as above, we can simply use overloading. This works better and is even simpler. Back to our example: assume we have an entirely generic power function. 9 In the case that both arguments are double, we nevertheless want to use the standard implementation, hoping that some caffeine-drugged geeks figured out an incredibly fast assembler hack for our platform and put it in our Linux distribution. Excited by the incredible performance — even if it is only the hope for it — we overload our power function as follows:

#include <cmath>

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y)
{
    ...
}

double inline power(double x, double y)
{
    return std::pow(x, y);
}

8 TODO: picture
9 TODO: Anybody an idea for an implementation? Or a better example?

Speaking of platform-specific assembler hacks, maybe we are eager to contribute code that exploits SSE units by performing two computations in parallel:

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y) { ... }

#ifdef SSE_FOR_TRYPTICHON_WQ_OMICRON_LXXXVI_SUPPORTED
std::pair<double, double> inline power(const std::pair<double, double>& x, double y)
{
    asm {
        # Yo, I'm the greatestest geek under the sun!
    }
    return whatever;
}
#endif

#ifdef ... more hacks ...

What is there to say about this snippet? If you do not like to write such specializations, we will not blame you. If you do, always put such hacks in conditional compilation. You also have to make sure that your build system only enables the macro when it is definitely a platform that supports the hack. For the case that it does not, we must guarantee that the generic implementation or another overload can deal with pairs of double. Last but not least, you have to rewrite your applications to use this function. Convincing others to use such a special implementation could be even more work than getting the assembler hack to produce plausible numbers. More importantly, such special signatures undermine the ideal of clear and intuitive programming. However, if power functions are computed on entire vectors and matrices, one could perform the calculation pairwise internally without affecting the interface or the user application.

You might also think that SSEs were yesterday and today we have GPUs and GPGPUs, but programming those generically still takes a lot of tricks (at least in the beginning of 2010). But this is another story and we digress. Resuming: programming for highest performance can be tricky, but at least there are often ways to exploit unportable features (where available) without sacrificing portability at the application level. 10

In the previous examples, we specialized all arguments of the function. It is also possible to specialize some argument(s) and leave the remaining argument(s) as template(s):

template <typename Base, typename Exponent>
Base inline power(const Base& x, const Exponent& y);

template <typename Base>
Base inline power(const Base& x, int y);

template <typename Exponent>
double inline power(double x, const Exponent& y);

10 TODO: Is this comprehensible?

The compiler will find all overloads that match the argument combination and select the most specific one. For instance, power(3.0, 2u) will match the first and the third overload, where the latter is more specific. 11 To put it in terms of higher math: 12 type specificity is a partial order that forms a lattice, and the compiler picks the maximum of the available overloads. However, you do not need to dive deeply into algebra to see which type or type combination is more specific.

If we call power(3.0, 2) with the previous overloads, all three match. However, this time we cannot determine the most specific overload. The compiler will tell us that the call is ambiguous and show us overloads 2 and 3 as candidates. As we implemented the overloads consistently and with optimal performance, we might be happy with either choice, but the compiler will not choose. To disambiguate the overloads, we must add:

double inline power(double x, int y);

The lattice people from the previous paragraph will think: “Of course, we were missing the join in the specificity order.” Again, one can understand C++ without studying lattices.

4.6.3 Partial Specialization

If you implement template classes, you will sooner or later run into the situation where you would like to specialize a template class for another template class. Suppose we have a templated complex class:

template <typename Real>
class complex;

Assume further that we had some really boosting algorithmic specialization for complex vectors 13 that saves tremendous compute time. Then we start specializing our vector class:

template <>
class vector<complex<float> >;

template <>
class vector<complex<double> >; // again ??? :-/

template <>
class vector<complex<long double> >; // how many more ??? :-P

Apparently, it lacks elegance to reimplement the specialization for all possible and impossible instantiations of complex. Much worse, it destroys our ideal of universal applicability, because the complex class is intended to support user-defined types for Real, but the specialization of the vector class will be ignored for those types.

The solution to the implementation redundancy and the ignorance of new types is ‘Partial Specialization’. We specialize our vector class for all complex instantiations:

11 TODO: Exercises for which type is more specific than which.
12 For those who like higher mathematics. And only for those.
13 TODO: Anyone a good example?


template <typename Real>
class vector<complex<Real> >
{
    ...
};

That will do the trick. Pay attention to put a space between the closing ‘>’s; otherwise the compiler will take two subsequent ‘>’ as the shift operator ‘>>’ and become pretty confused. 14

This also works for classes with multiple parameters, for instance:

template <typename First, typename Second>
class vector<std::pair<First, Second> >
{
    ...
};

We can also specialize for all pointers:

template <typename T>
class vector<T*>
{
    ...
};

Whenever the set of types is expressible by a Type Pattern, we can apply partial specialization to it.

Partial template specialization can be combined with regular template specialization from § 4.6.1 — let us call the latter ‘Complete Specialization’ for distinction. In this case, the complete specialization is prioritized over the partial one. Between different partial specializations, the most specific is selected. In the following example:

template <typename T>
class vector<T*>
{
    ...
};

template <typename T>
class vector<const T*>
{
    ...
};

the second specialization is more specific than the first one and is picked when it matches. In this sense, a complete specialization is always more specific than a partial one.

4.6.4 Partially Specializing Functions

The C++ standard committee distinguishes between explicit specialization, as in the first paragraph of § 4.6.2, and implicit specialization. An example of implicit specialization is the following computation of a value’s magnitude:

14 In the next standard (new depending on publication date), closing ‘>’s are allowed without intermediate spaces. Some compilers, e.g. VS 2008, already support the conglutinated notation today.


template <typename T>
T inline abs(const T& x)
{
    return x < T(0) ? -x : x;
}

template <typename T> // Do not specialize functions like this either
T inline abs(const std::complex<T>& x)
{
    return sqrt(real(x)*real(x) + imag(x)*imag(x));
}

This works significantly better than the explicit specialization, but even this form of specialization sometimes fails, in the sense that a template function is selected which is not the most specific. 15 A mean aspect of this implicit specialization is that it seems to work properly with few specializations, and when a software project grows, it eventually goes wrong. Since the developers have seen the specialization working before, they might not expect this, and the unintended function selection might remain unobserved while corrupting results or at least wasting resources. It is also possible that the specialization behavior varies from compiler to compiler. 16

The only conclusion from this is not to specialize function templates! It introduces an unnecessary fragility into our software. Instead, we introduce an additional class (called a functor, § 4.8) with an operator(). Template classes are properly specialized on all compilers, 17 both partially and completely.

In our abs example, we start with the function itself and a forward declaration of the template class:

template <typename T> struct abs_functor;

template <typename T>
typename abs_functor<T>::result_type
inline abs(const T& x)
{
    abs_functor<T> functor_object;
    return functor_object(x);
}

As an alternative to the forward declaration, we could have defined the class directly. The return type of our function refers to a typedef or (to use the correct term in generic programming) to an ‘Associated Type’ of abs_functor. Already for complex numbers we do not return the argument type itself but its associated type value_type. Using an associated type here gives us all possible flexibility for further specialization. For instance, the magnitude of a vector could be the sum or the maximum of the elements’ magnitudes, or a vector with the magnitude of each element. Evidently, the functor classes must define a result_type so that they can be used this way.

Inside the function, we instantiate the functor class with the argument type, abs_functor<T>, and create an object of this type. Then we call the object’s application operator.

15 TODO: Good example.
16 TODO: Ask a compiler expert about this.
17 Several years ago many compilers failed at partial specialization, e.g. VS 2003, but today all major compilers handle it properly. If you nevertheless experience problems with this feature in some compiler, take your hands off it; most likely you will encounter further problems. Even the CUDA compiler, which is far from being standard-compliant, supports partial specialization.

As we do not


really need the object itself but only use it for the calculation, we can as well create an anonymous object and perform the construction and the calculation in one expression:

template <typename T>
typename abs_functor<T>::result_type
inline abs(const T& x)
{
    return abs_functor<T>()(x);
}

In this expression we have two pairs of parentheses: the first pair contains the arguments of the constructor, which are empty, and the second pair the arguments of the application operator, which is/are the argument(s) of the function. If we wrote:

template <typename T>
typename abs_functor<T>::result_type
inline abs(const T& x)
{
    return abs_functor<T>(x); // error
}

then x would be interpreted as an argument of the constructor, and an object of the functor class would be returned. 18

Now we have to implement our functor classes:

template <typename T>
struct abs_functor
{
    typedef T result_type;

    T operator()(const T& x)
    {
        return x < T(0) ? -x : x;
    }
};

template <typename T>
struct abs_functor<std::complex<T> >
{
    typedef T result_type;

    T operator()(const std::complex<T>& x)
    {
        return sqrt(real(x)*real(x) + imag(x)*imag(x));
    }
};

We wrote a general implementation that works for all fixed-point and floating-point types.

18 Many years and versions ago, g++ tolerated this expression (sometimes) although it is not standard-compliant.


4.7 Non-Type Parameters for Templates

So far, we used template arguments only for types. Values can be template arguments as well — not values of arbitrary types, though, but only of integral types, i.e. fixed-point numbers and bool.

Very popular is the definition of short vectors and small matrices with size arguments as template parameters, for instance:

template <typename T, int Size>
class fsize_vector
{
    typedef fsize_vector self;

    void check_index(int i) const { assert(i >= 0 && i < my_size); }
  public:
    typedef T value_type;
    const static int my_size= Size;

    fsize_vector() {}

    fsize_vector(const self& that)
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that.data[i];
    }

    self& operator=(const self& that)
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that.data[i];
        return *this;
    }

    int size() const { return my_size; }

    const T& operator[](int i) const
    {
        check_index(i);
        return data[i];
    }

    T& operator[](int i)
    {
        check_index(i);
        return data[i];
    }

    self operator+(const self& that) const
    {
        self sum;
        for (int i= 0; i < my_size; ++i)
            sum[i]= data[i] + that[i];
        return sum;
    }

  private:
    T data[Size];
};

If you compare this implementation with the implementation in Section 4.3 on page 95, you realize that there are not so many differences.

The essential difference is that the size is now part of the type and that the compiler knows it. Let us start with the latter. The compiler can use its knowledge for optimization. For instance, if we create a variable

fsize_vector<float, 3> v(w);

the compiler can decide that the generated code for the copy constructor is not performed in a loop but as a sequence of independent operations like:

fsize_vector(const self& that)
{
    data[0]= that.data[0];
    data[1]= that.data[1];
    data[2]= that.data[2];
}

This saves the incrementation of the counter and the test for the loop end. In some sense, this test is already performed at compile time. As a rule of thumb: the more is known during compilation, the more potential for optimization exists. We will come back to this in more detail in Section 8.2 and Chapter ??.

Which optimization is induced by additional compile-time information is of course compiler-dependent. One can only find out which transformation is actually done by reading the generated assembler code — which is not that easy, especially with high optimization, while with low optimization the effect will probably not be there — or indirectly by observing performance and comparing it with other implementations. In the example above, the compiler will probably unroll the loop as shown for small sizes like 3 and keep the loop for larger sizes, say 100. You see why these compile-time sizes are particularly interesting for small matrices and vectors, e.g. three-dimensional coordinates or rotations.

Another benefit of knowing the size at compile time is that we can store the values in an array and even inside the class. Then the values of temporary objects are stored on the stack and not on the heap. 19 The creation and destruction are much less expensive, because only the change of the program counter at function begin and end needs to be adapted to the object’s size, compared to dynamic memory allocation on the heap, which involves the management of lists to keep track of allocated and free memory blocks. 20 To make a long story short, keeping the data in small arrays is much less expensive than dynamic allocation.

We said that the size becomes part of the type. The careful reader might have realized that we omitted the checks whether the vectors have the same size. We do not need them anymore: if an argument has the class type, it implicitly has the same size. Consider the following program snippet:

fsize_vector<float, 3> v;
fsize_vector<float, 4> w;

19 TODO: Picture.
20 TODO: Need easier or longer explication. Or citation.


vector<float> x(3), y(4);

v= w;
x= y;

The last two lines are incompatible vector assignments. The difference is that the incompatibility in the second assignment, x= y;, is discovered at run time by our assertion. The assignment v= w; does not even compile, because fixed-size vectors of dimension 3 only accept vectors of the same dimension as argument.

Like type arguments, non-type template arguments can have defaults. Say the most frequent dimension of our vectors is three, because we live in a three-dimensional world, relativity and string theory aside. Then we save some typing with a default:

template <typename T, int Size= 3>
class fsize_vector
{ /* ... */ };

fsize_vector<float> v, w, x, y;
fsize_vector<float, 4> space_time;
fsize_vector<float, 10> string;

4.8 Functors

Let us develop a mathematical algorithm for computing the finite difference of a differentiable function f. The finite difference is an approximation of the first derivative by

    f'(x) ≈ (f(x + h) - f(x)) / h

where h is a small value also called spacing.

A general function for computing the finite difference is presented here:

#include <iostream>
#include <cmath>

// Function taking a function argument
double finite_difference(double f(double), double x, double h) {
    return (f(x+h) - f(x)) / h;
}

double sin_plus_cos(double x) {
    return sin(x) + cos(x);
}

int main() {
    std::cout << finite_difference(sin_plus_cos, 1., 0.001) << std::endl;
    std::cout << finite_difference(sin_plus_cos, 0., 0.001) << std::endl;
}


Note that the function finite_difference takes an arbitrary function (from double to double) as argument.

Now suppose we want to compute the second-order derivative. It would make sense to call finite_difference with finite_difference as argument. Unfortunately this is not possible, since that function has three arguments and the first argument of finite_difference only accepts a function with a single argument.

For this reason, we can use ‘functors’. Functors — not to be confused with functors from category theory — are either functions or objects of classes providing operator(). This means that ‘functors’ are things which can be called like functions but are not necessarily functions. Using objects of a class providing operator() has the additional advantage that they can carry internal state in terms of member variables. 21

For our example, the functor could be implemented as follows:

struct sin_plus_cos
{
    double operator()(double x) const
    {
        return sin(x) + cos(x);
    }
};

but we could also consider a functor with a parameter like this:

class para_sin_plus_cos
{
  public:
    para_sin_plus_cos(double parameter) : parameter(parameter) {}

    double operator()(double x) const
    {
        return sin(parameter * x) + cos(x);
    }
  private:
    double parameter;
};

How can we use the functor in a function? We want to be able to pass objects of both sin_plus_cos and para_sin_plus_cos to our finite_difference function. There are two possible solutions: inheritance and generic programming, which we now discuss.

4.8.1 Functors via inheritance

TODO: Better as counter-example in OO chapter. We haven’t introduced virtual functions yet.

Let us first rewrite our function finite_difference using an abstract base class.

21 TODO: Do we want the following sentences?: Functors can encapsulate C and C++ function pointers employing the concepts templates and polymorphism. All the functions must have the same return type and calling parameters.


struct functor_base
{
    virtual double operator()(double x) const= 0;
};

double finite_difference(functor_base const& f, double x, double h)
{
    return (f(x+h) - f(x)) / h;
}

The functor class has a pure 22 virtual function operator() and thus cannot be instantiated itself. We can, however, alter the functor para_sin_plus_cos such that it inherits from the abstract base class and overrides operator().

class para_sin_plus_cos
  : public functor_base
{
  public:
    para_sin_plus_cos(double p) : parameter(p) {}

    double operator()(double x) const // Is virtual function in base
    {
        return sin(parameter * x) + cos(x);
    }
  private:
    double parameter;
};

Now we can use an object of this class as the first argument of finite_difference. The whole program looks as follows:

#include <iostream>
#include <cmath>

struct functor_base {
    virtual double operator()(double x) const = 0;
};

double finite_difference(functor_base const& f, double x, double h) {
    return (f(x+h) - f(x)) / h;
}

class para_sin_plus_cos
  : public functor_base
{
  public:
    para_sin_plus_cos(double const& p)
      : parameter(p)
    {}

    double operator()(double x) const { // Virtual function
        return sin(parameter * x) + cos(x);
    }
  private:
    double parameter;
};

int main() {
    para_sin_plus_cos sin_1(1.0);

    std::cout << finite_difference(sin_1, 1., 0.001) << std::endl;
    std::cout << finite_difference(para_sin_plus_cos(2.0), 1., 0.001) << std::endl;
    std::cout << finite_difference(para_sin_plus_cos(2.0), 0., 0.001) << std::endl;
}

22 TODO: undefined

4.8.2 Functors via generic programming

If we make the functor argument in finite_difference generic, we do not need a functor_base any longer. There is also no need to alter our previously defined functors sin_plus_cos and para_sin_plus_cos. This is a perfect example of the fact that generic programming makes extending software easier. The program now looks like:

#include <iostream>
#include <cmath>

template <typename F, typename T>
T inline finite_difference(F const& f, const T& x, const T& h)
{
    return (f(x+h) - f(x)) / h;
}

class para_sin_plus_cos
{
  public:
    para_sin_plus_cos(double p) : parameter(p) {}

    double operator()(double x) const
    {
        return sin(parameter * x) + cos(x);
    }
  private:
    double parameter;
};

int main()
{
    para_sin_plus_cos sin_1(1.0);

    std::cout << finite_difference(sin_1, 1., 0.001) << std::endl;
    std::cout << finite_difference(para_sin_plus_cos(2.0), 1., 0.001) << std::endl;
    std::cout << finite_difference(para_sin_plus_cos(2.0), 0., 0.001) << std::endl;

    return 0;
}


Since we are using a template argument F, we need to define the constraints that it has to satisfy. For this function, we need F to be a functor with one argument; this is called a UnaryFunctor. Formally, we can write this as follows:

• Let f be of type F.
• Let x be of type X, where X is the argument type of F.
• f(x) calls f with one argument and returns an object of the result type.

In this example we also require that the argument type and the result type of F are identical. We can remove this restriction if we establish a unique way to deduce the return type. This can be achieved by meta-programming or with the type deduction in the next C++ standard.

So far so good. We complained before that we cannot apply the finite differences to themselves to compute higher-order derivatives. Actually, we still cannot. The problem is that finite_difference expects (amongst others) a unary functor and is itself a ternary function, so it cannot be passed as its own argument. The solution is to realize its functionality in a unary functor that we call derivative:

template <typename F, typename T>
class derivative
{
  public:
    derivative(const F& f, const T& h) : f(f), h(h) {}

    T operator()(const T& x) const
    {
        return (f(x+h) - f(x)) / h;
    }
  private:
    F f;   // stored by value; a reference member would dangle when a temporary functor is passed
    T h;
};

Now we can create an object that approximates the derivative of f(x) = sin(1 · x) + cos x:

typedef derivative<para_sin_plus_cos, double> spc_der_1;
spc_der_1 spc(sin_1, 0.001);

The object spc can be used like a function, and it approximates f'(x). In addition, it is a unary functor. That means we can compute its derivative:

typedef derivative<spc_der_1, double> spc_der_2;
spc_der_2 spc_scd(spc, 0.001);
std::cout << "Second derivative of sin(0) + cos(0) is " << spc_scd(0.0) << '\n';

The object spc_scd is again a unary functor and approximates f''(x). We could again construct a functor for its derivative and continue this game eternally.

Assume that we need second derivatives of different functions. Then it becomes annoying to first define the type of the first derivative, then construct a functor from it, and finally create a functor for the second one. According to Greg Wilson’s [?] 23 maxim “Whatever you use twice, automate!” we write a class that provides us the second derivative directly:

23 This online course contains a gigantic collection of tips on how to develop software successfully and avoid frustrating unproductivity. We highly recommend reading this material.


template <typename F, typename T>
class second_derivative
{
  public:
    second_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return (fp(x+h) - fp(x)) / h;
    }
  private:
    T                h;
    derivative<F, T> fp;
};

Now we can build the f'' functor from f:

second_derivative<para_sin_plus_cos, double> spc_scd2(para_sin_plus_cos(1.0), 0.001);

When we think about how we would implement the third, fourth, or in general the n-th derivative, we realize that they would look much like the second one: calling the (n-1)-th derivative on x+h and x. We can exploit this with a recursive implementation:

template <typename F, typename T, int N>
class nth_derivative
{
    typedef nth_derivative<F, T, N-1> prec_derivative;
  public:
    nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return (fp(x+h) - fp(x)) / h;
    }
  private:
    T               h;
    prec_derivative fp;
};

To save the compiler from infinite recursion, we must stop this mutual referring when we reach the first derivative. Note that we cannot use ‘if’ or ‘?:’ to stop the recursion, because both of their respective branches are instantiated and one of them still contains the infinite recursion. Recursive template definitions are terminated with a specialization like this:

template <typename F, typename T>
class nth_derivative<F, T, 1>
{
  public:
    nth_derivative(const F& f, const T& h) : f(f), h(h) {}

    T operator()(const T& x) const
    {
        return (f(x+h) - f(x)) / h;
    }
  private:
    F f;   // stored by value; a reference member would dangle when a temporary functor is passed
    T h;
};

This specialization is identical to the class derivative, which we could now throw away. If we keep it, we can at least reuse its functionality and variables to reduce redundancy. This is achieved by derivation (more in Chapter 6).

template <typename F, typename T>
class nth_derivative<F, T, 1>
  : public derivative<F, T>
{
  public:
    nth_derivative(const F& f, const T& h) : derivative<F, T>(f, h) {}
};

With our recursive definition, we can easily define the twenty-second derivative:

nth_derivative<para_sin_plus_cos, double, 22> spc_22(para_sin_plus_cos(1.0), 0.00001);

The new object spc_22 is again a unary functor. Unfortunately, it approximates so badly that we are too ashamed to present the results here. From Taylor series we know that the error of the f'' approximation is reduced from O(h) to O(h²) when a backward difference is applied to the forward difference. This said, maybe we can improve our approximation if we alternate between forward and backward differences:

template <typename F, typename T, int N>
class nth_derivative
{
    typedef nth_derivative<F, T, N-1> prec_derivative;
  public:
    nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

    T operator()(const T& x) const
    {
        return N & 1 ? ( fp(x+h) - fp(x) ) / h
                     : ( fp(x) - fp(x-h) ) / h;
    }
  private:
    T h;
    prec_derivative fp;
};

Sadly, our 22nd derivative is still as wrong as before — in fact, slightly worse. This is particularly frustrating when we realize that we evaluate f over four million times.²⁴ Decreasing h does not help either: the tangent approximates the derivative better, but on the other hand the values of f(x) and f(x ± h) become so close that their difference retains only a few meaningful bits. At least the second derivative is improved by our alternating difference scheme, as the Taylor series teaches us. Another consoling fact is that we probably did not pay for the alternation: the template argument N is known at compile time, and the condition N & 1 — whether the last bit is set — can also be evaluated during compilation. When N is odd, the operator effectively reduces to:

24 TODO: Is there an efficient and well-approximating recursive scheme to compute higher order derivatives?



T operator()(const T& x) const
{
    return ( fp(x+h) - fp(x) ) / h;
}

Likewise, for even N only the backward difference is computed, without any runtime test.

If nothing else we learned something about C++, and we are confirmed in the

Truism

Not even the coolest programming can substitute for solid mathematics.

In the end, this script is primarily about programming. To improve the expressiveness of our software, functors are an extremely powerful approach. We have seen how to take an arbitrary unary function and construct a unary function that approximates its derivative or a higher-order derivative.

If we do not know the type of a function, or we do not want to bother with it, we can write a convenience function that deduces the type automatically:

template <int N, typename F, typename T>
inline nth_derivative<F, T, N>
make_nth_derivative(const F& f, const T& h)
{
    return nth_derivative<F, T, N>(f, h);
}

Here F and T are the types of the function arguments and can be deduced by the compiler. The only template argument that the compiler cannot deduce is N. Note that such explicitly given arguments must be at the beginning of the template argument list and the compiler-deduced ones at the end. Therefore, the following template function is wrong:

template <typename F, typename T, int N> // error
inline nth_derivative<F, T, N>
make_nth_derivative(const F& f, const T& h)
{
    return nth_derivative<F, T, N>(f, h);
}

If you call this one, the compiler will complain that it cannot deduce N. This leads us to the question of how to call this function. Of course, we can declare all template arguments explicitly:

make_nth_derivative<7, para_sin_plus_cos, double>(sin_1, 0.00001);

But this is exactly what we wanted to avoid by implementing this function. As said, F and T can be deduced by the compiler, and we only need to provide N:

make_nth_derivative<7>(sin_1, 0.00001);

What is this expression good for? Written like this, not much: it creates a functor object that is immediately destroyed. But since it is a functor, we should be able to call it with an argument:

std::cout << "Seventh derivative of sin_1 at x=3 is "
          << make_nth_derivative<7>(sin_1, 0.00001)(3.0) << '\n';

In the cases above, the type of the functor was obvious because we wrote the class ourselves. The type is less obvious if it is constructed from an expression, for instance by a λ-function. Support for λ-functions will be introduced with C++0x.²⁵ An emulation has been available for some years with Boost.Lambda [?]. For instance, we can generate a functor object that computes

p(x) = 3.5x³ + 4x² = (3.5x + 4)x²

with the following short expression:

(3.5 * _1 + 4.0) * _1 * _1;

This expression can be used with our derivative function:

make_nth_derivative<2>((3.5 * _1 + 4.0) * _1 * _1, 0.0001)

to generate a functor computing (approximating) 21x + 8.

With lambda expressions, we do not even know the type of our functor, but we can still compute its derivative. The type is in fact so long²⁶ that it would be much easier to implement our own functor if we were obliged to spell the type out.

The following listing illustrates how to approximate p′′(2):

#include <boost/lambda/lambda.hpp>

// .. our definitions of derivatives

int main()
{
    using boost::lambda::_1;

    std::cout << "Second derivative of 3.5*x^3+4*x^2 at x=2 is "
              << make_nth_derivative<2>((3.5 * _1 + 4.0) * _1 * _1, 0.0001)(2) << '\n';
    return 0;
}

Unfortunately, with the current standard C++ we cannot keep the results of our computations if we do not know their types. In C++0x, we will be able to let the compiler deduce the type:

auto p= (3.5 * _1 + 4.0) * _1 * _1; // With C++0x
auto p2= make_nth_derivative<2>(p, 0.0001);

Once defined, we can reuse p and p2 as often as we want. Of course, calculating the derivatives of polynomials can be done better than with difference quotients. We will discuss this in Section 8.2.

25 TODO: Try in g++ 4.3 and 4.4?<br />

26 boost::lambda::lambda_functor



4.8.3 The function accumulate with a functor argument

TODO: Again, I don't like the use of pointers here — Peter

Recall the function accumulate from Section 4.2.1 that we used to introduce generic programming. In this section, we will generalize this function. We introduce a binary functor (concept BinaryFunctor) that implements an operation on two arguments as a function or callable class object.²⁷ Then we can accumulate values with respect to this binary operation:

template <typename T, typename BinaryFunctor>
T accumulate( T* a, T* a_end, T init, BinaryFunctor op )
{
    T sum( init );
    for ( ; a != a_end; ++a ) {
        sum = op( sum, *a );
    }
    return sum;
}

The concept BinaryFunctor is defined as follows:²⁸

• Let op be of type BinaryFunctor.
  – op provides the call op( first_argument_type, second_argument_type ) with a result type convertible to T. T should be convertible to the first and second argument types.

From this generic example, it is quite clear that the conceptual conditions become complicated when we mix types. Usually, we make sure that first_argument_type, second_argument_type, and the result type are the same, but strictly speaking this is not required, since the compiler is allowed to perform conversions.

The main program could be as follows:

struct sum_functor
{
    double operator()( double a, double b ) const {
        return a + b;
    }
};

struct product_functor
{
    double operator()( double a, double b ) const {
        return a * b;
    }
};

int main()
{
    const int n= 10;
    double a[n];
    // ... fill a with values ...
    double s = accumulate( a, a+n, 0.0, sum_functor() );
    s = accumulate( a, a+n, 1.0, product_functor() );
}

27 TODO: Introduce term.<br />

28 TODO: revisit



4.9 STL — The Mother of All Generic Libraries<br />

The Standard Template Library — STL — is an example of a generic C++ library. It defines generic container classes, generic algorithms, and iterators. Online documentation is provided under www.sgi.com/tech/stl. There are also entire books written about the usage of the STL, so we keep it short here and refer to those books [?].

4.9.1 Introducing Example<br />

Containers are classes whose purpose is to contain other objects. The classes vector and list are examples of STL container classes. Each of these classes is templated and can be instantiated to contain any type of object (that is a model of the appropriate concept). For example, the following lines create a vector containing doubles and another one containing integers:

std::vector<double> vec_d;
std::vector<int>    vec_i;

The STL also includes a large collection of algorithms that manipulate the data stored in containers. The accumulate algorithm, for example, can be used to compute any reduction — such as sum, product, or minimum — on a list or vector in the following way:

std::vector<double> vec; // fill the vector...
std::list<double>   lst; // fill the list...
double vec_sum = std::accumulate( vec.begin(), vec.end(), 0.0 );
double lst_sum = std::accumulate( lst.begin(), lst.end(), 0.0 );

Notice the use of the functions begin() and end(), which denote the beginning and the end of the vector and the list, represented by 'iterators'. Iterators are the central concept of the STL, and we will now have a closer look at them.

4.9.2 Iterators<br />

Disrespectfully spoken, an iterator is a generalized pointer: one can dereference it and change the referred location. This over-simplified view does not do justice to its importance, however. Iterators are a fundamental methodology to decouple the implementation of data structures and algorithms. Figure 4.2²⁹ depicts this central role of iterators. Every data structure provides an iterator for traversing it, and all algorithms are implemented in terms of iterators.

To program m algorithms on n data structures, one needs m · n implementations in classical C and Fortran programming. Expressing algorithms in terms of iterators decreases this to only m + n implementations!

29 TODO: Flatter boxes and more containers and algos, maybe.


[Figure 4.2: Central role of iterators in STL — data structures (vector, set, map, queue, ...) on one side, algorithms (copy, search, replace, sort, ...) on the other, connected through iterators.]

Evidently, not all algorithms can be implemented on every data structure. Which algorithms work on a given data structure depends on the kind of iterator provided by the container. Iterators can be distinguished by the form of access:

InputIterator: an iterator concept for reading the referred entries.

OutputIterator: an iterator concept for writing to the referred entries.

Note that the ability to write does not imply readability; e.g., an ostream_iterator is an STL interface used to write to output streams like files opened in write mode. Another differentiation of iterators is the form of traversal:

ForwardIterator: a concept for iterators that can pass from one element to the next, i.e. types that provide an operator++. It is a refinement of InputIterator and OutputIterator. In contrast to those, a ForwardIterator allows for traversing multiple times.

BidirectionalIterator: a concept for iterators with step-wise forward and backward traversal, i.e. types with operator++ and operator--. It refines ForwardIterator.

RandomAccessIterator: a concept for iterators that can advance their position by an arbitrary integer, i.e. types that also provide operator[]. It refines BidirectionalIterator.

Data structures that provide more refined iterators (e.g. modeling RandomAccessIterator) can be<br />

used in more algorithms. Dually, algorithm implementations that require less refined iterators<br />

(like InputIterator) can be applied to more data structures. The interfaces are designed with<br />

backward compatibility in mind and old-style pointers can be used as iterators.<br />

All standard container templates provide a rich and consistent set of iterator types. The following very simple example shows a typical use of iterators:

std::list<int> l;
for (std::list<int>::const_iterator it = l.begin(); it != l.end(); ++it) {
    std::cout << *it << std::endl;
}




As illustrated above, iterators are usually used in pairs: one is used for the actual iteration while the second marks the end of the collection. The iterators are created by the corresponding container class using standard methods such as begin() and end(). The iterator returned by begin() points to the first element, whereas the iterator returned by end() points past the end of the elements. All algorithms operate on right-open intervals [b, e), processing the value referred to by b until b = e. Therefore, intervals of the form [x, x) are regarded as empty.

A more general (and more useful) algorithm is the linear search on an arbitrary sequence. This is provided by the STL function find, in the following fashion:

template <typename InputIterator, typename T>
InputIterator find(InputIterator first, InputIterator last, const T& value)
{
    while (first != last && *first != value)
        ++first;
    return first;
}

find takes three arguments: two iterators that define the right-open interval of the search space, and a value to search for in that range. Each entry referred to by first is compared with value. When a match is found, the iterator pointing to it is returned. If the value is not contained in the sequence, an iterator equal to last is returned. Thus, the caller can test whether the search was successful by comparing the result with last. In fact, one must perform this test, because after a failed search the returned iterator cannot be dereferenced correctly (it points outside the given range and might cause segmentation violations or corrupt data).

This section only scratched the surface of the STL and was primarily intended to introduce the iterator concept, which we will generalize in the following section.

4.10 Cursors and Property Maps<br />

The essential idea of iterators is to represent a position and a referred value. A further generalization of this idea is to decouple the notions of position and value. Dietmar Kühl proposed this mechanism in his master's thesis (Diplomarbeit) [?] for the generic treatment of graphs. The Boost Graph Library [?] provides the notion of property maps in the form that properties are available for vertices and edges, and all properties can be accessed independently from each other and from the traversal of the graph.

As a case study, we implement a simple sparse matrix class with cursors and property maps. The minimalistic implementation of the sparse matrix is:

#include <iostream>
#include <vector>
#include <algorithm>
#include <cassert>

template <typename Value>
class coo_matrix
{
    typedef Value value_type; // better in trait
  public:
    coo_matrix(int nr, int nc) : nr(nr), nc(nc) {}

    void insert(int r, int c, Value v)
    {
        assert(r < nr && c < nc);
        row_index.push_back(r);
        col_index.push_back(c);
        data.push_back(v);
    }

    void sort() {}

    int nnz() const { return row_index.size(); }
    int num_rows() const { return nr; }
    int num_cols() const { return nc; }

    int begin_row(int r) const
    {
        unsigned i= 0;
        while (i < row_index.size() && row_index[i] < r) ++i;
        return i;
    }

    template <typename Matrix> friend struct coo_col;
    template <typename Matrix> friend struct coo_row;
    template <typename Matrix> friend struct coo_const_value;
    template <typename Matrix> friend struct coo_value;

  private:
    int nr, nc;
    std::vector<int> row_index, col_index;
    std::vector<Value> data;
};

The matrix is supposed to be sorted lexicographically (although we omit the implementation of the sort function for the sake of brevity). For any offset i, the i-th entries of the vectors row_index, col_index, and data represent the row, column, and value of one non-zero entry of the matrix. The traversal over all non-zeros of the matrix can be realized with a cursor that contains just this offset:

struct nz_cursor
{
    typedef int key_type;

    nz_cursor(int offset) : offset(offset) {}

    nz_cursor& operator++() { offset++; return *this; }
    nz_cursor operator++(int) { nz_cursor tmp(*this); offset++; return tmp; }

    key_type operator*() const { return offset; }
    bool operator!=(const nz_cursor& other) { return offset != other.offset; }

  protected:
    int offset;
};



The cursor is initialized with an offset. Many cursor classes keep a reference to the traversed matrix object, but we do not need this here. The cursor can be incremented, compared, and dereferenced. The result of the dereferencing is a 'key'. For simplicity, we use an int as key type.

Like the begin and end functions in the STL, we define the function nz_begin, which returns a cursor on the first non-zero entry, and nz_end, which gives a past-the-end cursor to terminate the traversal:

template <typename Matrix>
nz_cursor nz_begin(const Matrix& A)
{
    return nz_cursor(0);
}

template <typename Matrix>
nz_cursor nz_end(const Matrix& A)
{
    return nz_cursor(A.nnz());
}

A key can be used as an argument for a property map, which we will define now:

template <typename Matrix>
struct coo_col
{
    typedef int key_type;

    coo_col(const Matrix& ref) : ref(ref) {}

    int operator()(key_type k) const { return ref.col_index[k]; }

  private:
    const Matrix& ref;
};

Property maps typically keep a reference to the matrix in order to read internal data from it. They are often declared as friends because they are an important tool to access the object's internal data — it might even be the only way to access the data, as in the Boost Graph Library. The property maps reading the row index or the value for the offset key are analogous and therefore omitted here.

A property map for mutable entries is implemented as follows:

template <typename Matrix>
struct coo_value
{
    typedef int key_type;
    typedef typename Matrix::value_type value_type;

    coo_value(Matrix& ref) : ref(ref) {}

    value_type operator()(key_type k) const { return ref.data[k]; }
    void operator()(key_type k, const value_type& v) { ref.data[k]= v; }

  private:
    Matrix& ref;
};

In contrast to the previous maps, it contains a mutable reference and an additional operator for setting a value.

To test our implementation, we create a matrix A:

coo_matrix<double> A(3, 5);
A.insert(0, 0, 2.3);
A.insert(0, 3, 3.4);
A.insert(1, 2, 4.5);

and define the three property maps:

coo_col< coo_matrix<double> >   col(A);
coo_row< coo_matrix<double> >   row(A);
coo_value< coo_matrix<double> > value(A);

A read-only traversal over all non-zero entries reads:

for (nz_cursor c= nz_begin(A), end= nz_end(A); c != end; ++c)
    std::cout << "A[" << row(*c) << "][" << col(*c) << "] = " << value(*c) << "\n";

Scaling all non-zero elements can be achieved similarly:

for (nz_cursor c= nz_begin(A), end= nz_end(A); c != end; ++c)
    value(*c, 2.0 * value(*c));

Note that we did not use all property maps in the last algorithm. In fact, this is one of the motivations for property maps: only the data really needed in the algorithm must be provided. In today's computer landscape, this can make a significant difference in performance, since reading and writing data is much more time-consuming than most numeric computations — or data might only be available implicitly and need recomputation.

Another advantage of this approach is the easier realization of nested traversals. Say we have an algorithm that iterates over rows and, within each row, over the non-zero entries. In this case, we need other cursor type(s) but can reuse the property maps — provided our new cursors dereference to the same key type. First we need a cursor to iterate over all rows of a matrix:

template <typename Matrix>
struct row_cursor
{
    row_cursor(int r, const Matrix& ref) : r(r), ref(ref) {}

    row_cursor& operator++() { r++; return *this; }
    row_cursor operator++(int) { row_cursor tmp(*this); r++; return tmp; }

    bool operator!=(const row_cursor& other) { return r != other.r; }

    nz_cursor begin() const { return nz_cursor(ref.begin_row(r)); }
    nz_cursor end() const { return nz_cursor(ref.begin_row(r+1)); }

  protected:
    int r;
    const Matrix& ref;
};
};



Its implementation is almost the same as that of nz_cursor, and with some refactoring one could certainly combine them into one implementation that serves both cursors as a base class. For the sake of simplicity, we refrain from this here. The two main differences to nz_cursor are:

• the lack of operator*, because the cursor is not intended to be dereferenced; and

• the functions begin and end, which provide the inner-loop traversal.

The corresponding functions providing a right-open interval of row cursors are straightforward:

template <typename Matrix>
row_cursor<Matrix> row_begin(const Matrix& A)
{
    return row_cursor<Matrix>(0, A);
}

template <typename Matrix>
row_cursor<Matrix> row_end(const Matrix& A)
{
    return row_cursor<Matrix>(A.num_rows(), A);
}

We can now write begin and end functions that take a row cursor (instead of a matrix) as argument and give the right-open interval of the row's non-zeros:

template <typename Matrix>
nz_cursor nz_begin(const row_cursor<Matrix>& c)
{
    return c.begin();
}

template <typename Matrix>
nz_cursor nz_end(const row_cursor<Matrix>& c)
{
    return c.end();
}

For the inner loop we can reuse nz_cursor and only need to determine the right intervals within each row. This is performed with the begin and end functions from row_cursor, which in turn use begin_row from the matrix. That is why row_cursor needs a matrix reference.

A two-dimensional traversal is realized as follows:<br />

<strong>for</strong> (row cursor< coo matrix > c= row begin(A), end= row end(A); c != end; ++c) {<br />

std::cout ≪ ”−−−−−\n”;<br />

<strong>for</strong> (nz cursor ic= nz begin(c), iend= nz end(c); ic != iend; ++ic)<br />

std::cout ≪ ”A[” ≪ row(∗ic) ≪ ”][” ≪ col(∗ic) ≪ ”] = ” ≪ value(∗ic) ≪ ”\n”;<br />

}<br />

std::cout ≪ ”−−−−−\n”;<br />

The outer loop iterates over all rows of the matrix and the inner loop over all non-zeros in this<br />

row.<br />

Résumé: The technique is more complicated and less readable than accessing entries with operator[] and needs some familiarization. However, it allows for

• high code reuse with very diverse data structures;

• while still enabling high performance.

4.11 Exercises<br />

TODO: Move exercises to next chapter<br />

4.11.1 Unroll a loop<br />

Look at the loop from Subsection ??:

int sum = 0;
for (int i = 1; i <= n; ++i)
    sum += i;



Write a C++ function gcd() that computes the greatest common divisor (GCD) of two integers with the Euclidean algorithm:

function gcd(a, b):
    if b = 0 return a
    else return gcd(b, a mod b)

Then write an integral metafunction that executes the same algorithm but at compile time. Your metafunction should be of the following form:

template <int A, int B>
struct gcd_meta {
    static int const value = ... ;
};

i.e. gcd_meta<a, b>::value is the GCD of a and b. Verify whether the results correspond with your C++ function gcd().

4.11.6 Overloading of functions

Overloading of functions is possible for different types, e.g.

void foo( int i ) { ... }
void foo( double d ) { ... }

This is an exercise on another form of overloading: based on a boolean meta-expression. We will use the Boost functions enable_if and disable_if for this exercise.

#include <cmath>
#include <boost/utility/enable_if.hpp>
#include <boost/type_traits/is_integral.hpp>

template <typename T>
typename boost::enable_if< boost::is_integral<T>, T >::type foo( T const& v ) {
    return v;
}

template <typename T>
typename boost::disable_if< boost::is_integral<T>, T >::type foo( T const& v ) {
    return std::floor( v );
}

If we call e.g. foo(5);, the compiler uses the special version for integral types:

template <typename T>
T foo( T const& v ) {
    return v;
}

If we call e.g. foo(5.0);, the compiler uses the special version for types that are not integral:

template <typename T>
T foo( T const& v ) {
    return std::floor( v );
}



Create a meta-function to check whether a type is a pointer. Write a function evaluate that returns the same value as its argument, except when the argument is a pointer, in which case it returns the value pointed to. Hint: look at http://www.boost.org/libs/utility/enable_if.html for enable_if_c.

4.11.7 Meta-list<br />

Revisit exercise ??.<br />

Make a list of types. Make meta-functions insert, append, delete, and size.

4.11.8 Iterator of a vector<br />

Revisit exercise ??. Add methods begin() and end() for returning a begin and end iterator. Add the types iterator and const_iterator to the class. Note that pointers are iterators.

Use the STL functions sort and lower_bound.

4.11.9 Iterator of a list<br />

Revisit exercise ??.<br />

Make a generic list type.<br />

Add methods begin() and end() for returning begin and end const iterators. Add the type const_iterator to the class. Note that here pointers cannot be used as iterators.

4.11.10 Trapezoid rule<br />

A simple method for computing the integral of a function is the trapezoid rule. Suppose we want to integrate the function f over the interval [a, b]. We split the interval into n small intervals [x_i, x_{i+1}] of the same length h = (b − a)/n and approximate f by a piecewise linear function. The integral is then approximated by the sum of the integrals of the piecewise linear function. This gives us the formula:

    I = h/2 · f(a) + h/2 · f(b) + h · Σ_{j=1}^{n−1} f(a + jh)        (4.1)

In this exercise, we develop a function for the trapezoid rule with a functor argument. We develop the software using inheritance and using generic programming. Then we use the function for integrating the following functions:

• f = exp(−3x) for x ∈ [0, 4]. Try the following arguments of trapezoid:

double exp3( double x ) {
    return std::exp( -3.0 * x );
}

struct exp3 {
    double operator()( double x ) const {
        return std::exp( -3.0 * x );
    }
};

• f = sin(x) if x < 1 and f = cos(x) if x ≥ 1, for x ∈ [0, 4].

• Can we use trapezoid( std::sin, 0.0, 2.0 ); ?

As a second exercise, develop a functor for computing the finite difference. Then integrate the finite difference to verify that you get the function value back.

4.11.11 STL and functor

Write a generic function that copies the values of a container to another container after transforming them with a functor:

struct double_functor {
    int operator()( int v ) const {
        return 2 * v;
    }
};

std::vector<int> my_input_vec; // ...
std::vector<int> my_output_vec;

transform( my_input_vec.begin(), my_input_vec.end(), my_output_vec.begin(), double_functor() );

Write code for the function transform and test it.




Chapter 5

Meta-programming

'Meta-programming' was actually discovered by accident. In the early 90s, Erwin Unruh wrote a program that printed prime numbers as error messages. This showed that C++ compilers can compute. Because the language has changed since Unruh wrote the example, here is a version adapted to today's standard C++:

// Prime number computation by Erwin Unruh
template <int i> struct D { D(void*); operator int(); };

template <int p, int i> struct is_prime {
    enum { prim = (p==2) || (p%i) && is_prime<(i>2?p:0), i-1>::prim };
};

template <int i> struct Prime_print {
    Prime_print<i-1> a;
    enum { prim = is_prime<i, i-1>::prim };
    void f() { D<i> d = prim ? 1 : 0; a.f(); }
};

template <> struct is_prime<0,0> { enum {prim=1}; };
template <> struct is_prime<0,1> { enum {prim=1}; };

template <> struct Prime_print<1> {
    enum {prim=0};
    void f() { D<1> d = prim ? 1 : 0; };
};

int main() {
    Prime_print<18> a;
    a.f();
}

When one tries to compile this with g++ 4.1.2, one will observe the following error message: TODO: Need English error message.

TODO: Ask Erwin Unruh if we can use his example.

After people realized the computational power of the C++ compiler, it was used to realize very powerful performance optimization techniques. In fact, one can perform entire applications during compile time. Jeremiah Wilcock once wrote a Lisp interpreter that evaluated Lisp



expressions during a C++ compilation [?]. Todd Veldhuizen showed that the template type system of C++ is Turing complete [?].

On the other hand, excessive usage of meta-programming techniques can end in quite long compile times. Entire research projects were cancelled, after many millions of dollars of funding, because even short applications of less than 20 lines took weeks of compile time on parallel computers. We know people who managed to produce an 18 MB error message (it came mainly from one single error). Nevertheless, the authors used a fair amount of meta-programming in their scientific projects and could still avoid excessive compile times.¹ Compilers have also improved significantly in the last decade: whereas the compile time grew quadratically with the template instantiation depth in old compilers, today it grows only linearly [?].

5.1 Let the Compiler Compute<br />

Typical introductory examples for meta-programming are factorials and Fibonacci numbers. Fibonacci numbers are computed recursively:

template <long N>
struct fibonacci
{
    static const long value= fibonacci<N-1>::value + fibonacci<N-2>::value;
};

template <>
struct fibonacci<1>
{
    static const long value= 1;
};

template <>
struct fibonacci<2>
{
    static const long value= 1;
};

Note that we need the specializations for 1 and 2 to terminate the recursion. The following definition:

template <long N>
struct fibonacci
{
    static const long value= N < 3 ? 1 : fibonacci<N-1>::value + fibonacci<N-2>::value; // error
};

ends in an infinite compile loop. For N = 2, the compiler would evaluate the expression:<br />

template <>
struct fibonacci<2>
{
    static const long value= 2 < 3 ? 1 : fibonacci<1>::value + fibonacci<0>::value; // error
};

1 TODO: Or René?



This requires the evaluation of fibonacci<0>::value as

template <>
struct fibonacci<0>
{
    static const long value= 0 < 3 ? 1 : fibonacci<-1>::value + fibonacci<-2>::value; // error
};

which in turn needs fibonacci<-1>::value . . . . Although the values for N < 3 are not used in the end, the compiler will nevertheless generate these terms infinitely and die at some point.

We said before that we implement the computation recursively. In fact, all repetitive computations must be realized recursively, as there is no iteration for meta-functions. 2

If we write for instance

std::cout << fibonacci<45>::value << "\n";

the value is already calculated during the compilation and the program just prints it. If you do not believe us, you can read the assembler code (e.g., compile with ‘g++ -S fibonacci.cpp -o fibonacci.asm’).

We mentioned long compilations with meta-programming at the beginning of the chapter. The compilation for Fibonacci number 45 took less than a second. In comparison, a naïve run-time implementation:

long fibonacci2(long x)
{
    return x < 3 ? 1 : fibonacci2(x-1) + fibonacci2(x-2);
}

took 14s on the same computer. The reason is that the compiler remembers intermediate results while the run-time version recomputes everything. We are, however, convinced that every reader of this book can rewrite fibonacci2 without the exponential overhead of recomputations.

5.2 Providing Type In<strong>for</strong>mation<br />

5.2.1 Type Traits<br />

When we write template functions, we can easily define temporary values because they usually have the same type as one of the template arguments. But not always. Imagine you have a function that returns, of two values, the one with the minimal magnitude:

template <typename T>
T inline min_magnitude(const T& x, const T& y)
{
    using std::abs;
    T ax= abs(x), ay= abs(y);
    return ax < ay ? x : y;
}

We can call this function for int, unsigned, or double values:

2 The Meta Programming Library provides compile-time iterators but even those are recursive internally.



double d1= 3., d2= 4.;
std::cout << "min_magnitude(d1, d2) = " << min_magnitude(d1, d2) << '\n';

If we call this function with two complex values:

std::complex<double> c1(3.), c2(4.);
std::cout << "min_magnitude(c1, c2) = " << min_magnitude(c1, c2) << '\n';

we will see an error message like:

no match for »operator<« in »ax < ay«

The problem is that abs returns double values in this case, which provide the comparison operator, but we store them in temporaries of type complex<double>.

The careful reader might wonder why we store them at all: if we compared the magnitudes directly, we would save memory and could compare them as they are. This is absolutely true, and it is how we would normally implement the function. However, there are situations where one needs a temporary, e.g., when computing the value with the minimal magnitude in a vector. For the sake of simplicity we just look at two values. With the new standard we can also handle the issue easily with auto types:

template <typename T>
T inline min_magnitude(const T& x, const T& y)
{
    using std::abs;
    auto ax= abs(x), ay= abs(y);
    return ax < ay ? x : y;
}

To make a long story short, sometimes we need to know explicitly the result type of an expression, or type information in general. Just think of a member variable of a template class: we must know the type of the member in the definition of the class.

This leads us to ‘type traits’. Type traits are meta-functions that provide information about a type.

In the example here, we search for an appropriate type for the magnitude of a given type. We can provide such type information by template specialization:

template <typename T>
struct Magnitude {};

template <>
struct Magnitude<int>
{
    typedef int type;
};

template <>
struct Magnitude<float>
{
    typedef float type;
};

template <>
struct Magnitude<double>
{
    typedef double type;
};

template <>
struct Magnitude<std::complex<float> >
{
    typedef float type;
};

template <>
struct Magnitude<std::complex<double> >
{
    typedef double type;
};

Admittedly, this is rather cumbersome.<br />

We can abbreviate the first definitions by postulating: “if we do not know better, we assume that T’s Magnitude type is T itself.”

template <typename T>
struct Magnitude
{
    typedef T type;
};

This is true for all intrinsic types, and we handle them all correctly with one definition. A slight disadvantage of this definition is that it incorrectly applies to all types whose type trait is not specialized. A set of classes where we know that the above definition is not correct are all instantiations of the template class complex. So we define specializations like:

template <>
struct Magnitude<std::complex<double> >
{
    typedef double type;
};

Instead of defining them individually for complex<float>, complex<double>, . . . we use a templated form to treat them all:

template <typename T>
struct Magnitude<std::complex<T> >
{
    typedef T type;
};

Now that the type trait is defined, we can refactor our function to use it:

template <typename T>
T inline min_magnitude(const T& x, const T& y)
{
    using std::abs;
    typename Magnitude<T>::type ax= abs(x), ay= abs(y);
    return ax < ay ? x : y;
}



We can now consider extending this definition to vectors and matrices, e.g., to determine the return type of a norm. The specialization reads:

template <typename T>
struct Magnitude<vector<T> >
{
    typedef T type; // not really perfect
};

However, if the value type of the vector is complex, its norm will not be. Instead, we need the magnitude type of the values:

template <typename T>
struct Magnitude<vector<T> >
{
    typedef typename Magnitude<T>::type type;
};

5.2.2 A const-clean View Example<br />

In this section, we look at an efficient and expressive implementation of a transposed matrix. If you compute the transpose of a matrix, many software packages return a new matrix object with the interchanged values. This is a quite expensive operation: it requires memory allocation and deallocation, and often copying a lot of data.

Writing a Simple View Class<br />

A much more efficient approach is implementing a ‘view’ of the existing object. We refer internally to the viewed object and just adapt its interface. This can be done very nicely for the transpose of a matrix:

1  template <typename Matrix>
2  class transposed_view
3  {
4    public:
5      typedef typename mtl::Collection<Matrix>::value_type value_type;
6      typedef typename mtl::Collection<Matrix>::size_type  size_type;
7
8      transposed_view(Matrix& A) : ref(A) {}
9
10     value_type& operator()(size_type r, size_type c) { return ref(c, r); }
11     const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }
12
13   private:
14     Matrix& ref;
15 };

Listing 5.1: Simple view implementation

We assume that the matrix class has an operator() taking two arguments for the row and the column index, respectively. We further suppose that type traits are defined for value_type and size_type. This is all we need to know about the referred matrix, 3 at least in this mini example.

3 TODO: We should define a concept <strong>for</strong> it.



The reader will imagine that implementations in libraries like MTL or GLAS provide a larger interface in such classes. Still, this short example is expressive enough to demonstrate the approach, and large enough to demonstrate the need for meta-programming in certain views.

An object of this class can be handled like a matrix, so that a template function can use it as an argument wherever a matrix is expected. The transposition is achieved by calling operator() of the referred object with switched indices. For every matrix object we can define a transposed view that behaves like a matrix:

mtl::dense2D<float> A(3, 3);
A= 2, 3, 4,
   5, 6, 7,
   8, 9, 10;
tst::transposed_view<mtl::dense2D<float> > At(A);

When we access At(i, j) we get A(j, i). We even define non-const access so that we can change entries:

At(2, 0)= 4.5;

This operation sets A(0, 2) to 4.5.

The definition of a transposed view object does not lead to particularly concise programs. For convenience we define a function that returns the transposed view:

template <typename Matrix>
transposed_view<Matrix> inline trans(Matrix& A)
{
    return transposed_view<Matrix>(A);
}

Now we can use the transpose elegantly in our scientific software, for instance in a matrix vector product:

v= trans(A) * q;

In this case, a temporary view is created and used in the product. Since operator() of the view is inlined, the transposed product will be as fast as with A itself.

Dealing with Const-ness<br />

So far, so good. Problems arise if we build the transposed view of a constant matrix:

const mtl::dense2D<float> B(A);

We can still create the transposed view of B, but we cannot access its elements:

std::cout << "tst::trans(B)(2, 0) = " << tst::trans(B)(2, 0) << '\n'; // error

The compiler tells us that it cannot initialize a ‘float&’ from a ‘const float’. If we look at the location of the error, we realize that it is line 10 in Listing 5.1. But why did the compiler use the non-constant version of the operator? In line 11 we defined an operator for constant objects which returns a constant reference and fits perfectly for this situation.



First of all, is the ref member really constant? We never used const in the class definition or the function trans. Help is provided by ‘Run-Time Type Identification (RTTI)’. We add the header ‘typeinfo’ and print the type information:

#include <typeinfo>
...
std::cout << "typeid of trans(A) = " << typeid(tst::trans(A)).name() << '\n';
std::cout << "typeid of trans(B) = " << typeid(tst::trans(B)).name() << '\n';

This produces the following output: 4

typeid of trans(A) = N3tst15transposed_viewIN3mtl6matrix7dense2DIfNS2_10
parametersINS1_3tag9row_majorENS1_5index7c_indexENS1_9non_fixed10
dimensionsELb0EEEEEEE
typeid of trans(B) = N3tst15transposed_viewIKN3mtl6matrix7dense2DIfNS2_10
parametersINS1_3tag9row_majorENS1_5index7c_indexENS1_9non_fixed10
dimensionsELb0EEEEEEE

The output is apparently not very clear. However, if we look very carefully, we see the extra ‘K’ in the second line that tells us that the view is instantiated with a constant matrix type. Another disadvantage of RTTI is that we only see the const attribute of template parameters. That is, printing the type information of trans(B).ref would not tell us whether or not this type is constant.

An alternative that solves both problems is inspecting the type by provoking an error message. We can for instance write:

int ta= trans(A);
int tb= trans(B);

Then the compiler gives us messages like:

trans_const.cpp:120: Error: »tst::transposed_view<mtl::matrix::dense2D<float> >« cannot be converted to »int«
in initialization
trans_const.cpp:121: Error: »const tst::transposed_view<const mtl::matrix::dense2D<float> >« cannot be
converted to »int« in initialization

Here the types are much more readable. 5 We can see clearly that trans(B) returns a view with a constant template parameter. The same trick can be applied to the reference in the view:

int tar= trans(A).ref;
int tbr= trans(B).ref;

The error message would read accordingly:

trans_const.cpp:121: Error: »const mtl::matrix::dense2D<float>« cannot be converted to »int«
in initialization

4 With g++; on other compilers it might be different, but the essential information will be the same. The lines are broken manually.
5 TODO: Why the hell is this const outside in line 121???

Obviously, with this trick we will not get an executable binary. But we learn more about the types in our program and can solve our problems better. In the rare case that the type you examine is convertible to int, you can take any other type, like std::set, to which the examined class is not convertible. To exclude convertibility entirely, you can introduce a new type.

After this short excursion into type introspection, we know for certain that the member ref is a constant reference. The following happens:

• When we call trans(B), the function’s template argument is instantiated with const dense2D<float>.
• Thus, the return type is transposed_view<const dense2D<float> >.
• The constructor argument has type const dense2D<float>&.
• Likewise, the member ref has type const dense2D<float>&.

The question remains why the non-const version of the operator (line 10) is called although we refer to a constant matrix. The answer is that the constancy of ref does not matter for the choice; what matters is whether or not the view object is constant. Thus, we can write:

const tst::transposed_view<const mtl::dense2D<float> > Bt(B);
std::cout << "Bt(2, 0) = " << Bt(2, 0) << '\n';

This works but it is not very elegant.<br />

A brute-force possibility to get the view compiled for constant matrices is to cast away the constancy. The undesired result would be that mutable views on constant matrices enable the modification of the allegedly constant matrix. This violates our principles so heavily that we do not even show how the code would read.

Rule

Never cast away const.

In the following we will empower you with very strong methodologies for handling constancy correctly. Every const_cast is an indicator of a severe design error. As Sutter and Alexandrescu phrased it: “If you go const you never go back.” The only situation where a const_cast is needed is when using const-incorrect third-party software, i.e., when read-only arguments are passed as mutable pointers or references. That is not our fault and we have no choice. Unfortunately, there are still a lot of const-incorrect packages around, and some of them would take too many resources to reimplement, so we have to live with them. The best we can do is to add an appropriate API on top and avoid working with the original API. This saves us from spoiling our applications with const_casts and restricts the unspeakable const_cast to the interface. A good example of such a layer is ‘Boost::Bindings’ [?], which provides a const-correct high-quality interface to BLAS, LAPACK, and other libraries with similarly old-fashioned 6 interfaces. Conversely, as long as we only use our own functions and classes, we can avoid every const_cast. 7

We could implement a second view class for constant matrices and overload the trans function to return this view:

template <typename Matrix>
class const_transposed_view
{
  public:
    typedef typename mtl::Collection<Matrix>::value_type value_type;
    typedef typename mtl::Collection<Matrix>::size_type  size_type;

    const_transposed_view(const Matrix& A) : ref(A) {}

    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }

  private:
    const Matrix& ref;
};

template <typename Matrix>
const_transposed_view<Matrix> inline trans(const Matrix& A)
{
    return const_transposed_view<Matrix>(A);
}

This works fine, and the user can use the trans function for both constant and mutable matrices. However, a completely new class definition is a fair amount of work given that just one piece of the class definition needs to be altered. For this purpose we introduce two meta-functions.

Check for Constancy

Our problem with the view in Listing 5.1 is that it cannot handle constant types as template argument. To modify the behavior for constant arguments, we first need to find out whether an argument is constant. The meta-function that provides this information is very simple to implement by partial template specialization:

template <typename T>
struct is_const
{
    static const bool value= false;
};

template <typename T>
struct is_const<const T>
{
    static const bool value= true;
};

6 To phrase it diplomatically.
7 We disagree with Sutter and Alexandrescu on the other exception for using const_cast [SA05, page 179]; this can be handled easily with an extra function.



Constant types match both definitions, but the second one is more specific and is therefore picked by the compiler. Non-constant types match only the first one. Note that the constancy of template parameters is not considered, e.g., transposed_view<const matrix> is not regarded as constant.

Compile-time Branching

The other tool we need for our view is a type selection depending on a logical condition. This technique was introduced by Krzysztof Czarnecki 9 and Ulrich W. Eisenecker [CE00]. It can be achieved by a rather simple implementation:

1  template <bool Condition, typename ThenType, typename ElseType>
2  struct if_c
3  {
4      typedef ThenType type;
5  };
6
7  template <typename ThenType, typename ElseType>
8  struct if_c<false, ThenType, ElseType>
9  {
10     typedef ElseType type;
11 };

Listing 5.2: Compile-time if

When this template is instantiated with a logical expression and two types, only the general definition in line 1 matches when the first argument evaluates to true, and the ‘ThenType’ is used in the type definition. If the first argument evaluates to false, then the specialization in line 7 is more specific, so the ‘ElseType’ is used. Like many ingenious inventions, it is very simple once it is found.

This allows us to define funny things, like using double for temporaries when our maximal iteration number is larger than 100 and otherwise float:

typedef tst::if_c<(max_iter > 100), double, float>::type tmp_type;
std::cout << "typeid = " << typeid(tmp_type).name() << '\n';

Needless to say, ‘max_iter’ must be known at compile time. Admittedly, the example does not look extremely useful, and the meta-if is not so important in small isolated code snippets. On the other hand, for the development of large generic software packages it becomes extremely important.

A convenience meta-function, as defined in the Meta-Programming Library [GA04], is ‘if_’:

template <typename Condition, typename ThenType, typename ElseType>
struct if_
    : if_c<Condition::value, ThenType, ElseType>
{};

It expects as first argument a type with a static constant member named value that is convertible to bool. In other words, it selects the type based on the value of the condition (and saves typing 8 characters).

9 At that time he was a doctoral student at TU Ilmenau.



The Solution

Now we have all we need to revise the view from Listing 5.1. The problem was that we returned an entry of a constant matrix as a mutable reference. To avoid this, we could try to make the mutable access operator disappear in the view when the referred matrix is constant. This is possible but too complicated for the moment. We will come back to this in Section 5.2.4.

An easier solution is to keep both the mutable and the constant access operator but choose the return type of the former depending on the type of the template argument:
return type of the <strong>for</strong>mer depending on the type of the template argument:<br />

1  template <typename Matrix>
2  class transposed_view
3  {
4    public:
5      typedef typename mtl::Collection<Matrix>::value_type value_type;
6      typedef typename mtl::Collection<Matrix>::size_type  size_type;
7    private:
8      typedef typename if_c<is_const<Matrix>::value,
9                            const value_type&,
10                           value_type&
11                          >::type vref_type;
12   public:
13     transposed_view(Matrix& A) : ref(A) {}
14
15     vref_type operator()(size_type r, size_type c) { return ref(c, r); }
16     const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }
17
18   private:
19     Matrix& ref;
20 };

Listing 5.3: Const-safe view implementation

This implementation returns a constant reference from line 15 when the referred matrix is constant and a mutable reference when it is mutable. Let us check that this is what we need. For mutable matrix types, the return type of operator() depends on the constancy of the view object:

• If the view object is mutable, then operator() from line 15 is used and returns a mutable reference (line 10); and
• If the view object is constant, then operator() from line 16 is used and returns a constant reference.

This is the same behavior as in Listing 5.1.

If the matrix type is constant, then a constant reference is always returned:

• If the view object is mutable, then operator() from line 15 is used and returns a constant reference (line 9); and
• If the view object is constant, then operator() from line 16 is used and returns a constant reference.

Altogether, we implemented a view that provides read and write access wherever appropriate and disables write access where it is inappropriate.
and disables it where inappropriate.



5.2.3 More Useful Meta-functions

The Boost Type Traits library [?] provides a large spectrum of meta-functions to test or manipulate attributes of types. Some of them, like the previously introduced is_const, are rather easy to implement; others, like has_trivial_constructor or is_base, require deep insight into C++ subtleties and often into compiler internals as well. Unless one uses only very simple type traits and wants to avoid absolutely any dependency on an external library, it is advisable to favor the extensively tested implementations from the Type Traits library over rewriting them.

With the boost::is_xyz meta-functions we can implement special behavior for certain sets of types. One can easily add tests for domain-specific type sets:

template <typename T>
struct is_matrix
    : boost::mpl::false_
{};

template <typename Value, typename Parameters>
struct is_matrix<mtl::dense2D<Value, Parameters> >
    : boost::mpl::true_
{};

// more matrix classes ...

template <typename Matrix>
struct is_matrix<transposed_view<Matrix> >
    : is_matrix<Matrix>
{};

// more views ...

Our program snippet is in line with the implementations in Boost. Instead of defining a static constant as in Section 5.2.2, we derive the meta-function from boost::mpl::false_ and boost::mpl::true_, where the static constants are defined together with some additional typedefs. This is not only shorter but also requires a bit less compile time, see [?]. 10

The code is quite self-explanatory. Types we do not know are not considered matrices. Then we specialize for known matrix classes. For views we can further refer to the matrix-ness of the template argument.

Alternatively, we can state in the type trait that every transposed_view is a matrix and instead require for template arguments of transposed_view that they are matrices:

#include <boost/static_assert.hpp>

template <typename Matrix>
class transposed_view
{
    BOOST_STATIC_ASSERT((is_matrix<Matrix>::value)); // Make sure that the argument is a matrix type
    // ...
};

This additional assertion guarantees that the view class can only be instantiated with known matrix types. For other argument types, the compilation will terminate in this line. Unfortunately, the error message is not very informative, not to say confusing:

10 TODO: page



trans_const.cpp:96: Error: Invalid application of »sizeof« on incomplete type
»boost::STATIC_ASSERTION_FAILURE<false>«

If you see an error message with “STATIC ASSERTION” in it, do not think about the message itself (it is meaningless) but look at the source code line that caused this error and hope that the author of the assertion provided more information in a comment.

When we try to compile our test with the assertion, we see that trans(A) compiles but trans(B) does not. The reason is that ‘const dense2D<float>’ is considered different from ‘dense2D<float>’ in template specialization, so it is still considered a non-matrix. The good news is that we do not need to double our specializations for mutable and constant types; we can write a partial specialization for all constant arguments:

template <typename T>
struct is_matrix<const T>
    : is_matrix<T> {};

Note that BOOST_STATIC_ASSERT is a macro and does not understand C++. This manifests in particular if the argument contains one or more commas. Then the preprocessor will interpret this as multiple arguments for the macro and get confused. This confusion can be avoided by enclosing the argument of BOOST_STATIC_ASSERT in an extra pair of parentheses, as we did in the example (although it was not necessary here). Despite the double parentheses and the rather arbitrary error message, static assertions are very useful for increasing reliability. The next C++ standard will provide static assertions in the language, like:

template <typename Matrix>
class transposed_view
{
    static_assert(is_matrix<Matrix>::value, "transposed_view requires a matrix as argument");
    // ...
};

As the reader can see, the integration into the language overcomes the before-mentioned deficiencies of the macro implementation.

Also useful are meta-functions that remove something from a type if it exists; e.g., remove_const transforms const T into T while non-constant types remain unchanged. Note that this only removes the constancy of the entire type, not that of template arguments; e.g., in vector<const T> the constancy of the arguments is not removed.

Dually, meta-functions can add something to a type:

typedef typename boost::add_reference<T>::type ref_type;

It would be shorter to just add an &, but this is easily overlooked in longer type definitions. More importantly, if some trait already returns a reference, then it is an error to add another one. The meta-function adds the reference only to types that are not yet references. For adding const to a type we find it more concise without a meta-function:

typedef typename some_trait<T>::type const const_type;

If the type trait already returns a constant type, the second const is simply ignored.

The widest functionality in the area of meta-programming is provided by the Boost Meta-Programming Library (MPL) [GA04]. The library implements most of the STL algorithms (§ 4.9) and also provides similar data types, e.g., vector or map. Another interesting library is Boost Fusion [?], which helps mixing execution at compile time and run time. Both libraries are well documented and therefore not further discussed here.

5.2.4 Enable-If

A very powerful mechanism for meta-programming is “enable-if”, discovered by Jaakko Järvi and Jeremiah Wilcock. It is based on the paradigm SFINAE: Substitution Failure Is Not An Error. Imagine a function call with a given argument type, say dense_vector<float>. One of the overloads has a return type that is determined by a meta-function of the function argument type. The compiler will then substitute the meta-function argument with dense_vector<float> to find out the return type. If this meta-function is not defined for dense_vector<float>, then this template function (overload) has no return type. Instead of generating an error message, the C++ compiler diligently ignores this overload. Of course, an error might still occur later if all overloads are ignored for the given type, or if the compiler cannot determine the most specific overload among those that are not ignored.

This compiler behavior can be exploited to select an implementation based on meta-functions. As an example, think of the L1 norm. It is defined for vector spaces and linear operators. Although these definitions are related, the practical real-world implementation for finite-dimensional vectors and matrices is different. Of course, we could implement the L1 norm for every matrix and vector type so that the call one_norm(x) would select the appropriate implementation for this type.

More productively, we like have one single implementation <strong>for</strong> all matrix types (including views)<br />

and one single implementation <strong>for</strong> all vector types. We use meta-function is matrix and implement<br />

accordingly is vector:<br />

template <typename T><br />
struct is_vector<br />
  : boost::mpl::false_<br />
{};<br />
<br />
template <typename Value><br />
struct is_vector<dense_vector<Value> ><br />
  : boost::mpl::true_<br />
{};<br />
<br />
// ... more vector types<br />

We also need the meta-function Magnitude to handle the magnitude of complex matrices and<br />

vectors.<br />
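The Magnitude meta-function itself is not shown here; a minimal sketch consistent with its use below could look as follows (the scalar and complex specializations are assumptions; the matrix and vector specializations the text needs are omitted):

```cpp
#include <complex>

// Minimal sketch of the Magnitude type trait mentioned in the text.
// For ordinary scalar types the magnitude type is the type itself ...
template <typename T>
struct Magnitude
{
    typedef T type;
};

// ... whereas for std::complex<T> it is the underlying real type,
// because abs(std::complex<T>) returns T.
template <typename T>
struct Magnitude<std::complex<T> >
{
    typedef typename Magnitude<T>::type type;
};
```

With this, `Magnitude<double>::type` is `double` and `Magnitude<std::complex<float> >::type` is `float`, which is exactly what the return type of one_norm needs.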

The implementation of enable if is very simple. It defines a type if the condition holds and none<br />

if the condition does not. The version in Boost adds a second level to access the static value<br />

member in types:<br />

template <bool Cond, typename T= void><br />
struct enable_if_c {<br />
    typedef T type;<br />
};<br />
<br />
template <typename T><br />
struct enable_if_c<false, T> {};<br />


148 CHAPTER 5. META-PROGRAMMING<br />

template <typename Cond, typename T= void><br />
struct enable_if<br />
  : public enable_if_c<Cond::value, T><br />
{};<br />

The real enabling behavior is realized in enable_if_c, whereas enable_if is merely a convenience<br />
wrapper to avoid typing ‘::value’.<br />
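As a stand-alone illustration of the mechanism, with simplified stand-ins for the Boost components and hypothetical tag types my_matrix and my_vector (none of these names come from the text):

```cpp
// Hand-rolled enable_if_c as in the text (stand-in for Boost's).
template <bool Cond, typename T= void>
struct enable_if_c { typedef T type; };

template <typename T>
struct enable_if_c<false, T> {};

struct true_type_  { static const bool value= true; };
struct false_type_ { static const bool value= false; };

// Toy types and tag meta-functions: only my_matrix counts as a matrix,
// only my_vector as a vector.
struct my_matrix {};
struct my_vector {};

template <typename T> struct is_matrix : false_type_ {};
template <> struct is_matrix<my_matrix> : true_type_ {};

template <typename T> struct is_vector : false_type_ {};
template <> struct is_vector<my_vector> : true_type_ {};

// Two overloads of the same name; SFINAE silently discards the one
// whose condition fails, so exactly one overload remains per type.
template <typename T>
typename enable_if_c<is_matrix<T>::value, int>::type
kind(const T&) { return 1; }   // selected for matrices

template <typename T>
typename enable_if_c<is_vector<T>::value, int>::type
kind(const T&) { return 2; }   // selected for vectors
```

Calling kind with, say, an int fails to compile because both overloads are discarded, exactly as described for one_norm below.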

Now we have all we need to implement the L1 norm in the generic fashion we aimed <strong>for</strong>:<br />

1 template <typename T><br />
2 typename boost::enable_if<is_matrix<T>, typename Magnitude<T>::type>::type<br />
3 inline one_norm(const T& A)<br />
4 {<br />
5     using std::abs;<br />
6     typedef typename Magnitude<T>::type mag_type;<br />
7     mag_type max(0);<br />
8     for (unsigned c= 0; c < num_cols(A); c++) {<br />
9         mag_type sum(0);<br />
10         for (unsigned r= 0; r < num_rows(A); r++)<br />
11             sum+= abs(A[r][c]);<br />
12         max= max < sum ? sum : max;<br />
13     }<br />
14     return max;<br />
15 }<br />
16<br />
17 template <typename T><br />
18 typename boost::enable_if<is_vector<T>, typename Magnitude<T>::type>::type<br />
19 inline one_norm(const T& v)<br />
20 {<br />
21     using std::abs;<br />
22     typedef typename Magnitude<T>::type mag_type;<br />
23     mag_type sum(0);<br />
24     for (unsigned r= 0; r < size(v); r++)<br />
25         sum+= abs(v[r]);<br />
26     return sum;<br />
27 }<br />

The selection is now driven by enable_if in lines 2 and 18. Let us look at line 2 in detail for a<br />
matrix argument:<br />
<br />
1. is_matrix<T> evaluates to (i.e. inherits from) true_;<br />
<br />
2. enable_if passes true_::value, i.e. true, to enable_if_c;<br />
<br />
3. enable_if_c<true, typename Magnitude<T>::type>::type is set to typename Magnitude<T>::type;<br />
<br />
4. This is the return type of the function overload.<br />
<br />
What happens in this line when the argument is not a matrix type?<br />
<br />
1. is_matrix<T> evaluates to (i.e. inherits from) false_;<br />
<br />
2. enable_if passes false_::value, i.e. false, to enable_if_c;<br />
<br />
3. enable_if_c<false, typename Magnitude<T>::type>::type is not defined in this case;<br />



4. The function overload has no return type;<br />
<br />
5. and is therefore ignored.<br />

In short, the overload is only enabled if the argument is a matrix, as the names of the<br />
meta-functions say. Likewise, the second overload is only available for vectors. A short test<br />
demonstrates this:<br />

mtl::dense2D<double> A(3, 3);<br />
A= 2, 3, 4,<br />
   5, 6, 7,<br />
   8, 9, 10;<br />
<br />
mtl::dense_vector<double> v(3);<br />
v= 3, 4, 5;<br />
<br />
std::cout << "one_norm(A) is " << tst::one_norm(A) << "\n";<br />
std::cout << "one_norm(v) is " << tst::one_norm(v) << "\n";<br />

For types that are neither matrix nor vector, it will look as if there were no function one_norm at all.<br />
Types that are considered both matrix and vector would cause an ambiguity.<br />

Drawbacks: The mechanism of enable_if is very powerful but not particularly pleasant to<br />
debug. Error messages caused by enable_if are usually rather long but not very meaningful. If<br />
a function match is missing for a given argument type, it is hard to determine why, because no<br />
helpful information is provided: the programmer is only told that no match was found, period.<br />
In addition, the enabling mechanism cannot select the most specific condition. For instance,<br />
we cannot specialize the implementation for, say, is_sparse_matrix. This can only be achieved by avoiding<br />
ambiguities in the conditions:<br />

template <typename T><br />
typename boost::enable_if_c<is_matrix<T>::value && !is_sparse_matrix<T>::value,<br />
                            typename Magnitude<T>::type>::type<br />
inline one_norm(const T& A);<br />
<br />
template <typename T><br />
typename boost::enable_if<is_sparse_matrix<T>, typename Magnitude<T>::type>::type<br />
inline one_norm(const T& A);<br />

Evidently, this becomes quite confusing when too many hierarchical conditions are involved.<br />
<br />
The SFINAE paradigm only applies to template arguments of the function itself. Therefore,<br />
member functions cannot be enabled depending on the class’s template argument. For instance,<br />
the mutable access operator in line 9 of Listing 5.1 cannot be hidden with enable_if for views on<br />
constant matrices, because the operator itself is not a template function. There are possibilities<br />
to introduce a template argument artificially for a member function in order to use enable_if, but this<br />
really does not contribute to the clarity of the program.<br />
<br />
Concepts can handle hierarchies of conditions and non-template member functions, and they also provide<br />
more helpful error messages. Unfortunately, they will not be available in C++0x, and it is not<br />
yet clear when they will be usable for mainstream programming.



5.3 Expression Templates<br />

Scientific software usually has strong performance requirements, especially the problems<br />
we tackle with C++. Many large-scale simulations of physical, chemical, or biological processes<br />
run for weeks or months, and everybody is glad if at least part of this very long execution<br />
time can be saved. Such savings often come at the price of less readable and maintainable program<br />
sources. In Section 5.3.1 we show a simple implementation of an operator and discuss why<br />
it is not efficient, and in the remainder of Section 5.3 we demonstrate how to improve<br />
the performance without sacrificing the natural notation.<br />

5.3.1 Simple Operator Implementation<br />

Assume we have an application with vector addition. For instance, we want to write an expression<br />
of the following form for vectors w, x, y, and z:<br />

w = x + y + z;<br />

Say, we have a vector class as in Section 4.3:<br />

template <typename T><br />
class vector<br />
{<br />
  public:<br />
    explicit vector(int size) : my_size(size), data(new T[my_size]) {}<br />
    vector() : my_size(0), data(0) {}<br />
<br />
    friend int size(const vector& x) { return x.my_size; }<br />
<br />
    const T& operator[](int i) const { check_index(i); return data[i]; }<br />
    T& operator[](int i) { check_index(i); return data[i]; }<br />
    // ...<br />
};<br />

We can of course provide an operator <strong>for</strong> adding such vectors:<br />

template <typename T><br />
vector<T> inline operator+(const vector<T>& x, const vector<T>& y)<br />
{<br />
    x.check_size(size(y));<br />
    vector<T> sum(size(x));<br />
    for (int i= 0; i < size(x); ++i)<br />
        sum[i] = x[i] + y[i];<br />
    return sum;<br />
}<br />

A short test program checks that everything works:<br />

int main()<br />
{<br />
    vector<double> x(4), y(4), z(4), w(4);<br />
    x[0]= x[1]= 1.0; x[2]= 2.0; x[3]= -3.0;<br />
    y[0]= y[1]= 1.7; y[2]= 4.0; y[3]= -6.0;<br />
    z[0]= z[1]= 4.1; z[2]= 2.6; z[3]= 11.0;<br />
<br />
    std::cout << "x = " << x << std::endl;<br />
    std::cout << "y = " << y << std::endl;<br />
    std::cout << "z = " << z << std::endl;<br />
<br />
    w= x + y + z;<br />
    std::cout << "w= x + y + z = " << w << std::endl;<br />
    return 0;<br />
}<br />
If this works properly, what is wrong with it? From the software engineering perspective:<br />
nothing. From the performance perspective: a lot.<br />
<br />
How is the statement executed?<br />

1. Create a temporary variable sum for the addition of x and y;<br />
<br />
2. Perform a loop reading x and y, adding them element-wise, and writing the result to sum;<br />
<br />
3. Copy sum to a temporary variable, say t_xy, in the return statement;<br />
<br />
4. Delete sum;<br />
<br />
5. Create a temporary variable sum for the addition of t_xy and z;<br />
<br />
6. Perform a loop reading t_xy and z, adding them element-wise, and writing the result to sum;<br />
<br />
7. Copy sum to a temporary variable, say t_xyz, in the return statement;<br />
<br />
8. Delete sum;<br />
<br />
9. Delete t_xy;<br />
<br />
10. Perform a loop reading t_xyz and writing to w;<br />
<br />
11. Delete t_xyz.<br />

This is admittedly the worst-case scenario, but it was the code that old compilers generated.<br />
Modern compilers perform more optimizations by static code analysis and can avoid copying<br />
the return value into the temporaries t_xy and t_xyz. Instead of being created,<br />
t_xy and t_xyz become aliases for the respective sum temporaries.<br />

The optimized version per<strong>for</strong>ms:<br />

1. Create a temporary variable sum (for distinction: sum_xy) for the addition of x and y;<br />
<br />
2. Perform a loop reading x and y, adding them element-wise, and writing the result to sum_xy;<br />
<br />
3. Create a temporary variable sum (for distinction: sum_xyz) for the addition of sum_xy and z;<br />
<br />
4. Perform a loop reading sum_xy and z, adding them, and writing the result to sum_xyz;<br />
<br />
5. Delete sum_xy;<br />
<br />
6. Perform a loop reading sum_xyz and writing to w;<br />
<br />
7. Delete sum_xyz.<br />

How many operations did we perform? Say our vectors have length n; then we have in total:<br />

• 2n additions;



• 3n assignments;<br />

• 5n reads;<br />

• 3n writes;<br />

• 2 memory allocations; and<br />

• 2 memory deallocations.<br />

As a comparison, suppose we could write a single loop in an inline function:<br />
<br />
template <typename T><br />
void inline add3(const vector<T>& x, const vector<T>& y, const vector<T>& z, vector<T>& sum)<br />
{<br />
    x.check_size(size(y));<br />
    x.check_size(size(z));<br />
    x.check_size(size(sum));<br />
    for (int i= 0; i < size(x); ++i)<br />
        sum[i] = x[i] + y[i] + z[i];<br />
}<br />

This function per<strong>for</strong>ms:<br />

• 2n additions;<br />

• n assignments;<br />

• 3n reads;<br />

• n writes;<br />

The call of this function:<br />

add3(x, y, z, w);<br />

is of course less elegant than the operator notation. Often, one needs another look at the<br />
documentation to check whether the first or the last argument contains the result. With operators this is<br />
evident.<br />

In high-performance software, programmers tend to implement a hard-coded version of every<br />
important operation instead of freely composing them from smaller expressions. The reason is<br />
obvious; our operator implementation additionally performed:<br />

• 2n assignments;<br />

• 2n reads;<br />

• 2n writes;<br />

• 2 memory allocations; and<br />

• 2 memory deallocations.<br />

The good news is that we have not performed additional arithmetic. The bad news is that the<br />
operations above are nonetheless expensive. On modern computers, it takes much more time to<br />
read data from or write data to memory than to execute fixed-point or floating-point operations.<br />
Unfortunately, vectors in scientific applications tend to be rather long, often larger than the<br />
caches of the platform, so the vectors really must be transferred to and from main memory. In<br />
the case of shorter vectors, the data might reside in the L1 or L2 cache and the data transfer is less<br />
critical. But in this case, allocation and deallocation become a serious slowdown factor.<br />

The purpose of expression templates is to keep the original operator notation without introducing<br />

the overhead induced by temporaries.<br />

5.3.2 An Expression Template Class<br />

The solution is to introduce a special class that keeps references to the vectors and allows us<br />
to perform all computations later in one sweep. The addition now does not return a vector but<br />
an object holding the references:<br />
<br />
template <typename T><br />
class vector_sum<br />
{<br />
  public:<br />
    vector_sum(const vector<T>& v1, const vector<T>& v2) : v1(v1), v2(v2) {}<br />
  private:<br />
    const vector<T> &v1, &v2;<br />
};<br />
<br />
template <typename T><br />
vector_sum<T> inline operator+(const vector<T>& x, const vector<T>& y)<br />
{<br />
    return vector_sum<T>(x, y);<br />
}<br />

Now we can already write x + y but not yet w= x + y. It is not only that the assignment is not<br />
defined; we also have not yet provided vector_sum with enough functionality to perform something<br />
useful in the assignment. Thus, we first extend vector_sum so that it looks like a vector itself:<br />

template <typename T><br />
class vector_sum<br />
{<br />
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }<br />
  public:<br />
    vector_sum(const vector<T>& v1, const vector<T>& v2) : v1(v1), v2(v2)<br />
    {<br />
        assert(size(v1) == size(v2));<br />
    }<br />
<br />
    friend int size(const vector_sum& x) { return size(x.v1); }<br />
<br />
    T operator[](int i) const { check_index(i); return v1[i] + v2[i]; }<br />
  private:<br />
    const vector<T> &v1, &v2;<br />
};<br />

For the sake of defensive programming, we added a test that the two vectors have the same<br />
size and can be consistently added. We then consider the size of the first vector as the size of<br />
our vector_sum. The most important function is the bracket operator: when the i-th entry is<br />
accessed, we compute the sum of the operands’ i-th entries.



Discussion 5.1 The drawback is that if the entries are accessed multiple times, the sum is<br />
recomputed. On the other hand, most expressions are only used once, and then this is not a problem.<br />
An example where vector entries are accessed several times is A * (x + y). Here, it is preferable<br />
to first compute a true vector instead of evaluating the matrix vector product on the expression<br />
template.<br />

To evaluate w= x + y we also need an assignment operator for vector_sum:<br />
<br />
template <typename T> class vector_sum; // forward declaration<br />
<br />
template <typename T><br />
class vector<br />
{<br />
    // ...<br />
    vector& operator=(const vector_sum<T>& that)<br />
    {<br />
        check_size(size(that));<br />
        for (int i= 0; i < my_size; ++i)<br />
            data[i]= that[i];<br />
        return *this;<br />
    }<br />
};<br />

The assignment runs a loop over w and that. As that is an object of type vector_sum, the expression<br />
that[i] computes x[i] + y[i]. In contrast to the implementation in Section 5.3.1 we now have:<br />
<br />
• Only one loop;<br />
<br />
• No temporary vector;<br />
<br />
• No additional memory allocation and deallocation; and<br />
<br />
• No additional data reads and writes.<br />

In fact, the same operations are performed as in the loop<br />
<br />
for (int i= 0; i < size(w); ++i)<br />
    w[i] = x[i] + y[i];<br />
<br />
The cost of creating a vector_sum object is negligible. The object is kept on the stack and does<br />
not require memory allocation. Even the little effort for creating the object will be optimized<br />
away by most compilers with static code analysis.<br />
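The classes of this subsection can be assembled into one compilable sketch. Here a non-template vector with double entries stands in for the class from Section 4.3; this is an illustration under simplified assumptions, not the book's exact code:

```cpp
#include <cassert>

// Simplified fixed-type vector standing in for the class from Section 4.3.
class vector
{
  public:
    explicit vector(int size) : my_size(size), data(new double[size]) {}
    ~vector() { delete[] data; }

    friend int size(const vector& x) { return x.my_size; }
    double  operator[](int i) const { return data[i]; }
    double& operator[](int i)       { return data[i]; }

    // Assignment from anything with size() and operator[]: evaluates the
    // expression element-wise in a single loop, with no temporary vector.
    template <typename Src>
    vector& operator=(const Src& that)
    {
        assert(my_size == size(that));
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
  private:
    vector(const vector&);            // not copyable in this sketch
    int     my_size;
    double* data;
};

// Expression template: holds references, computes entries on demand.
class vector_sum
{
  public:
    vector_sum(const vector& v1, const vector& v2) : v1(v1), v2(v2)
    {
        assert(size(v1) == size(v2));
    }
    friend int size(const vector_sum& x) { return size(x.v1); }
    double operator[](int i) const { return v1[i] + v2[i]; }
  private:
    const vector &v1, &v2;
};

inline vector_sum operator+(const vector& x, const vector& y)
{
    return vector_sum(x, y);
}
```

With this, w= x + y performs exactly one loop, and each w[i] is computed as x[i] + y[i] inside the assignment.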

What happens when we want to add three vectors? The naïve implementation from § 5.3.1<br />
returns a vector, and this vector can be added to another vector. Our approach returns a<br />
vector_sum, and we have no addition for vector_sum and vector. Thus, we would need another ET<br />
class and an according operation:<br />

template <typename T><br />
class vector_sum3<br />
{<br />
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }<br />
  public:<br />
    vector_sum3(const vector<T>& v1, const vector<T>& v2, const vector<T>& v3)<br />
      : v1(v1), v2(v2), v3(v3)<br />
    {<br />
        assert(size(v1) == size(v2)); assert(size(v1) == size(v3));<br />
    }<br />
<br />
    friend int size(const vector_sum3& x) { return size(x.v1); }<br />
<br />
    T operator[](int i) const { check_index(i); return v1[i] + v2[i] + v3[i]; }<br />
  private:<br />
    const vector<T> &v1, &v2, &v3;<br />
};<br />
<br />
template <typename T><br />
vector_sum3<T> inline operator+(const vector_sum<T>& x, const vector<T>& y)<br />
{<br />
    return vector_sum3<T>(x.v1, x.v2, y);<br />
}<br />

Furthermore, vector_sum must declare our new plus operator as friend to access its private<br />
members, and vector needs an assignment for vector_sum3. This becomes increasingly annoying.<br />
Also, what happens if we perform the second addition first, w= x + (y + z)? Then we<br />
need another plus operator. What if some of the vectors are multiplied by a scalar, e.g.,<br />
w= x + dot(x, y) * y + 4.3 * z, and this scalar product is also implemented by an ET? Our implementation<br />
effort runs into combinatorial explosion, and we need a more flexible solution, which<br />
we introduce in the next section.<br />

5.3.3 Generic Expression Templates<br />

So far, we started from a specific class (vector) and generalized the implementation gradually.<br />
Although this can help us understand the mechanism, we now move on to the general version<br />
that takes arbitrary vector types:<br />
<br />
template <typename V1, typename V2><br />
vector_sum<V1, V2> inline operator+(const V1& x, const V2& y)<br />
{<br />
    return vector_sum<V1, V2>(x, y);<br />
}<br />

We now need an expression class with arbitrary arguments:<br />

template <typename V1, typename V2><br />
class vector_sum<br />
{<br />
    typedef vector_sum self;<br />
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }<br />
  public:<br />
    vector_sum(const V1& v1, const V2& v2) : v1(v1), v2(v2)<br />
    {<br />
        assert(size(v1) == size(v2));<br />
    }<br />
<br />
    ???? operator[](int i) const { check_index(i); return v1[i] + v2[i]; }<br />
<br />
    friend int size(const self& x) { return size(x.v1); }<br />
  private:<br />
    const V1& v1;<br />
    const V2& v2;<br />
};<br />

This is rather straightforward. The only issue is: what type should operator[] return? For this,<br />
we must define value_type in each class (more flexible would be an external type trait). In<br />
vector_sum we take the value_type of the first argument, which can itself be taken from another<br />
class:<br />

template <typename V1, typename V2><br />
class vector_sum<br />
{<br />
    // ...<br />
    typedef typename V1::value_type value_type;<br />
<br />
    value_type operator[](int i) const { check_index(i); return v1[i] + v2[i]; }<br />
};<br />

To assign such an expression to a vector, we can also generalize the assignment operator:<br />
<br />
template <typename T><br />
class vector<br />
{<br />
  public:<br />
    typedef T value_type;<br />
<br />
    template <typename Src><br />
    vector& operator=(const Src& that)<br />
    {<br />
        check_size(size(that));<br />
        for (int i= 0; i < my_size; ++i)<br />
            data[i]= that[i];<br />
        return *this;<br />
    }<br />
};<br />

This assignment can also handle vector as an argument, so we can omit the standard assignment<br />
operator.<br />
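The generic machinery of this section composes arbitrarily: in w= x + y + z, the second plus nests one vector_sum inside another without any extra operator. A condensed, compilable sketch (the names vec and the double instantiation are choices for this illustration, not from the text; note that such an unconstrained operator+ is greedy and would need enable_if-style constraints in real code):

```cpp
#include <cassert>
#include <vector>

// Generic expression template: V1 and V2 can be vectors or other sums,
// so x + y + z nests as vector_sum<vector_sum<V,V>, V> automatically.
template <typename V1, typename V2>
class vector_sum
{
  public:
    typedef typename V1::value_type value_type;

    vector_sum(const V1& v1, const V2& v2) : v1(v1), v2(v2) {}

    friend int size(const vector_sum& x) { return size(x.v1); }
    value_type operator[](int i) const { return v1[i] + v2[i]; }
  private:
    const V1& v1;
    const V2& v2;
};

// A minimal vector built on std::vector, exposing the same interface.
template <typename T>
class vec
{
  public:
    typedef T value_type;
    explicit vec(int n) : data(n) {}

    friend int size(const vec& x) { return int(x.data.size()); }
    const T& operator[](int i) const { return data[i]; }
    T&       operator[](int i)       { return data[i]; }

    template <typename Src>
    vec& operator=(const Src& that)
    {
        for (int i= 0; i < size(*this); ++i)
            data[i]= that[i];
        return *this;
    }
  private:
    std::vector<T> data;
};

// One operator+ covers all combinations of vectors and expressions.
template <typename V1, typename V2>
vector_sum<V1, V2> operator+(const V1& x, const V2& y)
{
    return vector_sum<V1, V2>(x, y);
}
```

The intermediate vector_sum objects live only until the end of the assignment expression, during which the single loop in operator= reads through the whole nested expression.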

Advantages of expression templates: Although the availability of operator overloading<br />
in C++ resulted in notationally nicer code, the scientific community refused to give up programming<br />
in Fortran or implementing the loops directly in C/C++. The reason was that the<br />
traditional operator implementations were too expensive. Due to the overhead of creating<br />
temporary variables and copying vector and matrix objects, C++ could not compete<br />
with the performance of programs written in Fortran. This problem has now been resolved<br />
by the introduction of generics and expression templates. Now it is possible to write efficient<br />
scientific programs in a notationally convenient manner.<br />

5.4 Meta-Tuning: Write Your Own Compiler Optimization<br />

Compiler technology is progressing and provides us with an increasing number of optimization techniques.<br />
Ideally, everyone writes software in the way that is easiest for them, and the compiler<br />
transforms the operations into the form that is best for execution time. We would only need a new<br />
compiler and our programs would become faster. 13 But life, especially as an advanced C++ programmer,<br />
is no walk in the park. Of course, the compiler helps us a lot to speed up our programs.<br />
But there are limitations: many optimizations need knowledge of the semantic behavior and can<br />
therefore only be applied to types and operations whose semantics are known at the time the<br />
compiler is written, see also the discussion in [?]. Research is going on to overcome this limitation<br />
by providing concept-based optimization [?]. Unfortunately, it will take time until this becomes<br />
mainstream, especially now that concepts are taken out of the C++0x standard. An alternative<br />
is source-to-source code transformation with external tools like ROSE [?].<br />
<br />
Even for types and operations that the compiler can handle, it has its limitations. Most compilers<br />
(gcc, ...) only deal with the inner loop of nested loops (see the solution in Section 5.4.2)<br />
and do not dare to introduce extra temporaries (see the solution in Section ??). Some compilers<br />
are particularly tuned for benchmarks. For instance, they use pattern matching to recognize<br />
a 3-nested loop that computes a dense matrix product and transform it into BLAS-like code<br />
with 7 or 9 platform-dependent loops. 16 All this said, writing high-performance software is no<br />
walk in the park. That does not mean that such software must be unreadable and unmaintainable<br />
hackery. The route to success is again to provide appropriate abstractions. These can be<br />
empowered with compile-time optimizations so that applications are still written in natural<br />
mathematical notation whereas the generated binaries exploit all known techniques<br />
for fast execution.<br />

5.4.1 Classical Fixed-Size Unrolling<br />

The easiest form of compile-time optimization can be realized for fixed-size data types, in<br />
particular vectors as in Section 4.7. Similar to the default assignment, we can write a generic<br />
vector assignment:<br />
<br />
template <typename T, int Size><br />
class fsize_vector<br />
{<br />
  public:<br />
    const static int my_size= Size;<br />
<br />
    self& operator=(const self& that)<br />
    {<br />
        for (int i= 0; i < my_size; ++i)<br />
            data[i]= that[i];<br />
    }<br />
};<br />

13 In some sense, this is the programming equivalent of communism: everybody contributes as much as he<br />
pleases and how he pleases, and in the end the right thing happens anyway thanks to a self-improving society.<br />
Likewise, some people write software in a very naïve fashion and blame the compiler for not transforming their<br />
programs into high-performance code.<br />
16 One could sometimes get the impression that the HPC community believes that multiplying dense matrices<br />
at near-peak performance solves all performance issues of the world, or at least demonstrates that everything can<br />
be computed at near-peak performance if only one tries hard enough. Fortunately, more and more people in the<br />
supercomputer centers realize that their machines are not only running BLAS3 and LAPACK operations and<br />
that real-world applications are more often than not limited by memory bandwidth and latency.



A state-of-the-art compiler will recognize that all iterations are independent of each<br />
other; e.g., data[2]= that[2]; is independent of data[1]= that[1];. The compiler will also determine<br />
the size of the loop during compilation. As a consequence, the generated binary for a type of size<br />
3 will be equivalent to:<br />
<br />
template <typename T, int Size><br />
class fsize_vector<br />
{<br />
    self& operator=(const self& that)<br />
    {<br />
        data[0]= that[0];<br />
        data[1]= that[1];<br />
        data[2]= that[2];<br />
    }<br />
};<br />

The right-hand-side vector that might be an expression template (§ 5.3) for, say, alpha * x + y,<br />
and its evaluation will also be inlined:<br />
<br />
template <typename T, int Size><br />
class fsize_vector<br />
{<br />
    template <typename Src><br />
    self& operator=(const Src& that)<br />
    {<br />
        data[0]= alpha * x[0] + y[0];<br />
        data[1]= alpha * x[1] + y[1];<br />
        data[2]= alpha * x[2] + y[2];<br />
    }<br />
};<br />

To make the unrolling more explicit, and for the sake of introducing meta-tuning step by step, we<br />
develop a functor that computes the assignment:<br />
<br />
template <typename Target, typename Source, int N><br />
struct fsize_assign<br />
{<br />
    void operator()(Target& tar, const Source& src)<br />
    {<br />
        fsize_assign<Target, Source, N-1>()(tar, src);<br />
        std::cout << "assign entry " << N << '\n';<br />
        tar[N]= src[N];<br />
    }<br />
};<br />
<br />
template <typename Target, typename Source><br />
struct fsize_assign<Target, Source, 0><br />
{<br />
    void operator()(Target& tar, const Source& src)<br />
    {<br />
        std::cout << "assign entry " << 0 << '\n';<br />
        tar[0]= src[0];<br />
    }<br />
};



The print-outs shall show us the execution. For convenience, one can templatize the operator<br />
on the argument types:<br />
<br />
template <int N><br />
struct fsize_assign<br />
{<br />
    template <typename Target, typename Source><br />
    void operator()(Target& tar, const Source& src)<br />
    {<br />
        fsize_assign<N-1>()(tar, src);<br />
        std::cout << "assign entry " << N << '\n';<br />
        tar[N]= src[N];<br />
    }<br />
};<br />
<br />
template <><br />
struct fsize_assign<0><br />
{<br />
    template <typename Target, typename Source><br />
    void operator()(Target& tar, const Source& src)<br />
    {<br />
        std::cout << "assign entry " << 0 << '\n';<br />
        tar[0]= src[0];<br />
    }<br />
};<br />

Then the vector types can be deduced by the compiler when the operator is called. Instead of<br />
the previous loop, we call the assignment functor in the operator:<br />
<br />
template <typename T, int Size><br />
class fsize_vector<br />
{<br />
    BOOST_STATIC_ASSERT((my_size > 0));<br />
<br />
    self& operator=(const self& that)<br />
    {<br />
        fsize_assign<my_size-1>()(*this, that);<br />
        return *this;<br />
    }<br />
<br />
    template <typename Vector><br />
    self& operator=(const Vector& that)<br />
    {<br />
        fsize_assign<my_size-1>()(*this, that);<br />
        return *this;<br />
    }<br />
};<br />

The execution of the following code fragment<br />
<br />
fsize_vector<float, 4> v, w;<br />
v[0]= v[1]= 1.0; v[2]= 2.0; v[3]= -3.0;<br />
w= v;<br />
<br />
yields<br />
<br />
assign entry 0<br />
assign entry 1<br />
assign entry 2<br />
assign entry 3<br />

In this implementation, we replaced the loop by a recursion, counting on the compiler to<br />
inline the operations (otherwise it would be even slower than the loop), and made sure that no<br />
loop index is incremented and tested for termination. This is only beneficial for small loops that<br />
run in the L1 cache. Larger loops are dominated by loading the data from memory, and the loop<br />
overhead is irrelevant. On the contrary, unrolling operations on very large vectors entirely will<br />
probably decrease the performance, because many instructions need to be loaded and therefore decrease<br />
the available bandwidth for the data. As mentioned before, compilers can unroll such<br />
operations by themselves (and hopefully know when it is better not to), and sometimes this<br />
automatic unrolling is even slightly faster than the explicit implementation.<br />
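A minimal, self-contained version of this unrolled assignment (print statements omitted; the reduced fsize_vector below is a sketch, not the full class from the text):

```cpp
// Recursive compile-time unrolled assignment, as developed above:
// fsize_assign<N> handles entries 0..N by recursing first, then
// assigning entry N, so no run-time loop counter exists at all.
template <int N>
struct fsize_assign
{
    template <typename Target, typename Source>
    void operator()(Target& tar, const Source& src)
    {
        fsize_assign<N-1>()(tar, src);  // entries 0..N-1 first
        tar[N]= src[N];
    }
};

template <>
struct fsize_assign<0>                  // terminates the recursion
{
    template <typename Target, typename Source>
    void operator()(Target& tar, const Source& src)
    {
        tar[0]= src[0];
    }
};

// Bare-bones fixed-size vector using the functor in its assignment.
template <typename T, int Size>
class fsize_vector
{
  public:
    const static int my_size= Size;

    T&       operator[](int i)       { return data[i]; }
    const T& operator[](int i) const { return data[i]; }

    template <typename Vector>
    fsize_vector& operator=(const Vector& that)
    {
        fsize_assign<my_size-1>()(*this, that);
        return *this;
    }
  private:
    T data[Size];
};
```

After inlining, an assignment between two fsize_vector<float, 4> objects consists of exactly four element copies and nothing else.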

5.4.2 Nested Unrolling<br />

From our experience, compilers usually do not unroll nested loops. Even a good compiler that can<br />
handle certain nested loops will not be able to optimize every program kernel, in particular<br />
heavily templatized programs instantiated with user-defined types. We will demonstrate<br />
here how to unroll nested loops at compile time, using matrix vector multiplication as an example.<br />

For this purpose, we introduce a simplistic fixed-size matrix type:<br />

template <typename T, int Rows, int Cols><br />
class fsize_matrix<br />
{<br />
    typedef fsize_matrix self;<br />
  public:<br />
    typedef T value_type;<br />
    BOOST_STATIC_ASSERT((Rows * Cols > 0));<br />
    const static int my_rows= Rows, my_cols= Cols;<br />
<br />
    fsize_matrix()<br />
    {<br />
        for (int i= 0; i < my_rows; ++i)<br />
            for (int j= 0; j < my_cols; ++j)<br />
                data[i][j]= T(0);<br />
    }<br />
    fsize_matrix(const self& that) { /* ... */ }<br />
<br />
    // cannot check column index<br />
    const T* operator[](int r) const { return data[r]; }<br />
    T* operator[](int r) { return data[r]; }<br />
<br />
    mat_vec_et<self, fsize_vector<T, Cols> > operator*(const fsize_vector<T, Cols>& v) const<br />
    {<br />
        return mat_vec_et<self, fsize_vector<T, Cols> >(*this, v);<br />
    }<br />
  private:<br />
    T data[Rows][Cols];<br />
};<br />

The bracket operator returns a pointer for the sake of simplicity, but a good implementation<br />
should return a proxy that allows for checking the column index. The multiplication with a<br />
vector is realized by means of an expression template in order not to copy the result vector.<br />
The vector assignment then needs a specialization for the expression template: 17<br />

template <typename T, int Size><br />
class fsize_vector<br />
{<br />
    template <typename Matrix, typename Vector><br />
    self& operator=(const mat_vec_et<Matrix, Vector>& that)<br />
    {<br />
        typedef mat_vec_et<Matrix, Vector> et;<br />
        fsize_mat_vec_mult<Matrix::my_rows-1, Matrix::my_cols-1>()(that.A, that.v, *this);<br />
        return *this;<br />
    }<br />
};<br />

The functor fsize mat vec mult must now compute the matrix vector product on the three arguments.<br />

The general implementation of the functor reads:<br />

template <int Rows, int Cols><br />
struct fsize_mat_vec_mult<br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        fsize_mat_vec_mult<Rows, Cols-1>()(A, v_in, v_out);<br />
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];<br />
    }<br />
};<br />

Again, the functor is only templatized on the sizes, and the container types are deduced. The<br />
operator assumes that all smaller column indices are already handled and that we can increment<br />
v_out[Rows] by A[Rows][Cols] * v_in[Cols]. In particular, we assume that the first operation on<br />
v_out[Rows] initializes it. Thus, we need a (partial) specialization for Cols = 0:<br />

template <int Rows><br />
struct fsize_mat_vec_mult<Rows, 0><br />
{<br />
    template <typename Matrix, typename VecIn, typename VecOut><br />
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)<br />
    {<br />
        fsize_mat_vec_mult<Rows-1, Matrix::my_cols-1>()(A, v_in, v_out);<br />
        v_out[Rows]= A[Rows][0] * v_in[0];<br />
    }<br />
};<br />

The careful reader noticed the substitution of += by =. We also notice that we have to call the computation for the preceding row with all columns, and inductively for all smaller rows. The
17 A better solution would be to implement all assignments with a functor and to specialize the functor, because partial template specialization of functions does not always work as expected.


162 CHAPTER 5. META-PROGRAMMING<br />

number of columns in the matrix is taken from an internal definition in the matrix type for the sake of simplicity. Passing this as an extra template argument or using a type trait would have been more general, because we are now limited to types where my_cols is defined in the class. We still need a (full) specialization to terminate the recursion:

template <>
struct fsize_mat_vec_mult<0, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        v_out[0]= A[0][0] * v_in[0];
    }
};

With the inlining, our program will execute the operation w= A * v for vectors of size 4 as:

w[0]= A[0][0] * v[0];
w[0]+= A[0][1] * v[1];
w[0]+= A[0][2] * v[2];
w[0]+= A[0][3] * v[3];
w[1]= A[1][0] * v[0];
w[1]+= A[1][1] * v[1];
w[1]+= A[1][2] * v[2];
w[1]+= A[1][3] * v[3];
w[2]= A[2][0] * v[0];
w[2]+= A[2][1] * v[1];
w[2]+= A[2][2] * v[2];
w[2]+= A[2][3] * v[3];
w[3]= A[3][0] * v[0];
w[3]+= A[3][1] * v[1];
w[3]+= A[3][2] * v[2];
w[3]+= A[3][3] * v[3];

Our tests have shown that such an implementation is indeed faster than the compiler optimization on loops.18

Increasing Concurrency<br />

A disadvantage of the preceding implementation is that all operations on an entry of the target vector are performed in one sweep. Therefore, the second operation must wait for the first, the third for the second, and so on. The fifth operation can be done in parallel with the fourth, the ninth with the eighth, but this is not satisfying. We would like to have more concurrency in our program, to enable the parallel pipelines of superscalar processors. Again, we can twiddle our thumbs and hope that the compiler reorders the statements, or we can take it into our own hands. More concurrency is provided by the following operation sequence:

w[0]= A[0][0] * v[0];
w[1]= A[1][0] * v[0];
w[2]= A[2][0] * v[0];
w[3]= A[3][0] * v[0];
w[0]+= A[0][1] * v[1];
18 TODO: Give numbers


5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 163<br />

w[1]+= A[1][1] * v[1];
w[2]+= A[2][1] * v[1];
w[3]+= A[3][1] * v[1];
w[0]+= A[0][2] * v[2];
w[1]+= A[1][2] * v[2];
w[2]+= A[2][2] * v[2];
w[3]+= A[3][2] * v[2];
w[0]+= A[0][3] * v[3];
w[1]+= A[1][3] * v[3];
w[2]+= A[2][3] * v[3];
w[3]+= A[3][3] * v[3];

We only need to reorganize our functor. The general template now reads:

template <unsigned Rows, unsigned Cols>
struct fsize_mat_vec_mult_cm
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult_cm<Rows-1, Cols>()(A, v_in, v_out);
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];
    }
};

Now we need a partial specialization for row 0 to go to the next column:

template <unsigned Cols>
struct fsize_mat_vec_mult_cm<0, Cols>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult_cm<Matrix::my_rows-1, Cols-1>()(A, v_in, v_out);
        v_out[0]+= A[0][Cols] * v_in[Cols];
    }
};

The partial specialization for column 0 is also needed to initialize the entry of the output vector:

template <unsigned Rows>
struct fsize_mat_vec_mult_cm<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult_cm<Rows-1, 0>()(A, v_in, v_out);
        v_out[Rows]= A[Rows][0] * v_in[0];
    }
};

Finally, we still need a specialization for row and column 0 to terminate the recursion. This can be reused from the previous functor:

template <>
struct fsize_mat_vec_mult_cm<0, 0>
    : fsize_mat_vec_mult<0, 0> {};



Using Registers<br />

There is another feature of modern processors one should keep in mind: cache coherency. Processors are nowadays designed to share memory while maintaining consistency in their caches. As a result, every time we write into a data structure in memory, like our vector w, a cache invalidation signal is sent on the bus, even if no other processor is present. Unfortunately, this slows down computation perceivably (in our experience).
Fortunately, this can often be avoided in a rather simple way: by introducing a temporary in a function, which resides in registers if the type allows. We can rely on the compiler to decide reasonably where temporaries are located.
This implementation requires two classes: one for the outer and one for the inner loop. Let us start with the outer loop:

1 template <unsigned Rows>
2 struct fsize_mat_vec_mult_reg
3 {
4     template <typename Matrix, typename VecIn, typename VecOut>
5     void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
6     {
7         fsize_mat_vec_mult_reg<Rows-1>()(A, v_in, v_out);
8
9         typename VecOut::value_type tmp;
10        fsize_mat_vec_mult_aux<Rows, Matrix::my_cols-1>()(A, v_in, tmp);
11        v_out[Rows]= tmp;
12    }
13 };

We assume that fsize_mat_vec_mult_aux is defined or declared before this class. The first statement in line 7 calls the computations on the preceding rows. A temporary is defined in line 9 with the hope that it will be located in a register. Then we call the computation within this row in line 10. The temporary is passed as a reference to an inline function so that the summation will be performed in a register. In line 11 we write the result back to v_out. This still causes the invalidation signal on the bus, but only once for each entry.

The functor must be specialized for row 0 to avoid infinite loops:

template <>
struct fsize_mat_vec_mult_reg<0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        typename VecOut::value_type tmp;
        fsize_mat_vec_mult_aux<0, Matrix::my_cols-1>()(A, v_in, tmp);
        v_out[0]= tmp;
    }
};

Within each row we iterate over the columns and increment the temporary (hopefully in a register):

template <unsigned Rows, unsigned Cols>
struct fsize_mat_vec_mult_aux
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)
    {
        fsize_mat_vec_mult_aux<Rows, Cols-1>()(A, v_in, tmp);
        tmp+= A[Rows][Cols] * v_in[Cols];
    }
};

To terminate the recursion over the columns, we write a specialization:

template <unsigned Rows>
struct fsize_mat_vec_mult_aux<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)
    {
        tmp= A[Rows][0] * v_in[0];
    }
};

In this section we showed different ways to optimize a two-dimensional loop (with fixed sizes). There are certainly more possibilities: for instance, we could try an implementation that uses registers but provides the same concurrency as the second-to-last implementation. Another form of optimization could be to agglomerate the write-backs so that multiple invalidation signals are sent at a time, which might be less disruptive.

5.4.3 Dynamic Unrolling – Warm up<br />

⇒ vector_unroll_example.cpp

As important as the fixed-size optimization is, acceleration for dynamically sized containers is needed even more. We start here with a simple example and some observations. We will reuse the vector class from Listing 4.1. To show the implementation more clearly, we write the code without operators and expression templates. Our test case will compute

u = 3v + w

for three short vectors of size 1000. The wall clock time will be measured with boost::timer.19 The vectors v and w will be initialized, and to have the data ready for use (i.e. the vectors are definitely in cache20) we run a few additional operations without timing:

#include <iostream>
#include <cstdlib>
#include <boost/timer.hpp>
// ...

int main(int argc, char* argv[])
{
    unsigned s= 1000;
    if (argc > 1) s= atoi(argv[1]); // read (potentially) from command line

19 See http://www.boost.org/doc/libs/1_43_0/libs/timer/timer.htm<br />

20 TODO: shouldn’t the initialization make this sure? Do we have a better explanation? Reference to benchmark<br />

literature? Do we really need a bullet proof justification here?


    vector u(s), v(s), w(s);
    for (unsigned i= 0; i < s; i++) {
        v[i]= float(i);
        w[i]= float(2*i + 15);
    }
    for (unsigned j= 0; j < 3; j++)
        for (unsigned i= 0; i < s; i++)
            u[i]= 3.0f * v[i] + w[i];

    const unsigned rep= 200000;
    boost::timer native;
    for (unsigned j= 0; j < rep; j++)
        for (unsigned i= 0; i < s; i++)
            u[i]= 3.0f * v[i] + w[i];
    std::cout << "Compute time native loop is " << 1000000.0 * native.elapsed() / double(rep) << " µs.\n";

    return 0;
}

Alternatively we compute this with an unrolling of 4 cycles:<br />

for (unsigned j= 0; j < rep; j++)
    for (unsigned i= 0; i < s; i+= 4) {
        u[i]= 3.0f * v[i] + w[i];
        u[i+1]= 3.0f * v[i+1] + w[i+1];
        u[i+2]= 3.0f * v[i+2] + w[i+2];
        u[i+3]= 3.0f * v[i+3] + w[i+3];
    }

This code will obviously only work if the vector size is divisible by 4. To avoid errors we could add an assertion on the vector size, but this is not really satisfying. Instead, we generalize this implementation to arbitrary vector sizes:

boost::timer unrolled;
for (unsigned j= 0; j < rep; j++) {
    unsigned sb= s / 4 * 4;
    for (unsigned i= 0; i < sb; i+= 4) {
        u[i]= 3.0f * v[i] + w[i];
        u[i+1]= 3.0f * v[i+1] + w[i+1];
        u[i+2]= 3.0f * v[i+2] + w[i+2];
        u[i+3]= 3.0f * v[i+3] + w[i+3];
    }
    for (unsigned i= sb; i < s; i++)
        u[i]= 3.0f * v[i] + w[i];
}
std::cout << "Compute time unrolled loop is " << 1000000.0 * unrolled.elapsed() / double(rep) << " µs.\n";
std::cout << "u is " << u << '\n';

Listing 5.4: Unrolled computation of u = 3v + w<br />

The little program was compiled with g++ 4.1.2 with the flags -O3 -ffast-math -DNDEBUG and resulted on the test computer21 in:

Compute time native loop is 2.64 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

As an alternative to our hand-coded unrolling, we can use the compiler flag -funroll-loops. This results in the following execution times on the test machine:

Compute time native loop is 2.51 µs.<br />

Compute time unrolled loop is 1.22 µs.<br />

The original loop became slightly faster, while our optimized version slowed down a bit. We see an entirely different behavior if we replace the size s by a constant:

const unsigned s= 1000;<br />

In this case the compiler knows the size of the loops, and it might be easier to transform the loop or to determine that a transformation is beneficial.

Compute time native loop is 1.6 µs.<br />

Compute time unrolled loop is 1.55 µs.<br />

Now the native loop is clearly accelerated by the compiler optimization. Why our hand-written unrolling is slower than before is not clear. Apparently, the manual and the automatic optimization got into conflict, or the latter overrode the former.

Discussion 5.2 Software tuning and benchmarking is an art of its own, given the complexity of compiler optimization. The tiniest modification in the source can change the run-time behavior of an examined computation. In the example it should not have mattered whether the size is known at compile time or not. But it did. Especially when the code is compiled without -DNDEBUG, the compiler might omit the index check in some situations and perform it in others. It is also important to print out computed values (and filter them out with grep or the like) because the compiler might omit an entire computation when it is obvious that the result is not needed. Such optimizations happen in particular if the results are intrinsic types, while computations on user-defined types are usually not subject to such omissions (but one should not count on it).

The goal of this section is not to determine precisely why which code is how much faster than another. Besides, each compiler has a different sensitivity to sizes and flags, so we would need a different line of argumentation for each of them. The only conclusion we would like to draw from these observations is that despite all the progress in compiler technology, we cannot rely on it blindly and still need hand-tuned implementations and careful benchmarking when maximal performance is needed. On the other hand, program snippets as in the last listing should not appear in scientific applications, for the sake of readability, maintainability, portability, ...

Another question we have not raised so far is: what is the optimal block size for the unrolling?

• Does it depend on the expression?<br />

• Does it depend on the types of the arguments?<br />

• Does it depend on the computer architecture?<br />

21 Phenom II X2 545 3.0 GHz, 3600 MHz PSB, 7MB total cache, Sockel AM2,2x 2GB DDR2-800



The answer is yes, to all of them. The main reason (but not the only one) is that different processors have different numbers of registers. How many registers are needed in one iteration depends on the expression and on the types (a complex value needs more registers than a float). In the following section we will address both issues: how to encapsulate the transformation so that it does not show up in the application, and how to change the block size without rewriting the loop.

5.4.4 Unrolling Vector Expressions<br />

For easier understanding, we discuss the abstraction in meta-tuning step by step. We start with the previous loop and implement a function for it. Say the function's name is my_axpy and it has a template argument for the block size, so that we can write for instance:

for (unsigned j= 0; j < rep; j++)
    my_axpy<2>(u, v, w);

This function shall contain an unrolled main loop with customizable block size and a clean-up loop at the end:

template <unsigned BSize, typename U, typename V, typename W>
void my_axpy(U& u, const V& v, const W& w)
{
    assert(u.size() == v.size() && v.size() == w.size());
    unsigned s= u.size(), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        my_axpy_ftor<0, BSize>()(u, v, w, i);
    for (unsigned i= sb; i < s; i++)
        u[i]= 3.0f * v[i] + w[i];
}

As mentioned before, deduced template types, as the vector types in our case, must be placed at the end, and the explicitly given arguments, in our case the block size, must be at the beginning of the template parameter list. The block statement in the first loop can be implemented similarly to the functor in Section 5.4.1. We deviate a bit from that implementation by using two template arguments, where the former is increased until it is equal to the second. It appeared that this approach yielded faster binaries on gcc than using only one argument and counting it down to zero.22 In addition, the two-argument version is more consistent with the multi-dimensional implementation in Section ??. As for fixed-size unrolling we need a recursive template definition. Within the operator, a single statement is performed and the following statements are called:

template <unsigned Offset, unsigned Max>
struct my_axpy_ftor
{
    template <typename U, typename V, typename W>
    void operator()(U& u, const V& v, const W& w, unsigned i)
    {
        u[i+Offset]= 3.0f * v[i+Offset] + w[i+Offset];
        my_axpy_ftor<Offset+1, Max>()(u, v, w, i);
    }
};
22 TODO: exercise for it

The only difference to fixed-size unrolling is that the indices are relative to an argument, here i. The operator() is first called with Offset equal to 0, then with 1, 2, and so on. Since each call is inlined, the functor call results in one monolithic block of operations without loop control and function calls. Thus, the call my_axpy_ftor<0, BSize>()(u, v, w, i) performs the same operations as one iteration of the first loop in Listing 5.4.
Of course, this compilation would end in an infinite loop if we forget to specialize it for Max:

template <unsigned Max>
struct my_axpy_ftor<Max, Max>
{
    template <typename U, typename V, typename W>
    void operator()(U& u, const V& v, const W& w, unsigned i) {}
};

Per<strong>for</strong>ming the considered vector operation with different unrollings yields<br />

Compute time unrolled loop is 1.44 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Compute time unrolled loop is 1.15 µs.<br />

Compute time unrolled loop is 1.14 µs.<br />

Now we can call this operation for any block size we like. On the other hand, it is rather cumbersome to implement the corresponding functions and functors for each vector expression. Therefore, we now combine this technique with expression templates.

5.4.5 Tuning an Expression Template<br />

⇒ vector_unroll_example2.cpp

Let us recall Section 5.3.3. So far, we have developed a vector class with expression templates for vector sums. In the same manner we can implement the product of a scalar and a vector, but we leave this as an exercise and consider expressions with addition only, for example:

u = v + v + w

Now we frame this vector operation with a repetition loop and the time measurement:

boost::timer t;
for (unsigned j= 0; j < rep; j++)
    u= v + v + w;
std::cout << "Compute time is " << 1000000.0 * t.elapsed() / double(rep) << " µs.\n";

This results in:<br />

Compute time is 1.72 µs.<br />

To incorporate meta-tuning into expression templates, we only need to modify the actual assignment, because only there a loop is performed. All the other operations (well, so far we have only a sum, but in theory there could be tons of them) only return objects with references. The loop in operator= is split into the unrolled part at the beginning and the one-by-one completion at the end:

template <typename Value>
class vector
{
    template <typename Src>
    vector& operator=(const Src& that)
    {
        check_size(size(that));
        unsigned s= my_size, sb= s / 4 * 4;

        for (unsigned i= 0; i < sb; i+= 4)
            assign<0, 4>()(*this, that, i);
        for (unsigned i= sb; i < s; i++)
            data[i]= that[i];
        return *this;
    }
};

The assign functor is realized analogously to my_axpy_ftor:

template <unsigned Offset, unsigned Max>
struct assign
{
    template <typename U, typename V>
    void operator()(U& u, const V& v, unsigned i)
    {
        u[i+Offset]= v[i+Offset];
        assign<Offset+1, Max>()(u, v, i);
    }
};

template <unsigned Max>
struct assign<Max, Max>
{
    template <typename U, typename V>
    void operator()(U& u, const V& v, unsigned i) {}
};

Computing the expression above, we obtain:

Compute time is 1.37 µs.<br />

With this rather simple modification we have now accelerated ALL vector expression templates. In comparison with the previous implementation, however, we lost the flexibility to customize the loop unrolling. The functor assign has two arguments, thus allowing for customization. The problem is the assignment operator. In principle, we can define an explicit template argument there:

template <unsigned BSize, typename Src>
vector& operator=(const Src& that)
{
    check_size(size(that));
    unsigned s= my_size, sb= s / BSize * BSize;
    for (unsigned i= 0; i < sb; i+= BSize)
        assign<0, BSize>()(*this, that, i);
    for (unsigned i= sb; i < s; i++)
        data[i]= that[i];
    return *this;
}

The drawback is that we cannot use the symbol '=' naturally as an infix operator but must write:

u.operator=<4>(v + v + w);

This has in fact a certain geeky charm, and one could also argue that people did (and still do) more painful things for performance. Nonetheless, it does not meet our ideals of intuitiveness and readability.

Alternative notations are:

unroll<4>(u= v + v + w);

or

unroll<4>(u)= v + v + w;

Both versions are implementable and comparably intuitive. The former expresses more correctly what we are doing, while the latter is easier to implement and the structure of the computed expression remains better visible. Therefore we show the realization of the second form.

The function unroll is simple to implement: it just returns an object with a reference to the vector and the unroll size as type information:

template <unsigned BSize, typename Vector>
unroll_vector<BSize, Vector> inline unroll(Vector& v)
{
    return unroll_vector<BSize, Vector>(v);
}

The class unroll_vector is not complicated either. It only needs to take a reference to the target vector and to provide an assignment operator:

template <unsigned BSize, typename V>
class unroll_vector
{
  public:
    unroll_vector(V& ref) : ref(ref) {}

    template <typename Src>
    V& operator=(const Src& that)
    {
        assert(size(ref) == size(that));
        unsigned s= size(ref), sb= s / BSize * BSize;
        for (unsigned i= 0; i < sb; i+= BSize)
            assign<0, BSize>()(ref, that, i);
        for (unsigned i= sb; i < s; i++)
            ref[i]= that[i];
        return ref;
    }
  private:
    V& ref;
};

Evaluating the considered vector expression for some block sizes yields:

Compute time unroll(u)= v + v + w is 1.72 µs.<br />

Compute time unroll(u)= v + v + w is 1.52 µs.<br />

Compute time unroll(u)= v + v + w is 1.36 µs.<br />

Compute time unroll(u)= v + v + w is 1.37 µs.<br />

Compute time unroll(u)= v + v + w is 1.4 µs.<br />

These few benchmarks are consistent with the previous results, i.e. unroll<1> is equal to the canonical implementation and unroll<4> is as fast as the hard-wired unrolling.

5.4.6 Tuning Reduction Operations<br />

Reducing on a Single Variable<br />

⇒ reduction_unroll_example.cpp

In the preceding vector operations, the i-th entry of each vector was handled independently of any other entry. In reduction operations, they are related by one or more temporary variables, and these temporary variables can become a serious bottleneck.
First, we test whether a reduction operation, say the discrete L1 norm (also known as the Manhattan norm), can be accelerated by the techniques from Section 5.4.4. We implement the one_norm function in terms of a functor for the iteration block:

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type sum(0);
    unsigned s= size(v), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(sum, v, i);
    for (unsigned i= sb; i < s; i++)
        sum+= abs(v[i]);
    return sum;
}



The functor is also implemented in the same manner as before:

template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i) {}
};

The measured run-time behavior is:

Compute time one_norm(v) is 7.42 µs.<br />

Compute time one_norm(v) is 3.64 µs.<br />

Compute time one_norm(v) is 1.9 µs.<br />

Compute time one_norm(v) is 1.25 µs.<br />

Compute time one_norm(v) is 1.03 µs.<br />

This is already a good improvement, but maybe we can do better.23

Reducing on an Array<br />

⇒ reduction_unroll_array_example.cpp

When we look at the previous computation, we see that a different entry of v is used in each iteration. But every computation accesses the same temporary variable sum, and this limits concurrency. To provide more concurrency, we can use multiple temporaries,24 for instance in an array. The modified function then reads:

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type sum[BSize];
    for (unsigned i= 0; i < BSize; i++)
        sum[i]= 0;

23 TODO: Test it with gcc 3.4 and MSVC. Speed up in table<br />

24 Strictly speaking, this is not true for every possible scalar type we can think of. The addition of the sum type must be a commutative monoid because we change the evaluation order. This holds of course for all intrinsic numeric types and certainly for almost all user-defined arithmetic types. But one is free to define an addition that is not commutative or not monoidal; in this case our transformation would be wrong. To deal with such exceptions we need semantic concepts, which will hopefully become part of C++ in the coming years.


    unsigned s= size(v), sb= s / BSize * BSize;
    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(sum, v, i);
    for (unsigned i= 1; i < BSize; i++)
        sum[0]+= sum[i];
    for (unsigned i= sb; i < s; i++)
        sum[0]+= abs(v[i]);
    return sum[0];
}

The corresponding functor must refer to the right element of the sum array:

template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S* sum, const V& v, unsigned i)
    {
        using std::abs;
        sum[Offset]+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S* sum, const V& v, unsigned i) {}
};

On the test machine this took:<br />

Compute time one_norm(v) is 7.33 µs.<br />

Compute time one_norm(v) is 5.15 µs.<br />

Compute time one_norm(v) is 2 µs.<br />

Compute time one_norm(v) is 1.4 µs.<br />

Compute time one_norm(v) is 1.16 µs.<br />

This is even a bit slower than the version with one variable. Maybe an array is more expensive to pass as an argument, even to an inline function. Let us try something else.

Reducing on a Nested Class Object<br />

⇒ reduction_unroll_nesting_example.cpp

To avoid arrays, we can define a class for n temporary variables, where n is a template argument. Such a class is designed more consistently with the recursive scheme of the functors:

template <unsigned Size, typename Value>
struct multi_tmp
{
    typedef multi_tmp<Size-1, Value> sub_type;

    multi_tmp(const Value& v) : value(v), sub(v) {}

    Value value;
    sub_type sub;
};

template <typename Value>
struct multi_tmp<0, Value>
{
    multi_tmp(const Value& v) {}
};

An object of this type can be initialized recursively, so we do not need a loop as for the array. A functor can operate on the value member and pass a reference to the sub member to its successor. This leads us to the implementation of our functor:

template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum.value+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum.sub, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i) {}
};

The unrolled function that uses this functor reads:<br />

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typedef typename Vector::value_type value_type;
    multi_tmp<BSize, value_type> multi_sum(0);
    unsigned s= size(v), sb= s / BSize * BSize;
    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(multi_sum, v, i);
    value_type sum= multi_sum.sum();
    for (unsigned i= sb; i < s; i++)
        sum+= abs(v[i]);
    return sum;
}

There is one piece still missing: we need to reduce the partial sums in multi_sum. Unfortunately, we cannot write a loop over the members of multi_sum. So we need a recursive function that dives down into multi_sum. This would be a bit cumbersome as a free function, especially as we try to avoid partial specialization of function templates. As a member function it is much easier, and the specialization happens more safely at the class level:

template <unsigned Size, typename Value>
struct multi_tmp
{
    Value sum() const { return value + sub.sum(); }
};

template <typename Value>
struct multi_tmp<0, Value>
{
    Value sum() const { return 0; }
};

Note that we started the summation with 0, not with the innermost value member. We could do the latter, but then we would need another specialization for multi_tmp<1, Value>. Likewise, we can implement a general reduction, but as in std::accumulate we need an initial element:

template <unsigned Size, typename Value>
struct multi_tmp
{
    template <typename Op>
    Value reduce(Op op, const Value& init) const { return op(value, sub.reduce(op, init)); }
};

template <typename Value>
struct multi_tmp<0, Value>
{
    template <typename Op>
    Value reduce(Op, const Value& init) const { return init; }
};

The compute times of this version are:

Compute time one_norm(v) is 7.47 µs.
Compute time one_norm(v) is 1.14 µs.
Compute time one_norm(v) is 0.71 µs.
Compute time one_norm(v) is 0.75 µs.
Compute time one_norm(v) is 1.01 µs.
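For readers who want to experiment, the whole unrolling machinery can be condensed into one self-contained sketch. The names multi_tmp, one_norm_ftor, and one_norm follow the text; the data members, the constructors, and the use of std::vector with v.size() instead of the free function size(v) are assumptions made only to keep the example compilable on its own:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Compile-time "registers" of partial sums, as in the text
template <unsigned Size, typename Value>
struct multi_tmp
{
    typedef multi_tmp<Size-1, Value> sub_type;
    multi_tmp(const Value& v) : value(v), sub(v) {}
    Value sum() const { return value + sub.sum(); }
    Value    value;
    sub_type sub;
};

template <typename Value>
struct multi_tmp<0, Value>
{
    multi_tmp(const Value&) {}
    Value sum() const { return 0; }
};

// Accumulates v[i+Offset] into the partial sum and recurses into the sub-object
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum.value+= abs(v[i + Offset]);
        one_norm_ftor<Offset+1, Max>()(sum.sub, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S&, const V&, unsigned) {}
};

template <unsigned BSize, typename Vector>
typename Vector::value_type inline one_norm(const Vector& v)
{
    using std::abs;
    typedef typename Vector::value_type value_type;
    multi_tmp<BSize, value_type> multi_sum(0);
    unsigned s= unsigned(v.size()), sb= s / BSize * BSize;
    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(multi_sum, v, i);
    value_type sum= multi_sum.sum();
    for (unsigned i= sb; i < s; i++)   // cleanup of the remainder
        sum+= abs(v[i]);
    return sum;
}
```

The result must be independent of BSize; checking one_norm<4> against one_norm<8> on a short vector exercises both the unrolled loop and the cleanup loop.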

Pushing Temporaries into Registers

⇒ reduction_unroll_registers_example.cpp


5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 177

Earlier experiments with older compilers (gcc 3.4) 25 exposed a serious overhead for using arrays or nested classes; in the end it was even slower than using one single variable. The reason was probably that the compiler could not keep values of these types in registers. 26

The most reliable way to get temporaries stored in registers is to declare them as separate variables:

inline one_norm(const Vector& v)
{
    typename Vector::value_type s0(0), s1(0), s2(0), ...
}

As one can see, the problem is how many variables to declare. Their number cannot depend on the template argument but must be fixed for all sizes (unless one writes a different implementation for each number and thereby undermines the expressiveness of templates). Thus, we have to settle on a certain number of variables — say 8. Then we cannot unroll more than eight times.

The next issue we run into is the number of function arguments. When we call the iteration block we pass all variables (registers):

for (unsigned i= 0; i < sb; i+= BSize)
    one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);

The first calculation in such a block is performed on s0 while s1–s7 are only passed on to the functors for the following computations. After this, the second computation must accumulate into the second function argument, the third calculation into the third argument, and so on. This is unfortunately not implementable with templates (only with very ugly and highly error-prone source code manipulations by macros).

Alternatively, each computation could be performed on its first function argument and subsequent functors called with the first argument omitted:

one_norm_ftor<1, BSize>()(s1, s2, s3, s4, s5, s6, s7, v, i);
one_norm_ftor<2, BSize>()(s2, s3, s4, s5, s6, s7, v, i);
one_norm_ftor<3, BSize>()(s3, s4, s5, s6, s7, v, i);

This is not realizable with templates either. The solution is to rotate the references to the registers:

one_norm_ftor<1, BSize>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);
one_norm_ftor<2, BSize>()(s2, s3, s4, s5, s6, s7, s0, s1, v, i);
one_norm_ftor<3, BSize>()(s3, s4, s5, s6, s7, s0, s1, s2, v, i);

This rotation is achieved by the following functor implementation:

template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7, const V& v, unsigned i)
    {
        using std::abs;
        s0+= abs(v[i + Offset]);
        one_norm_ftor<Offset+1, Max>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7, const V& v, unsigned i) {}
};

25 TODO: Show!!!
26 TODO: which raises the question why they can do it today

The corresponding one_norm function based on this functor is straightforward:

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type s0(0), s1(0), s2(0), s3(0), s4(0), s5(0), s6(0), s7(0);
    unsigned s= size(v), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);
    s0+= s1 + s2 + s3 + s4 + s5 + s6 + s7;

    for (unsigned i= sb; i < s; i++)
        s0+= abs(v[i]);
    return s0;
}

A slight disadvantage is that all eight registers must be accumulated after the blocked loop, no matter how small BSize is and how short the vector. A great advantage of the rotation is that BSize is not limited to the number of temporary variables: if BSize is larger, some or all variables are simply used multiple times without corrupting the result. The number of temporaries is nonetheless a limiting factor for the concurrency.

On the test machine, this implementation runs in:

Compute time one_norm(v) is 6.77 µs.
Compute time one_norm(v) is 1.13 µs.
Compute time one_norm(v) is 0.71 µs.
Compute time one_norm(v) is 0.75 µs.
Compute time one_norm(v) is 1.07 µs.

This is comparable with the nested class (in this environment).
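The register-rotating variant can likewise be condensed into a self-contained sketch. The functor and function names follow the text; the std::vector test interface (v.size() instead of size(v)) is an assumption:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Rotating functor: each instance accumulates into its first argument and
// passes the remaining registers, rotated by one, to the next instance
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7,
                    const V& v, unsigned i)
    {
        using std::abs;
        s0+= abs(v[i + Offset]);
        one_norm_ftor<Offset+1, Max>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S&, S&, S&, S&, S&, S&, S&, S&, const V&, unsigned) {}
};

template <unsigned BSize, typename Vector>
typename Vector::value_type inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type s0(0), s1(0), s2(0), s3(0), s4(0), s5(0), s6(0), s7(0);
    unsigned s= unsigned(v.size()), sb= s / BSize * BSize;
    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);
    s0+= s1 + s2 + s3 + s4 + s5 + s6 + s7;  // combine all registers once
    for (unsigned i= sb; i < s; i++)        // cleanup of the remainder
        s0+= abs(v[i]);
    return s0;
}
```

Because of the rotation, a BSize of 16 works with only eight temporaries: the registers are reused without corrupting the result.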

Résumé on Reduction Tuning

The goal of this section was not to determine the ultimately tuned reduction implementation for superscalar processors. 27 The main ambition of this section, in fact of the whole book, is to demonstrate the diversity of implementation opportunities. With the enormous expressiveness of C++ one can use (or abuse) the compiler to generate the most efficient version without rewriting the program sources, as one would need to in C or Fortran. The power of internal code generation with the C++ compiler even makes external code generation as in ATLAS 28 unnecessary. In ATLAS, functions are written in a domain-specific language, and C programs 29 in slight variations are generated with a tool and compared regarding performance. The techniques presented here empower us to generate binaries equivalent to those variations by just using a C++ compiler. Thus, we can tune our programs by changing template arguments or constants (that might be set platform-dependently).

27 In the presence of the new GPU cards with hundreds of cores and millions of threads, the fight for this little concurrency is not so impressive. Nonetheless, we will still need performance tuning on single-core and “few-core” machines at least for some years, since not everybody has a GPU card for numerics and not every algorithm is already successfully ported (e.g. incomplete LU on arbitrary sparse matrices). At the time of this writing there is not even support for std::complex.
28 http://math-atlas.sourceforge.net/
29 In some cases the C programs contain assembler snippets for a given platform in order to achieve performance close to peak.

5.4.7 Tuning Nested Loops

⇒ matrix_unroll_example.cpp

The most used (and abused) example in performance discussions is dense matrix multiplication. We do not claim to compete with hand-tuned assembly codes, but we show the power of meta-programming to generate code variations from a single implementation. As starting point we use a templatized implementation of the matrix class from Section 3.7.4.

We begin our implementation with a simple test case:

int main()
{
    const unsigned s= 4; // s= 4 for testing and 128 for timing
    matrix<float> A(s, s), B(s, s), C(s, s);

    for (unsigned i= 0; i < s; i++)
        for (unsigned j= 0; j < s; j++) {
            A(i, j)= 100.0 * i + j;
            B(i, j)= 200.0 * i + j;
        }
    mult(A, B, C);
    std::cout << "C is " << C << '\n';
}

A matrix multiplication is easily implemented with three nested loops. One of the 6 possible loop nestings is a dot-product-like calculation of each entry of C:

    $c_{ik} = A_i \cdot B_k$

where $A_i$ is the i-th row of A and $B_k$ the k-th column of B. We use a temporary in the innermost loop to decrease the cache-invalidation overhead of writing to C's elements in each operation:

template <typename Matrix>
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.num_rows() == B.num_rows()); // ...
    typedef typename Matrix::value_type value_type;
    unsigned s= A.num_rows();

    for (unsigned i= 0; i < s; i++)
        for (unsigned k= 0; k < s; k++) {
            value_type tmp(0);
            for (unsigned j= 0; j < s; j++)
                tmp+= A(i, j) * B(j, k);
            C(i, k)= tmp;
        }
}

For this implementation, we write a benchmark function:

template <typename Matrix>
void bench(const Matrix& A, const Matrix& B, Matrix& C, const unsigned rep)
{
    boost::timer t1;
    for (unsigned j= 0; j < rep; j++)
        mult(A, B, C);
    double t= t1.elapsed() / double(rep);
    unsigned s= A.num_rows();
    std::cout << "Compute time mult(A, B, C) is "
              << 1000000.0 * t << " µs. This are "
              << s * s * (2*s - 1) / t / 1000000.0 << " MFlops.\n";
}

The run time and performance of our canonical implementation (with 128 × 128 matrices) is:

Compute time mult(A, B, C) is 5290 µs. This are 789.777 MFlops.

This implementation is our reference regarding performance and results.
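The canonical triple loop is easy to test in isolation. The following sketch supplies a minimal stand-in for the matrix class of Section 3.7.4 — only the interface used here (num_rows(), num_cols(), operator()(i, j)) is assumed — together with the canonical mult:

```cpp
#include <cassert>
#include <vector>

// Minimal dense matrix, row-major storage; an assumed stand-in for the
// matrix class of Section 3.7.4
template <typename Value>
class matrix
{
  public:
    typedef Value value_type;
    matrix(unsigned r, unsigned c) : nr(r), nc(c), data(r * c, Value(0)) {}
    unsigned num_rows() const { return nr; }
    unsigned num_cols() const { return nc; }
    Value& operator()(unsigned i, unsigned j)       { return data[i * nc + j]; }
    Value  operator()(unsigned i, unsigned j) const { return data[i * nc + j]; }
  private:
    unsigned nr, nc;
    std::vector<Value> data;
};

// Canonical triple loop as in the text
template <typename Matrix>
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.num_rows() == B.num_rows()); // quadratic matrices only, as in the text
    typedef typename Matrix::value_type value_type;
    unsigned s= A.num_rows();
    for (unsigned i= 0; i < s; i++)
        for (unsigned k= 0; k < s; k++) {
            value_type tmp(0);            // temporary avoids rewriting C(i, k) in every step
            for (unsigned j= 0; j < s; j++)
                tmp+= A(i, j) * B(j, k);
            C(i, k)= tmp;
        }
}
```

The bench function above can be pointed at this mult unchanged, provided boost::timer is available.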

For the development of the unrolled implementation we go back to 4 × 4 matrices. In contrast to Section 5.4.6, we do not unroll a single reduction but perform multiple reductions in parallel. For the three loops this means unrolling the two outer loops and replacing the body of the inner loop with multiple operations. The latter we achieve, as usual, with a functor.

As in the canonical implementation, the reduction shall not be performed in elements of C but in temporaries. For this purpose we use the class multi_tmp from § 5.4.6. For the sake of simplicity we limit ourselves to matrix sizes that are multiples of the unroll parameters. 30 An unrolled matrix multiplication is shown in the following code:

template <unsigned Size0, unsigned Size1, typename Matrix>
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.num_rows() == B.num_rows()); // ...
    assert(A.num_rows() % Size0 == 0);    // we omitted cleanup here
    assert(A.num_cols() % Size1 == 0);    // we omitted cleanup here
    typedef typename Matrix::value_type value_type;
    unsigned s= A.num_rows();

    mult_block<0, Size0-1, 0, Size1-1> block;
    for (unsigned i= 0; i < s; i+= Size0)
        for (unsigned k= 0; k < s; k+= Size1) {
            multi_tmp<Size0 * Size1, value_type> tmp(value_type(0));
            for (unsigned j= 0; j < s; j++)
                block(tmp, A, B, i, j, k);
            block.update(tmp, C, i, k);
        }
}

30 A full implementation for arbitrary matrix sizes is realized in MTL4.

We still owe the reader the implementation of the functor mult_block. The techniques are the same as in the vector operations, but we have to deal with more indices and their respective limits:

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct mult_block
{
    typedef mult_block<Index0, Max0, Index1+1, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B["
                  << j << "][" << k + Index1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Index1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct mult_block<Index0, Max0, Max1, Max1>
{
    typedef mult_block<Index0+1, Max0, 0, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B["
                  << j << "][" << k + Max1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Max1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Max1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

template <unsigned Max0, unsigned Max1>
struct mult_block<Max0, Max0, Max1, Max1>
{
    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Max0 << "][" << j << "] * B["
                  << j << "][" << k + Max1 << "]\n";
        tmp.value+= A(i + Max0, j) * B(j, k + Max1);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Max0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";
        C(i + Max0, k + Max1)= tmp.value;
    }
};

In order to verify that all operations are performed, we log them completely; here we look only at the output for tmp.4 and tmp.3:

tmp.4+= A[1][0] * B[0][0]
tmp.3+= A[1][0] * B[0][1]
tmp.4+= A[1][1] * B[1][0]
tmp.3+= A[1][1] * B[1][1]
tmp.4+= A[1][2] * B[2][0]
tmp.3+= A[1][2] * B[2][1]
tmp.4+= A[1][3] * B[3][0]
tmp.3+= A[1][3] * B[3][1]
C[1][0]= tmp.4
C[1][1]= tmp.3
tmp.4+= A[3][0] * B[0][0]
tmp.3+= A[3][0] * B[0][1]
tmp.4+= A[3][1] * B[1][0]
tmp.3+= A[3][1] * B[1][1]
tmp.4+= A[3][2] * B[2][0]
tmp.3+= A[3][2] * B[2][1]
tmp.4+= A[3][3] * B[3][0]
tmp.3+= A[3][3] * B[3][1]
C[3][0]= tmp.4
C[3][1]= tmp.3


This log shows that C[1][0] and C[1][1] are computed alternately, so that the computation can be performed in parallel on a super-scalar processor. One can also verify that

    $c_{ik} = \sum_{j=0}^{3} a_{ij} b_{jk}.$

Printing C will also show the same result as for the canonical matrix multiplication.

The implementation above can be simplified: the first functor specialization differs from the general functor only in how the indices are incremented. We can factor this out with an additional loop class:

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct loop2
{
    static const unsigned next_index0= Index0, next_index1= Index1 + 1;
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct loop2<Index0, Max0, Max1, Max1>
{
    static const unsigned next_index0= Index0 + 1, next_index1= 0;
};
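loop2 is a pure compile-time device, so its index progression can be checked directly. The sketch below repeats the class as given in the text; the concrete test values are assumptions:

```cpp
#include <cassert>

// 2D index advance: step the inner index, and wrap to the next row as soon as
// the inner index has reached its limit Max1
template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct loop2
{
    static const unsigned next_index0= Index0, next_index1= Index1 + 1;
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct loop2<Index0, Max0, Max1, Max1>
{
    static const unsigned next_index0= Index0 + 1, next_index1= 0;
};
```

Within mult_block, (Index0, Index1) thus sweeps the unrolled block in row-major order, exactly as the two hand-written specializations did before.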

Such a general class has a high potential for reuse. With it we can fuse the functor template and the first specialization:

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct mult_block
{
    typedef loop2<Index0, Max0, Index1, Max1> l;
    typedef mult_block<l::next_index0, Max0, l::next_index1, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B["
                  << j << "][" << k + Index1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Index1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

The other specialization remains unaltered.

Last but not least, we would like to see the impact of our not-so-simple matrix product. The benchmark yielded on our test machine:

Compute time mult(A, B, C) is 5250 µs. This are 795.794 MFlops.
Compute time mult(A, B, C) is 2770 µs. This are 1508.27 MFlops.
Compute time mult(A, B, C) is 1990 µs. This are 2099.46 MFlops.
Compute time mult(A, B, C) is 2230 µs. This are 1873.51 MFlops.
Compute time mult(A, B, C) is 2130 µs. This are 1961.46 MFlops.
Compute time mult(A, B, C) is 2930 µs. This are 1425.91 MFlops.
Compute time mult(A, B, C) is 2350 µs. This are 1777.84 MFlops.
Compute time mult(A, B, C) is 3420 µs. This are 1221.61 MFlops.
Compute time mult(A, B, C) is 4010 µs. This are 1041.88 MFlops.
Compute time mult(A, B, C) is 2870 µs. This are 1455.72 MFlops.
Compute time mult(A, B, C) is 3230 µs. This are 1293.47 MFlops.
Compute time mult(A, B, C) is 3060 µs. This are 1365.33 MFlops.
Compute time mult(A, B, C) is 2780 µs. This are 1502.85 MFlops.

One can see that the 1 × 1 unrolling has the same performance as the original implementation, which in fact performs the operations in exactly the same order (as far as the compiler optimization does not change the order internally). We also see that the unrolled versions are all faster, with a speed-up of up to 2.6.

With double matrices the performance is lower overall:

Compute time mult(A, B, C) is 10080 µs. This are 414.476 MFlops.
Compute time mult(A, B, C) is 8700 µs. This are 480.221 MFlops.
Compute time mult(A, B, C) is 7470 µs. This are 559.293 MFlops.
Compute time mult(A, B, C) is 5910 µs. This are 706.924 MFlops.
Compute time mult(A, B, C) is 3750 µs. This are 1114.11 MFlops.
Compute time mult(A, B, C) is 5140 µs. This are 812.825 MFlops.
Compute time mult(A, B, C) is 3420 µs. This are 1221.61 MFlops.
Compute time mult(A, B, C) is 4590 µs. This are 910.222 MFlops.
Compute time mult(A, B, C) is 4310 µs. This are 969.355 MFlops.
Compute time mult(A, B, C) is 6280 µs. This are 665.274 MFlops.
Compute time mult(A, B, C) is 5310 µs. This are 786.802 MFlops.
Compute time mult(A, B, C) is 4290 µs. This are 973.874 MFlops.
Compute time mult(A, B, C) is 3490 µs. This are 1197.11 MFlops.

It shows that here other parametrizations yield more acceleration and that the performance can almost be tripled.

Which configuration is best and why is — as mentioned before — not the topic of this script; we only show programming techniques. The reader is invited to try this program on his/her own computer. The technique in this section is intended for L1 cache usage. If matrices are larger, one should use more levels of blocking. A general-purpose methodology for locality on L2, L3, main memory, local disk, ... is recursion. This avoids reimplementation for each cache size and performs even reasonably well in virtual memory; see for instance [?].


5.5 Exercises

5.5.1 Vector class

Revisit the vector example from §??.

Make an expression for a scalar times a vector:

class scalar_times_vector_expression {
};

that inherits from base_vector. Use the inheritance mechanism to assign scalar_times_vector_expression objects to vector.

5.5.2 Vector expression template

Make a vector concept, which you call Vector. Make a vector class (you can use std::vector) that satisfies this concept. This vector class should have at least the following members:

class my_vector {
  public:
    typedef double value_type;

    my_vector( int n );

    // Copy constructor from the type itself
    my_vector( my_vector& );

    // Constructor from generic vector
    template <typename Vector>
    my_vector( Vector& );

    // Assignment operator
    my_vector& operator=( my_vector const& v );

    // Assignment for generic Vector
    template <typename Vector>
    my_vector& operator=( Vector const& v );

    value_type& operator()( int i );

  public: // Vector concept
    int size() const;
    value_type operator()( int i ) const;
};

Make an expression for a scalar times a vector:

template <typename Scalar, typename Vector>
class scalar_times_vector_expression {
};

template <typename Scalar, typename Vector>
scalar_times_vector_expression<Scalar, Vector> operator*( Scalar const& s, Vector const& v ) {
    return scalar_times_vector_expression<Scalar, Vector>( s, v );
}

Put all classes and functions in the namespace athens. You can also make an expression template for the addition of two vectors.

Write a small program, e.g.:

int main() {
    athens::my_vector v( 5 );
    // ... fill in some values of v ...
    athens::my_vector w( 5 );
    w = 5.0 * v;
    w = 5.0 * (7.0 * v);
    w = v + 7.0 * v; // (if you have added the operator+)
}

Use the debugger to see what happens.


Chapter 6

Inheritance

C++ is a multi-paradigm language, and the paradigm most strongly associated with C++ is ‘Object-Oriented Programming’ (OOP). The authors feel nevertheless that it is not the most important paradigm for scientific programming because it is inferior to generic programming for two major reasons:

• Flexibility and
• Performance.

However, the impact of these two disadvantages is negligible in some situations. The performance only deteriorates when we use virtual functions (§ 6.1). OOP in combination with generic programming is a very powerful mechanism to provide a form of reusability that neither of the paradigms can provide on its own (§ 6.3–§ 6.5).

6.1 Basic Principles

See section ?? from page ?? to page ??.

6.2 Dynamic Selection by Sub-typing

As a case study, consider the solver base class and the way solvers are selected in AMDiS. The MTL4 solvers are generic functions, whereas AMDiS is only slightly generic and makes many decisions at run time (by means of pointers and virtual functions). So we needed a way to call the generic functions while deciding at run time which one. Such a dynamic solver selection can be done with classical C features like:

#include <iostream>
#include <cstdlib>

class matrix {};
class vector {};

void cg(const matrix& A, const vector& b, vector& x)
{
    std::cout << "CG\n";
}

void bicg(const matrix& A, const vector& b, vector& x)
{
    std::cout << "BiCG\n";
}

int main (int argc, char* argv[])
{
    matrix A;
    vector b, x;

    switch (std::atoi(argv[1])) {
        case 0: cg(A, b, x); break;
        case 1: bicg(A, b, x); break;
    }
    return 0;
}

This works, but it is not scalable with respect to source code complexity. If we call the solver with other vectors and matrices somewhere else, we must copy the whole switch-case block for each argument combination. This can be avoided by encapsulating the block into a function and calling that function with different arguments. More complicated is dealing with different preconditioners (diagonal, ILU, IC, ...) that are also selected dynamically. Shall we copy a switch block for the preconditioners into each case block of the solvers?

An elegant solution is an abstract solver class and derived classes for the solvers:

struct solver
{
    virtual void operator()(const matrix& A, const vector& b, vector& x)= 0;
    virtual ~solver() {}
};

// potentially templatize
struct cg_solver : solver
{
    void operator()(const matrix& A, const vector& b, vector& x) { cg(A, b, x); }
};

struct bicg_solver : solver
{
    void operator()(const matrix& A, const vector& b, vector& x) { bicg(A, b, x); }
};

In the application we can define one or multiple pointers of type solver* and assign them the desired solver:

// Factory
solver* my_solver= 0;
switch (std::atoi(argv[1])) {
    case 0: my_solver= new cg_solver; break;
    case 1: my_solver= new bicg_solver; break;
}

This idea is discussed thoroughly in the design patterns book [?] as the factory pattern. Once we have defined a pointer of such an abstract class (also called an interface), we can call it directly:

(*my_solver)(A, b, x);

Without going into detail, we can have multiple factories and use the pointers together without a combinatorial explosion in the program sources:

// Preconditioner factory
precon* my_precon= 0;
switch (std::atoi(argv[2])) { ... }

(*my_solver)(*my_precon, A, b, x);

C++ does not allow virtual template functions because this would make compiler implementations very complicated (to avoid infinite function pointer tables). However, template classes can have virtual functions. This enables generic programming with virtual functions by templatizing the entire class instead of single member functions.
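The whole selection mechanism fits into a few lines. In the sketch below, the solver hierarchy follows the text; returning a string instead of printing, and the helper function make_solver, are assumptions added only to make the behavior testable:

```cpp
#include <cassert>
#include <string>

class matrix {};
class vector {};

// Stand-in solvers, as in the text, but reporting their name instead of printing it
std::string cg(const matrix&, const vector&, vector&)   { return "CG"; }
std::string bicg(const matrix&, const vector&, vector&) { return "BiCG"; }

struct solver
{
    virtual std::string operator()(const matrix& A, const vector& b, vector& x)= 0;
    virtual ~solver() {}
};

struct cg_solver : solver
{
    std::string operator()(const matrix& A, const vector& b, vector& x) { return cg(A, b, x); }
};

struct bicg_solver : solver
{
    std::string operator()(const matrix& A, const vector& b, vector& x) { return bicg(A, b, x); }
};

// Factory: map a run-time choice to a solver object
solver* make_solver(int choice)
{
    switch (choice) {
        case 0:  return new cg_solver;
        default: return new bicg_solver;
    }
}
```

In a real application, the factory argument would come from argv, exactly as in the switch block above.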

6.3 Remove Redundancy With Base Classes

Base classes can factor out common implementation, especially when no run-time type information is involved.

6.4 Casting Up and Down and Elsewhere

In C++, there are four different cast operators:

• static_cast;
• dynamic_cast;
• const_cast; and
• reinterpret_cast.

Its linguistic root C knew only one casting operator: ‘(type) expr’. The trouble with this single operator is that it is not standardized or clearly defined which conversion is performed under which conditions. As a consequence, the behavior of the cast can change from compiler to compiler. C++ still allows this old-style casting, but all C++ experts agree on discouraging its use. Another quite important issue is that this notation is not easy to find in large code bases (there is no regular expression that filters out all C casts), which significantly increases the maintenance costs; see also the discussion in [SA05, chapter 95]. In this section, we will show you the different cast operators and discuss the pros and cons of different casts in different contexts.


6.4.1 Casting Between Base and Derived Classes

Casting Up

⇒ up_down_cast_example.cpp

Casting up, i.e. from a derived to a base class, is always possible if there are no ambiguities, and it can even be performed implicitly. Assume we have the following class structure: 1

struct A
{
    virtual void f(){}
    virtual ~A(){}
    int ma;
};

struct B : A { float mb; };
struct C : A {};
struct D : B, C {};

and the following unary functions:

void f(A a) { /* ... */ }
void g(A& a) { /* ... */ }
void h(A* a) { /* ... */ }

An object of type B can be passed to all three functions:

int main (int argc, char* argv[])
{
    B b;
    f(b);
    g(b);
    h(&b);
    return 0;
}

In all three cases the object b is implicitly converted to an object of type A. The call of function f is, however, a bit different: only b's members within class A are copied into the function argument, and the remainder — in our example the member mb — is not accessible in f by any means. The functions g and h refer to an object of type A by reference or pointer. If an object of a derived class is passed to one of those functions, the other members are in principle still there but hidden. One could still access them by down-casting the argument inside the function. Before we down-cast, we should ask ourselves the following questions:

• How do we assure that the argument passed to the function really is an object of the derived class? For instance with extra arguments or with run-time tests.
• What can we do if the object cannot be down-casted?
• Can we write a function directly for the derived class?
• Why do we not overload the function for the base and the derived type? This is definitely a much cleaner design and always feasible.

1 TODO: picture


Up-casting only fails if the base class is ambiguous. In the current example we cannot up-cast from D to A:

D d;
A ad(d); // error: ambiguous

because the compiler does not know whether we mean the base class A from B or from C. We can clarify this with an explicit intermediate up-cast:

A ad(B(d));

Or we can share A between B and C: 2

struct B : virtual A { float mb; };
struct C : virtual A {};

Now the members of A exist only once in D. This is probably the best solution for multiple inheritance in most cases, because we save memory and need not pay attention to which replica of A is accessed.
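That the virtual base really is shared can be demonstrated with a few asserts. The classes below are those of the text; the member value and the pointer comparison are assumptions made for the test:

```cpp
#include <cassert>

// Diamond with a shared (virtual) base: D now contains exactly one A sub-object
struct A { virtual ~A() {} int ma; };
struct B : virtual A { float mb; };
struct C : virtual A {};
struct D : B, C {};
```

With non-virtual bases, the reference initialization A& a= d below would not even compile because of the ambiguity discussed above.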

Casting Down

There are situations where references or pointers are casted down, e.g. in the next section (§ 6.5). This can be performed with static_cast or dynamic_cast. As the names suggest, static_cast is statically type-checked at compile time, whereas dynamic_cast performs run-time tests (with only minimal compile-time checks). We still use our diamond-shaped class hierarchy A–D as case study. Now we introduce two pointers of type B* holding objects of types B and D:

B *bbp= new B, *bdp= new D;

When we cast these pointers down to D*, dynamic_cast verifies whether the referred object actually allows this cast. Since this information is in general only known at run time, e.g.:

B *bxp= argc > 1 ? new B : new D;

dynamic_cast must verify the referred object's type with run-time type information (RTTI). Performing an incorrect cast yields a null pointer:

D* dbp= dynamic_cast<D*>(bbp); // error: cannot downcast from B to D
D* ddp= dynamic_cast<D*>(bdp); // ok: bdp points to an object of type D
std::cout << "Dynamic downcast of bbp should fail and pointer should be 0, it is: " << dbp << '\n';
std::cout << "Dynamic downcast of bdp should succeed and pointer should not be 0, it is: " << ddp << '\n';

The programmer can check whether the pointer is zero and react to the failed downcast. Likewise, an incorrect down-cast of a reference throws an exception of type std::bad_cast, which can be handled in a try-catch block.

In contrast, static_cast only verifies that the target type is a derived class of the source type — respectively references or pointers thereof — or vice versa:

2 TODO: picture


192 CHAPTER 6. INHERITANCE<br />

dbp= static cast(bbp); // erroneous downcast per<strong>for</strong>med<br />

ddp= static cast(bdp); // correct downcast but not checked by the system<br />

std::cout << "Erroneous downcast of bbp will not return 0, it is: " << dbp << '\n';<br />

std::cout << "Correct downcast of bdp but not checked at run-time, it is: " << ddp << '\n';<br />

Whether the referred object really allows for the downcast cannot be decided at compile time<br />

and is the responsibility of the programmer.<br />

Cross-casting<br />

An interesting feature of dynamic_cast is casting across from B to C when the referred object's<br />

type is a derived class of both types:<br />

C* cdp= dynamic_cast<C*>(bdp); // cross-cast from B to C ok: bdp points to an object of type D<br />

std::cout << "Dynamic cross-cast of bdp should succeed and pointer should not be 0, it is: " << cdp << '\n';<br />

Static cross-casting from B to C:<br />

cdp= static_cast<C*>(bdp); // error: cross-cast from B to C does not compile<br />

is not possible because C is neither a base nor a derived class of B. It can, however, be cast indirectly via<br />

D:<br />

cdp= static_cast<C*>(static_cast<D*>(bdp)); // ok: cross-cast from B to C via D<br />

Again, it is the programmer's responsibility to ensure that the addressed object can really be<br />

cast this way.<br />

Comparing Static and Dynamic Cast<br />

Dynamic casting is safer but slower than static casting due to the run-time check of the referred<br />

object's type. Static casting allows casting up and down, with the programmer responsible for ensuring<br />

that the referred objects are handled correctly. Dynamic casting is in some sense always<br />

up, namely from the referred object's type to a super-type (including itself).<br />

Furthermore, dynamic casting can only be applied to polymorphic types, i.e. classes that<br />

define or inherit a virtual function. The following table summarizes the differences between the two<br />

forms of casting:<br />

                static_cast            dynamic_cast<br />

Applicability   all                    only polymorphic classes<br />

Cross-casting   no                     yes<br />

Run-time check  no                     yes<br />

Speed           no run-time overhead   overhead for checking<br />

Table 6.1: Static vs. dynamic cast



6.4.2 Const Cast<br />

const_cast adds or removes the attributes const and/or volatile. The keyword volatile informs the<br />

compiler that a variable can be modified from outside the program's control. It is therefore not held or cached<br />

in registers but read from memory on each access. This feature is not used in this script. Adding<br />

an attribute is an implicit conversion in C++. That is, one can always assign an expression<br />

to a variable of the same type with extra attributes without the need for a cast. Removing<br />

an attribute requires a const_cast and should only be done when unavoidable, e.g. to interface with<br />

old-style software that is lacking appropriate const attributes.<br />

6.4.3 Reinterpretation Cast<br />

This is the most aggressive form of casting and is not used in this script. It takes an address or an<br />

object's memory location and interprets the bits there as if they were of the target type. One can for<br />

instance change a single bit in a floating-point number by casting it to a bit chain. It is more<br />

important for programming hardware drivers than complex flux solvers. Needless to say,<br />

reinterpret_cast is one of the most efficient ways to undermine the portability of an application.<br />

6.5 Barton-Nackman Trick<br />

This section describes the ‘Curiously Recurring Template Pattern’ (CRTP). It was introduced<br />

by John Barton and Lee Nackman [?] and is therefore also referred to as the ‘Barton-<br />

Nackman Trick’.<br />

6.5.1 A Simple Example<br />

⇒ crtp_simple_example.cpp<br />

We will explain this with a simple example. Assume we have a class point with an equality<br />

operator:<br />

class point<br />

{<br />

public:<br />

point(int x, int y) : x(x), y(y) {}<br />

bool operator==(const point& that) const { return x == that.x && y == that.y; }<br />

private:<br />

int x, y;<br />

};<br />

We can program the inequality operator by using common sense or by applying de Morgan's law:<br />

bool operator!=(const point& that) const { return x != that.x || y != that.y; }<br />

Or we can simplify our lives and just negate the result of the equality:<br />

bool operator!=(const point& that) const { return !(*this == that); }



Our compilers are so sophisticated that they certainly handle de Morgan's law perfectly. Negating<br />

the equality operator is something we can do on every type that has an equality operator. We<br />

could copy-and-paste this code snippet and just replace the type of the argument.<br />

Alternatively, we can write a class like this:<br />

template <typename T><br />

struct unequality<br />

{<br />

bool operator!=(const T& that) const { return !(static_cast<const T&>(*this) == that); }<br />

};<br />

and derive from it:<br />

class point : public unequality<point> { ... };<br />

This mutual dependency:<br />

• One class is derived from the other and<br />

• The latter takes the derived class’ type as template argument<br />

is somewhat confusing at first sight.<br />

Essential for this to work is that the code of a template class member is only generated when<br />

the class is instantiated and the function is actually called. At the time the template class<br />

unequality is parsed, the compiler only checks the correctness of the syntax.<br />

When we write<br />

int main (int argc, char* argv[])<br />

{<br />

point p1(3, 4), p2(3, 5);<br />

std::cout << "p1 != p2 is " << (p1 != p2 ? "true" : "false") << '\n';<br />

return 0;<br />

}<br />

after the definition of unequality and point, both types are completely known to the compiler.<br />

What happens when we call p1 != p2?<br />

1. The compiler searches for operator!= in class point → without success.<br />

2. The compiler looks for operator!= in the base class unequality<point> → with success.<br />

3. The this pointer of unequality<point> refers to the base-class sub-object within the point object.<br />

4. Both types are completely known and we can statically down-cast the this pointer to point.<br />

5. Since we know that the this pointer of unequality<point> is an up-casted this pointer to<br />

point 3 we are safe to down-cast it to its original type.<br />

6. The equality operator for point is called. Its implementation is already known at this point<br />

because the code of unequality's operator!= is not generated before the instantiation<br />

of point.<br />

3 Unless the first argument is really of type unequality. There are also ways to impede this, e.g.<br />

http://en.wikipedia.org/wiki/Barton-Nackman_trick, but we used this unary operator notation for the sake of<br />

simplicity.<br />



Likewise, every class U with an equality operator can be derived from unequality<U>. A collection<br />

of such CRTP templates for operator defaults is provided by Boost.Operators by Jeremy<br />

Siek and David Abrahams.<br />

As an alternative to the above implementation, where the this pointer is dereferenced and cast as a<br />

reference, one can cast the pointer first and dereference it afterwards:<br />

template <typename T><br />

struct unequality<br />

{<br />

bool operator!=(const T& that) const { return !(*static_cast<const T*>(this) == that); }<br />

};<br />

There is no difference, this is just a question of taste.<br />

6.5.2 A Reusable Access Operator<br />

⇒ matrix_crtp_example.cpp<br />

We still owe the reader the reusable implementation of the matrix bracket operator promised<br />

in Section 3.7.4. Back then we did not know enough language features.<br />

First of all, we had no templates, which are indispensable for a proxy. We will show you why.<br />

Say we have a matrix class as in § 3.7.4 and we just want to call the binary operator() from the<br />

unary operator[] via a proxy:<br />

class matrix; // Forward declaration<br />

class simple_bracket_proxy<br />

{<br />

public:<br />

simple_bracket_proxy(matrix& A, int r) : A(A), r(r) {}<br />

double& operator[](int c){ return A(r, c); }<br />

private:<br />

matrix& A;<br />

int r;<br />

};<br />

class matrix<br />

{<br />

// ...<br />

double& operator()(int r, int c) { ... }<br />

simple_bracket_proxy operator[](int r)<br />

{<br />

return simple_bracket_proxy(*this, r);<br />

}<br />

};<br />

This does not compile because operator[] from simple_bracket_proxy calls operator() from matrix, which<br />

is not defined yet. The forward declaration of matrix is not sufficient because we need the<br />

complete definition of matrix, not only the assertion that the type exists. Vice versa, if we define<br />

matrix first, we would miss the constructor of simple_bracket_proxy in the operator[] implementation.<br />



Another disadvantage of the implementation above is that we would need another proxy for<br />

constant access.<br />

This is an interesting aspect of templates. They do not only enable writing type-parametric software<br />

but can also help to break mutual dependencies thanks to their postponed code generation.<br />

By templatizing the proxy, the dependency is gone:<br />

template <typename Matrix, typename Result><br />

class bracket_proxy<br />

{<br />

public:<br />

bracket_proxy(Matrix& A, int r) : A(A), r(r) {}<br />

Result& operator[](int c){ return A(r, c); }<br />

private:<br />

Matrix& A;<br />

int r;<br />

};<br />

class matrix<br />

{<br />

// ...<br />

bracket_proxy<matrix, double> operator[](int r)<br />

{<br />

return bracket_proxy<matrix, double>(*this, r);<br />

}<br />

};<br />

With this implementation, we can now write A[i][j] and it is realized by the binary operator(),<br />

however that is implemented. Such a bracket operator is useful in every matrix class and the<br />

implementation will always be the same.<br />

For this reason, we would like to have this implementation only once in our code base and reuse it<br />

wherever appropriate. The only way to achieve this is with the CRTP paradigm:<br />

template <typename Matrix, typename Result><br />

class bracket_proxy<br />

{<br />

public:<br />

bracket_proxy(Matrix& A, int r) : A(A), r(r) {}<br />

Result& operator[](int c){ return A(r, c); }<br />

private:<br />

Matrix& A;<br />

int r;<br />

};<br />

template <typename Matrix, typename Result><br />

class crtp_matrix<br />

{<br />

public:<br />

bracket_proxy<Matrix, Result> operator[](int r)<br />

{<br />

return bracket_proxy<Matrix, Result>(static_cast<Matrix&>(*this), r);<br />

}<br />

bracket_proxy<const Matrix, const Result> operator[](int r) const<br />

{<br />

return bracket_proxy<const Matrix, const Result>(static_cast<const Matrix&>(*this), r);<br />

}<br />

};<br />

class matrix : public crtp_matrix<matrix, double><br />

{<br />

// ...<br />

};<br />

Once we have such a CRTP class, we can provide a bracket operator for every matrix class with a<br />

binary application operator. In a full-fledged linear algebra package, one needs to pay attention to<br />

which matrices return references and which are mutable, but the approach is as described above.<br />

Several timings have shown that the indirection with the proxy did not create run-time overhead<br />

compared to the direct usage of the binary access operator. Apparently, the compilers optimized<br />

the creation of proxies away in the executables.




Chapter 7<br />

Effective Programming: The Polymorphic Way<br />

Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove<br />

it.<br />

—Alan Perlis<br />

To remove complexity in scientific application development (but not only there), several programming<br />

techniques, methods, and paradigms have to be applied accordingly. This<br />

depends not only on the ability to combine application-specific functionality with library code<br />

from a variety of sources but also on restricting the amount of application-specific glue code.<br />

So libraries must remain open for extension but closed for modification, which can be attributed<br />

to a technique called polymorphic programming.<br />

The sections presented in this book introduced important mechanisms to successfully develop<br />

scientific applications, such as C++ basics, encapsulation, generic and meta-programming as<br />

well as inheritance. An important part of scientific computing, matrix containers and matrix<br />

algorithms, has been presented to illustrate the topics so far. Effective programming is then possible if<br />

these mechanisms are not viewed as separate entities, but as different means to achieve<br />

important goals, such as<br />

• uncompromising efficiency of simple basic operations (e.g., array subscripting should not<br />

incur the cost of a function call),<br />

• type-safety (e.g., an object from a container should be usable without explicit or implicit<br />

type conversion),<br />

• code reuse and extensibility,<br />

all with their respective advantages and disadvantages. This chapter reviews important techniques<br />

to achieve polymorphism from a more general point of view and highlights a basic but<br />

very important recurring principle for scientific computing: code reusability. This is not mainly<br />

because programmers are lazy people, but also because applications have to be tested. For<br />

the field of scientific applications this is particularly important due to large parameter sets,<br />

changing boundary and initial conditions, as well as long run times of simulation codes. Hence<br />

it should not be underestimated how much time and effort can be saved if already tested code<br />

can be used as a starting point or reference. So code reusability is not only about programming<br />

less, but also about improving code quality. Most of the techniques presented and discussed so<br />




far already deal with some kind of code reusability, but mostly in an implicit way. The following<br />

sections give an overview of polymorphic mechanisms in a more explicit way.<br />

As soon as code reusability is covered, almost equal importance is placed on code extensibility,<br />

which should not be constrained by reused code. Scientific code development is always<br />

driven by transforming newly developed scientific methods into executable code. Various programming<br />

techniques with different scopes are therefore mandatory. If programming techniques<br />

are analyzed this way, it becomes understandable why some of the presented programming<br />

paradigms are not ideally suited to accomplish code reusability and extensibility together (e.g.,<br />

the object-oriented inheritance model).<br />

No technique, or more generally paradigm, will result in the ultimate and final solution, but<br />

each of the techniques provides tools to manage the complexity of a problem; it does not guarantee<br />

the ability to do so. A bad problem specification will lead to a bad solution independently<br />

of the technique or paradigm used for the implementation.<br />

The usage of the Boost Graph Library (BGL) is an excellent example. There is great diversity<br />

of requirements in the field of graph algorithms and data structures. Even so, the performance<br />

demands on a library like this are very high. Nevertheless, it was possible to implement all necessary<br />

functionality at a high performance level. More than this, the library can be extended greatly<br />

in many different ways. On the other hand, this library is not easy to use or extend without<br />

an understanding of the underlying techniques.<br />

And as a reminder: the main goal of this book is how to write good scientific software.<br />



7.1 Imperative Programming<br />

Imperative programming may be viewed as the very bones on which all other abstractions<br />

depend. This programming paradigm uses a sequence of instructions which act on a state to<br />

realize algorithms. Thus it is always specified in detail what and how to execute next. The<br />

modification of the program state, while convenient, is also an issue: as the size of<br />

the program increases, unintended modifications of the state become an increasing problem. In order<br />

to address this issue, the imperative programming method has been refined into procedural and<br />

structured programming paradigms, which attempt to provide more control over the modifications<br />

of the program state. Hence it is based upon organized procedure calls. Procedures, also<br />

known as routines, subroutines, methods, or functions, simply contain a series of computational<br />

steps to be carried out. Any given procedure might be called at any point during a program's<br />

execution, including from other procedures or itself. A function consists of:<br />

• The return type of the function: A function returns a value of this type to its caller.<br />

C and C++, which do not provide procedures explicitly, use the keyword void to<br />

indicate that a function does not return a value.<br />

• The name of the function: By this name the function can be called. The name should be as<br />

expressive as possible. Never underestimate the value of meaningful names.<br />

• The parameter list of the function: The parameters of a function serve as placeholders<br />

for values that are supplied later by the user during each invocation of the function. A<br />

function can have an empty parameter list. The values of the parameter list can be passed<br />

by value or by reference.<br />

• The body of the function: The body of a function implements the logic of the operation.<br />

Typically, it manipulates the named parameters of the function.<br />

The advantages of this paradigm are:<br />

• Few techniques<br />

• Rapid prototyping <strong>for</strong> easy problems<br />

• Functions can be put into a library<br />

• Fast compilation<br />

The disadvantages of this paradigm are:<br />

• Test effort is high<br />

• Sources of error are manifold<br />

• Non-trivial problems require a large programming effort<br />

• No user defined data types<br />

• No locality of data<br />

• Only very few and simple functions can be put into a library<br />

Even in the refined form of procedural programming, the incurred overhead can be limited to a<br />

bare minimum, as the level of abstraction is relatively low. This was well suited to the situation<br />


202 CHAPTER 7. EFFECTIVE PROGRAMMING: THE POLYMORPHIC WAY<br />

of scarce computing resources and a lack of mature and powerful tools. Under these circumstances<br />

the overall performance, in terms of execution speed or memory consumption, is solely<br />

dependent on the skill and ingenuity of the programmer and has resulted in the almost mythical<br />

"hand-optimized" code. However, to achieve the desired specifications in such a fashion, the<br />

clarity and readability, and thereby the maintainability, of the code were sacrificed. Furthermore,<br />

the low level of abstraction also hinders portability, as different architectures favour different<br />

assumptions to produce efficient execution. To address this effect, implementations were duplicated<br />

in order to optimize for different architectures and platforms, which of course makes a<br />

mockery of goals such as code reusability or even extensibility.<br />

This paradigm and the derived techniques are then used differently in Section 2.11, where<br />

generic programming is used to offer an efficient approach for matrix operations.<br />


7.2. GENERIC PROGRAMMING 203<br />

7.2 Generic Programming<br />

Generic programming may be viewed as having been developed in order to further facilitate<br />

the goals of code reusability and extensibility. From a general view the generic programming<br />

paradigm is about generalizing software components so that they can be directly reused easily<br />

in a wide variety of situations. While these are among the goals which led to the development<br />

of object-oriented programming, the realization may vary quite profoundly. A major<br />

distinction from object-oriented programming, which is focused on data structures and their<br />

states, is that generic programming especially allows for a very abstract and orthogonal description of algorithms.<br />

To achieve this kind of generalization, a separation of the basic tools of programming is important:<br />

algorithms, containers (data structures), and the glue between them (so-called iterators<br />

or, more generally, traversors). In keeping with an important part of effective programming,<br />

the minimization of glue code, iterators and traversal objects operate as a minimal but fully<br />

abstract interface between data structures and algorithms.<br />

While the desired functionality is often implemented using static polymorphism mechanisms,<br />

such as templates in C++, generic programming should not be equated with simply programming<br />

with templates. However, when generic programming is realized using purely compile-<br />

time facilities such as static polymorphism, not only is the implementation effort reduced but the<br />

resulting run-time performance is also optimized.<br />

In the following, the process of generic programming is illustrated by elevating a procedural code to<br />

a generic one, simultaneously fulfilling the important goals of effective programming (efficiency,<br />

type-safety, code reuse):<br />

• Algorithm: Generic algorithms are generic in two ways. First, the data type which they<br />

are operating on is arbitrary and second, the type of container within which the elements<br />

are held is arbitrary.<br />

To get in touch with the generic approach, a generalization of the memcpy() function of the<br />

C standard library is discussed. An implementation of memcpy() might look somewhat<br />

like the following:<br />

void* memcpy(void* region1, const void* region2, size_t n)<br />

{<br />

const char* first = (const char*) region2;<br />

const char* last = ((const char*) region2) + n;<br />

char* result = (char*) region1;<br />

while (first != last)<br />

*result++ = *first++;<br />

return result;<br />

}<br />

The memcpy() function is already generalized to some extent by the use of void* so that<br />

the function can be used to copy arrays of different kinds of data.<br />

Looking at the body of memcpy(), the function’s minimal requirements are that it needs to<br />

traverse the sequence using some sort of pointer, access the elements pointed to, copy the<br />

elements to the destination, and compare pointers to know when to stop. The memcpy()<br />

function can then be written in a generic manner:<br />

template <typename InputIterator, typename OutputIterator><br />

OutputIterator<br />

copy(InputIterator first, InputIterator last, OutputIterator result)<br />

{<br />

while (first != last)<br />

*result++ = *first++;<br />

return result;<br />

}<br />

With this code, the same functionality as the memcpy() from the C library is achieved.<br />

All kinds of data structures which offer a begin() and end() iterator can be used.<br />

• Container: An abstraction over all kinds of data structures which can store other data<br />

types.<br />

• Iterator: This is the glue between the containers and the algorithms. First, it separates<br />

the usage of data structures and algorithms. Second, it provides a concept hierarchy for<br />

all kinds of traversal within data structures.<br />

This type of genericity is called parametric polymorphism (see Section 7.5.2). Section 4.9<br />

introduced the Standard Template Library (STL). The STL solves many standard data structure<br />

and algorithmic problems. The STL is (or should be) the first choice in all code development<br />

steps.<br />

• Algorithm/Data-Structure Interoperability: First, each algorithm is written in a data-structure-<br />

neutral way, allowing a single template function to operate on many different<br />

classes of containers. The concept of an iterator is the key ingredient in this decoupling of<br />

algorithms and data structures. The impact of this technique is a reduction of the STL's<br />

code size from O(M*N) to O(M+N), where M is the number of algorithms and N is the<br />

number of containers. Considering a situation of 20 algorithms and 5 data structures,<br />

this makes the difference between writing 100 functions versus only 25 functions! And the<br />

difference grows faster as the number of algorithms and data structures increases.<br />

• Extension through Function Objects: The second way that the STL is generic is that its<br />

algorithms and containers are extensible. The user can adapt and customize the STL<br />

through the use of function objects. This flexibility is what makes the STL such a great<br />

tool <strong>for</strong> solving real-world problems. Each programming problem brings its own set of<br />

entities and interactions that must be modeled. Function objects provide a mechanism<br />

for extending the STL to handle the specifics of each problem domain.<br />

• Element Type Parametrization: The third way that STL is generic is that its containers<br />

are parametrized on the element type.<br />

Most people think that element type parametrization is the feature that makes the STL successful.<br />

This is perhaps the least interesting way in which the STL is generic. The interoperability with<br />

iterators and the extensibility by function objects are more important parts of the STL. But<br />

the essence is the programming with concepts. The programmer can write the data structures<br />

and algorithms, or in other words the concepts behind these, as they should be. In addition to these facts, the<br />

STL has proven that with the generic programming paradigm, high-performance computing<br />

can be accomplished on several different computer architectures as well.<br />

The advantages of this paradigm are:<br />

• Programming with concepts<br />

• Great number of available libraries



• Great extensibility<br />

• Great code reusability<br />

• Development of high-performance code<br />

• All other paradigms can be used<br />

• Concepts can be checked by the compiler<br />

The disadvantages of this paradigm are:<br />

• Long compilation times: C++'s static type checking requires complete template<br />

instantiation and type checking.<br />

• Steep learning curve due to many complex techniques<br />

• Code bloat: Due to an incorrect usage of templates, the compiler can produce an excessive<br />

amount of code.



7.3 Programming with Objects<br />

Programming with objects may be viewed as an evolution from the structured imperative<br />

paradigm. On the one hand, it tries to address the issue of code reusability by providing a<br />

specific type of polymorphism, sub-typing. On the other hand, it addresses the issue of unchecked<br />

modification of state by enforcing data encapsulation, thus enforcing changes through defined<br />

interfaces. Both of these notions are attached to an entity called an object. Therefore an object<br />

serves as a self-contained unit which interacts with the environment via messages. It thus<br />

accomplishes a decoupling of the internal implementation within the object from the interaction<br />

with the surrounding environment, thus enforcing (clean) interfaces, which is essential for<br />

effective programming. The algorithms are expressed much more by the notion of what is to be<br />

done as an interaction and modification of objects, where the details of how are encapsulated<br />

to a great extent within the objects themselves.<br />

Another benefit of programming with objects is that these entities can be placed in libraries.<br />

This saves the effort of continually rewriting the same code for every new program. Furthermore,<br />

because objects can be made polymorphic, object libraries offer the programmer more flexibility<br />

and functionality than subroutine libraries (their counterparts in the procedural paradigm).<br />

Technically, object libraries are quite feasible, and the advantages of extensibility can be significant.<br />

However, the real challenge to making code reusable is not technical. Rather, it is<br />

identifying functionality that other people both understand and want. People who use procedural<br />

languages have been writing and using subroutine libraries <strong>for</strong> decades. These libraries are<br />

most successful when they per<strong>for</strong>m simple, clearly defined functions, such as calculating square<br />

roots or computing trigonometric functions. An object library can provide complex functions<br />

more easily than a subroutine library. However, unless those functions are clearly defined, well<br />

understood and generally useful, the library is unlikely to be used widely.<br />

To give an intuitive specification of the programming approach with objects, the following list<br />

describes the key concepts of the object world:<br />

• Identity is the quantization of data in discrete, distinguishable entities called objects<br />

• Classification is the grouping of objects with the same structure and behavior into classes<br />

• Polymorphism is the differentiation of behavior of the same operation on different classes<br />

• Inheritance is the sharing of structure and behavior among classes in a hierarchical relationship<br />

But one of the biggest problems of this programming approach is the interaction of objects with<br />

algorithms. The problem can easily be seen using the example of a simple sorting algorithm.<br />

Should the algorithm be placed into the object? Should an algorithm work on a class hierarchy<br />

with a common interface?<br />

The problem cannot be solved easily within this paradigm. A possible solution is some kind of<br />

polymorphism, which is explained in Section 7.5.2.<br />

7.3.1 Object-Based Programming<br />

In languages which support identity and classification the object-based paradigm can be used<br />

efficiently.



The advantages of this paradigm are:<br />

• User-defined data structures with data locality: programming can be more intuitive<br />

compared to the procedural paradigm, and algorithms can be put into a library<br />

• Library code can be tested independently<br />

• Fast compilation, though it may be slower than with the procedural paradigm<br />

The disadvantages of this paradigm are:<br />

• Runtime performance<br />

• Library/code reusability<br />

7.3.2 Object-Oriented Programming

To overcome the mentioned problem of code reusability, inheritance and polymorphism were introduced¹. Inheritance is deployed with the aim of reducing implementation effort by allowing refinement of already existing objects. Inheritance and the subtyping connected with it also make polymorphic programming available at run time:

• Inheritance allows us to group classes into families of related types that share common operations and data. Already existing code can thus be reused.

• Polymorphism allows us to implement these families as a unit rather than as individual classes, giving us greater flexibility in adding or removing any particular class. This point is explained in more detail in Section 7.5.2, where this type of polymorphism is called subtyping polymorphism.

• Dynamic binding is a third aspect of object-oriented programming. The actual member function resolution is delayed until run time. With the combination of inheritance and (subtyping) polymorphism, a generic way of dealing with geometrical objects can be achieved.

While the concepts of object orientation have proved invaluable for the development of modular software, their limits also became apparent: the goal of general reusability suffers from the stringent limitations of the required subtyping. This may be viewed as a consequence of the fact that objects are not necessarily fit to accommodate the required abstractions, such as the algorithms themselves. Furthermore, the extension of existing code is often only possible by intrusive means, such as changing the already existing implementations, and thus does not deliver the high degree of effort reduction that was hoped for.

Compared to the run-time environment or compiler required to realize the simple imperative programming paradigm, the object-oriented paradigm requires more sophistication, as it needs to be able to handle run-time dispatches using virtual functions, for instance. Additionally, seemingly simple statements may hide the true complexity encapsulated within the objects. Thus, not only is the demand on the tools higher, but the programmer also needs to be aware of the implications of seemingly simple statements in order to achieve desirable levels of performance.

¹ If a language supports all these features (identity, classification, polymorphism, and inheritance), then the object-oriented paradigm is supported in this language.


208 CHAPTER 7. EFFECTIVE PROGRAMMING: THE POLYMORPHIC WAY<br />

Behind the Dynamic Polymorphism in C++

A programmer must be aware of the fact that inheritance is one of the strongest bonds between objects. In real-world examples, few problems can be modeled successfully by class inheritance alone. Coupling by inheritance should therefore be used very carefully.

The advantages of this paradigm are:

• Library: data types can be enhanced greatly.

• Abstract algorithms with polymorphism enable greater code reusability compared to the procedural paradigm.

• Strong binding of data structures and methods: logical connections can be modeled easily, and logical errors can be detected easily.

The disadvantages of this paradigm are:

• The binary-method problem (see Section 7.5.3)

• Poor optimization opportunities for the compiler due to subtyping polymorphism (see Section ??)

• Strong binding of data structures and methods: only usable on object-oriented problems.



7.4 Functional Programming

In contrast to the procedural and object-oriented paradigms, which explicitly formulate algorithms and programs as a sequence of instructions acting on a program state, the functional paradigm uses mathematical functions for this task and forgoes the use of a state altogether. Therefore, there are no mutable variables and no side effects in purely functional programming. As such it is declarative in nature and relies on the language's environment to produce an imperative representation which can be run on a physical machine. Among the greatest strengths of the functional paradigm is the availability of a strong theoretical framework, the lambda calculus (cite()), which is explained in more detail in Section ref(), for the different implementations.

Higher-order functions are an important concept of functional programming, also because of their usability in procedural languages. They were studied in lambda calculus theory well before the notion of functional programming existed and pervade the design of a number of functional programming languages, such as Scheme and Haskell.

As modern procedural languages and their implementations have started to put greater emphasis on correctness rather than raw speed, and the implementations of functional languages have begun to emphasize speed as well as correctness, the performance of functional and procedural languages has begun to converge. For programs which spend most of their time doing numerical computations, some functional languages (such as OCaml and Clean) can approach the speed of programs written in C, while for programs that handle large matrices and multidimensional databases, array functional languages (such as J and K) are usually faster than most non-optimized C programs. Functional languages have long been criticized as resource-hungry, both in terms of CPU resources and memory. This was mainly due to two things:

• some early functional languages were implemented with no concern for efficiency;

• non-functional languages achieved speed at least in part by neglecting features such as bounds checking or garbage collection, which are viewed as essential parts of modern computing frameworks and represent an overhead that is built into functional languages by default.

Since a purely functional description is free of side effects, it is a favourable choice for parallelization, as the description does not contain a state which would require synchronization. Data-related dependencies, however, must still be considered in order to ensure correct operation. Since the declarative style connected to the functional paradigm distances itself from the traditional imperative paradigm and its connection to states, input and output operations pose a hurdle, which is often addressed in a manner that is not purely functional. As such, functional interdependencies may be specified trivially, while the details of how these are to be met remain opaque and are left as a choice to the specific implementation.

Last, we give an example of pure functional programming. We point out that the next code snippet is presented in Haskell, not in C++ syntax. The "hello world" program of the functional programming paradigm is the factorial calculation:

fac :: Integer -> Integer
fac 0 = 1
fac n | n > 0 = n * fac (n-1)
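For comparison, the same factorial can be written in C++ in a functional style. This is a minimal sketch, not code from the text: it is defined purely by recursion and a conditional expression, with no mutable state and no loop.

```cpp
#include <cassert>

// Direct transcription of the Haskell definition above: the base case
// fac 0 = 1, otherwise n * fac (n-1). No variable is ever modified.
inline long fac(long n)
{
    return n == 0 ? 1 : n * fac(n - 1);
}
```

The ternary operator plays the role of Haskell's pattern match with a guard; like the Haskell version, the C++ function is undefined for negative arguments.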



7.4.1 Lambda Calculus

As was presented in Section ref(), it is not very easy to reuse the STL standard function objects because their use is not very intuitive: either a function object for each loop or a binder has to be written. A binder (or binder object) is passed, at construction time, to another function object which performs an action. The binder takes the function object as well as a binding value and makes a binary function unary by fixing one parameter. However, this is not obvious at first. An easy way to implement such a functionality would be to write it as it reads:

std::for_each(vec.begin(), vec.end(), std::cout << *vec_iter);

Of course this cannot compile, for several reasons. First, the third argument is not a function object. Second, the variable vec_iter does not exist, nor does it know anything about the iterated container vec. Anyway, an expression like this is easy to write and less error-prone compared to a binder object. To enable a program like this, two things have to be accomplished: the output-stream operator << has to produce a function object, and a placeholder object that simply returns its argument is needed:

struct argument_1_function_object
{
    template <typename ArgumentT>
    ArgumentT
    operator()(ArgumentT arg)
    {
        return arg;
    }

    template <typename Argument1T, typename Argument2T>
    Argument1T
    operator()(Argument1T arg1, Argument2T arg2)
    {
        return arg1;
    }
};

So what does this object really do? It provides unary and binary bracket operators for one and two arguments which return the (first) argument passed. A function object is implemented next which stores an arbitrary stream type together with another function object.

template <typename StreamType, typename FunctionObjectT>
class output_function_object
{
  public:
    output_function_object(StreamType& stream, FunctionObjectT func)
      : stream(stream), func(func) {}

    template <typename ArgumentT>
    void operator()(ArgumentT arg)
    {
        stream << func(arg);
    }

  private:
    StreamType&     stream;   // streams are not copyable, so hold a reference
    FunctionObjectT func;
};

The only thing left to do now is to write an appropriate object generator in order to persuade the C++ syntax to accept something like the first line of code of this chapter.

template <typename StreamType, typename FunctionObjectT>
output_function_object<StreamType, FunctionObjectT>
operator<<(StreamType& stream, FunctionObjectT func)
{
    return output_function_object<StreamType, FunctionObjectT>(stream, func);
}

By using these objects it is almost possible to offer a convenient way to write the for_each code snippet presented above. The remaining adaptation is to use a so-called unnamed object instead of the dereferenced iterator:

argument_1_function_object arg1;
std::for_each(vec.begin(), vec.end(), std::cout << arg1);

By creating a collection of such functor objects² a functional programming style can be mimicked. As can then be observed, polymorphism, which has to be explicitly provided in the imperative world, comes naturally to the functional paradigm, as no specific assumptions about data types are required; only conceptual requirements need to be met.

² Instead of creating all of these functors again, the Boost Phoenix library or the C++ TR1 lambda library can be used.



7.5 From Monomorphic to Polymorphic Behavior

As presented in the last sections, each programming technique (or paradigm) offers different key benefits regarding effective programming. Imperative programming, the related procedural paradigm, and object-based programming are simple and require that all calls to an object or function have exactly the same typing as the signature. Thus, type checks and type constraints can be derived directly from the program text, but the effectiveness (genericity and applicability) is greatly reduced for real-world problems. This is in contrast to polymorphic code, which freely operates on abstract concept types. Polymorphic behavior enables the use of algorithms and data structures with several different types. Object-oriented, generic, and functional programming offer an additional mechanism which delays the actual type instantiation to a later evaluation point. Compared to the simple monomorphic way, the polymorphic mechanism is composed of a complex set of inference rules, because type information propagates between the object and function signature and the call signature in both directions.

In object-oriented programming, libraries typically specify that the types supplied to the library must be derived from a common abstract base class, providing implementations for a collection of pure virtual functions. The library knows only about the abstract base class interface, but can be extended to work with new user types derived from the abstract interface. That is, variability is achieved through differing implementations of the virtual functions in the derived classes. This is how object-oriented programming supports modules that are closed for modification, yet remain open for extension. One strength of this paradigm is its support for varying the types supplied to a module at runtime. Composability of modules is limited, however, since independently produced modules generally do not agree on common abstract interfaces from which supplied types must inherit. The paradigm of generic programming, pioneered by Stepanov, Musser, and their collaborators, is based on the principle of decomposing software into efficient components which make only minimal assumptions about other components, allowing maximum flexibility in composition. C++ libraries developed following the generic programming paradigm typically rely on templates for the parametric and ad-hoc polymorphism they offer. Composability is enhanced, as use of a library does not require inheriting from a particular abstract interface. Interfaces of library components are specified using concepts, collections of requirements analogous to, say, Haskell type classes. The key difference to abstract base classes and inheritance is that a type can be made to satisfy the constraints of a concept retroactively, independently of the definition of the type. Also, generic programming strives to make algorithms fully generic, while remaining as efficient as non-generic hand-written algorithms. Such an approach is not possible when the cost of any customization is a virtual function call.

The strength of polymorphism is that the same piece of code can operate on different types, even types that were not known at the time the code was written. Such applicability is the cornerstone of polymorphism because it amplifies the usefulness and reusability of code. If the types of polymorphism are analysed in more detail, two different main types can be observed:

• Ad-hoc polymorphism

• Universal polymorphism

Only the second type, universal polymorphism, is actually important for effective programming, whereas the first type, ad-hoc polymorphism, is rather a convenience.



7.5.1 Ad-hoc Polymorphism

This kind of polymorphic behavior is called ad-hoc to point out that the behavior is local. Common to its two variants (overloading and coercion) is the fact that the programmer has to specify exactly which types are to be usable with the polymorphic function.

Overloading

Overloading is a simple and convenient way of programming that eases the programmer's life:

class my_stack
{
    bool push(int ..) {}
    bool push(double ..) {}
    bool push(complex ..) {}
    int pop() {..}      // note: overloading on the return type alone,
    double pop() {..}   //       as sketched here, is not legal C++
    // ....
};

Coercion

Coercion is automatic type conversion. The following stack example can be used with all numerical data types which can be converted to double:

class my_stack
{
    bool push(double ..) {}
    double pop() {..}
    // ....
};
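A compilable sketch of both variants follows. Note one correction relative to the pseudocode above: overloading pop() on the return type alone is not legal C++, so only one pop() is provided here; the class and member names are illustrative.

```cpp
#include <cassert>
#include <vector>

// Overloading: the usable argument types are listed explicitly.
// Coercion additionally admits any type convertible to one of them.
class my_stack_demo
{
  public:
    void push(int v)    { data.push_back(v); }  // exact match for int
    void push(double v) { data.push_back(v); }  // exact match for double
    double pop()
    {
        double v = data.back();
        data.pop_back();
        return v;
    }
    bool empty() const { return data.empty(); }
  private:
    std::vector<double> data;   // everything is widened to double
};
```

Calling push(3) selects push(int) and push(2.5) selects push(double); pushing a short also selects push(int) via integral promotion, which is exactly the coercion described above.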



7.5.2 Universal Polymorphism

The universal in the title means that the kinds of expressing polymorphic behavior in this section are the most useful techniques to accomplish the desired behavior and should be used preferably:

• Dynamic polymorphism (subtyping)

• Static polymorphism (parametric)

Subtyping Polymorphism

In C++ the object-oriented paradigm implements subtyping polymorphism³ using sub-classing. The term dynamic polymorphism is often used for this type of polymorphism.

To introduce the applicability of this kind of polymorphism, an example from the topological area is given. Classes for different kinds of points are used, which should be comparable within their own set. Traversing through containers or data structures is a quite common task in generic programming. The next code snippet presents the base class for all kinds of vertices.

#include <iostream>

class topology { };

class vertex
{
  public:
    virtual ~vertex() {}   // virtual destructor, see the note box below
    virtual bool equal(const vertex* ve) const = 0;
};

If these vertex types have to be extended, only the new class with the corresponding equal method has to be implemented. The next code snippet presents two possible implementations for a vertex, which can be used in different topologies.

class structured_vertex : public vertex
{
  public:
    structured_vertex(int id, topology* topo) : id(id), topo(topo) {}

    virtual bool equal(const vertex* ve) const
    {
        const structured_vertex* sv = dynamic_cast<const structured_vertex*>(ve);
        return sv != 0 && id == sv->id && topo == sv->topo;   // null if ve has another type
    }

  protected:
    int       id;
    topology* topo;
};

³ Also called inclusion polymorphism.



class unstructured_vertex : public vertex
{
  public:
    unstructured_vertex(int handle, topology* topo, int segment)
      : handle(handle), segment(segment), topo(topo) {}

    virtual bool equal(const vertex* ve) const
    {
        const unstructured_vertex* sv = dynamic_cast<const unstructured_vertex*>(ve);
        return sv != 0 && handle == sv->handle && topo == sv->topo
                       && segment == sv->segment;
    }

  protected:
    int       handle;
    int       segment;
    topology* topo;
};

With this virtual class hierarchy, an algorithm which operates on all the different classes derived from vertex can be written. This is called an explicit interface.

void print_equal(const vertex* ve1, const vertex* ve2)
{
    std::cout << std::boolalpha << ve1->equal(ve2) << std::endl;
}

The next code lines present the generic behavior of the algorithm, which operates on both types derived from vertex.

int main()
{
    topology the_topo;
    vertex*  the_vertex1;
    vertex*  the_vertex2;

    // *** structured
    the_vertex1 = new structured_vertex(12, &the_topo);
    the_vertex2 = new structured_vertex(12, &the_topo);
    print_equal(the_vertex1, the_vertex2);
    delete the_vertex1; delete the_vertex2;   // safe thanks to the virtual destructor

    // *** unstructured
    the_vertex1 = new unstructured_vertex(12, &the_topo, 1);
    the_vertex2 = new unstructured_vertex(12, &the_topo, 2);
    print_equal(the_vertex1, the_vertex2);
    delete the_vertex1; delete the_vertex2;

    return 0;
}
As can be seen, polymorphic behavior can be achieved, but with major drawbacks. First, pointers or references to the objects have to be used, which eliminates the possibility for a compiler to optimize some parts of the code, e.g. by inlining. Second, a dynamic cast has to be used, which can fail at run time (yielding a null pointer for pointer casts, or an exception for reference casts). This kind of problem is called the binary-method problem, which is explained in Section 7.5.3.

Nevertheless, dynamic polymorphism in C++ is best at:

• Uniform manipulation based on base/derived class relationships: different classes that hold a base/derived relationship can be treated uniformly.

• Static type checking: all types are checked statically in C++.

• Dynamic binding and separate compilation: code that uses classes in a hierarchy can be compiled apart from the code of the entire hierarchy. This is possible because of the indirection that pointers provide (both to objects and to functions).

• Binary interfacing: modules can be linked either statically or dynamically, as long as the linked modules lay out the virtual tables in the same way.

Behind the Dynamic Polymorphism in C++

How virtual functions work:

• Normally, when the compiler sees a member function call, it simply inserts instructions calling the appropriate subroutine (as determined by the type of the pointer or reference).

• However, if the function is virtual, a member function call such as vc->foo() is replaced with the following: (*((vc->vtab)[0]))()

• The expression vc->vtab locates a special "secret" data member of the object pointed to by vc. This data member is automatically present in all objects with at least one virtual function. It points to a class-specific table of function pointers (known as the class's vtable).

• The expression (vc->vtab)[0] locates the first element of the class's vtable (the one corresponding to the first virtual function, foo()). That element is a function pointer to the appropriate foo() member function.

• Finally, the expression (*((vc->vtab)[0]))() dereferences the function pointer and calls the function.

• Special care must be taken with destructors in virtual class hierarchies. The base class does not know anything about the derived classes, so the base class destructor has to be declared virtual; the derived class destructors are then virtual, too.
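The destructor rule from the box can be demonstrated with a minimal, hypothetical hierarchy (the class names are illustrative): a counter records whether the derived destructor actually runs when deleting through a base pointer, which it does only because the base destructor is virtual.

```cpp
#include <cassert>

struct counted_base
{
    virtual ~counted_base() {}              // virtual: delete via base* is safe
    virtual int id() const { return 0; }
};

struct counted_derived : counted_base
{
    explicit counted_derived(int& counter) : counter(counter) {}
    ~counted_derived() { ++counter; }       // implicitly virtual via the base
    virtual int id() const { return 1; }    // dispatched through the vtable
    int& counter;
};

// Creates a derived object, deletes it through a base pointer, and
// reports how often the derived destructor ran (1 expected).
inline int deleted_count()
{
    int destroyed = 0;
    counted_base* b = new counted_derived(destroyed);
    delete b;           // runs ~counted_derived() because ~counted_base() is virtual
    return destroyed;
}
```

Removing `virtual` from `~counted_base()` would make the `delete b` above undefined behavior; in practice the derived destructor would typically be skipped.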



Parametric Polymorphism

Parametric polymorphism was the first type of polymorphism developed, first identified by Christopher Strachey in 1967. It was also the first type of polymorphism to appear in an actual programming language, ML, in 1976. It exists in C++, Standard ML, Haskell, and others. The term static polymorphism is also often used.

In C++, this type of polymorphism is available via templates and also lets a value have more than one type. Inside

template <typename T> double function(T param) {..}

param can have any type that can be substituted inside function to render compilable code. This is called an implicit interface, in contrast to a base class's explicit interface. It achieves the same goal of polymorphism, writing code that operates on multiple types, but in a very different way.

To tie in with the dynamic polymorphism example, the same vertex classes are now expressed for use with function templates:

#include <iostream>

class topology
{
    // ...
};

class structured_vertex
{
  public:
    structured_vertex(int id, topology* topo) : id(id), topo(topo) {}

    bool equal(const structured_vertex& ve) const
    {
        return id == ve.id && topo == ve.topo;
    }

  protected:
    int       id;
    topology* topo;
};

class unstructured_vertex
{
  public:
    unstructured_vertex(int handle, topology* topo, int segment)
      : handle(handle), segment(segment), topo(topo) {}

    bool equal(const unstructured_vertex& ve) const
    {
        return handle == ve.handle && topo == ve.topo && segment == ve.segment;
    }

  protected:
    int       handle;
    int       segment;
    topology* topo;
};

Here, no class hierarchy is required. It only has to be guaranteed that each data type provides an implementation of the required method. Below, print_equal() is written as a function template:

template <typename VertexType>
void print_equal(const VertexType& ve1, const VertexType& ve2)
{
    std::cout << std::boolalpha << ve1.equal(ve2) << std::endl;
}

In the code snippet below, the same polymorphic behavior can be seen as in the dynamic polymorphism example, but without the necessity of inheriting from a common base class.

int main()
{
    topology the_topo;

    // *** structured
    structured_vertex sv1(12, &the_topo);
    structured_vertex sv2(12, &the_topo);
    print_equal(sv1, sv2);

    // *** unstructured
    unstructured_vertex usv1(12, &the_topo, 1);
    unstructured_vertex usv2(12, &the_topo, 2);
    print_equal(usv1, usv2);

    return 0;
}

Without the pointer mechanism the compiler can easily optimize these lines, e.g. inline the code. Additionally, exceptions cannot occur at run time.

Due to its characteristics, static polymorphism in C++ is best at:

• Uniform manipulation based on syntactic and semantic interfaces: types that obey a syntactic and semantic interface can be treated uniformly.

• Static type checking: all types are checked statically.

• Static binding (which prevents separate compilation): all types are bound statically.

• Efficiency: compile-time evaluation and static binding allow optimizations and efficiencies not available with dynamic binding.

7.5.3 Comparison of Static and Dynamic Polymorphism

Here the main features of static and dynamic polymorphism are summarized:

• Virtual function calls are slower at run time than function templates: a virtual function call includes an extra pointer dereference to find the appropriate method in the virtual table. By itself, this overhead may not be significant. Significant slowdowns can nevertheless result in compiled code because the indirection may prevent an optimizing compiler from inlining the function and from applying subsequent optimizations to the surrounding code after inlining.

• Run-time dispatch versus compile-time dispatch: the run-time dispatch of virtual functions and inheritance is certainly one of the best features of object-oriented programming. For certain kinds of components, run-time dispatching is an absolute requirement: decisions need to be made based on information that is only available at run time. When this is the case, virtual functions and inheritance are needed. Templates do not offer run-time dispatching, but they do offer significant flexibility at compile time. In fact, if the dispatching can be performed at compile time, templates offer more flexibility than inheritance because they do not require the template argument types to inherit from some base class.

• Code size: virtual functions are small, templates are big: a common concern in template-based programs is code bloat, which typically results from naive use of templates. Carefully designed template components need not result in significantly larger executable size than their inheritance-based counterparts.

• The binary-method problem: a serious problem shows up when using inheritance and virtual functions to express operations that work on two or more objects.



Note

The binary-method problem is encountered when methods in which the receiver type and argument type should vary together, such as equality comparisons, must instead use a fixed formal parameter type to maintain type safety. The problem arises in mainstream object-oriented languages because only the receiver of a method call is used for run-time method selection, and so the argument must be assumed to have the most general possible type. Existing techniques to solve this problem require intricate coding patterns that are tedious and error-prone. The binary-method problem is a prototypical example of a larger class of problems where overriding methods require type information for their formal parameters. Another common example of this problem class is the implementation of event handling (e.g., for graphical user interfaces), where "callback methods" must respond to a variety of event types.
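The problem can be made concrete with a hypothetical shape hierarchy (the classes are illustrative, not from the text): because equal() must accept the most general type shape, every override needs a run-time dynamic_cast, and comparing objects of different concrete types is only detected at run time, never at compile time.

```cpp
#include <cassert>

class shape
{
  public:
    virtual ~shape() {}
    // The argument must have the most general type 'shape':
    virtual bool equal(const shape* other) const = 0;
};

class circle : public shape
{
  public:
    explicit circle(int r) : r(r) {}
    virtual bool equal(const shape* other) const
    {
        // The concrete type is only recoverable at run time:
        const circle* c = dynamic_cast<const circle*>(other);
        return c != 0 && c->r == r;
    }
  private:
    int r;
};

class square : public shape
{
  public:
    explicit square(int s) : s(s) {}
    virtual bool equal(const shape* other) const
    {
        const square* q = dynamic_cast<const square*>(other);
        return q != 0 && q->s == s;
    }
  private:
    int s;
};
```

With the static polymorphism of Section 7.5.2, `circle::equal(const circle&)` would simply take the precise type, and a mixed comparison would not even compile.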



7.6 Best of Both Worlds

The object-oriented programming paradigm offers mechanisms to write libraries that are open for extension, but it tends to impose intrusive interface requirements on the types that will be supplied to the library. The generic programming paradigm has seen much success in C++, partly due to the fact that libraries remain open to extension without imposing the need to intrusively inherit from particular abstract base classes. However, the static polymorphism that is a staple of programming with templates and overloads in C++ limits the applicability of generic programming in application domains where more dynamic polymorphism is required.

In combining elements of object-oriented programming with those of generic programming, we take generic programming as the starting point, retaining its central ideas. In particular, generic programming is built upon the notion of value types that are assignable and copy constructible. The behavior expected from value types reflects that of C++ built-in types, like int, double, and so forth. This generally assumes that types encapsulate their memory and resource management in their constructors, copy constructors, assignment operators, and destructors, so that objects can be copied, passed as parameters by copy, etc., without worrying about references to their resources becoming aliased or dangling. Value types simplify local reasoning about programs. Explicitly managing objects on the heap and using pass-by-reference as the parameter passing mode makes for complex object ownership management (and object lifetime management in languages that are not garbage collected). Instead, explicitly visible mechanisms, i.e. thin wrapper types like reference_wrapper in the (draft) C++ standard library, are used when sharing is desired.
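The reference_wrapper mentioned above became std::reference_wrapper in <functional> with C++11 (previously available via TR1 and Boost). A minimal sketch, contrasting plain value semantics with explicitly visible sharing; the helper name is illustrative:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Modifies first a value copy and then a reference_wrapper alias of a
// counter; returns the counter so the difference is observable.
inline int sharing_demo()
{
    int counter = 0;

    // Value semantics: the vector stores independent copies of 'counter'.
    std::vector<int> copies(3, counter);
    copies[0] = 42;                 // 'counter' is unaffected

    // Explicit sharing: the aliasing is visible in the type.
    std::reference_wrapper<int> alias = std::ref(counter);
    alias.get() = 7;                // writes through to 'counter'

    return counter;
}
```

The point is precisely the one made in the text: sharing is the exception and is spelled out in the type, while plain objects keep copy semantics and simplify local reasoning.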

.. more to come ..

7.6.1 Compile Time Container

7.6.2 Meta-Functions

7.6.3 Run-Time Concepts




Part II

Using C++


Chapter 8

Finite World of Computers

8.1 Mathematical Objects inside the Computer

First, the natural numbers N and the data types available in a programming language to represent them are introduced. The distinction between a number and its single digits, and their connection to the base used, is an important concept in computer science.

A number is represented by several single digits, with each digit being a factor <strong>for</strong> a corresponding<br />

power of the base. The number is only complete when both the base and all of the digits are<br />

known. To use an example, the digit sequence 123 is calculated with the corresponding base,<br />

e.g. base = 10:<br />

123_{10} = 1 · 10^2 + 2 · 10^1 + 3 · 10^0<br />

If the base is switched, e.g. to base = 4, then the following value is obtained:<br />

123_4 = 1 · 4^2 + 2 · 4^1 + 3 · 4^0 = 27_{10}<br />
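The positional rule translates directly into a few lines of C++ (a small helper of our own, not from the book, evaluating the digits with Horner's scheme):<br />

```cpp
#include <vector>

// Evaluate a digit sequence in positional notation.
// Digits are given most significant first, e.g. {1, 2, 3} for "123".
long positional_value(const std::vector<int>& digits, long base)
{
    long value = 0;
    for (int d : digits)
        value = value * base + d;   // Horner's scheme over the base
    return value;
}
```

For the examples above, positional_value({1, 2, 3}, 10) yields 123 and positional_value({1, 2, 3}, 4) yields 27.<br />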

One of the drawbacks of the representation of numbers within the computer is the fact that built-in types such as int and long use only a finite number of bits and are hence limited in their range (in declarations such as short int, the int may be omitted), e.g.:<br />
short int: -32768 ... +32767<br />
long int: -2147483648 ... +2147483647<br />
unsigned long int: 0 ... +4294967295<br />
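The exact bounds need not be memorized: std::numeric_limits reports them for every built-in type, and a small sketch of our own also shows that unsigned arithmetic wraps around at the upper bound:<br />

```cpp
#include <limits>

// One step beyond the largest representable unsigned int:
// unsigned overflow is well defined and wraps around to 0.
unsigned int wrap_around()
{
    unsigned int u = std::numeric_limits<unsigned int>::max();
    return u + 1;
}
```

The same header reports, e.g., std::numeric_limits&lt;short&gt;::max() == 32767 on the common platforms assumed in the table above.<br />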

As can be seen, the maximum number of countable items is restricted. If, as an example, a program has to count the living humans on earth, we have to switch to another number concept, either floating point or a decimal data type. A plain and simple arbitrary-digit number container can be implemented by:<br />

class big_number {<br />
    long base;<br />
    std::vector<int> digits;<br />
public:<br />
    // .........<br />
};<br />
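A runnable miniature of this idea, as one possible sketch (the book's class is only outlined above; the digit type, member names, and operations are our choice), stores the digits least significant first and propagates carries after addition:<br />

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class big_number {
    long base;
    std::vector<int> digits;          // least significant digit first
public:
    big_number(long b, std::vector<int> d) : base(b), digits(std::move(d)) {}

    // Digit-wise addition with carry propagation; both numbers share the base.
    big_number operator+(const big_number& other) const
    {
        std::vector<int> result;
        int carry = 0;
        std::size_t n = std::max(digits.size(), other.digits.size());
        for (std::size_t i = 0; i < n || carry; ++i) {
            int s = carry;
            if (i < digits.size()) s += digits[i];
            if (i < other.digits.size()) s += other.digits[i];
            result.push_back(s % base);
            carry = s / base;
        }
        return big_number(base, result);
    }

    // Most significant digit first; only meaningful for base <= 10 here.
    std::string to_string() const
    {
        std::string s;
        for (auto it = digits.rbegin(); it != digits.rend(); ++it)
            s += std::to_string(*it);
        return s.empty() ? "0" : s;
    }
};
```

Adding 999 and 2 in base 10 then yields 1001, without any fixed upper bound on the number of digits.<br />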

8.2 More Numbers and Basic Structure<br />

Polynomials are an important and efficient tool in numerous fields of science. They can be defined as a weighted sum of exponential terms in at least one variable or expression, with the exponents restricted to non-negative whole numbers. Their simple definition, the simple rules for differentiation and integration, and the fact that their algebraic structure is closed not only under addition, subtraction, and multiplication but also under differentiation and integration result in their widespread application. Demanding additional properties, e.g. orthogonality with respect to an inner product, leads to special classes of polynomials, the orthogonal polynomials, which further increases their appeal in fields such as finite elements. A polynomial consists of coefficients (a_i) and a variable expression (x^i):<br />

a_0 x^0 + a_1 x^1 + a_2 x^2 + ... + a_n x^n<br />

Thus a container representation was chosen to store the coefficients, so that a generic C++ variable contains the expression:<br />

gsse::polynomial<br />

When storing the coefficients in a container great care has been taken to implement the library<br />

to be generic with respect to the type of the underlying data structure. In this way it is possible<br />

to use compile time containers if the size or even the concrete coefficients are already known at<br />

compile time. This allows the compiler to inline and execute operations at compile time.<br />

The most suitable container to use <strong>for</strong> the coefficients usually depends on the input and not<br />

the algorithms. It is there<strong>for</strong>e important to provide a basic set of programming utilities which<br />

are generic with regard to the used container type. Compile time and run time containers have<br />

a few incompatible requirements which make it hard to define a common set of utilities.<br />

8.2.1 Accessing Coefficients<br />

Accessing a polynomial's coefficients is an important operation. There are two basic ways of accessing a coefficient. Compile-time accessors are used when the index of the coefficient to be accessed is known at compile time, while run-time accessors have to be used otherwise. The compile-time version takes the index as a template parameter, while the run-time version takes it as a function argument.<br />

namespace compiletime {<br />
    template <long N, typename Polynomial><br />
    typename result_of::coeff<N, Polynomial>::type<br />
    coeff(Polynomial const& p);<br />
}<br />
namespace runtime {<br />
    template <typename Polynomial><br />
    typename result_of::coeff<Polynomial>::type<br />
    coeff(index_type n, Polynomial const& p);<br />
}<br />

Access to the coefficient is then available by:<br />

polynomial p;<br />
compiletime::coeff<0>(p);<br />
runtime::coeff(n, p);<br />

Thus it is possible for the compiler to simplify the code and to determine more information about the coefficient. The compile-time version is therefore also more flexible than the run-time version.<br />
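The distinction can be mimicked with standard C++ alone (a simplified stand-in for the library's accessors, not the GSSE interface itself): std::get takes the index as a template parameter and therefore works on heterogeneous tuples, while vector indexing takes a run-time argument and requires one common element type:<br />

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Compile-time access: the index is a template parameter,
// so each element may have a different type.
template <std::size_t N, typename... Ts>
auto ct_coeff(const std::tuple<Ts...>& coeffs)
{
    return std::get<N>(coeffs);
}

// Run-time access: the index is a function argument,
// so all elements must share one type.
double rt_coeff(std::size_t n, const std::vector<double>& coeffs)
{
    return coeffs[n];
}
```

For std::tuple there is no run-time std::get either: the return type would depend on a value known only at run time, which is exactly the limitation discussed next.<br />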

Using inhomogeneous compile time containers in conjunction with the run time accessor is not<br />

possible since it is not possible to determine the return type in advance. This reduces the<br />

flexibility of the code using the run time accessors. A workaround to this problem can be<br />

achieved by using the visitor pattern.<br />

template <typename Polynomial, typename Visitor><br />
void coeff_visitor(index_type n,<br />
                   Polynomial const& p,<br />
                   Visitor v);<br />

However, this approach has the disadvantage of being more complicated to use than the coeff<br />

function.<br />

The coefficient accessors are not simple wrappers around the accessors of the underlying container.<br />

They check the access and return a zero value if the container does not contain the<br />

coefficient. The zero value is determined by the coeff trait template class:<br />

template <typename CoeffType><br />
struct coeff_trait<br />
{<br />
    typedef CoeffType zero_type;<br />
    static zero_type const zero_value = zero_type();<br />
};<br />

By using partial template specialization it is possible to define the corresponding zero value for each type. For inhomogeneous polynomials, default_coeff is passed as CoeffType and the default behavior is to return an int.<br />

8.2.2 Setting Coefficients<br />

Coefficients may be set using the set_coeff function. It does not change the given polynomial but creates a new view instead, which gives the polynomial library a functional programming style. Setting the coefficients by actually changing the polynomial can only be achieved by directly manipulating the coefficient container.<br />


228 CHAPTER 8. FINITE WORLD OF COMPUTERS<br />

namespace compiletime<br />
{<br />
    template <long N, typename Polynomial, typename Coeff><br />
    typename result_of::set_coeff<N, Polynomial, Coeff>::type<br />
    set_coeff(Polynomial const& p, Coeff const& c);<br />
}<br />
namespace runtime<br />
{<br />
    template <typename Polynomial, typename Coeff><br />
    typename result_of::set_coeff<Polynomial, Coeff>::type<br />
    set_coeff(index_type n, Polynomial const& p, Coeff const& c);<br />
}<br />

Write access is then available by:<br />

polynomial p;<br />
compiletime::set_coeff<0>(p, 1);<br />
runtime::set_coeff(n, p, 1);<br />

The degree of a polynomial is defined as the maximum degree of all of its terms, where the degree of a term is given as the sum of the degrees of all variables in this term. The polynomial library defines the degree as the index of the highest non-zero coefficient. Obtaining the correct total degree therefore requires using one polynomial per variable and finally combining them:<br />

degree(3 x^4 y^2) = 4 + 2 = 6<br />

struct X;<br />
struct Y;<br />
typedef polynomial<<br />
    X,<br />
    fusion::map< pair< mpl::int_<4>, double > ><br />
> inner_poly;<br />
typedef polynomial<<br />
    Y,<br />
    fusion::map< pair< mpl::int_<2>, inner_poly > ><br />
> the_polynomial;<br />

By instantiating the polynomial the calculation of its degree is possible:<br />

the_polynomial p;<br />

assert( degree(p) == 6 );


8.2. MORE NUMBERS AND BASIC STRUCTURE 229<br />

8.2.3 Compile Time Programming<br />

This section presents an application of meta-programming that utilizes the compiler to execute code at compile time and to reduce the result of the expressions. As an example, the derivative of a second-degree polynomial is calculated and a second polynomial is added:<br />

d/dx (3 + 4.5 x + 10 x^2) + (1 + 2 x) = 5.5 + 22 x<br />

The type list gives the type of each coefficient, from the zeroth-degree up to the second-degree coefficient.<br />

struct X { } x;<br />
typedef fusion::vector<double, double, int> coeffs;<br />
typedef polynomial<X, coeffs> poly;<br />
poly p(x, coeffs(3.0, 4.5, 10));<br />
typedef result_of::diff<poly, X>::type diffed;<br />
diffed d = diff(p, x);<br />
poly q(x, coeffs(1.0, 2.0, 0));<br />
std::cout << coeff(q + d);<br />

Compiling the example and inspecting the generated assembler code reveals that the calculations were performed at compile time and that the binary only contains the final result of 22.<br />
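Modern C++ offers a more direct route to the same demonstration: with constexpr (a plain language feature, independent of the fusion machinery above; names are ours) the derivative-plus-addition example can be folded at compile time and verified with static_assert:<br />

```cpp
// d/dx(3 + 4.5 x + 10 x^2) = 4.5 + 20 x; adding (1 + 2 x) gives 5.5 + 22 x.
constexpr double diff_coeff1(double a2) { return 2.0 * a2; }  // derivative of a2 x^2
constexpr double result_coeff1 = diff_coeff1(10.0) + 2.0;     // 20 + 2

static_assert(result_coeff1 == 22.0, "computed and checked at compile time");
```

If the static_assert compiles, the value 22 was necessarily computed by the compiler, so nothing remains to be done at run time.<br />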

8.2.4 Arbitrary-Precision Arithmetic<br />

The application of the polynomial library to per<strong>for</strong>m arbitrary-precision arithmetic (or “bignum<br />

arithmetic”) is also presented here. It uses the fact that a number is in essence a polynomial<br />

with a fixed base.<br />

1372 = 1 · 10^3 + 3 · 10^2 + 7 · 10 + 2<br />

This can easily be translated into C++ code using the polynomial library. Note that the first element in the array is the zeroth coefficient:<br />
typedef unsigned char byte_t;<br />
typedef array<byte_t, 4> coeffs_t;<br />
coeffs_t coeffs = {{2, 7, 3, 1}};<br />
gsse::polynomial p(coeffs);<br />

Since computer systems usually operate on binary numbers, base 2 is the optimal choice. The difference between polynomial arithmetic and arbitrary-precision arithmetic is that the coefficients need to be realigned to the base (carry propagation) after each operation.<br />


230 CHAPTER 8. FINITE WORLD OF COMPUTERS<br />

8.2.5 Finite Element Integration<br />

In the theory of finite elements [?, ?], a continuous function space is projected onto a finite function space P_k, where P_k is the space of polynomials up to total order k.<br />

For many special cases, finite element integrals can be computed manually and added into<br />

the source code of an application. This results in excellent run time per<strong>for</strong>mance but lacks<br />

flexibility. For more general cases, e.g., general coefficients, they must be computed by numerical<br />

integration at run-time. To prevent an ill-conditioned system matrix, orthogonal polynomials<br />

have to be chosen as numerical integration weights. One possible choice is the normalized Legendre polynomials [?]. Such a polynomial P_k of order k can be evaluated efficiently using the recursion:<br />

P_0(x) = 1<br />
P_1(x) = x   (8.1)<br />
P_k(x) = ((2k − 1)/k) x P_{k−1}(x) − ((k − 1)/k) P_{k−2}(x),   k ≥ 2<br />
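The three-term recursion translates directly into a short loop (our own sketch, not the book's library code):<br />

```cpp
// Evaluate the Legendre polynomial P_k at x via the three-term recursion
//   P_j(x) = ((2j - 1)/j) x P_{j-1}(x) - ((j - 1)/j) P_{j-2}(x).
double legendre(int k, double x)
{
    if (k == 0) return 1.0;
    if (k == 1) return x;
    double pkm2 = 1.0, pkm1 = x, pk = x;
    for (int j = 2; j <= k; ++j) {
        pk = ((2.0 * j - 1.0) / j) * x * pkm1 - ((j - 1.0) / j) * pkm2;
        pkm2 = pkm1;
        pkm1 = pk;
    }
    return pk;
}
```

The loop needs only the last two values, so the evaluation runs in O(k) time and O(1) extra space; e.g. legendre(2, x) reproduces P_2(x) = (3x^2 − 1)/2.<br />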

To use arbitrary p-finite elements (arbitrary polynomial order [?, ?]), the numerical coefficients have to be either calculated manually and inserted into the source code or determined numerically at run time.<br />

The polynomial library presented here is then used to store manually pre-calculated integration tables at compile time (orders 1-5). If the user requires higher-order finite elements, numerical coefficients are calculated at run time to any order.<br />

8.3 A Loop and More<br />

One of the important concepts in computer science is repetition. A computer is made for exactly this: programmable operations and repetitions. To give a simple example, a for loop is expressed by:<br />
for (long i = 0; i < max_counter; ++i)<br />
{}<br />

To give a real application of this concept, integration is used.<br />

∫_a^b f(x) dx<br />

Several approximation schemes are also available:<br />

∫_a^b f(x) dx ≈ ((f(a) + f(b))/2) · (b − a)<br />

∫_a^b f(x) dx ≈ ((b − a)/6) · ( f(a) + 4 f((a + b)/2) + f(b) )<br />


8.4. THE OTHER WAY AROUND 231<br />

As can be seen, this is a very coarse approximation, but the main idea persists: the familiar continuous integration is not possible inside the computer, but numerical integration is. The infinitesimal dx is replaced by a finite ∆x, and the integral sign ∫ is replaced by a finite sum over i.<br />
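A minimal numerical integrator along these lines, here using the composite trapezoid rule (function and names are ours, not from the book):<br />

```cpp
#include <functional>

// Approximate the integral of f over [a, b] with n trapezoids of width dx.
double integrate(const std::function<double(double)>& f,
                 double a, double b, long n)
{
    double dx  = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));   // end points enter with weight 1/2
    for (long i = 1; i < n; ++i)
        sum += f(a + i * dx);
    return sum * dx;
}
```

For f(x) = x^2 on [0, 1] with n = 1000 the result is within about 2e-7 of the exact value 1/3; increasing n shrinks ∆x and thus the error.<br />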




Chapter 9<br />

How to Handle Physics on the Computer<br />

9.1 Finite Elements<br />

Discretization schemes lead in general to a linear system of equations:<br />

A x = f   (9.1)<br />

These matrices are typically:<br />
• sparse (there are only few non-zero elements per row)<br />
• of large dimension N (10^4 to 10^9 unknowns)<br />

A non-zero element A_{i,j} of the matrix represents a finite element connecting the degrees of freedom i and j.<br />
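Such matrices are typically stored in a compressed sparse format rather than as dense arrays; a minimal matrix-vector product in CRS (compressed row storage), as an illustrative sketch of our own:<br />

```cpp
#include <cstddef>
#include <vector>

// Minimal CRS (compressed row storage) sparse matrix-vector product y = A x.
// values/col_index hold the non-zeros row by row; row_start[i] marks where row i begins.
std::vector<double> crs_matvec(const std::vector<double>&      values,
                               const std::vector<std::size_t>& col_index,
                               const std::vector<std::size_t>& row_start,
                               const std::vector<double>&      x)
{
    std::vector<double> y(row_start.size() - 1, 0.0);
    for (std::size_t i = 0; i + 1 < row_start.size(); ++i)
        for (std::size_t k = row_start[i]; k < row_start[i + 1]; ++k)
            y[i] += values[k] * x[col_index[k]];
    return y;
}
```

Only the non-zeros are stored and touched, so one matrix-vector product costs O(nnz) instead of O(N^2) operations.<br />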

To demonstrate the transfer of a continuously formulated equation such as the Laplace or Poisson equation to the finite regime of a computer, a simple Dirichlet problem is used. If an implicit (uniform) 1D grid with n elements is used, the contribution of each element to the system matrix A is constant, the so-called stencil sub-matrix:<br />

A = ( 2 −1; −1 2 −1; ...; −1 2 −1; −1 2 ), a tridiagonal matrix of size (n − 1) × (n − 1).<br />

The corresponding system matrix of a 2D implicit grid of dimension N = (n − 1)^2 is:<br />



A = ( D −I; −I D −I; ...; −I D −I; −I D )<br />

with the (n − 1) × (n − 1) tridiagonal block<br />

D = ( 4 −1; −1 4 −1; ...; −1 4 −1; −1 4 )<br />

and the (n − 1) × (n − 1) identity matrix I.<br />

9.2 Again, Integrators<br />


Chapter 10<br />

Programming tools<br />

In this chapter we introduce programming tools that can be used to solve the exercises.<br />

10.1 GCC<br />

GCC stands for the GNU Compiler Collection. It is a collection of compilers (C, C++, Fortran, Fortran 90, Java) available free of charge [?]. The C++ compilers are very good and produce reasonably efficient code. In this section, we explain how to compile a C++ program.<br />

The following command:<br />

g++ -o hello hello.cpp<br />

compiles the <strong>C++</strong> source file hello.cpp into the executable hello.<br />

The compiler command is gcc or g++ with the following options.<br />

• -Idirectory: Include files directory<br />

• -O: Optimization<br />

• -g: Debugging<br />

• -p: Profiling<br />

• -o filename: output file name<br />

• -c: Compile, no link<br />

• -Ldirectory: Library directory<br />

• -lfile: Link with library libfile.a<br />

Here is another example:<br />

g++ -o foo foo.cpp -I/opt/include -L/opt/lib -lblas<br />

compiles and links the file foo.cpp using include files from /opt/include (option -I) and linking with the BLAS library situated in the directory /opt/lib (options -L and -l). For optimized code, we have to use the compilation options:<br />
-O3 -DNDEBUG<br />




The -DNDEBUG option sets the C preprocessor macro NDEBUG, which tells the assert command that debug tests should not be performed. This saves time at execution.<br />
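The effect is easy to see with a small example of our own: the assert below is checked in a normal build and disappears entirely when -DNDEBUG is given, so the precondition costs nothing in the optimized binary:<br />

```cpp
#include <cassert>
#include <cmath>

// Precondition check that vanishes under -DNDEBUG.
double checked_sqrt(double x)
{
    assert(x >= 0.0 && "checked_sqrt requires a non-negative argument");
    return std::sqrt(x);
}
```

Compiled without NDEBUG, checked_sqrt(-1.0) aborts with a diagnostic; with -O3 -DNDEBUG the check is removed and only the sqrt remains.<br />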

10.2 Debugging<br />

10.2.1 Debugging with text tools<br />

“And there you tell yourself it is over, for worse than this would be death. Just when you finally believe you are through, when there is no more, well, there is still more!”<br />
— Stromae<br />

There are several debugging tools. In general, graphical ones are more user friendly, but they<br />

are not always available. In this section, we describe the gdb debugger, which is very useful to<br />

trace the cause of a run time error if the code was compiled with the option -g.<br />

The following contains a printout of a gdb session of the program hello.cpp:<br />

#include <br />

#include <br />

int main() {<br />

glas::dense vector< int > x( 2 ) ;<br />

x(0) = 1 ; x(1) = 2 ;<br />

for (int i=0; i<3; ++i) std::cout << x(i) << std::endl ;<br />



T& glas::continuous_dense_vector<T>::operator()(ptrdiff_t) [with T = int]:<br />
Assertion `i < size_' failed.<br />



10.2.2 Debugging with graphical interface: DDD<br />

More convenient than debugging on a text level is using a graphical interface like DDD (Data<br />

Display Debugger). It has more or less the same functionality as gdb and in fact it runs gdb<br />

internally. One can also use it with another text debugger.<br />

As a case study, we use a modified example from Section 5.4.5. In fact, the buggy program arose while teaching § 5.4.5, i.e. one of the authors tried to reconstruct vector_unroll_example2.cpp on the fly.<br />

TODO: Find a better example. The above finally was okay, the tuning just did not change the<br />

run-time behaviour.<br />

In addition to the window above you will see a smaller one like in Figure 10.1, typically to the right of the large window if there is enough space on your screen.<br />
This control panel lets you steer through the debug session in a way that is easier for beginners and even more convenient for some advanced users. You have the following commands:<br />

Run Start or restart your program.<br />

Interrupt If your program does not terminate or does not reach the next<br />

break point you can stop it manually.<br />

Step Go one step <strong>for</strong>ward. If your position is a function call, jump into<br />

the function.<br />

Next Go to the next line in your source code. If you are located on a<br />

function call do not jump into it unless there is a break point set<br />

inside.<br />

Figure 10.1: DDD control panel<br />



Stepi and Nexti These are the equivalents at the instruction level. They are only needed for debugging assembler code and are not a subject of this book.<br />

Until Position your cursor in your source and run the program until it reaches this line. If the program flow does not pass this line, execution continues until the end, the next break point, or a bug.<br />

Finish Execute the remainder of the current function and stop in the first<br />

line outside this function, i.e. the line after the function call.<br />

Cont Continue your execution till the next event (break point, bug, or<br />

end).<br />

Kill Abort the program.<br />

Up Show the line of the current function’s call, i.e. go up one level in the<br />

call stack.<br />

Down Go back to the called function, i.e. go down one level in the call<br />

stack.<br />

Undo Revert last action (works rarely or never).<br />

Redo Repeat the last command.<br />

Edit Call an editor with the source file currently shown.<br />

Make Call ‘make’ (which must know what to compile).<br />

10.3 Valgrind<br />

The valgrind distribution offers several tools that you can use to analyze your software. We will only use one of these tools, called memcheck. For more information on the others we refer you to http://valgrind.org. Memcheck detects memory-management problems like memory leaks. Memcheck also reports if your program accesses memory it should not, or if it uses uninitialized values. All these errors are reported as soon as they occur, along with the source line number at which they occurred and a stack trace of the functions called to reach that line. You should also take into account that Memcheck runs programs about 10 to 30 times slower than normal. Use the following command to check the memory management of a program:<br />
valgrind --tool=memcheck program_name<br />

10.4 Gnuplot<br />

A useful tool <strong>for</strong> making plots is Gnuplot. It is a public domain program.<br />

Invoke gnuplot to start the program. Suppose we have the file results with the following<br />

content:



0 1<br />

0.25 0.968713<br />

0.75 0.740851<br />

1.25 0.401059<br />

1.75 0.0953422<br />

2.25 -0.110732<br />

2.75 -0.215106<br />

3.25 -0.237847<br />

3.75 -0.205626<br />

4.25 -0.145718<br />

4.75 -0.0807886<br />

5.25 -0.0256738<br />

5.75 0.0127226<br />

6.25 0.0335624<br />

6.75 0.0397399<br />

7.25 0.0358296<br />

7.75 0.0265507<br />

8.25 0.0158041<br />

8.75 0.00623965<br />

9.25 -0.000763948<br />

9.75 -0.00486465<br />


The first column represents the x coordinates and the second column contains the corresponding y coordinate values. We can plot this using the command:<br />

plot "results" w l<br />

The command<br />

plot "results"<br />

only plots stars, no line. The command help is also useful. For 3D plots, i.e. a table with three<br />

columns, we use the command splot.<br />

10.5 Unix and Linux<br />

Unix (and Linux) is not used as often as Windows, although for scientific programming it is a popular development platform. The Unix operating system is a command-line system with several graphical interfaces. Especially on Linux, the graphical interfaces are well developed, so that you get a Windows-like look and feel. Although you can easily browse through the directories, create new directories, and move data around with a few mouse clicks, it may be interesting to know at least a few Unix commands:<br />

• ps: list of my processes,<br />

• kill -9 id : kill the process with id id,



• top: list all processes and resource use,<br />

• mkdir: make a new directory,<br />

• rmdir: remove an (empty) directory,<br />

• pwd: name of the current directory,<br />

• cd dir: change directory to dir,<br />

• ls: list the files in the current directory<br />

• cp from to: copy the file from to the file or directory to. If the file to exists, it is overwritten, unless you use cp -i from to,<br />

• mv from to: move the file from to the file or directory to. If the file to exists, it is<br />

overwritten, unless you use mv -i from to,<br />

• rm files: remove all the files in the list files. rm * removes everything (be careful),<br />
• chmod mode files: change the access mode for files.<br />

See http://www.physics.wm.edu/unix_intro/outline.html <strong>for</strong> on-line help.




Chapter 11<br />

C++ Libraries for Scientific Computing<br />

TODO: Introducing words.<br />

11.1 GLAS: Generic Linear Algebra Software<br />

11.1.1 Introduction<br />

Software kernels for dense and sparse linear algebra have been developed over many decades. The development started with the BLAS [?] [?] [?] [?] [?] in FORTRAN and was later followed by similar work in C++; see MTL [?] and Blitz++, to name a few.<br />

Currently, more and more scientific software is written in <strong>C++</strong>, but the language does not<br />

provide us with dense and sparse vector and matrix concepts and algorithms, as this is the<br />

case <strong>for</strong> Matlab. This makes exchanging <strong>C++</strong> software harder than, <strong>for</strong> example, Fortran 90<br />

software, which has dense vector and matrix concepts defined in the language. Note that<br />

Fortran 90 does not have sparse and structured matrix types such as symmetric or upper<br />

triangular, or banded matrices.<br />

11.1.2 Goal<br />

The goal of the GLAS project is to open the discussion on standardization for C++ programming. The goal is not to present a standard as such, but the project may be a first step towards achieving this goal.<br />

We realize that this is very ambitious. We think the GLAS proposal meets the goals, but the internals are still rather complicated, which makes extensions less straightforward. GLAS is a generic software package using advanced meta-programming tools such as the Boost MPL, but this is invisible to a user who does not want to add extensions to GLAS. Some basic knowledge of template programming and expression templates is required for making proper use of the software.<br />

This version does not use Concept <strong>C++</strong>, since we have encountered instability problems with<br />

the Concept-GCC compiler and found it hard to work with expression-templates.<br />




We now briefly explain how the goals are met before entering a more detailed discussion of the software design.<br />

GLAS should be considered as an interface to other software for linear algebra, e.g. the BLAS, MTL, or other linear algebra packages. Such an interface is provided by the back-ends, whereas the syntax for using such back-ends does not change. For example, if we want to add a scaled vector to another vector (an axpy), then we write<br />
y += a * x ;<br />

but the implementation can use the BLAS (e.g. daxpy), MTL, or another package. We have provided a reference C++ implementation that illustrates how the expressions are dispatched to the actual implementation.<br />
The concepts mainly contain free functions and meta-functions, so that external objects can be used in GLAS provided these functions are specialized. As an exercise, we show how this can be done for an std::vector.<br />
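The free-function approach can be sketched in a few lines (names are illustrative and not GLAS's actual interface): a generic axpy is written only in terms of free access functions, which are then overloaded for std::vector so that the external type plugs in unchanged:<br />

```cpp
#include <cstddef>
#include <vector>

// Free functions that adapt std::vector to the generic algorithm below.
// An external type is supported by supplying the same two functions for it.
template <typename T>
std::size_t vec_size(const std::vector<T>& v) { return v.size(); }

template <typename T>
T& vec_at(std::vector<T>& v, std::size_t i) { return v[i]; }

template <typename T>
const T& vec_at(const std::vector<T>& v, std::size_t i) { return v[i]; }

// Generic axpy: y <- y + a*x, expressed only via the free functions.
template <typename Scalar, typename VecX, typename VecY>
void axpy(Scalar a, const VecX& x, VecY& y)
{
    for (std::size_t i = 0; i < vec_size(x); ++i)
        vec_at(y, i) += a * vec_at(x, i);
}
```

The algorithm never touches std::vector directly, which is the design point: the adaptation lives entirely in the overloaded free functions.<br />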

For more in<strong>for</strong>mation, see [?].<br />

11.1.3 Status<br />

GLAS is still under development. Currently, there are features for working with dense vectors and matrices, and with sparse matrices. There is support for the Boost.Sandbox.Bindings, and there are toolboxes for working with LAPACK, structured matrices (mase toolbox), and iterative methods (iterative toolbox).<br />

11.2 Boost<br />

Boost is a bit out of line in this chapter. Firstly, it is not a library itself but a whole collection<br />

of freely available C ++ libraries. Secondly, not all of the contained libraries deal directly with<br />

scientific computing. However, many of the “non-scientific” libraries provide useful functionality<br />

<strong>for</strong> scientific libraries and applications.<br />

Boost provides free portable <strong>C++</strong> libraries.<br />

Currently, the following Boost libraries are available that are useful <strong>for</strong> numerical software:<br />

• Data structures<br />

– tuple: pairs, triples, etc, e.g. tuple<br />

– smart ptr: smart pointers<br />

• Correctness and testing<br />

– static assert: compile time assertions<br />

• Template programming<br />

– enable if, mpl, type traits<br />

– static assert: compile time assertions



• Math and numerics<br />

– numeric::conversions: conversions of types<br />

– thread: multi-threading<br />

– bindings: generic bindings to external software<br />

– graph: graph programs<br />

– integer: integer types<br />

– interval: interval arithmetic<br />

– random: random number generator<br />

– rational: rational numbers<br />

– math: various mathematical things, e.g. greatest common divisor<br />

– typeof: type deduction<br />

– numeric::ublas: vector and matrix library<br />

– math::quaternion, math::octonian<br />

– math::special functions<br />

• Miscellaneous<br />

– filesystem: advanced operations on files, directories<br />

– program options: working with command line options in your<br />

– timer: timing class<br />

For more in<strong>for</strong>mation on these and other boost libraries see http://www.boost.org.<br />
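Several of these libraries later migrated into the C++ standard; boost::tuple, for instance, is the ancestor of std::tuple, so its flavor can be shown without Boost installed (a tiny example of our own):<br />

```cpp
#include <string>
#include <tuple>

// A (name, row, column) record as a tuple: heterogeneous types,
// fixed length, element access by compile-time index.
std::tuple<std::string, int, int> make_entry()
{
    return std::make_tuple("a", 2, 3);
}
```

std::get&lt;0&gt;(t) yields the string, std::get&lt;1&gt;(t) the row, and so on; the index must be known at compile time because each position has its own type.<br />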

11.3 Boost.Bindings<br />

Scientific programmers using <strong>C++</strong> also want to use the features offered by mature FORTRAN<br />

and C codes such as LAPACK [?], MUMPS [?] [?], SuperLU [?] and UMFPACK [?]. The<br />

programming effort for rewriting these codes in C++ is very high. It therefore makes more sense to link the codes into C++ code. Another argument for linking with external software is performance: the vendor-tuned BLAS functions are perhaps the most obvious example.<br />

In the traditional approach, an interface is developed <strong>for</strong> each basic <strong>C++</strong> linear algebra package<br />

and <strong>for</strong> each external linear algebra package. This is illustrated by Figure 11.1. The Boost<br />

bindings adopt the approach of orthogonality between algorithms and data. This orthogonality<br />

is created by traits classes that provide the necessary data to the external software. The vector<br />

traits, <strong>for</strong> example, provide a pointer (or address), size and stride, which can then be used<br />

by e.g. the BLAS function ddot. Each traits class is specialized <strong>for</strong> user defined vector and<br />

matrix packages. This implies that, <strong>for</strong> a new vector or matrix type, the development ef<strong>for</strong>t is<br />

limited to the specialization of the traits classes. Once the traits classes are specialized, BLAS<br />

and LAPACK can be used straightaway. For a new external software package, it is sufficient



[Figure 11.1 shows uBLAS, MTL, GLAS, ... each with its own pairwise interface to BLAS, LAPACK, ATLAS, MUMPS, ...]<br />

Figure 11.1: Traditional interfaces between software<br />

[Figure 11.2 shows uBLAS, MTL, GLAS, ... connected through a single Bindings layer to BLAS, LAPACK, ATLAS, MUMPS, ...]<br />

Figure 11.2: Concept of bindings as a generic layer between linear algebra algorithms and vector<br />

and matrix software<br />

to provide a layer that uses the bindings. Figure 11.2 illustrates this philosophy. Note the<br />

difference with Figure 11.1.<br />

11.3.1 Software bindings<br />

We now illustrate how the bindings can be used to interface external software by means of<br />

examples.<br />

BLAS bindings<br />

The BLAS are the Basic Linear Algebra Subprograms [?] [?] [?] [?] [?], whose reference implementation is available through Netlib¹. The BLAS are subdivided into three levels: level one contains vector operations, level two matrix-vector operations, and level three matrix-matrix operations.<br />

The BLAS bindings in Boost Sandbox contain interfaces to some BLAS functions. Functions<br />

are added on request. The interfaces check the input arguments using the assert command,<br />

which is only compiled when the NDEBUG compile flag is not set. The interfaces are contained<br />

in three files : blas1.hpp, blas2.hpp, and blas3.hpp in the directory boost/numeric/bindings/blas. The<br />

BLAS bindings reside in the namespace boost::numeric::bindings::blas.<br />

The BLAS provide functions for vectors and matrices with value type float, double, std::complex<float>, and std::complex<double>. All matrix containers have ordering type column_major_t, since the (FORTRAN) BLAS assume column-major matrices.<br />

The bindings are illustrated in Figure 11.3 for the BLAS subprograms DCOPY, DSCAL, and DAXPY for objects of type std::vector<double>. Note the include files for the bindings of the BLAS-1 subprograms and the include file that contains the specialization of the vector traits for std::vector.<br />

1 http://www.netlib.org<br />


#include <boost/numeric/bindings/blas/blas1.hpp><br />
#include <boost/numeric/bindings/traits/std_vector.hpp><br />
#include <vector><br />
<br />
namespace bindings = boost::numeric::bindings ;<br />
<br />
int main() {<br />
  std::vector< double > x( 10 ), y( 10 ) ;<br />
  // Fill the vector x<br />
  ...<br />
  bindings::blas::copy( x, y ) ;<br />
  bindings::blas::scal( 2.0, y ) ;<br />
  bindings::blas::axpy( -3.0, x, y ) ;<br />
  return 0 ;<br />
}<br />

Figure 11.3: Example for BLAS-1 bindings and std::vector bindings traits<br />
<br />
LAPACK bindings<br />

Software for dense and banded matrices is collected in LAPACK [?]. It is a collection of FORTRAN routines mainly for solving linear systems and eigenvalue problems, including the singular value decomposition. As for the BLAS, the Boost Sandbox does not contain a full set of interfaces to LAPACK routines, but only the most commonly used subprograms. On request, more functions are added to the library. The LAPACK bindings reside in the namespace boost::numeric::bindings::lapack.<br />

Many LAPACK subroutines require auxiliary arrays, which a non-expert user does not want to allocate manually. The interface therefore allows the user to allocate auxiliary vectors using the templated Boost.Bindings class array.<br />

The LAPACK bindings verify the matrix structure to see whether the routine is the right choice. It is also checked whether the matrix arguments are column-major. Every function's return type is int; the return value corresponds to the INFO argument of the underlying LAPACK subprogram.<br />

Figure 11.4 shows an example using GLAS.<br />

MUMPS bindings<br />

MUMPS stands for MUltifrontal Massively Parallel Solver. The first version was a result of the EU project PARASOL [?, ?, ?]. The software is developed in Fortran 90 and comes with a C interface. The input matrices should be given in coordinate format, i.e. the storage format is coordinate_t, and the index numbering should start from one, i.e. sparse_matrix_traits::index_base == 1. We refer to the MUMPS Users' Guide, distributed with the software [?].<br />

The C++ interface is a generic interface to the respective C structs for the different value types that are available from the MUMPS distribution: float, double, std::complex<float>, and std::complex<double>. The C++ bindings also contain functions that set the pointers and sizes of the parameters in the C struct using the bindings traits classes. An example is given in Figure 11.5. The sparse matrix is the uBLAS coordinate_matrix, which is a sparse matrix in



#include <...>                      // LAPACK gees binding<br />
#include <...>                      // GLAS bindings traits<br />
#include <...>                      // glas::dense_matrix<br />
#include <...>                      // glas::dense_vector<br />
#include <complex><br />
#include <iostream><br />
...<br />
int main () {<br />
  int n=100;<br />
  // Define a real n x n matrix<br />
  glas::dense_matrix< double > matrix( n, n ) ;<br />
  // Define a complex n vector<br />
  glas::dense_vector< std::complex<double> > eigval( n ) ;<br />
  // Fill the matrix<br />
  ...<br />
  // Call LAPACK routine DGEES for computing the eigenvalue Schur form.<br />
  // We create workspace for best performance.<br />
  bindings::lapack::gees( matrix, eigval, bindings::lapack::optimal_workspace() ) ;<br />
  ...<br />
}<br />

Figure 11.4: Example for LAPACK bindings and matrix bindings traits



coordinate format. The matrix is stored column-wise. The template argument 1 indicates that row and column numbers start from one, which is required for the Fortran 90 code MUMPS. Finally, the last argument indicates that the row and column indices are stored in type int, which is also a requirement for the Fortran 90 interface. The solve consists of three phases: (1) the analysis phase, which only needs the matrix's integer data, (2) the factorization phase, where the numerical values are also required, and (3) the solution phase (or backtransformation), where the right-hand side vector is passed in. The included files contain the specializations of the dense matrix and sparse matrix traits for uBLAS and the MUMPS bindings.<br />

11.4 Matrix Template Library<br />

11.5 Blitz++<br />

TODO: We can ask Todd to write something himself — Peter<br />

11.6 Graph Libraries<br />

TODO: Few introducing words from Peter<br />

11.6.1 Boost Graph Library<br />

TODO: I can write something about it — Peter<br />

11.6.2 LEDA<br />

LEDA implements advanced container types and combinatorial algorithms, especially graph algorithms. Containers are parameterized by element type and implementation strategies. In general, the algorithms work only with the data structures of the library itself.<br />

11.7 Geometric Libraries<br />

TODO: Few introducing words from René and Philipp<br />

11.7.1 CGAL<br />

TODO: Ask Sylvain to write something? Or can René and Philipp write it?<br />

CGAL implements generic classes and procedures for geometric computing. Its data structures operate at a very high level of complexity.



#include <...>                      // uBLAS coordinate_matrix<br />
#include <...>                      // uBLAS traits for the bindings<br />
#include <...>                      // MUMPS bindings<br />
<br />
int main() {<br />
  namespace ublas = boost::numeric::ublas ;<br />
  namespace mumps = boost::numeric::bindings::mumps ;<br />
  ...<br />
  typedef ublas::coordinate_matrix< double, ublas::column_major<br />
                                  , 1, ublas::unbounded_array<int><br />
                                  > sparse_matrix_type ;<br />
  sparse_matrix_type matrix( n, n, nnz ) ;<br />
  // Fill the sparse matrix<br />
  ...<br />
  mumps::mumps< sparse_matrix_type > mumps_solver ;<br />
  // Analysis (Set the pointer and sizes of the integer data of the matrix)<br />
  matrix_integer_data( mumps_solver, matrix ) ;<br />
  mumps_solver.job = 1 ;<br />
  mumps::driver( mumps_solver ) ;<br />
  // Factorization (Set the pointer for the values of the matrix)<br />
  matrix_value_data( mumps_solver, matrix ) ;<br />
  mumps_solver.job = 2 ;<br />
  mumps::driver( mumps_solver ) ;<br />
  // Set the right-hand side<br />
  ublas::vector< double > v( n ) ;<br />
  ...<br />
  // Solve (set pointer and size for the right-hand side vector)<br />
  rhs_sol_value_data( mumps_solver, v ) ;<br />
  mumps_solver.job = 3 ;<br />
  mumps::driver( mumps_solver ) ;<br />
  return 0 ;<br />
}<br />

Figure 11.5: Example of the use of the MUMPS bindings



11.7.2 GrAL<br />

TODO: René and Philipp write more?<br />

GrAL implements some of the same concepts as GSSE, but without the generalization of function objects, the three-layer concept (segment, domain, structure), generalized quantity storage, and n-dimensional structured grids.




Chapter 12<br />
<br />
Real-World Programming<br />

12.1 Transcending Legacy Applications<br />

Legacy applications have been written in plain ANSI C or are available as Fortran libraries. It is therefore highly desirable to rejuvenate such an implementation so that it utilizes advanced technologies and techniques, while at the same time keeping as much as possible of the experience and trust already associated with the original code base. One approach is an evolutionary transition: initially include as much of the old implementation as possible and gradually replace it to bring it up to date.<br />

The following examples are based on a particle simulator, in which two important concepts can be separated: scattering mechanisms (the physical behaviour of particles at boundaries (TODO: PS)) and physical model descriptions (how particles interact (TODO: PS)). All available scattering mechanisms are implemented as individual functions, which are called one after another. The scattering models require a variable set of parameters, which leads to non-homogeneous interfaces in the functions representing them. To alleviate this to some extent, global variables have been employed, completely eliminating any aspirations of data encapsulation and posing a serious problem for attempts at parallelization to take advantage of modern multi-core CPUs. The code has the very simple and repetitive structure:<br />

double sum = 0;<br />
double current_rate = generate_random_number();<br />
if (A_key == on)<br />
{<br />
    sum = A_rate(state, parameters);<br />
    if (current_rate < sum)<br />
    {<br />
        counter->A[state->valley]++;<br />
        state_after_A(state, parameters);<br />
        return;<br />
    }<br />
}<br />
sum += B_rate(state, state_2, parameters);<br />
if (current_rate < sum)<br />
{<br />
    counter->B[state->valley]++;<br />
    state_after_B(state, state_2);<br />
    return;<br />
}<br />
...<br />

Extensions to this code are usually accomplished by copy and paste, which is prone to simple mistakes by oversight, such as failing to change the counter that has to be incremented or calling the incorrect function to update the electron's state.<br />
<br />
Furthermore, at times the need arises to calculate the sum of all scattering rates (λ_total), which is accomplished in a different part of the implementation, thus further opening the possibility of inconsistencies between the two code paths.<br />
<br />
The decision which models to evaluate is made strictly at run time, and it would require significant, if simple, modification of the code to change this at compile time, making highly optimized specializations very cumbersome.<br />

The functions calculating the rates and state transitions, however, have been well tested and<br />

verified, so that abandoning them would be wasteful.<br />

12.1.1 Best of Both Worlds<br />

Scientific computing requires not only high-performance components evaluated and optimized at compile time, but also runtime-exchangeable (physical) models and the ability to cope with various boundary conditions. The two most commonly used programming paradigms, object-oriented and generic programming, differ in how the required functionality is implemented. Object-oriented programming directly offers runtime polymorphism by means of virtual inheritance. Unfortunately, current implementations of inheritance use an intrusive approach for new software components and tightly couple a type and the corresponding operations to the super type. In contrast to object-oriented programming, generic programming is limited to algorithms using statically and homogeneously typed containers, but offers highly flexible, reusable, and optimizable software components.<br />
<br />
As can be seen, both programming styles offer different points of evaluation. Runtime polymorphism based on concepts [?] (runtime concepts) tries to combine the runtime modification mechanism of virtual inheritance with compile-time flexibility and optimization.<br />

Inheritance in the context of runtime polymorphism is used to provide an interface template<br />

to model the required concept where the derived class must provide the implementation of the<br />

given interface. The following code snippet<br />

template <typename StateT><br />
struct scatter_facade<br />
{<br />
    typedef StateT  state_type;<br />
    typedef double  numeric_type;   // value type of the rates (assumed here)<br />
<br />
    struct scattering_concept<br />
    {<br />
        virtual ~scattering_concept() {}<br />
        virtual numeric_type rate(const state_type& input) const = 0;<br />
        virtual void transition(state_type& input) = 0;<br />
    };<br />
<br />
    boost::shared_ptr<scattering_concept> scattering_object;<br />
<br />
    template <typename T><br />
    struct scattering_model : scattering_concept<br />
    {<br />
        T scattering_instance;<br />
        scattering_model(const T& x) : scattering_instance(x) {}<br />
        numeric_type rate(const state_type& input) const<br />
        { return scattering_instance.rate(input); }<br />
        void transition(state_type& input)<br />
        { scattering_instance.transition(input); }<br />
    };<br />
<br />
    // forward to the wrapped model<br />
    numeric_type rate(const state_type& input) const<br />
    { return scattering_object->rate(input); }<br />
    void transition(state_type& input)<br />
    { scattering_object->transition(input); }<br />
<br />
    template <typename T><br />
    scatter_facade(const T& x) : scattering_object(new scattering_model<T>(x)) {}<br />
    ~scatter_facade() {}<br />
};<br />

therefore introduces a scatter_facade which wraps a scattering_concept part. Virtual inheritance is used to configure the necessary interface parts, in this case rate() and transition(), which have to be implemented by any scattering model. In the given example the state type is still available for explicit parametrization.<br />

In contrast to other applications of runtime concepts, e.g. in computer graphics, it is not necessary to provide mechanisms for deep copies, since the actual physical models remain unaltered once they have been created; copies would only serve to increase the memory footprint unnecessarily. Therefore a boost::shared_ptr is used for memory management.<br />

The legacy application has been written in plain ANSI C, which makes it easily compatible with the new C++ implementation. Several design decisions, such as the use of global and static variables, make it difficult to extend and to update appropriately for modern multi-core CPUs. To interface with this novel approach, a core structure is implemented which wraps the implementations of the scattering models using runtime concepts.<br />

template <typename ParameterType><br />
struct scattering_rate_A<br />
{<br />
    ...<br />
    const ParameterType& parameters;<br />
<br />
    scattering_rate_A(const ParameterType& parameters) : parameters(parameters) {}<br />
<br />
    template <typename StateType><br />
    numeric_type operator() (const StateType& state) const<br />
    {<br />
        return A_rate(state, parameters);<br />
    }<br />
};<br />



By supplying the required parameters at construction time, it is possible to homogenize the interface of operator(). This methodology also allows the continued use of the old data structures in the initial phases of the transition, while not being so constrictive as to hamper future developments.<br />
<br />
The functions for the state transitions are treated similarly to those for the rate calculation. Both are then fused in a scattering_pack, which forms the complete scattering model, ensures consistency of the rate and state transition calculations, and also models the runtime concept, as can be seen in the following piece of code:<br />

template <typename RateType, typename TransitionType, typename ParameterType><br />
struct scattering_pack<br />
{<br />
    // ...<br />
    typedef RateType       scattering_rate_type;<br />
    typedef TransitionType transition_type;<br />
    typedef ParameterType  parameter_type;<br />
<br />
    scattering_rate_type rate_calculation;<br />
    transition_type      state_transition;<br />
<br />
    scattering_pack(const parameter_type& parameters) :<br />
        rate_calculation(parameters),<br />
        state_transition(parameters)<br />
    {}<br />
<br />
    template <typename StateType><br />
    numeric_type rate(const StateType& state) const<br />
    {<br />
        return rate_calculation(state);<br />
    }<br />
<br />
    template <typename StateType><br />
    void transition(StateType& state)<br />
    {<br />
        state_transition(state);<br />
    }<br />
};<br />

The blend of runtime and compile-time mechanisms allows the storage of all scattering models within a single container, e.g. a std::vector, which can be iterated over in order to evaluate them.<br />
<br />
typedef std::vector< scatter_facade<state_type> > scatter_container_type ;<br />
scatter_container_type scatter_container ;<br />
scatter_container.push_back(scattering_model) ;<br />

For the development of new collision models, easy extensibility, even without recompilation, is also a highly important issue. This approach allows the addition of scattering models at runtime and can expose an interface to an interpreted language such as, e.g., Python [?].<br />
<br />
In case a highly optimized version is desired, the runtime container (here the std::vector) may be exchanged for a compile-time container, which is also readily available from the GSSE and provides the compiler with further opportunities for optimization at the expense of runtime adaptability.
adaptability.



12.1.2 Reuse Something Appropriate<br />

While the described approach initially increases the burden of implementation slightly, because wrappers need to be provided, it gives a transition path to integrate legacy codes into an up-to-date framework while at the same time not abandoning the experience associated with them. The invested effort raises the level of abstraction, which in turn increases the benefits obtained from advances in compiler technology. This inherently allows an optimization for several platforms without the massive human effort that was needed in previous approaches.<br />
<br />
In this particular case, encapsulating the legacy functions' reliance on global variables inside the wrapping structures greatly facilitates parallelization efforts, which are increasingly important with the continued increase of computing cores per CPU.<br />
<br />
Furthermore, the results can easily be verified as code parts are gradually moved to newer implementations, the only stringent requirement being link compatibility with C++. This test and verification can be taken a step further when the original implementation is written in ANSI C, due to its high compatibility with C++: it is possible to weave parts of the new implementation into the older code, providing the opportunity for a very fine-grained comparison not only of final results, but of all intermediates as well.<br />
<br />
Such swift verification of implementations also speeds up the steps necessary to validate calculated results against subsequent or contemporary experiments, which should not be neglected in order to keep physical models and their numerical representations strongly rooted in reality.




Chapter 13<br />
<br />
Parallelism<br />

13.1 Multi-Threading<br />

To do!<br />

13.2 Message Passing<br />

13.2.1 Traditional Message Passing<br />

Parallel hello world<br />

#include <mpi.h><br />
#include <iostream><br />
<br />
int main (int argc, char* argv[])<br />
{<br />
    MPI_Init(&argc, &argv);<br />
    std::cout << "Hello, World!\n";<br />
    MPI_Finalize();<br />
    return 0 ;<br />
}<br />

#include <mpi.h><br />
#include <iostream><br />
<br />
int main (int argc, char* argv[])<br />
{<br />
    MPI_Init(&argc, &argv);<br />
<br />
    int myrank, nprocs;<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);<br />
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);<br />
    std::cout << "Hello world, I am process number " << myrank<br />
              << " out of " << nprocs << ".\n";<br />
<br />
    MPI_Finalize();<br />
    return 0 ;<br />
}<br />

13.2.2 Generic Message Passing<br />

Each process passes a partial sum to its successor; the last process knows the result.<br />

#include <mpi.h><br />
#include <iostream><br />
#include <cmath><br />
<br />
int main (int argc, char* argv[])<br />
{<br />
    MPI_Init(&argc, &argv);<br />
<br />
    int myrank, nprocs;<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);<br />
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);<br />
<br />
    float vec[2];<br />
    vec[0]= 2*myrank; vec[1]= vec[0]+1;<br />
<br />
    // Local accumulation<br />
    float local= std::abs(vec[0]) + std::abs(vec[1]);<br />
<br />
    // Global accumulation<br />
    float global= 0.0f;<br />
    MPI_Status st;<br />
    // Receive from predecessor<br />
    if (myrank > 0)<br />
        MPI_Recv(&global, 1, MPI_FLOAT, myrank-1, 387, MPI_COMM_WORLD, &st);<br />
    // Increment<br />
    global+= local;<br />
    // Send to successor<br />
    if (myrank+1 < nprocs)<br />
        MPI_Send(&global, 1, MPI_FLOAT, myrank+1, 387, MPI_COMM_WORLD);<br />
    else<br />
        std::cout << "Hello, I am the last process and I know that |v|_1 is " << global << ".\n";<br />
<br />
    MPI_Finalize();<br />
    return 0 ;<br />
}<br />
<br />
This solution has a low abstraction level. In the following version, the library performs the reduction.<br />

#include <mpi.h><br />
#include <iostream><br />
#include <cmath><br />
<br />
int main (int argc, char* argv[])<br />
{<br />
    MPI_Init(&argc, &argv);<br />
<br />
    int myrank, nprocs;<br />
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);<br />
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);<br />
<br />
    float vec[2];<br />
    vec[0]= 2*myrank; vec[1]= vec[0]+1;<br />
<br />
    // Local accumulation<br />
    float local= std::abs(vec[0]) + std::abs(vec[1]);<br />
<br />
    // Global accumulation<br />
    float global;<br />
    MPI_Allreduce(&local, &global, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);<br />
<br />
    std::cout << "Hello, I am process " << myrank << " and I know too that |v|_1 is " << global << ".\n";<br />
<br />
    MPI_Finalize();<br />
    return 0 ;<br />
}<br />
<br />
This version is preferable because:<br />
<br />
• It has a higher abstraction level.<br />
<br />
• MPI implementations are usually adapted to the underlying hardware: a reduction typically takes logarithmic effort and can be tuned, e.g. in assembler for the network card.<br />




Chapter 14<br />
<br />
Numerical exercises<br />

In this chapter, we list a number of exercises in which the different aspects discussed in the course will be used. The goal is to implement a small application program in C++, run it, and interpret the results.<br />
<br />
You can use any software that may help you with your task. A list of packages is provided at the end of this chapter. We have only installed Boost, Boost.Sandbox, GLAS, BLAS, and LAPACK. Other smaller packages can be downloaded if necessary.<br />
<br />
In each exercise, a generic function or class will be developed, together with its documentation. These functions and classes should be part of the namespace athens. The function arguments will have to be described. Each template argument will have to satisfy concepts. You may have to define new concepts. If you are using STL or GLAS concepts, you can just refer to them without definition.<br />
<br />
Write a small paper on the decisions you made for the development of the software. Use the software for some examples and report the results. You may write the report on paper or send it in electronic form (PDF by preference).<br />

14.1 Computing an eigenfunction of the Poisson equation<br />

This is an example of a more complicated problem. It illustrates what is expected from the<br />

exercises. The actual exercises are less demanding.<br />

In this section, we derive software for the solution of the Poisson equation. We start with the 1D problem and then move to the 2D problem.<br />

14.1.1 The 1D Poisson equation<br />

The 1D Poisson equation is<br />
<br />
−d²u/dx² = f   (14.1)<br />
<br />
where u(x) is the solution, f the excitation, and x ∈ [0, 1]. We impose the boundary conditions<br />
<br />
u(0) = u(1) = 0 .<br />


This is called a boundary value problem.<br />
<br />
The goal is to compute the solution u for all x ∈ [0, 1]. Since this is not possible numerically, we only compute u at a discrete number of x's, which we call discretization points. We discretize x as xj = jh for j = 0, . . . , n+1 and h = 1/(n+1). This is called an equidistant distribution. The smaller h, the closer we are to the continuous problem, i.e. we have more points in [0, 1], but, as we shall see, the problem becomes more expensive to solve. One method for solving boundary value problems is to replace the derivatives by finite differences. We use finite differences for the second-order derivative:<br />

d²u/dx² (xj) ≈ (1/h²) (−2u(xj) + u(xj−1) + u(xj+1)) .<br />
<br />
Filling this into (14.1), we obtain<br />
<br />
(1/h²) (2u(xj) − u(xj−1) − u(xj+1)) = f(xj)   for j = 1, . . . , n .   (14.2)<br />

Note that u(x0) = u(xn+1) = 0. Now define the vectors<br />
<br />
u = [u(x1), . . . , u(xn)]^T and f = [f(x1), . . . , f(xn)]^T .<br />
<br />
Putting together (14.2) for j = 1, . . . , n leads to the algebraic system of equations Au = f with n rows and columns, where A = tridiag(−1, 2, −1):<br />
<br />
A = ⎡  2  −1              ⎤<br />
    ⎢ −1   2  −1          ⎥<br />
    ⎢      ·    ·    ·    ⎥<br />
    ⎣            −1    2  ⎦ .<br />
<br />
Note that A is a symmetric tridiagonal matrix. We can show that it is positive definite.<br />

In the algorithms, we need operations on this matrix. We will use two different types of operations. The first one is the matrix-vector product y = Ax. We write a function for this with a template argument for the vectors, since we do not know beforehand what the type of the vectors will be.<br />

#ifndef athens_poisson_1d_hpp<br />
#define athens_poisson_1d_hpp<br />
<br />
#include <...>        // GLAS dense vectors and glas::size<br />
#include <cassert><br />
<br />
namespace athens {<br />
<br />
template <typename X, typename Y><br />
void poisson_1d( X const& x, Y& y ) {<br />
    assert( glas::size(x)==glas::size(y) ) ;<br />
    assert( glas::size(x) > 1 ) ;<br />
    int const n = glas::size(x) ;<br />
    y(0) = 2.0*x(0) - x(1) ;<br />
    for ( int i=1; i<n-1; ++i )<br />
        y(i) = 2.0*x(i) - x(i-1) - x(i+1) ;<br />
    y(n-1) = 2.0*x(n-1) - x(n-2) ;<br />
}<br />



} // namespace athens<br />

#endif<br />

where we assume that the types X and Y are models of the concept glas::DenseVectorCollection.<br />

14.1.2 Richardson iteration<br />

Richardson iteration is an iterative method for the solution of the linear system<br />
<br />
Bu = g<br />
<br />
that starts from an initial guess u0 and computes ui = ui−1 + ri−1 at iteration i, where ri−1 is the residual g − Bui−1. It works as follows:<br />
<br />
1. For i = 1, . . . , max_it:<br />
   1.1. Compute the residual ri−1 = g − Bui−1<br />
   1.2. If ‖ri−1‖2 ≤ τ: return<br />
   1.3. Compute the new solution ui = ui−1 + ri−1<br />
<br />
The method converges when the eigenvalues of B lie between 0 and 2.<br />

The eigenvalues of the Poisson matrix A are λj = 2(1 − cos(πj/(n + 1))) for j = 1, . . . , n. The eigenvalues are thus bounded by 0 < λj < 4. We therefore first multiply Au = f by 0.5 into<br />
<br />
(0.5A)u = (0.5f) .<br />
<br />
Note that the solution u does not change. Define B = 0.5A and g = 0.5f; then Bu = g and the eigenvalues of B lie in (0, 2). For such a matrix, we can use the Richardson iteration method.<br />

We develop the following function<br />
<br />
template <typename Op, typename G, typename U><br />
double richardson( Op const& op, G const& g, U& u, double const& tol, int max_it ) ;<br />

where op is a BinaryFunction such that op(x,y) computes y = Bx for a given input argument x, and where u is an initial estimate of the solution on input and the computed solution on output. The vector g is the right-hand side of the system. The return value of richardson is the residual norm; this allows us to check how accurate the solution is without having to compute the residual explicitly. The parameter tol corresponds to the tolerance τ.<br />
<br />
First, we set conceptual conditions on all arguments.<br />
<br />
• U is a model of the concept glas::DenseVectorCollection, i.e. we assume that a dense vector from GLAS is used.<br />
<br />
• Op is a model of BinaryFunction, i.e. the following are valid expressions for op of type Op:<br />
<br />
  – op(x,y) where x and y are instances of a type X that is a model of the concept glas::DenseVectorCollection.<br />



• G is a model of concept glas::VectorExpression.<br />

Next, we write the code for the Richardson iteration. We store the variables ui in u and ri in r.<br />

#ifndef athens_richardson_hpp<br />
#define athens_richardson_hpp<br />
<br />
#include <...>        // GLAS dense vectors and norms<br />
<br />
namespace athens {<br />
<br />
template <typename Op, typename F, typename U><br />
double richardson( Op const& op, F const& f, U& u, double const& tol, int max_it ) {<br />
    double resid_norm ;<br />
    // Create residual vector<br />
    glas::dense_vector< typename glas::value_type<U>::type > r( glas::size(u) ) ;<br />
    for ( int iter=0; iter<max_it; ++iter ) {<br />
        op( u, r ) ;                       // r = B*u<br />
        r = f - r ;                        // r = f - B*u<br />
        resid_norm = norm_2( r ) ;<br />
        if (resid_norm <= tol) break ;<br />
        u += r ;                           // u = u + r<br />
    }<br />
    return resid_norm ;<br />
}<br />
<br />
} // namespace athens<br />
<br />
#endif<br />



...<br />
glas::random( f, seed ) ;<br />
v_type x( 10 ) ;<br />
x = 0.0 ;<br />
// Richardson iteration<br />
double res_nrm = athens::richardson( poisson_scaled(), 0.5*f, x, 1.e-4, 1000 ) ;<br />
{<br />
    glas::dense_vector< double > r( size(x) ) ;<br />
    athens::poisson_1d( x, r ) ;<br />
    std::cout << "res_nrm = " << norm_2( f - r ) << std::endl ;<br />
    std::cout << "f = " << f << std::endl ;<br />
    std::cout << "x = " << x << std::endl ;<br />
}<br />
return 0 ;<br />
}<br />

We multiply right-hand side and matrix vector product by 0.5 to make sure the Richardson<br />

method converges.<br />

The output looks like<br />

res_nrm = 0.000195164<br />

f = (10)[0.0484811,0.822283,0.102721,0.436631,0.46112,0.0475317,0.864644,0.0772845,0.920099,0.105434]<br />

x = (10)[1.85463,3.66081,4.64473,5.52601,5.97071,5.9544,5.8906,4.96226,3.95668,2.03105]<br />

Note that the Richardson method converges very slowly. For the Poisson equation, there exist<br />

much faster methods.<br />

14.1.3 LAPACK tridiagonal solver<br />

The LAPACK [?] software package contains routines for solving linear systems with a symmetric positive definite tridiagonal matrix. This package is written in FORTRAN 77. The corresponding functions are<br />

• Cholesky-type factorization: A = LDL^T by<br />
<br />
SUBROUTINE DPTTRF( N, D, E, INFO )<br />
<br />
• Linear solve: Ax = b using LDL^T x = b by<br />
<br />
SUBROUTINE DPTTRS( N, NRHS, D, E, B, LDB, INFO )<br />

In order to solve Au = f, first A is factorized into A = LDL^T, where L is a matrix consisting of a main diagonal of ones and one diagonal below the main diagonal, and D is a diagonal matrix. Once the factorization is performed, the solution is computed as u = L^(−T) D^(−1) (L^(−1) f). Note that the inverses of L and L^T are not computed explicitly; for example, L^(−1) f is computed as a linear solve with L. Linear solves with triangular matrices are easy to program. This is what DPTTRS does for us.<br />

A <strong>C++</strong> interface to DPTTRF and DPTTRS is available from the BoostSandbox.Bindings. For<br />

our application, we can solve a linear system as follows.



1. Given an approximate eigenvector x0.<br />
<br />
2. Normalize: x0 = x0/‖x0‖2.<br />
<br />
3. For i = 1, . . . , m:<br />
<br />
   3.1. Solve Ayi = xi.<br />
<br />
   3.2. Compute the eigenvalue estimate: λi = Σxi / Σyi.<br />
<br />
   3.3. xi = yi/‖yi‖2.<br />

#include <...>        // LAPACK bindings: pttrf/pttrs<br />
#include <...>        // GLAS bindings traits<br />
#include <...>        // glas::dense_vector<br />
#include <algorithm>  // for std::fill<br />
#include <cassert>    // for assert<br />
#include <iostream>   // for cout and endl<br />
<br />
int main() {<br />
    int const n = 10 ;<br />
    glas::dense_vector< double > d(n) ;   // Main diagonal<br />
    glas::dense_vector< double > e(n-1) ; // Lower/upper diagonal<br />
    std::fill( begin(d), end(d), 2.0 ) ;<br />
    std::fill( begin(e), end(e), -1.0 ) ;<br />
<br />
    glas::dense_vector< double > rhs( n ) ;<br />
    std::fill( begin(rhs), end(rhs), 3.0 ) ;<br />
<br />
    int info = boost::numeric::bindings::lapack::pttrf( d, e ) ;<br />
    assert( !info ) ;<br />
<br />
    std::cout << rhs << std::endl ;<br />
    info = boost::numeric::bindings::lapack::pttrs( 'L', d, e, rhs ) ;<br />
    std::cout << rhs << std::endl ;<br />
    // Solution is in rhs<br />
}<br />

14.1.4 The inverse iteration method

The inverse iteration method computes an eigenvalue of a matrix A. The method converges to the eigenvector associated with the eigenvalue nearest zero. The method works as in the algorithm shown above; in this algorithm, Σ xi means the sum of the elements of xi. For the solution of the linear system, we can use the Richardson iteration.

Write a function with the following header:

template <typename Op, typename DenseVectorCollection, typename Float>
void inverse_iteration( Op const& op, DenseVectorCollection& x, int m, Float& lambda ) ;

where Op is a model of BinaryFunction that solves y from x, x is the eigenvector estimate on input and output, and m is the number of iterations. The estimated eigenvalue is returned in lambda.


14.1. COMPUTING AN EIGENFUNCTION OF THE POISSON EQUATION 269<br />

First, we set conceptual conditions on all arguments.<br />

• Op is a model of BinaryFunction, i.e. the following are valid expressions <strong>for</strong> op of type Op:<br />

– op(x,y) where x and y are instances of type X where X is a model of the concept<br />

glas::DenseVectorCollection.<br />

• DenseVectorCollection is a model of glas::DenseVectorCollection.<br />

• Float is a concept of real numbers, i.e. it is float, double, or long double.<br />

The implementation for inverse_iteration could be as follows:

#ifndef athens_inverse_iteration_hpp
#define athens_inverse_iteration_hpp

#include <...>   // glas vectors (header name lost in the original)
#include <...>   // norm_2 and vector operations (header name lost in the original)

namespace athens {

template <typename Op, typename DenseVectorCollection, typename Float>
void inverse_iteration( Op const& op, DenseVectorCollection& x, int m, Float& lambda ) {
    glas::dense_vector< typename glas::value_type<DenseVectorCollection>::type > y( glas::size(x) ) ;
    x = x / norm_2( x ) ;              // 2.
    for ( int i=0; i<m; ++i ) {        // 3. (body reconstructed from the algorithm)
        op( x, y ) ;                   // 3.1. solve A y = x
        lambda = sum( x ) / sum( y ) ; // 3.2. eigenvalue estimate
        x = y / norm_2( y ) ;          // 3.3. normalize
    }
}

} // namespace athens

#endif

The driver uses a functor that solves the linear system, for example with the Richardson iteration:

struct solve {
    template <typename X, typename Y>
    void operator()( X const& x, Y& y ) const {
        athens::richardson( poisson_scaled(), 0.5*x, y, 1.e-8, 1000 ) ;
    }
} ;

int main() {
    typedef glas::dense_vector<double> v_type ;
    v_type x( 10 ) ;
    glas::random_seed seed ;
    glas::random( x, seed ) ;
    double lambda ;
    athens::inverse_iteration( solve(), x, 100, lambda ) ;
    std::cout << "lambda = " << lambda << std::endl ;
    std::ofstream xf( "x.out" ) ;
    for ( int i=0; i<10; ++i ) xf << x(i) << std::endl ;  // (body reconstructed)
}


Figure 14.1: First eigenvector of the 1D Poisson operator (plot of "x.out")

int main() {
    typedef glas::dense_vector<double> v_type ;
    int n = 10 ;
    v_type x( n ) ;
    glas::random_seed seed ;
    glas::random( x, seed ) ;
    v_type d( n ) ;   std::fill( begin(d), end(d), 2.0 ) ;
    v_type e( n-1 ) ; std::fill( begin(e), end(e), -1.0 ) ;
    solve< v_type, v_type > solver( d, e ) ;
    double lambda ;
    athens::inverse_iteration( solver, x, 100, lambda ) ;
    std::cout << "lambda = " << lambda << std::endl ;
    std::ofstream xf( "x.out" ) ;
    for ( int i=0; i<n; ++i ) xf << x(i) << std::endl ;  // (body reconstructed)
}

The result can be plotted with Gnuplot:

gnuplot> plot "x.out" w l



14.2 The 2D Poisson equation<br />

The 2D Poisson equation is

−∂²u/∂x² − ∂²u/∂y² = f

where u(x, y) is the solution, f the excitation, and (x, y) ∈ [0, 1] × [0, 1]. We impose the boundary conditions

u(0, y) = u(1, y) = u(x, 0) = u(x, 1) = 0 .

We discretize x as xi = ih for i = 1, . . . , n with h = 1/n; similarly, yj = jh. We use finite differences for the second order derivatives. This produces the equation

(1/h²) (−u(xi−1, yj) − u(xi, yj−1) − u(xi+1, yj) − u(xi, yj+1) + 4 u(xi, yj)) = f(xi, yj)   for i, j = 1, . . . , n .

This leads to the algebraic system of equations Au = f with n² rows and columns.

Recall the example exercise of §14.1. We do exactly the same exercise. Since the matrix is not tridiagonal, we cannot use the LAPACK routine pttrf any longer. We use the LAPACK routine sytrf for a full matrix instead. See the documentation on boost-sandbox/libs/numeric/bindings/lapack/doc/index.html.

For a 2D problem, the solution vector u can be represented as a matrix: the row index corresponds to the variable x and the column index to the variable y.

In particular, you develop the functions inverse_iteration, poisson_2d for the matrix-vector product, scaled_poisson for the scaled matrix-vector product, and richardson. Give the conceptual conditions for each templated argument. Make a plot of the eigenvector using Gnuplot's splot (for plotting surfaces).

14.3 The solution of a system of differential equations<br />

In this exercise, we write a function <strong>for</strong> the computation of a time step of a system of differential<br />

equations using Runge-Kutta methods.<br />

14.3.1 Explicit time integration<br />

Methods for the solution of the differential equation

u̇ = f(u) ,   u(0) = u0

operate time step by time step, i.e. time is discretized and, given the solution at time step tj, we compute the solution at time step tj+1 = tj + h where h is small.



The method that we use here is the Runge-Kutta 4 method: the solution at time step tj+1 is computed as

uj+1 = uj + (h/6) (k1 + 2 k2 + 2 k3 + k4)

where

k1 = f(uj)
k2 = f(uj + (h/2) k1)
k3 = f(uj + (h/2) k2)
k4 = f(uj + h k3) .

14.3.2 Software

Write a generic function

template <typename U, typename F, typename T>
void rk4( U& u, F& f, T const& h ) ;

that computes one time step with the Runge-Kutta 4 method. The argument u is the solution at time t on input and at time t + h on output. The argument f is the functor that evaluates the function f(u). The argument u is a vector.

When the implementation is finished, write the concepts <strong>for</strong> U and F in comment lines in the<br />

code.<br />

14.3.3 The Van der Pol oscillator

Differential equations appear in the study of physical phenomena. The Van der Pol oscillator is described by the following equation:

d²x/dt² − µ(1 − x²) dx/dt + x = 0   (14.3)

with initial conditions x(0) and x′(0). This is a non-linear second order differential equation with a parameter µ. When µ = 0, we have a purely harmonic solution (cos and sin). When µ > 0, the solution evolves to a limit cycle.

Second order differential equations are usually solved by writing them as a system of first order differential equations:

d/dt [ dx/dt ]   [ −µ(1 − x²)  1 ] [ dx/dt ]
     [   x   ] + [     −1      0 ] [   x   ] = 0 .

In matrix form, with u = (dx/dt, x)ᵀ, the equation can be written as

du/dt + A(u) u = 0

where

A(u) = [ −µ(1 − u2²)  1 ]
       [     −1       0 ] ,



Figure 14.2: An example of a web with only four pages. An arrow from page A to page B<br />

indicates a link from page A to page B.<br />

or

du/dt = f(u)

with

f(u) = −A(u) u .

14.3.4 Exercise

Use the Runge-Kutta 4 method for solving the Van der Pol equation for µ = 0, µ = 0.1 and µ = 1 in the time interval [0, 10] with time step h = 0.001. Also try smaller and larger time steps.

Plot the results using gnuplot.<br />

14.4 Google’s Page rank<br />

We all use Google for web searching. In this exercise, we try to understand a particular tool used by Google to rank pages, called PageRank.

The basic idea behind the Google page ranking algorithm is that the importance of a webpage is determined by the number of references made to it. We would like to compute a score xk reflecting the importance of page k. A simple-minded approach would be just to count the number of links to each page. This approach does not reflect the fact that some pages might be more significant than others, therefore rendering their votes more important. It also leaves open the possibility of artificially inflating the rank of a particular page by generating other trivial or advertising pages whose only function is to promote the importance of a particular page. Significant refinements are:

• Weight each in-link by the importance of the page which links to it.
• Give each page a total vote of 1. If page j contains nj links, one of which links to page k, then page k's score is boosted by xj/nj.



Taking the new refinements into account, we can compute the importance score xk of a page k as follows:

xk = Σ_{j ∈ Lk} xj/nj   (14.4)

where Lk denotes the set of pages with a link to page k. Consider the simple example of Figure 14.2. Using formula (14.4), we get the following equations for the importance scores of the pages in this example:

x1 = x3 + x4/2
x2 = x1/3
x3 = x1/3 + x2/2 + x4/2
x4 = x1/3 + x2/2

These linear equations can be written as Ax = x where x = [x1 x2 x3 x4]ᵀ and

A = [  0    0   1  1/2 ]
    [ 1/3   0   0   0  ]
    [ 1/3  1/2  0  1/2 ]
    [ 1/3  1/2  0   0  ] .

This transforms the web ranking problem into the standard problem of finding an eigenvector x with eigenvalue 1 for the square matrix A. This eigenvector can be found iteratively using the power method with a threshold τ:

1. v(0) = some vector with ‖v(0)‖ = 1
2. Repeat for k = 1, 2, . . . :
   2.1. Apply A: w = A v(k−1).
   2.2. Normalize: v(k) = w/‖w‖.
3. Until ‖v(k−1) − v(k)‖ < τ

The power method converges to the eigenvector corresponding to the dominant eigenvalue λ1. The matrix A is called a column stochastic matrix, since it is a square matrix with nonnegative entries and the entries in each column sum to one. In the case of a column stochastic matrix, this dominant eigenvalue is 1.

14.4.1 Software<br />

Write a generic function:<br />

template <typename V, typename Function>
void power_iteration( V& v, Function& f, double tau ) ;

that computes the power iteration algorithm 5 for a matrix A with starting vector v. The resulting eigenvector should be stored in v. The argument f is a functor that returns the result of the matrix-vector product. Also write documentation and specify the conceptual constraints for the arguments.



14.4.2 Dictionary application<br />

The page ranking algorithm which was described above can also be used to rank different words<br />

in a dictionary. Consider the following small dictionary:<br />

backwoods = bush, jungle<br />

bush = backwoods, jungle, shrub, plant, hedge<br />

flower = plant<br />

hedge = bush<br />

jungle = bush, backwoods<br />

plant = bush, shrub, flower, weed<br />

shrub = bush, plant, tree<br />

tree = shrub<br />

weed = plant<br />

Construct a graph linking every word with the words in its explanation. The first line of the<br />

dictionary, <strong>for</strong> example, would link bush and jungle to backwoods. The graph can be constructed<br />

on paper. Use equation (14.4) to construct the sparse column stochastic matrix A and use your<br />

power method to rank the words.<br />

14.5 The bisection method for finding the zero of a function in an interval

In this exercise, we do a programming exercise on a root finding method called the bisection method.

14.5.1 Functions in one variable<br />

Suppose we are given a function f in one variable and we want to compute the unique zero in the interval [a, b]. A method that could be used is the bisection method. It only requires function evaluations and is thus widely applicable.

The method computes a small interval that contains the zero. This small interval is obtained by splitting the interval [a, b] in two parts [a, m] and [m, b], where

m = (a + b)/2 .   (14.5)

The method works as follows:

1. Given the interval [a, b] for which f(a) f(b) < 0.
2. Repeat until b − a < τ:
   2.1. Compute m from (14.5).
   2.2. If f(m) f(a) < 0: b = m.
   2.3. Else: a = m.



14.5.2 Software<br />

The task is to first develop the function<br />

template <typename T, typename Function>
void bisection( T& a, T& b, Function& f, double tau ) ;

that computes the bisection Algorithm 6. The object f is a functor that returns the function value for a single argument x. The type T is a floating-point type, i.e. float or double.

Write documentation <strong>for</strong> the function and describe the conceptual conditions on Function.<br />

14.5.3 The growth and downfall of a caterpillar population<br />

Everyone knows caterpillars grow up to be beautiful butterflies. But before they reach that stage of their life, they need lots of food to grow. A large population will not grow at the same rate as a smaller one, because of a shortage of food. Furthermore, most birds enjoy a juicy caterpillar as a snack, so they are responsible for the premature death of several members of the caterpillar population. These relationships can be modelled mathematically by the following equation:

dN/dt = r N (1 − N/K) − α N²/(β + N²)

In this equation, r N (1 − N/K) models the growth of the population, where N equals the number of caterpillars, r is the growth rate of the population, and K is the maximum number of caterpillars that can inhabit the area. The second term of the equation models the death of the caterpillars. Here α is the maximum rate at which a bird can eat caterpillars when N is large, and β is a parameter that indicates the intensity of the bird attacks. We want to know when there exists an equilibrium between the growth and death rate in the caterpillar population, i.e. when dN/dt equals zero.

Use the function bisection to compute the number of caterpillars N for which the following populations are at an equilibrium in the intervals [0.1, 10], [10, 20] and [20, 100]:

• Population 1: r = 1.3, K = 100, α = 20 and β = 50<br />

• Population 2: r = 2.0, K = 80, α = 25 and β = 10<br />

Show the resulting roots in a table.<br />

14.5.4 Computing eigenvalues using the bisection method<br />

In this exercise, we use the function bisection to compute the eigenvalues of a real symmetric<br />

dense matrix A with real eigenvalues. The problem is to compute λ so that<br />

det(A − λI) = 0 .<br />

The determinant is computed using the QR factorization (which is available in LAPACK). The<br />

QR factorization computes an orthogonal matrix Q (Q T Q = I) and an upper triangular matrix<br />

R so that<br />

A − λI = QR .



We use the property that det(A − λI) = det(Q) det(R) = ± det(R), since det(Q) = ±1. As R is upper triangular, det(R) is the product of the diagonal elements of R.

The matrices A are constructed as follows. Start with a simple case: the diagonal matrix with elements 1, 2, . . . , n on the main diagonal. Then do the tests for the same matrix multiplied on the left and right by a random orthogonal matrix X, as in A = X D Xᵀ where D is a diagonal matrix.

14.6 The Newton-Raphson method for finding the minimum of a convex function

This exercise is a programming exercise on Newton’s method. First, we explain the method <strong>for</strong><br />

a function with a single variable, then we discuss the case of multivariate functions, and finally,<br />

we show a small application.<br />

14.6.1 Functions in one variable<br />

For a differentiable function f, the minimum ˜x is attained <strong>for</strong> f ′ (˜x) = 0. So, we must find the<br />

zero of the first order derivative. When f is a second order polynomial, we have<br />

then an extreme value of f is attained <strong>for</strong><br />

which is a minimum when f ′′ ≡ 2γ > 0.<br />

f = p := α + βx + γx 2<br />

f ′ = p ′ := β + 2γx<br />

˜x = − β<br />

2γ<br />

(14.6)<br />

For an arbitrary function, we do not have such simple explicit formulae. We can use an iterative method, which is called Newton's method: we start from an initial guess x̃ and improve this value until it has converged to the minimum of the function. On each iteration we approximate the function by a degree two polynomial, for which the simple formula (14.6) can be used. One way to compute such a degree two polynomial is to start from the Taylor expansion of f around x̃:

f(x) = f(x̃) + f′(x̃)(x − x̃) + (1/2) f″(x̃)(x − x̃)² + · · · .

If we approximate f by the first 3 terms (i.e. a degree two polynomial), then we have

f(x) ≈ p(x) := f(x̃) + f′(x̃)(x − x̃) + (1/2) f″(x̃)(x − x̃)² .

If x is close to x̃, |f(x) − p(x)| is small. The first order derivative is

f′(x) ≈ p′(x) = f′(x̃) + f″(x̃)(x − x̃) .



Then p′(x) = 0 for

x = x̃ − f′(x̃)/f″(x̃) .

The Newton method goes as follows; in this algorithm, τ is a tolerance for the stopping criterion.

1. Given initial x̃ = x(0).
2. Repeat for j = 1, 2, . . .
   2.1. Compute x(j) = x(j−1) − f′(x(j−1))/f″(x(j−1))
3. Until |f′(x(j−1))/f″(x(j−1))| < τ

The iteration stops when the derivative is much smaller than the second order derivative. What happens when f″(x(j−1)) = 0?

14.6.2 Multivariate functions<br />

For multivariate functions, the principle is the same, but it is more complicated. A multivariate function f has an argument x ∈ Rⁿ, i.e. a vector of size n. For example, f = sin(x1) + x2 cos(x1) is a multivariate function in the variables x1 and x2.

We use the same idea as for one variable. That is, we use the Taylor expansion to approximate the function:

f(x) ≈ f(x̃) + ∇f(x̃)ᵀ(x − x̃) + (1/2)(x − x̃)ᵀ H(f(x̃))(x − x̃)

where

∇f(x̃) = [ ∂f/∂x1, . . . , ∂f/∂xn ]ᵀ

and

H(f) = [ ∂²f/∂x1∂x1  · · ·  ∂²f/∂x1∂xn ]
       [      .                   .    ]
       [ ∂²f/∂xn∂x1  · · ·  ∂²f/∂xn∂xn ] .

Here ∇f(x̃) is called the gradient vector and H(f) the Hessian matrix.

The derivative becomes

f′(x) ≈ ∇f(x̃) + H(f(x̃))(x − x̃)

so the derivative is zero when

x = x̃ − {H(f(x̃))}⁻¹ ∇f(x̃) .

This requires the solution of an n × n linear system on each iteration. The Newton algorithm is very similar to the univariate case:

14.6.3 Software <strong>for</strong> uni-variate functions<br />

The task is to first develop the function<br />




1. Given initial x̃ = x(0) ∈ Rⁿ.
2. Repeat for j = 1, 2, . . .
   2.1. Compute d = {H(f(x(j−1)))}⁻¹ ∇f(x(j−1)).
   2.2. Compute x(j) = x(j−1) − d.
3. Until ‖d‖2 < τ

template <typename X, typename Function, typename Derivative, typename SecondDerivative>
void newton_raphson( X& x, Function& f, Derivative& d, SecondDerivative& s, double tau ) ;

that computes the Newton-Raphson Algorithm 7. f, d, and s are functors that return the function value and the derivatives for the single argument x.

Write documentation for the function and describe the conceptual conditions on Function, Derivative, and SecondDerivative.

Next, use this function to compute the minima for the following functions:

• f = x² − 2x + 4
• f = x¹⁰
• f = x + 5
• f = −x² − 2x + 4

14.6.4 Software <strong>for</strong> multi-variate functions<br />

The task is to first develop the function<br />

template <typename X, typename Function, typename Gradient, typename Hessian>
void newton_raphson( X& x, Function& f, Gradient& d, Hessian& h, double tau ) ;

that computes the Newton-Raphson Algorithm 8. Note that in this case, d and h should return the resulting gradient vector and Hessian matrix respectively. Also write documentation and specify the conceptual constraints for the arguments.

14.6.5 Application<br />

The following is an application for the multivariate case. Given a symmetric matrix L ∈ Rⁿˣⁿ, we want to solve the following optimization problem:

min (1/2) xᵀ L x   s.t.  xᵀ x = 1 .

We first introduce a Lagrange multiplier λ and rewrite this problem in the following form. Find x and λ so that

min f(x, λ) = (1/2) xᵀ L x − (1/2) λ (xᵀ x − 1) .

The gradient and Hessian are:

∇f = [ L x − λ x        ]
     [ −(1/2)(xᵀ x − 1) ]

H(f) = [ L − λI  −x ]
       [ −xᵀ      0 ] .

One can prove that the solution of this optimization problem is the smallest eigenvalue λ and an associated normalized eigenvector x. This is a method for computing eigenvalues of large matrices.

For solving a linear system with the Hessian, you can use the direct solver MUMPS or the<br />

iterative solver toolbox from GLAS.<br />

14.7 Sequential noise reduction of real-time measurements by least squares

Suppose we want to measure a function f(t) for given time snapshots t1, . . . , tm. We know that the function is a polynomial of a given degree n − 1, but due to measurement errors, the data are noisy. If f is a polynomial,

f(t) = Σ_{j=1}^{n} ξj t^{j−1} .

We could have a more general series, e.g.

f(t) = Σ_{j=1}^{n} ξj φj(t)

where φj is the jth basis function. With

b = [ f(t1), . . . , f(tm) ]ᵀ ,   x = [ ξ1, . . . , ξn ]ᵀ ,

A = [ φ1(t1)  φ2(t1)  · · ·  φn(t1) ]
    [   .                       .   ]
    [ φ1(tm)  φ2(tm)  · · ·  φn(tm) ]

we have

Ax = b .   (14.7)

Note that (14.7) is an m × n linear system, where usually m ≫ n. This system is overdetermined and so, due to errors in the data, it cannot be solved exactly. However, we can solve the system in a least squares sense, i.e. find x so that

min_x ‖Ax − b‖2 .   (14.8)



When measurements come in sequentially, i.e. at time steps t1, t2, . . ., we receive at time step tj the jth row of A and the jth element of b. The algorithms we now discuss exploit this sequential structure.

14.7.1 The least squares QR algorithm<br />

A numerically stable method for solving (14.8) is based on the QR factorization. The QR factorization of the m × n matrix A is

A = QR

with Q ∈ Rᵐˣⁿ having orthonormal columns (QᵀQ = I) and R ∈ Rⁿˣⁿ upper triangular. If A has full rank, the diagonal elements of R are non-zero. Suppose we have computed the solution for

min ‖Ak x − bk‖2

where

‖Ak x − bk‖2 = ‖Qk Rk x − bk‖2 = ‖Rk x − Qkᵀ bk‖2 .

We have to solve an upper triangular linear system. We can develop a 'sequential' method for this QR decomposition without storing Q, but we will not discuss this any further.

14.7.2 The least squares method via the normal equations<br />

One method to achieve this is the normal equations. That is, multiply (14.7) on the left by Aᵀ; then we obtain

Aᵀ A x = Aᵀ b .   (14.9)

If A has full column rank, the solution x is unique and satisfies (14.8).

14.7.3 Least squares Kalman filtering<br />

The Kalman filter is a method to solve the normal equations (14.9) in a step by step way, i.e. the measurements come in time step by time step. The Kalman filter adapts the least squares solution to the newly arrived data.

Suppose we have computed the least squares solution of

Ak xk = bk

where Ak are the first k rows of A and bk the first k elements of b with k ≥ n. Then we want to compute the least squares solution of

Ak+1 xk+1 = bk+1 .

Since

Ak+1 = [ Ak ; ak+1ᵀ ]   and   bk+1 = [ bk ; f(tk+1) ]

we have, with gk = Akᵀ bk, that

Ak+1ᵀ Ak+1 xk+1 = gk+1
(Akᵀ Ak + ak+1 ak+1ᵀ) xk+1 = gk + ak+1 f(tk+1) .

With Mk = (Akᵀ Ak)⁻¹ ∈ Rⁿˣⁿ, we derive from the Sherman-Morrison formula that

Mk+1 := (Akᵀ Ak + ak+1 ak+1ᵀ)⁻¹ = Mk − (Mk ak+1 ak+1ᵀ Mk) / (1 + ak+1ᵀ Mk ak+1)

and we also have that

xk+1 = Mk+1 (gk + ak+1 f(tk+1))
     = Mk gk + Mk ak+1 f(tk+1) − (Mk ak+1 ak+1ᵀ) / (1 + ak+1ᵀ Mk ak+1) (Mk gk + Mk ak+1 f(tk+1))
     = xk + (Mk ak+1) / (1 + ak+1ᵀ Mk ak+1) (f(tk+1) − ak+1ᵀ xk) .

The Kalman method works as follows:

1. Solve An xn = bn by taking the first n rows of A and b.
2. Let Mn = An⁻¹ An⁻ᵀ.
3. For k = n, . . . , m − 1 do:
   3.1. Compute the Kalman gain vector kk+1 = Mk ak+1/(1 + ak+1ᵀ Mk ak+1).
   3.2. Update step: xk+1 = xk + kk+1 (f(tk+1) − ak+1ᵀ xk).
   3.3. Mk+1 = Mk − kk+1 ak+1ᵀ Mk.

We can use the LAPACK subroutine DGESV for computing An⁻¹.

14.7.4 Software<br />

The goal is to write a function that computes the Kalman filter least squares. Because of the<br />

sequential character, we suggest to make a class with the following specifications:<br />

template <typename T>   // (template parameter list lost in the original; T is a guess for the value type)
class kalman {
  public:
    // Creation of the Kalman filter
    kalman( int n ) ;

    // Compute the first n observations and initialize the Kalman
    // filter (Steps 1 and 2 in the algorithm).
    // BaseFun is a binary functor.
    template <typename VIt, typename BaseFun, typename F>
    void initialize( VIt t_begin, VIt const& t_end, BaseFun& base_fun, F& f ) {
        ...
    }

    template <typename T2, typename Base, typename F>
    void step( T2 const& t, Base& base, F const& f ) {
        ...
    }

  public:
    // Return the solution
    typedef ... x_type ;
    x_type const& x() const { ... }

  private:
    ...
} ;

14.7.5 Test problems<br />

We now solve the following test problems. First, consider the following expansion:

f(t) = ξ1 + ξ2 cos t + ξ3 sin t + ξ4 cos 2t + ξ5 sin 2t .

We compute the coefficients following the least squares criterion for the function

f = 2 − 5 cos t .

Print the solution x for each step of the Kalman filter and see how it changes. It should be very close to the function.

Then apply random noise with relative size 0.0001:

f = (2 − 5 cos t)(1 + 0.0001 ε)

where ε is a random number in [−1, 1]. Print the solution x for each step of the Kalman filter and see how it changes. It should be close to the solution of the function with ε ≡ 0.

Plot the results using gnuplot.


Chapter 15

Programming Projects

The following notes apply to all projects.

• The projects are preferably carried out in teams of 2 students.
• Each team is given a repository in an MTL4 branch.
• This also means that every course participant must learn the version control software "subversion", see http://subversion.tigris.org/. Greg Wilson's lecture gives a sufficient introduction to subversion, see http://software-carpentry.org/. I will give a short introduction myself in the 2nd exercise session (19.4.).
• The projects shall be built (compiled, linked) with a single command. If possible, use "cmake".¹ cmake ships with every reasonable Linux distribution and should also be available in the pool; it even exists for Windows, where it can generate the project files for Visual Studio.
• First write tests for new features before you implement them.
• Try to limit your questions to the exercise sessions.
• Write doxygen documentation for your classes and functions (in English). Write as many examples as possible. (These may well be derived from your tests.)
  – Create formulas preferably with the commands for LaTeX insertions (\f[ and the like). On this occasion, one often gets to know one's Linux installation better, since doxygen does not always find LaTeX. It is no shame to ask hacker friends for help here.

15.1 Matrix powers A^x

Implement algorithms for A^x for different matrix types and for x ∈ Q as well as x ∈ R.

¹ If necessary, plain "make" (see e.g. http://software-carpentry.org/build.html).



15.2 Matrix exponential e^A

Implement algorithms for e^A for different matrix types, in particular sparse matrices. Use the algorithms available in MTL4 for solving systems of equations. See also the article by Cleve Moler, "Nineteen Dubious Ways to Compute the Exponential of a Matrix".

15.3 LU factorization for m × n matrices

m, n    | L              | U
m = n   | lower triangle | upper triangle
m > n   | trapezoid      | upper triangle
m < n   | lower triangle | trapezoid

A = P · L · U   (15.1)

For L, the diagonal is 1 and is therefore not stored. Compute the solution of a system of equations and then compute the error.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_getrf.htm.

15.4 Bunch-Kaufman factorization

For A with A = Aᵀ, implement the factorization

A = P · U · D · Uᵀ · Pᵀ   (15.2)

• overwriting A, and develop functions for extracting P, U, and D from the resulting A;
• alternatively, copy A, compute the factorization, and return P, U, and D as a tuple.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_sytrf.htm.

15.5 Condition number (reciprocal)

• Use LU in the general case.

– Cholesky when symmetric.

∗ If need be, Bunch-Kaufman . . .



See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_gecon.htm.

15.6 Matrix scaling

For dense and sparse matrices, compute row and column scaling factors such that the largest matrix entry in each row and column is 1.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_geequ.htm.
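A dense-matrix sketch of such factors, computed row-first in the spirit of geequ (zero rows and columns are not handled): after scaling with diag(r) · A · diag(c), every column has largest magnitude 1 and no entry exceeds 1 in magnitude.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Naive dense matrix for illustration only.
using matrix = std::vector<std::vector<double>>;

// Row factors r_i = 1 / max_j |a_ij|, then column factors
// c_j = 1 / max_i (r_i * |a_ij|) on the row-scaled matrix.
std::pair<std::vector<double>, std::vector<double>>
scale_factors(const matrix& a)
{
    std::size_t m = a.size(), n = a[0].size();
    std::vector<double> r(m, 0.0), c(n, 0.0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j)
            r[i] = std::max(r[i], std::abs(a[i][j]));
        r[i] = 1.0 / r[i];
    }
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i)
            c[j] = std::max(c[j], r[i] * std::abs(a[i][j]));
        c[j] = 1.0 / c[j];
    }
    return {r, c};
}
```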

15.7 QR with overwriting

Implement a decomposition

A = QR (15.3)

with Q orthogonal/unitary for real/complex A. Realize:

• an overwriting factorization as in LAPACK,

• functions for extracting Q and R,

• a version that copies A and returns Q and R as a pair.

• Write tests or applications.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_orgqr.htm,
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_ungqr.htm,
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_ormqr.htm,
http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_unmqr.htm.
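A copying variant via classical Gram-Schmidt can serve as a reference implementation for the tests; the overwriting Householder factorization as in LAPACK is the real task. The sketch assumes a square, full-rank real matrix:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Naive square dense matrix for illustration only.
using matrix = std::vector<std::vector<double>>;

// Classical Gram-Schmidt: r_kj = q_k . a_j, then orthogonalize column j.
std::pair<matrix, matrix> qr(matrix a)
{
    std::size_t n = a.size();
    matrix q(n, std::vector<double>(n, 0.0)), r = q;
    for (std::size_t j = 0; j < n; ++j) {
        std::vector<double> v(n);
        for (std::size_t i = 0; i < n; ++i)
            v[i] = a[i][j];
        for (std::size_t k = 0; k < j; ++k) {
            for (std::size_t i = 0; i < n; ++i)
                r[k][j] += q[i][k] * a[i][j];
            for (std::size_t i = 0; i < n; ++i)
                v[i] -= r[k][j] * q[i][k];
        }
        double norm = 0.0;
        for (double x : v)
            norm += x * x;
        r[j][j] = std::sqrt(norm);            // assumes full column rank
        for (std::size_t i = 0; i < n; ++i)
            q[i][j] = v[i] / r[j][j];
    }
    return {q, r};
}
```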

15.8 Direct solver for sparse matrices

Implement a direct solver recursively.

• The matrix should be represented hierarchically as a quad-tree.

• The operations shall also be performed recursively on the blocks:

– matrix addition and subtraction,

– matrix multiplication,

– inversion of sub-trees,

– pivoting on the

∗ column,

∗ row, or

∗ diagonal,

whichever seems most suitable.

– The pivoting must of course be represented by a permutation.

• Apply the solution to a vector recursively as well, if possible.

– This means implementing the triangular solver recursively, too.

Figure 15.1: Hierarchical approach.

This project is the biggest challenge of all; significant partial results will also count as a success.
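The quad-tree representation and one recursive block operation can be sketched as follows (for simplicity, leaves hold single entries here; real blocks would be small dense matrices, and both operands are assumed to have the same tree shape):

```cpp
#include <array>
#include <cassert>
#include <memory>

// Quad-tree matrix node: either a leaf holding one entry or an inner node
// with four children for the NW, NE, SW, SE quadrants.
struct quad
{
    double value = 0.0;                        // payload when this is a leaf
    std::array<std::unique_ptr<quad>, 4> sub;  // children when inner node
    bool is_leaf() const { return !sub[0]; }
};

// Recursive block addition; multiplication, sub-tree inversion, and the
// triangular solver recurse over the same structure.
std::unique_ptr<quad> add(const quad& a, const quad& b)
{
    auto result = std::make_unique<quad>();
    if (a.is_leaf()) {
        result->value = a.value + b.value;
        return result;
    }
    for (int i = 0; i < 4; ++i)
        result->sub[i] = add(*a.sub[i], *b.sub[i]);
    return result;
}
```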

15.9 Applying MTL4 to interval-arithmetic types

Write applications of matrices and vectors for suitable interval-arithmetic types, e.g. boost::interval.
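The point of such applications is that generic code written for double works unchanged on richer number types. A toy interval type with overloaded operators (unlike boost::interval it deliberately ignores rounding-mode control) together with a generic dot product:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct interval
{
    double lo, hi;   // [lo, hi], invariant lo <= hi
};

interval operator+(interval a, interval b)
{
    return {a.lo + b.lo, a.hi + b.hi};
}

interval operator*(interval a, interval b)
{
    // The product interval is spanned by the four endpoint products.
    double p[] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}

// Generic dot product: works for double, interval, and similar types.
template <typename T>
T dot(const std::vector<T>& x, const std::vector<T>& y)
{
    T sum = x[0] * y[0];
    for (std::size_t i = 1; i < x.size(); ++i)
        sum = sum + x[i] * y[i];
    return sum;
}
```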



15.10 Applying MTL4 to higher-precision types

Write applications of matrices and vectors for suitable types with higher precision, e.g. GNU Multiprecision (GMP).

15.11 Applying MTL4 to AD types

Write applications of matrices and vectors for suitable automatic-differentiation types whose derivatives are computed via operator overloading.
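The simplest such type is a forward-mode “dual number” whose overloaded operators carry the derivative along via the sum and product rules. This is only a sketch of the idea; a project would plug a complete AD type into MTL4's matrices and vectors:

```cpp
#include <cassert>

struct dual
{
    double v;   // function value
    double d;   // derivative value
};

dual operator+(dual a, dual b) { return {a.v + b.v, a.d + b.d}; }         // sum rule
dual operator*(dual a, dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; } // product rule

// Example: f(x) = x*x + x is differentiated automatically.
dual f(dual x) { return x * x + x; }
```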




Chapter 16

Acknowledgement

Special thanks to Josef Weinbub, Carlos Giani, and Franz Stimpfl, who were instrumental in the design and development of GSSE and this book. Thanks also go to Michael Spevak for the development of some basic concepts and text parts for an early version of GSSE.

Thanks to Andrey Chesnokov, Yvette Vanberghen, Kris Demarsin, and Yao Yue, and to the students of the class “C++ für Wissenschaftler” at Technische Universität Dresden for many fruitful discussions.





Bibliography

[AG04] David Abrahams and Aleksey Gurtovoy. C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond. Addison-Wesley, 2004.

[CE00] Krzysztof Czarnecki and Ulrich W. Eisenecker. Generative Programming: Methods, Tools, and Applications. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 2000.

[DHP03] Ionut Danaila, Frédéric Hecht, and Olivier Pironneau. Simulation Numérique en C++. Dunod, Paris, 2003.

[ES90] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual. Addison-Wesley, 1990.

[GA04] Aleksey Gurtovoy and David Abrahams. Boost Meta-Programming Library (MPL). Boost, 2004. www.boost.org/doc/libs/1_42_0/libs/mpl/doc/index.html.

[Got11] Peter Gottschling. Mixed Complex Arithmetic. SimuNova, 2011. https://simunova.zih.tu-dresden.de/mtl4/docs/mixed complex.html. Part of Matrix Template Library 4.

[Kar05] Björn Karlsson. Beyond the C++ Standard Library. Addison-Wesley, 2005.

[SA05] Herb Sutter and Andrei Alexandrescu. C++ Coding Standards. The C++ In-Depth Series. Addison-Wesley, 2005.

[Sch] Douglas C. Schmidt. C++ programming language tutorials. http://www.cs.wustl.edu/~schmidt/C++.

[Str97] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, 3rd edition, 1997.

