C++ for Scientists - Technische Universität Dresden
Technische Universität Dresden
Fakultät Mathematik und Naturwissenschaften
Institut für wissenschaftliches Rechnen
01062 Dresden
http://www.math.tu-dresden.de/~pgottsch/script/cpp_for_scientists.pdf

Peter Gottschling

C++ for Scientists
based on a joint lecture with Karl Meerbergen
with the help of Andrey Chesnokov, Yvette Vanberghen, Kris Demarsin, and Yao Yue
and with contributions by René Heinzl and Philipp Schwaha

As of January 16, 2012

Copyright © 2010 Peter Gottschling, René Heinzl, Karl Meerbergen, and Philipp Schwaha
Contents

I Understanding C++  7

Introduction  9
0.1 Programming languages for scientific programming  9
0.2 Outline  10

1 Good and Bad Scientific Software  11

2 C++ Basics  19
2.1 Our First Program  19
2.2 Variables  20
2.3 Operators  25
2.4 Expressions and Statements  31
2.5 Control statements  32
2.6 Functions  39
2.7 Input and output  45
2.8 Structuring Software Projects  47
2.9 Arrays  49
2.10 Pointers and References  50
2.11 Real-world example: matrix inversion  53
2.12 Exercises  62
2.13 Operator Precedence  64

3 Classes  65
3.1 Program for universal meaning not for technical details  65
3.2 Class members  66
3.3 Constructors  68
3.4 Destructors  74
3.5 Assignment  75
3.6 Automatically Generated Operators  76
3.7 Accessing object members  78
3.8 Other Operators  88

4 Generic programming  89
4.1 Templates  89
4.2 Generic functions  89
4.3 Generic classes  95
4.4 Concepts and Modeling  97
4.5 Inheritance or Generics?  99
4.6 Template Specialization  101
4.7 Non-Type Parameters for Templates  109
4.8 Functors  111
4.9 STL — The Mother of All Generic Libraries  121
4.10 Cursors and Property Maps  123
4.11 Exercises  128

5 Meta-programming  133
5.1 Let the Compiler Compute  134
5.2 Providing Type Information  135
5.3 Expression Templates  150
5.4 Meta-Tuning: Write Your Own Compiler Optimization  156
5.5 Exercises  185

6 Inheritance  187
6.1 Basic Principles  187
6.2 Dynamic Selection by Sub-typing  187
6.3 Remove Redundancy With Base Classes  189
6.4 Casting Up and Down and Elsewhere  189
6.5 Barton-Nackman Trick  193

7 Effective Programming: The Polymorphic Way  199
7.1 Imperative Programming  201
7.2 Generic Programming  203
7.3 Programming with Objects  206
7.4 Functional Programming  209
7.5 From Monomorphic to Polymorphic Behavior  212
7.6 Best of Both Worlds  221

II Using C++  223

8 Finite World of Computers  225
8.1 Mathematical Objects inside the Computer  225
8.2 More Numbers and Basic Structure  226
8.3 A Loop and More  230
8.4 The Other Way Around  231

9 How to Handle Physics on the Computer  233
9.1 Finite Elements  233
9.2 Again, Integrators  234

10 Programming tools  235
10.1 GCC  235
10.2 Debugging  236
10.3 Valgrind  239
10.4 Gnuplot  239
10.5 Unix and Linux  240

11 C++ Libraries for Scientific Computing  243
11.1 GLAS: Generic Linear Algebra Software  243
11.2 Boost  244
11.3 Boost.Bindings  245
11.4 Matrix Template Library  249
11.5 Blitz++  249
11.6 Graph Libraries  249
11.7 Geometric Libraries  249

12 Real-World Programming  253
12.1 Transcending Legacy Applications  253

13 Parallelism  259
13.1 Multi-Threading  259
13.2 Message Passing  259

14 Numerical exercises  263
14.1 Computing an eigenfunction of the Poisson equation  263
14.2 The 2D Poisson equation  272
14.3 The solution of a system of differential equations  272
14.4 Google's Page rank  274
14.5 The bisection method for finding the zero of a function in an interval  276
14.6 The Newton-Raphson method for finding the minimum of a convex function  278
14.7 Sequential noise reduction of real-time measurements by least squares  281

15 Programming Projects  285
15.1 Exponentiation of matrices  285
15.2 Exponentiation of matrices  286
15.3 LU decomposition for m × n matrices  286
15.4 Bunch-Kaufman decomposition  286
15.5 Condition number (reciprocal)  286
15.6 Matrix scaling  287
15.7 QR with overwriting  287
15.8 Direct solver for sparse matrices  287
15.9 Applying MTL4 to interval-arithmetic types  288
15.10 Applying MTL4 to higher-precision types  289
15.11 Applying MTL4 to AD types  289

16 Acknowledgement  291
Part I

Understanding C++
Introduction
“It would be nice if every kind of numeric software could be written in C++ without loss of efficiency, but unless something can be found that achieves this without compromising the C++ type system it may be preferable to rely on Fortran, assembler or architecture-specific extensions.”
— Bjarne Stroustrup
The purpose of this script is to do you this favor, Bjarne, among other things. Put the other way around, the reader of this book shall learn the best way to benefit from C++ features for writing scientific software. It is not our goal to explain all C++ features in a well-balanced manner. We rather aim for an application-driven illustration of features that are valuable for writing

• Well-structured;
• Readable;
• Maintainable;
• Extensible;
• Type-safe;
• Reliable;
• Portable; and last but not least
• Highly performing

software.
0.1 Programming languages for scientific programming
Scientific programming is an old discipline in computer science. The first applications on computers were indeed computations. In the early decades, ALGOL was a relatively popular programming language, competing with FORTRAN. FORTRAN 77 became a standard in scientific programming because of its efficiency and portability. Other computer languages were developed in computer science but not frequently used in scientific computing: C, Ada, Java, C++. They were mainly used in universities and labs for research purposes.
C++ was not a reliable computer language in the nineties: code was not portable, and object code was inefficient and large. This made C++ unpopular in scientific computing. This changed at the end of the nineties: compilers produced more efficient code, and the standard was supported more and more by compilers. Especially the ability to inline small functions and the introduction of complex numbers in the C99 standard made C++ more attractive to scientific programmers.

Together with the development of compilers, numerical libraries are being developed in C++ that offer great flexibility together with efficiency. This work is still ongoing, and more and more software is being written in C++. Currently, other languages used for numerics are FORTRAN 77 (even new codes!), Fortran 95, and Matlab. Python is becoming more and more popular. The nice thing about Python is that it is relatively easy to link C++ functions and classes into Python scripts. Writing such interfaces is not a subject of this course.
The goal of this course is to introduce students to the exciting world of C++ programming for scientific applications. The course does not offer a deep study of the programming language itself, but rather focuses on those aspects that make C++ suitable for scientific programming. Language concepts are introduced and applied to numerical programming, together with the STL and Boost.

Starting C++ programmers often adopt a Java programming style: both languages are object-oriented, but there are subtle differences that allow C++ to produce more compact expressions. For example, C++ classes typically do not have getters and setters, as is often the case in Java classes. This will be discussed in more detail in the course. We use the following convention, which is also used by Boost, one of the good examples of C++ software: classes and variables are denoted by lower-case characters, and underscores are used as separators in symbols. An exception are matrices, which are written as single capitals for similarity with the mathematical notation. Mixed upper- and lower-case characters (CamelCase) are typically used for concepts. Constants are often (as in C) written in capitals.
0.2 Outline
The topics that will be discussed are several aspects of the syntax of C++, illustrated by small numerical programs; an introduction to meta-programming; expression templates; the STL; Boost; MTL4; and GLAS. We will also discuss interoperability with other languages. The first three chapters discuss basic language aspects, such as functions, types, and classes, inheritance and generic programming, and include examples from the STL. The remaining chapters discuss topics that are of great importance for numerical applications: functors, expression templates, and interoperability with FORTRAN and C.
Chapter 1

Good and Bad Scientific Software
This chapter will give you an idea of what we consider good scientific software and what not. If you have never programmed before in your life, you might wish to skip the entire chapter. This is o.k. because if you have had no contact with the program sources of bad software, you can learn programming with a pure mind.

If you have some software knowledge, there might still be some details you will not understand right now, but this is no reason to worry. If you do not understand it after reading this script, then you can start worrying, or we as authors could. This chapter is only about getting a feeling for what distinguishes good from bad software in science.
As the foundation of our discussion — and to not start the book with hello world — we consider an iterative method to solve systems of linear equations Ax = b, where A is a symmetric positive-definite (SPD) matrix, x and b are vectors, and x is sought. The method is called ‘Conjugate Gradients’ (CG) and was introduced by Magnus R. Hestenes and Eduard Stiefel [?].

The mathematical details do not matter here, only the different styles of implementation. The algorithm can be written in the following form:¹
Algorithm 1: Conjugate Gradient Method.
Input: SPD matrix A, vector b, left preconditioner L, termination criterion ε.
Output: Vector x such that Ax ≈ b.

 1:  r = b − Ax
 2:  while |r| ≥ ε do
 3:      z = L⁻¹r
 4:      ρ = ⟨r, z⟩
 5:      if first iteration then
 6:          p = z
 7:      else
 8:          p = z + (ρ/ρ′)p
 9:      q = Ap
10:      α = ρ/⟨p, q⟩
11:      x = x + αp
12:      r = r − αq
13:      ρ′ = ρ

¹ This is not precisely the original notation but a slightly adapted version that introduces some extra variables to avoid redundant calculations.
11
12 CHAPTER 1. GOOD AND BAD SCIENTIFIC SOFTWARE<br />
Programmers transform this mathematical notation into a form that a compiler understands, using operations from the language. The result could look like Listing 1.1. Do not read it in detail; just skim it.
#include <stdlib.h>
#include <math.h>

double one_norm(int size, double *vp)
{
    int i;
    double sum= 0;
    for (i= 0; i < size; i++)
        sum+= fabs(vp[i]);
    return sum;
}

double dot(int size, double *vp, double *wp)
{
    int i;
    double sum= 0;
    for (i= 0; i < size; i++)
        sum+= vp[i] * wp[i];
    return sum;
}

int cg(int size, int nnz, int* aip, int* ajp, double* avp,
       double *x, double *b, void (*lpre)(int, double*, double*), double eps)
{
    int i, iter= 0;
    double rho, rho_1, alpha;
    double *p= (double*) malloc(size * sizeof(double));
    double *q= (double*) malloc(size * sizeof(double));
    double *r= (double*) malloc(size * sizeof(double));
    double *z= (double*) malloc(size * sizeof(double));

    // r= b;
    for (i= 0; i < size; i++)
        r[i]= b[i];
    // r-= A*x;
    for (i= 0; i < nnz; i++)
        r[aip[i]]-= avp[i] * x[ajp[i]];

    while (one_norm(size, r) >= eps) {
        // z= solve(L, r);
        (*lpre)(size, z, r);              // function pointer call
        rho= dot(size, r, z);
        if (!iter) {
            for (i= 0; i < size; i++)
                p[i]= z[i];
        } else {
            for (i= 0; i < size; i++)
                p[i]= z[i] + rho / rho_1 * p[i];
        }
        // q= A * p;
        for (i= 0; i < size; i++)
            q[i]= 0;
        for (i= 0; i < nnz; i++)
            q[aip[i]]+= avp[i] * p[ajp[i]];
        alpha= rho / dot(size, p, q);
        // x+= alpha * p; r-= alpha * q;
        for (i= 0; i < size; i++) {
            x[i]+= alpha * p[i];
            r[i]-= alpha * q[i];
        }
        rho_1= rho;
        iter++;
    }
    free(q); free(p); free(r); free(z);
    return iter;
}

void ic_0(int size, double* out, double* in) { /* .. */ }

int main(int argc, char* argv[])
{
    int nnz, size;
    // set nnz and size
    int *aip= (int*) malloc(nnz * sizeof(int));
    int *ajp= (int*) malloc(nnz * sizeof(int));
    double *avp= (double*) malloc(nnz * sizeof(double));
    double *x= (double*) malloc(size * sizeof(double));
    double *b= (double*) malloc(size * sizeof(double));
    // set A and b
    cg(size, nnz, aip, ajp, avp, x, b, ic_0, 1e-9);
    return 0;
}

Listing 1.1: Low Abstraction Implementation of CG
As said before, the details do not matter here, only the principal approach. The good thing about this code is that it is self-contained. But this is about its only advantage. The problem with this implementation is its low abstraction level. This creates three major disadvantages:

• Bad readability;
• No flexibility; and
• High error-proneness.

The bad readability manifests in the fact that almost every operation is implemented in one or multiple loops. For instance, would we have found the matrix-vector multiplication q = Ap
13
14 CHAPTER 1. GOOD AND BAD SCIENTIFIC SOFTWARE<br />
without the comments? We would easily spot where the variables representing q, A, and p are used, but seeing that this is a matrix-vector product takes a closer look and a good understanding of how the matrix is stored.
This leads us to the second problem: the implementation commits to many technical details and only works in precisely this context. Algorithm 1 only requires that matrix A be symmetric positive-definite; it does not demand a certain storage scheme. There are many other sparse matrix formats that we could all use in the CG method, but not with this implementation. The matrix format is not the only detail the code commits to. What if we want to compute in lower (float) or higher precision (long double)? Or solve a complex linear system? For every such new CG application, we need a new implementation. Needless to say, running on parallel computers or exploring GPGPU (General-Purpose Graphics Processing Unit) acceleration needs reimplementations as well. Much worse, every combination of the above needs a new implementation.
Some readers might think: “It is only one function of 20–30 lines. How much work can rewriting this little function be? And we do not introduce new matrix formats or computer architectures every month.” Certainly true, but in some sense it is putting the cart before the horse. Because of such an inflexible and detail-obsessed programming style, many scientific applications grew into the 100,000s and millions of lines of code. Once an application or library has reached such a monstrous size, modifying features of the software is very arduous and only rarely done. The road to success is starting scientific software from a higher level of abstraction from the beginning, even if it is more work initially.
The last major disadvantage is how error-prone the code is. All arguments are given as pointers, and the size of the underlying arrays is passed as an extra argument. We as programmers of the function cg can only hope that the caller did everything right, because we have no way to verify it. If the user does not allocate enough memory (or does not allocate at all), the execution will crash at some more or less random position or, even worse, will generate nonsensical results because data and software can be randomly overwritten. Good programmers must avoid such fragile interfaces because the slightest mistake can have catastrophic consequences and the program errors are extremely difficult to find. Unfortunately, even recently released and widely used software is written in this manner, either for backward compatibility with C and Fortran or because it is written in one of these two languages. In fact, the implementation above is C and not C++. If this is the way you love software, you probably will not like this script.
So much about software we do not like. In Listing 1.2 we show what scientific software could look like.
// This source is part of MTL4
#include <boost/numeric/mtl/mtl.hpp>
#include <boost/numeric/itl/itl.hpp>

template <typename LinearOperator, typename HilbertSpaceX, typename HilbertSpaceB,
          typename Preconditioner, typename Iteration>
int conjugate_gradient(const LinearOperator& A, HilbertSpaceX& x, const HilbertSpaceB& b,
                       const Preconditioner& L, Iteration& iter)
{
    typedef HilbertSpaceX                                       Vector;
    typedef typename mtl::Collection<HilbertSpaceX>::value_type Scalar;
    Scalar rho(0), rho_1(0), alpha(0);
    Vector p(resource(x)), q(resource(x)), r(resource(x)), z(resource(x));

    r = b - A*x;
    while (! iter.finished(r)) {
        z = solve(L, r);
        rho = dot(r, z);
        if (iter.first())
            p = z;
        else
            p = z + (rho / rho_1) * p;
        q = A * p;
        alpha = rho / dot(p, q);
        x += alpha * p;
        r -= alpha * q;
        rho_1 = rho;
        ++iter;
    }
    return iter;
}

int main(int argc, char* argv[])
{
    int size;
    // set size
    mtl::compressed2D<double>  A(size, size);
    mtl::dense_vector<double>  x(size), b(size);
    // set A and b

    // Create preconditioner
    itl::pc::ic_0<mtl::compressed2D<double> > L(A);
    // Object that controls the iteration: terminate if the residual is below 10^-9 or has
    // decreased by 6 orders of magnitude; abort after 30 iterations if not converged
    itl::basic_iteration<double> iter(b, 30, 1.e-6, 1.e-9);
    conjugate_gradient(A, x, b, L, iter);
    return 0;
}

Listing 1.2: High Abstraction Implementation of CG
The first thing you might notice is that the CG implementation is readable without comments. As a rule of thumb: if other people's comments look like your program sources, then you are a really good programmer. If you compare the mathematical notation in Algorithm 1 with Listing 1.2, you will realize that — except for the type and variable declarations at the beginning — they are identical. Some readers might think that it looks more like Matlab or Mathematica
15
16 CHAPTER 1. GOOD AND BAD SCIENTIFIC SOFTWARE<br />
than C++. Yes, C++ can look like this if one puts enough effort into good software. Evidently, it is also much easier to write algorithms at this abstraction level than to express them with low-level operations.
The Purpose of Scientific Software

Scientists shall do science.

Excellent scientific software is expressed only in mathematical and domain-specific operations, without any technical detail exposed.

At this abstraction level, scientists can focus on models and algorithms, being much more productive and advancing scientific discovery.
Nobody knows how many scientists waste how much time every year dwelling on small technical details of bad software like Listing 1.1. Of course, the technical details have to be realized somewhere, but a scientific application is the worst possible location for them. Use a two-level approach: write your applications in terms of expressive mathematical operations, and if they do not exist, implement them separately. These mathematical operations must be carefully implemented for maximal performance, or they should use other operations with maximal performance. Investing time in the performance of these fundamental operations pays off handsomely because the functions will be reused very often.
Advice

Use the right abstractions!
If they do not exist, implement them.
Speaking of abstractions: the CG implementation in Listing 1.2 does not commit to any technical detail. In no place is the function restricted to a numerical type like double. It works just as well for float, GNU's multi-precision numbers, complex, interval arithmetic, quaternions, . . .

The matrix A can have any internal format; as long as it can be multiplied with a vector, it can be used in the function. In fact, it does not even need to be a matrix but can be any linear operator. For instance, an object that performs a Fast Fourier Transform (FFT) on a vector can be used as A when the FFT is expressed as a product of A with the vector. Similarly, the vectors do not need to be represented by finite-dimensional arrays but can be elements of any vector space that is somehow computer-representable, as long as all operations in the algorithm can be performed.
We are also open to other computer architectures. If the matrix and the vectors are distributed over the nodes of a parallel supercomputer and corresponding parallel operations are available, the function runs in parallel without changing a single line. (GP)GPU acceleration can also be realized within the data structures and their operations without changing the algorithm. In general, any existing or new platform that is supported by the operations of the matrix and vector types is also supported by our ‘generic’ conjugate gradient function. As mentioned before, we do not even need to change it. If we have a sophisticated scientific application of several thousand lines (not 100,000s) written with appropriate abstractions, we do not need to modify it either.

Starting with the next chapter, we will explain how to write good scientific software.
17
18 CHAPTER 1. GOOD AND BAD SCIENTIFIC SOFTWARE
Chapter 2

C++ Basics
In this chapter we will briefly introduce some basic knowledge about C++. A useful site with a reference manual for C++ is http://www.cplusplus.com/.
2.1 Our First Program
As an introduction to the C++ language, let us look at the following example:
#include <iostream>

int main()
{
    std::cout << "Answer to the Ultimate Question of Life, the Universe, and Everything is "
              << 6 * 7 << std::endl;
    return 0;
}
according to Douglas Adams' “Hitchhiker's Guide to the Galaxy.” This short example already shows many things about C++:

• The first line includes the file “iostream”. Whatever is defined in this file will be defined in our program as well. The file “iostream” contains the standard I/O of C++. Input and output are not part of the core language in C++ but part of the standard library. This means that we cannot program I/O commands without including “iostream” (or something similar). But it also means that this file must exist for every compiler because it is part of the standard. Include commands should be at the beginning of the file if possible.

• The main program is called main and has an integer return value, which is set to 0 by the return statement. The caller of a program (usually the operating system) knows that it finished successfully when 0 is returned. A return code other than 0 signals that something went wrong, and often the return code also says something about what went wrong.

• Braces “{ }” denote a block/group of code (also called a compound statement). Variables declared within such a block are only accessible within it.
• std::cout and std::endl are defined in "iostream". The former is an output stream that prints text on the screen (unless it is redirected). With std::endl a line is terminated.
• The special operator << is used to pass objects to the output stream std::cout, which then prints them.
• The double quotes surround string constants, more precisely string literals. This is the<br />
same as in C. For string manipulation, however, one should use C ++’s string class instead<br />
of C’s cumbersome and error-prone functions.<br />
• The expression 6 * 7 is evaluated and a temporary integer is passed to std::cout. In C++ everything has a type. Sometimes we as programmers have to declare the type and sometimes the compiler deduces it for us. 6 and 7 are literal constants of type int, and so is their product.
This was a lot of in<strong>for</strong>mation <strong>for</strong> such a short program. So let us start step by step.<br />
TODO: A little explanation how to compile and run it. For g++ and Visual Studio.<br />
2.2 Variables<br />
In contrast to most scripting languages, C++ is strongly typed, that is, every variable has a type and this type never changes. A variable is declared by a statement of the form TYPE varname.¹ Basic types are int, unsigned int, long, float, double, char, and bool.
int integer1 = 2;
int integer2, integer3;
float pi = 3.14159;
char mycharacter = 'a';
bool cmp = integer1 < pi;
Each statement has to be terminated by a ";". In the following section, we show operations that are often applied to integer and float types. In contrast to other languages like Python, where ' and " are used for both characters and strings, C++ distinguishes between the two. The C++ compiler considers 'a' as the character 'a' (it has type char), whereas "a" is a string containing 'a' (it has type const char[2], including the terminating null character). If you are used to Python, please pay attention to this.
Advice
Define variables right before using them the first time. This makes your programs more readable when they grow long. It also allows the compiler to use memory more efficiently when you have nested scopes (more details later). Old C versions required all variables to be defined at the beginning of a function, and several people stick to this style to this day. However, in C++ it generally leads to higher efficiency and, more importantly, to higher readability to define variables as late as possible.
1 TODO: too simple, variable lists and in-place initialization is missing
2.2.1 Constants<br />
Syntactically, constants are like special variables in C ++ with the additional attribute of immutability.<br />
const int integer1 = 2;
const int integer3;        // Error: uninitialized constant
const float pi = 3.14159;
const char mycharacter = 'a';
const bool cmp = integer1 < pi;
As they cannot be changed, it is mandatory to set the value in the definition. The second<br />
constant definition violates this rule and the compiler will complain about it.<br />
Constants can be used wherever variables are allowed — as long as they are not modified, of course. On the other hand, constants like those above are already known during compilation. This enables many kinds of optimizations, and such constants can even be used as arguments of types (we will come back to this later).
2.2.2 Literals<br />
Literals like "2" or "3.14" have types as well. Simply put, integral numbers are treated as int, long or unsigned long depending on the number of digits. Every number with a dot or an exponent (e.g. 3e12 ≡ 3 · 10¹²) is considered a double.
Usually this does not matter much in practice since C++ converts implicitly between built-in numeric types, and most programs work well without explicitly specifying the type of the literals. There are, however, three major reasons to pay attention to the types of literals:
• Availability;<br />
• Ambiguity and<br />
• Accuracy.<br />
Without going into detail here, the implicit conversion is not used with template functions (for good reasons). The standard library provides a type for complex numbers where the type of the real and imaginary part can be parametrized by the user:

std::complex<float> z(1.3, 2.4), z2;
These complex numbers provide, of course, the common operations. However, when we write:

z2= 2 * z;    // error
z2= 2.0 * z;  // error

we will get an error message that the multiplication is not available. More specifically, the compiler will tell us that there is no operator*() for int and std::complex<float>, respectively for double and std::complex<float>.² The library only provides a multiplication for the type that we use for the real and imaginary part, here float. There are two ways to ascertain that "2" is a float:

z2= float(2) * z;
z2= 2.0f * z;
2 It is however possible to implement std::complex in a fashion such that these expressions work [Got11].
In the first case, we have an int literal that is converted into float and in the second case, the<br />
literal is float from the beginning. For the sake of clarity, the float literal is preferable.<br />
In the first case, we have an int literal that is converted into float, and in the second case the literal is a float from the beginning. For the sake of clarity, the float literal is preferable.
Later in this book we will introduce function overloading, that is, a function with different implementations for different argument types (or argument tuples). The compiler selects the overload that fits best. Sometimes the best fit is not clear, for instance if a function f accepts an unsigned or a pointer and we call:

f(0);

"0" is considered an int and can be implicitly converted into unsigned or into any pointer type. Neither conversion is prioritized. As before, we can address the issue by an explicit conversion or by a literal of the desired type:

f(unsigned(0));
f(0u);

Again, we prefer the second version because it is more direct (and shorter).
The accuracy issue comes up when working with long double. On the author's computer, this format can handle at least 19 digits. Let us define one third with 20 digits and print 19 of them:

long double third= 0.3333333333333333333;
cout.precision(19);
cout << "One third is " << third << ".\n";

The result is:

One third is 0.3333333333333333148.

The program behaves more satisfyingly if we append an "l" to the number:

long double third= 0.3333333333333333333l;

yielding the print-out that we hoped for:

One third is 0.3333333333333333333.
The following table gives examples of literals and their type:<br />
Literal Type<br />
2 int<br />
2u unsigned<br />
2l long<br />
2ul unsigned long<br />
2.0 double<br />
2.0f float<br />
2.0l long double<br />
For more details, see for instance [Str97, § 4.4f, § C.4]. There you will also find a description of how to define literals on an octal or hexadecimal basis.
2.2.3 Scope of variables<br />
Global definition: Every variable that we intend to use in a program must have been declared with its type specifier at an earlier point in the code. A variable can be of either global or local scope. A global variable is one declared in the main body of the source code, outside all functions. After its declaration, a global variable can be referred to from anywhere in the code, even inside functions. This sounds very handy because the variable is easily available, but when your software grows, it becomes more difficult and painful to keep track of the global variables' modifications. At some point, every code change bears the potential of triggering an avalanche of errors. Just do not use global variables. Sooner or later you will regret it. Believe us.
Global constants like<br />
const double pi= 3.14159265358979323846264338327950288419716939;<br />
are fine because they cannot cause side effects.<br />
Local definition: As opposed to that, a local variable is declared within the body of a function or a block. Its visibility/availability is limited to the block enclosed in the curly braces { } where it is declared. More precisely, the scope of a variable extends from its definition to the end of the enclosing braces. Recalling the example of output streams

int main ()
{
    std::ofstream myfile("example.txt");
    myfile << "Writing this to a file." << std::endl;
    return 0;
}

the scope of myfile is from its definition to the end of the function main. If we wrote:
int main ()
{
    int a= 5;
    {
        std::ofstream myfile("example.txt");
        myfile << "Writing this to a file." << std::endl;
    }
    myfile << "a is " << a << std::endl; // error
    return 0;
}

then the second output would not be valid because myfile is out of scope. The program would not compile, and the compiler would tell you something like "myfile is not defined in this scope".
Hiding: If variables with the same name exist in different scopes, then only one variable is visible; the others are hidden. A variable in an inner scope hides all variables of the same name in outer scopes. For instance:³
3 TODO: Picture would be nice.
int main ()<br />
{<br />
int a= 5; // define #1<br />
{<br />
a= 3; // assign #1, #2 is not defined yet<br />
int a; // define #2<br />
a= 8; // assign #2, #1 is hidden<br />
{<br />
a= 7; // #2<br />
}<br />
} // end of #2’s scope<br />
a= 11; // #1, #2 is now out of scope<br />
return 0;<br />
}<br />
Defining the same variable name twice in the same scope is an error.<br />
The advantage of scopes is that you do not need to worry whether a variable (or something else) is already defined outside the scope. It is just hidden and does not create a conflict.⁴ Unfortunately, the hiding makes the homonymous variables in the outer scope inaccessible. The best thing you can do is to rename the variable in the inner scope (and possibly in the next-outer scope(s) to access more of those variables). Renaming the outermost variable also solves the problem of accessibility but tends to be more work because that variable is probably used more often due to its longer lifetime. A better solution to manage nesting and accessibility are namespaces, see the next section.
Scopes also have the advantage of reusing memory, e.g.:

int main ()
{
    int x, y;
    float z;
    cin >> x;
    if (x < 4) {
        y= x * x;
        // something with y
    } else {
        z= 2.5 * float(x);
        // something with z
    }
}
The example uses three variables. However, they are never used at the same time: y is only used in the first branch and z only in the second one. Thus, we rewrite the program as follows:

int main ()
{
    int x;
    cin >> x;
    if (x < 4) {
        int y= x * x;
        // something with y
    } else {
        float z= 2.5 * float(x);
        // something with z
    }
}

4 As opposed to macros, an obsolete and reckless legacy feature from C that should be avoided at any price because it undermines all structure and reliability of the language.
Now y exists only in the first branch and z only exists in the second one. In general, letting variables live only as long as necessary helps us save memory, especially when we have very large objects. That is, define variables as late as possible — ideally directly before their first use — then they are implicitly in the innermost possible scope, e.g. in the branches of the previous example instead of the function main. The reduced complexity of having fewer active variables at any point in your program also simplifies your life when the program does not do what it should (in very rare cases, of course) and you have to debug it.
For all those reasons, it is also preferable to define loop indices directly in the loop:

for (int i= 0; i < n; i++) { ... }
The loop index then dies with the loop; if you need it afterwards, the following does not work:

cin >> x;
for (int i= 0; abs(x) > 0.001 && i < 100; i++)
    x= f(x);
cout << "Did " << i << " iterations.\n"; // error: i is out of scope

The example is some kind of (probably useless) fixed-point calculation. It stops when |x| ≤ 0.001 or after 100 iterations (remember, the second term is not a termination but a continuation criterion). When we have finished the loop, we want to know how many iterations we performed. But our loop index has already died. Let's try again:
cin >> x;
int i;
for (i= 0; abs(x) > 0.001 && i < 100; i++)
    x= f(x);
cout << "Did " << i << " iterations.\n";

Now it works.
2.3 Operators<br />
C++ is rich in built-in operators. An operator is a symbol that tells the compiler to perform specific mathematical or logical manipulations. C++ has three general classes of operators: arithmetic, boolean, and bitwise. This section gives a short overview of the different operators and their meaning.
2.3.1 Arithmetic operators<br />
The following table lists the arithmetic operators allowed in C ++:
Operator  Action
-         subtraction, also unary minus
+         addition
*         multiplication
/         division
%         modulus
--        decrement
++        increment
The modulus operator yields the remainder of an integer division. The ++ operator adds one to its operand and -- subtracts one. Both can precede or follow the operand. When they precede the operand, the corresponding operation is performed before the operand's value is used to evaluate the rest of the expression. If the operator follows its operand, C++ uses the operand's value before incrementing or decrementing it. Consider the following example:
x = 1;<br />
y = ++x;<br />
x = 1;<br />
z = x++;<br />
As a result of executing these four lines of code, y will be set to 2, x will be set to 2 and z will<br />
be set to 1.<br />
The priority and associativity of the binary arithmetic operators are the same as we know them from math: multiplication and division precede addition and subtraction. Thus, x + y * z is evaluated as x + (y * z). Operations of the same priority are left-associative, i.e. x / y * z is equivalent to (x / y) * z. Unary operators have precedence over binary ones: x * y++ / -z means (x * (y++)) / (-z). Nevertheless, as long as you are still learning C++ and not entirely sure about the precedences, you might want to add redundant parentheses instead of wasting hours debugging your program.
With these operators we can write our first numeric program:

#include <iostream>

int main ()
{
    float r1 = 3.5, r2 = 7.3, pi = 3.14159;
    float area1 = pi * r1 * r1;
    std::cout << "A circle of radius " << r1 << " has area "
              << area1 << "." << std::endl;
    std::cout << "The average of " << r1 << " and " << r2 << " is "
              << (r1 + r2) / 2 << "." << std::endl;
    return 0;
}
2.3.2 Boolean operators<br />
Boolean operators are logical and relational operators. Both return boolean values, hence the name. The operators and their meanings are:
Operator  Meaning
>         greater than
>=        greater than or equal to
<         less than
<=        less than or equal to
==        equal to
!=        not equal to
&&        logical AND
||        logical OR
!         logical NOT

Relational operators have lower precedence than arithmetic ones, so the expression 4 >= 1 + 7 is evaluated as if it were written 4 >= (1 + 7).
Advice
Integer values can be treated in C++ as boolean. For the sake of clarity, it is always better to use bool for all logical expressions.

This is a legacy of C, where bool does not exist. Almost all techniques from C also work in C++ — as the language name suggests — but using the new features of C++ allows you to write programs with better structure. For instance, if you want to store the result of a comparison, do not use an integer variable but a bool:
bool out_of_bound = x < min || x > max;
2.3.3 Bitwise operators<br />
Bitwise operators allow you to test or change the bits of integers. 5 There are the following<br />
operations:<br />
Operator  Action
&         AND
|         OR
^         exclusive OR (XOR)
~         one's complement (NOT)
>>        shift right
<<        shift left
The shift operators shift the value on their left bitwise by the number of bits given on their right:
• << shifts left and adds zeros at the right end.
• >> shifts right and inserts either 0s, if the value is of an unsigned type, or replicates the top bit (to preserve the sign) if it is of a signed type.
5 The bitwise operators also work on bool, but it is favorable to use the logical operators from the previous section. Especially the shift operators are rather silly for bool.
The bitwise operations can be used to characterize properties in a very compact form, as in the following example:

#include <iostream>

int main ()
{
    int concave = 1, monotone = 2, continuous = 4;
    int f_is = concave | continuous;
    std::cout << "f is " << f_is << std::endl;
    std::cout << "Is f concave? (0 means no, 1 means yes) "
              << (f_is & concave) << std::endl;
    f_is = f_is | monotone;
    f_is = f_is ^ concave;
    std::cout << "f is now " << f_is << std::endl;
    return 0;
}
The declaration of concave, monotone, and continuous introduces three properties that can be combined arbitrarily. The numbers are powers of two so that each binary representation contains a single 1-bit. The initialization of f_is uses bitwise OR to combine two properties. Bitwise AND allows for masking single or multiple bits, as in the question printed above. Afterward, an additional property is set with bitwise OR, and bitwise exclusive OR (XOR) toggles a property. Operating systems and hardware drivers use this style of operations exhaustively, but it needs some practice to get used to it.
Shift operations provide an efficient way to multiply by or divide by powers of 2, as shown in the following code:

int i = 78;
std::cout << "i * 8 is " << (i << 3)
          << ", i / 4 is " << (i >> 2) << std::endl;

Obviously, this needs some familiarization as well.
On the performance side, today's processors are quite fast at multiplying integers, so you will not see a big performance boost when replacing your multiplications by left shifts. Division is still a bit slow, and a right shift can make a difference. Even then, the price of this source-code obfuscation is only justified if the operation is critical for the overall performance of your entire application.
2.3.4 Compound assignment
The compound assignment operators apply an arithmetic operation to the left- and right-hand side and store the result in the left-hand side. These operators are +=, -=, *=, /=, %=, >>=, <<=, &=, ^=, and |=. The statement a += b is equivalent to the statement a = a + b.
2.3.5 Bracket operators
The operator [] is used to access elements of arrays (see § 2.9), and () is used for function calls.
2.3.6 All operators<br />
We have not introduced all operators yet. The remaining ones will be shown in an appropriate context. For now, we only list the entire operator set with precedences and associativity. The table is taken from [Str97] (by courtesy of Bjarne Stroustrup); for more details about specific operators see there. The operators at the top have the highest priorities.⁶
Operator Summary
scope resolution                  class_name :: member
scope resolution                  namespace_name :: member
global                            :: name
global                            :: qualified-name
member selection                  object . member
member selection                  pointer -> member
subscripting                      expr [ expr ]
subscripting (user-defined)       object [ expr ]⁷
function call                     expr ( expr-list )
value construction                type ( expr-list )
post increment                    lvalue ++
post decrement                    lvalue --
type identification               typeid ( type )
run-time type identification      typeid ( expr )
run-time checked conversion       dynamic_cast < type > ( expr )
compile-time checked conversion   static_cast < type > ( expr )
unchecked conversion              reinterpret_cast < type > ( expr )
const conversion                  const_cast < type > ( expr )
size of object                    sizeof expr
size of type                      sizeof ( type )
pre increment                     ++ lvalue
pre decrement                     -- lvalue
complement                        ~ expr
not                               ! expr
unary minus                       - expr
unary plus                        + expr
address of                        & lvalue
dereference                       * expr
create (allocate)                 new type
create (allocate and initialize)  new type ( expr-list )
create (place)                    new ( expr-list ) type
create (place and initialize)     new ( expr-list ) type ( expr-list )
destroy (deallocate)              delete pointer
destroy array                     delete [] pointer
6 TODO: If possible references
7 Not in [Str97].
cast (type conversion)            ( type ) expr
member selection                  object .* pointer-to-member
member selection                  pointer ->* pointer-to-member
multiply                          expr * expr
divide                            expr / expr
modulo (remainder)                expr % expr
add (plus)                        expr + expr
subtract (minus)                  expr - expr
shift left                        expr << expr
shift right                       expr >> expr
less than                         expr < expr
less than or equal                expr <= expr
greater than                      expr > expr
greater than or equal             expr >= expr
equal                             expr == expr
not equal                         expr != expr
bitwise AND                       expr & expr
bitwise exclusive OR (XOR)        expr ^ expr
bitwise inclusive OR              expr | expr
logical AND                       expr && expr
logical OR                        expr || expr
conditional expression            expr ? expr : expr
simple assignment                 lvalue = expr
multiply and assignment           lvalue *= expr
divide and assignment             lvalue /= expr
modulo and assignment             lvalue %= expr
add and assignment                lvalue += expr
subtract and assignment           lvalue -= expr
shift left and assignment         lvalue <<= expr
shift right and assignment        lvalue >>= expr
AND and assignment                lvalue &= expr
inclusive OR and assignment       lvalue |= expr
exclusive OR and assignment       lvalue ^= expr
throw exception                   throw expr
comma (sequencing)                expr , expr
To see the operator precedences at a glance, use Table 2.13 on page 64.⁸
2.3.7 Overloading<br />
A very powerful aspect of C++ is that the programmer can define operators for new types. This will be explained in section ??. Operators of built-in types cannot be changed. New operators cannot be added, as in some other languages. If you redefine operators, make sure that the expected priority of the operation corresponds to the operator precedence. For instance, you might have the idea of using the LaTeX notation for exponentiation of matrices:
8 TODO: Associativity?
A= B^2;

A is B squared. So far so good. That the original meaning of ^ is a bitwise XOR does not worry us because we do not plan to implement bitwise operations on matrices. Now we add C:

A= B^2 + C;

Looks nice. But it does not work (or does something weird). Why? Because + has a higher priority than ^. Thus, the compiler understands our expression as:

A= B ^ (2 + C);

Oops. That looks wrong.⁹ The operator gives a concise and intuitive interface, but its priority would cause a lot of confusion. Thus, it is advisable to refrain from this overloading.
2.4 Expressions and Statements<br />
C and C++ distinguish between expressions and statements. Very casually spoken, one could say that every expression becomes a statement if a semicolon is appended. However, we would like to discuss this topic a bit more.
Let us build this up recursively from the bottom. Any variable name (x, y, z, ...), constant, or literal is an expression. One or more expressions combined with an operator constitute an expression, e.g. x + y or x * y + z. In several languages, e.g. Pascal, the assignment is a statement. In C and C++ it is an expression, e.g. x= y + z. As a consequence, it can be used within another assignment: x2= x= y + z. Assignments are evaluated from right to left. Input and output operations such as

std::cout << "x is " << x << "\n";

are also expressions.
A function call with expressions as arguments is an expression, e.g. abs(x) or abs(x * y + z). Therefore, function calls can be nested: pow(abs(x), y). In languages where a function call is a statement, this would not be possible. As the assignment is an expression, it can be used as an argument of a function: abs(x= y); so can I/O operations such as those above. Needless to say, this is quite bad programming style. An expression surrounded by parentheses is an expression as well, e.g. (x + y). This allows us to change the order of evaluation: x * (y + z) computes the addition first although multiplication has the higher priority.
9 The precise interpretation is A.operator=(operator^(B, operator+(2, C)));

A very special operator in C++ is the comma operator, which provides a sequential evaluation. Its meaning is simply to evaluate first the sub-expression to the left of the comma and then the one to the right of it. The value of the whole expression is that of the right sub-expression. The sub-expressions can contain the comma operator as well, so that arbitrarily long sequences can be defined. With the help of the comma operator, one can evaluate multiple expressions in program locations where only one expression is allowed. If used as a function argument, the comma expression needs surrounding parentheses; otherwise the comma is interpreted as a separation of function arguments. The comma operator can be overloaded with user-defined semantics. This can complicate the understanding of the program behavior dramatically and has to be used with utter care. In general, it is advisable not to use it too often.
Any of the above expressions followed by a semicolon¹⁰ is a statement, e.g.:

x= y + z;
y= f(x + z) * 3.5;

A statement like y + z; is allowed although it is most likely useless. During program execution, the sum of y and z would be computed and then thrown away. Decent compilers optimize away this useless computation. However, it is not guaranteed that this statement can always be omitted: if y or z is an object of a user type, then the addition is also user-defined and might change y or z or something else. This is obviously bad programming style but legitimate in C++.
A single semicolon is an empty statement. Therefore, one can put as many semicolons after an expression as wanted. Some statements do not end with a semicolon, e.g. function definitions. If a semicolon is appended to such a statement, it is not an error but just an extra empty statement.¹¹ Any sequence of statements surrounded by curly braces is a statement — called a compound statement.
The variable and constant declarations we have seen before are also statements. As the initial value of a variable or constant, one can use any of the expressions mentioned before (however, involving the assignment or comma operator is probably rather confusing). Other statements — to be discussed later — are function and class definitions, as well as the control statements that we will introduce in the next section.
2.5 Control statements
Control statements allow us to steer the program execution by means of branching and repetition.
2.5.1 If-statement<br />
This is the simplest form of control, and its meaning is intuitively clear, for instance in:

if (weight > 100.0)
    cout << "This is quite heavy.\n";
else
    cout << "I can carry this.\n";
Often, the else branch is not needed and can be omitted. Say we have some value in variable x and want to compute with its magnitude:

if (x < 0.0)
    x= -x;
// Now we know that x >= 0.0
10 The usage of the semicolon in Pascal looks similar at first glance. However, in Pascal the semicolon has a slightly different purpose, namely separating statements. Thus, the semicolon can be omitted when only one statement exists in a line. Coming from Pascal, it takes some time to get used to this difference.
11 Nonetheless, some compilers print a warning in pedantic mode.
The expression in the parentheses must be a logical expression or something convertible to bool. For instance, one can write:

int i;
// ...
if (i) // bad style
    do_something();

In the example, do_something is called if i differs from 0. Experienced C and C++ programmers know that by heart, but the intentions of the developer are better communicated if this is stated explicitly:

int i;
// ...
if (i != 0) // much better
    do_something();
Each branch of an if consists of one single statement. To perform multiple operations, one can use braces:¹²

int nr_then= 0, nr_else= 0;
// ...
if (...) {
    nr_then++;
    cout << "In then-branch\n";
} else {
    nr_else++;
    cout << "In else-branch\n";
}

In the beginning, it is helpful to always write the braces. With more experience, most developers only write the braces where necessary. At any rate, it is highly advisable to indent the branches for better readability, whatever your degree of experience.
An if-statement can contain other if-statements:

if (weight > 100.0) {
    if (weight > 200.0)
        cout << "This is extremely heavy.\n";
    else
        cout << "This is quite heavy.\n";
} else {
    if (weight < 50.0)
        cout << "A child can carry this.\n";
    else
        cout << "I can carry this.\n";
}
In the above example, the braces could be omitted without changing the behavior, but it is clearer to have them. The example is more readable if we reorganize the nesting:

if (weight < 50.0) {
    cout << "A child can carry this.\n";
} else if (weight <= 100.0) {
    cout << "I can carry this.\n";
} else if (weight <= 200.0) {
    cout << "This is quite heavy.\n";
} else {
    cout << "This is extremely heavy.\n";
}

Nesting if-statements without braces can quickly become confusing, for instance:

if (weight > 100.0)
    if (weight > 200.0)
        cout << "This is extremely heavy.\n";
    else
        cout << "This is quite heavy.\n";
It looks like the last line is executed when weight is between 100 and 200, assuming the first if has no else-branch. But we could also assume that the second if comes without an else-branch and the last line is executed when weight is less than or equal to 100. Fortunately, the C++ standard specifies that an else-branch always belongs to the innermost possible if, so we can count on our first interpretation. In case the else-branch should belong to the first if, we need braces:
if (weight > 100.0) {
    if (weight > 200.0)
        cout << "This is extremely heavy.\n";
} else
    cout << "This is not so heavy.\n";
Maybe these examples convinced you that it is more productive to set more braces and save
the time of guessing which if the branches belong to.
Advice

If you use an editor that understands C++ (like the IDE from Visual Studio
or emacs in C++ mode), then automatic indentation is a great help with
structured programming. Whenever a line is not indented as you expected,
something is most likely not nested as you intended.
2.5.2 Conditional Expression

Although this section describes statements, we like to talk about the conditional expression
here because of its proximity to the if-statement. The semantics of

condition ? result_for_true : result_for_false

is that if the condition in the first sub-expression evaluates to true, the entire expression has the
value of the second sub-expression, otherwise that of the third one. For instance, we can compute
the minimum of two values with either if-then-else or the conditional expression:
if (x <= y)
    min= x;
else
    min= y;

min= x <= y ? x : y;

2.5.3 While and Do-While Loops

A while-loop repeats its body as long as its condition holds. A do-while-loop tests the condition
only after executing the body:

double eps= 0.001;
do {
    cout << "eps= " << eps << '\n';
    eps/= 2.0;
} while (eps > 0.0001);
The loop is performed at least once — even with an extremely small value for eps in our
example. The difference between a while-loop and a do-while-loop is irrelevant to most scientific
software. It only matters for loops with very few iterations and an extremely strong impact on the
overall performance, because a do-while-loop performs one comparison and one jump less.
2.5.4 For Loop

The most common loop in C++ is the for-loop. As a simple example, we add two vectors
and print the result afterward:

double v[3], w[]= {2., 4., 6.}, x[]= {6., 5., 4.};
for (int i= 0; i < 3; i++)
    v[i]= w[i] + x[i];
for (int i= 0; i < 3; i++)
    cout << "v[" << i << "] = " << v[i] << '\n';
The loop head consists of three components:

• the initialization;
• a continuation criterion; and
• a step operation.

The example above is typical for a for-loop. In the initialization, one typically declares a new
variable and initializes it with 0 because this is the start index of most indexed data structures.
The condition usually tests whether the loop index is smaller than a certain size, and the last operation
typically increments the loop index.
It is a very popular beginners' mistake to write conditions like "i <= size": since indices start
at 0 in C++, the valid indices go only up to size − 1, and the last iteration would access an element
past the end of the data structure.

As a slightly more complex example, we compute the Taylor series of e^x = Σ x^n/n! for x = 2
up to the tenth term:

double x= 2.0, xn= 1.0, exp_x= 1.0;
unsigned long fac= 1;
for (unsigned long i= 1; i <= 10; i++) {
    xn*= x;
    fac*= i;
    exp_x+= xn / fac;
}
cout << "exp(" << x << ") is approximately " << exp_x << '\n';

Here it was simpler to take out term 0 and start with term 1. We also used less-equal to assure
that the term x^10/10! is considered.
The for-loop in C++ is very flexible. The initialization part can be any expression, a variable
declaration, or empty. It is possible to introduce multiple new variables of the same type. This
can be used to avoid repeating the same operation in the condition, e.g.:

for (vector<double>::iterator i= xyz.begin(), end= xyz.end(); i != end; i++) ...

Variables declared in the initialization are only visible within the loop and hide variables of the
same names from outside the loop.
The condition can be any expression that can be converted to a bool. An empty condition is
always true, and the loop is repeated infinitely unless it is terminated from inside the body, as we will
discuss in the next section. We said that loop indices are typically incremented in the head's third
part. In principle, one can modify them within the loop body, but programs are much clearer if it
is done in the loop head. On the other hand, there is no limitation that only one variable is
increased by 1. One can modify as many variables as wanted, using the comma operator, and by
any modification desired, such as:

for (int i= 0, j= 0, p= 1; ...; i++, j+= 4, p*= 2) ...

This is of course more complex than having just one loop index but still more readable than
declaring/modifying indices before the loop or inside the loop body.
In fact, the for-loop in C and C++ is just another notation for a while-loop. Any for-loop:

for (init; cond; incr) {
    st1; st2; ... stn;
}

can be written as a while-loop:

{
    init;
    while (cond) {
        st1; st2; ... stn;
        incr;
    }
}
Conversely, any while-loop can evidently be written as a for-loop. We do not know whether there is
a design guideline from some software engineering guru on when to use while or for, but for is more
concise if there is a local initialization or some incremental operation.
2.5.5 Loop Control

There are two statements to deviate from the regular loop evaluation:

• break and
• continue.

A break terminates the loop entirely; continue ends only the current iteration and continues
the loop with the next iteration. For instance:
for (...; ...; ...) {
    ...
    if (dx == 0.0) continue;
    x+= dx;
    ...
    if (r < eps) break;
    ...
}
In the example above, we assumed that the remainder of the iteration is not needed when
dx == 0.0. In some iterative computations, it might be clear in the middle of an iteration (here
when r < eps) that the work is already done.

Understanding the program behavior becomes more difficult the more breaks and continues
are used. One should always aim for moving as much loop control as possible into the loop
head. However, avoiding breaks and continues by excessive if-then-else branching is even less
comprehensible.
Sometimes, one might prefer performing some surplus operations inside a loop (if this has no
perceivable impact on the overall performance) and keeping the program simpler. Simpler programs,
on the other hand, have a better chance of being optimized by the compiler. There is certainly
no golden rule, but as a practical approach one should implement software first for maximal
clarity and simplicity (while using efficient algorithms as early as possible). Once the software is
working correctly, one can try variations to investigate the impact of implementation details on
performance.
2.5.6 Switch Statement

A switch is like a special kind of if. It provides a concise notation when different computations
are performed for different cases of a given integral value:

switch (op_code) {
    case 0: z= x + y; break;
    case 1: z= x - y; cout << "compute diff\n"; break;
    case 2:
    case 3: z= x * y; break;
    default: z= x / y;
}
When people see the switch statement for the first time, they are usually surprised that one
needs to say at the end of each case that the statement is terminated. Otherwise, the statements of
the next case are executed as well. This can be used to perform the same operation for different
cases, e.g. for 2 and 3 in the example above.

This fall-through also allows us to implement short loops without the termination test after
each iteration. Say we have vectors with dimension ≤ 5. Then we could implement a vector
addition without a loop:
assert(size(v) <= 5);
int i= 0;
switch (size(v)) {
    case 5: v[i] = w[i] + x[i]; i++;
    case 4: v[i] = w[i] + x[i]; i++;
    case 3: v[i] = w[i] + x[i]; i++;
    case 2: v[i] = w[i] + x[i]; i++;
    case 1: v[i] = w[i] + x[i];
    case 0: ;
}
This technique is called Duff's device. Although it is an interesting technique to realize an
iterative computation without a loop, the performance impact is probably limited in practice.
Such a technique should only be considered in program parts with a significant fraction of the
overall run time; otherwise the readability of the sources is more important.
2.5.7 Goto

DO NOT USE IT. NEVER! EVER!
2.6 Functions

Functions are important building blocks of C++ programs. The first example we have seen is
the main function in the hello-world program. main must be present in every executable and is
called when the program starts. Other than that, there is nothing special about main.

The general form of a C++ function is:

[inline] return_type function_name(argument_list)
{
    body of the function
}

For instance, one can implement a very simple function to square a value:

double square(double x)
{
    return x * x;
}
In C and C++, each function has a return type. A function that does not return a value has the
pseudo-return-type void:

void print(double x)
{
    std::cout << "x is " << x << '\n';
}

void is not a real type but rather a placeholder that enables us to omit returning a value.
We cannot define objects of it:

void nothing; // error
2.6.1 Inline Functions

Calling a function requires a fair amount of work:

• The arguments (or at least their addresses) must be copied onto the stack;
• The current program counter must be copied onto the stack so that the execution continues at
  this point when the function is finished;
• Registers must be saved to allow the function to use them;
• Jump to the code of the function;
• Execute the function;
• Clean the arguments from the stack;
• Copy the result onto the stack;
• Jump back to the calling code;
• Restore the registers.

What happens exactly depends on the hardware. The good news is that the function call
overhead is dramatically lower than in the past. Furthermore, the compiler can optimize away
those activities not needed in a specific call.
Nonetheless, for small functions like square above, the effort for calling the function is still
significantly higher than what the function actually does. C programmers avoid the function-call
overhead with macros. Macros create so many problems in software development that they
should only be used when there is absolutely no alternative whatsoever. Bjarne Stroustrup
says: "Almost every macro demonstrates a flaw in the programming language, in the program,
or in the programmer." We like to add a flaw "in the compiler optimization". 16

Fortunately, we have an excellent alternative to macros: inline functions. The programmer just
adds the keyword inline to the function definition:

inline double square(double x)
{
    return x * x;
}

and all the overhead of the function call vanishes into thin air.
An excessive use of inline can have a negative effect on performance. When many large functions
are inlined, the binary executable becomes very large. The consequence is that a lot of
time is spent loading the binary from memory, and a lot of cache memory is wasted for it as
well. This decreases the memory bandwidth and the cache available for data, causing more slow-down
than what is saved on function calls.

16 Advanced: Compilers today are really smart in eliminating unused code. However, we experienced that
arguments of inline functions might be constructed although they are not used. These are usually only a few
machine instructions. But when this happens extremely frequently, as in an index range check that should
disappear in release mode, it can ruin the overall performance. We hope that further compiler improvements can
rescue us from this kind of macro usage.
It should be mentioned here that the inline keyword is not mandatory. The compiler can decide
against inlining for the reasons given in the previous paragraph. On the other hand, the compiler
is free to inline functions without the inline keyword.

For obvious reasons, the definition of an inline function must be visible in every compile unit
where it is called. In contrast to other functions, it cannot be compiled separately. Conversely,
a non-inline function must not be defined in multiple compile units because the definitions collide
when the compiled parts are 'linked' together. Thus, there are two ways to avoid such collisions:
assuring that the function definition is only present in one compile unit or declaring the function as
inline.
2.6.2 Function Arguments

If we pass an argument to a function, by default a copy is created. For instance, the following
would not work (as expected):

void increment(int x)
{
    x++;
}

int main()
{
    int i= 4;
    increment(i);
    cout << "i is " << i << '\n';
}

The output would be 4. The operation x++ in the function body only increments a local copy but
not the original value. This kind of argument transfer is called 'call-by-value' or 'pass-by-value'.
To modify the value itself, we have to 'pass-by-reference' the variable:

void increment(int& x)
{
    x++;
}

Now the variable itself is incremented, and the output will be 5 as expected. We will discuss
references in more detail in § 2.10.2.

Temporary variables — like the result of an operation — cannot be passed by reference:

increment(i + 9); // error

We could not compute (i + 9)++ anyway. In order to call such a function with some temporary
value, one needs to store it first in a variable and pass this variable to the function.
Larger data structures like vectors and matrices are almost always passed by reference to
avoid expensive copy operations:

double two_norm(vector& v) { ... }

An operation like a norm should not change its argument. But passing the vector by reference
bears the risk of accidentally overwriting it.

To make sure that our vector is not changed (and not copied either), we pass it as a constant
reference:

double two_norm(const vector& v) { ... }
If we changed v in this function, the compiler would emit an error. Both call-by-value and
constant references ascertain that the argument is not altered, but by different means:

• Arguments that are passed by value can be changed in the function since the function
  works with a copy. 17
• With const references, one works on the passed argument directly, but all operations that
  might change the argument are forbidden. In particular, const-referred arguments cannot
  appear on the left-hand side of an assignment or be passed as non-const references to other
  functions (in fact, the LHS of an assignment is also a non-const reference).

In contrast to mutable references, constant ones allow for passing temporaries:

alpha= two_norm(v + w);

This is admittedly not entirely consistent on the language design side, but it makes the life of
programmers much easier.
Values that are quite frequent as argument can be declared as default. Say we implement a
function that computes the nth root and mostly use it for the square root; then we can write:

double root(double x, int degree= 2) { ... }

This function can be called with one or two arguments:

x= root(3.5, 3);
y= root(7.0);

One can declare multiple default arguments, but only at the end of the parameter list. In other
words, after an argument with a default value one cannot have one without.
2.6.3 Returning Results

In the examples before, we only returned double or int. These are the nice ones. Functions that
compute new values of large data structures are more difficult.

Default arguments

Sometimes functions have arguments that are used very infrequently. To address this, you can
give a parameter a default value that is automatically used when no argument corresponding
to that parameter is specified. In this way, the caller only needs to specify those arguments that
are meaningful at a particular instance. Consider the following example:

void foo(int a= 5, char ch= 'A')
{ std::cout << a << " " << ch << std::endl; }

17 This assumes that the argument is properly copied. For user-defined types, one can implement one's own copy
operation with aliasing effects (on purpose or by accident). Then modifications of the copy also affect the original
object.
2.6. FUNCTIONS 43<br />
foo takes one integer argument with default value 5 and one character argument with a default<br />
value of ‘A’. Now this function can be called by one of the three methods shown here:<br />
foo( 1, ’J’ );<br />
foo(24);<br />
foo();<br />
Which results in the following output:<br />
1 J<br />
24 A<br />
5 A<br />
Void functions

When the result type of a function is void, we do not return a result. For example:

void foo(int i) {
    std::cout << "My value is " << i << std::endl;
}

Constant arguments

We can use const objects as arguments in functions to protect them from being changed. For
example:

bool bar(int const& x, int y) {
    y= y + 2;
    return y == x;
}
Since we do not want to modify x, we can add the keyword const. Note that const can be put
before or after the type, but the authors of this course recommend putting it after.
2.6.4 Overloading

In C++, functions can share the same name as long as their parameter declarations are different.
More precisely, the functions should differ in the number or the types of their parameters.
The compiler can then use the number/type of the arguments to determine which version of
the overloaded function should be used. Note that although overloaded functions may have
different return types, a difference in the return type alone is not sufficient to distinguish between
two versions of a function.

Consider the following example:

#include <iostream>
#include <cmath>

int divide(int a, int b) {
    return a / b;
}
float divide(float a, float b) {
    return std::floor(a / b);
}

int main() {
    int x= 5, y= 2;
    float n= 5.0, m= 2.0;
    std::cout << divide(x, y) << std::endl;
    std::cout << divide(n, m) << std::endl;
    return 0;
}
In this case, we have defined two functions with the same name, divide, but one of them accepts
two parameters of type int and the other one accepts them of type float. In the first call to
divide, the two arguments passed are of type int; therefore, the function with the first prototype
is called. This function returns the result of dividing one parameter by the other. The second
call passes two arguments of type float, so the function with the second prototype is called.
This one executes a similar division and rounds the result down.
2.6.5 Assertions

assert is a special kind of function (strictly speaking, a macro from <cassert>) with the following
interface:

void assert(int expression);

If the argument expression evaluates to 0, this causes an assertion failure that terminates the
program: a message is written to the standard error device and abort is called, terminating
the program execution.

The specifics of the message shown depend on the specific implementation in the compiler, but
it shall include the expression whose assertion failed, the name of the source file, and the line
number where it happened. A usual message format is:

Assertion failed: expression, file filename, line linenumber
This allows a programmer to include many assert calls in the source code while debugging
the program. The many assert calls may reduce the performance of the code, so it is desirable
to be able to disable asserts in high-performance libraries. Asserts are disabled by including
the line

#define NDEBUG

at the beginning of the code, before the inclusion of cassert, or by defining the macro on the
compiler command line, e.g.

g++ -DNDEBUG foo.cpp
Example:

#include <cassert>
#include <fstream>

int main()
{
    std::ifstream datafile("file.dat");
    assert(datafile.is_open());
    datafile.close();
    return 0;
}

In this example, assert is used to abort the program execution if datafile.is_open() returns false,
which happens when the opening of the file was unsuccessful.
2.7 Input and output

C++ uses a convenient abstraction called streams to perform input and output operations on
sequential media such as the screen or the keyboard. A stream is an object into which a program
can insert characters or from which it can extract them. The standard C++ library includes the header
file iostream, where the standard input and output stream objects are declared.
2.7.1 Standard Output (cout)

By default, the standard output of a program is the screen, and the C++ stream object defined
to access it is cout.

cout is used in conjunction with the insertion operator, which is written as <<. It may be used
more than once in a single statement. This is especially useful if we want to print a combination
of variables and constants or more than one variable. Consider this example:

std::cout << "Hello World, my name is " << name << std::endl;
std::cout << "I am " << age << " years old." << std::endl;

If we assume the name variable to contain the value Jane and the age variable to contain 25,
the output of the previous statements would be:

Hello World, my name is Jane
I am 25 years old.

The endl manipulator produces a newline character. An alternative representation of endl is the
character '\n'.
2.7.2 Standard Input (cin)

The standard input device is usually the keyboard. Handling the standard input in C++ is done
by applying the overloaded extraction operator >> to the cin stream. The operator must be
followed by the variable that will store the data that is going to be extracted from the stream.
For example:

int age;
std::cin >> age;

The first statement declares a variable of type int called age, and the second one waits for
input from cin (the keyboard) in order to store it in this integer variable. The input from the
keyboard is processed once the RETURN key has been pressed.

You can also use cin to request more than one datum from the user:

std::cin >> a >> b;

is equivalent to:

std::cin >> a;
std::cin >> b;

In both cases, the user must provide two data items, one for variable a and another one for variable b,
which may be separated by any valid blank separator: a space, a tab character, or a newline.
2.7.3 Input/Output with files

C++ provides the following classes to perform output and input of characters to/from files:

• std::ofstream: used to write to files
• std::ifstream: used to read from files
• std::fstream: used to both read from and write to files.

We can use file streams the same way we have already used cin and cout, with the only difference
that we have to associate these streams with physical files. Here is an example:

#include <iostream>
#include <fstream>

int main() {
    std::ofstream myfile;
    myfile.open("example.txt");
    myfile << "Writing this to a file." << std::endl;
    myfile.close();
    return 0;
}
This code creates a file called example.txt (or overwrites it if it already exists) and inserts a
sentence into it in a way that is similar to the use of cout. C++ has the concept of an output
stream that is satisfied by an output file as well as by std::cout. That means that everything
that can be written to std::cout can also be written to a file, and vice versa. If you define
the operator << for a new type yourself, you do not need to program it for each different output type
but only once for a general output stream. 18
Alternatively, one can give the file stream object the file name as argument. This opens the file
implicitly. The file is also implicitly closed when myfile goes out of scope, in this case at the end
of the main function. The mechanisms that control such implicit actions will become clear in
§ 2.2.3. The bottom line is that you must close your files explicitly only in few cases. The short
version of the previous listing is:

18 TODO: Where? New section needed.
#include <iostream>
#include <fstream>

int main() {
    std::ofstream myfile("example.txt");
    myfile << "Writing this to a file." << std::endl;
    return 0;
}
2.8 Structuring Software Projects

2.8.1 Namespaces

In the last section, we mentioned that equal names in different scopes hide the variables (or
functions, types, ...) of the outer scopes, while defining the same name twice in one scope is an error.
Common function names like min, max, or abs already exist, and if you write a function with
the same name (and same argument types), the compiler will tell you that the name already
exists. But this does not only concern common names; you must be sure that every name you
use is not already used in some other library. This really can be a hassle because you might
add more libraries later, and there is new potential for conflicts. Then you have to rename some
of your functions and inform everybody who uses your software. Or one of your software users
includes a library that you do not know and gets a name conflict. This can grow into a serious
problem, and it happens in C all the time.
One possibility to deal with this is using different names like max_, my_abs, or library_name_abs.
This is in fact what is done in C: main libraries have short function names, user libraries longer
names, and OS-related internals typically start with an underscore. This decreases the probability
of conflicts but does not eliminate it entirely.
Remark: Particularly annoying are macros. This is an old technique of code reuse, expanding
macro names to their text definition, potentially with arguments. It gives a lot of possibilities
to empower your program but many more to ruin it. Macros are resistant to namespaces
because they are reckless text substitutions without any notion of types, scopes, or any other
language feature. Unfortunately, some libraries define macros with common names like major.
We uncompromisingly undefine such macros, e.g. #undef major, without mercy for people who
might want to use those macros. Visual Studio defines — till today! — min and max as macros,
and we advise you to disable this by compiling with /DNOMINMAX. Almost all macros can be
replaced by other techniques (constants, templates, inline functions). But if you really do not
find another way of implementing something, use LONG_AND_UGLY_NAMES_IN_CAPITALS like the library
collection Boost does.
2.8.2 Header and implementation

It is usual to split class (Chapter 3) and function definition and implementation into different
files. Classes and functions are typically declared in a header file (.hpp) and implemented in a
.cpp file, which is then compiled and added to a library. For example, the header file foo.hpp
could be:

foo.hpp:

#ifndef athens_foo_hpp
#define athens_foo_hpp

double foo(double a, double b);

#endif
Note the ifndef and define C-preprocessor commands. These commands are called include guards
and prevent the file from being included several times. The use of such guards in header files is
quite common.

The implementation in the library would be contained in the file foo.cpp:

#include "foo.hpp"

double foo(double a, double b)
{ return a + b; }
The main program is contained in the file bar.cpp:

#include <iostream>
#include "foo.hpp"

int main() {
    double a= 2.1;
    double b= 3.9;
    std::cout << foo(a, b) << std::endl;
}
Include files usually contain the interface of software packages and are stored somewhere on
disk. The compiler is told where to look for the include files. The programmer can partially
control this as follows:

• #include "foo.hpp": the compiler looks in the directory of the including file and in the list of
  directories it is given.
• #include <foo.hpp>: the compiler only looks in the list of directories it is given.
Frequently used include files

The types and functions defined in the following include files are in the namespace std.

• <iostream>: input and output streams, e.g. std::cin and std::cout
• <fstream>: file input and output
• <cassert>: assertions, see § 2.6.5
• <cmath>: headers for the C functions from math.h, among others: abs, fabs, pow, acos,
  asin, atan, atan2, ceil, floor, cos, cosh, sin, sinh, exp, fmod (floating-point mod), modf (split into
  integer and fractional part (< 1)), log, log10, sqrt, tan, tanh, and other useful functions such as isnan
• <string>: string operations
• <complex>: complex numbers
• <vector>, <map>, <list>, <deque>, ...: STL, see Section 4.9
Inline keyword

Instead of creating a library as described at the beginning of this section, we can also store the
implementation in the header file. We then have to add the keyword inline, for two reasons. First,
the code will not be stored in a library but inlined into the calling functions: this may lead to more
efficient code when the functions are small. Second, if we do not use the inline keyword, we may end
up with multiply defined functions, since the compiler will create the function in every source
file in which it is used.

Consider for example the following header file sqr.hpp:

#ifndef athens_sqr_hpp
#define athens_sqr_hpp

inline double sqr(double a)
{ return a * a; }

#endif
2.9 Arrays

C-based programming languages are not very good at working with arrays. In this section, we
discuss the language concepts for arrays. In Section 4.9, we will present more practical software
for arrays and other complicated mass data structures.

An array is created as follows:

int x[10];

The variable x is a constant-size array. It allows for fast creation (it is typically stored on the
stack).

Arrays are accessed with square brackets: x[i] is a reference to the ith element. The first element
is x[0], the last one is x[9]. Arrays can be initialized at the definition:

float v[]= {1.0, 2.0, 3.0}, w[]= {7.0, 8.0, 9.0};

In this case, the array size is deduced.
Operations on arrays are typically performed in loops; e.g. the vector operation x = v − 3w<br />
is realized by<br />
float x[3];<br />
for (int i= 0; i < 3; i++)<br />
    x[i]= v[i] - 3.0 * w[i];<br />
One can also define arrays of higher dimension
float A[7][9]; // a 7 by 9 matrix<br />
int q[3][2][3]; // a 3 by 2 by 3 array<br />
The language does not provide linear algebra operations upon the arrays. Therefore we will<br />
build our own linear algebra and look forward to future C++ standards coming with intrinsic<br />
higher math.<br />
Arrays have the following two disadvantages:<br />
• Indices are not checked before accessing an array, so one can end up outside the<br />
array, and the program crashes with a segmentation fault/violation. This is not even the<br />
worst case: if your program crashes, you at least see that things went wrong. The false access can<br />
also silently corrupt your own data; the program keeps running and produces entirely wrong<br />
results, with whatever consequences you can imagine.<br />
• The size of the array must be known at compile time. 19 For instance, suppose we have an array<br />
stored in a file and need to read it back into memory:<br />
ifstream ifs("some_array.dat");<br />
ifs >> size;<br />
float v[size]; // error, size not known at compile time<br />
This does not work because we need the size already when the program is compiled.<br />
The first problem can only be solved with new array types, and the second one with dynamic<br />
allocation. This leads us to pointers.<br />
2.10 Pointers and References<br />
2.10.1 Pointers<br />
A pointer is a variable that contains a memory address. This address can be that of another<br />
variable or dynamically allocated memory. Let's start with the latter, as we were looking for<br />
arrays of dynamic size.<br />
int* y= new int[10];<br />
This allocates an array of 10 int. The size can now be chosen at run-time. We can also implement<br />
the vector reading example from the previous section<br />
ifstream ifs("some_array.dat");<br />
int size;<br />
ifs >> size;<br />
float* v= new float[size];<br />
for (int i= 0; i < size; i++)<br />
    ifs >> v[i];<br />
Pointers bear the same danger as arrays: the risk of accessing out-of-range data, with program<br />
crashes or data corruption as possible consequences. It is also the programmer's responsibility<br />
to keep track of the array size.<br />
19 Some compilers support run-time values as array sizes. Since this is not guaranteed to work with other<br />
compilers, one should avoid this in portable software.
Furthermore, the programmer is responsible <strong>for</strong> releasing the memory when not needed anymore.<br />
This is done by<br />
delete[] v;<br />
As we came from arrays, we made the second step be<strong>for</strong>e the first one regarding pointer usage.<br />
The simple use of pointers is allocating one single data item.<br />
int* ip= new int;<br />
Releasing such memory is performed by<br />
delete ip;<br />
Note the duality of allocation and release: the single-object allocation requires a single-object<br />
release and the array allocation demands an array release. 20<br />
Pointers can also refer to other variables:<br />
int i= 3;<br />
int* ip2= &i;<br />
The operator & takes an object and returns its address. The reverse operator is *, which takes<br />
an address and returns the object at that address:<br />
int j= *ip2;<br />
This is called dereferencing. It is clear from the context whether the symbol * represents a<br />
dereference or a multiplication.<br />
A danger of pointers are memory leaks. For instance, suppose our array y became too small and we<br />
want to assign a new, larger array:<br />
y= new int[15];<br />
We can now use more space in y. Nice. But what happened to the memory that we allocated<br />
before? It is still there, but we have no access to it anymore, and we cannot release it anymore.<br />
This memory is lost for the rest of our program execution. Only when the program is finished<br />
will the operating system be able to free it. In our example it is only 40 bytes out of however many<br />
gigabytes you might have. But if this happens with larger data in an iterative process, the dead<br />
memory grows, and at some point the program crashes when all memory is used.<br />
The warnings above are not intended as fun killers, and we do not discourage the use of<br />
pointers. Many things can only be achieved with pointers: lists, queues, trees, graphs, . . . But<br />
pointers must be used with utter care to avoid all the really serious problems mentioned above.<br />
There are two strategies to minimize pointer-related errors:<br />
Use standard implementations from the standard library or other validated libraries. std::vector<br />
from the standard library provides all the functionality of dynamic arrays, including<br />
resizing and range checking, and its memory is released automatically, see § 4.9. Smart pointers<br />
from Boost provide automatic resource management: dynamically allocated memory<br />
that is no longer referred to by a smart pointer is released automatically, see § 11.2.<br />
20 Otherwise the behavior is undefined: releasing an array allocation with delete, or a single-object allocation with delete[], may crash the program or corrupt memory.
Encapsulate your dynamic memory management in classes. Then you have to deal with it<br />
only once per class. 21 If all memory allocated by an object is released when the object<br />
is destroyed, then it does not matter how much memory you allocate. If you have 738<br />
objects with dynamic memory, then it will be released 738 times. If you have called new<br />
738 times, partly in loops and branches, can you be sure that you have called delete 738<br />
times? We know that there are tools for this, but these are errors you had better prevent than<br />
fix. Even with the encapsulation there is probably something to fix inside the classes, but<br />
this is orders of magnitude less work than having pointers spread all over your program.<br />
We have shown two main purposes of pointers:<br />
• Dynamic memory management; and<br />
• Referring to other objects.<br />
For the former there is no alternative to pointers: dynamic memory handling needs pointers,<br />
either directly or via classes that contain pointers. To refer to other objects, there exists<br />
another kind of type called references (surprise, surprise) that we will introduce in the next<br />
section.<br />
2.10.2 References<br />
The following code introduces a reference:<br />
int i= 5;<br />
int& j= i;<br />
j= 4;<br />
std::cout << "j = " << j << '\n';<br />
The variable j is referring to i. Changing j will also alter i and vice versa, as in the example: i<br />
and j will always have the same value. One can think of a reference as an alias. Whenever one<br />
defines a reference, one must say immediately what it refers to (other than with pointers). It is not<br />
possible to refer to another variable later.<br />
So far, that does not sound extremely useful. But references are extremely useful for function<br />
arguments (§ 2.6), for referring to parts of other objects (e.g. the seventh entry of a vector), and<br />
for building views ( 22 ).<br />
2.10.3 Comparison between pointers and references<br />
The advantage of pointers over references is the ability to manage memory dynamically and<br />
to perform address calculation. On the other hand, references refer to defined locations 23 , they<br />
must always refer to something, they do not leave memory leaks (unless you play really evil tricks),<br />
and they have the same usage notation as the referred object.<br />
21 It is safe to assume that there are many more objects than classes; otherwise there is something wrong with<br />
the program.<br />
22 TODO: ref to a section when it is written<br />
23 References can refer to arbitrary addresses, but one must work hard to achieve this. For your own safety we<br />
will not show you how to make references behave as badly as pointers.
Feature                          Pointers   References<br />
Referring to a defined location     -           +<br />
Mandatory initialisation            -           +<br />
Avoidance of memory leaks           -           +<br />
Object-like notation                -           +<br />
Memory management                   +           -<br />
Address calculation                 +           -<br />
Table 2.2: Comparison between pointers and references<br />
In short, references are not idiot-proof but much less error-prone than pointers. Pointers<br />
should only be used when dealing with dynamic memory, and even then one should do this via<br />
well-tested types or encapsulate the pointer within a class.<br />
2.10.4 Do Not Refer Outdated Data<br />
Variables in functions are only valid within that function, for instance:<br />
double& square_ref(double d) // DO NOT!<br />
{<br />
    double s= d * d;<br />
    return s;<br />
}<br />
The variable s is not valid anymore after the function has finished. If you are lucky, the memory<br />
where s was stored has not been overwritten yet. But this is nothing one can count on. Good compilers<br />
will warn you that you are referring to a local variable. Sadly enough, we have seen examples in web<br />
tutorials that do this!<br />
The same applies correspondingly to pointers:<br />
double* square_ptr(double d) // DO NOT!<br />
{<br />
    double s= d * d;<br />
    return &s;<br />
}<br />
This is as wrong as it is for references.<br />
There are cases where functions, especially member functions, return references or addresses and<br />
where the destruction order of objects prevents the invalidation of these references, 24 cf. § ??.<br />
2.11 Real-world example: matrix inversion<br />
TODO: I am not sure anymore if this is very good here. I still think we should propagate<br />
abstraction and demonstrate how to develop reusable software but the section feels now a bit<br />
misplaced. At the beginning of the next chapter is not much better. Maybe a good intro<br />
paragraph saves the situation.<br />
24 Unfortunately there are ways to circumvent this, and there is an exception to this rule.
As a practical exercise, we now go step by step through the development process of a function<br />
for matrix inversion. This is easier than it seems. 25 For it, we use the Matrix Template Library 4,<br />
see http://www.mtl4.org. It already provides most of the functionality we need. 26<br />
In the program development, we follow some principles of Extreme Programming, especially<br />
writing tests first and implementing the functionality afterwards. This has two significant advantages:<br />
• It prevents you as a programmer (to some extent) from featurism: the obsession to add<br />
more features instead of finishing one thing after another. If you write down what you<br />
want to achieve, you work more directly towards this goal and usually accomplish it much<br />
earlier. When writing the function call, you specify the interface of the function you plan<br />
to implement; when testing your results against expected values, you say something about<br />
the semantics of your function. Thus, tests are compilable documentation. The tests<br />
might not tell everything about the functions and classes you are going to implement, but<br />
what they say, they say very precisely. Documentation in text can be much more detailed and<br />
comprehensible, but also much vaguer, than tests.<br />
• If you start writing tests after you have finally finished the implementation, say on a late<br />
Friday afternoon, You Do Not Want To See It Failing. You will write the test with your<br />
nice data (whatever this means for the program in question) and minimize the risk that<br />
it fails. You might decide to go home and swear to God that you will test it on Monday.<br />
For those reasons, you will be more honest if you write your tests first. Of course, you can<br />
modify your tests later if you realize that something does not work, you changed the design<br />
of some item, or you want to test more details. It goes without saying that verifying partial<br />
implementations requires temporarily commenting out parts of your test.<br />
Be<strong>for</strong>e we start implementing our inverse function and even the tests we have to choose an<br />
algorithm. We can use determinants of sub-matrices, block algorithms, Gauß-Jordan, or LU<br />
decomposition with or without pivoting. Let’s say we prefer LU factorization with column<br />
pivoting so that we have<br />
LU = P A,<br />
with a unit lower triangular matrix L, an upper triangular matrix U, and a permutation matrix<br />
P. Thus it is<br />
A = P^-1 L U<br />
and<br />
A^-1 = U^-1 L^-1 P. (2.1)<br />
We use the LU factorization from MTL4, implement the inversion of the lower and upper<br />
triangular matrix and compose it appropriately.<br />
Now we start with our test by defining an invertible matrix and printing it out.<br />
int main(int argc, char* argv[])<br />
{<br />
    const unsigned size= 3;<br />
    typedef mtl::dense2D&lt;double&gt; Matrix;<br />
    Matrix A(size, size);<br />
    A= 4, 1, 2,<br />
25 At least with the implementations we already have.<br />
26 It actually provides the inversion function inv already, but we want to learn now how to get there.
1, 5, 3,<br />
2, 6, 9;<br />
cout << "A is:\n" << A;<br />
For later abstraction we define the type Matrix and the constant size. The LU factorization in<br />
MTL4 is performed in place. To avoid altering our original matrix, we copy it into a new one.<br />
Matrix LU(A);<br />
We also define a vector for the permutation computed in the factorization.<br />
mtl::dense_vector&lt;unsigned&gt; Pv(size);<br />
These are the two arguments for the LU factorization:<br />
lu(LU, Pv);<br />
For our purpose it is more convenient to represent the permutation as a matrix:<br />
Matrix P(permutation(Pv));<br />
cout << "Permutation vector is " << Pv << "\nPermutation matrix is\n" << P;<br />
For instance, we can show A in its permuted form: 27<br />
cout << "Permuted A is \n" << Matrix(P * A);<br />
We now define an identity matrix of appropriate size and extract L and U from our in-place<br />
factorization<br />
Matrix I(matrix::identity(size, size)), L(I + strict_lower(LU)), U(upper(LU));<br />
Note that the unit diagonal of L is not stored and needs to be added. It could also be treated<br />
implicitly, but we refrain from that for the sake of simplicity. We have now finished the preliminaries<br />
and come to our first test. If we have computed the inverse of U, say UI, then the product<br />
UI * U must be, approximately, the identity matrix.<br />
Matrix UI(inverse_upper(U));<br />
cout << "inverse(U) [permuted] is:\n" << UI << "UI * U is:\n" << Matrix(UI * U);<br />
assert(one_norm(Matrix(UI * U - I)) < 0.1);<br />
Testing results of non-trivial numeric calculations for equality is quite certain to fail. Therefore,<br />
we use the norm of the matrix difference as criterion. Likewise, the inversion of L (with a<br />
different function) is tested.<br />
Matrix LI(inverse_lower(L));<br />
cout << "inverse(L) [permuted] is:\n" << LI << "LI * L is:\n" << Matrix(LI * L);<br />
assert(one_norm(Matrix(LI * L - I)) < 0.1);<br />
This enables us to calculate the inverse of A itself and test its correctness<br />
Matrix AI(UI * LI * P);<br />
cout << "inverse(A) [UI * LI * P] is \n" << AI << "AI * A is\n" << Matrix(AI * A);<br />
assert(one_norm(Matrix(AI * A - I)) < 0.1);<br />
27 If you wonder why we explicitly built a matrix for P * A, you have to wait until Section 5.3 to understand<br />
that some functions return special types that need special treatment. Future versions of MTL4 will minimize the<br />
need for such special treatments.
A function computing the inverse must return the same value and also pass the test against the<br />
identity:<br />
Matrix A_inverse(inverse(A));<br />
cout << "inverse(A) is \n" << A_inverse << "A_inverse * A is\n" << Matrix(A_inverse * A);<br />
assert(one_norm(Matrix(A_inverse * A - I)) < 0.1);<br />
After establishing tests for all components of our calculation, we start with their implementation.<br />
The first function we program is the inversion of an upper triangular matrix. This function<br />
takes a dense matrix as argument and returns another matrix:<br />
dense2D&lt;double&gt; inline inverse_upper(dense2D&lt;double&gt; const& A) {<br />
}<br />
Since we do not need another copy of the input matrix, we pass it as a reference. The argument<br />
shall not be changed, so we pass it as const. The constancy has several advantages:<br />
• We improve the reliability of our program. Arguments passed as const are guaranteed<br />
not to change; if we accidentally modify them, the compiler will tell us and abort the<br />
compilation. There is a way to remove the constancy, but this should only be used as a<br />
last resort, e.g. for interfacing to obsolete libraries written by others. Everything you write<br />
yourself can be realized without eliminating the constancy of arguments.<br />
• Compilers can optimize better when objects are guaranteed not to be altered.<br />
• In the case of references, the function can be called with expressions. Non-const references<br />
require storing the expression in a variable and passing the variable to the function.<br />
Another comment: people might tell you that it is too expensive to return containers as results<br />
and that it is more efficient to use references. This is true, in principle. For the moment we accept<br />
this extra cost and pay more attention to clarity and convenience. Later in this book we will<br />
introduce techniques for minimizing the cost of returning containers from functions.<br />
So much for the function signature; let us now turn our attention to the function body. The<br />
first thing we do is verify that our argument is valid. Obviously, the matrix must be square:<br />
const unsigned n= num_rows(A);<br />
assert(num_cols(A) == n); // Matrix must be square<br />
The number of rows is needed several times in this function and is therefore stored in a variable,<br />
well, a constant. Another prerequisite is that the matrix has no zero entries on the diagonal. We<br />
leave this test to the triangular solver.<br />
Speaking of which, we can get our inverse triangular matrix by solving triangular linear<br />
systems, for which MTL4 provides a solver. More precisely, the k-th column of U^-1 is the solution x of<br />
U x = e_k<br />
where e_k is the k-th unit vector. First we define a temporary variable for the result.<br />
dense2D&lt;double&gt; Inv(n, n);<br />
Then we iterate over the columns of Inv:
for (unsigned k= 0; k < n; ++k) {<br />
}<br />
In each iteration we need the k-th unit vector.<br />
dense_vector&lt;double&gt; e_k(n);<br />
for (unsigned i= 0; i < n; ++i)<br />
    if (i == k)<br />
        e_k[i]= 1.0;<br />
    else<br />
        e_k[i]= 0.0;<br />
The triangular solver returns a column vector. We could assign the entries of this vector directly<br />
to entries of the target matrix:<br />
for (unsigned i= 0; i < n; ++i)<br />
    Inv[i][k]= upper_trisolve(A, e_k)[i];<br />
This is nicely short, but we would compute upper_trisolve n times! Although we said that performance<br />
is not our primary goal at this point, raising the overall complexity from order 3 to order 4 is<br />
too much waste of resources. Therefore, we had better store the vector and copy the entries from<br />
there.<br />
dense_vector&lt;double&gt; res_k(n);<br />
res_k= upper_trisolve(A, e_k);<br />
for (unsigned i= 0; i < n; ++i)<br />
    Inv[i][k]= res_k[i];<br />
Returning our temporary matrix finishes the function, which we now give in its complete form:<br />
dense2D&lt;double&gt; inverse_upper(dense2D&lt;double&gt; const& A)<br />
{<br />
    const unsigned n= num_rows(A);<br />
    assert(num_cols(A) == n); // Matrix must be square<br />
<br />
    dense2D&lt;double&gt; Inv(n, n);<br />
    for (unsigned k= 0; k < n; ++k) {<br />
        dense_vector&lt;double&gt; e_k(n);<br />
        for (unsigned i= 0; i < n; ++i)<br />
            if (i == k)<br />
                e_k[i]= 1.0;<br />
            else<br />
                e_k[i]= 0.0;<br />
        dense_vector&lt;double&gt; res_k(n);<br />
        res_k= upper_trisolve(A, e_k);<br />
        for (unsigned i= 0; i < n; ++i)<br />
            Inv[i][k]= res_k[i];<br />
    }<br />
    return Inv;<br />
}
Now that the function is complete, we first run our test. Evidently, we have to comment out<br />
part of the test because we have only implemented one function so far. But it is worth knowing whether<br />
this first function already behaves as expected. It does, and we could now be happy with it and<br />
turn our attention to the next task; there are still many. But we will not.<br />
Well, at least we can be happy to have a correctly running function. Nevertheless, it is still<br />
worth spending some time to improve it. Such improvements are called refactoring. Experience<br />
from practice has shown that refactoring immediately after the implementation takes much less<br />
time than later modifications when bugs are discovered, the software is ported to other platforms,<br />
or it is extended for more usability. Obviously, it is much easier to simplify and structure our<br />
software immediately, while we still know what is going on, than in some weeks/months/years or<br />
when somebody else is refactoring it.<br />
The first thing we might dislike is that something as simple as the initialization of a unit vector<br />
takes five lines. This is rather verbose. Putting the if statement in one line<br />
for (unsigned i= 0; i < n; ++i)<br />
    if (i == k) e_k[i]= 1.0; else e_k[i]= 0.0;<br />
is badly structured.<br />
C++ and even good ole C have a special operator for conditions:<br />
for (unsigned i= 0; i < n; ++i)<br />
    e_k[i]= i == k ? 1.0 : 0.0;<br />
The conditional operator '?:' usually needs some time to get used to, but it results in a more<br />
concise representation. There are also situations where one cannot use an if but can use the ?: operator.<br />
Although we have not changed anything semantically in the program, and it seems obvious that<br />
the result will still be the same, it cannot harm to run our test again. You will see how often<br />
you are sure that your program changes could not possibly change the behavior, but they still do.<br />
And the sooner you realize this the better. And with the test we already wrote, it only takes a few<br />
seconds and makes you feel more confident.<br />
If we would like to be really cool, we could exploit some insider know-how. The expression<br />
'i == k' returns a bool, and we know that bool can be converted implicitly into int. In this<br />
conversion, false results in 0 and true in 1, according to the standard. These are precisely<br />
the values we want, as double:<br />
e_k[i]= double(i == k);<br />
In fact, the conversion from int to double is performed implicitly and can be omitted:<br />
e_k[i]= i == k;<br />
As cute as this looks, it is some stretch to assign a logical value to a floating-point number. It is<br />
well-defined by the implicit conversion chain bool → int → double, but it will confuse potential<br />
readers, and you might end up explaining on a mailing list what is happening, or adding<br />
a comment to the program. In both cases you end up writing more for the explanation than you<br />
saved in the program.<br />
Another thought that might occur to us is that it is probably not the last time we need a unit<br />
vector. So, why not write a function for it?
dense_vector&lt;double&gt; inline unit_vector(unsigned k, unsigned n)<br />
{<br />
    dense_vector&lt;double&gt; v(n, 0.0);<br />
    v[k]= 1;<br />
    return v;<br />
}<br />
As the function returns the unit vector, we can pass it directly as argument to the triangular solver:<br />
res_k= upper_trisolve(A, unit_vector(k, n));<br />
For a dense matrix, MTL4 allows us to access a matrix column as a column vector (instead of a<br />
sub-matrix). Then we can assign the result vector directly, without a loop:<br />
Inv[irange(0, n)][k]= res_k;<br />
As a short explanation: the bracket operator is implemented in such a manner that integer indices<br />
for rows and columns return the matrix entry, while ranges for rows and columns return a<br />
sub-matrix. Likewise, a range of rows and a single column gives you a column of the matrix<br />
(or part of this column). Vice versa, a row vector can be extracted from a matrix with<br />
an integer as row index and a range for the columns.<br />
This is an interesting example of how to deal with the limitations as well as the possibilities of C++.<br />
Other languages have ranges as part of their intrinsic notation; e.g. Python has the symbol ':'<br />
for expressing ranges of indices. C++ does not have this symbol, but we can introduce a new<br />
type (like MTL4's irange) and define the behavior of operator[] for this type. This leads to<br />
an extremely powerful mechanism!<br />
Extending Operator Functionality<br />
Since we cannot introduce new operators into C++ (not now, in 2010, not<br />
in the next standard, C++0x, and maybe in the one after that), we define new<br />
types and give operators the desired behavior when applied to those types.<br />
This technique allows us to provide very broad functionality with a limited<br />
number of operators.<br />
The operator semantics on user types shall be intuitive and must be consistent with the operator<br />
priority (see example in § 2.3.7).<br />
Back to our algorithm: we store the result of the solver in a vector and then assign it to a<br />
matrix column. In fact, we can assign the triangular solver's result directly:<br />
Inv[irange(0, n)][k]= upper_trisolve(A, unit_vector(k, n));<br />
The range of all indices is predefined as iall:<br />
Inv[iall][k]= upper_trisolve(A, unit_vector(k, n));<br />
Next, we exploit some mathematical background: the inverse of an upper triangular matrix<br />
is also upper triangular. Thus, we only need to compute the upper part of the result and set<br />
the remainder to 0 (or set the whole matrix to zero before computing the upper part). Of course,
we need smaller unit vectors now and only sub-matrices of A. This can nicely be expressed with<br />
ranges:<br />
Inv= 0;<br />
for (unsigned k= 0; k < n; ++k)<br />
    Inv[irange(0, k+1)][k]= upper_trisolve(A[irange(0, k+1)][irange(0, k+1)], unit_vector(k, k+1));<br />
Admittedly, the irange makes the expression hard to read. Although it looks like a function call,<br />
irange is a type, and we just created objects on the fly and passed them to the<br />
operator[]. As we use the same range three times, it is shorter to create a variable (or rather a constant):<br />
for (unsigned k= 0; k < n; ++k) {<br />
    const irange r(0, k+1);<br />
    Inv[r][k]= upper_trisolve(A[r][r], unit_vector(k, k+1));<br />
}<br />
This not only makes the second line shorter, it also makes it easier to see that all three ranges<br />
are the same.<br />
Another observation: after shortening the unit vectors, they all have their one in the last entry.<br />
Thus, we only need the size of the vector, and the position of the one is implied:<br />
dense_vector&lt;double&gt; inline last_unit_vector(unsigned n)<br />
{<br />
    dense_vector&lt;double&gt; v(n, 0.0);<br />
    v[n-1]= 1;<br />
    return v;<br />
}<br />
We choose a different name to reflect the different meaning. Nonetheless, we wonder whether we really<br />
want such a function. How likely is it that we will ever need it again? Charles H. Moore,<br />
the creator of the programming language Forth, once said that "The purpose of functions is not<br />
to hash a program into tiny pieces but to create highly reusable entities." All this said, we<br />
prefer the more general function, which is much more likely to be useful later.<br />
After all these modifications, we are now satisfied with the implementation and go on to the next<br />
function. We still might change something at a later point in time, but having made it clearer<br />
and better structured will make later modifications much easier for us or somebody else.<br />
The more experience you gain, the fewer steps you will need to achieve an implementation that<br />
makes you happy. And it goes without saying that we tested inverse_upper repeatedly while<br />
modifying it.<br />
Now that we know how to invert upper triangular matrices, we could implement the lower triangular<br />
case accordingly. Alternatively, we can just transpose the input and the output:<br />
dense2D&lt;double&gt; inline inverse_lower(dense2D&lt;double&gt; const& A)<br />
{<br />
    dense2D&lt;double&gt; T(trans(A));<br />
    return dense2D&lt;double&gt;(trans(inverse_upper(T)));<br />
}<br />
Ideally, this implementation would look like this:<br />
dense2D&lt;double&gt; inline inverse_lower(dense2D&lt;double&gt; const& A)<br />
{<br />
    return trans(inverse_upper(trans(A)));<br />
}
This does not work yet <strong>for</strong> technical reasons but will in the future.<br />
You may argue that the transpositions and passing the matrix and the vector once more take<br />
more time. More importantly, we know that the lower matrix has a unit diagonal, and we did<br />
not exploit this property, e.g. for avoiding the divisions in the triangular solver. We could<br />
even ignore or omit the diagonal and treat it implicitly in the algorithms. This is all true.<br />
However, we prioritized the simplicity and clarity of the implementation and the reusability<br />
aspect higher than performance here. 28<br />
We now have all we need to put the matrix inversion together. As above, we start by checking<br />
for squareness.<br />
dense2D&lt;double&gt; inline inverse(dense2D&lt;double&gt; const& A)<br />
{<br />
    const unsigned n= num_rows(A);<br />
    assert(num_cols(A) == n); // Matrix must be square<br />
Then we perform the LU factorization. For performance reasons this function does not return<br />
the result but takes its arguments as mutable references and factorizes in place. Thus, we need<br />
a copy of the matrix to pass and a permutation vector of appropriate size.<br />
dense2D&lt;double&gt; PLU(A);<br />
dense_vector&lt;unsigned&gt; Pv(n);<br />
lu(PLU, Pv);<br />
The upper triangular factor PU of the permuted A is stored in the upper triangle of PLU. The<br />
lower triangular factor PL is partly stored in the strict lower triangle of PLU while the unit<br />
diagonal is omitted. We there<strong>for</strong>e need to add it be<strong>for</strong>e inversion (or alternatively handle the<br />
unit diagonal implicitly in the inversion).<br />
dense2D&lt;double&gt; PU(upper(PLU)), PL(strict_lower(PLU) + matrix::identity(n, n));<br />
The inversion of a square matrix according to Equation (2.1) can then be performed in one<br />
single line: 29<br />
return dense2D&lt;double&gt;(inverse_upper(PU) * inverse_lower(PL) * permutation(Pv));<br />
During this section you have seen that there are always alternatives for implementing the same<br />
behavior; most likely you have made this experience before. Although we suggested for every<br />
choice we made that it is the most appropriate one, there is not always THE single best solution, and<br />
even while weighing the pros and cons of the alternatives, one might not come to a final conclusion<br />
and just pick one. We also illustrated that the choices depend on the goals; for instance, the<br />
implementation would look different if performance were the primary goal.<br />
The section shall show as well that that non-trivial programs are not written in a single sweep<br />
by an ingenious mind — exceptions might prove the rule — but are the result of a gradually<br />
improving development. Experience will make this journey shorter and directer but we will not<br />
write the perfect program at the first glance.<br />
28 People that care about per<strong>for</strong>mance do not use matrix inversion in the first place.<br />
29 The explicit conversion can probably be omitted in later versions of MTL4.
CHAPTER 2. C++ BASICS
2.12 Exercises

2.12.1 Age

Write a program that reads input from the keyboard and prints the result on the screen and into a file. The question is: What is your age?

2.12.2 Exercise on include

We provide the following files: foo.hpp, included by bar1.hpp and bar2.hpp. The main program is in main.cpp.
Compile and try to link the program. It should not link. Correct the errors so that it links.

2.12.3 Arrays and pointers

1. Write the following declarations: pointer to a character, array of 10 integers, pointer to an array of 10 integers, pointer to an array of character strings, pointer to a pointer to a character, integer constant, pointer to an integer constant, and constant pointer to an integer. Initialize all of these objects.

2. Read a sequence of doubles from an input stream. Let the value 0 define the end of the sequence. Print the values in input order. Remove duplicate values. Sort the values before printing.

3. Write a small program that creates arrays on the stack (fixed-size arrays) and arrays on the heap (using allocation, i.e. new). Use valgrind to check what happens when you do not use delete correctly.
2.12.4 Read the header of a Matrix Market file

The Matrix Market data format is used to store dense and sparse matrices in ASCII format. The header contains some information about the type and the size of the matrix. For a sparse matrix, the data are stored in three columns: the first column is the row number, the second column the column number, and the third column the numerical value. If the matrix is complex, a fourth column is added for the imaginary part.
An example of a Matrix Market file is:

%%MatrixMarket matrix coordinate real general
%
% ATHENS course matrix
%
2025 2025 100015
1 1 .9273558001498543E-01
1 2 .3545880644900583E-01
...................

The first line that does not start with % contains the number of rows, the number of columns, and the number of non-zero elements of the sparse matrix.
Use fstream to read the header of a Matrix Market file and print the number of rows, the number of columns, and the number of non-zeros on the screen.
2.12.5 String manipulation programs

There is a type string in the standard library. This type provides a large number of string operations, such as string concatenation, string comparison, etc. Note the include of the header file string.

#include <iostream>
#include <string>

int main()
{
    std::string s1 = "Hello";
    std::string s2 = "World";
    std::string s3 = s1 + ", " + s2;
    std::cout << s3 << std::endl;
    return 0;
}

In this example we have concatenated the strings s1 and s2 together with a string constant. Perform the following exercises:

1. Write a function itoa(int i, std::string& b) that constructs a string representation of i in b and returns b.

2. Write a simple encryption program. It should read the input from cin and write the encrypted symbols to cout. Use the following simple encryption scheme: the code for a symbol c is c ^ key[i], where key is a string given as a parameter to a function. The symbols from key are used in a cyclic way. (After repeated encryption with the same key you should get back the source string.)
2.13 Operator Precedence

The following table gives all operators on one page for quickly seeing their priorities; for their meaning see Table 2.3.6. Semicolons are only separators.
Operator Precedence

class_name :: member; namespace_name :: member; :: name; :: qualified-name
object . member; pointer -> member; expr [ expr ]
object [ expr ]; expr ( expr_list ); type ( expr_list ); lvalue ++; lvalue --
typeid ( type ); typeid ( expr ); dynamic_cast < type > ( expr )
static_cast < type > ( expr ); reinterpret_cast < type > ( expr )
const_cast < type > ( expr )
sizeof expr; sizeof ( type ); ++ lvalue; -- lvalue; ~ expr; ! expr; - expr
+ expr; & lvalue; * expr; new type; new type( expr_list )
new ( expr_list ) type; new ( expr_list ) type( expr_list )
delete pointer; delete [] pointer; ( type ) expr
object .* pointer_to_member; pointer ->* pointer_to_member
expr * expr; expr / expr; expr % expr
expr + expr; expr - expr
expr << expr; expr >> expr
expr < expr; expr <= expr; expr > expr; expr >= expr
expr == expr; expr != expr
expr & expr
expr ^ expr
expr | expr
expr && expr
expr || expr
expr ? expr : expr
lvalue = expr; lvalue *= expr; lvalue /= expr; lvalue %= expr; lvalue += expr
lvalue -= expr; lvalue <<= expr; lvalue >>= expr; lvalue &= expr
lvalue |= expr; lvalue ^= expr
throw expr
expr , expr
Chapter 3

Classes

"Computer science is no more about computers than astronomy is about telescopes."
— Edsger W. Dijkstra

"Accordingly, computer science is more than programming language details."
Good programming is more than drilling on small language details and more than cleverly manipulating specific bits on the latest and greatest computer hardware. Focusing primarily on technical details can lead to clever codes that perform a certain task in a certain context extremely efficiently. If one is good at this, one might even create the fastest solution for this task and gain the admiration of the geeks.
3.1 Program for universal meaning, not for technical details

Writing leading-edge scientific software with such an attitude is very painful and likely to fail. The most important tasks in scientific programming are:

• Identifying the mathematical abstractions that are important in the domain; and
• Representing these abstractions comprehensibly and efficiently in software.

Common abstractions that appear in almost every scientific application are vector spaces and linear operators. A linear operator projects from one vector space to another one.
First we should decide how to represent this abstraction in a program. Let v be an element of a vector space and L a linear operator. Then C++ allows us to represent the application of L on v as

L(v)

or

L * v

Which one is better suited is not so easy to say. What is easy to say is that both are better than
apply_symm_blk2x2_rowmajor_dnsvec_multhr_athlon(L.data_addr, L.nrows, L.ncols,
                                                L.ldim, L.blksch, v.data_addr, v.size);

Developing software in this fashion is far from being fun and wastes much of the programmer's energy. Getting such calls right is of course much more work than the former notations. If one of the arguments is stored in a different format, the function call must be meticulously adapted. Remember that the person who implements the linear projection actually wanted to do science.
The cardinal error of scientific software providing such interfaces (there are even worse examples than ours) is to commit to too many technical details in the user interface. The reason lies partly in the usage of simplistic programming languages such as C and Fortran 77 or in the effort to interoperate with software in these languages.
Advice

If you are ever forced to write software that interoperates with C or Fortran, first write your software with a concise and intuitive interface in C++ for yourself and other C++ programmers, and add the C and Fortran interface on top of it.
The elegant way of writing scientific software is to use and to provide the best abstraction. A good implementation reduces the user interface to the essential behavior and omits all surplus commitments to technical details. Applications with a concise and intuitive interface can be as efficient as their ugly and detail-obsessed counterparts.
In our example, this is achieved by providing a class for every specific linear operator and implementing the projection type-dependently. 1 This way, we can apply the projection without giving all the details, and the user application is short and nice. This chapter will show the foundations of how to provide new abstractions in scientific software, and the following chapters will elaborate on this.
3.2 Class members

Object types are called classes in C++, defined with the class keyword. A class defines a new data type, which can be used to create objects. A class is a collection of:

• data;
• functions, which are also referred to as member functions or methods; and
• types.
Furthermore, class members can be public or private, and classes can inherit from each other. Let us now give an example to illustrate the class concept. To have something tangible for scientists, we refrain from foo and bar examples and instead gradually implement a class complex (although this already exists). This class must contain variables to store the real and the imaginary part:

class complex
{
    double r, i;
};

Variables within a class are called 'member variables'.

1 Specializations for specific platforms can also be handled with the type system.
3.2.1 Access attributes

All items of a class (variables, constants, functions, and types) have access attributes. C++ provides the following three attributes:

• public: accessible from everywhere;
• private: accessible only within the class; and
• protected: accessible only within the class and in derived classes.

The access attributes give the class designer good control over how class users can utilize the class. Defining more public members gives more freedom in usage but less control; vice versa, more private members establish a stricter user interface. Protected members are less restrictive than private ones and more restrictive than public ones. Since inheritance is not a major topic in this book, they are not very important in this context. All class members are 'private' by default.
3.2.2 Member functions

It is common practice in object-oriented software to declare member variables as private and access them with functions. We do this here in a Java style:

class complex
{
  public:
    double get_r() { return r; }
    void set_r(double newr) { r = newr; }
    double get_i() { return i; }
    void set_i(double newi) { i = newi; }
  private:
    double r, i;
};

Functions in a class are called 'member functions'. Member functions are also private by default, i.e. they can only be called by functions within the class. This is evidently not particularly useful for our getters and setters.
Therefore we declared them 'public'. Public member functions and variables can be accessed outside the class. So, we can write c.get_r() but not c.r. The class above can be used in the following way:
int main()
{
    complex c1, c2;
    // set c1
    c1.set_r(3.0);
    c1.set_i(2.0);
    // copy c1 to c2
    c2.set_r(c1.get_r());
    c2.set_i(c1.get_i());
    return 0;
}

In line 3 we created two objects of type complex. Then we set one of the objects and copied it to the other one. This works, but it is a bit clumsy, isn't it?
C++ provides another keyword for defining classes: struct. The only difference 2 is that members are public by default; therefore the example above is equivalent to:

struct complex
{
    double get_r() { return r; }
    void set_r(double newr) { r = newr; }
    double get_i() { return i; }
    void set_i(double newi) { i = newi; }
  private:
    double r, i;
};
Our member variables can only be accessed via functions. This gives the class designer maximal control over the behavior. The setter could accept only values in a certain range. We could count how often the setters and getters are called for each complex number, or for all complex numbers in the execution. The functions could have additional print-outs for debugging. 3 We could even allow reading only at certain times of the day or writing only if the program runs on a computer with a certain IP. We will most likely not do the latter, at least not for complex numbers, but we could. If the variables were public and accessed directly, such modifications would not be possible. Nevertheless, handling the real and imaginary part of a complex number this way is cumbersome, and we will discuss alternatives.
Most C++ programmers would not implement it this way. What would a C++ programmer do first then? Write constructors.
3.3 Constructors

What are constructors? Constructors initialize objects of classes and create a working environment for the member functions. Sometimes such an environment includes resources like files, memory, or locks that have to be freed after use. We come back to this later.
To start with, let us define a constructor for complex:

2 There is really no other difference. One can define operators, virtual functions, or derived classes in the same manner as with class. The performance of class and struct is also absolutely identical.
3 A debugger is usually a better alternative to putting print-outs into programs.
class complex
{
  public:
    complex(double rnew, double inew)
    {
        r= rnew; i= inew;
    }
    // ...
};

Thus, a constructor is a member function with the same name as the class itself. It can have an arbitrary number of arguments. In our case, two arguments are most suitable because we want to set two member variables. This constructor allows us to set c1's values directly in its definition:

complex c1(2.0, 3.0);
There is a special syntax for setting member variables in constructors:

class complex
{
  public:
    complex(double rnew, double inew) : r(rnew), i(inew) {}
    // ...
};

This is not only shorter but also has another advantage: it calls the constructors of the member variables in the class's constructor. For plain old data types (POD) this does not make a significant difference. The situation is different if the members are themselves classes.
Imagine you have a class that solves linear systems with the same matrix, and you store the matrix in your class:

class solver
{
  public:
    solver(int nrows, int ncols) // : A() #1 -> error
    {
        A(nrows, ncols); // this is not a constructor here #2 -> error
    }
    // ...
  private:
    matrix_type A;
};

Suppose our matrix class has a constructor setting the dimensions. This constructor cannot be called in the function body of the constructor (#2). The call in #2 is interpreted as A.operator()(nrows, ncols), see § 4.8.
All member variables of the class are constructed before the class constructor reaches the opening {. Those members, like A, that do not appear in the list after the colon are built by a constructor without arguments, called the default constructor. Correspondingly, classes that have such a constructor are called default-constructible. Our matrix class is not default-constructible, and the compiler will tell us something like "Operator matrix_type::matrix_type() not found". Thus, we need:
class solver
{
  public:
    solver(int nrows, int ncols) : A(nrows, ncols) {}
    // ...
  private:
    matrix_type A;
};
Often the matrix (or whatever other object) is already constructed and we do not want to waste memory on a copy. In this case we use a reference to the object. A reference must be set in the constructor because this is the only place to declare what it refers to. The solver shall not modify the matrix, so we write:

class solver
{
  public:
    solver(const matrix_type& A) : A(A) {}
    // ...
  private:
    const matrix_type& A;
};
The code also shows that we can give the constructor arguments the same names as the member variables. After the colon, which A is which? The rule is that names outside the parentheses refer to members, while inside the parentheses the constructor arguments hide the member variables. Some people are confused by this rule and use different names. What does A refer to inside {}? To the constructor argument. Only names that do not exist as argument names are interpreted as member variables. In fact, this is pure scope resolution: the scope of the function, in this case the constructor, is inside the scope of the class, and thus the argument names hide the class member names.
Let us return to our complex example. So far, we have a constructor allowing us to set the real and the imaginary part. Often only the real part is set and the imaginary part defaults to 0:

class complex
{
  public:
    complex(double r, double i) : r(r), i(i) {}
    complex(double r) : r(r), i(0) {}
    // ...
};

We can also say that the number is 0 + 0i if no value is given, i.e. if the complex number is default-constructed:

complex() : r(0), i(0) {}
Advice

Define a default constructor wherever possible, even if it might not seem necessary when you implement the class.
For the complex class, we might think that we do not need a default constructor because we can delay the declaration of a variable until we know its value. The absence of a default constructor creates (at least) two problems:

• We might need the variable outside the scope in which its value is computed. For instance, if the value depends on some condition and we declared the (complex) variable in the two branches of an if, the variable would not exist after the if.
• We build containers of the type, e.g. a matrix of complex values. Then the constructor of the matrix must call constructors of complex for each entry, and the default constructor is the most convenient fashion to handle this.

For some classes, it might be very difficult to define a default constructor, e.g. when some of the members are references. In those cases, it can be easier to accept the aforementioned drawbacks instead of building badly designed default constructors.
We can combine all three of them with default arguments:

class complex
{
  public:
    complex(double r= 0, double i= 0) : r(r), i(i) {}
    // ...
};
In the previous main function we defined two objects, one a copy of the other. We can write a constructor for this, called a copy constructor:

class complex
{
  public:
    complex(const complex& c) : r(c.r), i(c.i) {}
    // ...
};
But we do not have to: C++ does this itself. If we do not define a copy constructor, i.e. a constructor that has one argument which is a const reference to its own type, then the compiler creates this constructor implicitly. This automatically generated constructor copies each member variable by calling the members' copy constructors, and this is exactly what we did. In cases like this, where copying all members is precisely what you want from your copy constructor, you should use the default for the following reasons:

• It is less verbose;
• It is less error-prone;
• Other people know directly what your copy constructor does without reading your code; and
• Compilers might find more optimizations.
There are cases where the default copy constructor does not work, especially when the class contains pointers. Say we have a simple vector class with a copy constructor:

class vector
{
  public:
    vector(const vector& v)
      : size(v.size), data(new double[size])
    {
        for (unsigned i= 0; i < size; i++)
            data[i]= v.data[i];
    }
    // ...
  private:
    unsigned size;
    double *data;
};
If we omitted this copy constructor, the compiler would not complain and would happily build one for us. We would be glad that our program is shorter and sexier, but sooner or later we would find that it behaves bizarrely: changing one vector modifies another one as well, and when we observe this strange behavior we have to find the error in our program. This is particularly difficult because there is no error in what we have written but in what we have omitted.
Another problem we can observe is that the run-time library will complain that we freed the same memory twice. 4 The reason for this is the way pointers are copied: only the address is copied, and the result is that both pointers point to the same memory. This might be useful in some cases, but most of the time it is not, at least in our domain. Some pointer-addicted geeks might see this differently.
3.3.1 Explicit and implicit constructors

In C++ we distinguish between implicit and explicit constructors. Implicit constructors enable, in addition to object initialization, implicit conversions and an assignment-like notation for construction. Instead of:

complex c1(3.0);

we can also write:

complex c1= 3.0;

or

complex c1= pi*pi/6.0;

For many scientifically educated people this notation is more readable. Older compilers might generate more code for initializations using '=' (the object is first created with the default constructor and the value is copied afterwards) while current compilers generate the same code for both notations.

4 This is an error message every programmer experiences at least once in his/her life (or he/she is not doing serious business).
The implicit conversion kicks in when one type is needed and another one is given, e.g. a double instead of a complex. Assume we have a function: 5

double inline complex_abs(complex c)
{
    return std::sqrt(real(c) * real(c) + imag(c) * imag(c));
}

and call it with a double, e.g.:

cout << "|7| = " << complex_abs(7.0) << '\n';

The constant '7.0' is considered a double, but there is no function complex_abs for double. There is one for complex, and complex has a constructor that accepts a double. So the complex value is implicitly built from the double.
This can be forbidden by declaring the constructor 'explicit':

class complex {
  public:
    explicit complex(double nr= 0.0, double i= 0.0) : r(nr), i(i) {}
};

Then complex_abs cannot be called with a double or any type other than complex. To call this function with a double, we can write an overload for double or construct a complex explicitly in the call:

cout << "|7| = " << complex_abs(complex(7.0)) << '\n';
The explicit attribute is really important for the vector class. There will be a constructor taking the size of the vector as an argument:

class vector
{
  public:
    vector(int n) : my_size(n), data(new double[my_size]) {}
};

A function computing a scalar product will expect two vectors as arguments:

double dot(const vector& v, const vector& w) { ... }

Calling this function with integer arguments

double d= dot(8, 8);

will compile. What happened? Two temporary vectors of size 8 are created with the implicit constructor and passed to the function dot. This nonsense can easily be avoided by declaring the constructor explicit.
Discussion 3.1 Which constructors shall be explicit is in the end the class designer's decision. It is pretty obvious in the vector example: no right-minded programmer wants the compiler to convert integers automatically into vectors.
Whether the constructor of the complex class should be explicit depends on the expected usage. Since a complex number with a zero imaginary part is mathematically identical to a real number, the implicit conversion does not create semantic inconsistencies. An implicit constructor is more convenient because doubles and double literals can be given wherever a complex is expected. Functions that are not performance-critical can be implemented only once, for complex, and used for double. Vice versa, in performance-critical applications it might be preferable to use an explicit constructor because the compiler will refuse to call complex functions with double arguments. Then the programmer can implement overloads of those functions with double arguments that do not waste run time on null imaginary parts.
That does not mean that high-performance implementations necessarily have to be realized with explicit constructors. The implicit conversion might happen in rarely called functions, and the impact on the overall performance might be negligible. The compiler cannot tell us, but a profiling tool can. A function that consumes less than 1 % of the execution time is not worth spending much tuning time on. All this considered, there are more reasons for an implicit constructor than for an explicit one, and so it is implemented in std::complex.

5 The definitions of real and imag will be given soon.
3.4 Destructors

A destructor is a function that is called every time an object of its class is destroyed, for example:

~complex()
{
    std::cout << "So long and thanks for the fish.\n";
}

Since the destructor is the complementary operation of the default constructor, it uses the complementary notation in its signature. As opposed to constructors, there is only one single overload, and arguments are not allowed (what could they be good for anyway, as grave goods?). There is no life after death in C++.
In our example, there is nothing to do when a complex number is destroyed, and we can omit the destructor. A destructor is needed when the object acquires resources, e.g. memory. In such cases the memory must be freed and the other resources released in the destructor.

class vector
{
  public:
    // ...
    ~vector()
    {
        if (data) // check if pointer was allocated
            delete[] data;
    }
    // ...
  private:
    unsigned my_size;
    double *data;
};
Files that are opened with std::ifstream or std::ofstream do not need to be closed explicitly; their destructors do this if necessary. Files that are opened with old C handles require explicit closing, and this is only one reason for not using them.
Attention must be paid that the freed resources are not used or released somewhere else in the program afterwards. C++ generates a default destructor in the same way as the default constructor: it calls the destructor of each member, but in reverse order. 6
3.5 Assignment

Assignment operators are used to enable, for user-defined types, expressions like:

x= y;
u= v= w= x;

As usual we consider first the class complex. Assigning a complex to a complex requires an operator like:

complex& operator=(const complex& src)
{
    r= src.r; i= src.i;
    return *this;
}

Evidently, we copy the members 'r' and 'i'. The operator returns a reference to the object to enable multiple assignments. 'this' is a pointer to the object itself, and since we need a reference for syntactic reasons, it is dereferenced. What happens if we assign a double?
c= 7.5;

It compiles without the definition of an assignment operator for double. Once again, we have an implicit conversion: the implicit constructor creates a complex on the fly and assigns this one. If this becomes a performance issue, we can add an assignment for double:

complex& operator=(double nr)
{
    r= nr; i= 0;
    return *this;
}
An assignment operator like the first one, which assigns an object of the same type, is called copy assignment, and this operator is synthesized by the compiler. In the case of complex numbers the generated copy assignment operator performs exactly what we need: copying all members.
For the vector, the synthesized operator is not satisfactory because it only copies the address of the data and not the data itself. The implementation is very similar to the copy constructor:
vector& operator=(const vector& src)
{
    if (this == &src)
        return *this;
    assert(my_size == src.my_size);
    for (unsigned i= 0; i < my_size; i++)
        data[i]= src.data[i];
    return *this;
}

6 TODO: Good and short explanation why. If possible with example.
In fact, any class in which the copy assignment and the copy constructor have<br />
essential differences in their implementation is very confusing in its behavior and should not<br />
be used, cf. [SA05, p. 94]. The two operations differ in the respect that a constructor creates<br />
content in a new object while an assignment replaces content in an existing object. However,<br />
both the creation and the replacement are performed with copy semantics, and the two<br />
operations should therefore behave consistently.<br />
An assignment of an object to itself (source and target have the same address) can be skipped;<br />
this is the purpose of the initial address comparison. The assertion then tests whether the assignment<br />
is a legal operation by checking the equality of the sizes. Alternatively, the assignment could resize the target if the sizes are<br />
different, but that does not correspond to the authors’ understanding of vector behavior — or<br />
can you think of a context in mathematics or physics where a vector space all of a sudden<br />
changes its dimension?<br />
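The chained and self-assignment behavior can be exercised with a compilable sketch of the vector class, reduced to the members discussed so far (the subscript operator used for checking below is introduced in § 3.7.2):

```cpp
#include <cassert>

// Reduced sketch of the vector class from this section.
class vector
{
  public:
    explicit vector(int size) : my_size(size), data(new double[my_size]) {}
    vector(const vector& src) : my_size(src.my_size), data(new double[my_size])
    {
        for (int i= 0; i < my_size; i++)
            data[i]= src.data[i];
    }
    ~vector() { delete[] data; }

    vector& operator=(const vector& src)
    {
        if (this == &src)               // self-assignment: nothing to do
            return *this;
        assert(my_size == src.my_size); // sizes must agree
        for (int i= 0; i < my_size; i++)
            data[i]= src.data[i];
        return *this;                   // the reference enables u= v= w;
    }

    double& operator[](int i) { return data[i]; }
    const double& operator[](int i) const { return data[i]; }
  private:
    int     my_size;
    double* data;
};
```

In u= v= w; the rightmost assignment runs first and returns a reference to v, which is then assigned to u.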
3.6 Automatically Generated Operators<br />
If you define a class without operators C ++ will generate the following four:<br />
• Default constructor;<br />
• Copy constructor;<br />
• Destructor; and<br />
• Copy assignment.<br />
Assume you have a class without any function but with some member variables like this:<br />
class my_class<br />
{<br />
    type1 var1;<br />
    type2 var2;<br />
    // ...<br />
    typen varn;<br />
};<br />
Then the compiler adds the four operators and your class behaves as if you had written:<br />
class my_class<br />
{<br />
  public:<br />
    my_class()<br />
      : var1(),<br />
        var2(),<br />
        // ...<br />
        varn()<br />
    {}<br />
    my_class(const my_class& that)<br />
      : var1(that.var1),<br />
        var2(that.var2),<br />
        // ...<br />
        varn(that.varn)<br />
    {}<br />
    ~my_class()<br />
    {<br />
        varn.~typen();<br />
        // ...<br />
        var2.~type2();<br />
        var1.~type1();<br />
    }<br />
    my_class& operator=(const my_class& that)<br />
    {<br />
        var1= that.var1;<br />
        var2= that.var2;<br />
        // ...<br />
        varn= that.varn;<br />
        return *this;<br />
    }<br />
  private:<br />
    type1 var1;<br />
    type2 var2;<br />
    // ...<br />
    typen varn;<br />
};<br />
The generation is straightforward: the four operators are respectively called on each member<br />
variable. The careful reader will have noticed that construction and assignment are performed<br />
in the exact order in which the variables are defined, while the destructors are called in reverse order.<br />
The generation of these operators is disabled if you define your own. The rules for this<br />
are quite simple. The simplest one concerns the destructor: either you define it or the compiler does;<br />
there is only one destructor (because it takes no arguments). The default constructor generation<br />
is disabled when any constructor is defined by the user — even a private constructor.<br />
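These rules can be illustrated with a small sketch; point and interval are hypothetical example classes, not from the text:

```cpp
#include <cassert>

struct point            // no user-defined constructor:
{                       // the compiler generates point::point()
    double x, y;
};

struct interval         // one user-defined constructor -- even this alone
{                       // disables the generated default constructor
    interval(double lo, double hi) : lo(lo), hi(hi) {}
    double lo, hi;
};

// point p;             // fine (the members are uninitialized, though)
// interval iv;         // would not compile: no default constructor left
// interval iv(0., 1.); // fine: the user-defined constructor
```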
The copy constructor and the copy assignment operator are generated automatically unless there<br />
is a user-defined version for the class type or a reference to it. In detail, if the user defines one<br />
or two of the following:<br />
• return type operator=(my_class that);<br />
• return type operator=(const my_class& that); or<br />
• return type operator=(my_class& that);<br />
then the compiler does not generate it. Typically, one defines only the second operator,<br />
because the first one causes an extra copy 7 and the last one requires mutability, which is usually<br />
not necessary for an assignment. The copy constructor can only be defined for references<br />
because it would need itself to pass an argument by value. Defining a constructor or assignment for<br />
any other type does not disable the generation of the copy operations.<br />
7 An exception is user-defined move semantics.<br />
This mechanism applies recursively. For instance, if type1 is itself a class with an automatically<br />
generated default constructor, the default constructors of its members are called in the order<br />
of their definition. If those members, or some of them, are also classes then their default<br />
constructors are called, and so forth. If the type of a member variable is an intrinsic type like int<br />
or float then there are evidently no such operators because these types are not classes. However,<br />
the behavior can easily be described in the same terms: the “default constructor” just creates the variable with a random<br />
value (whatever bits were set at the corresponding memory location before determine its value),<br />
the “copy constructor” and the “copy assignment” copy the value, and the “destructor” does<br />
nothing.<br />
3.7 Accessing object members<br />
3.7.1 Access functions<br />
In § 3.2.2 we introduced getters and setters to access the variables of the class complex. This<br />
becomes cumbersome when we want, for instance, to increment the real part:<br />
c.set_r(c.get_r() + 5.);<br />
This does not really look like a numeric operation and is not very readable either. A better way<br />
to deal with this is to write a member function that returns a reference:<br />
class complex<br />
{<br />
  public:<br />
    double& real() { return r; }<br />
};<br />
With this function we can write:<br />
c.real()+= 5.;<br />
This already looks much better but is still a little bit weird. Why not increment like this:<br />
real(c)+= 5.;<br />
To do this, we write a free function:<br />
inline double& real(complex& c) { return c.r; }<br />
But this function accesses the private member ‘r’. We can modify the free function to call the<br />
member function instead:<br />
inline double& real(complex& c) { return c.real(); }<br />
Alternatively, we can declare the free function as a friend of complex:<br />
class complex<br />
{<br />
  public:<br />
    friend double& real(complex& c);<br />
};<br />
Functions or classes that are declared friends can access private and protected data. A strange issue<br />
with this free function is that the inline attribute must be written before the reference type,<br />
whereas usually it does not matter whether inline stands before or after the return type. 9<br />
9 TODO: Anybody a decent explanation <strong>for</strong> this?
This function works only if the complex number is not constant. So we also need a function that<br />
takes a constant reference as argument. In return, it can only provide a constant reference to<br />
the number’s real part.<br />
inline const double& real(const complex& c) { return c.r; }<br />
This function requires a friend declaration, too.<br />
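Putting the pieces of this subsection together, a compilable sketch of complex with the member function and both friend free functions might look like this (the imaginary part would be handled analogously):

```cpp
#include <cassert>

class complex
{
  public:
    complex(double rn, double in) : r(rn), i(in) {}
    double& real() { return r; }                  // member access

    friend double& real(complex& c);              // free-function access
    friend const double& real(const complex& c);  // ... for constant objects
  private:
    double r, i;
};

inline double& real(complex& c) { return c.r; }
inline const double& real(const complex& c) { return c.r; }
```

Now real(c)+= 5.; works for a mutable c, and real(cc) still compiles for a constant cc.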
The functions — in free as well as in member form — can evidently only be called once the object<br />
has been created. The reference to the number’s real part that we use in the statement<br />
real(c)+= 5.;<br />
exists only until the end of the statement; the variable c lives longer. We can also create a reference<br />
variable:<br />
double& rr= real(c);<br />
C ++ destroys objects in the reverse order of their construction. That means that even if rr and c are in the same function<br />
or block, c lives longer than rr.<br />
The same holds for constant references if they refer to objects from variable declarations.<br />
Temporary objects, however, can also be passed as constant references, enabling the definition of stale<br />
references:<br />
const double& rr= real(complex()); // Bad thing!!!<br />
cout ≪ ”The real part is ” ≪ rr ≪ ’\n’;<br />
The complex variable is created temporarily and exists only until the end of the first statement,<br />
whereas the reference to its real part lives until the end of the surrounding block.<br />
Advice<br />
Do Not Make Constant References To Temporary Expressions!<br />
They are invalid before you use them the first time.<br />
3.7.2 Subscript operator<br />
A really stupid way to access vector entries would be writing a function <strong>for</strong> each one:<br />
class vector<br />
{<br />
  public:<br />
    double& zeroth() { return data[0]; }<br />
    double& first()  { return data[1]; }<br />
    double& second() { return data[2]; }<br />
    // ...<br />
    int size() const { return my_size; }<br />
};<br />
One could not even write a loop over all elements.<br />
To enable such iteration, we need a function like:<br />
class vector<br />
{<br />
  public:<br />
    double at(int i)<br />
    {<br />
        assert(i >= 0 && i < my_size);<br />
        return data[i];<br />
    }<br />
};<br />
Summing the entries of vector v reads:<br />
double sum= 0.0;<br />
<strong>for</strong> (int i= 0; i < v.size(); i++)<br />
sum+= v.at(i);<br />
C ++ and C access the entries of (fixed-size) arrays with the subscript operator. It is thus only<br />
natural to do the same for (dynamically sized) vectors. Then we can rewrite the previous<br />
example as:<br />
double sum= 0.0;<br />
<strong>for</strong> (int i= 0; i < v.size(); i++)<br />
sum+= v[i];<br />
This is more concise and shows more clearly what we are doing.<br />
The operator is overloaded with the same syntax as the assignment operator, and the implementation<br />
is taken from the function at:<br />
class vector<br />
{<br />
  public:<br />
    double& operator[](int i)<br />
    {<br />
        assert(i >= 0 && i < my_size);<br />
        return data[i];<br />
    }<br />
};<br />
With this operator we can access vector elements with brackets, but only if the vector is<br />
mutable.<br />
3.7.3 Constant member functions<br />
This raises the more general question: how can we write operators and member functions that<br />
accept constant objects? In fact, operators are a special form of member function and can be<br />
called like one:<br />
v[i]; // is syntactic sugar <strong>for</strong>:<br />
v.operator[](i);
Of course, the long form is almost never called, but it illustrates that operators are regular<br />
functions that only provide an extra syntax for calling them.<br />
Free functions allow qualifying the const-ness of each argument. Member functions do not even<br />
mention the processed object in their signature. How can const-ness be specified then? There is<br />
a special notation that states the applicability of a member function to constant objects after<br />
the function header, e.g. for our subscript operator:<br />
class vector<br />
{<br />
  public:<br />
    const double& operator[](int i) const<br />
    {<br />
        assert(i >= 0 && i < my_size);<br />
        return data[i];<br />
    }<br />
};<br />
The const attribute is not just a casual gesture by the programmer indicating that he or she does not mind<br />
this member function being called on a constant object. C ++ takes this constancy very seriously<br />
and will verify that the function does not modify the object (i.e. any of its members), that the<br />
object is only passed as const when free functions are called, and that all called member functions<br />
carry the const attribute as well.<br />
This constancy guarantee also prevents returning non-constant pointers or references to members. One can<br />
return constant pointers or references as well as objects. A returned object does not need to<br />
be constant (but it could be) because it is a copy of the object, of one of its member variables<br />
(or constants), or of a temporary variable; and because it is a copy, the original is guaranteed to<br />
remain unchanged.<br />
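A tiny (hypothetical) example of what the compiler enforces:

```cpp
#include <cassert>

class counter
{
  public:
    counter() : n(0) {}
    void inc() { ++n; }              // non-const: may modify members
    int value() const { return n; }  // const: may only read members and
                                     // call other const member functions
    // int bad() const { return ++n; }  // would not compile: a const
                                        // member function must not modify n
  private:
    int n;
};
```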
Constant member functions can be called <strong>for</strong> non-constant objects (because C ++ implicitly<br />
converts non-constant references into constant references when necessary). There<strong>for</strong>e, it is<br />
often sufficient to provide only the constant member function. For instance a function that<br />
returns the size of the vector:<br />
class vector<br />
{<br />
  public:<br />
    int size() const { return my_size; }<br />
    // int size() { return my_size; } // futile<br />
};<br />
The non-constant size function does the same as the constant one and is there<strong>for</strong>e useless.<br />
For our subscript operator we need both the constant and the mutable version. If we only<br />
had the constant member function, we could use it to read the elements of both constant and<br />
mutable vectors, but we could not modify the elements. By the way, our abandoned getters<br />
should have been const since they are only used to read values, regardless of whether the object<br />
is constant or mutable.<br />
3.7.4 Accessing multi-dimensional arrays<br />
Let us assume that we have a simple matrix class like the following:
class matrix<br />
{<br />
  public:<br />
    matrix() : nrows(0), ncols(0), data(0) {}<br />
    matrix(int nrows, int ncols)<br />
      : nrows(nrows), ncols(ncols), data( new double[nrows * ncols] ) {}<br />
    matrix(const matrix& that)<br />
      : nrows(that.nrows), ncols(that.ncols), data(new double[nrows * ncols])<br />
    {<br />
        for (int i= 0, size= nrows*ncols; i < size; ++i)<br />
            data[i]= that.data[i];<br />
    }<br />
    ~matrix() { if (data) delete[] data; }<br />
    void operator=(const matrix& that)<br />
    {<br />
        assert(nrows == that.nrows && ncols == that.ncols);<br />
        for (int i= 0, size= nrows*ncols; i < size; ++i)<br />
            data[i]= that.data[i];<br />
    }<br />
    int num_rows() const { return nrows; }<br />
    int num_cols() const { return ncols; }<br />
  private:<br />
    int     nrows, ncols;<br />
    double* data;<br />
};<br />
So far, the implementation follows the same pattern as before: the variables are private, the<br />
constructors establish defined values for all members, the copy constructor and the assignment<br />
are consistent, and the size information is provided by constant functions.<br />
What is still missing is the access to the matrix entries.<br />
Be aware!<br />
The bracket operator accepts only one argument.<br />
That means we cannot define<br />
double& operator[](int r, int c) { ... }<br />
Approach 1: Parentheses<br />
The simplest way to handle multiple indices is to replace the square brackets with parentheses:<br />
double& operator()(int r, int c)<br />
{<br />
    return data[r*ncols + c];<br />
}<br />
Adding range checking — in a separate function for better reuse — can save us a lot of debugging<br />
time in the future. We also implement the constant access:<br />
  private:<br />
    void check(int r, int c) const { assert(0 <= r && r < nrows && 0 <= c && c < ncols); }<br />
  public:<br />
    double& operator()(int r, int c)<br />
    {<br />
        check(r, c);<br />
        return data[r*ncols + c];<br />
    }<br />
    const double& operator()(int r, int c) const<br />
    {<br />
        check(r, c);<br />
        return data[r*ncols + c];<br />
    }<br />
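The comparison at the end of this section also discusses a technique of returning a pointer. Since that approach is not shown above, here is a hedged reconstruction of the idea (the class name pmatrix and all details are our assumptions, based on that discussion and the same row-major layout): operator[] returns the address of the first entry of row r, and the second bracket is plain built-in pointer subscription, which is exactly why the column index cannot be range-checked.

```cpp
#include <cassert>

// Minimal matrix sketch whose operator[] returns a pointer to the r-th row.
class pmatrix
{
  public:
    pmatrix(int nrows, int ncols)
      : nrows(nrows), ncols(ncols), data(new double[nrows * ncols]) {}
    ~pmatrix() { delete[] data; }

    double* operator[](int r)             { return data + r*ncols; }
    const double* operator[](int r) const { return data + r*ncols; }
  private:
    pmatrix(const pmatrix&);              // copying not needed in this sketch
    void operator=(const pmatrix&);
    int     nrows, ncols;
    double* data;
};
```

With this, A[1][2]= 4.0; works, but the second bracket is raw pointer arithmetic, so no check of the column is possible.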
Approach 3: Returning proxies<br />
Instead of returning a pointer, we can build a specific type that keeps a reference to the matrix<br />
and the row index and that provides an operator[] for accessing matrix entries. This proxy must<br />
therefore be a friend of the matrix class to reach its private data. Alternatively, we can keep<br />
the operator with the parentheses and call this one from the proxy. In both cases, we encounter<br />
cyclic dependencies. 10<br />
If we have several matrix types, each of them needs its own proxy. We would also need<br />
different proxies for constant and mutable access, respectively. In Section 6.5 we will show how<br />
to write a proxy that works for all matrix types. The same templated proxy will handle constant<br />
and mutable access. Fortunately, it even solves the problem of mutual dependencies. The only<br />
minor flaw is that possible errors cause lengthy compiler messages.<br />
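A minimal sketch of the proxy idea (the names bracket_proxy and some_matrix are ours, not from the text; the generic version follows in Section 6.5):

```cpp
#include <cassert>

class some_matrix;                      // forward declaration

// The proxy stores a reference to the matrix and the row index; its
// operator[] supplies the column and forwards to the matrix's operator().
class bracket_proxy
{
  public:
    bracket_proxy(some_matrix& A, int r) : A(A), r(r) {}
    double& operator[](int c);          // defined after some_matrix
  private:
    some_matrix& A;
    int          r;
};

class some_matrix
{
  public:
    some_matrix(int nrows, int ncols)
      : nrows(nrows), ncols(ncols), data(new double[nrows * ncols]) {}
    ~some_matrix() { delete[] data; }

    double& operator()(int r, int c) { return data[r*ncols + c]; }
    bracket_proxy operator[](int r) { return bracket_proxy(*this, r); }
  private:
    some_matrix(const some_matrix&);    // copying not needed in this sketch
    void operator=(const some_matrix&);
    int     nrows, ncols;
    double* data;
};

// Needs the complete matrix type, hence defined afterwards: this is the
// mutual dependency mentioned above, resolvable here only because
// everything lives in one file.
inline double& bracket_proxy::operator[](int c) { return A(r, c); }
```

A[1][2] now first calls some_matrix::operator[](1), which returns a temporary proxy, and the second bracket is the proxy's operator[].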
Approach 4: Multi-index type (advanced)<br />
Preliminary note: this approach uses several new language features and discusses some<br />
subtle details. If you do not understand it the first time, don’t worry. If you like to skip it, do<br />
so; that will not be a problem for understanding the rest of the book. But please read the<br />
comparing discussion.<br />
The fact that operator[] accepts only one argument does not necessarily mean that we cannot<br />
pass two. But we need a tricky technique to build one object out of two, without explicitly<br />
constructing the object. The implementation is based on the matrix example from an online<br />
tutorial [Sch].<br />
First, we define a type:<br />
struct double_index<br />
{<br />
    double_index(int i1, int i2) : i1(i1), i2(i2) {}<br />
    int i1, i2;<br />
};<br />
For this type we define the access operator:<br />
double& operator[](double_index i) { return data[i.i1*ncols + i.i2]; }<br />
const double& operator[](double_index i) const { return data[i.i1*ncols + i.i2]; }<br />
Now we can write:<br />
A[double_index(1, 0)];<br />
This works, but it is not the concise notation we were looking for.<br />
We introduce a second type:<br />
struct single_index<br />
{<br />
    single_index(int i1) : i1(i1) {}<br />
    double_index operator,(single_index j) const {<br />
        return double_index(i1, j.i1);<br />
    }<br />
    operator int() const { return i1; }<br />
    single_index& operator++() {<br />
        ++i1; return *this;<br />
    }<br />
    int i1;<br />
};<br />
10 The dependencies cannot be resolved with forward declaration because we not only define references or<br />
pointers but call member functions in the matrix and in the proxy. We will explain this in § ??.<br />
This new type overloads the comma operator so that a second index creates a double_index.<br />
The constructor is implicit and the class contains a conversion operator to int. This enables the compiler<br />
to convert between single_index and int in both directions.<br />
This allows us to write code like:<br />
single_index i= 0, j= 1;<br />
std::cout ≪ ”A[0, 1] is ” ≪ A[i, j] ≪ ’\n’;<br />
or<br />
for (single_index i= 0; i < A.num_rows(); ++i)<br />
    for (single_index j= 0; j < A.num_cols(); ++j)<br />
        std::cout ≪ ”A[” ≪ i ≪ ”, ” ≪ j ≪ ”] is ” ≪ A[i, j] ≪ ’\n’;<br />
In the loop, a single_index (i) is compared with an int (A.num_rows()). This comparison operator<br />
is not defined; the compiler converts i implicitly to an int and compares the values as int.<br />
Thus, the conversion operator allows us to use all operations that are defined for int without<br />
implementing them ourselves.<br />
At this opportunity we can introduce another operator. C and C ++ provide a prefix and a postfix<br />
increment/decrement. The difference only manifests itself if we read the incremented/decremented<br />
value: j= i++; differs from j= ++i; in that j receives the old value of i in the first statement<br />
but the already incremented i in the second. If the increment is the only expression<br />
in the statement, e.g., i++; or ++i;, there is no semantic difference. Therefore, it does not<br />
matter for loops whether we use the postfix or prefix notation.<br />
for (single_index i= 0; i < A.num_rows(); ++i)<br />
is (semantically) equivalent to:<br />
for (single_index i= 0; i < A.num_rows(); i++)<br />
For C ++’s integer types it really does not matter. For user-defined types, the compiler will tell<br />
us that this operation is not defined. The GNU compiler emits the following error message:<br />
no ≫operator++(int)≪ for suffix ≫++≪ declared, instead prefix operator tried<br />
Fortunately, it already reveals the solution.<br />
An operator++ without arguments is understood as the prefix operator. To define a postfix operator<br />
we must define it with a dummy int argument. This argument has no effect, but we need<br />
a way to define the symbol ++ both as a prefix and as a postfix operator. Unary operators are defined<br />
as member functions without arguments. This works for all other unary operators, but in the case<br />
of the increment/decrement we have the same symbol for two operators that are<br />
distinguished only by their position.<br />
To make a long story short, if we write i++ we must define the postfix increment:<br />
single_index operator++(int)<br />
{<br />
    single_index tmp(*this);<br />
    ++i1;<br />
    return tmp;<br />
}<br />
We see that the operation requires an extra copy: the object itself must be incremented, but<br />
the returned value must still be the old one. If we returned the object itself, i.e. *this, we<br />
would have no possibility to increment it after the return. Therefore we need a copy before we modify<br />
the object. Alternatively, we can omit the copy and return a new object with the old value:<br />
single_index operator++(int)<br />
{<br />
    ++i1;<br />
    return single_index(i1 - 1);<br />
}<br />
This avoids the copy at the beginning, but we still create a new object. These implementations<br />
show that postfix operators are somewhat more expensive than prefix operators, and this is<br />
true for all user-defined types. For C ++’s built-in types the compiler can generate equally efficient<br />
executables for both forms.<br />
The really sad part of the story is that we put so much effort into returning the old value of our<br />
index and do not even use it. Therefore, we give the following<br />
Advice<br />
If you increment or decrement user-defined types, prefer the prefix notation,<br />
especially if the value of the changed variable is not used in the statement.<br />
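The value semantics in a nutshell, with plain int (for built-in types both forms are equally cheap; for class types the postfix form additionally pays for the copy shown above):

```cpp
#include <cassert>

inline void increment_demo()
{
    int i= 5;
    int j= i++;   // postfix: j receives the old value, then i becomes 6
    int k= ++i;   // prefix: i becomes 7 first, k receives the new value
    assert(j == 5 && i == 7 && k == 7);
}
```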
In the examples, we declared both indices as single_index. It is sufficient to do this for the first<br />
one and let the implicit constructor convert the second one:<br />
A[single_index(0), 1]<br />
Un<strong>for</strong>tunately, we cannot write<br />
A[0, 1]<br />
The compiler will give an error message 11 like:<br />
no match <strong>for</strong> ≫operator[]≪ in ≫A[(0, 0)]≪<br />
To call operator[], the compiler would need to perform multiple steps that depend on each other:<br />
first the zeros, which are considered int, would need to be converted to single_index, and then the<br />
comma operator would have to be applied to them. A language that allowed such dependent<br />
conversions would end up with extremely long compile times to consider all possibilities 12 and<br />
the probability of ambiguities would increase tremendously.<br />
11 This is the message from the GNU compiler.<br />
Instead, the compiler considers ‘0, 0’ as a sequence of two expressions where each expression is<br />
an integer constant. The result of a sequence is the result of its last expression, i.e. the integer<br />
constant zero in our case. This cannot be converted into a double_index.<br />
To throw in a really bad idea, we give the second constructor argument of double_index a default<br />
value:<br />
struct double_index<br />
{<br />
    double_index(int i1, int i2= 0) // Very bad<br />
      : i1(i1), i2(i2) {}<br />
    int i1, i2;<br />
};<br />
Then the expression A[0, 1] compiles, and so does A[0, 1, 2, 3, 4]. The comma operator evaluates the integer sequence,<br />
and the result is the last expression. A single integer can be implicitly converted into a double_index.<br />
As a result, the last integer is taken as the row index and the column is zero.<br />
Comparing the approaches<br />
The previous implementations show that C ++ allows us to provide different notations for user-defined<br />
types, and we can implement whichever seems most appropriate to us. The<br />
first approach replaced the square brackets with round parentheses to enable multiple arguments.<br />
This was the simplest solution, and if one is willing to accept this syntax, one can save oneself<br />
the lengths we went to in order to come up with a fancier notation. The technique of returning a<br />
pointer was not complicated either, but it relies too strongly on the internal representation: if<br />
we used some internal blocking or some other specialized internal storage scheme, we would need<br />
an entirely different technique. Another drawback was that we could not test the range of the<br />
column index.<br />
The last approach introduced special types, and the fact that we must always specify the type<br />
of the index explicitly makes the notation for constant indices clumsier instead of clearer. It<br />
also introduced a lot of implicit conversions, and in a large code base we might have enormous<br />
trouble avoiding ambiguities. Another unfortunate aspect is the overloading of the comma<br />
operator. It makes the understanding of programs more difficult — because one has to pay a<br />
lot of attention to the types of expressions to distinguish it from non-overloaded sequences —<br />
and it can cause weird effects. Thus, our first recommendation is to keep reading, since the proxy<br />
solution in § ?? is in our opinion preferable to the previous approaches (although not perfect<br />
either).<br />
Summing up, C ++ gives us the opportunity to handle programming tasks in different ways. Often,<br />
none of the solutions will be perfect. Even if one is satisfied with one’s own solution,<br />
there will most certainly be some (allegedly) experienced C ++ programmer who finds a<br />
disadvantage.<br />
12 It might even become undecidable.
There are two lessons we can learn from this. Firstly:<br />
Advice<br />
Don’t push C ++ too far! Avoid fragile features and minimize implicit conversions.<br />
C ++ enables many techniques, but that doesn’t mean one has to use them all. Especially the<br />
comma operator bears so much danger that its use must be limited to very rare cases,<br />
or better avoided entirely. It is important to have an appropriate notation, and time spent on<br />
syntactic sugar is really worthwhile for the sake of better usability of new classes. But some<br />
tricks provide only a small improvement in the syntax while creating large problems in the interplay with<br />
other techniques.<br />
Secondly:<br />
Advice<br />
If you can’t find a perfect solution, pick what serves you best and accept it.<br />
We dare the hypothesis that there is no single C ++ program that everybody is happy with. The<br />
attempt to come up with the world’s first perfect C ++ program will end in failure and bitterness.<br />
Of course, that does not mean one should always willingly accept the first working implementation one<br />
comes up with. Software can always be improved, and should be. As mentioned in § 2.11,<br />
experience has shown that it is more efficient to refactor software as early as possible than to<br />
retroactively fix issues when important applications crash, users are angry, and the program’s<br />
author(s) have forgotten the details or are already gone. On the other hand, by the time one reaches<br />
a really good implementation, one has certainly spent much more time than initially<br />
planned.<br />
3.8 Other Operators
Chapter 4<br />
Generic Programming<br />
In this chapter we will explain the use of templates in C ++ to create generic functions and<br />
classes. We will also discuss metaprogramming and the Standard Template Library.<br />
4.1 Templates<br />
Templates are a feature of the C ++ programming language that create functions and classes<br />
that operate with generic types — also called parametric types. As a result, a function or class<br />
can work with many different data types without being manually rewritten <strong>for</strong> each one.<br />
A template parameter is a special kind of parameter that can be used to pass a type as an<br />
argument: just like regular function parameters can be used to pass values to a function,<br />
template parameters allow us to also pass types to a function or a class. These generic functions<br />
can use the parameters as if they were any regular type.<br />
4.2 Generic functions<br />
Generic functions — also called function templates — are in some sense generalizations of overloaded<br />
functions.<br />
Suppose we want to write the function max(x,y) where x and y are variables or expressions of<br />
some type. Using overloading, we can easily do this as follows:<br />
int inline max (int a, int b)<br />
{<br />
    if (a > b)<br />
        return a;<br />
    else<br />
        return b;<br />
}<br />
double inline max (double a, double b)<br />
{<br />
    if (a > b)<br />
        return a;<br />
    else<br />
        return b;<br />
}<br />
Note that the function body is exactly the same <strong>for</strong> both int and double.<br />
With the template mechanism we can write just one generic implementation:<br />
template <typename T><br />
T inline max (T a, T b)<br />
{<br />
if (a > b)<br />
return a;<br />
else<br />
return b;<br />
}<br />
The function can be used in the same way as the overloaded functions:<br />
std::cout ≪ ”The maximum of 3 and 5 is ” ≪ max(3, 5) ≪ ’\n’;<br />
std::cout ≪ ”The maximum of 3l and 5l is ” ≪ max(3l, 5l) ≪ ’\n’;<br />
std::cout ≪ ”The maximum of 3.0 and 5.0 is ” ≪ max(3.0, 5.0) ≪ ’\n’;<br />
In the first case, ‘3’ and ‘5’ are literals of type int and the max function is instantiated to<br />
int inline max (int, int);<br />
Likewise, the second and third calls of max instantiate<br />
long inline max (long, long);<br />
double inline max (double, double);<br />
as the literals are interpreted as long and double.<br />
In the same way the template function can be called with variables and expressions:<br />
unsigned u1= 2, u2= 8;<br />
std::cout ≪ ”The maximum of u1 and u2 is ” ≪ max(u1, u2) ≪ ’\n’;<br />
std::cout ≪ ”The maximum of u1∗u2 and u1+u2 is ” ≪ max(u1∗u2, u1+u2) ≪ ’\n’;<br />
Here the function is instantiated for unsigned.<br />
Instead of typename one can also write class in this context but we do not recommend this<br />
because typename expresses better the intention of a generic function.<br />
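One related detail, given here as an aside (it is not discussed in the text): the template argument must be deducible unambiguously from the function arguments. The function is renamed my_max in this sketch to avoid any clash with the standard library's max:

```cpp
#include <cassert>

// Same body as the generic max above, renamed my_max for this sketch.
template <typename T>
inline T my_max(T a, T b)
{
    if (a > b)
        return a;
    else
        return b;
}

// my_max(3, 5.0);                // would not compile: T is deduced as int
                                  // from 3 but as double from 5.0
inline double mixed() { return my_max<double>(3, 5.0); } // fine: T is given
                                                         // explicitly; 3 converts
```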
What does instantiation mean? When you write a non-generic function, the compiler reads<br />
its definition, checks it for errors, and generates executable code. When the compiler processes<br />
a generic function’s definition, it only checks for certain errors (parsing errors) and generates no<br />
executable code yet. For instance:<br />
template <typename T><br />
T inline max (T a, T b)<br />
{<br />
if a > b // Error !<br />
return a;<br />
else<br />
return b;<br />
}
would not compile because the if statement without the parentheses is not a legal expression of<br />
the C ++ grammar. Meanwhile the following stupid implementation:<br />
template <typename T><br />
T inline max (T a, T b)<br />
{<br />
if (a > b)<br />
return max(a, b); // Infinite loop !<br />
else<br />
return max(b, a); // Infinite loop !<br />
}<br />
compiles because it does not violate any grammar rule. It obviously results in an infinite loop<br />
but this is beyond the compiler’s responsibility.<br />
So far, the compiler has only checked the grammatical correctness of the definition but has not<br />
generated code. If we do not call the template function, the binary will contain no trace of our max<br />
function. What happens when we call the generic function and cause its instantiation? The<br />
compiler first checks whether the function can be compiled with the given argument type. It can do<br />
so for int or double, as we have seen before. What about types that have no ‘>’? For instance<br />
std::complex<double>. Let us try to compile:<br />
std::complex<double> z(3, 2), c(4, 8);<br />
std::cout ≪ ”The maximum of c and z is ” ≪ ::max(c, z) ≪ ’\n’;<br />
The double colons in front of max avoid ambiguities with the standard library’s max, which<br />
some compilers may include implicitly (as g++ apparently does). Our compilation attempt will end<br />
in an error like:<br />
Error: no match <strong>for</strong> ≫operator>≪ in ≫a > b≪<br />
Obviously, we cannot call the max function with types that have no “greater than” operator.<br />
In fact, there is no maximum function <strong>for</strong> complex numbers.<br />
What happens when our template function calls another template function, which in turn . . . ? Likewise, these functions are only completely checked at instantiation time. Let us look at the following program:

    #include <iostream>
    #include <complex>
    #include <vector>
    #include <algorithm>

    int main ()
    {
        using namespace std;
        vector<complex<double> > v;
        sort(v.begin(), v.end());
        return 0;
    }

Without going into detail, the problem is the same as before: we cannot compare complex numbers and thus cannot sort arrays of them. This time the missing comparison is discovered in an indirectly called function, and the compiler provides the entire call stack so that you can trace the error back. Please try to compile this example with the different compilers available to you and see whether you can make any sense of the error messages.
If you run into such a lengthy error message,1 DON'T PANIC! First, look at the error itself and extract what is useful for you: e.g., a missing "operator>", or something not assignable, i.e. a missing "operator=", or something const that should not be. Then find in the call stack the innermost code that is part of your program, i.e. the place where you call somebody else's template function. Stare for a while at this line and its predecessors, because this is the most likely place where the error was made. Is a type of the template function's arguments missing an operator or function mentioned in the error message? Do not get scared away; often the problem is much simpler than it seems from the never-ending error message. In our experience, most errors in template functions can be found faster than run-time errors.
Another question we have not answered so far is what happens if we use two different types:

    unsigned u1= 2;
    int i= 3;
    std::cout << "The maximum of u1 and i is " << max(u1, i) << '\n';

The compiler tells us, this time briefly, something like

    Error: no match for function call »max(unsigned int&, int)«

Indeed, we assumed that both types are the same. Now, can we write a template function with two template parameters? Of course we can. But that does not help us much here because we would not know what return type the function should have.
There are different options. First, we could add a non-templated overload like:

    int inline max (int a, int b) { return a > b ? a : b; }

This can be called with mixed types, and the unsigned argument would be implicitly converted into an int. But what would happen if we also added a function for unsigned?

    int max(unsigned a, unsigned b) { return a > b ? a : b; }

Should the int be converted into an unsigned or vice versa? The compiler does not know and will complain about this ambiguity.
At any rate, adding non-templated overloads to the templated implementation is neither elegant nor productive. So we remove all non-templated overloads and see what we can do in the function call. We can explicitly convert one argument to the type of the other:

    unsigned u1= 2;
    int i= 3;
    std::cout << "The maximum of u1 and i is " << max(int(u1), i) << '\n';

Now max is called with two ints. Another option is specifying the template type explicitly in the function call:

    unsigned u1= 2;
    int i= 3;
    std::cout << "The maximum of u1 and i is " << max<int>(u1, i) << '\n';

Then the arguments are converted to int.2

1 The longest we have heard of was 18 MB, which corresponds to about 9000 pages of text.
After these less pleasant details on templates, some really good news: template functions perform as efficiently as their non-templated counterparts! The reason is that C++ generates new code for every type or type combination that the function is called with. Java, in contrast, compiles generics only once and executes them for different types by casting to the corresponding types. This results in faster compilation and shorter executables, but the execution is slower than that of non-generic implementations (which are already less efficient than C++ programs).
Another price we pay for the fast templates is longer executables, because of the multiple instantiations for each type (combination). However, in practice the number of instances of a function will not be that large, and it only really matters for non-inline functions with long implementations (including called template functions). The binary code of inline functions is in any case inserted directly into the executable at the location of the function call, so that the impact on the executable length is the same for template and non-template functions.
4.2.1 The function accumulate
TODO: An example on containers is much better than with ugly pointer arithmetic.
Consider an array double a[n], which is described by its begin and end pointers a and a + n, respectively.3
We create a function for the sum of an array of doubles. The loop over the array uses pointers, as explained in Section 2.9. Figure 4.1 shows the positions of the begin pointer a and the end pointer a + n, which points directly past the end of the array.
[Figure 4.1: An array of length n with begin pointer a and past-the-end pointer a + n]
Thus, we specify the range of entries by a right-open interval of addresses.

2 For complicated reasons of compiler internals, the explicit type parameter turns off argument-dependent name lookup (ADL).
3 An array and a pointer are treated in much the same way in C/C++: one can pass an array where a pointer is expected, and it decays to the address of the first entry, &a[0]. For a pointer or array a and an integer n, a + n is equivalent to &a[n].
Advice
Unless you have strong reasons against it, use right-open intervals, because:
• It is easy to represent an empty set by two equal locations (pointers, iterators, . . . ).
• It works on types without an ordering: if you specify the end by the location of the last element, you need an ordering (e.g. an operator<=) to test for loop termination, whereas a past-the-end location only requires a test for equality.
The function accumulate applies the += operator to variables of type T. This operator is defined for the float and int types used below. This implies that the following main program will compile without the need for another definition of the accumulate function:
    int main()
    {
        const int n= 10;
        float a[n];
        int b[n];

        for (int i= 0; i < n; ++i) {
            a[i]= float(i) + 1.0f;
            b[i]= i + 1;
        }

        float s= accumulate(a, a + n);
        int r= accumulate(b, b + n);

        return 0;
    }

As in the previous example, we do not need to indicate explicitly that T is float or int; the compiler deduces this for us from the function arguments. We can, however, fill in the type explicitly as follows:

    int r= accumulate<int>(b, b + n);

If you fill in the wrong type, the compiler will give you a type error saying that no matching function exists.
4.3 Generic classes
In the previous section, we described the use of templates to create generic functions. Templates can also be used to create generic classes that define a certain behavior independently of the types they operate on. Good candidates are, for example, container classes like vectors, matrices, and lists. We could also extend the complex class with a parametric value type, but we have already spent so much time with it that we will now look at something else.
Let us write a generic vector class.4 First we implement a class with only the most fundamental operators:
    template <typename T>
    class vector
    {
        void check_size(int that_size) const { assert(my_size == that_size); }
        void check_index(int i) const { assert(i >= 0 && i < my_size); }
      public:
        explicit vector(int size)
          : my_size(size), data( new T[my_size] )
        {}

        vector()
          : my_size(0), data(0)
        {}

4 In the sense of linear algebra, not like the STL vector.
        vector( const vector& that )
          : my_size(that.my_size), data( new T[my_size] )
        {
            for (int i= 0; i < my_size; ++i)
                data[i]= that.data[i];
        }

        ~vector() { if (data) delete[] data; }

        vector& operator=( const vector& that )
        {
            check_size(that.my_size);
            for (int i= 0; i < my_size; ++i)
                data[i]= that.data[i];
            return *this;
        }

        int size() const { return my_size; }

        const T& operator[]( int i ) const
        {
            check_index(i);
            return data[i];
        }

        T& operator[]( int i )
        {
            check_index(i);
            return data[i];
        }

        vector operator+( const vector& that ) const
        {
            check_size(that.my_size);
            vector sum(my_size);
            for (int i= 0; i < my_size; ++i)
                sum[i]= data[i] + that[i];
            return sum;
        }

      private:
        int my_size;
        T* data;
    };

Listing 4.1: Template vector class
The template class is not essentially different from a non-template class. There is only the extra parameter T as a placeholder for the type the class is used with.
We have member variables like my_size and member functions like size() that are not affected by the template parameter. Other functions like the access operator or the first constructor are parametrized. However, the difference is minimal: wherever we had double (or another type) before, we now put the type parameter T, e.g. for return types or within new. Likewise, our member variables and constants can be parametrized by T, as data is. Even program parts that use generic functions or data can often be implemented without explicitly stating the type parameters. For instance, the destructor uses the pointer data with a template type, but the delete operator can deduce its type automatically, and for the null-pointer test it does not matter either.
Template arguments can have default values. Assume our vector class has, in addition to the value type, two parameters for the orientation and location:

    template <typename T= double, typename Orientation= row_major, typename Where= heap>
    class vector;

The arguments of a vector can be fully declared:

    vector<float, row_major, heap> v;

The last argument is equal to the default value and can be omitted:

    vector<float, row_major> v;

As for functions, only the last arguments can be omitted. For instance, if the second argument is the default and the last one is not, we must write them all:

    vector<float, row_major, stack> w;

If all template arguments are the default values, we can of course omit them all. However, the type is still a template class, and the compiler gets confused if we skip the brackets in the type:

    vector x;   // wrong, it is considered a non-template class
    vector<> y; // looks a bit strange but is correct
Unlike the defaults of function arguments, template defaults can refer to previous template arguments:

    template <typename T, typename U= T>
    class pair;

This is a class for two values that might have different types. If they do not, we need not declare the type twice:

    pair<int, float> p1; // object with an int and a float value
    pair<int> p2;        // object with two int values

The dependency on previous arguments can be more complex than mere equality when using the meta-functions that we will introduce in Chapter ??.
TODO: transition to next section
4.4 Concepts and Modeling
In the previous sections, one could get the impression that template parameters can be replaced by any type. This is in fact not entirely true. The programmer of templated classes and functions makes assumptions about the operations that can be performed on the templated variables. It is therefore very important to know which types can correctly be substituted for the formal template parameters; in C++ lingo, which types the template function or class can be instantiated with.
Clearly, accumulate can be instantiated with int or double. Types without addition, like the solver class (on page 70), cannot be used for accumulate. What should be accumulated from a set of solvers? All the requirements for the template parameter T of the function accumulate can be summarized as follows:
• T is CopyConstructible:
  – The copy constructor T::T(const T&) exists, so that 'T a(b);' compiles if b is of type T.
• T is PlusAssignable:
  – The plus-assign operator T::operator+=(const T&) exists, so that 'a+= b;' compiles if b is of type T.
• T is Constructible from int:
  – The constructor T::T(int) exists, so that 'T a(0);' compiles.
Such a set of type requirements is called a 'Concept'. A concept CR that contains all requirements of a concept C plus additional ones is called a 'Refinement' of C. A type t that fulfills all requirements of a concept C is called a 'Model' of C.
A complete definition of a template function or type should contain the list of required concepts, as is done for functions from the Standard Template Library, see http://www.sgi.com/tech/stl/.
Today such requirements are mere documentation. A prototype of a C++ concept compiler [?] checks:
• whether a function can be called with a certain type (combination5);
• whether a class can be instantiated with a certain type (combination); and
• whether a function's requirement list covers all used expressions, including those in sub-functions.
The compiler generates short and comprehensible messages when template functions or classes are used erroneously. People interested in generic programming should try this compiler; it helps to understand the paradigm better. However, the compiler really is a prototype and must not be used for production code. This functionality was even planned for the next language standard, but the committee could not reach a consensus on its details, to make a (very) long story short.
Discussion 4.1 The most vulnerable aspect of generic programming is semantic conformance, that is, which semantic concepts are modeled. For instance, an algorithm might require that a binary operation is associative in order to calculate correctly. One can express this requirement in the function's documentation, but if someone calls the function with an operation that is not associative, the compiler has no idea about it. If one violates a syntactic requirement, the compiler will complain about the missing function or operator, often in a hardly readable form, but the error will be caught no matter what. If one violates a semantic requirement, the compiler generates an erroneous executable, and the compilation does not give any warning because the compiler is entirely unaware of the user types' semantics. The only way to find such semantic errors in templates with today's compilers is careful documentation (and its careful reading, of course). Recent research gives hope that future C++ standards and compilers will provide more reliable and elegant possibilities to ensure the semantic correctness of template programs.

5 If you have multiple template arguments.
For illustration purposes, we show the conceptualized declaration of a generic sorting function as used in the library of the concept compiler:

    template <typename Iter>
    requires LessThanComparable<Iter::value_type>
          && CopyAssignable<Iter::value_type>
          && Swappable<Iter::value_type>
          && CopyConstructible<Iter::value_type>
    inline void sort(Iter first, Iter last);

If the function is called erroneously, the compiler will detect this directly at the function call, not deep inside the implementation.
4.5 Inheritance or Generics?
In this section we discuss the commonalities and differences between object-oriented programming (OOP) and generic programming. People who do not know OOP yet will not learn it in this section; its purpose is rather to motivate why we pay more attention to generic than to object-oriented programming in this book. The short answer is performance and applicability. If this answer is good enough for you, you can skip this section and continue with the next one. Programmers who are used to OOP and think they can implement the functionality with inheritance instead of templates should take the time to read this section.
Inheritance and generic programming are similar in the sense that most programming problems that can be solved by inheritance have a generic alternative solution and vice versa. The following table summarizes the basic components of inheritance and the corresponding building blocks of generic programming:

    Inheritance        Generic Programming
    base class         concept
    derived class      model

In the remainder of this section we discuss the differences between generic programming and inheritance.
We focus on functions, but similar arguments hold for templated classes. The advantage of using a base-class reference or pointer as the argument type of a function is that we can be sure all derived classes can be used as arguments too, see § ??.6 Inheritance in C++ and other OOP languages is designed such that a function in a derived class can substitute (hide) the one in the base class with the identical signature. Thus, calling the function for a base-class argument will use either the base class's implementation or that of the derived class (if the function is virtual). In both cases we can rely on its existence. We will explain OOP in more detail in Section ??. Here, we only name advantages and disadvantages of the two approaches regarding different aspects of programming.

6 TODO: The OOP section is not written yet.
Compile time: With the OOP approach, the function is compiled only once; the distinction between the different calculations is realized at run time. The generic implementation requires a new compilation for each combination of types. As a consequence, the sources must reside in header files and cannot be stored in libraries.7
Executable size: As mentioned before, generic functions need multiple compilations, and as a result the generated executable contains code for each instantiation. A function programmed against an abstract interface exists only once. On the other hand, virtual functions introduce some additional memory to store the virtual function tables. Except for some pathological examples, one can expect this additional space to be less than the extra space needed for separate machine code for every instantiation of a generic function. In extreme cases, a very large executable can negatively impact performance due to wasted cache memory.
Performance: The higher compilation effort for generic programming has a double performance benefit. Functions used within the generic computations need not be called indirectly via expensive function pointers but can be called directly. Where appropriate, they can even be inlined, saving the function-call overhead entirely. We once measured the impact of the two approaches on the performance of an accumulate function (a more general one than in § 4.2.1) [?]. The generic version was in our case about 40 times faster than the inheritance-based implementation. This factor varies from platform to platform, but for small functions one can expect an inlined template function to be 10-100 times faster than virtual functions. Conversely, for long calculations like solving a large linear system, the performance difference is imperceptible.
Concept refinement, that is, adding (syntactic) requirements, is feasible with the inheritance approach, but it is very tedious and obfuscates the program source; details in [?].
Intrusiveness: The emulation of genericity by inheritance can induce a deep class hierarchy [?]. More critical for universal applicability is that the technique is intrusive: a type cannot be used as an argument of an OOP implementation if it is not derived from the corresponding base class, even if it provides the correct interface! Thus, we would have to add additional base class(es) to the type. This is particularly problematic for types from third-party libraries or intrinsic types, because we cannot add base classes there. Generic functions do not have such rigid constraints: we can even adapt a third-party or intrinsic type to meet a generic function's syntactic requirements without modifying any third-party program.
Time of selection: At least one advantage of OOP-style polymorphism we should mention at the end. The argument types of a generic function call must be known at compile time so that the compiler can instantiate the template function. The type of an OOP function argument can be chosen during the execution of the program and can therefore depend on preceding calculations or input data. For instance, one can define in a file which linear solver is used in an application.
Résumé: It is not our goal to compare object-oriented and generic programming in general. The two approaches complement each other in many respects, and this is beyond the scope of this discussion. However, when considering only the aspect of maximal applicability with optimal performance, the generic approach is undoubtedly superior. Especially when functions of a library are used with types defined outside this library, any necessary interface adaptation is quite easy without modifying the type definition, whereas adding extra base classes forces a change of the type definition, which is not always possible (or desirable). In contexts where functions are used with a limited number of types that are defined in the same library, derivation can be an appropriate technique to achieve polymorphism.

7 Libraries in the classical sense, which are linked with separately compiled sources, as opposed to template libraries.
4.6 Template Specialization
Although one of the advantages of a generic implementation is that the same code can be used for all objects that satisfy the corresponding concept, this is not always the best approach. Sometimes the same behavior can be implemented more efficiently for a specific type. In principle, one can even implement a different behavior for a specific type, but this is not advisable in general, because the program becomes much harder to understand, and using the specialized classes can require a whole chain of further specializations (bearing the danger of errors when incompletely realized). C++ provides enormous flexibility, and the programmer is in charge of using this flexibility responsibly and of staying self-consistent.
4.6.1 Specializing a Class for One Type
In the following, we want to specialize our vector example from page 96 for bool. Our goal is to save memory by packing 8 bools into one byte. Let us start with the class definition:

    template <>
    class vector<bool>
    {
        // ..
    };

Although our specialized class is not type-parametric, we still need the template keyword and the empty angle brackets. After the class name, the complete type list must be given. This syntax looks a bit cumbersome in this context, but it makes more sense for multiple template arguments where only some are specialized. For instance, if we had some container with 3 arguments and specialized the second one (say, to bool):

    template <typename T1, typename T3>
    class some_container<T1, bool, T3>
    {
        // ..
    };
Back to our boolean vector class. In the class we define a default constructor for empty vectors, a constructor for vectors of size n, and a destructor. For the size of the byte array, we have to pay some attention when the vector size is not divisible by 8, because the integer division simply cuts off the remainder:

    template <>
    class vector<bool>
    {
      public:
        explicit vector(int size)
          : my_size(size), data( new unsigned char[(my_size + 7) / 8] )
        {}

        vector() : my_size(0), data(0) {}
        ~vector() { if (data) delete[] data; }

      private:
        int my_size;
        unsigned char* data;
    };

One thing we notice is that the default constructor and the destructor are identical to those of the non-specialized version (in the following also referred to as the general version). Unfortunately, they are not 'inherited' by the specialization: if we write a specialization, we have to define everything from scratch. We are free to omit member functions or variables of the general version, but for the sake of consistency we should do this only for good reasons, for very good reasons. For instance, we might omit operator+ because we have no addition for bool. The constant access operator is implemented with shifting and bit masking:
    template <> class vector<bool>
    {
        bool operator[](int i) const { return (data[i/8] >> i%8) & 1; }
    };

The mutable access is trickier because we cannot refer to single bits. The trick is to return a helper type, called a 'proxy', that realizes the assignment of a bool and the conversion to bool, given a reference to the byte and the position within the byte:

    template <> class vector<bool>
    {
        vector_bool_proxy operator[](int i)
        {
            return vector_bool_proxy(data[i/8], i%8);
        }
    };
Let us now implement our proxy:

    class vector_bool_proxy
    {
      public:
        vector_bool_proxy(unsigned char& byte, int p) : byte(byte), mask(1 << p) {}
      private:
        unsigned char& byte;
        unsigned char  mask;
    };

To simplify the further operations, we create a mask that has a 1 at the position in question and 0 at all other positions.
The reading access is implemented by simply masking within the conversion operator:

    class vector_bool_proxy
    {
        operator bool() const { return byte & mask; }
    };

Setting a bit is realized by an assignment operator for bool:

    class vector_bool_proxy
    {
        vector_bool_proxy& operator=(bool b)
        {
            if (b)
                byte|= mask;
            else
                byte&= ~mask;
            return *this;
        }
    };

If our argument is true, we 'or' it with the mask: at the considered position, the one bit in the mask turns on the bit in the referenced byte, while at all other positions the zero bits of the mask leave them unchanged. Conversely, with a false argument, we first invert the mask and 'and' it with the byte reference, so that the mask's zero bit at the active position turns the bit off, while at all other positions the 'and' with one bits conserves the old bit values.8
4.6.2 Specializing a Function to a Specific Type
Functions can be specialized in the same manner as classes. Assume we have a generic function that computes the power x^y and want to specialize it:

    template <typename Base, typename Exponent>
    Base inline power(const Base& x, const Exponent& y);

    template <>
    double inline power(const double& x, const double& y); // Do not use this

Unfortunately, such specializations do not take part in overload resolution, so they are ignored in many situations. Therefore, we give the following

Advice
Do not use function template specialization!

To specialize a function for one specific type or type tuple as above, we can simply use overloading. This works better and is even simpler. Back to our example: assume we have an entirely generic power function.9 In the case that both arguments are double, we nevertheless want to use the standard implementation, hoping that some caffeine-drugged geeks figured out an incredibly fast assembler hack for our platform and put it into our Linux distribution. Excited by the incredible performance, even if it is only the hope for it, we overload our power function as follows:

    #include <cmath>

    template <typename Base, typename Exponent>
    Base inline power(const Base& x, const Exponent& y)
    {
        ...
    }

    double inline power(double x, double y)
    {
        return std::pow(x, y);
    }

8 TODO: picture
9 TODO: Anybody an idea for an implementation? Or a better example?
Speaking of platform-specific assembler hacks: maybe we are eager to contribute code that exploits SSE units by performing two computations in parallel:

    template <typename Base, typename Exponent>
    Base inline power(const Base& x, const Exponent& y) { ... }

    #ifdef SSE_FOR_TRYPTICHON_WQ_OMICRON_LXXXVI_SUPPORTED
    std::pair<double, double> inline power(const std::pair<double, double>& x, double y)
    {
        asm {
            # Yo, I'm the greatestest geek under the sun!
        }
        return whatever;
    }
    #endif

    #ifdef ... more hacks ...
What is to say about this snippet? If you do not like writing such specializations, we will not blame you. If you do, always put such hacks into conditional compilation. You also have to make sure that your build system enables the macro only on platforms that definitely support the hack. For the case that it does not, we must guarantee that the generic implementation or another overload can deal with pairs of double. Last but not least, you have to rewrite your applications to use this function. Convincing others to use such a special implementation could be even more work than getting the assembler hack to produce plausible numbers. More importantly, such special signatures undermine the ideal of clear and intuitive programming. However, if power functions are computed on entire vectors and matrices, one could perform the calculation pairwise internally without affecting the interface or the user application.
You might also think that SSEs were yesterday and today we have GPUs and GPGPUs but<br />
programming generically still takes a lot of tricks (at least in the beginning of 2010). But this is<br />
another story and we digress. Resuming: programming <strong>for</strong> highest per<strong>for</strong>mance can be tricky<br />
but at least there often ways to explore unportable feature (where available) without sacrificing<br />
portability at the application level. 10<br />
In the previous examples, we specialized all arguments of the function. It is also possible to specialize some argument(s) and leave the remaining argument(s) as template(s):

    template <typename Base, typename Exponent>
    Base inline power(const Base& x, const Exponent& y);

    template <typename Base>
    Base inline power(const Base& x, int y);

    template <typename Exponent>
    double inline power(double x, const Exponent& y);

10 TODO: Is this comprehensible?
The compiler will find all overloads that match the argument combination and select the most specific one. For instance, power(3.0, 2u) matches the first and the third overload, of which the latter is more specific.11 To put it in terms of higher mathematics:12 type specificity is a partial order that forms a lattice, and the compiler picks the maximum of the available overloads. However, you do not need to dive deeply into algebra to see which type or type combination is more specific.

If we call power(3.0, 2) with the previous overloads, all three match. However, this time we cannot determine the most specific overload. The compiler will tell us that the call is ambiguous and show us overloads 2 and 3 as candidates. As we implemented the overloads consistently and with optimal performance, we might be content with either choice, but the compiler will not choose. To disambiguate the overloads we must add:

    double inline power(double x, int y);

The lattice people from the previous paragraph will think: "Of course, we were missing the join in the specificity order." Again, one can understand C++ without studying lattices.
4.6.3 Partial Specialization

If you implement template classes, you will sooner or later run into the situation where you would like to specialize a template class for another template class. Suppose we have a templated complex class:

    template <typename Real>
    class complex;

Assume further that we had some really boosting algorithmic specialization for complex vectors13 that saves tremendous compute time. Then we start specializing our vector class:

    template <>
    class vector<complex<float> >;

    template <>
    class vector<complex<double> >; // again ??? :-/

    template <>
    class vector<complex<long double> >; // how many more ??? :-P

Apparently, it lacks elegance to reimplement the specialization for all possible and impossible instantiations of complex. Much worse, it destroys our ideal of universal applicability, because the complex class is intended to support user-defined types as Real, but the specialization of the vector class would be ignored for those types.

The solution to the implementation redundancy and the ignorance of new types is 'Partial Specialization'. We specialize our vector class for all complex instantiations:

11 TODO: Exercises for which type is more specific than which.
12 For those who like higher mathematics. And only for those.
13 TODO: Anyone a good example?
    template <typename Real>
    class vector<complex<Real> >
    {
        ...
    };

That will do the trick. Pay attention to put a space between the closing '>'s; otherwise the compiler will take two subsequent '>' as the shift operator '>>' and becomes pretty confused.14

This also works for classes with multiple parameters, for instance:

    template <typename Value, typename Parameters>
    class vector<complex<Value>, Parameters>
    {
        ...
    };

We can also specialize for all pointers:

    template <typename T>
    class vector<T*>
    {
        ...
    };

Whenever the set of types is expressible by a Type Pattern, we can apply partial specialization to it.
Partial template specialization can be combined with the regular template specialization from § 4.6.1; let us call the latter 'Complete Specialization' for distinction. In this case, the complete specialization is prioritized over the partial one. Between different partial specializations, the most specific is selected. In the following example:

    template <typename T>
    class vector<T*>
    {
        ...
    };

    template <typename T>
    class vector<const T*>
    {
        ...
    };

the second specialization is more specific than the first one and is picked when it matches. In this sense, a complete specialization is always more specific than a partial one.
4.6.4 Partially Specializing Functions

The C++ standard committee distinguishes between explicit specialization, as in the first paragraph of § 4.6.2, and implicit specialization. An example of implicit specialization is the following computation of a value's magnitude:

    template <typename T>
    T inline abs(const T& x)
    {
        return x < T(0) ? -x : x;
    }

    template <typename T> // Do not specialize functions like this either
    T inline abs(const std::complex<T>& x)
    {
        return sqrt(real(x)*real(x) + imag(x)*imag(x));
    }

14 In the next standard (new or not depending on the publication date), closing '>>' without intermediate spaces will be allowed. Some compilers, e.g. VS 2008, already support the conglutinated notation today.
This works significantly better than the explicit specialization, but even this form of specialization sometimes fails, in the sense that a template function is selected which is not the most specific one.15 A mean aspect of this implicit specialization is that it seems to work properly with few specializations, and when a software project grows, it eventually goes wrong. Since the developers have seen the specialization work before, they might not expect this, and the unintended function selection might remain unobserved while corrupting results or at least wasting resources. It is also possible that the specialization behavior varies from compiler to compiler.16

The only conclusion from this is: do not specialize function templates! It introduces an unnecessary fragility into our software. Instead we introduce an additional class (called a functor, § 4.8) with an operator(). Template classes are properly specialized on all compilers,17 both partially and completely.

In our abs example we start with the function itself and a forward declaration of the template class:

    template <typename T> struct abs_functor;

    template <typename T>
    typename abs_functor<T>::result_type
    inline abs(const T& x)
    {
        abs_functor<T> functor_object;
        return functor_object(x);
    }
Alternatively to the forward declaration, we could have defined the class directly. The return type of our function refers to a typedef or, to use the correct term in generic programming, to an 'Associated Type' of abs_functor. Already for complex numbers we do not return the argument type itself but its associated type value_type. Using an associated type here gives us all possible flexibility for further specialization. For instance, the magnitude of a vector could be the sum or the maximum of the elements' magnitudes, or a vector containing the magnitude of each element. Evidently, the functor classes must define a result_type to be callable this way.

Inside the function, we instantiate the functor class with the argument type, abs_functor<T>, and create an object of this type. Then we call the object's application operator. As we do not really need the object itself but only use it for the calculation, we can just as well create an anonymous object and perform the construction and the calculation in one expression:

15 TODO: Good example.
16 TODO: Ask a compiler expert about this.
17 Several years ago many compilers failed in partial specialization, e.g. VS 2003, but today all major compilers handle this properly. If you nevertheless experience problems with this feature in some compiler, take your hands off of it; most likely you will encounter further problems. Even the CUDA compiler, which is far from being standard-compliant, supports partial specialization.
    template <typename T>
    typename abs_functor<T>::result_type
    inline abs(const T& x)
    {
        return abs_functor<T>()(x);
    }
In this expression we have two pairs of parentheses: the first one contains the arguments of the constructor, which are empty, and the second one contains the arguments of the application operator, which are the argument(s) of the function. If we wrote:

    template <typename T>
    typename abs_functor<T>::result_type
    inline abs(const T& x)
    {
        return abs_functor<T>(x); // error
    }

then x would be interpreted as an argument of the constructor and an object of the functor class would be returned.18
Now we have to implement our functor classes:

    template <typename T>
    struct abs_functor
    {
        typedef T result_type;

        T operator()(const T& x)
        {
            return x < T(0) ? -x : x;
        }
    };

    template <typename T>
    struct abs_functor<std::complex<T> >
    {
        typedef T result_type;

        T operator()(const std::complex<T>& x)
        {
            return sqrt(real(x)*real(x) + imag(x)*imag(x));
        }
    };

We wrote a general implementation that works for all fixed-point and floating-point types.

18 Many years and versions ago, g++ tolerated this expression (sometimes) despite it not being standard-compliant.
4.7 Non-Type Parameters for Templates

So far, we used template arguments only for types. Values can be template arguments as well; not all values, though, but only integral types, i.e. fixed-point numbers and bool.

Very popular is the definition of short vectors and small matrices with size arguments as template parameters, for instance:

    template <typename T, int Size>
    class fsize_vector
    {
        typedef fsize_vector self;

        void check_index(int i) const { assert(i >= 0 && i < my_size); }
      public:
        typedef T value_type;
        const static int my_size= Size;

        fsize_vector() {}

        fsize_vector( const self& that )
        {
            for (int i= 0; i < my_size; ++i)
                data[i]= that.data[i];
        }

        self& operator=( const self& that )
        {
            for (int i= 0; i < my_size; ++i)
                data[i]= that.data[i];
            return *this;
        }

        int size() const { return my_size; }

        const T& operator[]( int i ) const
        {
            check_index(i);
            return data[i];
        }

        T& operator[]( int i )
        {
            check_index(i);
            return data[i];
        }

        self operator+( const self& that ) const
        {
            self sum;
            for (int i= 0; i < my_size; ++i)
                sum[i]= data[i] + that[i];
            return sum;
        }

      private:
        T data[Size];
    };
If you compare this implementation with the one in Section 4.3 on page 95, you will realize that there are not so many differences. The essential difference is that the size is now part of the type and that the compiler knows it.

Let us start with the latter. The compiler can use its knowledge for optimization. For instance, if we create a variable

    fsize_vector<float, 3> v(w);

the compiler can decide that the generated code for the copy constructor is not performed in a loop but as a sequence of independent operations like:

    fsize_vector( const self& that )
    {
        data[0]= that.data[0];
        data[1]= that.data[1];
        data[2]= that.data[2];
    }

This saves the incrementation of the counter and the test for the loop end. In some sense, this test is already performed at compile time. As a rule of thumb: the more is known during compilation, the more potential for optimization exists. We will come back to this in more detail in Section 8.2 and Chapter ??.
Which optimizations are induced by additional compile-time information is of course compiler-dependent. One can only find out which transformation is actually done by reading the generated assembler code, which is not that easy, especially with high optimization (and with low optimization the effect will probably not be there), or indirectly by observing performance and comparing it with other implementations. In the example above, the compiler will probably unroll the loop as shown for small sizes like 3 and keep the loop for larger sizes, say 100. You can see why these compile-time sizes are particularly interesting for small matrices and vectors, e.g. three-dimensional coordinates or rotations.

Another benefit of knowing the size at compile time is that we can store the values in an array and even inside the class. Then the values of temporary objects are stored on the stack and not on the heap.19 Creation and destruction are much less expensive: only the change of the stack pointer at function begin and end needs to be adapted to the object's size, whereas dynamic memory allocation on the heap involves the management of lists to keep track of allocated and free memory blocks.20 To make a long story short, keeping the data in small arrays is much less expensive than dynamic allocation.
We said that the size becomes part of the type. The careful reader might have realized that we omitted the checks on whether the vectors have the same size. We do not need them anymore: if an argument has the class type, it implicitly has the same size. Consider the following program snippet:

    fsize_vector<float, 3> v;
    fsize_vector<float, 4> w;
    vector<float>          x(3), y(4);

    v= w;
    x= y;

The last two lines are incompatible vector assignments. The difference is that the incompatibility in the second assignment x= y; is discovered at run time by our assertion. The assignment v= w; does not even compile, because fixed-size vectors of dimension 3 only accept vectors of the same dimension as arguments.

19 TODO: Picture.
20 TODO: Need easier or longer explication. Or citation.
Like type arguments, non-type template arguments can have defaults. Say the most frequent dimension of our vectors is three, because we live in a three-dimensional world, relativity and string theory aside. Then we save some typing with a default:

    template <typename T, int Size= 3>
    class fsize_vector
    { /* ... */ };

    fsize_vector<float>     v, w, x, y;
    fsize_vector<float, 4>  space_time;
    fsize_vector<float, 10> string;
4.8 Functors

Let us develop a mathematical algorithm for computing the finite difference of a differentiable function f. The finite difference is an approximation of the first derivative by

    f'(x) ≈ ( f(x + h) - f(x) ) / h,

where h is a small value, also called spacing.

A general function for computing the finite difference is presented here:

    #include <iostream>
    #include <cmath>

    // Function taking a function argument
    double finite_difference( double f( double ), double x, double h ) {
        return ( f(x+h) - f(x) ) / h;
    }

    double sin_plus_cos( double x ) {
        return sin(x) + cos(x);
    }

    int main() {
        std::cout << finite_difference( sin_plus_cos, 1., 0.001 ) << std::endl;
        std::cout << finite_difference( sin_plus_cos, 0., 0.001 ) << std::endl;
    }
Note that the function finite_difference takes an arbitrary function (from double to double) as argument.

Now suppose we want to compute the second-order derivative. It would make sense to call finite_difference with finite_difference as argument. Unfortunately, this is not possible, since this function has three arguments and the first parameter of finite_difference only accepts a function with a single argument.

For this reason, we introduce 'functors'. Functors, not to be confused with the functors from category theory, are either functions or objects of classes providing operator(). This means that 'functors' are things which can be called like functions but are not necessarily functions. Using objects of a class providing operator() has the additional advantage that they can carry an internal state in terms of member variables.21
For our example, the functor could be implemented as follows:

    struct sin_plus_cos
    {
        double operator() (double x) const
        {
            return sin(x) + cos(x);
        }
    };

but we could also consider a functor with a parameter, like this:

    class para_sin_plus_cos
    {
      public:
        para_sin_plus_cos(double parameter) : parameter(parameter) {}

        double operator() (double x) const
        {
            return sin(parameter * x) + cos(x);
        }
      private:
        double parameter;
    };

How can we use the functor in a function? We want to be able to pass objects of both sin_plus_cos and para_sin_plus_cos to our finite_difference function. There are two possible solutions: inheritance and generic programming, which we now discuss.
4.8.1 Functors via inheritance

TODO: Better as counter-example in OO chapter. We haven't introduced virtual functions yet.

Let us first rewrite our function finite_difference using an abstract base class.

    struct functor_base
    {
        virtual double operator() (double x) const= 0;
    };

    double finite_difference( functor_base const& f, double x, double h )
    {
        return ( f(x+h) - f(x) ) / h;
    }

21 TODO: Do we want the following sentences?: Functors can encapsulate C and C++ function pointers employing the concepts templates and polymorphism. All the functions must have the same return type and calling parameters.
The class functor_base has a pure22 virtual function operator() and thus cannot be instantiated. We can, however, alter the functor para_sin_plus_cos such that it inherits from the abstract base class and overrides operator().

    class para_sin_plus_cos
      : public functor_base
    {
      public:
        para_sin_plus_cos(double p) : parameter(p) {}

        double operator() (double x) const // Is virtual function in base
        {
            return sin( parameter * x ) + cos(x);
        }
      private:
        double parameter;
    };
Now we can use an object of this class as the first argument of finite_difference. The whole program looks as follows:

    #include <iostream>
    #include <cmath>

    struct functor_base {
        virtual double operator() ( double x ) const= 0;
    };

    double finite_difference( functor_base const& f, double x, double h ) {
        return ( f(x+h) - f(x) ) / h;
    }

    class para_sin_plus_cos
      : public functor_base
    {
      public:
        para_sin_plus_cos( double const& p )
          : parameter( p )
        {}

        double operator() ( double x ) const { // Virtual function
            return sin( parameter * x ) + cos(x);
        }
      private:
        double parameter;
    };

    int main() {
        para_sin_plus_cos sin_1( 1.0 );
        std::cout << finite_difference( sin_1, 1., 0.001 ) << std::endl;
        std::cout << finite_difference( para_sin_plus_cos(2.0), 1., 0.001 ) << std::endl;
        std::cout << finite_difference( para_sin_plus_cos(2.0), 0., 0.001 ) << std::endl;
    }

22 TODO: undefined
4.8.2 Functors via generic programming

If we make the functor argument of finite_difference generic, we do not need a functor_base any longer. There is also no need to alter our previously defined functors sin_plus_cos and para_sin_plus_cos. This is a perfect example of the fact that generic programming makes extending software easier. The program now looks like:

    #include <iostream>
    #include <cmath>

    template <typename F, typename T>
    T inline finite_difference(F const& f, const T& x, const T& h)
    {
        return ( f(x+h) - f(x) ) / h;
    }

    class para_sin_plus_cos
    {
      public:
        para_sin_plus_cos(double p) : parameter(p) {}

        double operator() ( double x ) const
        {
            return sin( parameter * x ) + cos(x);
        }
      private:
        double parameter;
    };

    int main()
    {
        para_sin_plus_cos sin_1( 1.0 );
        std::cout << finite_difference( sin_1, 1., 0.001 ) << std::endl;
        std::cout << finite_difference( para_sin_plus_cos(2.0), 1., 0.001 ) << std::endl;
        std::cout << finite_difference( para_sin_plus_cos(2.0), 0., 0.001 ) << std::endl;
        return 0;
    }
Since we are using a template argument F, we need to define the constraints that it has to satisfy. For this function, we need F to be a functor with one argument; this concept is called UnaryFunctor. Formally, we can write it as follows:

• Let f be of type F.
• Let x be of type X, where X is the argument type of F.
• f(x) calls f with one argument and returns an object of the result type.

In this example we also require that the argument type and the result type of F are identical. We can remove this restriction if we establish a unique way to deduce the return type. This can be achieved by meta-programming or with the type deduction in the next C++ standard.
So far so good. We complained before that we cannot apply the finite difference to itself in order to compute higher-order derivatives. Actually, we still cannot: finite_difference expects (amongst others) a unary functor but is itself a ternary function, so it cannot be passed as its own argument. The solution is to realize its functionality in a unary functor that we call derivative:
    template <typename F, typename T>
    class derivative
    {
      public:
        derivative(const F& f, const T& h) : f(f), h(h) {}

        T operator()(const T& x) const
        {
            return ( f(x+h) - f(x) ) / h;
        }
      private:
        const F& f;
        T        h;
    };
Now we can create an object that approximates the derivative of f(x) = sin(1 · x) + cos x:

    typedef derivative<para_sin_plus_cos, double> spc_der_1;
    spc_der_1 spc(sin_1, 0.001);

The object spc can be used like a function, and it approximates f'(x). In addition, it is a unary functor. That means we can compute its derivative:

    typedef derivative<spc_der_1, double> spc_der_2;
    spc_der_2 spc_scd(spc, 0.001);
    std::cout << "Second derivative of sin(0) + cos(0) is " << spc_scd(0.0) << '\n';

The object spc_scd is again a unary functor and approximates f''(x). We could again construct a functor for its derivative and continue this game eternally.
Assume that we need second derivatives of different functions. Then it becomes annoying to first define the type of the first derivative, then construct a functor from it, in order to finally create a functor for the second one. According to Greg Wilson's [?]23 maxim "Whatever you use twice, automate!", we write a class that provides us the second derivative directly:

23 This online course contains a gigantic collection of tips on how to develop software successfully and avoid frustrating unproductivity. We highly recommend reading this material.
    template <typename F, typename T>
    class second_derivative
    {
      public:
        second_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

        T operator()(const T& x) const
        {
            return ( fp(x+h) - fp(x) ) / h;
        }
      private:
        T                h;
        derivative<F, T> fp;
    };

Now we can build the f'' functor from f:

    second_derivative<para_sin_plus_cos, double> spc_scd2(para_sin_plus_cos(1.0), 0.001);
When we think about how we would implement the third, fourth, or in general the n-th derivative, we realize that they would look much like the second one: calling the (n-1)-th derivative on x+h and x. We can exploit this with a recursive implementation:

    template <typename F, typename T, int N>
    class nth_derivative
    {
        typedef nth_derivative<F, T, N-1> prec_derivative;
      public:
        nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

        T operator()(const T& x) const
        {
            return ( fp(x+h) - fp(x) ) / h;
        }
      private:
        T               h;
        prec_derivative fp;
    };
To save the compiler from infinite recursion, we must stop this self-referencing when we reach the first derivative. Note that we cannot use 'if' or '?:' to stop the recursion, because both of their respective branches are instantiated and one of them still contains the infinite recursion. Recursive template definitions are terminated with a specialization like this:

    template <typename F, typename T>
    class nth_derivative<F, T, 1>
    {
      public:
        nth_derivative(const F& f, const T& h) : f(f), h(h) {}

        T operator()(const T& x) const
        {
            return ( f(x+h) - f(x) ) / h;
        }
      private:
        const F& f;
        T        h;
    };
This specialization is identical to the class derivative, which we could now throw away. If we keep it, we can at least reuse its functionality and member variables to reduce redundancy. This is achieved by inheritance (more in Chapter 6).

    template <typename F, typename T>
    class nth_derivative<F, T, 1>
      : public derivative<F, T>
    {
      public:
        nth_derivative(const F& f, const T& h) : derivative<F, T>(f, h) {}
    };

With our recursive definition we can easily define the twenty-second derivative:

    nth_derivative<para_sin_plus_cos, double, 22> spc_22(para_sin_plus_cos(1.0), 0.00001);
The new object spc_22 is again a unary functor. Unfortunately, it approximates so badly that we are too ashamed to present the results here. From the Taylor series we know that the error of the f'' approximation is reduced from O(h) to O(h²) when a backward difference is applied to the forward difference. This said, maybe we can improve our approximation if we alternate between forward and backward differences:

    template <typename F, typename T, int N>
    class nth_derivative
    {
        typedef nth_derivative<F, T, N-1> prec_derivative;
      public:
        nth_derivative(const F& f, const T& h) : h(h), fp(f, h) {}

        T operator()(const T& x) const
        {
            return N & 1 ? ( fp(x+h) - fp(x) ) / h
                         : ( fp(x) - fp(x-h) ) / h;
        }
      private:
        T               h;
        prec_derivative fp;
    };

Sadly, our 22nd derivative is still as wrong as before; in fact, slightly worse. This is particularly frustrating when we become aware that we evaluate f over four million times.24 Decreasing h does not help either: the difference quotient approaches the derivative better, but on the other hand the values of f(x) and f(x ± h) become so close that their difference retains only a few meaningful bits. At least the second derivative is improved by our alternating difference scheme, as the Taylor series teaches us. Another consoling fact is that we probably did not pay for the alternation: the template argument N is known at compile time, and the condition N & 1, whether the last bit is set, can also be evaluated during compilation. When N is odd, the operator effectively reduces to:

24 TODO: Is there an efficient and well-approximating recursive scheme to compute higher-order derivatives?
    T operator()(const T& x) const
    {
        return ( fp(x+h) - fp(x) ) / h;
    }

Likewise for even N, only the backward difference is computed, without testing.

If nothing else, we learned something about C++ and we are confirmed in the

Truism
Not even the coolest programming can substitute for solid mathematics.
In the end, this script is primarily about programming. To improve the expressiveness of our software, functors are an extremely powerful approach. We have seen how to take an arbitrary unary function and construct a unary functor that approximates its derivative or a higher-order derivative.

If we do not know the type of a function, or we do not want to bother with it, we can write a convenience function that deduces the type automatically:

    template <int N, typename F, typename T>
    nth_derivative<F, T, N>
    inline make_nth_derivative(const F& f, const T& h)
    {
        return nth_derivative<F, T, N>(f, h);
    }
Here F and T are the types of the function arguments and can be deduced by the compiler. The only template argument that the compiler cannot deduce is N. Note that such arguments must be at the beginning of the template argument list and the compiler-deduced ones at the end. Therefore the following template function is wrong:

    template <typename F, typename T, int N> // error
    nth_derivative<F, T, N>
    inline make_nth_derivative(const F& f, const T& h)
    {
        return nth_derivative<F, T, N>(f, h);
    }

If you call this one, the compiler will complain that it cannot deduce N. This leads us to the question of how we call this function. Of course, we can explicitly declare all template arguments:

    make_nth_derivative<7, para_sin_plus_cos, double>(sin_1, 0.00001);

But this is exactly what we wanted to avoid by implementing this function. As said, F and T can be deduced by the compiler and we only need to provide N:

    make_nth_derivative<7>(sin_1, 0.00001);
What is this expression good <strong>for</strong>? Written like this, not much. It creates a function that will<br />
be immediately destroyed. If it is a function we should be able to call it with an argument:
4.8. FUNCTORS 119<br />
std::cout ≪ ”Seventh derivative of sin 1 at x=3 is ”<br />
≪ make nth derivative(sin 1, 0.00001)(3.0) ≪ ’\n’;<br />
In the cases above, the type of the functor was obvious because we wrote the class ourselves. The type is less obvious when it is constructed from an expression, for instance by a λ-function. Support for λ-functions will be introduced with C++0x.25 An emulation has been available for some years with Boost.Lambda [?]. For instance, we can generate a functor object that computes

p(x) = 3.5x^3 + 4x^2 = (3.5x + 4)x^2

with the following short expression:

(3.5 ∗ _1 + 4.0) ∗ _1 ∗ _1;
This expression can be used with our derivative function:

make_nth_derivative<2>((3.5 ∗ _1 + 4.0) ∗ _1 ∗ _1, 0.0001)

to generate a functor computing (approximating) 21x + 8. With lambda expressions, we do not even know the type of our functor, yet we can compute its derivative. The type is in fact so long26 that it would be much easier to implement our own functor if we were obliged to spell the type out.
The following listing illustrates how to approximate p′′(2):

#include <boost/lambda/lambda.hpp>
// .. our definitions of derivatives

int main()
{
    using boost::lambda::_1;
    std::cout ≪ ”Second derivative of 3.5∗xˆ3+4∗xˆ2 at x=2 is ”
              ≪ make_nth_derivative<2>((3.5 ∗ _1 + 4.0) ∗ _1 ∗ _1, 0.0001)(2) ≪ ’\n’;
    return 0;
}
Unfortunately, with the current standard C++ we cannot keep the results of our computations if we do not know their types. In C++0x, we will be able to let the compiler deduce the type:

auto p= (3.5 ∗ _1 + 4.0) ∗ _1 ∗ _1; // With C++0x
auto p2= make_nth_derivative<2>(p, 0.0001);

Once defined, we can reuse p and p2 as often as we want. Of course, calculating the derivatives of polynomials can be done better than with difference quotients. We will discuss this in Section 8.2.
25 TODO: Try in g++ 4.3 and 4.4?<br />
26 boost::lambda::lambda_functor<...>
4.8.3 The function accumulate with a functor argument<br />
TODO: Again, I don’t like the use of pointers here — Peter<br />
Recall the function accumulate from Section 4.2.1 that we used to introduce generic programming. In this section, we generalize this function. We introduce a binary functor (concept BinaryFunctor) that implements an operation on two arguments as a function or callable class object.27 Then we can accumulate values with respect to this binary operation:

template <typename T, typename BinaryFunctor>
T accumulate( T∗ a, T∗ a_end, T init, BinaryFunctor op ) {
    T sum( init ) ;
    for ( ; a != a_end; ++a ) {
        sum = op( sum, ∗a ) ;
    }
    return sum ;
}
The concept BinaryFunctor is defined as follows: 28<br />
• Let op be of type BinaryFunctor.<br />
– op( first_argument_type, second_argument_type ) is callable, with a result type convertible to T; T should be convertible to the first and second argument types.

From this generic example, it is quite clear that the conceptual conditions become complicated when we mix types. Usually, we make sure that the first argument type, the second argument type and the result type are the same, but strictly speaking, this is not required, since the compiler is allowed to perform conversions.
The main program could be as follows:

struct sum_functor
{
    double operator()( double a, double b ) const {
        return a + b ;
    }
} ;

struct product_functor
{
    double operator()( double a, double b ) const {
        return a ∗ b ;
    }
} ;

int main()
{
    const int n= 10;
    double a[n] ; // ... fill a ...
    double s = accumulate( a, a+n, 0.0, sum_functor() ) ;
    s = accumulate( a, a+n, 1.0, product_functor() ) ;
}
27 TODO: Introduce term.<br />
28 TODO: revisit
4.9 STL — The Mother of All Generic Libraries<br />
The Standard Template Library — STL — is an example of a generic C++ library. It defines generic container classes, generic algorithms, and iterators. Online documentation is provided at www.sgi.com/tech/stl. There are also entire books about the usage of the STL, so we keep it short here and refer to those books [?].

4.9.1 Introductory Example

Containers are classes whose purpose is to contain other objects. The classes vector and list are examples of STL container classes. Each of these classes is templated and can be instantiated to contain any type of object (that is a model of the appropriate concept). For example, the following lines create a vector containing doubles and another one containing integers:

std::vector<double> vec_d ;
std::vector<int> vec_i ;
The STL also includes a large collection of algorithms that manipulate the data stored in<br />
containers. The accumulate algorithm, <strong>for</strong> example, can be used to compute any reduction —<br />
such as sum, product, or minimum — on a list or vector in the following way:<br />
std::vector<double> vec ; // fill the vector...
std::list<double> lst ; // fill the list...
double vec_sum = std::accumulate( vec.begin(), vec.end(), 0.0 ) ;
double lst_sum = std::accumulate( lst.begin(), lst.end(), 0.0 ) ;
Notice the use of the functions begin() and end(), which return ‘iterators’ denoting the beginning and the end of the vector and the list. Iterators are the central concept of the STL, and we will now have a closer look at them.
4.9.2 Iterators<br />
Irreverently speaking, an iterator is a generalized pointer: one can dereference it and change the referred location. This over-simplified view, however, does not do justice to its importance. Iterators are a Fundamental Methodology to Decouple the Implementation of Data Structures and Algorithms. Figure 4.2 29 depicts this central role of iterators. Every data structure provides an iterator for traversing it, and all algorithms are implemented in terms of iterators. To program m algorithms on n data structures, classical C and Fortran programming needs

m · n implementations.

Expressing algorithms in terms of iterators decreases this to only

m + n implementations!
29 TODO: Flatter boxes and more containers and algos, maybe.
[Figure 4.2: Central role of iterators in STL — data structures (vector, set, map, queue, …) and algorithms (copy, search, replace, sort, …) are connected only through iterators.]
Evidently, not all algorithms can be implemented on every data structure. Which algorithm<br />
works on a given data structure depends on the kind of iterator provided by the container.<br />
Iterators can be distinguished by the <strong>for</strong>m of access:<br />
InputIterator: an iterator concept <strong>for</strong> reading the referred entries.<br />
OutputIterator: an iterator concept <strong>for</strong> writing to the referred entries.<br />
Note that the ability to write does not imply readability; e.g., an ostream_iterator is an STL interface used to write to output streams like files opened in write mode. Another differentiation of iterators is the form of traversal:
ForwardIterator: a concept for iterators that can pass from one element to the next, i.e. types that provide an operator++. It is a refinement of InputIterator and OutputIterator. In contrast to those, a ForwardIterator allows for traversing multiple times.
BidirectionalIterator: a concept <strong>for</strong> iterators with step-wise <strong>for</strong>ward and backward traversal,<br />
i.e. types with operator++ and operator−−. It refines ForwardIterator.<br />
RandomAccessIterator: a concept <strong>for</strong> iterators that can increment their position by an arbitrary<br />
integer, i.e. types that also provide operator[]. It refines BidirectionalIterator.<br />
Data structures that provide more refined iterators (e.g. modeling RandomAccessIterator) can be<br />
used in more algorithms. Dually, algorithm implementations that require less refined iterators<br />
(like InputIterator) can be applied to more data structures. The interfaces are designed with<br />
backward compatibility in mind and old-style pointers can be used as iterators.<br />
All standard container templates provide a rich and consistent set of iterator types. The<br />
following very simple example shows a typical use of iterators:<br />
std::list<int> l ;
for (std::list<int>::const_iterator it = l.begin(); it != l.end(); ++it) {
    std::cout ≪ ∗it ≪ std::endl;
}
As illustrated above, iterators are usually used in pairs: one performs the actual iteration and the second marks the end of the collection. The iterators are created by the corresponding container class using the standard methods begin() and end(). The iterator returned by begin() points to the first element, while the iterator returned by end() points past the end of the elements to mark the end. All algorithms are implemented with right-open intervals [b, e), operating on the value referred to by b until b = e. Therefore intervals of the form [x, x) are regarded as empty.
A more general (and more useful) algorithm is the linear search on an arbitrary sequence. This<br />
is provided by the STL function find in the following fashion:<br />
template <typename InputIterator, typename T>
InputIterator find(InputIterator first, InputIterator last, const T& value) {
    while (first != last && ∗first != value)
        ++first;
    return first;
}
find takes three arguments: two iterators that define the right-open interval of the search space, and a value to search for in that range. Each entry referred to by ‘first’ is compared with ‘value’. When a match is found, the iterator pointing to it is returned. If the value is not contained in the sequence, an iterator equal to ‘last’ is returned. Thus, the caller can test whether the search was successful by comparing the result with ‘last’. In fact, one must perform this test, because after a failed search the returned iterator cannot be dereferenced correctly (it points outside the given range and might cause segmentation violations or corrupt data).
This section only scratched the surface of STL and was primarily intended to introduce the<br />
iterator concept that we will generalize in the following section.<br />
4.10 Cursors and Property Maps<br />
The essential idea of iterators is to represent a position and a referred value. A further generalization of this idea is to decouple the notions of position and value. Dietmar Kühl proposed this mechanism in his master thesis (Diplomarbeit) [?] for the generic treatment of graphs. The Boost Graph Library [?] provides the notion of property maps in the form that properties are available for vertices and edges, and all properties can be accessed independently from each other and from the traversal of the graph.
As a case study, we implement a simple sparse matrix class with cursors and property maps. The minimalistic implementation of the sparse matrix is:
#include <iostream>
#include <vector>
#include <cassert>
#include <algorithm>

template <typename Value>
class coo_matrix
{
    typedef Value value_type; // better in trait
  public:
    coo_matrix(int nr, int nc) : nr(nr), nc(nc) {}

    void insert(int r, int c, Value v)
    {
        assert(r < nr && c < nc);
        row_index.push_back(r);
        col_index.push_back(c);
        data.push_back(v);
    }

    void sort() {}

    int nnz() const { return row_index.size(); }
    int num_rows() const { return nr; }
    int num_cols() const { return nc; }

    int begin_row(int r) const
    {
        unsigned i= 0;
        while (i < row_index.size() && row_index[i] < r) ++i;
        return i;
    }

    template <typename Matrix> friend struct coo_col;
    template <typename Matrix> friend struct coo_row;
    template <typename Matrix> friend struct coo_const_value;
    template <typename Matrix> friend struct coo_value;

  private:
    int nr, nc;
    std::vector<int> row_index, col_index;
    std::vector<Value> data;
};
The matrix entries are supposed to be sorted lexicographically (although we omitted the implementation of the sort function for the sake of brevity). For any offset i, the i-th entries of the vectors row_index, col_index and data represent the row, column and value of one non-zero entry of the matrix. The traversal over all non-zeros of the matrix can be realized with a cursor that contains just this offset.
struct nz_cursor
{
    typedef int key_type;

    nz_cursor(int offset) : offset(offset) {}

    nz_cursor& operator++() { offset++; return ∗this; }
    nz_cursor operator++(int) { nz_cursor tmp(∗this); offset++; return tmp; }
    key_type operator∗() const { return offset; }
    bool operator!=(const nz_cursor& other) { return offset != other.offset; }

  protected:
    int offset;
};
The cursor is initialized with an offset. Many cursor classes keep a reference to the traversed matrix object, but we do not need this here. The cursor can be incremented, compared, and dereferenced. The result of the dereferencing is a ‘key’. For simplicity we used an int as key type.
Like the begin and end functions of the STL we define:

template <typename Matrix>
nz_cursor nz_begin(const Matrix& A)
{
    return nz_cursor(0);
}

template <typename Matrix>
nz_cursor nz_end(const Matrix& A)
{
    return nz_cursor(A.nnz());
}

i.e. the function nz_begin, which returns a cursor on the first non-zero entry, and nz_end, which gives a past-the-end cursor to terminate the traversal.
A key can be used as argument for a property map, which we define now:

template <typename Matrix>
struct coo_col
{
    typedef int key_type;

    coo_col(const Matrix& ref) : ref(ref) {}

    int operator()(key_type k) const { return ref.col_index[k]; }
  private:
    const Matrix& ref;
};
Property maps typically hold a reference to the matrix in order to read internal data from it. They are often declared as friends because they are an important tool for accessing the object’s internal data — it might even be the only way to access the data, as in the Boost Graph Library. The property maps that read the row index or the value for an offset key are analogous and therefore omitted here.
A property map for mutable entries is implemented as follows:

template <typename Matrix>
struct coo_value
{
    typedef int key_type;
    typedef typename Matrix::value_type value_type;

    coo_value(Matrix& ref) : ref(ref) {}

    value_type operator()(key_type k) const { return ref.data[k]; }
    void operator()(key_type k, const value_type& v) { ref.data[k]= v; }
  private:
    Matrix& ref;
};
In contrast to the previous maps, it contains a mutable reference and an additional operator for setting a value.
To test our implementation we create a matrix A:

coo_matrix<double> A(3, 5);
A.insert(0, 0, 2.3);
A.insert(0, 3, 3.4);
A.insert(1, 2, 4.5);

and define the three property maps:

coo_col<coo_matrix<double> > col(A);
coo_row<coo_matrix<double> > row(A);
coo_value<coo_matrix<double> > value(A);
A read-only traversal of all non-zero entries reads:

for (nz_cursor c= nz_begin(A), end= nz_end(A); c != end; ++c)
    std::cout ≪ ”A[” ≪ row(∗c) ≪ ”][” ≪ col(∗c) ≪ ”] = ” ≪ value(∗c) ≪ ”\n”;

Scaling all non-zero elements can be achieved similarly:

for (nz_cursor c= nz_begin(A), end= nz_end(A); c != end; ++c)
    value(∗c, 2.0 ∗ value(∗c));
Note that we did not use all property maps in the last algorithm. In fact, this is one of the motivations for property maps: only data really needed in the algorithm must be provided. In today’s computer landscape, this can make a significant difference in performance, since reading and writing data is much more time-consuming than most numeric computations — all the more if data is only available implicitly and needs recomputation.
Another advantage of this approach is the easier realization of nested traversals. Say we have an algorithm that iterates over rows and, within each row, over the non-zero entries. In this case, we need other cursor type(s) but can reuse the property maps — provided our new cursors dereference to the same key type. First we need a cursor to iterate over all rows of a matrix:
template <typename Matrix>
struct row_cursor
{
    row_cursor(int r, const Matrix& ref) : r(r), ref(ref) {}

    row_cursor& operator++() { r++; return ∗this; }
    row_cursor operator++(int) { row_cursor tmp(∗this); r++; return tmp; }
    bool operator!=(const row_cursor& other) { return r != other.r; }

    nz_cursor begin() const { return nz_cursor(ref.begin_row(r)); }
    nz_cursor end() const { return nz_cursor(ref.begin_row(r+1)); }

  protected:
    int r;
    const Matrix& ref;
};
Its implementation is almost the same as nz_cursor, and with some refactoring one could certainly factor out a common base class that serves both cursors. For the sake of simplicity we refrain from this here. The two main differences to nz_cursor are:
• The lack of operator∗, because the cursor is not intended to be dereferenced; and
• The functions begin and end that provide the inner-loop traversal.
The corresponding functions that provide a right-open interval of row cursors are straightforward:
template <typename Matrix>
row_cursor<Matrix> row_begin(const Matrix& A)
{
    return row_cursor<Matrix>(0, A);
}

template <typename Matrix>
row_cursor<Matrix> row_end(const Matrix& A)
{
    return row_cursor<Matrix>(A.num_rows(), A);
}
We can now write begin and end functions that take a row cursor (instead of a matrix) as argument and give the right-open interval of the row’s non-zeros:

template <typename Matrix>
nz_cursor nz_begin(const row_cursor<Matrix>& c)
{
    return c.begin();
}

template <typename Matrix>
nz_cursor nz_end(const row_cursor<Matrix>& c)
{
    return c.end();
}
For the inner loop we can reuse nz_cursor and only need to determine the right intervals within each row. This is performed with the begin and end functions of row_cursor, which in turn use begin_row from the matrix. That is why the row cursor needs a matrix reference.
A two-dimensional traversal is realized as follows:

for (row_cursor<coo_matrix<double> > c= row_begin(A), end= row_end(A); c != end; ++c) {
    std::cout ≪ ”−−−−−\n”;
    for (nz_cursor ic= nz_begin(c), iend= nz_end(c); ic != iend; ++ic)
        std::cout ≪ ”A[” ≪ row(∗ic) ≪ ”][” ≪ col(∗ic) ≪ ”] = ” ≪ value(∗ic) ≪ ”\n”;
}
std::cout ≪ ”−−−−−\n”;

The outer loop iterates over all rows of the matrix and the inner loop over all non-zeros in the current row.
Résumé The technique is more complicated and less readable than accessing entries with<br />
operator[] and needs some familiarization. However, it allows <strong>for</strong>
• High Code Reuse with very Diverse Data Structures;<br />
• While still enabling High Per<strong>for</strong>mance.<br />
4.11 Exercises<br />
TODO: Move exercises to next chapter<br />
4.11.1 Unroll a loop<br />
Look at the loop from Subsection ??:<br />
int sum = 0;
for (int i = 1; i <= n; ++i)
    sum += i;
function gcd(a, b):<br />
if b = 0 return a<br />
else return gcd(b, a mod b)<br />
Then write an integral metafunction that executes the same algorithm, but at compile time. Your metafunction should be of the following form:

template <int a, int b>
struct gcd_meta {
    static int const value = ... ;
} ;

i.e. gcd_meta<a, b>::value is the GCD of a and b. Verify that the results correspond with those of your C++ function gcd().
4.11.6 Overloading of functions<br />
Overloading of functions is possible <strong>for</strong> different types, e.g.<br />
void foo( int i ) { ... }<br />
void foo( double d ) { ... }<br />
This is an exercise on another form of overloading: based on a boolean meta-expression. We will use the Boost functions enable_if and disable_if for this exercise.

#include <boost/utility/enable_if.hpp>
#include <boost/type_traits.hpp>
#include <cmath>

template <typename T>
typename boost::enable_if< boost::is_integral<T>, T >::type foo( T const& v ) {
    return v ;
}

template <typename T>
typename boost::disable_if< boost::is_integral<T>, T >::type foo( T const& v ) {
    return std::floor( v ) ;
}
If we call e.g. foo(5);, the compiler uses the special version for integral types:

template <typename T>
T foo( T const& v ) {
    return v ;
}

If we call e.g. foo(5.0);, the compiler uses the special version for types that are not integral:

template <typename T>
T foo( T const& v ) {
    return std::floor( v ) ;
}
Create a meta-function to check whether a type is a pointer. Write a function evaluate that returns the same value as its argument, except when the argument is a pointer, in which case it returns the value pointed to. Hint: look at http://www.boost.org/libs/utility/enable_if.html for enable_if_c.
4.11.7 Meta-list<br />
Revisit exercise ??.<br />
Make a list of types. Make meta functions insert, append, delete and size.<br />
4.11.8 Iterator of a vector<br />
Revisit exercise ??. Add methods begin() and end() that return a begin and an end iterator. Add the types iterator and const_iterator to the class. Note that pointers are iterators.
Use the STL functions sort and lower_bound.
4.11.9 Iterator of a list<br />
Revisit exercise ??.<br />
Make a generic list type.<br />
Add methods begin() and end() that return a begin and an end const iterator. Add the type const_iterator to the class. Note that here pointers cannot be used as iterators.
4.11.10 Trapezoid rule<br />
A simple method for computing the integral of a function is the trapezoid rule. Suppose we want to integrate the function f over the interval [a, b]. We split the interval into n small intervals [x_i, x_{i+1}] of the same length h = (b − a)/n and approximate f by a piecewise linear function. The integral is then approximated by the sum of the integrals of that piecewise linear function. This gives us the formula:

I = h/2 · f(a) + h/2 · f(b) + h · Σ_{j=1}^{n−1} f(a + jh)    (4.1)
In this exercise, we develop a function for the trapezoid rule with a functor argument. We develop the software using inheritance and using generic programming. Then we use the function for integrating the following functions:

• f(x) = exp(−3x) for x ∈ [0, 4]. Try the following arguments of trapezoid:

double exp3( double x ) {
    return std::exp( −3.0 ∗ x ) ;
}

struct exp3 {
    double operator()( double x ) const {
        return std::exp( −3.0 ∗ x ) ;
    }
} ;
• f(x) = sin(x) if x < 1 and f(x) = cos(x) if x ≥ 1, for x ∈ [0, 4].
• Can we use trapezoid( std::sin, 0.0, 2.0 ); ?
As a second exercise, develop a functor <strong>for</strong> computing the finite difference. Then integrate the<br />
finite difference to verify that you get the function value back.<br />
4.11.11 STL and functor<br />
Write a generic function that copies the values of a container to another container after transforming them with a functor:

struct double_functor {
    int operator()( int v ) const {
        return 2 ∗ v ;
    }
} ;

std::vector< int > my_input_vec ; // ... fill ...
std::vector< int > my_output_vec ;
transform( my_input_vec.begin(), my_input_vec.end(), my_output_vec.begin(), double_functor() ) ;

Write code for the function transform and test it.
Chapter 5

Meta-programming

‘Meta-programming’ was actually discovered by accident. In the early 90s, Erwin Unruh wrote a program that printed prime numbers as error messages, which showed that C++ compilers can compute. Because the language has changed since Unruh wrote the example, here is a version adapted to today’s standard C++:
// Prime number computation by Erwin Unruh
template <int i> struct D { D(void∗); operator int(); };

template <int p, int i> struct is_prime {
    enum { prim = (p==2) || (p%i) && is_prime<(i>2?p:0), i−1>::prim };
};

template <int i> struct Prime_print {
    Prime_print<i−1> a;
    enum { prim = is_prime<i, i−1>::prim };
    void f() { D<i> d = prim ? 1 : 0; a.f(); }
};

template<> struct is_prime<0, 0> { enum {prim=1}; };
template<> struct is_prime<0, 1> { enum {prim=1}; };

template<> struct Prime_print<1> {
    enum {prim=0};
    void f() { D<1> d = prim ? 1 : 0; };
};

int main() {
    Prime_print<18> a;
    a.f();
}
When one tries to compile this with g++ 4.1.2, one observes the following error messages: TODO: Need English error message.
TODO: Ask Erwin Unruh if we can use his example.
After people realized the computational power of the C++ compiler, it was used to realize very powerful performance optimization techniques. In fact, one can perform entire applications at compile time. Jeremiah Willcock once wrote a Lisp interpreter that evaluated Lisp expressions during a C++ compilation [?]. Todd Veldhuizen showed that the template type system of C++ is Turing-complete [?].
On the other hand, excessive usage of meta-programming techniques can end in quite long compile times. Entire research projects were cancelled after many millions of dollars of funding because even short applications of less than 20 lines took weeks to compile on parallel computers. We know people who managed to produce an 18 MB error message (which came mainly from one single error). Nevertheless, the authors used a fair amount of meta-programming in their scientific projects and could still avoid excessive compile times.1 Compilers have also improved significantly in the last decade: while compile time grew quadratically with the template instantiation depth in old compilers, today it grows only linearly [?].
5.1 Let the Compiler Compute<br />
Typical introductory examples for meta-programming are the factorial and the Fibonacci numbers. The latter are computed recursively:
template <long N>
struct fibonacci
{
    static const long value= fibonacci<N−1>::value + fibonacci<N−2>::value;
};

template <>
struct fibonacci<1>
{
    static const long value= 1;
};

template <>
struct fibonacci<2>
{
    static const long value= 1;
};
Note that we need the specializations for 1 and 2 to terminate the recursion. The following definition:

template <long N>
struct fibonacci
{
    static const long value= N < 3 ? 1 : fibonacci<N−1>::value + fibonacci<N−2>::value; // error
};

ends in an infinite compile loop. For N = 2, the compiler would evaluate the expression:

struct fibonacci<2>
{
    static const long value= 2 < 3 ? 1 : fibonacci<1>::value + fibonacci<0>::value; // error
};
1 TODO: Oder René?
This requires the evaluation of fibonacci<0>::value as

struct fibonacci<0>
{
    static const long value= 0 < 3 ? 1 : fibonacci<−1>::value + fibonacci<−2>::value; // error
};

which needs fibonacci<−1>::value . . . . Although the values for N < 3 are not used in the end, the compiler will nevertheless generate these terms infinitely and die at some point.
We said before that we implement the computation recursively. In fact, all repetitive calculations must be realized recursively, as there is no iteration for meta-functions.2
If we write for instance

std::cout ≪ fibonacci<45>::value ≪ ”\n”;

the value is already calculated during the compilation, and the program just prints it. If you do not believe us, you can read the assembler code (e.g. compile with ‘g++ -S fibonacci.cpp -o fibonacci.asm’).
We mentioned long compilations with meta-programming at the beginning of the chapter. The compilation for Fibonacci number 45 took less than a second. Compared to this, a naïve run-time implementation:

long fibonacci2(long x)
{
    return x < 3 ? 1 : fibonacci2(x−1) + fibonacci2(x−2);
}

took 14s on the same computer. The reason is that the compiler remembers intermediate results while the run-time version recomputes everything. We are, however, convinced that every reader of this book can rewrite fibonacci2 without the exponential overhead of recomputations.
5.2 Providing Type In<strong>for</strong>mation<br />
5.2.1 Type Traits<br />
When we write template functions, we can easily define temporary values because they usually have the same type as one of the template arguments. But not always. Imagine a function that returns, of two values, the one with the minimal magnitude:

template <typename T>
T inline min_magnitude(const T& x, const T& y)
{
    using std::abs;
    T ax= abs(x), ay= abs(y);
    return ax < ay ? x : y;
}
We can call this function for int, unsigned or double values:

double d1= 3., d2= 4.;
std::cout ≪ ”min_magnitude(d1, d2) = ” ≪ min_magnitude(d1, d2) ≪ ’\n’;

If we call it with two complex values:

std::complex<double> c1(3.), c2(4.);
std::cout ≪ ”min_magnitude(c1, c2) = ” ≪ min_magnitude(c1, c2) ≪ ’\n’;

we will see an error message like:

no match for ≫operator<≪ in ≫ax < ay≪

The problem is that abs here returns double values, which provide the comparison operator, but we store them in temporaries of the complex type, which does not.

2 The Meta-Programming Library provides compile-time iterators, but even those are recursive internally.
The careful reader might wonder why we store them at all: if we compared the magnitudes directly, we would save memory and could compare them as they are. This is absolutely true, and it is how we would normally implement the function. However, there are situations where one needs a temporary, e.g., when computing the value with the minimal magnitude in a vector. For the sake of simplicity we just look at two values. With the new standard we could also handle the issue easily with auto:
template <typename T>
T inline min_magnitude(const T& x, const T& y)
{
    using std::abs;
    auto ax= abs(x), ay= abs(y);
    return ax < ay ? x : y;
}
To make a long story short: sometimes we need to know explicitly the result type of an expression, or some type information in general. Just think of a member variable of a template class — we must know the type of the member in the definition of the class. This leads us to ‘type traits’. Type traits are meta-functions that provide information about a type. In the example here, we search for an appropriate type to represent the magnitude of a given type. We can provide such type information by template specialization:
template <typename T><br />
struct Magnitude {};<br />
<br />
template <><br />
struct Magnitude<int><br />
{<br />
    typedef int type;<br />
};<br />
<br />
template <><br />
struct Magnitude<float><br />
{<br />
    typedef float type;<br />
};<br />
<br />
template <><br />
struct Magnitude<double><br />
{<br />
    typedef double type;<br />
};<br />
<br />
template <><br />
struct Magnitude<std::complex<float> ><br />
{<br />
    typedef float type;<br />
};<br />
<br />
template <><br />
struct Magnitude<std::complex<double> ><br />
{<br />
    typedef double type;<br />
};<br />
Admittedly, this is rather cumbersome.<br />
We can abbreviate the first definitions by postulating “if we do not know better, we assume<br />
that T’s Magnitude type is T itself.”<br />
template <typename T><br />
struct Magnitude<br />
{<br />
    typedef T type;<br />
};<br />
This is true for all intrinsic types, and we handle them all correctly with this one definition. A slight<br />
disadvantage of this definition is that it incorrectly applies to all types whose type trait is not<br />
specialized. A set of classes for which we know that the above definition is not correct are the<br />
instantiations of the class template complex. So we define specializations like:<br />
template <><br />
struct Magnitude<std::complex<double> ><br />
{<br />
    typedef double type;<br />
};<br />
Instead of defining them individually for complex<float>, complex<double>, . . . we use a templated<br />
form to treat them all:<br />
template <typename T><br />
struct Magnitude<std::complex<T> ><br />
{<br />
    typedef T type;<br />
};<br />
Now that the type traits are defined, we can refactor our function to use them:<br />
template <typename T><br />
T inline min_magnitude(const T& x, const T& y)<br />
{<br />
    using std::abs;<br />
    typename Magnitude<T>::type ax= abs(x), ay= abs(y);<br />
    return ax < ay ? x : y;<br />
}<br />
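Putting the pieces together, a compilable sketch of the trait and the refactored function might read as follows (only the primary template and the complex specialization are shown; the names follow the text):

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Primary template: assume T's magnitude type is T itself.
template <typename T>
struct Magnitude
{
    typedef T type;
};

// Instantiations of std::complex have a real-valued magnitude type.
template <typename T>
struct Magnitude<std::complex<T> >
{
    typedef T type;
};

template <typename T>
T min_magnitude(const T& x, const T& y)
{
    using std::abs;
    // The temporaries now have the magnitude type (double), not complex.
    typename Magnitude<T>::type ax = abs(x), ay = abs(y);
    return ax < ay ? x : y;
}
```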
We can now consider extending this definition to vectors and matrices, e.g., to determine the<br />
return type of a norm. The specialization reads:<br />
template <typename T><br />
struct Magnitude<vector<T> ><br />
{<br />
    typedef T type; // not really perfect<br />
};<br />
However, if the value type of the vector is complex, its norm will not be complex. Instead, we need the<br />
magnitude type of the values:<br />
template <typename T><br />
struct Magnitude<vector<T> ><br />
{<br />
    typedef typename Magnitude<T>::type type;<br />
};<br />
5.2.2 A const-clean View Example<br />
In this section, we look at an efficient and expressive implementation of a transposed matrix. If<br />
you compute the transpose of a matrix, many software packages return a new matrix object<br />
with the interchanged values. This is quite an expensive operation: it requires memory allocation<br />
and deallocation, and often copying a lot of data.<br />
Writing a Simple View Class<br />
A much more efficient approach is implementing a 'view' on the existing object. We refer<br />
internally to the viewed object and just adapt its interface. This can be done very nicely for<br />
the transpose of a matrix:<br />
1 template <typename Matrix><br />
2 class transposed_view<br />
3 {<br />
4   public:<br />
5     typedef typename mtl::Collection<Matrix>::value_type value_type;<br />
6     typedef typename mtl::Collection<Matrix>::size_type  size_type;<br />
7<br />
8     transposed_view(Matrix& A) : ref(A) {}<br />
9     value_type& operator()(size_type r, size_type c) { return ref(c, r); }<br />
10     const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }<br />
11<br />
12   private:<br />
13     Matrix& ref;<br />
14 };<br />
Listing 5.1: Simple view implementation<br />
We assume that the matrix class has an operator() taking two arguments, for the row and column<br />
index respectively. We further suppose that type traits are defined for value_type and size_type.<br />
This is all we need to know about the referred matrix, 3 at least in this mini-example.<br />
3 TODO: We should define a concept <strong>for</strong> it.
The reader will imagine that implementations in libraries like MTL or GLAS provide<br />
a larger interface in such classes; this short example is expressive enough to demonstrate the<br />
approach. At the same time, the example is large enough to demonstrate the need for meta-programming<br />
in certain views.<br />
An object of this class can be handled like a matrix, so that a template function can use it as<br />
argument wherever a matrix is expected. The transposition is achieved by calling operator() of<br />
the referred object with switched indices. For every matrix object we can define a transposed<br />
view that behaves like a matrix:<br />
mtl::dense2D<float> A(3, 3);<br />
A= 2, 3, 4,<br />
   5, 6, 7,<br />
   8, 9, 10;<br />
tst::transposed_view<mtl::dense2D<float> > At(A);<br />
When we access At(i, j) we get A(j, i). We even defined non-const access so that we can<br />
change entries:<br />
At(2, 0)= 4.5;<br />
This operation sets A(0, 2) to 4.5.<br />
The definition of a transposed view object does not lead to particularly concise programs. For<br />
convenience we define a function that returns the transposed view:<br />
template <typename Matrix><br />
transposed_view<Matrix> inline trans(Matrix& A)<br />
{<br />
    return transposed_view<Matrix>(A);<br />
}<br />
Now we can use the transpose elegantly in our scientific software, for instance in a matrix-<br />
vector product:<br />
v= trans(A) * q;<br />
In this case, a temporary view is created and used in the product. Since operator() of the<br />
view is inlined, the transposed product will be as fast as with A itself.<br />
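The view technique can be demonstrated without MTL. In the following sketch, dense_matrix is a minimal illustrative stand-in for a real matrix class, and we read value_type and size_type directly from the matrix instead of the mtl::Collection traits:

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for a matrix class with operator()(row, col).
class dense_matrix
{
  public:
    typedef double   value_type;
    typedef unsigned size_type;

    dense_matrix(size_type r, size_type c) : cols(c), data(r * c, 0.0) {}

    value_type& operator()(size_type r, size_type c) { return data[r * cols + c]; }
    const value_type& operator()(size_type r, size_type c) const { return data[r * cols + c]; }

  private:
    size_type cols;
    std::vector<value_type> data;
};

// The view: refer to the matrix and swap the indices on access.
template <typename Matrix>
class transposed_view
{
  public:
    typedef typename Matrix::value_type value_type;
    typedef typename Matrix::size_type  size_type;

    transposed_view(Matrix& A) : ref(A) {}

    value_type& operator()(size_type r, size_type c) { return ref(c, r); }
    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }

  private:
    Matrix& ref;
};

template <typename Matrix>
transposed_view<Matrix> trans(Matrix& A)
{
    return transposed_view<Matrix>(A);
}
```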
Dealing with Const-ness<br />
So far, so good. Problems arise if we build the transposed view of a constant matrix:<br />
const mtl::dense2D<float> B(A);<br />
We still can create the transposed view of B but we cannot access its elements:<br />
std::cout << "tst::trans(B)(2, 0) = " << tst::trans(B)(2, 0) << '\n'; // error<br />
The compiler will tell us that it cannot initialize a 'float&' from a 'const float'. If we look at the<br />
location of the error, we realize that it is line 9 in Listing 5.1. But why did the compiler<br />
use the non-constant version of the operator? In line 10 we defined an operator for constant<br />
objects which returns a constant reference and fits perfectly in this situation.<br />
First of all, is the ref member really constant? We never used const in the class definition or<br />
in the function trans. Help is provided by 'Run-Time Type Identification (RTTI)'. We include<br />
the header <typeinfo> and print the type information:<br />
#include <typeinfo><br />
...<br />
std::cout << "typeid of trans(A) = " << typeid(tst::trans(A)).name() << '\n';<br />
std::cout << "typeid of trans(B) = " << typeid(tst::trans(B)).name() << '\n';<br />
This will produce the following output: 4<br />
typeid of trans(A) = N3tst15transposed_viewIN3mtl6matrix7dense2DIfNS2_10<br />
parametersINS1_3tag9row_majorENS1_5index7c_indexENS1_9non_fixed10<br />
dimensionsELb0EEEEEEE<br />
typeid of trans(B) = N3tst15transposed_viewIKN3mtl6matrix7dense2DIfNS2_10<br />
parametersINS1_3tag9row_majorENS1_5index7c_indexENS1_9non_fixed10<br />
dimensionsELb0EEEEEEE<br />
The output is apparently not very clear. However, if we look very carefully, we see the extra<br />
'K' in the second line which tells us that the view is instantiated with a constant matrix type.<br />
Another disadvantage of RTTI is that we only see the const attribute of template parameters.<br />
That is, printing the type information of trans(B).ref would not tell whether or not this type is<br />
constant.<br />
An alternative that solves both problems is inspecting the type by provoking an error message.<br />
We can for instance write:<br />
int ta= trans(A);<br />
int tb= trans(B);<br />
Then the compiler gives us a message like:<br />
trans_const.cpp:120: Error: 'mtl::matrix::transposed_view<mtl::matrix::dense2D<float> >' cannot be converted to 'int'<br />
in initialization<br />
trans_const.cpp:121: Error: 'const mtl::matrix::transposed_view<const mtl::matrix::dense2D<float> >' cannot be<br />
converted to 'int' in initialization<br />
Here the types are much more readable. 5 We can see clearly that trans(B) returns a view with<br />
a constant template parameter. The same trick can be applied to the reference in the view:<br />
int tar= trans(A).ref;<br />
int tbr= trans(B).ref;<br />
The error message would be accordingly:<br />
4 With g++; on other compilers the output might differ but the essential information will be the same. The lines<br />
are broken manually.<br />
5 TODO: Why the hell is this const outside in line 121???<br />
trans_const.cpp:121: Error: 'const mtl::matrix::dense2D<float>' cannot be converted to 'int'<br />
in initialization<br />
Obviously, with this trick we will not get an executable binary. But we learn more about the<br />
types in our program and can solve our problems better. In the rare case that the type you<br />
examine is convertible to int, you can take any other type, like std::set, to which the examined<br />
class is not convertible. To exclude convertibility entirely you can introduce a new type.<br />
After this short excursion into type introspection, we know for certain that the member ref is a<br />
constant reference. The following happens:<br />
• When we call trans(B), the function's template argument is instantiated with const dense2D<float>.<br />
• Thus, the return type is transposed_view<const dense2D<float> >.<br />
• The constructor argument has type const dense2D<float>&.<br />
• Likewise, the member ref has type const dense2D<float>&.<br />
It remains the question why the non-const version of the operator (line 9) is called although we<br />
refer to a constant matrix. The answer is that the constancy of ref does not matter for the choice,<br />
but whether or not the view object is constant. Thus, we can write:<br />
const tst::transposed_view<const mtl::dense2D<float> > Bt(B);<br />
std::cout << "Bt(2, 0) = " << Bt(2, 0) << '\n';<br />
This works but it is not very elegant.<br />
A brutal possibility to get the view compiled for constant matrices is to cast away the constancy.<br />
The undesired result would be that mutable views on constant matrices enable the modification<br />
of the allegedly constant matrix. This violates our principles so heavily that we do not even<br />
show how the code would read.<br />
Rule<br />
Never cast away const.<br />
In the following we will empower you with very strong methodologies for handling constancy<br />
correctly. Every const_cast is an indicator of a severe design error. As Sutter and Alexandrescu<br />
phrased it: "If you go const you never go back." The only situation where a const_cast<br />
is needed is the use of const-incorrect third-party software, i.e. read-only arguments are passed as<br />
mutable pointers or references. That is not our fault and we have no choice. Unfortunately,<br />
there are still a lot of const-incorrect packages around, and some of them would take too many<br />
resources to reimplement, so we have to live with them. The best we can do is to add an<br />
appropriate API on top of them and avoid working with the original API. This saves us<br />
from spoiling our applications with const_casts and restricts the unspeakable const_cast to the<br />
interface. A good example of such a layer is 'Boost::Bindings' [?] that provides a const-correct,<br />
high-quality interface to BLAS, LAPACK, and other libraries with similarly old-fashioned 6 interfaces.<br />
Conversely, as long as we only use our own functions and classes, we can avoid every<br />
const_cast. 7<br />
We could implement a second view class for constant matrices and overload the trans function<br />
to return this view:<br />
template <typename Matrix><br />
class const_transposed_view<br />
{<br />
  public:<br />
    typedef typename mtl::Collection<Matrix>::value_type value_type;<br />
    typedef typename mtl::Collection<Matrix>::size_type  size_type;<br />
<br />
    const_transposed_view(const Matrix& A) : ref(A) {}<br />
<br />
    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }<br />
<br />
  private:<br />
    const Matrix& ref;<br />
};<br />
<br />
template <typename Matrix><br />
const_transposed_view<Matrix> inline trans(const Matrix& A)<br />
{<br />
    return const_transposed_view<Matrix>(A);<br />
}<br />
This works fine, and the user can use the trans function for both constant and mutable matrices.<br />
However, a completely new class definition is a fair amount of work given that just one piece of the<br />
class definition needs to be altered. For this purpose we introduce two meta-functions.<br />
Check for Constancy<br />
Our problem with the view in Listing 5.1 is that it cannot handle constant types as template<br />
argument. To modify the behavior for constant arguments, we first need to find out whether<br />
an argument is constant. The meta-function that provides this information is very simple to<br />
implement by partial template specialization:<br />
template <typename T><br />
struct is_const<br />
{<br />
    static const bool value= false;<br />
};<br />
<br />
template <typename T><br />
struct is_const<const T><br />
{<br />
    static const bool value= true;<br />
};<br />
6 To phrase it diplomatically.<br />
7 We disagree with Sutter and Alexandrescu on the other exception for using const_cast [SA05, page 179];<br />
this can be handled easily with an extra function. 8<br />
Constant types match both definitions, but the second one is more specific and is therefore picked<br />
by the compiler. Non-constant types match only the first one. Note that the constancy of<br />
template parameters is not considered; e.g., view<const matrix> is not regarded as constant.<br />
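These rules can be checked directly with a few assertions; view is just a hypothetical class template for the last point:

```cpp
#include <cassert>

template <typename T>
struct is_const
{
    static const bool value = false;
};

// More specific: matches exactly the const-qualified types.
template <typename T>
struct is_const<const T>
{
    static const bool value = true;
};

template <typename T> struct view {};  // hypothetical class template
```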
Compile-time Branching<br />
The other tool we need for our view is a type selection depending on a logical condition. This<br />
technique was introduced by Krzysztof Czarnecki 9 and Ulrich W. Eisenecker [CE00].<br />
It can be achieved by a rather simple implementation:<br />
1 template <bool Condition, typename ThenType, typename ElseType><br />
2 struct if_c<br />
3 {<br />
4     typedef ThenType type;<br />
5 };<br />
6<br />
7 template <typename ThenType, typename ElseType><br />
8 struct if_c<false, ThenType, ElseType><br />
9 {<br />
10     typedef ElseType type;<br />
11 };<br />
Listing 5.2: Compile-time if<br />
When this template is instantiated with a logical expression and two types, only the general<br />
definition in line 1 matches when the first argument evaluates to true, and the ThenType is used<br />
in the type definition. If the first argument evaluates to false, then the specialization in line 7<br />
is more specific, so that the ElseType is used. Like many ingenious inventions, it is very simple<br />
once it is found.<br />
This allows us to define funny things, like using double for temporaries when our maximal<br />
iteration number is larger than 100 and float otherwise:<br />
typedef tst::if_c<(max_iter > 100), double, float>::type tmp_type;<br />
std::cout << "typeid = " << typeid(tmp_type).name() << '\n';<br />
Needless to say, 'max_iter' must be known at compile time. Admittedly, the example does<br />
not look extremely useful, and the meta-if is not so important in small isolated code snippets.<br />
On the other hand, for the development of large generic software packages, it becomes extremely<br />
important.<br />
A convenience meta-function, as defined in the Meta-Programming Library [GA04], is 'if_':<br />
template <typename Condition, typename ThenType, typename ElseType><br />
struct if_<br />
    : if_c<Condition::value, ThenType, ElseType><br />
{};<br />
It expects as first argument a type with a static constant member named value that is convertible to<br />
bool. In other words, it selects the type based on the value of Condition (and saves typing 8<br />
characters).<br />
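Both meta-functions can be exercised in a minimal compile-time test; cond_true and cond_false are our own stand-ins for condition types like boost::mpl::true_:

```cpp
#include <cassert>
#include <typeinfo>

// Compile-time if: general case selects ThenType ...
template <bool Condition, typename ThenType, typename ElseType>
struct if_c
{
    typedef ThenType type;
};

// ... and the more specific partial specialization selects ElseType.
template <typename ThenType, typename ElseType>
struct if_c<false, ThenType, ElseType>
{
    typedef ElseType type;
};

// Condition types with a static bool member, as expected by if_.
struct cond_true  { static const bool value = true;  };
struct cond_false { static const bool value = false; };

template <typename Condition, typename ThenType, typename ElseType>
struct if_
    : if_c<Condition::value, ThenType, ElseType>
{};
```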
9 At that time he was a doctoral student at the TU Ilmenau.<br />
The Solution<br />
Now we have all we need to revise the view from Listing 5.1. The problem was that we returned<br />
an entry of a constant matrix as a mutable reference. To avoid this, we could try to make the<br />
mutable access operator disappear from the view when the referred matrix is constant. This is<br />
possible but too complicated for the moment; we will come back to this in Section 5.2.4.<br />
An easier solution is to keep both the mutable and the constant access operator but choose the<br />
return type of the former depending on the type of the template argument:<br />
1 template <typename Matrix><br />
2 class transposed_view<br />
3 {<br />
4   public:<br />
5     typedef typename mtl::Collection<Matrix>::value_type value_type;<br />
6     typedef typename mtl::Collection<Matrix>::size_type  size_type;<br />
7   private:<br />
8     typedef typename if_<is_const<Matrix>,<br />
9                          const value_type&,<br />
10                         value_type&<br />
11                        >::type vref_type;<br />
12   public:<br />
13     transposed_view(Matrix& A) : ref(A) {}<br />
14<br />
15     vref_type operator()(size_type r, size_type c) { return ref(c, r); }<br />
16     const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }<br />
17<br />
18   private:<br />
19     Matrix& ref;<br />
20 };<br />
Listing 5.3: Const-safe view implementation<br />
This implementation returns a constant reference in line 15 when the referred matrix is constant,<br />
and a mutable reference for a mutable referred matrix. Let us see if this is what we need. For<br />
mutable matrix references, the return type of operator() depends on the constancy of the view<br />
object:<br />
• If the view object is mutable, then operator() (line 15) returns a mutable reference (line 10);<br />
and<br />
• If the view object is constant, then operator() (line 16) returns a constant reference.<br />
This is the same behavior as in Listing 5.1.<br />
If the matrix reference is constant, then a constant reference is always returned:<br />
• If the view object is mutable, then operator() (line 15) returns a constant reference (line 9);<br />
and<br />
• If the view object is constant, then operator() (line 16) returns a constant reference.<br />
Altogether, we implemented a view class that provides read and write access wherever<br />
appropriate and disables write access where inappropriate.<br />
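The whole mechanism (is_const, the compile-time if, and the const-safe view) can be combined into one self-contained sketch. As before, dense_matrix is an illustrative stand-in, and we use our own if_c instead of Boost's if_:

```cpp
#include <cassert>
#include <vector>

template <bool C, typename Then, typename Else> struct if_c { typedef Then type; };
template <typename Then, typename Else> struct if_c<false, Then, Else> { typedef Else type; };

template <typename T> struct is_const          { static const bool value = false; };
template <typename T> struct is_const<const T> { static const bool value = true;  };

class dense_matrix  // minimal stand-in for a matrix class
{
  public:
    typedef double   value_type;
    typedef unsigned size_type;
    dense_matrix(size_type r, size_type c) : cols(c), data(r * c, 0.0) {}
    value_type& operator()(size_type r, size_type c) { return data[r * cols + c]; }
    const value_type& operator()(size_type r, size_type c) const { return data[r * cols + c]; }
  private:
    size_type cols;
    std::vector<value_type> data;
};

template <typename Matrix>
class transposed_view
{
  public:
    typedef typename Matrix::value_type value_type;
    typedef typename Matrix::size_type  size_type;
  private:
    // Constant reference if the referred matrix is constant, mutable otherwise.
    typedef typename if_c<is_const<Matrix>::value,
                          const value_type&,
                          value_type&>::type vref_type;
  public:
    transposed_view(Matrix& A) : ref(A) {}
    vref_type operator()(size_type r, size_type c) { return ref(c, r); }
    const value_type& operator()(size_type r, size_type c) const { return ref(c, r); }
  private:
    Matrix& ref;
};

template <typename Matrix>
transposed_view<Matrix> trans(Matrix& A) { return transposed_view<Matrix>(A); }
```

Now trans(B) on a constant matrix B compiles and yields read-only access, while trans(A) on a mutable matrix still allows writing.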
5.2.3 More Useful Meta-functions<br />
The Boost Type Traits library [?] provides a large spectrum of meta-functions to test or<br />
manipulate attributes of types. Some of them, like the previously introduced is_const, are rather<br />
easy to implement; others, like has_trivial_constructor or is_base_of, require deep insight<br />
into C++ subtleties and often into compiler internals as well. Unless one uses only very simple<br />
type traits and absolutely wants to avoid the dependency on an external library, it is advisable to<br />
favor the extensively tested implementations from the Type Traits library over rewriting them.<br />
With the boost::is_xyz meta-functions we can implement special behavior for certain sets of types.<br />
One can easily add tests for domain-specific type sets:<br />
template <typename T><br />
struct is_matrix<br />
    : boost::mpl::false_<br />
{};<br />
<br />
template <typename Value, typename Parameters><br />
struct is_matrix<mtl::dense2D<Value, Parameters> ><br />
    : boost::mpl::true_<br />
{};<br />
// more matrix classes ...<br />
<br />
template <typename Matrix><br />
struct is_matrix<transposed_view<Matrix> ><br />
    : is_matrix<Matrix><br />
{};<br />
// more views ...<br />
Our program snippet is in line with the implementations in Boost. Instead of defining a static<br />
constant as in Section 5.2.2, we derive the meta-function from boost::mpl::false_ and boost::mpl::true_,<br />
where static constants are defined along with some additional typedefs. This is not only shorter but<br />
also requires a bit less compile time, see [?]. 10<br />
The code is quite self-explanatory. Types we do not know are considered not to be matrices.<br />
Then we specialize for known matrix classes. For views we can further refer to the matrix-ness<br />
of the template argument.<br />
Alternatively, we can say in the type trait that every transposed_view is a matrix and instead<br />
require for the template argument of transposed_view that it is a matrix:<br />
#include <boost/static_assert.hpp><br />
<br />
template <typename Matrix><br />
class transposed_view<br />
{<br />
    BOOST_STATIC_ASSERT((is_matrix<Matrix>::value)); // Make sure that the argument is a matrix type<br />
    // ...<br />
};<br />
This additional assertion guarantees that the view class can only be instantiated with known<br />
matrix types. For other argument types the compilation will terminate in this line. Unfortunately,<br />
the error message is not very informative, not to say confusing:<br />
10 TODO: page
trans_const.cpp:96: Error: Invalid application of 'sizeof' on incomplete type<br />
'boost::STATIC_ASSERTION_FAILURE<false>'<br />
If you see an error message with "STATIC ASSERTION" in it, do not think about the message<br />
itself (it is meaningless), but look at the source code line that caused the error and hope that<br />
the author of the assertion provided more information in a comment.<br />
When we try to compile our test with the assertion, we see that trans(A) compiles but<br />
trans(B) does not. The reason is that 'const dense2D<float>' is considered different from 'dense2D<float>' in<br />
template specialization, so that it is still considered a non-matrix. The good news is that we do not<br />
need to double our specializations for mutable and constant types; we can write a partial<br />
specialization for all constant arguments:<br />
template <typename T><br />
struct is_matrix<const T><br />
    : is_matrix<T> {};<br />
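The trait hierarchy, including the const specialization, can be sketched without Boost; true_ and false_ are hand-rolled stand-ins for boost::mpl::true_/false_, and dense2D is a dummy class:

```cpp
#include <cassert>

struct false_ { static const bool value = false; };
struct true_  { static const bool value = true;  };

template <typename T>      class dense2D {};         // dummy matrix class
template <typename Matrix> class transposed_view {}; // dummy view

// Unknown types are not matrices.
template <typename T>
struct is_matrix : false_ {};

// Known matrix classes are.
template <typename Value>
struct is_matrix<dense2D<Value> > : true_ {};

// A view on a matrix is a matrix.
template <typename Matrix>
struct is_matrix<transposed_view<Matrix> > : is_matrix<Matrix> {};

// Without this, const types would fall through to the primary template.
template <typename T>
struct is_matrix<const T> : is_matrix<T> {};
```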
Note that BOOST_STATIC_ASSERT is a macro and does not understand C++. This manifests in<br />
particular if the argument contains one or more commas: the preprocessor will then interpret<br />
it as multiple arguments for the macro and get confused. This confusion can be avoided<br />
by enclosing the argument of BOOST_STATIC_ASSERT in two parentheses, as we did<br />
in the example (although it was not necessary here). Despite the double parentheses and the<br />
rather arbitrary error message, static assertions are very useful to increase reliability. The next<br />
C++ standard will provide static assertions in the language itself:<br />
template <typename Matrix><br />
class transposed_view<br />
{<br />
    static_assert(is_matrix<Matrix>::value, "transposed_view requires a matrix as argument");<br />
    // ...<br />
};<br />
As the reader can see, the integration into the language overcomes the before-mentioned deficiencies<br />
of the macro implementation.<br />
Also useful are meta-functions that remove something from a type if it exists, e.g. remove_const transforms<br />
const T into T while non-constant types remain unchanged. Note that this only removes the<br />
constancy of the entire type, not that of template arguments; e.g., in vector<const T> the constancy<br />
of the argument is not removed.<br />
Dually, meta-functions can add something to a type:<br />
typedef typename boost::add_reference<T>::type ref_type;<br />
It would be shorter to just add an '&', but this is easily overlooked in longer type definitions. More<br />
importantly, if some trait already returns a reference, then it is an error to add another one. The<br />
meta-function adds the reference only to types that are not references yet. For adding const to a<br />
type we find it more concise without the meta-function:<br />
typedef typename some_trait<T>::type const const_type;<br />
If the type trait already returns a constant type, the second const is simply ignored.<br />
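Both kinds of meta-functions are easy to sketch by hand (Boost's versions are more complete); is_same is only defined here to check the results:

```cpp
#include <cassert>

// Remove const if present, otherwise leave the type unchanged.
template <typename T> struct remove_const          { typedef T type; };
template <typename T> struct remove_const<const T> { typedef T type; };

// Add a reference only if T is not a reference yet.
template <typename T> struct add_reference     { typedef T& type; };
template <typename T> struct add_reference<T&> { typedef T& type; };

// Helper to compare the resulting types.
template <typename T, typename U> struct is_same       { static const bool value = false; };
template <typename T>             struct is_same<T, T> { static const bool value = true;  };
```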
The widest functionality in the area of meta-programming is provided by the Boost Meta-Programming<br />
Library (MPL) [GA04]. The library implements most of the STL algorithms (§ 4.9) and also<br />
provides similar data types, e.g., vector or map. Another interesting library is Boost Fusion [?]<br />
that helps mixing execution at compile time and run time. Both libraries are well documented<br />
and therefore not further discussed here.<br />
5.2.4 Enable-If<br />
A very powerful mechanism for meta-programming is "enable-if", discovered by Jaakko Järvi<br />
and Jeremiah Willcock. It is based on the paradigm SFINAE: Substitution Failure Is Not An<br />
Error. Imagine a function call with a given argument type, say dense_vector<float>. One of the<br />
overloads has a return type that is determined by a meta-function depending on the function<br />
argument. The compiler then substitutes the meta-function argument with dense_vector<float><br />
to find out the return type. If this meta-function is not defined for dense_vector<float>, then the<br />
template function (overload) has no return type. Instead of generating an error message, the C++ compiler<br />
diligently ignores this overload. Of course, an error might occur later if all overloads are ignored<br />
for the given type or if the compiler cannot determine the most specific overload among those that<br />
are not ignored.<br />
This compiler behavior can be exploited to select an implementation based on meta-functions. As<br />
an example, think of the L1 norm. It is defined for vector spaces and linear operators. Although<br />
these definitions are related, the practical real-world implementations for finite-dimensional vectors<br />
and matrices are different. Of course, we could implement one_norm for every matrix and<br />
vector type so that the call one_norm(x) would select the appropriate implementation for this<br />
type.<br />
More productively, we would like to have one single implementation for all matrix types (including views)<br />
and one single implementation for all vector types. We use the meta-function is_matrix and implement<br />
is_vector accordingly:<br />
template <typename T><br />
struct is_vector<br />
    : boost::mpl::false_<br />
{};<br />
<br />
template <typename Value><br />
struct is_vector<mtl::dense_vector<Value> ><br />
    : boost::mpl::true_<br />
{};<br />
// ... more vector types<br />
We also need the meta-function Magnitude to handle the magnitude of complex matrices and<br />
vectors.<br />
The implementation of enable-if is very simple: it defines a type if the condition holds and none<br />
if the condition does not. The version in Boost adds a second level to access the static value<br />
member in types:<br />
template <bool Cond, typename T= void><br />
struct enable_if_c {<br />
    typedef T type;<br />
};<br />
<br />
template <typename T><br />
struct enable_if_c<false, T> {};<br />
<br />
template <typename Cond, typename T= void><br />
struct enable_if<br />
    : public enable_if_c<Cond::value, T><br />
{};<br />
The real enabling behavior is realized in enable_if_c, whereas enable_if is merely a convenience<br />
meta-function to avoid typing '::value'.<br />
Now we have all we need to implement the L1 norm in the generic fashion we aimed for:<br />
1 template <typename T><br />
2 typename boost::enable_if<is_matrix<T>, typename Magnitude<T>::type>::type<br />
3 inline one_norm(const T& A)<br />
4 {<br />
5     using std::abs;<br />
6     typedef typename Magnitude<T>::type mag_type;<br />
7     mag_type max(0);<br />
8     for (unsigned c= 0; c < num_cols(A); c++) {<br />
9         mag_type sum(0);<br />
10         for (unsigned r= 0; r < num_rows(A); r++)<br />
11             sum+= abs(A[r][c]);<br />
12         max= max < sum ? sum : max;<br />
13     }<br />
14     return max;<br />
15 }<br />
16<br />
17 template <typename T><br />
18 typename boost::enable_if<is_vector<T>, typename Magnitude<T>::type>::type<br />
19 inline one_norm(const T& v)<br />
20 {<br />
21     using std::abs;<br />
22     typedef typename Magnitude<T>::type mag_type;<br />
23     mag_type sum(0);<br />
24     for (unsigned r= 0; r < size(v); r++)<br />
25         sum+= abs(v[r]);<br />
26     return sum;<br />
27 }<br />
The selection is now driven by enable_if in lines 2 and 18. Let us look at line 2 in detail for a<br />
matrix argument:<br />
1. is_matrix<T> evaluates to (i.e. inherits from) true_;<br />
2. enable_if passes true_::value, i.e. true, to enable_if_c;<br />
3. enable_if_c<true, ...>::type is set to typename Magnitude<T>::type;<br />
4. This is the return type of the function overload.<br />
What happens in this line when the argument is not a matrix type?<br />
1. is_matrix<T> evaluates to (i.e. inherits from) false_;<br />
2. enable_if passes false_::value, i.e. false, to enable_if_c;<br />
3. enable_if_c<false, ...>::type is not defined in this case;<br />
4. The function overload has no return type;<br />
5. It is therefore ignored.<br />
In short, the overload is only enabled if the argument is a matrix, as the names of the<br />
meta-functions say. Likewise, the second overload is only available for vectors. A short test<br />
demonstrates this:<br />
mtl::dense2D<float> A(3, 3);<br />
A= 2, 3, 4,<br />
   5, 6, 7,<br />
   8, 9, 10;<br />
<br />
mtl::dense_vector<float> v(3);<br />
v= 3, 4, 5;<br />
<br />
std::cout << "one_norm(A) is " << tst::one_norm(A) << "\n";<br />
std::cout << "one_norm(v) is " << tst::one_norm(v) << "\n";<br />
For types that are neither matrix nor vector, it will look as if there were no function one_norm at all.<br />
Types that are considered both matrix and vector would cause an ambiguity.<br />
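The whole dispatch can be tried out in isolation. In the following sketch we replace MTL's containers by trivial stand-ins (a nested std::vector as matrix, a plain std::vector as vector), roll our own enable_if, and fix the magnitude type to double for brevity:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

template <bool Cond, typename T = void> struct enable_if_c { typedef T type; };
template <typename T> struct enable_if_c<false, T> {};
template <typename Cond, typename T = void>
struct enable_if : enable_if_c<Cond::value, T> {};

typedef std::vector<double>               vec;  // stand-in vector type
typedef std::vector<std::vector<double> > mat;  // stand-in matrix type

template <typename T> struct is_matrix { static const bool value = false; };
template <> struct is_matrix<mat>      { static const bool value = true;  };
template <typename T> struct is_vector { static const bool value = false; };
template <> struct is_vector<vec>      { static const bool value = true;  };

// Matrix version: maximal column sum of absolute values.
template <typename T>
typename enable_if<is_matrix<T>, double>::type
one_norm(const T& A)
{
    double max = 0.0;
    for (std::size_t c = 0; c < A[0].size(); ++c) {
        double sum = 0.0;
        for (std::size_t r = 0; r < A.size(); ++r)
            sum += std::abs(A[r][c]);
        if (sum > max) max = sum;
    }
    return max;
}

// Vector version: sum of absolute values.
template <typename T>
typename enable_if<is_vector<T>, double>::type
one_norm(const T& v)
{
    double sum = 0.0;
    for (std::size_t r = 0; r < v.size(); ++r)
        sum += std::abs(v[r]);
    return sum;
}
```

Calling one_norm on a vec discards the matrix overload by SFINAE and vice versa, exactly as described above.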
Drawbacks: The mechanism of enable-if is very powerful but not particularly pleasant to<br />
debug. Error messages caused by enable_if are usually rather long but not very meaningful. If<br />
a function match is missing for a given argument type, it is hard to determine why, because no<br />
helpful information is provided to the programmer; he or she is only told that no match was found,<br />
period. Moreover, the enabling mechanism cannot select the most specific condition. For instance,<br />
we cannot specialize the implementation for, say, sparse matrices. Such dispatching can only be<br />
achieved by avoiding ambiguities in the conditions:<br />
template <typename T><br />
typename boost::enable_if_c<is_matrix<T>::value && !is_sparse_matrix<T>::value,<br />
                            typename Magnitude<T>::type>::type<br />
inline one_norm(const T& A);<br />
<br />
template <typename T><br />
typename boost::enable_if<is_sparse_matrix<T>, typename Magnitude<T>::type>::type<br />
inline one_norm(const T& A);<br />
Evidently, this becomes quite confusing if too many hierarchical conditions are considered.<br />
The SFINAE paradigm only applies to template arguments of the function itself. Therefore,<br />
member functions cannot be enabled depending on the class's template argument. For instance,<br />
the mutable access operator in line 9 of Listing 5.1 cannot be hidden with enable_if for views on<br />
constant matrices because the operator itself is not a template function. There are possibilities<br />
to introduce a template argument artificially for a member function so that enable_if applies, but this<br />
really does not contribute to the clarity of the program.<br />
Concepts can handle hierarchies in conditions and non-template member functions, and they provide<br />
more helpful error messages as well. Unfortunately, they will not be available in C++0x, and it is not<br />
clear yet when they will be usable for mainstream programming.<br />
5.3 Expression Templates<br />
Scientific software usually has strong performance requirements, especially for those problems<br />
we tackle with C++. Many large-scale simulations of physical, chemical, or biological processes<br />
run for weeks or months, and everybody is glad if at least a part of these very long execution<br />
times can be saved. Such savings often come at the price of less readable and less maintainable program<br />
sources. In Section 5.3.1 we will show a simple implementation of an operator and discuss why<br />
it is not efficient, and in the remainder of Section 5.3 we will demonstrate how to improve<br />
the performance without sacrificing the natural notation.<br />
5.3.1 Simple Operator Implementation<br />
Assume we have an application with vector addition. We want, for instance, to write an expression of the following form for vectors w, x, y and z:
w = x + y + z;<br />
Say, we have a vector class as in Section 4.3:<br />
template <typename T>
class vector
{
  public:
    explicit vector(int size) : my_size(size), data(new T[my_size]) {}
    vector() : my_size(0), data(0) {}
    friend int size(const vector& x) { return x.my_size; }
    const T& operator[](int i) const { check_index(i); return data[i]; }
    T& operator[](int i) { check_index(i); return data[i]; }
    // ...
};
We can of course provide an operator <strong>for</strong> adding such vectors:<br />
template <typename T>
vector<T> inline operator+(const vector<T>& x, const vector<T>& y)
{
    x.check_size(size(y));
    vector<T> sum(size(x));
    for (int i= 0; i < size(x); ++i)
        sum[i] = x[i] + y[i];
    return sum;
}
A short test program checks that everything works:<br />
int main()
{
    vector<double> x(4), y(4), z(4), w(4);
    x[0]= x[1]= 1.0; x[2]= 2.0; x[3]= -3.0;
    y[0]= y[1]= 1.7; y[2]= 4.0; y[3]= -6.0;
    z[0]= z[1]= 4.1; z[2]= 2.6; z[3]= 11.0;
    std::cout << "x = " << x << std::endl;
    std::cout << "y = " << y << std::endl;
    std::cout << "z = " << z << std::endl;
    w= x + y + z;
    std::cout << "w= x + y + z = " << w << std::endl;
    return 0;
}
If this works properly, what is wrong with it? From the software engineering perspective: nothing. From the performance perspective: a lot. How is the statement w= x + y + z executed?
1. Create a temporary variable sum for the addition of x and y;
2. Perform a loop reading x and y, adding them element-wise, and writing the result to sum;
3. Copy sum to a temporary variable, say t_xy, in the return statement;
4. Delete sum;
5. Create a temporary variable sum for the addition of t_xy and z;
6. Perform a loop reading t_xy and z, adding them element-wise, and writing the result to sum;
7. Copy sum to a temporary variable, say t_xyz, in the return statement;
8. Delete sum;
9. Delete t_xy;
10. Perform a loop reading t_xyz and writing to w;
11. Delete t_xyz.
This is admittedly the worst-case scenario, but it is the code that old compilers generated. Modern compilers perform more optimizations by static code analysis and can avoid copying the return value into the temporaries t_xy and t_xyz. Instead of being created, t_xy and t_xyz become aliases for the respective sum temporaries.
The optimized version per<strong>for</strong>ms:<br />
1. Create a temporary variable sum (for distinction sum_xy) for the addition of x and y;
2. Perform a loop reading x and y, adding them element-wise, and writing the result to sum_xy;
3. Create a temporary variable sum (for distinction sum_xyz) for the addition of sum_xy and z;
4. Perform a loop reading sum_xy and z, adding them, and writing the result to sum_xyz;
5. Delete sum_xy;
6. Perform a loop reading sum_xyz and writing to w;
7. Delete sum_xyz.
How many operations did we perform? Say our vectors have length n; then we have in total:
• 2n additions;
• 3n assignments;<br />
• 5n reads;<br />
• 3n writes;<br />
• 2 memory allocations; and<br />
• 2 memory deallocations.<br />
For comparison, if we could write a single loop or an inline function:
template <typename T>
void inline add3(const vector<T>& x, const vector<T>& y, const vector<T>& z, vector<T>& sum)
{
    x.check_size(size(y));
    x.check_size(size(z));
    x.check_size(size(sum));
    for (int i= 0; i < size(x); ++i)
        sum[i] = x[i] + y[i] + z[i];
}
This function per<strong>for</strong>ms:<br />
• 2n additions;<br />
• n assignments;<br />
• 3n reads;<br />
• n writes;<br />
The call of this function:<br />
add3(x, y, z, w);<br />
is of course less elegant than the operator notation. Often, one needs another look at the documentation to check whether the first or the last argument contains the result. With operators this is evident.
In high-performance software, programmers tend to implement a hard-coded version of every important operation instead of freely composing it from smaller expressions. The reason is obvious: our operator implementation additionally performed:
• 2n assignments;<br />
• 2n reads;<br />
• 2n writes;<br />
• 2 memory allocations; and<br />
• 2 memory deallocations.<br />
The good news is that we have not performed additional arithmetic. The bad news is that the operations above are more expensive. On modern computers, it takes much more time to read data from or write data to memory than to execute fixed- or floating-point operations. 11 Unfortunately, vectors in scientific applications tend to be rather long, often larger than the caches of the platform, so the vectors must really be transferred to and from main memory. In the case of shorter vectors, the data might reside in L1 or L2 cache and the data transfer is less critical. But then, allocation and deallocation become a serious slow-down factor.
11 TODO: Maybe quantify this for some machine.
The purpose of expression templates is to keep the original operator notation without introducing the overhead induced by temporaries.
5.3.2 An Expression Template Class<br />
The solution is to introduce a special class that keeps references to the vectors and allows us to perform all computations later in one sweep. The addition now does not return a vector but an object holding the references:
template <typename T>
class vector_sum
{
  public:
    vector_sum(const vector<T>& v1, const vector<T>& v2) : v1(v1), v2(v2) {}
  private:
    const vector<T> &v1, &v2;
};

template <typename T>
vector_sum<T> inline operator+(const vector<T>& x, const vector<T>& y)
{
    return vector_sum<T>(x, y);
}
Now we can already write x + y but not w= x + y yet. It is not only that the assignment is not defined; we have also not yet provided vector_sum with enough functionality to perform something useful in the assignment. Thus, we first extend vector_sum so that it looks like a vector itself:
template <typename T>
class vector_sum
{
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }
  public:
    vector_sum(const vector<T>& v1, const vector<T>& v2) : v1(v1), v2(v2)
    {
        assert(size(v1) == size(v2));
    }
    friend int size(const vector_sum& x) { return size(x.v1); }
    T operator[](int i) const { check_index(i); return v1[i] + v2[i]; }
  private:
    const vector<T> &v1, &v2;
};
For the sake of defensive programming, we added a test that the two vectors have the same size and can be consistently added. We then consider the size of the first vector as the size of our vector_sum. The most important function is the bracket operator: when the i-th entry is accessed, we compute the sum of the operands' i-th entries.
Discussion 5.1 The drawback is that if the entries are accessed multiple times, the sum is recomputed. On the other hand, most expressions are used only once, and then this is not a problem. An example where vector entries are accessed several times is A ∗ (x + y). Here, it is preferable to first compute a true vector instead of evaluating the matrix vector product on the expression template. 12
To evaluate w= x + y, we also need an assignment operator for vector_sum:
template <typename T> class vector_sum; // forward declaration

template <typename T>
class vector
{
    // ...
    vector& operator=(const vector_sum<T>& that)
    {
        check_size(size(that));
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
};
The assignment runs a loop over w and that. As that is an object of type vector_sum, the expression that[i] computes x[i] + y[i]. In contrast to the implementation in Section 5.3.1, we now have:
• Only one loop;
• No temporary vector;
• No additional memory allocation and deallocation; and
• No additional data reads and writes.
In fact, the same operations are per<strong>for</strong>med as in the loop<br />
<strong>for</strong> (int i= 0; i < size(w); ++i)<br />
w[i] = x[i] + y[i];<br />
The cost of creating a vector_sum object is negligible. The object is kept on the stack and does not require memory allocation. Even the little effort for creating the object will be optimized away by most compilers with static code analysis.
What happens when we want to add three vectors? The naïve implementation from § 5.3.1 returns a vector, and this vector can be added to another vector. Our approach returns a vector_sum, and we have no addition for vector_sum and vector. Thus, we would need another ET class and an according operation:
template <typename T>
class vector_sum3
{
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }
  public:
    vector_sum3(const vector<T>& v1, const vector<T>& v2, const vector<T>& v3)
      : v1(v1), v2(v2), v3(v3)
    {
        assert(size(v1) == size(v2)); assert(size(v1) == size(v3));
    }
    friend int size(const vector_sum3& x) { return size(x.v1); }
    T operator[](int i) const { check_index(i); return v1[i] + v2[i] + v3[i]; }
  private:
    const vector<T> &v1, &v2, &v3;
};

template <typename T>
vector_sum3<T> inline operator+(const vector_sum<T>& x, const vector<T>& y)
{
    return vector_sum3<T>(x.v1, x.v2, y);
}
12 TODO: Shall we provide a solution for this as well? This is something that is overdue in MTL4 anyway.
Furthermore, vector_sum must declare our new plus operator as friend to access its private members, and vector needs an assignment operator for vector_sum3. This becomes increasingly annoying. Also, what happens if we perform the second addition first, w= x + (y + z)? Then we need yet another plus operator. What if some of the vectors are multiplied by a scalar, e.g., w= x + dot(x, y) * y + 4.3 * z, and this scalar product is also implemented by an ET? Our implementation effort runs into a combinatorial explosion, and we need a more flexible solution, which we introduce in the next section.
5.3.3 Generic Expression Templates<br />
So far, we started from a specific class (vector) and generalized the implementation gradually. Although this helps to understand the mechanism, we now want to go to the general version that takes arbitrary vector types:
template <typename V1, typename V2>
vector_sum<V1, V2> inline operator+(const V1& x, const V2& y)
{
    return vector_sum<V1, V2>(x, y);
}
We now need an expression class with arbitrary arguments:<br />
template <typename V1, typename V2>
class vector_sum
{
    typedef vector_sum self;
    void check_index(int i) const { assert(i >= 0 && i < size(v1)); }
  public:
    vector_sum(const V1& v1, const V2& v2) : v1(v1), v2(v2)
    {
        assert(size(v1) == size(v2));
    }
    ???? operator[](int i) const { check_index(i); return v1[i] + v2[i]; }
    friend int size(const self& x) { return size(x.v1); }
  private:
    const V1& v1;
    const V2& v2;
};
This is rather straightforward. The only issue is which type to return in operator[]. For this, we must define value_type in each class — a more flexible alternative would be an external type trait. In vector_sum we take the value_type of the first argument, which can itself be taken from another class.
template <typename V1, typename V2>
class vector_sum
{
    // ...
    typedef typename V1::value_type value_type;
    value_type operator[](int i) const { check_index(i); return v1[i] + v2[i]; }
};
To assign such an expression to a vector, we can also generalize the assignment operator:
template <typename T>
class vector
{
  public:
    typedef T value_type;
    // ...

    template <typename Src>
    vector& operator=(const Src& that)
    {
        check_size(size(that));
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
};
This assignment can also handle vector as argument, and we can omit the standard assignment operator.
Advantages of expression templates: Although the availability of operator overloading<br />
in C ++ resulted in notationally nicer code, the scientific community refused to give up programming<br />
in Fortran or to implement the loops directly in C/C ++. The reason was that the<br />
traditional operator implementations were too expensive. Due to the overhead of the creation<br />
of temporary variables and the copying of vector and matrix objects, C ++ could not compete<br />
with the per<strong>for</strong>mance of programs written in Fortran. This problem has now been resolved<br />
by the introduction of generics and expression templates. Now it is possible to write efficient<br />
scientific programs in a notationally convenient manner.<br />
5.4 Meta-Tuning: Write Your Own Compiler Optimization<br />
Compiler technology is progressing and provides an increasing number of optimization techniques. Ideally, everyone would write software in the way that is easiest for him, and the compiler would transform the operations into the form that is best for execution time. We would only need a new compiler and our programs would become faster. 13 But life — especially as an advanced C++ programmer — is no walk in the park. Of course, the compiler helps us a lot to speed up our programs. But there are limitations: many optimizations need knowledge of the semantic behavior and can therefore only be applied to types and operations whose semantics are known at the time the compiler is written; see also the discussion in [?]. Research is going on to overcome these limitations by providing concept-based optimization [?]. Unfortunately, it will take time until this becomes mainstream, especially now that concepts have been taken out of the C++0x standard. An alternative is source-to-source code transformation with external tools like ROSE [?].
Even for types and operations that the compiler can handle, it has its limitations. Most compilers (gcc, . . . 14) only deal with the inner loop of nested ones (see the solution in Section 5.4.2) and do not dare to introduce extra temporaries (see the solution in Section ??). Some compilers are particularly tuned for benchmarks. 15 For instance, they use pattern matching to recognize a 3-nested loop that computes a dense matrix product and transform it into BLAS-like code with 7 or 9 platform-dependent loops. 16 All this said, writing high-performance software is no walk in the park. That does not mean that such software must be unreadable and unmaintainable hackery. The route to success is again to provide appropriate abstractions. Those can be empowered with compile-time optimizations so that the applications are still written in natural mathematical notation whereas the generated binaries can still exploit all known techniques for fast execution.
5.4.1 Classical Fixed-Size Unrolling<br />
The easiest form of compile-time optimization can be realized for fixed-size data types, in particular vectors as in Section 4.7. Similar to the default assignment, we can write a generic vector assignment:
template <typename T, int Size>
class fsize_vector
{
    typedef fsize_vector self;
  public:
    const static int my_size= Size;

    self& operator=(const self& that)
    {
        for (int i= 0; i < my_size; ++i)
            data[i]= that[i];
        return *this;
    }
};
13 In some sense, this is the programming equivalent of communism: everybody contributes as much and how he pleases, and in the end the right thing happens anyway thanks to a self-improving society. Likewise, some people write software in a very naïve fashion and blame the compiler for not transforming their programs into high-performance code.
14 TODO: we should run some benchmarks on MSVC and icc.
15 TODO: search for paper on kcc.
16 One could sometimes get the impression that the HPC community believes that multiplying dense matrices at near-peak performance solves all performance issues of the world, or at least demonstrates that everything can be computed at near-peak performance if only one tries hard enough. Fortunately, more and more people in the supercomputer centers realize that their machines are not only running BLAS3 and LAPACK operations and that real-world applications are more often than not limited by memory bandwidth and latency.
A state-of-the-art compiler will recognize that all iterations are independent of each other; e.g., data[2]= that[2]; is independent of data[1]= that[1];. The compiler will also determine the size of the loop during compilation. As a consequence, the generated binary for a type with size 3 will be equivalent to:
template <typename T, int Size>
class fsize_vector
{
    self& operator=(const self& that)
    {
        data[0]= that[0];
        data[1]= that[1];
        data[2]= that[2];
        return *this;
    }
};
The right-hand-side vector that might be an expression template (§ 5.3) for, say, alpha * x + y, and its evaluation will also be inlined:
template <typename T, int Size>
class fsize_vector
{
    template <typename Src>
    self& operator=(const Src& that)
    {
        data[0]= alpha * x[0] + y[0];
        data[1]= alpha * x[1] + y[1];
        data[2]= alpha * x[2] + y[2];
        return *this;
    }
};
To make the unrolling more explicit, and for the sake of introducing meta-tuning step by step, we develop a functor that computes the assignment:
template <int N, typename Target, typename Source>
struct fsize_assign
{
    void operator()(Target& tar, const Source& src)
    {
        fsize_assign<N-1, Target, Source>()(tar, src);
        std::cout << "assign entry " << N << '\n';
        tar[N]= src[N];
    }
};

template <typename Target, typename Source>
struct fsize_assign<0, Target, Source>
{
    void operator()(Target& tar, const Source& src)
    {
        std::cout << "assign entry " << 0 << '\n';
        tar[0]= src[0];
    }
};
The print-outs shall show us the execution. For convenience, one can templatize the operator<br />
on the argument types:<br />
template <int N>
struct fsize_assign
{
    template <typename Target, typename Source>
    void operator()(Target& tar, const Source& src)
    {
        fsize_assign<N-1>()(tar, src);
        std::cout << "assign entry " << N << '\n';
        tar[N]= src[N];
    }
};

template <>
struct fsize_assign<0>
{
    template <typename Target, typename Source>
    void operator()(Target& tar, const Source& src)
    {
        std::cout << "assign entry " << 0 << '\n';
        tar[0]= src[0];
    }
};
Then the vector types can be deduced by the compiler when the operator is called. Instead of the previous loop, we call the assignment functor in the operator:
template <typename T, int Size>
class fsize_vector
{
    BOOST_STATIC_ASSERT((my_size > 0));
  public:
    self& operator=( const self& that )
    {
        fsize_assign<my_size-1>()(*this, that);
        return *this;
    }

    template <typename Vector>
    self& operator=( const Vector& that )
    {
        fsize_assign<my_size-1>()(*this, that);
        return *this;
    }
};
The execution of the following code fragment
fsize_vector<float, 4> v, w;
v[0]= v[1]= 1.0; v[2]= 2.0; v[3]= -3.0;
w= v;
yields
assign entry 0<br />
assign entry 1<br />
assign entry 2<br />
assign entry 3<br />
In this implementation, we replaced the loop by a recursion — counting on the compiler to inline the operations (otherwise it would be even slower than the loop) — and made sure that no loop index is incremented and tested for termination. This is only beneficial for small loops that run in L1 cache. Larger loops are dominated by loading the data from memory, and the loop overhead is irrelevant. On the contrary, entirely unrolling operations on very large vectors will probably decrease the performance, because many instructions need to be loaded, which decreases the bandwidth available for the data. As mentioned before, compilers can unroll such operations by themselves — and hopefully know when it is better not to — and sometimes this automatic unrolling is even slightly faster than the explicit implementation.
5.4.2 Nested Unrolling<br />
From our experience, compilers usually unroll only the inner of nested loops. Even a good compiler that can handle certain nested loops will not be able to optimize every program kernel, in particular heavily templatized ones instantiated with user-defined types. We will demonstrate here how to unroll nested loops at compile time, using the example of matrix vector multiplication.
For this purpose, we introduce a simplistic fixed-size matrix type:<br />
template <typename T, int Rows, int Cols>
class fsize_matrix
{
    typedef fsize_matrix self;
  public:
    typedef T value_type;
    BOOST_STATIC_ASSERT((Rows * Cols > 0));
    const static int my_rows= Rows, my_cols= Cols;

    fsize_matrix()
    {
        for (int i= 0; i < my_rows; ++i)
            for (int j= 0; j < my_cols; ++j)
                data[i][j]= T(0);
    }
    fsize_matrix( const self& that ) { ... }

    // cannot check column index
    const T* operator[](int r) const { return data[r]; }
    T* operator[](int r) { return data[r]; }

    mat_vec_et<self, fsize_vector<T, Cols> > operator*(const fsize_vector<T, Cols>& v) const
    {
        return mat_vec_et<self, fsize_vector<T, Cols> >(*this, v);
    }
  private:
    T data[Rows][Cols];
};
The bracket operator returns a pointer for the sake of simplicity, but a good implementation should return a proxy that allows for checking the column index. The multiplication with a vector is realized by means of an expression template so that the result vector is not copied. The vector assignment then needs a specialization for the expression template: 17
template <typename T, int Size>
class fsize_vector
{
    template <typename Matrix, typename Vector>
    self& operator=( const mat_vec_et<Matrix, Vector>& that )
    {
        typedef mat_vec_et<Matrix, Vector> et;
        fsize_mat_vec_mult<Matrix::my_rows-1, Matrix::my_cols-1>()(that.A, that.v, *this);
        return *this;
    }
};
The functor fsize_mat_vec_mult must now compute the matrix vector product from its three arguments. The general implementation of the functor reads:
template <int Rows, int Cols>
struct fsize_mat_vec_mult
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult<Rows, Cols-1>()(A, v_in, v_out);
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];
    }
};
Again, the functor is only templatized on the sizes, and the container types are deduced. The operator assumes that all smaller column indices are already handled and that we can increment v_out[Rows] by A[Rows][Cols] * v_in[Cols]. In particular, we assume that the first operation on v_out[Rows] initializes it. Thus, we need a (partial) specialization for Cols = 0:
template <int Rows>
struct fsize_mat_vec_mult<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult<Rows-1, Matrix::my_cols-1>()(A, v_in, v_out);
        v_out[Rows]= A[Rows][0] * v_in[0];
    }
};
The careful reader has noticed the substitution of += by =. We also notice that we have to call the computation for the preceding row with all columns, and inductively for all smaller rows. The number of columns in the matrix is taken from an internal definition in the matrix type for the sake of simplicity. Passing it as an extra template argument or using a type trait would have been more general, because we are now limited to types where my_cols is defined in the class.
17 A better solution would be to implement all assignments with a functor and specialize the functor, because partial template specialization of functions does not always work as expected.
We still need a (full) specialization to terminate the recursion:<br />
template <>
struct fsize_mat_vec_mult<0, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        v_out[0]= A[0][0] * v_in[0];
    }
};
With the inlining, our program will execute the operation w= A * v for vectors of size 4 as:
w[0]= A[0][0] * v[0];
w[0]+= A[0][1] * v[1];
w[0]+= A[0][2] * v[2];
w[0]+= A[0][3] * v[3];
w[1]= A[1][0] * v[0];
w[1]+= A[1][1] * v[1];
w[1]+= A[1][2] * v[2];
w[1]+= A[1][3] * v[3];
w[2]= A[2][0] * v[0];
w[2]+= A[2][1] * v[1];
w[2]+= A[2][2] * v[2];
w[2]+= A[2][3] * v[3];
w[3]= A[3][0] * v[0];
w[3]+= A[3][1] * v[1];
w[3]+= A[3][2] * v[2];
w[3]+= A[3][3] * v[3];
Our tests have shown that such an implementation is indeed faster than the compiler's optimization of the loops. 18
Increasing Concurrency<br />
A disadvantage of the preceding implementation is that all operations on an entry of the target vector are performed in one sweep. Therefore, the second operation must wait for the first, the third for the second, and so on. The fifth operation can be done in parallel with the fourth, the ninth with the eighth, but this is not satisfying. We would like more concurrency in our program to enable the parallel pipelines of superscalar processors. Again, we can twiddle our thumbs and hope that the compiler will reorder the statements, or take it into our own hands. More concurrency is provided by the following operation sequence:
w[0]= A[0][0] * v[0];
w[1]= A[1][0] * v[0];
w[2]= A[2][0] * v[0];
w[3]= A[3][0] * v[0];
w[0]+= A[0][1] * v[1];
w[1]+= A[1][1] * v[1];
w[2]+= A[2][1] * v[1];
w[3]+= A[3][1] * v[1];
w[0]+= A[0][2] * v[2];
w[1]+= A[1][2] * v[2];
w[2]+= A[2][2] * v[2];
w[3]+= A[3][2] * v[2];
w[0]+= A[0][3] * v[3];
w[1]+= A[1][3] * v[3];
w[2]+= A[2][3] * v[3];
w[3]+= A[3][3] * v[3];
18 TODO: Give numbers.
We only need to reorganize our functor. The general template now reads:
template <int Rows, int Cols>
struct fsize_mat_vec_mult_cm
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult_cm<Rows-1, Cols>()(A, v_in, v_out);
        v_out[Rows]+= A[Rows][Cols] * v_in[Cols];
    }
};
Now, we need a partial specialization for row 0 that goes to the next column:
template <int Cols>
struct fsize_mat_vec_mult_cm<0, Cols>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult_cm<Matrix::my_rows-1, Cols-1>()(A, v_in, v_out);
        v_out[0]+= A[0][Cols] * v_in[Cols];
    }
};
The partial specialization <strong>for</strong> column 0 is also needed to initialize the entry of the output vector:<br />
template <int Rows>
struct fsize_mat_vec_mult_cm<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        fsize_mat_vec_mult_cm<Rows-1, 0>()(A, v_in, v_out);
        v_out[Rows]= A[Rows][0] * v_in[0];
    }
};
Finally, we still need a specialization <strong>for</strong> row and column 0 to terminate the recursion. This<br />
can be reused from the previous functor:<br />
template <>
struct fsize_mat_vec_mult_cm<0, 0>
  : fsize_mat_vec_mult<0, 0> {};
Using Registers<br />
Another feature of modern processors must be kept in mind: cache coherency. Processors are nowadays designed to share memory while maintaining consistency in their caches. As a result, every time we write into a data structure in memory, like our vector w, a cache invalidation signal is sent on the bus — even if no other processor is present. Unfortunately, this slows down the computation perceivably (from our experience).
Fortunately, this can often be avoided in a rather simple way: by introducing a temporary in a function, which resides in a register if the type allows. We can rely on the compiler to decide reasonably where to place temporaries.
This implementation requires two classes: one <strong>for</strong> the outer and one <strong>for</strong> the inner loop. Let us<br />
start with the outer loop:<br />
1 template <int Rows>
2 struct fsize_mat_vec_mult_reg
3 {
4     template <typename Matrix, typename VecIn, typename VecOut>
5     void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
6     {
7         fsize_mat_vec_mult_reg<Rows-1>()(A, v_in, v_out);
8
9         typename VecOut::value_type tmp;
10        fsize_mat_vec_mult_aux<Rows, Matrix::my_cols-1>()(A, v_in, tmp);
11        v_out[Rows]= tmp;
12    }
13 };
We assume that fsize_mat_vec_mult_aux is defined or declared before this class. The statement in line 7 calls the computations on the preceding rows. A temporary is defined in line 9, with the hope that it will reside in a register. Then we call the computation within this row (line 10). The temporary is passed by reference to an inline function so that the summation will be performed in a register. In line 11 we write the result back to v_out. This still causes an invalidation signal on the bus, but only once for each entry.
The functor must be specialized <strong>for</strong> row 0 to avoid infinite loops:<br />
template <>
struct fsize_mat_vec_mult_reg<0>
{
    template <typename Matrix, typename VecIn, typename VecOut>
    void operator()(const Matrix& A, const VecIn& v_in, VecOut& v_out)
    {
        typename VecOut::value_type tmp;
        fsize_mat_vec_mult_aux<0, Matrix::my_cols-1>()(A, v_in, tmp);
        v_out[0]= tmp;
    }
};
Within each row we iterate over the columns and increment the temporary (hopefully held in a register):

5.4. META-TUNING: WRITE YOUR OWN COMPILER OPTIMIZATION 165

template <unsigned Rows, unsigned Cols>
struct fsize_mat_vec_mult_aux
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)
    {
        fsize_mat_vec_mult_aux<Rows, Cols-1>()(A, v_in, tmp);
        tmp+= A[Rows][Cols] * v_in[Cols];
    }
};
To terminate the iteration over the columns, we write a specialization:
template <unsigned Rows>
struct fsize_mat_vec_mult_aux<Rows, 0>
{
    template <typename Matrix, typename VecIn, typename ScalOut>
    void operator()(const Matrix& A, const VecIn& v_in, ScalOut& tmp)
    {
        tmp= A[Rows][0] * v_in[0];
    }
};
In this section we showed different ways to optimize a two-dimensional loop (with fixed sizes). There are certainly more possibilities: for instance, we could try an implementation that uses registers but retains the same concurrency as the second-to-last implementation. Another form of optimization could be to agglomerate the write-backs so that multiple invalidation signals are sent at a time and are perhaps less disruptive.
5.4.3 Dynamic Unrolling – Warm up

⇒ vector_unroll_example.cpp

As important as fixed-size optimization is, acceleration for dynamically sized containers is needed even more. We start here with a simple example and some observations. We will reuse the vector class from Listing 4.1. To show the implementation more clearly, we write the code without operators and expression templates. Our test case will compute

u = 3v + w

for three short vectors of size 1000. The wall clock time will be measured with boost::timer.19 The vectors v and w will be initialized, and to have the data ready to use (i.e. the vectors are definitely in cache) we run a few additional untimed operations:
#include <iostream>
#include <boost/timer.hpp>
// ...

int main(int argc, char* argv[])
{
    unsigned s= 1000;
    if (argc > 1) s= atoi(argv[1]); // read size (potentially) from command line

    vector<float> u(s), v(s), w(s);
    for (unsigned i= 0; i < s; i++) {
        v[i]= float(i);
        w[i]= float(2*i + 15);
    }
    for (unsigned j= 0; j < 3; j++)
        for (unsigned i= 0; i < s; i++)
            u[i]= 3.0f * v[i] + w[i];

    const unsigned rep= 200000;
    boost::timer native;
    for (unsigned j= 0; j < rep; j++)
        for (unsigned i= 0; i < s; i++)
            u[i]= 3.0f * v[i] + w[i];
    std::cout << "Compute time native loop is " << 1000000.0 * native.elapsed() / double(rep) << " µs.\n";
    return 0;
}

19 See http://www.boost.org/doc/libs/1_43_0/libs/timer/timer.htm

166 CHAPTER 5. META-PROGRAMMING
Alternatively, we compute this with the loop unrolled by a factor of 4:
for (unsigned j= 0; j < rep; j++)
    for (unsigned i= 0; i < s; i+= 4) {
        u[i]= 3.0f * v[i] + w[i];
        u[i+1]= 3.0f * v[i+1] + w[i+1];
        u[i+2]= 3.0f * v[i+2] + w[i+2];
        u[i+3]= 3.0f * v[i+3] + w[i+3];
    }
This code will obviously only work if the vector size is divisible by 4. To avoid errors we could add an assertion on the vector size, but this is not really satisfying. Instead, we generalize this implementation to arbitrary vector sizes:
boost::timer unrolled;
for (unsigned j= 0; j < rep; j++) {
    unsigned sb= s / 4 * 4;
    for (unsigned i= 0; i < sb; i+= 4) {
        u[i]= 3.0f * v[i] + w[i];
        u[i+1]= 3.0f * v[i+1] + w[i+1];
        u[i+2]= 3.0f * v[i+2] + w[i+2];
        u[i+3]= 3.0f * v[i+3] + w[i+3];
    }
    for (unsigned i= sb; i < s; i++)
        u[i]= 3.0f * v[i] + w[i];
}
std::cout << "Compute time unrolled loop is " << 1000000.0 * unrolled.elapsed() / double(rep) << " µs.\n";
std::cout << "u is " << u << '\n';

Listing 5.4: Unrolled computation of u = 3v + w

The little program was compiled with g++ 4.1.2 with the flags -O3 -ffast-math -DNDEBUG
and resulted on the test computer21 in:

Compute time native loop is 2.64 µs.
Compute time unrolled loop is 1.15 µs.
As an alternative to our hand-coded unrolling, we can use the compiler flag -funroll-loops. This results in the following execution times on the test machine:

Compute time native loop is 2.51 µs.
Compute time unrolled loop is 1.22 µs.
The original loop became slightly faster while our optimized version slowed down a bit. We see an entirely different behavior if we replace the size s by a constant:

const unsigned s= 1000;

In this case the compiler knows the size of the loops, and it might be easier to transform the loop or to determine that a transformation is beneficial.
Compute time native loop is 1.6 µs.
Compute time unrolled loop is 1.55 µs.

Now the native loop is clearly accelerated by the compiler optimization. Why our hand-written unrolling is slower than before is not clear. Apparently, the manual and the automatic optimizations got into conflict, or the latter overrode the former.
Discussion 5.2 Software tuning and benchmarking is an art of its own with today's complex compiler optimizations. The tiniest modification in the source can change the run-time behavior of an examined computation. In the example it should not have mattered whether the size is known at compile time or not. But it did. Especially when the code is compiled without -DNDEBUG, the compiler might omit the index check in some situations and perform it in others. It is also important to print out computed values (and filter them out with grep or the like) because the compiler might omit an entire computation when it is obvious that the result is not needed. Such optimizations happen in particular when the results are intrinsic types, whereas computations on user-defined types are usually not subject to such omissions (but one should not count on it).
The goal of this section is not to determine why which code is how much faster than another. Besides, each compiler has a different sensitivity to sizes and flags, so that we would need a different line of argumentation for each of them. The only conclusion we would like to draw from these observations is that, despite all the progress in compiler technology, we cannot rely on it blindly; we still need hand-tuned implementations and careful benchmarking when maximal performance is needed. On the other hand, program snippets as in the last listing should not appear in scientific applications for the sake of readability, maintainability, portability, . . .
Another question we have not raised so far is: what is the optimal block size for the unrolling?

• Does it depend on the expression?
• Does it depend on the types of the arguments?
• Does it depend on the computer architecture?
21 Phenom II X2 545, 3.0 GHz, 3600 MHz PSB, 7 MB total cache, Socket AM2, 2x 2GB DDR2-800
The answer is yes, to all of them. The main reason (but not the only one) is that different processors have different numbers of registers. How many registers are needed in one iteration depends on the expression and on the types (a complex value needs more registers than a float). In the following section we will address both issues: how to encapsulate the transformation so that it does not show up in the application, and how we can change the block size without rewriting the loop.
5.4.4 Unrolling Vector Expressions

For easier understanding, we discuss the abstraction in meta-tuning step by step. We start with the previous loop and implement a function for it. Say the function's name is my_axpy and it has a template argument for the block size so that we can write, for instance:

for (unsigned j= 0; j < rep; j++)
    my_axpy<4>(u, v, w);

This function shall contain an unrolled main loop with customizable block size and a clean-up loop at the end:
template <unsigned BSize, typename U, typename V, typename W>
void my_axpy(U& u, const V& v, const W& w)
{
    assert(u.size() == v.size() && v.size() == w.size());
    unsigned s= u.size(), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        my_axpy_ftor<0, BSize>()(u, v, w, i);
    for (unsigned i= sb; i < s; i++)
        u[i]= 3.0f * v[i] + w[i];
}
As mentioned before, deduced template types, the vector types in our case, must come at the end of the template parameter list, and the explicitly given arguments, in our case the block size, must be at the beginning. The implementation of the block statement in the first loop can be realized similarly to the functor in Section 5.4.1. We deviate a bit from that implementation by using two template arguments, where the first is increased until it is equal to the second. This approach appeared to yield faster binaries on gcc than using only one argument and counting it down to zero. In addition, the two-argument version is more consistent with the multi-dimensional implementation in Section ??. As for fixed-size unrolling, we need a recursive template definition. Within operator(), a single statement is performed and the following statements are called:
template <unsigned Offset, unsigned Max>
struct my_axpy_ftor
{
    template <typename U, typename V, typename W>
    void operator()(U& u, const V& v, const W& w, unsigned i)
    {
        u[i+Offset]= 3.0f * v[i+Offset] + w[i+Offset];
        my_axpy_ftor<Offset+1, Max>()(u, v, w, i);
    }
};
The only difference to fixed-size unrolling is that the indices are relative to an argument, here i. The operator() is first called with Offset equal to 0, then with 1, 2, . . . Since each call is inlined, the functor call results in one monolithic block of operations without loop control or function calls. Thus, the call of my_axpy_ftor<0, 4>()(u, v, w, i) performs the same operations as one iteration of the first loop in Listing 5.4.
Of course, this compilation would end in infinite recursion if we forgot to specialize it for Offset equal to Max:
template <unsigned Max>
struct my_axpy_ftor<Max, Max>
{
    template <typename U, typename V, typename W>
    void operator()(U& u, const V& v, const W& w, unsigned i) {}
};
Performing the considered vector operation with different unrolling sizes yields:

Compute time unrolled loop is 1.44 µs.
Compute time unrolled loop is 1.15 µs.
Compute time unrolled loop is 1.15 µs.
Compute time unrolled loop is 1.14 µs.
Now we can call this operation for any block size we like. On the other hand, it is rather cumbersome to implement the corresponding functions and functors for each vector expression. Therefore, we now combine this technique with expression templates.
5.4.5 Tuning an Expression Template

⇒ vector_unroll_example2.cpp

Let us recall Section 5.3.3. So far, we have developed a vector class with expression templates for vector sums. In the same manner we can implement the product of a scalar and a vector, but we leave this as an exercise and consider expressions with addition only, for example:

u = v + v + w

Now we frame this vector operation with a repeating loop and the time measurement:
boost::timer t;
for (unsigned j= 0; j < rep; j++)
    u= v + v + w;
std::cout << "Compute time is " << 1000000.0 * t.elapsed() / double(rep) << " µs.\n";

This results in:

Compute time is 1.72 µs.
To incorporate meta-tuning into expression templates, we only need to modify the actual assignment because only there a loop is performed. All other operations (well, so far we have only the sum, but in principle there could be tons of them) only return objects with references. The loop in operator= is split into an unrolled part at the beginning and a one-by-one completion at the end:
template <typename T>
class vector
{
    template <typename Src>
    vector& operator=(const Src& that)
    {
        check_size(size(that));
        unsigned s= my_size, sb= s / 4 * 4;

        for (unsigned i= 0; i < sb; i+= 4)
            assign<0, 4>()(*this, that, i);
        for (unsigned i= sb; i < s; i++)
            data[i]= that[i];
        return *this;
    }
};
The assign functor is realized analogously to my_axpy_ftor:
template <unsigned Offset, unsigned Max>
struct assign
{
    template <typename U, typename V>
    void operator()(U& u, const V& v, unsigned i)
    {
        u[i+Offset]= v[i+Offset];
        assign<Offset+1, Max>()(u, v, i);
    }
};

template <unsigned Max>
struct assign<Max, Max>
{
    template <typename U, typename V>
    void operator()(U& u, const V& v, unsigned i) {}
};
Computing the expression above now yields:

Compute time is 1.37 µs.
With this rather simple modification, we have now accelerated all vector expression templates. In comparison with the previous implementation, however, we lost the flexibility to customize the loop unrolling. The functor assign has two arguments and thus allows for customization. The problem is the assignment operator. In principle, we can define an explicit template argument there:
template <unsigned BSize, typename Src>
vector& operator=(const Src& that)
{
    check_size(size(that));
    unsigned s= my_size, sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        assign<0, BSize>()(*this, that, i);
    for (unsigned i= sb; i < s; i++)
        data[i]= that[i];
    return *this;
}
The drawback is that we cannot use the symbol '=' naturally as an infix operator but must write:

u.operator=<4>(v + v + w);

This has in fact a certain geeky charm, and one could also argue that people did (and still do) more painful things for performance. Nonetheless, it does not meet our ideals of intuitiveness and readability.
Alternative notations are:

unroll<4>(u= v + v + w);

or

unroll<4>(u)= v + v + w;
Both versions are implementable and provide comparable intuitiveness. The former expresses more correctly what we are doing, while the latter is easier to implement and keeps the structure of the computed expression better visible. Therefore, we show the realization of the second form.
The function unroll is simple to implement: it just returns an object with a reference to the vector and type information for the unroll size:
template <unsigned BSize, typename Vector>
unroll_vector<BSize, Vector> inline unroll(Vector& v)
{
    return unroll_vector<BSize, Vector>(v);
}

The class unroll_vector is not complicated either. It only needs to take a reference to the target vector and an assignment operator:
template <unsigned BSize, typename V>
class unroll_vector
{
  public:
    unroll_vector(V& ref) : ref(ref) {}

    template <typename Src>
    V& operator=(const Src& that)
    {
        assert(size(ref) == size(that));
        unsigned s= size(ref), sb= s / BSize * BSize;

        for (unsigned i= 0; i < sb; i+= BSize)
            assign<0, BSize>()(ref, that, i);
        for (unsigned i= sb; i < s; i++)
            ref[i]= that[i];
        return ref;
    }
  private:
    V& ref;
};
Evaluating the considered vector expression for some block sizes yields:

Compute time unroll(u)= v + v + w is 1.72 µs.
Compute time unroll(u)= v + v + w is 1.52 µs.
Compute time unroll(u)= v + v + w is 1.36 µs.
Compute time unroll(u)= v + v + w is 1.37 µs.
Compute time unroll(u)= v + v + w is 1.4 µs.

These few benchmarks are consistent with the previous results, i.e. unroll<1> is equal to the canonical implementation and unroll<4> is as fast as the hard-wired unrolling.
5.4.6 Tuning Reduction Operations

Reducing on a Single Variable

⇒ reduction_unroll_example.cpp

In the preceding vector operations, the i-th entry of each vector was handled independently of any other entry. In reduction operations, the entries are related by one or more temporary variables, and these temporaries can become a serious bottleneck.
First, we test whether a reduction operation, say the discrete L1 norm (also known as the Manhattan norm), can be accelerated by the techniques from Section 5.4.4. We implement the one_norm function in terms of a functor for the iteration block:
function in terms of a functor <strong>for</strong> the iteration block:<br />
template <br />
typename Vector::value type<br />
inline one norm(const Vector& v)<br />
{<br />
using std::abs;<br />
typename Vector::value type sum(0);<br />
unsigned s= size(v), sb= s / BSize ∗ BSize;<br />
}<br />
<strong>for</strong> (unsigned i= 0; i < sb; i+= BSize)<br />
one norm ftor()(sum, v, i);<br />
<strong>for</strong> (unsigned i= sb; i < s; i++)<br />
sum+= abs(v[i]);<br />
return sum;
The functor is implemented in the same manner as before:
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i) {}
};
The measured run-time behavior is:

Compute time one_norm(v) is 7.42 µs.
Compute time one_norm(v) is 3.64 µs.
Compute time one_norm(v) is 1.9 µs.
Compute time one_norm(v) is 1.25 µs.
Compute time one_norm(v) is 1.03 µs.

This is already a good improvement, but maybe we can do better.
Reducing on an Array

⇒ reduction_unroll_array_example.cpp

When we look at the previous computation, we see that a different entry of v is used in each iteration, but every computation accesses the same temporary variable sum, and this limits the concurrency. To provide more concurrency, we can use multiple temporaries,24 for instance in an array. The modified function then reads:
template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type sum[BSize];
    for (unsigned i= 0; i < BSize; i++)
        sum[i]= 0;
    unsigned s= size(v), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(sum, v, i);
    for (unsigned i= 1; i < BSize; i++)
        sum[0]+= sum[i];
    for (unsigned i= sb; i < s; i++)
        sum[0]+= abs(v[i]);
    return sum[0];
}

24 Strictly speaking, this is not true for every possible scalar type we can think of. The addition of the sum type must be a commutative monoid because we change the evaluation order. This holds of course for all intrinsic numeric types and certainly for almost all user-defined arithmetic types. But one is free to define an addition that is not commutative or not monoidal. In this case our transformation would be wrong. To deal with such exceptions we need semantic concepts, which will hopefully become part of C++ in the coming years.
The corresponding functor must reference the right element of the sum array:
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S* sum, const V& v, unsigned i)
    {
        using std::abs;
        sum[Offset]+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S* sum, const V& v, unsigned i) {}
};
On the test machine this took:

Compute time one_norm(v) is 7.33 µs.
Compute time one_norm(v) is 5.15 µs.
Compute time one_norm(v) is 2 µs.
Compute time one_norm(v) is 1.4 µs.
Compute time one_norm(v) is 1.16 µs.

This is even a bit slower than the version with one variable. Maybe an array is more expensive to pass as an argument, even to an inline function. Let us try something else.
Reducing on a Nested Class Object

⇒ reduction_unroll_nesting_example.cpp

To avoid arrays, we can define a class for n temporary variables, where n is a template argument. Such a class is designed more consistently with the recursive scheme of the functors:
template <unsigned Size, typename Value>
struct multi_tmp
{
    typedef multi_tmp<Size-1, Value> sub_type;

    multi_tmp(const Value& v) : value(v), sub(v) {}

    Value    value;
    sub_type sub;
};

template <typename Value>
struct multi_tmp<0, Value>
{
    multi_tmp(const Value& v) {}
};
An object of this type can be recursively initialized, so that we do not need a loop as for the array. A functor can operate on the value member and pass a reference to the sub member to its successor. This leads us to the following implementation of our functor:
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i)
    {
        using std::abs;
        sum.value+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(sum.sub, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& sum, const V& v, unsigned i) {}
};
The unrolled function that uses this functor reads:

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typedef typename Vector::value_type value_type;
    multi_tmp<BSize, value_type> multi_sum(0);
    unsigned s= size(v), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(multi_sum, v, i);
    value_type sum= multi_sum.sum();

    for (unsigned i= sb; i < s; i++)
        sum+= abs(v[i]);
    return sum;
}
There is one piece still missing: we need to reduce the partial sums in multi_sum. Unfortunately, we cannot write a loop over the members of multi_sum, so we need a recursive function that dives down into multi_sum. This would be a bit cumbersome as a free function, especially as function templates cannot be partially specialized. As a member function, it is much easier, and the specialization happens more safely on the class level:
template <unsigned Size, typename Value>
struct multi_tmp
{
    Value sum() const { return value + sub.sum(); }
};

template <typename Value>
struct multi_tmp<0, Value>
{
    Value sum() const { return 0; }
};
Note that we start the summation with 0, not with the innermost value member. We could do the latter, but then we would need another specialization for multi_tmp<1, Value>. Likewise, we can implement a general reduction, but then, as in std::accumulate, we need an initial element:
template <unsigned Size, typename Value>
struct multi_tmp
{
    template <typename Op>
    Value reduce(Op op, const Value& init) const { return op(value, sub.reduce(op, init)); }
};

template <typename Value>
struct multi_tmp<0, Value>
{
    template <typename Op>
    Value reduce(Op, const Value& init) const { return init; }
};
The compute time of this version is:

Compute time one_norm(v) is 7.47 µs.
Compute time one_norm(v) is 1.14 µs.
Compute time one_norm(v) is 0.71 µs.
Compute time one_norm(v) is 0.75 µs.
Compute time one_norm(v) is 1.01 µs.
Pushing Temporaries into Registers

⇒ reduction_unroll_registers_example.cpp

Earlier experiments with older compilers (gcc 3.4) exposed a serious overhead for using arrays or nested classes; those versions were finally even slower than using one single variable. The reason was probably that the compiler could not keep these types in registers.
The most likely way to store temporaries in registers is to declare them as separate variables:

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    typename Vector::value_type s0(0), s1(0), s2(0), ...
}
As one can see, the problem is how many variables to declare. The number cannot depend on the template argument but must be fixed for all sizes (unless one writes a different implementation for each number, undermining the expressiveness of templates). Thus, we have to settle on a certain number of variables, say 8. Then we cannot unroll more than eight times.
The next issue we run into is the number of function arguments. When we call the iteration block, we pass all variables (registers):
for (unsigned i= 0; i < sb; i+= BSize)
    one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);
The first calculation in such a block is performed on s0, and s1 to s7 are only passed on to the functors for the following computations. After this, the second computation would have to accumulate on the second function argument, the third calculation on the third argument, and so on. This is unfortunately not implementable with templates (only with very ugly and highly error-prone source code manipulation by macros).
Alternatively, each computation could be performed on its first function argument, with subsequent functors called with the first argument omitted:

one_norm_ftor<1, BSize>()(s1, s2, s3, s4, s5, s6, s7, v, i);
one_norm_ftor<2, BSize>()(s2, s3, s4, s5, s6, s7, v, i);
one_norm_ftor<3, BSize>()(s3, s4, s5, s6, s7, v, i);

This is not realizable with templates either.
The solution is to rotate the references to the registers:

one_norm_ftor<1, BSize>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);
one_norm_ftor<2, BSize>()(s2, s3, s4, s5, s6, s7, s0, s1, v, i);
one_norm_ftor<3, BSize>()(s3, s4, s5, s6, s7, s0, s1, s2, v, i);

This rotation is achieved by the following functor implementation:
template <unsigned Offset, unsigned Max>
struct one_norm_ftor
{
    template <typename S, typename V>
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7, const V& v, unsigned i)
    {
        using std::abs;
        s0+= abs(v[i+Offset]);
        one_norm_ftor<Offset+1, Max>()(s1, s2, s3, s4, s5, s6, s7, s0, v, i);
    }
};

template <unsigned Max>
struct one_norm_ftor<Max, Max>
{
    template <typename S, typename V>
    void operator()(S& s0, S& s1, S& s2, S& s3, S& s4, S& s5, S& s6, S& s7, const V& v, unsigned i) {}
};
The corresponding one_norm function based on this functor is straightforward:

template <unsigned BSize, typename Vector>
typename Vector::value_type
inline one_norm(const Vector& v)
{
    using std::abs;
    typename Vector::value_type s0(0), s1(0), s2(0), s3(0), s4(0), s5(0), s6(0), s7(0);
    unsigned s= size(v), sb= s / BSize * BSize;

    for (unsigned i= 0; i < sb; i+= BSize)
        one_norm_ftor<0, BSize>()(s0, s1, s2, s3, s4, s5, s6, s7, v, i);
    s0+= s1 + s2 + s3 + s4 + s5 + s6 + s7;

    for (unsigned i= sb; i < s; i++)
        s0+= abs(v[i]);
    return s0;
}
A slight disadvantage is that all eight temporaries must be accumulated after the blocked loop, no matter how small BSize is and how short the vector is. A great advantage of the rotation is that BSize is not limited to the number of temporary variables in such accumulations: if BSize is larger, then some or all variables are used multiple times without corrupting the result. The number of temporaries is nonetheless a limiting factor for the concurrency.
On the test machine, the execution of this implementation takes:

Compute time one_norm(v) is 6.77 µs.
Compute time one_norm(v) is 1.13 µs.
Compute time one_norm(v) is 0.71 µs.
Compute time one_norm(v) is 0.75 µs.
Compute time one_norm(v) is 1.07 µs.

This is comparable with the nested class (in this environment).
Résumé on Reduction Tuning

The goal of this section was not to determine the ultimately tuned reduction implementation for superscalar processors.27 The main ambition of this section, in fact of the whole book, is to demonstrate the diversity of implementation opportunities. With the enormous expressiveness
27 In the presence of the new GPU cards with hundreds of cores and millions of threads, the fight for this little concurrency is not so impressive. Nonetheless, we will still need performance tuning on single-core and "few-core"
of C++, one can use (or abuse) the compiler to generate the most efficient version without rewriting the program sources, as one would need to in C or Fortran. The power of internal code generation with just the C++ compiler makes external code generation as in ATLAS28 unnecessary. In ATLAS, functions are written in a domain-specific language, and C programs29 in slight variations are generated with a tool and compared regarding performance. The techniques presented here empower us to generate binaries equivalent to those variations by just using a C++ compiler. Thus, we can tune our programs by changing template arguments or constants (that might be set platform-dependently).
5.4.7 Tuning Nested Loops

⇒ matrix_unroll_example.cpp

The most used (and abused) example in performance discussions is dense matrix multiplication. We do not claim to compete with hand-tuned assembler codes, but we show the power of meta-programming to generate code variations from a single implementation. As a starting point we use the templatized matrix class implementation from Section 3.7.4.
We begin our implementation with a simple test case:
int main()
{
    const unsigned s= 4; // s= 4 for testing and 128 for timing
    matrix<double> A(s, s), B(s, s), C(s, s);

    for (unsigned i= 0; i < s; i++)
        for (unsigned j= 0; j < s; j++) {
            A(i, j)= 100.0 * i + j;
            B(i, j)= 200.0 * i + j;
        }
    mult(A, B, C);
    std::cout << "C is " << C << '\n';
}
A matrix multiplication is easily implemented with three nested loops. One of the 6 possible<br />
nestings is a dot-product-like calculation of each entry from C:<br />
cik = Ai · B k<br />
where Ai is the i th row of A and Bk the k th column of B. We use a temporary in the innermost<br />
loop to decrease the cache-invalidation overhead of writing to C’s elements in each operation:<br />
template <br />
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)<br />
{<br />
assert(A.num rows() == B.num rows()); // ...<br />
machines at least <strong>for</strong> some years since not everybody has GPU card <strong>for</strong> numerics and not every algorithm is<br />
already successfully ported (e.g. incomplete LU on arbitrary sparse matrices). By the time of this writing their<br />
is not even support <strong>for</strong> std::complex.<br />
28 http://math-atlas.source<strong>for</strong>ge.net/<br />
29 In some cases the C programs contain assembler snippets <strong>for</strong> a given plat<strong>for</strong>m in order to achieve per<strong>for</strong>mance<br />
close to peak.
180 CHAPTER 5. META-PROGRAMMING<br />
}<br />
typedef typename Matrix::value type value type;<br />
unsigned s= A.num rows();<br />
<strong>for</strong> (unsigned i= 0; i < s; i++)<br />
<strong>for</strong> (unsigned k= 0; k < s; k++) {<br />
value type tmp(0);<br />
<strong>for</strong> (unsigned j= 0; j < s; j++)<br />
tmp+= A(i, j) ∗ B(j, k);<br />
C(i, k)= tmp;<br />
}<br />
For this implementation, we write a benchmark function:

template <typename Matrix>
void bench(const Matrix& A, const Matrix& B, Matrix& C, const unsigned rep)
{
    boost::timer t1;
    for (unsigned j= 0; j < rep; j++)
        mult(A, B, C);
    double t= t1.elapsed() / double(rep);

    unsigned s= A.num_rows();
    std::cout << "Compute time mult(A, B, C) is "
              << 1000000.0 * t << " µs. This are "
              << s * s * (2*s - 1) / t / 1000000.0 << " MFlops.\n";
}

The run time and performance of our canonical implementation (with 128 × 128 matrices) is:

Compute time mult(A, B, C) is 5290 µs. This are 789.777 MFlops.

This implementation is our reference regarding performance and results.
For the development of the unrolled implementation we go back to 4 × 4 matrices. In contrast
to Section 5.4.6 we do not unroll a single reduction but perform multiple reductions in parallel.
For the three loops this means unrolling the two outer loops and replacing the body of the
inner loop by multiple operations. The latter we achieve, as usual, with a functor.

As in the canonical implementation, the reduction shall not be performed on elements of C
but in temporaries. For this purpose we use the class multi_tmp from § 5.4.6. For the sake of
simplicity we limit ourselves to matrix sizes that are multiples of the unroll parameters. 30 An
unrolled matrix multiplication is shown in the following code:
template <unsigned Size0, unsigned Size1, typename Matrix>
void inline mult(const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.num_rows() == B.num_rows()); // ...
    assert(A.num_rows() % Size0 == 0);    // we omitted cleanup here
    assert(A.num_cols() % Size1 == 0);    // we omitted cleanup here

    typedef typename Matrix::value_type value_type;
    unsigned s= A.num_rows();

    mult_block<0, Size0-1, 0, Size1-1> block;
    for (unsigned i= 0; i < s; i+= Size0)
        for (unsigned k= 0; k < s; k+= Size1) {
            multi_tmp<Size0 * Size1, value_type> tmp(value_type(0));
            for (unsigned j= 0; j < s; j++)
                block(tmp, A, B, i, j, k);
            block.update(tmp, C, i, k);
        }
}

30 A full implementation for arbitrary matrix sizes is realized in MTL4.
We still owe the reader the implementation of the functor mult_block. The techniques are the
same as in the vector operations, but we have to deal with more indices and their respective
limits:

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct mult_block
{
    typedef mult_block<Index0, Max0, Index1+1, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["
                  << k + Index1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Index1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct mult_block<Index0, Max0, Max1, Max1>
{
    typedef mult_block<Index0+1, Max0, 0, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["
                  << k + Max1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Max1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Max1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

template <unsigned Max0, unsigned Max1>
struct mult_block<Max0, Max0, Max1, Max1>
{
    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Max0 << "][" << j << "] * B[" << j << "]["
                  << k + Max1 << "]\n";
        tmp.value+= A(i + Max0, j) * B(j, k + Max1);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Max0 << "][" << k + Max1 << "]= tmp." << tmp.bs << "\n";
        C(i + Max0, k + Max1)= tmp.value;
    }
};
In order to verify that all operations are performed, we log them completely but look here only
at tmp.4 and tmp.3:
tmp.4+= A[1][0] * B[0][0]<br />
tmp.3+= A[1][0] * B[0][1]<br />
tmp.4+= A[1][1] * B[1][0]<br />
tmp.3+= A[1][1] * B[1][1]<br />
tmp.4+= A[1][2] * B[2][0]<br />
tmp.3+= A[1][2] * B[2][1]<br />
tmp.4+= A[1][3] * B[3][0]<br />
tmp.3+= A[1][3] * B[3][1]<br />
C[1][0]= tmp.4<br />
C[1][1]= tmp.3<br />
tmp.4+= A[3][0] * B[0][0]<br />
tmp.3+= A[3][0] * B[0][1]<br />
tmp.4+= A[3][1] * B[1][0]<br />
tmp.3+= A[3][1] * B[1][1]<br />
tmp.4+= A[3][2] * B[2][0]<br />
tmp.3+= A[3][2] * B[2][1]<br />
tmp.4+= A[3][3] * B[3][0]<br />
tmp.3+= A[3][3] * B[3][1]<br />
C[3][0]= tmp.4<br />
C[3][1]= tmp.3
This log shows that C[1][0] and C[1][1] are computed alternately, so the computations can be
performed in parallel on a superscalar computer. One can also verify that

    c_ik = Σ_{j=0}^{3} a_ij b_jk.
Printing C will also show the same result as for the canonical matrix multiplication.

The implementation above can be simplified. The first functor specialization differs from the
general functor only in how the indices are incremented. We can factor this out with an
additional loop class:

template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct loop2
{
    static const unsigned next_index0= Index0, next_index1= Index1 + 1;
};

template <unsigned Index0, unsigned Max0, unsigned Max1>
struct loop2<Index0, Max0, Max1, Max1>
{
    static const unsigned next_index0= Index0 + 1, next_index1= 0;
};
Such a general class has a high potential for reuse. With this class we can fuse the functor
template and the first specialization:
template <unsigned Index0, unsigned Max0, unsigned Index1, unsigned Max1>
struct mult_block
{
    typedef loop2<Index0, Max0, Index1, Max1> l;
    typedef mult_block<l::next_index0, Max0, l::next_index1, Max1> next;

    template <typename Tmp, typename Matrix>
    void operator()(Tmp& tmp, const Matrix& A, const Matrix& B, unsigned i, unsigned j, unsigned k)
    {
        std::cout << "tmp." << tmp.bs << "+= A[" << i + Index0 << "][" << j << "] * B[" << j << "]["
                  << k + Index1 << "]\n";
        tmp.value+= A(i + Index0, j) * B(j, k + Index1);
        next()(tmp.sub, A, B, i, j, k);
    }

    template <typename Tmp, typename Matrix>
    void update(const Tmp& tmp, Matrix& C, unsigned i, unsigned k)
    {
        std::cout << "C[" << i + Index0 << "][" << k + Index1 << "]= tmp." << tmp.bs << "\n";
        C(i + Index0, k + Index1)= tmp.value;
        next().update(tmp.sub, C, i, k);
    }
};

The other specialization remains unaltered.
Last but not least, we would like to see the impact of our not-so-simple matrix product. The
benchmark yielded on our test machine:
Compute time mult(A, B, C) is 5250 µs. This are 795.794 MFlops.<br />
Compute time mult(A, B, C) is 2770 µs. This are 1508.27 MFlops.<br />
Compute time mult(A, B, C) is 1990 µs. This are 2099.46 MFlops.<br />
Compute time mult(A, B, C) is 2230 µs. This are 1873.51 MFlops.<br />
Compute time mult(A, B, C) is 2130 µs. This are 1961.46 MFlops.<br />
Compute time mult(A, B, C) is 2930 µs. This are 1425.91 MFlops.<br />
Compute time mult(A, B, C) is 2350 µs. This are 1777.84 MFlops.<br />
Compute time mult(A, B, C) is 3420 µs. This are 1221.61 MFlops.<br />
Compute time mult(A, B, C) is 4010 µs. This are 1041.88 MFlops.<br />
Compute time mult(A, B, C) is 2870 µs. This are 1455.72 MFlops.<br />
Compute time mult(A, B, C) is 3230 µs. This are 1293.47 MFlops.<br />
Compute time mult(A, B, C) is 3060 µs. This are 1365.33 MFlops.<br />
Compute time mult(A, B, C) is 2780 µs. This are 1502.85 MFlops.<br />
One can see that the first variant, unrolled with 1 × 1 blocks and thus not really unrolled, has
the same performance as the original implementation, which in fact performs the operations in
exactly the same order (as long as the compiler optimization does not change the order
internally). We also see that the unrolled versions are all faster, up to a speed-up of 2.6.

With double matrices, the overall performance is lower:
Compute time mult(A, B, C) is 10080 µs. This are 414.476 MFlops.<br />
Compute time mult(A, B, C) is 8700 µs. This are 480.221 MFlops.<br />
Compute time mult(A, B, C) is 7470 µs. This are 559.293 MFlops.<br />
Compute time mult(A, B, C) is 5910 µs. This are 706.924 MFlops.<br />
Compute time mult(A, B, C) is 3750 µs. This are 1114.11 MFlops.<br />
Compute time mult(A, B, C) is 5140 µs. This are 812.825 MFlops.<br />
Compute time mult(A, B, C) is 3420 µs. This are 1221.61 MFlops.<br />
Compute time mult(A, B, C) is 4590 µs. This are 910.222 MFlops.<br />
Compute time mult(A, B, C) is 4310 µs. This are 969.355 MFlops.<br />
Compute time mult(A, B, C) is 6280 µs. This are 665.274 MFlops.<br />
Compute time mult(A, B, C) is 5310 µs. This are 786.802 MFlops.<br />
Compute time mult(A, B, C) is 4290 µs. This are 973.874 MFlops.<br />
Compute time mult(A, B, C) is 3490 µs. This are 1197.11 MFlops.<br />
It shows that other parametrizations yield more acceleration here and that the performance
could almost be tripled.

Which configuration is best and why is, as mentioned before, not the topic of this script; we
only show programming techniques. The reader is invited to try this program on his/her own
computer. The technique in this section is intended for L1 cache usage. If matrices are larger,
one should use more levels of blocking. A general-purpose methodology for locality on L2, L3,
main memory, local disk, ... is recursion. This avoids reimplementation for each cache size and
performs reasonably well even in virtual memory, see for instance [?].
5.5 Exercises

5.5.1 Vector class

Revisit the vector example from §??.

Make an expression for a scalar times a vector:

class scalar_times_vector_expression {};

that inherits from base_vector. Use the inheritance mechanism to assign scalar_times_vector_expression
objects into vector.
5.5.2 Vector expression template

Make a vector concept, which you call Vector. Make a vector class (you can use std::vector)
that satisfies this concept. This vector class should have at least the following members:

class my_vector {
  public:
    typedef double value_type ;
  public:
    my_vector( int n ) ;
    // Copy constructor from the type itself
    my_vector( my_vector& ) ;
    // Constructor from a generic vector
    template <typename Vector>
    my_vector( Vector& ) ;
    // Assignment operator
    my_vector& operator=( my_vector const& v ) ;
    // Assignment for a generic Vector
    template <typename Vector>
    my_vector& operator=( Vector const& v ) ;

    value_type& operator() ( int i ) ;

  public: // Vector concept
    int size() const ;
    value_type operator() ( int i ) const ;
} ;
Make an expression for a scalar times a vector:

template <typename Scalar, typename Vector>
class scalar_times_vector_expression {} ;

template <typename Scalar, typename Vector>
scalar_times_vector_expression<Scalar, Vector> operator*( Scalar const& s, Vector const& v ) {
    return scalar_times_vector_expression<Scalar, Vector>( s, v ) ;
}

Put all classes and functions in the namespace athens. You can also make an expression
template for the addition of two vectors.
Write a small program, e.g.:

int main() {
    athens::my_vector v( 5 ) ;
    // ... fill in some values of v ...
    athens::my_vector w( 5 ) ;
    w = 5.0 * v ;
    w = 5.0 * (7.0 * v) ;
    w = v + 7.0 * v ;   // (if you have added the operator+)
}

Use the debugger to see what happens.
Chapter 6

Inheritance

C++ is a multi-paradigm language, and the paradigm most strongly associated with C++ is
'Object-Oriented Programming' (OOP). The authors nevertheless feel that it is not the most
important paradigm for scientific programming because it is inferior to generic programming
for two major reasons:

• Flexibility and
• Performance.

However, the impact of these two disadvantages is negligible in some situations. Performance
only deteriorates when we use virtual functions (§ 6.1).

OOP in combination with generic programming is a very powerful mechanism to provide a
form of reusability that neither of the paradigms can provide on its own (§ 6.3–§ 6.5).
6.1 Basic Principles

See section ?? from page ?? to page ??.
6.2 Dynamic Selection by Sub-typing

As a motivating example, consider how solvers are selected in AMDiS by means of a solver
base class. The MTL4 solvers are generic functions. AMDiS itself is only slightly generic, but
many decisions are made at run time (by means of pointers and virtual functions). So we
needed a way to call the generic functions while deciding at run time which one.
The dynamic solver selection can be done with classical C features like:

#include <iostream>
#include <cstdlib>

class matrix {};
class vector {};

void cg(const matrix& A, const vector& b, vector& x)
{
    std::cout << "CG\n";
}

void bicg(const matrix& A, const vector& b, vector& x)
{
    std::cout << "BiCG\n";
}

int main (int argc, char* argv[])
{
    matrix A;
    vector b, x;

    switch (std::atoi(argv[1])) {
      case 0: cg(A, b, x); break;
      case 1: bicg(A, b, x); break;
    }
    return 0;
}
This works, but it is not scalable with respect to source code complexity. If we call the solver
with other vectors and matrices somewhere else, we must copy the whole switch-case block
for each argument combination. This can be avoided by encapsulating the block in a function
and calling that function with different arguments. More complicated is the handling of
different preconditioners (diagonal, ILU, IC, ...) that are also selected dynamically. Shall we
copy a switch block for the preconditioners into each case block of the solvers?
An elegant solution is an abstract solver class and derived classes for the solvers:

struct solver
{
    virtual void operator()(const matrix& A, const vector& b, vector& x)= 0;
    virtual ~solver() {}
};

// potentially templatize
struct cg_solver : solver
{
    void operator()(const matrix& A, const vector& b, vector& x) { cg(A, b, x); }
};

struct bicg_solver : solver
{
    void operator()(const matrix& A, const vector& b, vector& x) { bicg(A, b, x); }
};
In the application we can define one or multiple pointers of type solver* and assign them the
desired solver:

// Factory
solver* my_solver= 0;
switch (std::atoi(argv[1])) {
  case 0: my_solver= new cg_solver; break;
  case 1: my_solver= new bicg_solver; break;
}
This idea is discussed thoroughly in the design patterns book [?] as the factory pattern. Once
we have defined a pointer of such an abstract class (also called an interface), we can call it
directly:

(*my_solver)(A, b, x);

Without going into detail, we can have multiple factories and use the pointers together without
a combinatorial explosion in the program sources:

// Preconditioner factory
precon* my_precon= 0;
switch (std::atoi(argv[2])) { ... }

(*my_solver)(*my_precon, A, b, x);
C++ does not allow virtual template functions because this would make the compiler
implementation very complicated: the compiler would have to avoid infinite function pointer
tables. However, template classes can have virtual functions. This enables generic programming
with virtual functions by templatizing the entire class instead of single member functions.
6.3 Remove Redundancy With Base Classes

Base classes can also be used to factor out common implementations, especially when no type
information is involved.
6.4 Casting Up and Down and Elsewhere

In C++, there are four different cast operators:

• static_cast;
• dynamic_cast;
• const_cast; and
• reinterpret_cast.

Its linguistic root C knew only one cast operator: '(type) expr'. The trouble with this single
operator is that it is not clearly defined which conversion is performed under which conditions.
As a consequence, the behavior of a cast can change from compiler to compiler. C++ still
allows this old-style casting, but all C++ experts agree on discouraging its use. Another quite
important issue is that this notation is hard to find in large code bases (there is no regular
expression that filters out all C casts), which significantly increases maintenance costs; see
also the discussion in [SA05, chapter 95]. In this section, we will show you the different cast
operators and discuss their pros and cons in different contexts.
6.4.1 Casting Between Base and Derived Classes

Casting Up

⇒ up_down_cast_example.cpp

Casting up, i.e. from a derived to a base class, is always possible if there are no ambiguities,
and it can even be performed implicitly. Assume we have the following class structure: 1

struct A
{
    virtual void f(){}
    virtual ~A(){}
    int ma;
};

struct B : A { float mb; };
struct C : A {};
struct D : B, C {};

and the following unary functions:

void f(A a) { /* ... */ }
void g(A& a) { /* ... */ }
void h(A* a) { /* ... */ }
An object of type B can be passed to all three functions:

int main (int argc, char* argv[])
{
    B b;
    f(b);
    g(b);
    h(&b);
    return 0;
}
In all three cases the object b is implicitly converted to an object of type A. The call of function
f is, however, a bit different: only b's members within class A are copied into the function
argument, and the remainder (in our example the member mb) is not accessible in f by any
means. The functions g and h refer to an object of type A by reference or pointer. If an object
of a derived class is passed to one of those functions, the other members are in principle still
there, just hidden. One could still access them by down-casting the argument in the function.
Before we down-cast, we should ask ourselves the following questions:

• How do we ensure that the argument passed to the function is really an object of the derived
class? For instance with extra arguments or with run-time tests.
• What can we do if the object cannot be down-casted?
• Can we write a function directly for the derived class?
• Why do we not overload the function for the base and the derived type? This is definitely
a much cleaner design and always feasible.

1 TODO: picture
Up-casting only fails if the base class is ambiguous. In the current example we cannot up-cast
from D to A:

D d;
A ad(d); // error: ambiguous

because the compiler does not know whether we mean the base class A from B or from C. We
can clarify this with an explicit intermediate up-cast:

A ad(B(d));

Or we can share A between B and C: 2

struct B : virtual A { float mb; };
struct C : virtual A {};

Now the members of A exist only once in D, which is probably the best solution for multiple
inheritance in most cases, because we save memory and do not need to pay attention to which
replica of A is accessed.
Casting Down

There are situations where references or pointers are cast down, e.g. in the next section (§ 6.5).
This can be performed with static_cast or dynamic_cast. As the names suggest, static_cast is
statically type-checked at compile time, whereas dynamic_cast performs run-time tests (with
only minimal compile-time checks). We still use our diamond-shaped class hierarchy A–D as
case study. Now we introduce two pointers of type B* holding objects of types B and D:

B *bbp= new B, *bdp= new D;

When we cast these pointers down to D*, dynamic_cast verifies whether the referred object
actually allows this cast. Since this information is in general only known at run time, e.g.:

B *bxp= argc > 1 ? new B : new D;

dynamic_cast must verify the referred object's type with run-time type information (RTTI).
Performing an incorrect cast yields a null pointer:

D* dbp= dynamic_cast<D*>(bbp); // error: cannot downcast from B to D
D* ddp= dynamic_cast<D*>(bdp); // ok: bdp points to an object of type D

std::cout << "Dynamic downcast of bbp should fail and pointer should be 0, it is: " << dbp << '\n';
std::cout << "Dynamic downcast of bdp should succeed and pointer should not be 0, it is: " << ddp << '\n';

The programmer can check whether the pointer is null and react to the failed downcast if
necessary. Likewise, incorrect down-casts of references throw an exception of type std::bad_cast,
which can be handled in a try-catch block.
In contrast, static_cast only verifies that the target type is a derived class of the source type
(respectively references or pointers thereof) or vice versa:

dbp= static_cast<D*>(bbp); // erroneous downcast performed
ddp= static_cast<D*>(bdp); // correct downcast but not checked by the system

std::cout << "Erroneous downcast of bbp will not return 0, it is: " << dbp << '\n';
std::cout << "Correct downcast of bdp but not checked at run-time, it is: " << ddp << '\n';

Whether the referred object really allows for the downcast cannot be decided at compile time
and lies in the responsibility of the programmer.

2 TODO: picture
Cross-casting

An interesting feature of dynamic_cast is casting across, from B to C, when the referred
object's type is a derived class of both types:

C* cdp= dynamic_cast<C*>(bdp); // cross-cast from B to C ok: bdp points to an object of type D

std::cout << "Dynamic cross-cast of bdp should succeed and pointer should not be 0, it is: " << cdp << '\n';

A static cross-cast from B to C:

cdp= static_cast<C*>(bdp); // error: cross-cast from B to C does not compile

is not possible because C is neither a base nor a derived class of B. One can, however, cast
indirectly via D:

cdp= static_cast<C*>(static_cast<D*>(bdp)); // ok: indirect cross-cast from B to C via D

Again, it is the programmer's responsibility to ensure that the addressed object can really be
cast this way.
Comparing Static and Dynamic Cast

Dynamic casting is safer but slower than static casting, due to the run-time check of the
referred object's type. Static casting allows for casting up and down, with the programmer
responsible that the referred objects are handled correctly. Dynamic casting is in some sense
always up, namely from the referred object's type to a super-type (including itself).

Furthermore, dynamic casting can only be applied to 'Polymorphic Types', that is classes that
define or inherit a virtual function. The following table summarizes the differences between
the two forms of casting:

                   static_cast            dynamic_cast
Applicability      all classes            only polymorphic classes
Cross-casting      no                     yes
Run-time check     no                     yes
Speed              no run-time overhead   overhead for checking

Table 6.1: Static vs. dynamic cast
6.4.2 Const Cast

const_cast adds or removes the attributes const and/or volatile. The keyword volatile informs
the compiler that a variable can be modified by other programs. It is therefore not held or
cached in registers but accessed from memory each time. This feature is not used in this
script. Adding an attribute is an implicit conversion in C++; that is, one can always assign an
expression to a variable of the same type with extra attributes without the need for a cast.
Removing an attribute requires a const_cast and should only be done when unavoidable, e.g.
to interface old-style software that is lacking appropriate const attributes.
6.4.3 Reinterpretation Cast

This is the most aggressive form of casting and is not used in this script. It takes an object's
memory location (or an address) and interprets the bits there as if they were of the target
type. One can, for instance, change a single bit in a floating-point number by casting it to a
bit chain. reinterpret_cast is more important for programming hardware drivers than complex
flux solvers. Needless to say, it is also one of the most efficient ways to undermine the
portability of an application.
6.5 Barton-Nackman Trick

This section describes the 'Curiously Recurring Template Pattern' (CRTP). It was introduced
by John Barton and Lee Nackman [?] and is therefore also referred to as the 'Barton-Nackman
Trick'.
6.5.1 A Simple Example

⇒ crtp_simple_example.cpp

We will explain this with a simple example. Assume we have a class point with an equality
operator:

class point
{
  public:
    point(int x, int y) : x(x), y(y) {}
    bool operator==(const point& that) const { return x == that.x && y == that.y; }
  private:
    int x, y;
};

We can program the inequality by using common sense or by applying De Morgan's law:

bool operator!=(const point& that) const { return x != that.x || y != that.y; }

Or we can simplify our lives and just negate the result of the equality:

bool operator!=(const point& that) const { return !(*this == that); }
Our compilers are so sophisticated that they certainly handle De Morgan's law perfectly.
Negating the equality operator is something we can do for every type that has an equality
operator. We could copy-and-paste this code snippet and just replace the type of the argument.

Alternatively, we can write a class like this:

template <typename T>
struct unequality
{
    bool operator!=(const T& that) const { return !(static_cast<const T&>(*this) == that); }
};

and derive from it:

class point : public unequality<point> { ... };

This mutual dependency:

• One class is derived from the other and
• The latter takes the derived class' type as template argument

is somewhat confusing at first view.

Essential for this to work is that the code of a template class member is only generated when
the class is instantiated and the function is actually called. At the time the template class
unequality is parsed, the compiler only checks the correctness of the syntax.
When we write

int main (int argc, char* argv[])
{
    point p1(3, 4), p2(3, 5);
    std::cout << "p1 != p2 is " << (p1 != p2 ? "true" : "false") << '\n';
    return 0;
}

after the definition of unequality and point, both types are completely known to the compiler.
What happens when we call p1 != p2?

1. The compiler searches for operator!= in class point → without success.
2. The compiler looks for operator!= in the base class unequality<point> → with success.
3. The this pointer of unequality<point> refers to the unequality part within the point object.
4. Both types are completely known, and we can statically down-cast the this pointer to point.
5. Since we know that the this pointer of unequality<point> is an up-casted this pointer of
point, 3 we are safe to down-cast it to its original type.
6. The equality operator for point is called. Its implementation is already known at this point
because the code of unequality<point>'s operator!= is not generated before the instantiation
of point.

3 Unless the first argument is really of type unequality<point>. There are also ways to impede this, e.g.
http://en.wikipedia.org/wiki/Barton-Nackman_trick, but we used this unary operator notation for the sake
of simplicity.
Likewise, every class U with an equality operator can be derived from unequality<U>. A
collection of such CRTP templates for operator defaults is provided by Boost.Operators from
Jeremy Siek and David Abrahams.

As an alternative to the above implementation, where the this pointer is dereferenced and
cast as a reference, one can cast the pointer first and dereference it afterwards:

template <typename T>
struct unequality
{
    bool operator!=(const T& that) const { return !(*static_cast<const T*>(this) == that); }
};

There is no difference; this is just a question of taste.
6.5.2 A Reusable Access Operator<br />
⇒ matrix crtp example.cpp<br />
We still owe the reader the reusable implementation of the matrix bracket operator promised in Section 3.7.4. Back then we did not know enough language features. First of all, we had no templates, which are indispensable for a proxy; we will show you why. Say we have a matrix class as in § 3.7.4 and we just want to call the binary operator() from the unary operator[] via a proxy:
class matrix; // Forward declaration

class simple_bracket_proxy
{
  public:
    simple_bracket_proxy(matrix& A, int r) : A(A), r(r) {}
    double& operator[](int c) { return A(r, c); }
  private:
    matrix& A;
    int r;
};

class matrix
{
    // ...
    double& operator()(int r, int c) { ... }

    simple_bracket_proxy operator[](int r)
    {
        return simple_bracket_proxy(*this, r);
    }
};
This does not compile because operator[] in simple_bracket_proxy calls operator() from matrix, which is not defined yet. The forward declaration of matrix is not sufficient because we need the complete definition of matrix, not only the assertion that the type exists. Conversely, if we define matrix first, we would miss the constructor of simple_bracket_proxy in the operator[] implementation.
196 CHAPTER 6. INHERITANCE<br />
Another disadvantage of the implementation above is that we would need another proxy for constant access.
This is an interesting aspect of templates: they do not only enable writing type-parametric software but can also help to break mutual dependencies, thanks to their postponed code generation. By templatizing the proxy, the dependency is gone:
template <typename Matrix, typename Result>
class bracket_proxy
{
  public:
    bracket_proxy(Matrix& A, int r) : A(A), r(r) {}
    Result& operator[](int c) { return A(r, c); }
  private:
    Matrix& A;
    int r;
};

class matrix
{
    // ...
    bracket_proxy<matrix, double> operator[](int r)
    {
        return bracket_proxy<matrix, double>(*this, r);
    }
};
With this implementation we can now write A[i][j], and it is realized by the binary operator(), however that is implemented. Such a bracket operator is useful in every matrix class, and the implementation will always be the same. For this reason we would like to have this implementation only once in our code base and reuse it wherever appropriate. The way to achieve this is the CRTP paradigm:
template <typename Matrix, typename Result>
class bracket_proxy
{
  public:
    bracket_proxy(Matrix& A, int r) : A(A), r(r) {}
    Result& operator[](int c) { return A(r, c); }
  private:
    Matrix& A;
    int r;
};

template <typename Matrix, typename Result>
class crtp_matrix
{
  public:
    bracket_proxy<Matrix, Result> operator[](int r)
    {
        return bracket_proxy<Matrix, Result>(static_cast<Matrix&>(*this), r);
    }

    bracket_proxy<const Matrix, const Result> operator[](int r) const
    {
        return bracket_proxy<const Matrix, const Result>(static_cast<const Matrix&>(*this), r);
    }
};

class matrix : public crtp_matrix<matrix, double>
{
    // ...
};
Once we have such a CRTP class, we can provide a bracket operator for every matrix class that has a binary application operator. In a full-fledged linear algebra package one needs to pay attention to which matrices return references and which entries are mutable, but the approach is as described above.
Several timings have shown that the indirection through the proxy did not create run-time overhead compared to the direct usage of the binary access operator. Apparently, the compilers optimized the creation of the proxies away in the executables.
Chapter 7

Effective Programming: The Polymorphic Way

Fools ignore complexity. Pragmatists suffer it. Some can avoid it. Geniuses remove it.
—Alan Perlis
To remove complexity in scientific application development (but not only there), several programming techniques, methods, and paradigms have to be applied accordingly. This depends not only on the ability to combine application-specific functionality with library code from a variety of sources, but also on restricting the amount of application-specific glue code. Libraries must remain open for extension but closed for modification, which can be attributed to a technique called polymorphic programming.
The presented sections of this book introduced important mechanisms to successfully develop scientific applications, such as C++ basics, encapsulation, generic and meta-programming, as well as inheritance. An important part of scientific computing, matrix containers and matrix algorithms, has been presented to illustrate these topics. Effective programming is then possible if these mechanisms are not viewed as separate entities, but as different characteristics used to achieve important goals, such as
• uncompromising efficiency of simple basic operations (e.g., array subscripting should not incur the cost of a function call),
• type-safety (e.g., an object from a container should be usable without explicit or implicit type conversion),
• code reuse and extensibility,
all with their respective advantages and disadvantages. This chapter reviews important techniques to achieve polymorphism from a more general point of view and highlights a basic but very important recurring principle for scientific computing: code reusability. This is not mainly because programmers are lazy people, but because applications have to be tested. For the field of scientific applications this is particularly important due to large parameter sets, changing boundary and initial conditions, as well as long run-times of simulation codes. Hence it should not be underestimated how much time and effort can be saved if already tested code can be used as a starting point or reference. So code reusability is not only about programming less, but also about extending code quality. Most of the techniques presented and discussed so far already deal with some kind of code reusability, but mostly in an implicit way. The following sections overview polymorphic mechanisms in a more explicit way.
As soon as code reusability is covered, almost equal importance is placed on code extensibility, which should not be constrained by reused code. Scientific code development is always driven by transforming newly developed scientific methods into executable code. Various programming techniques with different scopes are therefore mandatory. If programming techniques are analyzed this way, it becomes understandable why some of the presented programming paradigms are not ideally suited to accomplish code reusability and extensibility together (e.g., the object-oriented inheritance model).
No technique, or more generally no paradigm, will result in the ultimate and final solution; each of the techniques provides tools to manage the complexity of a problem, not the guarantee of doing so. A bad problem specification will lead to a bad solution independently of the technique or paradigm used for implementation.
The usage of the Boost Graph Library (BGL) is an excellent example. There is great diversity of requirements in the field of graph algorithms and data structures. Even so, the performance demands on a library like this are very high. Nevertheless, it was possible to implement all necessary functionality at a high performance level. Moreover, the library can be extended greatly in many different ways. On the other hand, this library is not easy to use or extend without an understanding of the underlying techniques.
As a reminder: the main goal of this book is to show how to write good scientific software.
7.1 Imperative Programming<br />
Imperative programming may be viewed as the very bones on which all other abstractions depend. This programming paradigm uses a sequence of instructions which act on a state to realize algorithms. Thus it is always specified in detail what to execute next and how. The modification of the program state, while convenient, is also an issue: with increasing size of the program, unintended modifications of the state become an increasing problem. In order to address this issue, the imperative programming method has been refined into the procedural and structured programming paradigms, which attempt to provide more control over the modifications of the program state. Hence it is based upon organized procedure calls. Procedures, also known as routines, subroutines, methods, or functions, simply contain a series of computational steps to be carried out. Any given procedure might be called at any point during a program’s execution, including from other procedures or itself. A function consists of:
• The return type of the function: A function returns a value to its caller. C and C++, which do not provide procedures explicitly, use the keyword void to indicate that a function does not return a value.
• The name of the function: By this name the function is called. The name should be as expressive as possible; never underestimate the value of meaningful names.
• The parameter list of the function: The parameters of a function serve as placeholders for values that are supplied later, during each invocation of the function. A function can have an empty parameter list. The values of the parameter list can be passed by value or by reference.
• The body of the function: The body of a function implements the logic of the operation. Typically, it manipulates the named parameters of the function.
The advantages of this paradigm are:
• Few techniques
• Rapid prototyping for easy problems
• Functions can be put into a library
• Fast compilation
The disadvantages of this paradigm are:
• Test effort is high
• Sources of error are manifold
• Non-trivial problems require a large programming effort
• No user-defined data types
• No locality of data
• Only very few and simple functions can be put into a library
Even in its refined form as procedural programming, the incurred overhead can be limited to a bare minimum, as the level of abstraction is relatively low. This was well suited to the situation of scarce computing resources and a lack of mature and powerful tools. Under these circumstances the overall performance, in terms of execution speed or memory consumption, is solely dependent on the skill and ingenuity of the programmer, and has resulted in the almost mythical "hand-optimized" code. However, to achieve the desired specifications in such a fashion, the clarity and readability, and thereby the maintainability, of the code were sacrificed. Furthermore, the low level of abstraction also hinders portability, as different architectures favour different assumptions to produce efficient execution. To address this effect, implementations were duplicated in order to optimize for different architectures and platforms, which of course makes a mockery of goals such as code reusability or even extensibility.
This paradigm and the derived techniques are used differently in Section 2.11, where generic programming is used to offer an efficient approach for matrix operations.
7.2 Generic Programming<br />
Generic programming may be viewed as having been developed in order to further facilitate the goals of code reusability and extensibility. From a general view, the generic programming paradigm is about generalizing software components so that they can be reused directly in a wide variety of situations. While these are among the goals which led to the development of object-oriented programming, the realization may vary quite profoundly. A major distinction from object-oriented programming, which is focused on data structures and their states, is that generic programming especially allows for a very abstract and orthogonal description of algorithms. To achieve this kind of generalization, a separation of the basic tools of programming is important: algorithms, containers (data structures), and the glue between them (so-called iterators or, more generally, traversors). In keeping with the minimization of glue code, introduced as an important part of effective programming, iterators and traversal objects operate as a minimal but fully abstract interface between data structures and algorithms.
While the desired functionality is often implemented using static polymorphism mechanisms, such as templates in C++, generic programming should not be equated with simply programming with templates. However, when generic programming is realized using purely compile-time facilities such as static polymorphism, not only is the implementation effort reduced but the resulting run-time performance is optimized as well.
In the following, the process of generic programming is illustrated by elevating a procedural code to a generic one, simultaneously fulfilling the important goals of effective programming (efficiency, type-safety, code reuse):
• Algorithm: Generic algorithms are generic in two ways: first, the data type which they operate on is arbitrary, and second, the type of container within which the elements are held is arbitrary.
To get in touch with the generic approach, a generalization of the memcpy() function of the C standard library is discussed. An implementation of memcpy() might look somewhat like the following:
void* memcpy(void* region1, const void* region2, size_t n)
{
    const char* first = (const char*)region2;
    const char* last  = ((const char*)region2) + n;
    char* result      = (char*)region1;
    while (first != last)
        *result++ = *first++;
    return result;
}
The memcpy() function is already generalized to some extent by the use of void* so that the function can be used to copy arrays of different kinds of data.
Looking at the body of memcpy(), the function’s minimal requirements are that it needs to traverse the sequence using some sort of pointer, access the elements pointed to, copy the elements to the destination, and compare pointers to know when to stop. The memcpy() function can then be written in a generic manner:
template <typename InputIterator, typename OutputIterator>
OutputIterator copy(InputIterator first, InputIterator last, OutputIterator result)
{
    while (first != last)
        *result++ = *first++;
    return result;
}
With this code the same functionality as the memcpy() from the C library is achieved. All kinds of data structures which offer begin() and end() iterators can be used.
• Container: An abstraction over all kinds of data structures which can store other data types.
• Iterator: This is the glue between the containers and the algorithms. First, it separates the usage of data structures from the algorithms. Second, it provides a concept hierarchy for all kinds of traversal within data structures.
This type of genericity is called parametric polymorphism (see Section 7.5.2). Section 4.9 introduced the Standard Template Library (STL). The STL solves many standard data-structure and algorithmic problems and is (or should be) the first choice in all code development steps.
• Algorithm/Data-Structure Interoperability: First, each algorithm is written in a data-structure-neutral way, allowing a single template function to operate on many different classes of containers. The concept of an iterator is the key ingredient in this decoupling of algorithms and data structures. The impact of this technique is a reduction of the STL’s code size from O(M·N) to O(M+N), where M is the number of algorithms and N is the number of containers. Considering a situation of 20 algorithms and 5 data structures, this makes the difference between writing 100 functions versus only 25 functions! And the difference grows faster as the numbers of algorithms and data structures increase.
• Extension through Function Objects: The second way in which the STL is generic is that its algorithms and containers are extensible. The user can adapt and customize the STL through the use of function objects. This flexibility is what makes the STL such a great tool for solving real-world problems. Each programming problem brings its own set of entities and interactions that must be modeled. Function objects provide a mechanism for extending the STL to handle the specifics of each problem domain.
• Element Type Parametrization: The third way in which the STL is generic is that its containers are parametrized on the element type.
Most people think that element type parametrization is the feature that makes the STL successful. This is perhaps the least interesting way in which the STL is generic: the interoperability with iterators and the extensibility by function objects are more important parts of the STL. But the essence is the programming with concepts: the programmer can write the data structures and algorithms (in other words, the concepts of these) as they should be. In addition, the STL has proven that with the generic programming paradigm, high-performance computing can be accomplished on several different computer architectures.
The advantages of this paradigm are:<br />
• Programming with concepts<br />
• Great number of available libraries
7.2. GENERIC PROGRAMMING 205<br />
• Great extensibility
• Great code reusability
• Development of high-performance code
• All other paradigms can be used
• Concepts can be proven by the compiler
The disadvantages of this paradigm are:
• Long compilation times: C++ with its static type checking requires complete template instantiation and type checking.
• Steep learning curve due to many complex techniques
• Code bloat: due to incorrect usage of templates, the compiler can produce an excessive amount of code.
7.3 Programming with Objects<br />
Programming with objects may be viewed as an evolution from the structured imperative paradigm. On the one hand, it tries to address the issue of code reusability by providing a specific type of polymorphism, subtyping. On the other hand, it addresses the issue of unchecked modification of state by enforcing data encapsulation, thus enforcing changes through defined interfaces. Both of these notions are attached to an entity called an object. An object therefore serves as a self-contained unit which interacts with the environment via messages. It thus accomplishes a decoupling of the internal implementation within the object from the interaction with the surrounding environment, enforcing (clean) interfaces, which is essential for effective programming. Algorithms are expressed much more by the notion of what is to be done, as an interaction and modification of objects, where the details of how are encapsulated to a great extent within the objects themselves.
Another benefit of programming with objects is that these entities can be placed in libraries. This saves the effort of continually rewriting the same code for every new program. Furthermore, because objects can be made polymorphic, object libraries offer the programmer more flexibility and functionality than subroutine libraries (their counterparts in the procedural paradigm).
Technically, object libraries are quite feasible, and the advantages of extensibility can be significant. However, the real challenge in making code reusable is not technical. Rather, it is identifying functionality that other people both understand and want. People who use procedural languages have been writing and using subroutine libraries for decades. These libraries are most successful when they perform simple, clearly defined functions, such as calculating square roots or computing trigonometric functions. An object library can provide complex functions more easily than a subroutine library. However, unless those functions are clearly defined, well understood, and generally useful, the library is unlikely to be used widely.
To give an intuitive specification of the programming approach with objects, the following list<br />
specifies different points in the object world:<br />
• Identity is the quantization of data in discrete, distinguishable entities called objects<br />
• Classification is the grouping of objects with the same structure and behavior into classes<br />
• Polymorphism is the differentiation of behavior of the same operation on different classes<br />
• Inheritance is the sharing of structure and behavior among classes in a hierarchical relationship<br />
But one of the biggest problems of this programming approach is the interaction of objects with<br />
algorithms. The problem can easily be seen using the example of a simple sorting algorithm.<br />
Should the algorithm be placed into the object? Should an algorithm work on a class hierarchy<br />
with a common interface?<br />
The problem cannot be solved easily within this paradigm. A possible solution is some kind of<br />
polymorphism, which is explained in Section 7.5.2.<br />
7.3.1 Object-Based Programming<br />
In languages which support identity and classification the object-based paradigm can be used<br />
efficiently.
The advantages of this paradigm are:
• User-defined data structures with data locality: programming can be more intuitive than with the procedural paradigm, and data structures with their algorithms can be put into a library
• Library code can be tested independently
• Fast compilation (though it may be slower than with the procedural paradigm)
The disadvantages of this paradigm are:
• Run-time performance
• Library/code reusability
7.3.2 Object-Oriented Programming<br />
To overcome the mentioned problem of code reusability, inheritance and polymorphism were introduced.¹ Inheritance is deployed with the aim of reducing implementation effort by allowing refinement of already existing objects. Inheritance and the connected subtyping also make polymorphic programming available at run time:
• Inheritance allows us to group classes into families of related types, allowing them to share common operations and data. The reuse of already existing code can thus be accomplished.
• Polymorphism allows us to implement these families as a unit rather than as individual classes, giving us greater flexibility in adding or removing any particular class. This point is explained in more detail in Section 7.5.2, where this type of polymorphism is called subtyping polymorphism.
• Dynamic binding is a third aspect of object-oriented programming: the actual member function resolution is delayed until run time. With the combination of inheritance and (subtyping) polymorphism, a generic way of dealing with, e.g., geometrical objects can be achieved.
While the concepts of object orientation have proved invaluable for the development of modular software, their limits also became apparent, as the goal of general reusability suffers from the stringent limitations of the required subtyping. This may be viewed as a consequence of the fact that objects are not necessarily fit to accommodate the required abstractions, such as the algorithms themselves. Furthermore, the extension of existing codes is often only possible by intrusive means, such as changing the already existing implementations, thus not leading to the high degree of effort reduction that was hoped for.
Compared to the run-time environment or compiler required to realize the simple imperative programming paradigm, the object-oriented paradigm requires more sophistication, as it needs to be able to handle run-time dispatches using virtual functions, for instance. Additionally, seemingly simple statements may hide the true complexity encapsulated within the objects. Thus not only is the demand on the tools higher, but the programmer also needs to be aware of the implications of seemingly simple statements in order to achieve desirable levels of performance.
¹ If a language supports all these features (identity, classification, polymorphism, and inheritance), then the object-oriented paradigm is supported in this language.
Behind the Dynamic Polymorphism in C++
A programmer must be aware of the fact that inheritance is one of the strongest bonds between objects. In real-world examples, few problems can be modeled successfully by class inheritance only. The coupling by inheritance should be used very carefully.
The advantages of this paradigm are:<br />
• Library: Data-types can be enhanced greatly.<br />
• Abstract algorithms with polymorphism enable greater code reusability compared to the<br />
procedural paradigm.<br />
• Strong binding of data structures and methods: Logical connections can be modeled easily.<br />
Logical errors can be detected easily.<br />
The disadvantages of this paradigm are:
• The binary-method problem (see Section 7.5.3)
• Bad optimization capability of the compiler due to subtyping polymorphism (see Section ??)
• Strong binding of data structures and methods: only usable on object-oriented problems.
7.4 Functional Programming<br />
In contrast to the procedural and object-oriented paradigms, which explicitly formulate algorithms and programs as a sequence of instructions acting on a program state, the functional paradigm uses mathematical functions for this task and forgoes the use of a state altogether. Therefore, there are no mutable variables and no side effects in purely functional programming. As such, it is declarative in nature and relies on the language’s environment to produce an imperative representation which can be run on a physical machine. Among the greatest strengths of the functional paradigm is the availability of the strong theoretical framework of lambda calculus (cite()), which is explained in more detail in Section ref(), for the different implementations.
Higher-order functions are an important concept of functional programming, also due to their usability in procedural languages. They were studied in lambda calculus theory well before the notion of functional programming existed and pervade the design of a number of functional programming languages, such as Scheme and Haskell.
As modern procedural languages and their implementations have started to put greater emphasis on correctness rather than raw speed, and the implementations of functional languages have begun to emphasize speed as well as correctness, the performance of functional languages and procedural languages has begun to converge. For programs which spend most of their time doing numerical computations, some functional languages (such as OCaml and Clean) can approach the performance of programs written in C, while for programs that handle large matrices and multidimensional databases, array functional languages (such as J and K) are usually faster than most non-optimized C programs. Functional languages have long been criticized as resource-hungry, both in terms of CPU resources and memory. This was mainly due to two things:
• some early functional languages were implemented with no concern for efficiency
• non-functional languages achieved speed at least in part by neglecting features such as bounds checking or garbage collection, which are viewed as essential parts of modern computing frameworks, representing an overhead built into functional languages by default
Since a purely functional description is free of side effects, it is a favourable choice for parallelization, as the description does not contain a state which would require synchronization. Data-related dependencies, however, must still be considered in order to ensure correct operation. Since the declarative style connected to the functional paradigm distances itself from the traditional imperative paradigm and its connection to states, input and output operations pose a hurdle which is often addressed in a manner that is not purely functional. As such, functional interdependencies may be specified trivially, while the details of how these are to be met remain opaque and a choice of the specific implementation.
Last, we give an example of pure functional programming. We point out that the next code snippet is presented in Haskell, not in C++ syntax. The "hello world" program of the functional programming paradigm is the factorial calculation:

fac :: Integer -> Integer
fac 0 = 1
fac n | n > 0 = n * fac (n-1)
7.4.1 Lambda Calculus
As was presented in Section ref(), it is not very easy to reuse the STL standard function objects because their use is not very intuitive: either a function object for each loop or a binder has to be written. A binder (or binder object) is passed, at construction time, to another function object which performs an action. The binder takes the function object as well as another binding value and makes a binary function unary by fixing the first parameter. However, this is not obvious at first. An easy way to implement such a functionality would be to write it as it reads:

std::for_each(vec.begin(), vec.end(), std::cout << *vec_iter);

Of course this cannot compile, for several reasons. First, the third argument is not a function object. Second, the variable vec_iter does not exist, nor does it know anything about the iterated container vec. Anyway, an expression like this is easy to write and less error-prone compared to a binder object. To enable a program like this, the following has to be accomplished:
First, the output-stream operator << has to be overloaded for function objects; second, a function object that simply returns its (first) argument is needed:

struct argument_1_function_object
{
    template <typename ArgumentT>
    ArgumentT operator()(ArgumentT arg)
    {
        return arg;
    }

    template <typename Argument1T, typename Argument2T>
    Argument1T operator()(Argument1T arg1, Argument2T arg2)
    {
        return arg1;
    }
};
So what does this object really do? It provides unary and binary function call operators which return the (first) argument passed. Next, a function object is implemented which stores an arbitrary stream type.
template <typename StreamType, typename FunctionObjectT>
struct output_function_object
{
    // the stream is held by reference, since streams cannot be copied
    output_function_object(StreamType& stream, FunctionObjectT func) : stream(stream), func(func) {}

    template <typename ArgumentT>
    void operator()(ArgumentT arg)
    {
        stream << func(arg);
    }

    StreamType&     stream;
    FunctionObjectT func;
};
The only thing left to do is to write an appropriate object generator in order to persuade the C++ syntax to accept something like the first line of code of this section:

template <typename StreamType, typename FunctionObjectT>
output_function_object<StreamType, FunctionObjectT>
operator<<(StreamType& stream, FunctionObjectT func)
{
    return output_function_object<StreamType, FunctionObjectT>(stream, func);
}
Using these objects, it is almost possible to write the for_each code snippet presented above in a convenient way. The remaining adaptation is to use a so-called unnamed object arg1 instead of the dereferenced iterator:

argument_1_function_object arg1;
std::for_each(vec.begin(), vec.end(), std::cout << arg1);
By creating a collection of such functor objects 2 a functional programming style can be mimicked. As can be observed, polymorphism, which has to be provided explicitly in the imperative world, comes naturally to the functional paradigm: no specific assumptions about data types are required, only conceptual requirements need to be met.
2 Instead of creating all of these functors again, the Boost Phoenix library or the C++ TR1 lambda library can be used.
7.5 From Monomorphic to Polymorphic Behavior<br />
As presented in the last sections, each programming technique (or paradigm) offers different key benefits regarding effective programming. Imperative programming, the related procedural paradigm, and object-based programming are simple and require that all calls to an object or function have exactly the same typing as the signature, so type checks and type constraints can be derived directly from the program text. But the effectiveness (genericity and applicability) is greatly reduced for real-world problems. This is in contrast to polymorphic code that operates freely on abstract concept types. Polymorphic behavior enables the use of algorithms and data structures with several different types. Object-oriented, generic, and functional programming offer an additional mechanism which delays the actual type instantiation to a later evaluation point. Compared to the simple monomorphic way, the polymorphic mechanism is composed of a complex set of inference rules, because type information propagates between the object and function signature and the call signature in both directions.
In object-oriented programming, libraries typically specify that the types supplied to the library must be derived from a common abstract base class, providing implementations for a collection of pure virtual functions. The library knows only about the abstract base class interface, but can be extended to work with new user types derived from the abstract interface. That is, variability is achieved through differing implementations of the virtual functions in the derived classes. This is how object-oriented programming supports modules that are closed for modification, yet remain open for extension. One strength of this paradigm is its support for varying the types supplied to a module at run time. Composability of modules is limited, however, since independently produced modules generally do not agree on common abstract interfaces from which supplied types must inherit. The paradigm of generic programming, pioneered by Stepanov, Musser and their collaborators, is based on the principle of decomposing software into efficient components which make only minimal assumptions about other components, allowing maximum flexibility in composition. C++ libraries developed following the generic programming paradigm typically rely on templates for the parametric and ad-hoc polymorphism they offer. Composability is enhanced, as use of a library does not require inheriting from a particular abstract interface. Interfaces of library components are specified using concepts, collections of requirements analogous to, say, Haskell type classes. The key difference to abstract base classes and inheritance is that a type can be made to satisfy the constraints of a concept retroactively, independently of the definition of the type. Also, generic programming strives to make algorithms fully generic, while remaining as efficient as non-generic hand-written algorithms. Such an approach is not possible when the cost of any customization is a virtual function call.
The strength of polymorphism is that the same piece of code can operate on different types, even types that were not known at the time the code was written. Such applicability is the cornerstone of polymorphism because it amplifies the usefulness and reusability of code. If the types of polymorphism are analysed in more detail, two main types can be observed:
• Ad-hoc polymorphism
• Universal polymorphism
Only the second type, universal polymorphism, is actually important for effective programming, whereas the first, ad-hoc polymorphism, is rather a convenience.
7.5.1 Ad-hoc Polymorphism<br />
This kind of polymorphic behavior is expressed with ad-hoc, which should point out, that this<br />
kind of behavior is locality. Common to these two types (overloading and coercion) is the fact<br />
that the programmer has to specify exactly what types are to be usable with the polymorphic<br />
function.<br />
Overloading
This is a simple and convenient way of programming that eases the programmer's life:
class my_stack
{
    virtual bool push(int ..) {}
    virtual bool push(double ..) {}
    virtual bool push(complex ..) {}
    virtual int pop() {..}       // note: overloading on the return type alone,
    virtual double pop() {..}    // as for pop here, is not legal C++
    // ....
};
Coercion<br />
Coercion is automatic type conversion. The following stack example can be used with all numerical data types that can be converted to double:
class my_stack
{
    virtual bool push(double ..) {}
    virtual double pop() {..}
    // ....
};
7.5.2 Universal Polymorphism<br />
The universal in the title means, that the different kinds of expression <strong>for</strong> polymorphic behavior<br />
in this section are the most useful techniques to accomplish the desired behavior and should be<br />
used preferably:<br />
• Dynamic polymorphism (subtyping)<br />
• Static polymorphism (parametric)<br />
Subtyping Polymorphism<br />
In C++ the object-oriented paradigm implements subtyping polymorphism 3 by sub-classing. The term dynamic polymorphism is often used for this type of polymorphism.
To introduce the applicability of this kind of polymorphism, an example from the topological domain is given: classes for different kinds of points are used, which should be comparable within their own set. Traversing containers or data structures is a quite common task in generic programming. The next code snippet presents the base class for all kinds of vertices.
#include <iostream>

class topology { };

class vertex
{
  public:
    virtual ~vertex() {}   // virtual destructor for a polymorphic base class
    virtual bool equal(const vertex* ve) const = 0;
};
If these vertex types have to be extended, only a new class with the corresponding equal method has to be implemented. The next code snippet presents two possible implementations for a vertex, which can be used in different topologies.
class structured_vertex : public vertex
{
  public:
    structured_vertex(int id, topology* topo) : id(id), topo(topo) {}

    virtual bool equal(const vertex* ve) const
    {
        // yields a null pointer if ve is not a structured_vertex
        const structured_vertex* sv = dynamic_cast<const structured_vertex*>(ve);
        return (id == sv->id) && (topo == sv->topo);
    }
  protected:
    int       id;
    topology* topo;
};
3 Also called inclusion polymorphism.
class unstructured_vertex : public vertex
{
  public:
    unstructured_vertex(int handle, topology* topo, int segment) : handle(handle), segment(segment), topo(topo) {}

    virtual bool equal(const vertex* ve) const
    {
        const unstructured_vertex* sv = dynamic_cast<const unstructured_vertex*>(ve);
        return (handle == sv->handle) && (topo == sv->topo) && (segment == sv->segment);
    }
  protected:
    int       handle;
    int       segment;
    topology* topo;
};
With this virtual class hierarchy, an algorithm can be written which operates on all different classes derived from vertex. This is called an explicit interface.
void print_equal(const vertex* ve1, const vertex* ve2)
{
    std::cout << std::boolalpha << ve1->equal(ve2) << std::endl;
}
The next code lines present the generic behavior of the algorithm, which operates on both types derived from vertex.
int main()
{
    topology the_topo;
    vertex* the_vertex1;
    vertex* the_vertex2;

    // *** structured
    the_vertex1 = new structured_vertex(12, &the_topo);
    the_vertex2 = new structured_vertex(12, &the_topo);
    print_equal(the_vertex1, the_vertex2);

    // *** unstructured
    the_vertex1 = new unstructured_vertex(12, &the_topo, 1);
    the_vertex2 = new unstructured_vertex(12, &the_topo, 2);
    print_equal(the_vertex1, the_vertex2);

    return 0;   // (deallocation of the vertices is omitted for brevity)
}
As can be seen, polymorphic behavior can be achieved, but with major drawbacks. First, pointers or references to the objects have to be used, which eliminates the possibility for a compiler to optimize some parts of the code, e.g. by inlining. Second, a dynamic_cast has to be used, which can fail at run time (for pointers it yields a null pointer, for references it throws std::bad_cast). This kind of problem is called the binary method problem, which is explained in Section 7.5.3.
Nevertheless, dynamic polymorphism in C++ is best at:
• Uniform manipulation based on base/derived class relationships: Different classes that hold a base/derived relationship can be treated uniformly.
• Static type checking: All types are checked statically in C++.
• Dynamic binding and separate compilation: Code that uses classes in a hierarchy can<br />
be compiled apart from the code of the entire hierarchy. This is possible because of the<br />
indirection that pointers provide (both to objects and to functions).<br />
• Binary interfacing: Modules can be linked either statically or dynamically, as long as the<br />
linked modules lay out the virtual tables the same way.<br />
Behind the Dynamic Polymorphism in C++
How virtual functions work:<br />
• Normally, when the compiler sees a member function call it simply inserts instructions calling the appropriate subroutine (as determined by the type of the pointer or reference).
• However, if the function is virtual, a member function call such as vc->foo() is replaced with the following: (*((vc->vtab)[0]))()
• The expression vc->vtab locates a special "secret" data member of the object pointed to by vc. This data member is automatically present in all objects with at least one virtual function. It points to a class-specific table of function pointers (known as the class's vtable).
• The expression (vc->vtab)[0] locates the first element of the class's vtable of the object (the one corresponding to the first virtual function foo()). That element is a function pointer to the appropriate foo() member function.
• Finally, the expression (*((vc->vtab)[0]))() dereferences the function pointer and calls the function.
• Special care must be taken with destructors in virtual class hierarchies. The base class does not know anything about the derived classes, so the base class destructor has to be declared virtual; otherwise deleting a derived object through a base class pointer results in undefined behavior.
Parametric Polymorphism<br />
Parametric polymorphism was the first type of polymorphism developed; it was identified by Christopher Strachey in 1967. It was also the first type of polymorphism to appear in an actual programming language, ML in 1976. It exists in C++, Standard ML, Haskell, and others. The term static polymorphism is often used for it.
In C++, this type of polymorphism is available via templates and also lets a value have more than one type. Inside

template <typename T> double function(T param) { .. }

param can have any type that can be substituted inside function to render compilable code. This is called an implicit interface, in contrast to a base class's explicit interface. It achieves the same goal of polymorphism, writing code that operates on multiple types, but in a very different way.
To tie in with dynamic polymorphism, the same example as in the dynamic polymorphic world is now expressed through function templates:
#include <iostream>

class topology
{
    // ... temp class
};
class structured_vertex
{
  public:
    structured_vertex(int id, topology* topo) : id(id), topo(topo) {}

    bool equal(const structured_vertex& ve) const
    {
        return id == ve.id && topo == ve.topo;
    }
  protected:
    int       id;
    topology* topo;
};
class unstructured_vertex
{
  public:
    unstructured_vertex(int handle, topology* topo, int segment) : handle(handle), segment(segment), topo(topo) {}

    bool equal(const unstructured_vertex& ve) const
    {
        return handle == ve.handle && topo == ve.topo && segment == ve.segment;
    }
  protected:
    int       handle;
    int       segment;
    topology* topo;
};
Here, no class hierarchy is required. It only has to be guaranteed that each data type provides an implementation of the required method. Below, print_equal() is written as a function template:
template <typename VertexType>
void print_equal(const VertexType& ve1, const VertexType& ve2)
{
    std::cout << std::boolalpha << ve1.equal(ve2) << std::endl;
}
In the code snippet below, the same polymorphic behavior can be seen as in the dynamic<br />
polymorphism example, but without the necessity of inheriting from a common base class.<br />
int main()
{
    topology the_topo;

    // *** structured
    structured_vertex sv1(12, &the_topo);
    structured_vertex sv2(12, &the_topo);
    print_equal(sv1, sv2);

    // *** unstructured
    unstructured_vertex usv1(12, &the_topo, 1);
    unstructured_vertex usv2(12, &the_topo, 2);
    print_equal(usv1, usv2);

    return 0;
}
Without the pointer mechanism the compiler can easily optimize these lines, e.g. inline the code. Additionally, no cast failures can occur at run time.
Due to its characteristics, static polymorphism in C++ is best at:
• Uniform manipulation based on syntactic and semantic interface: Types that obey a syntactic and semantic interface can be treated uniformly.
• Static type checking: All types are checked statically.<br />
• Static binding (prevents separate compilation): All types are bound statically.
• Efficiency: Compile-time evaluation and static binding allow optimization and efficiencies<br />
not available with dynamic binding.<br />
7.5.3 Comparison of Static and Dynamic Polymorphism<br />
Here the main features from static and dynamic polymorphism are summarized:<br />
• Virtual function calls are slower during run time than function templates: A virtual<br />
function call includes an extra pointer dereference to find the appropriate method in the<br />
virtual table. By itself, this overhead may not be significant. Significant slowdowns can<br />
result in compiled code because the indirection may prevent an optimizing compiler from<br />
inlining the function and from applying subsequent optimizations to the surrounding code<br />
after inlining.<br />
• Run time dispatch versus compile-time dispatch: The run time dispatch of virtual functions<br />
and inheritance is certainly one of the best features of object-oriented programming.<br />
For certain kinds of components, run time dispatching is an absolute requirement: decisions need to be made based on information that is only available at run time. When this is the case, virtual functions and inheritance are needed.
Templates do not offer run time dispatching, but they do offer significant flexibility at<br />
compile time. In fact, if the dispatching can be per<strong>for</strong>med at compile time, templates offer<br />
more flexibility than inheritance because they do not require the template argument types to inherit from some base class.
• Code size: virtual functions are small, templates are big: A common concern in template-based programs is code bloat, which typically results from naive use of templates. Carefully
designed template components need not result in significantly larger executable size<br />
than their inheritance-based counterparts.<br />
• The binary method problem: There is a serious problem that shows up when using inheritance<br />
and virtual functions to express operations that work on two or more objects.
Note<br />
The binary method problem is encountered when methods in which the receiver type and argument type should vary together, such as equality comparisons, must instead use a fixed formal parameter type to maintain type safety. The problem arises in mainstream object-oriented languages because only the receiver of a method call is used for run time method selection, and so the argument must be assumed to have the most general possible type. Existing techniques to solve this problem require intricate coding patterns that are tedious and error-prone. The binary method problem is a prototypical example of a larger class of problems where overriding methods require type information for their formal parameters. Another common example of this problem class is the implementation of event handling (e.g., for graphical user interfaces), where "callback methods" must respond to a variety of event types.
7.6 Best of Both Worlds<br />
The object-oriented programming paradigm offers mechanisms to write libraries that are open for extension, but it tends to impose intrusive interface requirements on the types that will be supplied to the library. The generic programming paradigm has seen much success in C++, partly due to the fact that libraries remain open to extension without imposing the need to intrusively inherit from particular abstract base classes. However, the static polymorphism that is a staple of programming with templates and overloads in C++ limits the applicability of generic programming in application domains where more dynamic polymorphism is required.
In combining elements of object-oriented programming with those of generic programming, we take generic programming as the starting point, retaining its central ideas. In particular, generic programming is built upon the notion of value types that are assignable and copy constructible. The behavior expected from value types reflects that of C++ built-in types, like int, double, and so forth. This generally assumes that types encapsulate their memory and resource management in their constructors, copy-constructors, assignment operators, and destructors, so that objects can be copied, passed as parameters by copy, etc., without worrying about references to their resources becoming aliased or dangling. Value types simplify local reasoning about programs. Explicitly managing objects on the heap and using pass-by-reference as the parameter passing mode makes for complex object ownership management (and object lifetime management in languages that are not garbage collected). Instead, explicitly visible mechanisms, such as thin wrapper types like reference_wrapper in the (draft) C++ standard library, are used when sharing is desired.
.. more to come..<br />
7.6.1 Compile Time Container<br />
7.6.2 Meta-Functions<br />
7.6.3 Run-Time concepts
Part II
Using C++
Chapter 8
Finite World of Computers
8.1 Mathematical Objects inside the Computer<br />
First, the natural numbers N are introduced, together with the data types available in a programming language to represent them. The relation between the single digits of a number and the underlying base is an important concept in computer science.
A number is represented by several single digits, each digit being a factor for a corresponding power of the base. The number is only complete when both the base and all of the digits are known. As an example, the digit sequence 123 is evaluated with the corresponding base, e.g. base = 10:
123_{10} = 1 · 10^2 + 2 · 10^1 + 3 · 10^0
If the base is switched, e.g. to base = 4, then a different number is derived:
123_{4} = 1 · 4^2 + 2 · 4^1 + 3 · 4^0 = 27_{10}
One of the drawbacks of the representation of numbers within the computer is the fact that built-in types such as int and long can only use a finite number of bits and are hence limited in their range (the keyword int can be omitted in the declarations), e.g.:

short int:         -32768 .. +32767
long int:          -2147483648 .. +2147483647
unsigned long int: 0 .. +4294967295
As can be seen, the maximum number of countable items is restricted. If, for example, a program has to count the living humans on earth, we have to switch to another number concept, either floating point or a decimal data type. A plain and simple arbitrary-digit number container can be implemented by:
class big_number
{
    long base;
    std::vector<int> digits;   // one entry per digit (element type assumed here)
  public:
    // .........
};
8.2 More Numbers and Basic Structure<br />
Polynomials are an important and efficient tool for numerous fields of science. They can be defined as a weighted sum of exponential terms in at least one variable or expression, with the exponents being restricted to non-negative whole numbers. Their simple definition, the simple rules for differentiation and integration, and the fact that their algebraic structure is closed not only under addition, subtraction, and multiplication, but also under differentiation and integration, result in their widespread application. Demanding additional properties such as orthogonality with respect to an inner product leads to special classes of polynomials, the orthogonal polynomials, which further increases their appeal in fields such as finite elements. A polynomial consists of coefficients (a_i) and a variable expression (x^i):

a_0 x^0 + a_1 x^1 + a_2 x^2 + . . . + a_n x^n
Thus a container representation was chosen to store the coefficients of a polynomial, so that a generic C++ variable contains the expression:

gsse::polynomial
When storing the coefficients in a container great care has been taken to implement the library<br />
to be generic with respect to the type of the underlying data structure. In this way it is possible<br />
to use compile time containers if the size or even the concrete coefficients are already known at<br />
compile time. This allows the compiler to inline and execute operations at compile time.<br />
The most suitable container to use <strong>for</strong> the coefficients usually depends on the input and not<br />
the algorithms. It is there<strong>for</strong>e important to provide a basic set of programming utilities which<br />
are generic with regard to the used container type. Compile time and run time containers have<br />
a few incompatible requirements which make it hard to define a common set of utilities.<br />
8.2.1 Accessing Coefficients<br />
Accessing a polynomial's coefficients is an important operation. There are two basic ways of accessing a coefficient: compile time accessors are used when the index of the coefficient to be accessed is known at compile time, while run time accessors have to be used otherwise. The compile time version takes the index as a template parameter, the run time version as a function argument.
namespace compiletime {
    template <long N, typename Polynomial>
    typename result_of::coeff<N, Polynomial>::type
    coeff(Polynomial const& p);
}

namespace runtime {
    template <typename Polynomial>
    typename result_of::coeff<Polynomial>::type
    coeff(index_type n, Polynomial const& p);
}
Access to a coefficient is then available by:

polynomial p;
compiletime::coeff<N>(p);
runtime::coeff(n, p);
Thus it is possible <strong>for</strong> the compiler to simplify the code and determine more in<strong>for</strong>mation about<br />
the coefficient. There<strong>for</strong>e the compile time version is more flexible than the run time version.<br />
Using inhomogeneous compile time containers in conjunction with the run time accessor is not possible, since the return type cannot be determined in advance. This reduces the flexibility of code using the run time accessors. A workaround for this problem is the visitor pattern.
template <typename Polynomial, typename Visitor>
void coeff_visitor(
    index_type n,
    Polynomial const& p,
    Visitor v
);

However, this approach has the disadvantage of being more complicated to use than the coeff function.
The coefficient accessors are not simple wrappers around the accessors of the underlying container. They check the access and return a zero value if the container does not contain the coefficient. The zero value is determined by the coeff_trait template class:

template <typename CoeffType>
struct coeff_trait
{
    typedef CoeffType zero_type;
    static zero_type const zero_value = zero_type();
};

By using partial template specialization it is possible to define the corresponding zero value for the correct type. For inhomogeneous polynomials, default_coeff is passed as CoeffType and the default behavior is to return an int.
8.2.2 Setting Coefficients<br />
Coefficients may be set using the set_coeff function. It does not change the given polynomial but creates a new view instead, which gives the polynomial library a functional programming style. Setting the coefficients and thereby changing the polynomial itself can only be achieved by directly manipulating the coefficient container.
namespace compiletime
{
    template <long N, typename Polynomial, typename Coeff>
    typename result_of::set_coeff<N, Polynomial, Coeff>::type
    set_coeff(Polynomial const& p,
              Coeff const& c);
}

namespace runtime
{
    template <typename Polynomial, typename Coeff>
    typename result_of::set_coeff<Polynomial, Coeff>::type
    set_coeff(index_type n,
              Polynomial const& p,
              Coeff const& c);
}

Write access is then available by:

polynomial p;
compiletime::set_coeff<N>(p, 1);
runtime::set_coeff(n, p, 1);
The degree of a polynomial is defined as the maximum degree of all of its terms, where the degree of a term is given as the sum of the degrees of all variables in this term. The polynomial library defines the degree as the index of the highest non-zero coefficient. Obtaining the correct degree of a multivariate polynomial requires using a polynomial for each variable and finally combining them:
struct X;
struct Y;

typedef polynomial<
    X,
    fusion::map< pair< mpl::int_<4>, double > >
> inner_poly;

typedef polynomial<
    Y,
    fusion::map< pair< mpl::int_<2>, inner_poly > >
> the_polynomial;

degree(3 x^4 y^2) = 4 + 2 = 6

By instantiating the polynomial the calculation of its degree is possible:

the_polynomial p;
assert( degree(p) == 6 );
8.2.3 Compile Time Programming<br />
Here the application of meta-programming is presented, which utilizes the compiler to execute code at compile time and reduce the result of the expressions. As an example, the derivative of a second-degree polynomial is calculated and a second polynomial is added:

d(3 + 4.5 x + 10 x^2)/dx + (1 + 2x) = 5.5 + 22 x
The type list represents the type of each coefficient, starting from the zeroth- up to the second-degree coefficient.
struct X { } x;
typedef fusion::vector<double, double, int> coeffs;
typedef polynomial<X, coeffs> poly;

poly p(x, coeffs(3.0, 4.5, 10));
typedef result_of::diff<poly, X>::type diffed;
diffed d = diff(p, x);

poly q(x, coeffs(1.0, 2.0, 0));
std::cout << coeff(q + d);
Compiling this and inspecting the generated assembler code reveals that the calculations were performed at compile time and the binary only contains the final result of 22.
8.2.4 Arbitrary-Precision Arithmetic<br />
Here we apply the polynomial library to arbitrary-precision arithmetic (or "bignum arithmetic"). It exploits the fact that a number is, in essence, a polynomial with a fixed base:

1372 = 1 · 10^3 + 3 · 10^2 + 7 · 10 + 2

This translates easily into C++ code using the polynomial library. Note that the first element in the array is the zeroth coefficient:
typedef unsigned char byte_t;
typedef array<byte_t, 4> coeffs_t;
coeffs_t coeffs = {{2, 7, 3, 1}};
gsse::polynomial p(coeffs);
Since computer systems usually operate on binary numbers, base 2 is the optimal choice. The difference between polynomial arithmetic and arbitrary-precision arithmetic is that the coefficients need to be realigned to the base (i.e., carries must be propagated) after each operation.
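The realignment step can be sketched in a few lines of plain C++; this is only an illustration of the idea, not the interface of the gsse library. The digit sequences are multiplied as polynomial coefficients and the carries are then propagated so that every coefficient is smaller than the base.

```cpp
#include <cassert>
#include <vector>

// Multiply two little-endian digit sequences (zeroth coefficient first) as
// polynomials, then realign the coefficients to the base by carrying.
std::vector<unsigned> bignum_multiply(const std::vector<unsigned>& a,
                                      const std::vector<unsigned>& b,
                                      unsigned base = 10)
{
    std::vector<unsigned> c(a.size() + b.size(), 0);
    // Plain polynomial multiplication: coefficients may exceed the base.
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i + j] += a[i] * b[j];
    // Realignment: propagate carries so every coefficient is < base.
    unsigned carry = 0;
    for (std::size_t k = 0; k < c.size(); ++k) {
        c[k] += carry;
        carry = c[k] / base;
        c[k] %= base;
    }
    while (c.size() > 1 && c.back() == 0)  // strip leading zero digits
        c.pop_back();
    return c;
}
```

For example, multiplying 1372 (digits {2, 7, 3, 1}) by 25 (digits {5, 2}) yields the digits of 34300 after the carry pass.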
8.2.5 Finite Element Integration<br />
In the theory of finite elements [?, ?], a continuous function space is projected onto a finite function space P_k, where P_k is the space of polynomials up to total order k. For many special cases, finite element integrals can be computed manually and inserted into the source code of an application. This yields excellent run-time performance but lacks flexibility. In more general cases, e.g., with general coefficients, the integrals must be computed by numerical integration at run time. To prevent an ill-conditioned system matrix, orthogonal polynomials have to be chosen as numerical integration weights. One possible choice is the normalized Legendre polynomials [?]. The coefficients of such a polynomial P_k of order k can be evaluated efficiently using the recursion:
P_0(x) = 1                                                        (8.1)
P_1(x) = x
P_k(x) = (2k − 1)/k · x P_{k−1}(x) − (k − 1)/k · P_{k−2}(x),      k ≥ 2
To use arbitrary p-finite elements (polynomial order [?, ?]), the numerical coefficients have to be either calculated manually and inserted into the source code or determined numerically at run time. The polynomial library presented here is used to store manually pre-calculated integration tables at compile time (orders 1-5). If the user requires higher-order finite elements, numerical coefficients are calculated at run time to any order.
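The three-term recursion (8.1) translates directly into a run-time loop. The following function is a minimal sketch, independent of the polynomial library:

```cpp
#include <cassert>
#include <cmath>

// Evaluate the Legendre polynomial P_k(x) with the three-term recursion
//   P_k(x) = (2k-1)/k * x * P_{k-1}(x) - (k-1)/k * P_{k-2}(x)
double legendre(int k, double x)
{
    if (k == 0) return 1.0;
    if (k == 1) return x;
    double p_prev = 1.0, p = x;            // P_0 and P_1
    for (int j = 2; j <= k; ++j) {
        double p_next = ((2.0 * j - 1.0) / j) * x * p
                      - ((j - 1.0) / j) * p_prev;
        p_prev = p;
        p = p_next;
    }
    return p;
}
```

For instance, P_2(0.5) = (3 · 0.25 − 1)/2 = −0.125, and P_k(1) = 1 for every k.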
8.3 A Loop and More<br />
One of the fundamental concepts in computer science is repetition; a computer was built for exactly this: programmable operations and repetition. As a simple example, a for loop is written as:

for (long i = 0; i < max_counter; ++i)
{}
As a real application of this concept, consider integration:

∫_a^b f(x) dx
Several approximation schemes are available, e.g., the trapezoid and Simpson rules:

∫_a^b f(x) dx ≈ (f(a) + f(b))/2 · (b − a)

∫_a^b f(x) dx ≈ (b − a)/6 · [ f(a) + 4 f((a + b)/2) + f(b) ]
As can be seen, these are very coarse approximations, but the main idea persists: the familiar continuous integration is not possible inside the computer, but numerical integration is. The infinitesimal dx is replaced by a finite ∆x, and the integral sign ∫ is replaced by a finite sum Σ_i.
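Combining the for loop with such a finite sum gives, for example, the composite trapezoid rule. The following sketch assumes nothing beyond the standard library:

```cpp
#include <cassert>
#include <cmath>

// Composite trapezoid rule: the integral over [a, b] is replaced by a
// finite sum over n sub-intervals of width dx = (b - a) / n.
template <typename F>
double integrate(F f, double a, double b, long n)
{
    double dx = (b - a) / n;
    double sum = 0.5 * (f(a) + f(b));
    for (long i = 1; i < n; ++i)   // the loop replaces the integral sign
        sum += f(a + i * dx);
    return sum * dx;
}
```

Integrating sin(x) over [0, π] with n = 1000 sub-intervals reproduces the exact value 2 up to the discretization error of order (∆x)².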
Chapter 9

How to Handle Physics on the Computer
9.1 Finite Elements<br />
Discretization schemes lead in general to a linear system of equations:

A x = f    (9.1)

These matrices are typically:

• sparse (there are only a few non-zero elements per row)
• of large dimension N (10^4 − 10^9 unknowns)

A non-zero element A_{i,j} of the matrix indicates a finite element connecting the degrees of freedom i and j.
To demonstrate the transfer of a continuously formulated equation, such as the Laplace or Poisson equation, to the finite regime of a computer, a simple Dirichlet problem is used. If an implicit (uniform) 1D grid with n elements is used, the contribution of each element to the system matrix A is constant, a so-called stencil sub-matrix:
    ⎛  2  −1              ⎞
A = ⎜ −1   2  −1          ⎟
    ⎜       ⋱   ⋱   ⋱     ⎟
    ⎜          −1   2  −1 ⎟
    ⎝              −1   2 ⎠

of size (n−1) × (n−1).
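For illustration, this 1D stencil matrix can be assembled in a few lines. Dense storage is used here only for brevity; a real code would use a sparse format, since there are at most three non-zeros per row:

```cpp
#include <cassert>
#include <vector>

// Assemble the (n-1) x (n-1) 1D Laplace matrix tridiag(-1, 2, -1).
// Dense storage is used for illustration only.
std::vector<std::vector<double> > assemble_1d_laplace(int n)
{
    int N = n - 1;
    std::vector<std::vector<double> > A(N, std::vector<double>(N, 0.0));
    for (int i = 0; i < N; ++i) {
        A[i][i] = 2.0;                         // stencil centre
        if (i > 0)     A[i][i - 1] = -1.0;     // left neighbour
        if (i < N - 1) A[i][i + 1] = -1.0;     // right neighbour
    }
    return A;
}
```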
For a 2D implicit grid of dimension N = (n − 1)^2, the system matrix has block-tridiagonal form:

    ⎛  D  −I              ⎞
A = ⎜ −I   D  −I          ⎟
    ⎜       ⋱   ⋱   ⋱     ⎟
    ⎜          −I   D  −I ⎟
    ⎝              −I   D ⎠

with the (n−1) × (n−1) block

    ⎛  4  −1              ⎞
D = ⎜ −1   4  −1          ⎟
    ⎜       ⋱   ⋱   ⋱     ⎟
    ⎜          −1   4  −1 ⎟
    ⎝              −1   4 ⎠

and the (n−1) × (n−1) identity matrix I.

9.2 Again, Integrators
Chapter 10

Programming tools
In this chapter we introduce programming tools that can be used to solve the exercises.<br />
10.1 GCC<br />
GCC stands <strong>for</strong> the Gnu Compiler Collection. It is a collection of compilers (C, <strong>C++</strong>, FOR-<br />
TRAN, Fortran 90, java) free of charge [?]. The <strong>C++</strong> compilers are very good and produce<br />
reasonably efficient code. In this section, we explain how to compile a <strong>C++</strong> program.<br />
The following command:<br />
g++ -o hello hello.cpp<br />
compiles the <strong>C++</strong> source file hello.cpp into the executable hello.<br />
The compiler command is gcc or g++ with the following options.<br />
• -Idirectory: Include files directory<br />
• -O: Optimization<br />
• -g: Debugging<br />
• -p: Profiling<br />
• -o filename: output file name<br />
• -c: Compile, no link<br />
• -Ldirectory: Library directory<br />
• -lfile: Link with library libfile.a<br />
Here is another example:

g++ -o foo foo.cpp -I/opt/include -L/opt/lib -lblas

This compiles and links the file foo.cpp, taking include files from /opt/include (option -I) and linking with the BLAS library located in the directory /opt/lib (options -L and -l). For optimized code, we have to use the compilation options:
-O3 -DNDEBUG<br />
The -DNDEBUG option defines the C preprocessor macro NDEBUG, which tells the assert macro to skip its tests. This saves time at execution.
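A minimal example of the effect: with -DNDEBUG the assert below expands to nothing, so the bounds test costs nothing in the optimized build.

```cpp
#include <cassert>
#include <vector>

// With -DNDEBUG the assert disappears; without it, an out-of-range index
// aborts the program with a diagnostic instead of reading garbage.
double checked_read(const std::vector<double>& v, std::size_t i)
{
    assert(i < v.size());   // only active when NDEBUG is not defined
    return v[i];
}
```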
10.2 Debugging<br />
10.2.1 Debugging with text tools<br />
"And there you tell yourself that it's over,
for worse than that would be death.
Just when you finally think you're getting out,
when there's no more, well, there's still more!"
— Stromae.
There are several debugging tools. In general, graphical ones are more user-friendly, but they are not always available. In this section, we describe the gdb debugger, which is very useful for tracing the cause of a run-time error, provided the code was compiled with the option -g. The following is a printout of a gdb session with the program hello.cpp:
#include <iostream>
#include <glas/dense_vector.hpp>   // assumed GLAS header

int main() {
    glas::dense_vector< int > x( 2 ) ;
    x(0) = 1 ; x(1) = 2 ;
    for (int i=0; i<3; ++i)        // bug: i == 2 reads past the end of x
        std::cout << x(i) << std::endl ;
}
T& glas::continuous_dense_vector<T>::operator()(ptrdiff_t) [with T = int]:
Assertion `i < size_' failed.
10.2.2 Debugging with graphical interface: DDD<br />
More convenient than debugging on the text level is using a graphical interface like DDD (Data Display Debugger). It has more or less the same functionality as gdb and, in fact, runs gdb internally; one can also use it with another text debugger. As a case study, we use a modified example from Section 5.4.5. In fact, the buggy program arose while teaching § 5.4.5, i.e. one of the authors tried to reconstruct vector_unroll_example2.cpp on the fly.
TODO: Find a better example. The above finally was okay, the tuning just did not change the<br />
run-time behaviour.<br />
In addition to the window above, you will see a smaller one like in Figure 10.1, typically to the right of the large window if there is enough space on your screen. This control panel lets you steer through the debug session in a way that is easier for beginners and more convenient even for some advanced users. It offers the following commands:
Run Start or restart your program.<br />
Interrupt If your program does not terminate or does not reach the next break point, you can stop it manually.

Step Go one step forward; if you are positioned on a function call, jump into the function.

Next Go to the next line in your source code; if you are located on a function call, do not jump into it unless a break point is set inside.
Figure 10.1: DDD<br />
control panel
Stepi and Nexti These are the equivalents on the instruction level. They are only needed for debugging assembler code and are not covered in this book.

Until Position your cursor in the source and run the program until this line is reached. If the program flow does not pass this line, execution continues until the end, the next break point, or a bug.

Finish Execute the remainder of the current function and stop on the first line outside this function, i.e. the line after the function call.

Cont Continue the execution until the next event (break point, bug, or end).

Kill Kill the program.

Up Show the line of the current function's call, i.e. go up one level in the call stack.

Down Go back to the called function, i.e. go down one level in the call stack.

Undo Revert the last action (works rarely or never).

Redo Repeat the last command.

Edit Call an editor with the source file currently shown.

Make Call 'make' (which must know what to compile).
10.3 Valgrind<br />
The valgrind distribution offers several tools for analyzing your software. We will only use one of them, called Memcheck; for more information on the others we refer you to http://valgrind.org. Memcheck detects memory-management problems like memory leaks. It also reports if your program accesses memory it should not, or if it uses uninitialized values. All these errors are reported as soon as they occur, along with the source line number at which they occurred and a stack trace of the functions called to reach that line. Take into account that Memcheck runs programs about 10 to 30 times slower than normal. Use the following command to check the memory management of a program:
valgrind --tool=memcheck program_name
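As a hypothetical example, the following function leaks memory. Compiled with -g and run under valgrind --tool=memcheck, the allocation is reported as "definitely lost", together with the source line and call stack:

```cpp
#include <cstddef>

// A deliberately leaky function: the array is allocated with new[] but
// never released, so Memcheck reports it as "definitely lost".
double leaky_sum(std::size_t n)
{
    double* data = new double[n];
    for (std::size_t i = 0; i < n; ++i) data[i] = 1.0;
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += data[i];
    return s;                       // missing: delete[] data;
}
```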
10.4 Gnuplot<br />
A useful tool <strong>for</strong> making plots is Gnuplot. It is a public domain program.<br />
Invoke gnuplot to start the program. Suppose we have the file results with the following<br />
content:
0 1<br />
0.25 0.968713<br />
0.75 0.740851<br />
1.25 0.401059<br />
1.75 0.0953422<br />
2.25 -0.110732<br />
2.75 -0.215106<br />
3.25 -0.237847<br />
3.75 -0.205626<br />
4.25 -0.145718<br />
4.75 -0.0807886<br />
5.25 -0.0256738<br />
5.75 0.0127226<br />
6.25 0.0335624<br />
6.75 0.0397399<br />
7.25 0.0358296<br />
7.75 0.0265507<br />
8.25 0.0158041<br />
8.75 0.00623965<br />
9.25 -0.000763948<br />
9.75 -0.00486465<br />
The first column represents the x coordinates and the second column contains the corresponding y coordinate values. We can plot this using the command:

plot "results" w l

The command

plot "results"

plots only points (stars), without a line. The command help is also useful. For 3D plots, i.e. a table with three columns, we use the command splot.
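Such a two-column table can be produced by a short C++ program. The damped cosine below merely resembles the data above; the function and file layout are an assumption for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <ostream>
#include <sstream>

// Write an x/y table in the two-column format that gnuplot reads with
//   plot "results" w l
void write_table(std::ostream& out, double x0, double x1, double dx)
{
    for (double x = x0; x <= x1 + 1e-9; x += dx)
        out << x << ' ' << std::exp(-0.5 * x) * std::cos(x) << '\n';
}
```

Writing the same table to a std::ofstream("results") produces a file that gnuplot can plot directly.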
10.5 Unix and Linux<br />
Unix (and Linux) is not used as widely as Windows, although for scientific programming it is a popular development platform. The Unix operating system is a command-line system with several graphical interfaces. Especially on Linux, the graphical interfaces are well developed, so that you get a Windows-like look and feel. Although you can easily browse through the directories, create new directories, and move data around with a few mouse clicks, it is useful to know at least a few Unix commands:
• ps: list of my processes,<br />
• kill -9 id : kill the process with id id,
10.5. UNIX AND LINUX 241<br />
• top: list all processes and resource use,<br />
• mkdir: make a new directory,<br />
• rmdir: remove an (empty) directory,<br />
• pwd: name of the current directory,<br />
• cd dir: change directory to dir,<br />
• ls: list the files in the current directory<br />
• cp from to: copy the file from to the file or directory to. If the file to exists, it is overwritten, unless you use cp -i from to,
• mv from to: move the file from to the file or directory to. If the file to exists, it is overwritten, unless you use mv -i from to,
• rm files: remove all the files in the list files. rm * removes everything (be careful),
• chmod mode files: change the access mode for files.
See http://www.physics.wm.edu/unix_intro/outline.html <strong>for</strong> on-line help.
Chapter 11

C++ Libraries for Scientific Computing
TODO: Introducing words.<br />
11.1 GLAS: Generic Linear Algebra Software<br />
11.1.1 Introduction<br />
Software kernels for dense and sparse linear algebra have been developed over many decades, beginning with the BLAS [?] [?] [?] [?] [?] in FORTRAN and followed by similar work in C++, see MTL [?] and Blitz++, to name a few.
Currently, more and more scientific software is written in C++, but the language does not provide dense and sparse vector and matrix concepts and algorithms, as is the case in Matlab. This makes exchanging C++ software harder than, for example, Fortran 90 software, which has dense vector and matrix concepts defined in the language. Note, however, that Fortran 90 does not have sparse or structured matrix types such as symmetric, upper triangular, or banded matrices.
11.1.2 Goal<br />
The goal of the GLAS project is to open the discussion on standardization for C++ programming. The goal is not to present a standard as such, but it may be a first step towards this goal.

We realize that this is very ambitious. We think the GLAS proposal meets these goals, but the internals are still rather complicated, which makes extensions less straightforward. GLAS is a generic software package using advanced meta-programming tools such as the Boost MPL, but this is invisible to users who do not want to add extensions to GLAS. Some basic knowledge of template programming and expression templates is required to make proper use of the software.

This version does not use Concept C++, since we encountered instability problems with the Concept-GCC compiler and found it hard to work with expression templates.
243
We now briefly explain how the goals are met before entering a more detailed discussion of the software design.

GLAS should be considered as an interface to other linear algebra software, e.g. the BLAS, MTL, or other packages. Such an interface is provided by back-ends, whereas the syntax for using these back-ends does not change. For example, if we want to add a scaled vector to another vector (an axpy), then we write

y += a * x;

but the implementation can use the BLAS (e.g. daxpy), or MTL, or another package. We provide a reference C++ implementation that illustrates how the expressions are dispatched to the actual implementation.
The concepts mainly contain free functions and meta-functions, so that external objects can be used in GLAS, provided these functions are specialized. As an exercise, we show how this can be done for a std::vector.
For more in<strong>for</strong>mation, see [?].<br />
11.1.3 Status<br />
GLAS is still under development. Currently, there are features for working with dense vectors and matrices, and sparse matrices. There is support for the Boost.Sandbox bindings, and there are toolboxes for working with LAPACK, structured matrices (mase toolbox), and iterative methods (iterative toolbox).
11.2 Boost<br />
Boost is a bit out of line in this chapter. Firstly, it is not a library itself but a whole collection<br />
of freely available C ++ libraries. Secondly, not all of the contained libraries deal directly with<br />
scientific computing. However, many of the “non-scientific” libraries provide useful functionality<br />
<strong>for</strong> scientific libraries and applications.<br />
Boost provides free portable <strong>C++</strong> libraries.<br />
Currently, the following Boost libraries are available that are useful <strong>for</strong> numerical software:<br />
• Data structures<br />
– tuple: pairs, triples, etc., e.g. tuple<int, double>
– smart ptr: smart pointers<br />
• Correctness and testing<br />
– static assert: compile time assertions<br />
• Template programming<br />
– enable if, mpl, type traits<br />
– static assert: compile time assertions
11.3. BOOST.BINDINGS 245<br />
• Math and numerics<br />
– numeric::conversions: conversions of types<br />
– thread: multi-threading<br />
– bindings: generic bindings to external software<br />
– graph: graph programs<br />
– integer: integer types<br />
– interval: interval arithmetic<br />
– random: random number generator<br />
– rational: rational numbers<br />
– math: various mathematical things, e.g. greatest common divisor<br />
– typeof: type deduction<br />
– numeric::ublas: vector and matrix library<br />
– math::quaternion, math::octonion
– math::special functions<br />
• Miscellaneous<br />
– filesystem: advanced operations on files and directories
– program options: handling command-line options in your program
– timer: timing class
For more in<strong>for</strong>mation on these and other boost libraries see http://www.boost.org.<br />
11.3 Boost.Bindings<br />
Scientific programmers using C++ also want to use the features offered by mature FORTRAN and C codes such as LAPACK [?], MUMPS [?] [?], SuperLU [?], and UMFPACK [?]. The programming effort of rewriting these codes in C++ would be very high; it therefore makes more sense to link them into C++ code. Another argument for linking with external software is performance: the vendor-tuned BLAS functions are perhaps the most obvious example.
In the traditional approach, an interface is developed <strong>for</strong> each basic <strong>C++</strong> linear algebra package<br />
and <strong>for</strong> each external linear algebra package. This is illustrated by Figure 11.1. The Boost<br />
bindings adopt the approach of orthogonality between algorithms and data. This orthogonality<br />
is created by traits classes that provide the necessary data to the external software. The vector<br />
traits, <strong>for</strong> example, provide a pointer (or address), size and stride, which can then be used<br />
by e.g. the BLAS function ddot. Each traits class is specialized <strong>for</strong> user defined vector and<br />
matrix packages. This implies that, <strong>for</strong> a new vector or matrix type, the development ef<strong>for</strong>t is<br />
limited to the specialization of the traits classes. Once the traits classes are specialized, BLAS<br />
and LAPACK can be used straightaway. For a new external software package, it is sufficient
246 CHAPTER 11. <strong>C++</strong> LIBRARIES FOR SCIENTIFIC COMPUTING<br />
Figure 11.1: Traditional interfaces between software: each C++ library (uBLAS, MTL, GLAS, ...) implements its own interface to each external package (BLAS, LAPACK, ATLAS, MUMPS, ...)
Figure 11.2: Concept of bindings as a generic layer between linear algebra algorithms and vector and matrix software
to provide a layer that uses the bindings. Figure 11.2 illustrates this philosophy. Note the<br />
difference with Figure 11.1.<br />
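The traits idea can be sketched as follows. The class and member names here are illustrative, not the actual Boost.Bindings ones; the point is that one specialization per container type suffices, after which every algorithm written against the traits works with that container:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustration of the traits idea: a traits class exposes pointer, size
// and stride, which is all a BLAS routine such as ddot needs.
template <typename V> struct vector_traits;        // primary template

template <typename T>
struct vector_traits< std::vector<T> > {           // one specialization
    static T* data(std::vector<T>& v)  { return &v[0]; }
    static std::size_t size(const std::vector<T>& v) { return v.size(); }
    static std::ptrdiff_t stride(const std::vector<T>&) { return 1; }
};

// A stand-in for BLAS ddot that works only through the traits.
template <typename V>
double dot(V& x, V& y)
{
    typedef vector_traits<V> traits;
    double s = 0.0;
    for (std::size_t i = 0; i < traits::size(x); ++i)
        s += traits::data(x)[i * traits::stride(x)]
           * traits::data(y)[i * traits::stride(y)];
    return s;
}
```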
11.3.1 Software bindings<br />
We now illustrate how the bindings can be used to interface external software by means of<br />
examples.<br />
BLAS bindings<br />
The BLAS are the Basic Linear Algebra Subroutines [?] [?] [?] [?] [?], whose reference implementation is available through Netlib¹. The BLAS are subdivided into three levels: level one contains vector operations, level two matrix-vector operations, and level three matrix-matrix operations.

The BLAS bindings in the Boost Sandbox contain interfaces to some BLAS functions; functions are added on request. The interfaces check the input arguments using the assert command, which is only compiled when the NDEBUG compile flag is not set. The interfaces are contained in three files: blas1.hpp, blas2.hpp, and blas3.hpp in the directory boost/numeric/bindings/blas. The BLAS bindings reside in the namespace boost::numeric::bindings::blas.
The BLAS provide functions for vectors and matrices with value type float, double, std::complex<float>, and std::complex<double>. All matrix containers have the ordering type column_major_t, since the (FORTRAN) BLAS assume column-major matrices.
The bindings are illustrated in Figure 11.3 for the BLAS subprograms DCOPY, DSCAL, and DAXPY applied to objects of type std::vector<double>. Note the include file for the bindings of the BLAS-1 subprograms and the include file that contains the specialization of the vector traits for std::vector.
¹ http://www.netlib.org
#include <boost/numeric/bindings/blas/blas1.hpp>       // assumed paths
#include <boost/numeric/bindings/traits/std_vector.hpp>
#include <vector>

namespace bindings = boost::numeric::bindings ;

int main() {
    std::vector< double > x( 10 ), y( 10 ) ;
    // Fill the vector x
    ...
    bindings::blas::copy( x, y ) ;
    bindings::blas::scal( 2.0, y ) ;
    bindings::blas::axpy( -3.0, x, y ) ;
    return 0 ;
}

Figure 11.3: Example for BLAS-1 bindings and std::vector bindings traits

LAPACK bindings
Software for dense and banded matrices is collected in LAPACK [?], a collection of FORTRAN routines mainly for solving linear systems and eigenvalue problems, including the singular value decomposition. As for the BLAS, the Boost Sandbox does not contain a full set of interfaces to LAPACK routines, but only very commonly used subprograms; on request, more functions are added to the library. The LAPACK bindings reside in the namespace boost::numeric::bindings::lapack.
Many LAPACK subroutines require auxiliary arrays, which a non-expert user does not wish to allocate. For comfort, the interface allows the user to allocate auxiliary vectors using the templated Boost.Bindings class array.

The LAPACK bindings verify the matrix structure to check whether the routine is the right choice; it is also checked whether the matrix arguments are column-major. Every function's return type is int; the return value is the value of the INFO argument of the corresponding LAPACK subprogram.
Figure 11.4 shows an example using GLAS.<br />
MUMPS bindings<br />
MUMPS stands <strong>for</strong> Multifrontal Massively Parallel Solver. The first version was a result from<br />
the EU project PARASOL [?, ?, ?]. The software is developed in Fortran 90 and contains a C interface.<br />
The input matrices should be given in coordinate <strong>for</strong>mat, i.e. storage <strong>for</strong>mat=coordinate t<br />
and the index numbering should start from one, i.e. sparse matrix traits::index base==1. We<br />
refer to the MUMPS Users Guide, distributed with the software [?].<br />
The C++ interface is a generic interface to the respective C structs for the different value types available from the MUMPS distribution: float, double, std::complex<float>, and std::complex<double>. The C++ bindings also contain functions that set the pointers and sizes of the parameters in the C struct using the bindings traits classes. An example is given in Figure 11.5. The sparse matrix is the uBLAS coordinate matrix, which is a sparse matrix in
#include <glas/dense_matrix.hpp>                       // assumed paths
#include <glas/dense_vector.hpp>
#include <boost/numeric/bindings/lapack/gees.hpp>
#include <boost/numeric/bindings/traits/glas.hpp>
#include <complex>
...
int main () {
    int n=100;
    // Define a real n x n matrix
    glas::dense_matrix< double > matrix( n, n ) ;
    // Define a complex n vector
    glas::dense_vector< std::complex<double> > eigval( n ) ;
    // Fill the matrix
    ...
    // Call LAPACK routine DGEES for computing the eigenvalue Schur form.
    // We create workspace for best performance.
    bindings::lapack::gees( matrix, eigval, bindings::lapack::optimal_workspace() ) ;
    ...
}
Figure 11.4: Example <strong>for</strong> LAPACK bindings and matrix bindings traits
coordinate format. The matrix is stored column-wise. The template argument 1 indicates that row and column numbers start from one, which is required for the Fortran 90 code MUMPS. Finally, the last argument indicates that the row and column indices are stored as type int, which is also a requirement of the Fortran 90 interface. The solve consists of three phases: (1) the analysis phase, which only needs the matrix's integer data, (2) the factorization phase, where the numerical values are also required, and (3) the solution phase (or back-transformation), where the right-hand side vector is passed in. The included files contain the specializations of the dense matrix and sparse matrix traits for uBLAS, and the MUMPS bindings.
11.4 Matrix Template Library<br />
11.5 Blitz++<br />
TODO: We can ask Todd to write something himself — Peter<br />
11.6 Graph Libraries<br />
TODO: Few introducing words from Peter<br />
11.6.1 Boost Graph Library<br />
TODO: I can write something about it — Peter<br />
11.6.2 LEDA<br />
LEDA implements advanced container types and combinatorial algorithms, especially graph algorithms. Containers are parameterized by element type and implementation strategies. The algorithms generally work only with the data structures of the library itself.
11.7 Geometric Libraries<br />
TODO: Few introducing words from René and Philipp<br />
11.7.1 CGAL<br />
TODO: Ask Sylvain to write something? Or can René and Philipp write it?<br />
CGAL implements generic classes and procedures for geometric computing. Its data structures operate at a very high level of abstraction.
#include <boost/numeric/bindings/mumps/mumps_driver.hpp>   // assumed paths
#include <boost/numeric/ublas/matrix_sparse.hpp>
#include <boost/numeric/ublas/vector.hpp>

int main() {
    namespace ublas = boost::numeric::ublas ;
    namespace mumps = boost::numeric::bindings::mumps ;
    ...
    typedef ublas::coordinate_matrix< double, ublas::column_major
                                    , 1, ublas::unbounded_array<int>
                                    > sparse_matrix_type ;
    sparse_matrix_type matrix( n, n, nnz ) ;
    // Fill the sparse matrix
    ...
    mumps::mumps< sparse_matrix_type > mumps_solver ;
    // Analysis (set the pointers and sizes of the integer data of the matrix)
    matrix_integer_data( mumps_solver, matrix ) ;
    mumps_solver.job = 1 ;
    driver( mumps_solver ) ;
    // Factorization (set the pointer for the values of the matrix)
    matrix_value_data( mumps_solver, matrix ) ;
    mumps_solver.job = 2 ;
    driver( mumps_solver ) ;
    // Set the right-hand side
    ublas::vector<double> v( 10 ) ;
    ...
    // Solve (set pointer and size for the right-hand side vector)
    rhs_sol_value_data( mumps_solver, v ) ;
    mumps_solver.job = 3 ;
    mumps::driver( mumps_solver ) ;
    return 0 ;
}
Figure 11.5: Example of the use of the MUMPS bindings
11.7.2 GrAL<br />
TODO: René and Philipp write more?<br />
GrAL implements some of the same concepts as GSSE, but without the generalization of function objects, the three-layer concept (segment, domain, structure), generalized quantity storage, and n-dimensional structured grids.
Chapter 12

Real-World Programming
12.1 Transcending Legacy Applications<br />
Legacy applications have been written in plain ANSI C or are available as Fortran libraries. It is often highly desirable to rejuvenate an already available implementation so that it utilizes advanced technologies and techniques, while keeping as much as possible of the experience and trust already invested in the original code base. One approach is an evolutionary transition: initially include as much of the old implementation as possible and gradually replace it to bring it up to date.
The following examples are based on a particle simulator, where two important concepts can be separated: scattering mechanisms (the physical behaviour of particles at boundaries (TODO: PS)) and physical model descriptions (how particles interact (TODO: PS)). All available scattering mechanisms are implemented as individual functions, which are called one after another. The scattering models require a variable set of parameters, which leads to non-homogeneous interfaces in the functions representing them. To alleviate this to some extent, global variables have been employed, completely eliminating any aspirations of data encapsulation and posing a serious problem for attempts at parallelization to take advantage of modern multi-core CPUs. The code has a very simple and repetitive structure:
double sum = 0;
double current_rate = generate_random_number();
if (A_key == on)
{
    sum = A_rate(state, parameters);
    if (current_rate < sum)
    {
        counter->A[state->valley]++;
        state_after_A(state, parameters);
        return;
    }
}
sum += B_rate(state, state_2, parameters);
if (current_rate < sum)
{
    counter->B[state->valley]++;
    state_after_B(state, state_2);
    return;
}
...
Extensions to this code are usually accomplished by copy and paste, which is prone to simple mistakes of oversight, such as failing to change the counter that has to be incremented or calling the incorrect function to update the electron's state.
Furthermore, at times the need arises to calculate the sum of all the scattering rates (λ_total), which is accomplished in a different part of the implementation, thus further opening the possibility for inconsistencies between the two code paths.
The decision which models to evaluate is made strictly at run time, and it would require significant, if simple, modifications of the code to change this at compile time, thus making highly optimized specializations very cumbersome.
The functions calculating the rates and state transitions, however, have been well tested and<br />
verified, so that abandoning them would be wasteful.<br />
12.1.1 Best of Both Worlds<br />
Scientific computing requires not only high performance components evaluated and optimized at compile time, but also runtime-exchangeable (physical) models and the ability to cope with various boundary conditions. The two most commonly used programming paradigms, object-oriented and generic programming, differ in how the required functionality is implemented. Object-oriented programming directly offers runtime polymorphism by means of virtual inheritance. Unfortunately, current implementations of inheritance use an intrusive approach for new software components and tightly couple a type and the corresponding operations to the supertype. In contrast to object-oriented programming, generic programming is limited to algorithms using statically and homogeneously typed containers, but offers highly flexible, reusable, and optimizable software components.
Both paradigms thus offer different points of evaluation. Runtime polymorphism based on concepts [?] (runtime concepts) tries to combine the runtime modification mechanism of virtual inheritance with the compile-time flexibility and optimization of generic programming.
Inheritance in the context of runtime polymorphism is used to provide an interface template to model the required concept, where the derived class must provide the implementation of the given interface. The following code snippet
template <typename StateT>
struct scatter_facade
{
    typedef StateT state_type;

    struct scattering_concept
    {
        virtual ~scattering_concept() {}
        virtual numeric_type rate(const state_type& input) const = 0;
        virtual void transition(state_type& input) = 0;
    };

    boost::shared_ptr<scattering_concept> scattering_object;

    template <typename T>
    struct scattering_model : scattering_concept
    {
        T scattering_instance;
        scattering_model(const T& x) : scattering_instance(x) {}
        numeric_type rate(const state_type& input) const;
        void transition(state_type& input);
    };

    numeric_type rate(const state_type& input) const;
    void transition(state_type& input);

    template <typename T>
    scatter_facade(const T& x) : scattering_object(new scattering_model<T>(x)) {}
    ~scatter_facade() {}
};
therefore introduces a scatter_facade which wraps a scattering_concept part. Virtual inheritance is used to configure the necessary interface parts, in this case rate() and transition(), which have to be implemented by any scattering model. In the given example the state type is still available for explicit parametrization.
In contrast to other applications of runtime concepts, e.g. in computer graphics, it is not necessary to provide mechanisms for deep copies, as the actual physical models remain unaltered once they have been created; copies would only serve to unnecessarily increase the memory footprint. Therefore a boost::shared_ptr is used for memory management.
The legacy application has been written in plain ANSI C, which makes it easily compatible with the new C++ implementation. Several design decisions, such as the use of global and static variables, make it difficult to extend and to update appropriately for modern multi-core CPUs. To interface with this novel approach, a core structure is implemented which wraps the implementations of the scattering models by using runtime concepts.
template <typename ParameterType>
struct scattering_rate_A
{
    ...
    const ParameterType& parameters;

    scattering_rate_A(const ParameterType& parameters) : parameters(parameters) {}

    template <typename StateType>
    numeric_type operator()(const StateType& state) const
    {
        return A_rate(state, parameters);
    }
};
256 CHAPTER 12. REAL-WORLD PROGRAMMING<br />
By supplying the required parameters at construction time it is possible to homogenize the interface of operator(). This methodology also allows the continued use of the old data structures in the initial phases of the transition, while not being so constrictive as to hamper future developments.
The functions for the state transitions are treated similarly to those for the rate calculation. Both are then fused in a scattering_pack to form the complete scattering model and to ensure consistency of the rate and state transition calculations; the pack also models the runtime concept, as can be seen in the following piece of code:
template <typename scattering_rate_type, typename transition_type,
          typename parameter_type>
struct scattering_pack
{
    // ...
    scattering_rate_type rate_calculation;
    transition_type state_transition;

    scattering_pack(const parameter_type& parameters) :
        rate_calculation(parameters),
        state_transition(parameters)
    {}

    template <typename StateType>
    numeric_type rate(const StateType& state) const
    {
        return rate_calculation(state);
    }

    template <typename StateType>
    void transition(StateType& state)
    {
        state_transition(state);
    }
};
The blend of runtime and compile-time mechanisms allows the storage of all scattering models within a single container, e.g. a std::vector, which can be iterated over in order to evaluate them.

typedef std::vector<scatter_facade<state_type> > scatter_container_type;
scatter_container_type scatter_container;
scatter_container.push_back(scattering_model);
For the development of new collision models, easy extensibility, even without recompilation, is also a highly important issue. This approach allows the addition of scattering models at runtime and can expose an interface to an interpreted language such as Python [?].
In case a highly optimized version is desired, the runtime container (here the std::vector) may be exchanged for a compile-time container, which is also readily available from the GSSE and provides the compiler with further opportunities for optimization at the expense of runtime adaptability.
12.1. TRANSCENDING LEGACY APPLICATIONS 257<br />
12.1.2 Reuse Something Appropriate<br />
While the described approach initially slightly increases the implementation burden, because wrappers need to be provided, it offers a transition path to integrate legacy codes into an up-to-date framework while not abandoning the experience associated with them. The invested effort raises the level of abstraction, which in turn increases the benefits obtained from advances in compiler technology. This inherently allows optimization for several platforms without the massive human effort that was needed in previous approaches.
In this particular case, confining the scattering functions' reliance on global variables to the wrapping structures greatly facilitates parallelization efforts, which are increasingly important with the continued increase of computing cores per CPU.
Furthermore, the results can easily be verified as code parts are gradually moved to newer implementations, the only stringent requirement being link compatibility with C++. This testing and verification can be taken a step further in case the original implementation is written in ANSI C, due to its high compatibility with C++: it is possible to weave parts of the new implementation into the older code, providing the opportunity for a very fine-grained comparison not only of the final results, but of all the intermediates as well.
Such swift verification of implementations also speeds up the steps necessary to validate calculated results against subsequent or contemporary experiments, which should not be neglected in order to keep physical models and their numerical representations strongly rooted in reality.
Chapter 13

Parallelism
13.1 Multi-Threading<br />
To do!<br />
13.2 Message Passing<br />
13.2.1 Traditional Message Passing<br />
Parallel hello world<br />
#include <iostream>
#include <mpi.h>

int main (int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    std::cout << "Hello, World!\n";
    MPI_Finalize();
    return 0;
}
#include <iostream>
#include <mpi.h>

int main (int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int myrank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    std::cout << "Hello world, I am process number " << myrank
              << " out of " << nprocs << ".\n";
    MPI_Finalize();
    return 0;
}
13.2.2 Generic Message Passing<br />
Everybody sends to process number 0.<br />
#include <iostream>
#include <cmath>
#include <mpi.h>

int main (int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int myrank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float vec[2];
    vec[0]= 2*myrank; vec[1]= vec[0]+1;

    // Local accumulation
    float local= std::abs(vec[0]) + std::abs(vec[1]);

    // Global accumulation
    float global= 0.0f;
    MPI_Status st;
    // Receive from predecessor
    if (myrank > 0)
        MPI_Recv(&global, 1, MPI_FLOAT, myrank-1, 387, MPI_COMM_WORLD, &st);
    // Increment
    global+= local;
    // Send to successor
    if (myrank+1 < nprocs)
        MPI_Send(&global, 1, MPI_FLOAT, myrank+1, 387, MPI_COMM_WORLD);
    else
        std::cout << "Hello, I am the last process and I know that |v|_1 is " << global << ".\n";

    MPI_Finalize();
    return 0;
}

This hand-coded accumulation works at a low abstraction level.
The library performs the reduction.
#include <iostream>
#include <cmath>
#include <mpi.h>

int main (int argc, char* argv[])
{
    MPI_Init(&argc, &argv);

    int myrank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float vec[2];
    vec[0]= 2*myrank; vec[1]= vec[0]+1;

    // Local accumulation
    float local= std::abs(vec[0]) + std::abs(vec[1]);

    // Global accumulation
    float global;
    MPI_Allreduce(&local, &global, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    std::cout << "Hello, I am process " << myrank << " and I know too that |v|_1 is " << global << ".\n";

    MPI_Finalize();
    return 0;
}

Because:
• Higher abstraction.
• The MPI implementation is usually adapted to the underlying hardware: typically logarithmic effort; it can even be tuned in assembler for the network card.
Chapter 14

Numerical exercises
In this chapter, we list a number of exercises in which the different aspects discussed in the course will be used. The goal is to implement a small application program in C++, run it, and interpret the results.
You can use any software that may help you with your task. A list of packages is provided at the end of this chapter. We have only installed Boost, Boost.Sandbox, GLAS, BLAS, and LAPACK. Other smaller packages can be downloaded if necessary.
In each exercise, a generic function or class will be developed, together with its documentation. These functions and classes should be part of the namespace athens. The function arguments will have to be described. Each template argument will have to satisfy concepts. You may have to define new concepts. If you are using STL or GLAS concepts, you can just refer to them without definition.
Write a small paper on the decisions you made for the development of the software. Use the software for some examples and report the results. You may write the report on paper or send it in electronic form (PDF by preference).
14.1 Computing an eigenfunction of the Poisson equation<br />
This is an example of a more complicated problem. It illustrates what is expected from the<br />
exercises. The actual exercises are less demanding.<br />
In this section, we derive software <strong>for</strong> the solution of the Poisson equation. We start with the<br />
1D problem and then move to the 2D problem.<br />
14.1.1 The 1D Poisson equation<br />
The 1D Poisson equation is

    -d^2u/dx^2 = f                                                    (14.1)

where u(x) is the solution, f the excitation, and x ∈ [0, 1]. We impose the boundary conditions

    u(0) = u(1) = 0 .
This is called a boundary value problem.<br />
The goal is to compute the solution u for all x ∈ [0, 1]. Since this is not possible numerically, we only compute u for a discrete number of x's, which we call discretization points. We discretize x as x_j = jh for j = 0, . . . , n+1 and h = 1/(n+1). This is called an equidistant distribution. The smaller h, the closer we are to the continuous problem, i.e. we have more points in [0, 1], but, as we shall see, the problem becomes more expensive to solve. One method for solving boundary value problems is to replace the derivatives by finite differences. We use finite differences for the second order derivatives:
    d^2u/dx^2 (x_j) ≈ (1/h^2) (-2u(x_j) + u(x_{j-1}) + u(x_{j+1})) .

Filling this in (14.1), we obtain

    (1/h^2) (-u(x_{j-1}) - u(x_{j+1}) + 2u(x_j)) = f(x_j)   for j = 1, . . . , n .   (14.2)
Note that u(x_0) = u(x_{n+1}) = 0. Now define the vectors

    u = [u(x_1), . . . , u(x_n)]^T   and   f = [f(x_1), . . . , f(x_n)]^T .
Putting together (14.2) for j = 1, . . . , n leads to the algebraic system of equations Au = f with n rows and columns, where

    A = [  2  -1              ]
        [ -1   2  -1          ]
        [      .   .   .      ]
        [       .   .  -1     ]
        [          -1   2     ] .
Note that A is a symmetric tridiagonal matrix. One can show that it is positive definite. In the algorithms, we need operations on this matrix. We will use two different types of operations. The first one is the matrix-vector product y = Ax. We write a function for this with a template argument for the vectors, since we do not know beforehand what the type of the vectors will be.
#ifndef athens_poisson_1d_hpp
#define athens_poisson_1d_hpp

#include <cassert>
#include <glas/glas.hpp> // GLAS headers (exact path may differ)

namespace athens {

template <typename X, typename Y>
void poisson_1d( X const& x, Y& y ) {
    assert( glas::size(x)==glas::size(y) ) ;
    assert( glas::size(x) > 1 ) ;
    int n = glas::size(x) ;
    y(0) = 2.0*x(0) - x(1) ;
    for ( int i=1; i<n-1; ++i )
        y(i) = 2.0*x(i) - x(i-1) - x(i+1) ; // tridiagonal stencil
    y(n-1) = 2.0*x(n-1) - x(n-2) ;
}
} // namespace athens<br />
#endif<br />
where we assume that the types X and Y are models of the concept glas::DenseVectorCollection.<br />
14.1.2 Richardson iteration<br />
Richardson iteration is an iterative method <strong>for</strong> the solution of the linear system<br />
Bu = g<br />
that starts from an initial guess u_0 and computes u_i = u_{i-1} + r_{i-1} at iteration i, where r_{i-1} is the residual g - Bu_{i-1}. It works as follows:

1. For i = 1, . . . , max_it:
   1.1. Compute the residual r_{i-1} = g - Bu_{i-1}
   1.2. If ||r_{i-1}||_2 ≤ τ: return
   1.3. Compute the new solution u_i = u_{i-1} + r_{i-1}

The method converges when the eigenvalues of B lie between 0 and 2.
The eigenvalues of the Poisson matrix A are λ_j = 2(1 - cos(πj/(n + 1))) for j = 1, . . . , n. The eigenvalues are thus bounded by 0 < λ_j < 4. We therefore first multiply Au = f by 0.5 to obtain

    (0.5A)u = (0.5f) .

Note that the solution u does not change. Define B = 0.5A and g = 0.5f; then Bu = g and the eigenvalues of B lie in (0, 2). For such a matrix, we can use the Richardson iteration method.
We develop the following function

template <typename Op, typename G, typename U>
double richardson( Op const& op, G const& g, U& u, double const& tol, int max_it ) ;

where op is a BinaryFunction op(x,y) that computes y = Bx for a given input argument x, and where u is an initial estimate of the solution on input and the computed solution on output. The vector g is the right-hand side of the system. The return value of richardson is the residual norm. This allows us to check how accurate the solution is without having to compute the residual explicitly. The parameter tol corresponds to the tolerance τ.
First, we set conceptual conditions on all arguments.<br />
• U is a model of concept glas::DenseVectorCollection, i.e. we assume that a dense vector<br />
from GLAS is used.<br />
• Op is a model of BinaryFunction, i.e. the following are valid expressions <strong>for</strong> op of type Op:<br />
– op(x,y) where x and y are instances of type X where X is a model of the concept<br />
glas::DenseVectorCollection.
• G is a model of concept glas::VectorExpression.<br />
Next, we write the code <strong>for</strong> the Richardson iteration. We store the variables ui in u and ri in r.<br />
#ifndef athens_richardson_hpp
#define athens_richardson_hpp

#include <glas/glas.hpp> // GLAS headers (exact path may differ)

namespace athens {

template <typename Op, typename F, typename U>
double richardson( Op const& op, F const& f, U& u, double const& tol, int max_it ) {
    double resid_norm ;
    // Create residual vector
    glas::dense_vector< typename glas::value_type<U>::type > r( glas::size(u) ) ;
    for ( int iter=0; iter<max_it; ++iter ) {
        op( u, r ) ;                    // r = B u
        r = f - r ;                     // r = g - B u
        resid_norm = norm_2( r ) ;
        if ( resid_norm <= tol ) break ;
        u += r ;                        // u = u + r
    }
    return resid_norm ;
}

} // namespace athens
#endif

The function is tested with the following program, in which the functor poisson_scaled applies B = 0.5A:

struct poisson_scaled {
    template <typename X, typename Y>
    void operator()( X const& x, Y& y ) const {
        athens::poisson_1d( x, y ) ;
        y = 0.5 * y ;                   // B = 0.5 A
    }
} ;

int main() {
    typedef glas::dense_vector<double> v_type ;
    v_type f( 10 ) ;
    glas::random_seed seed ;
    glas::random( f, seed ) ;
    v_type x( 10 ) ;
    x = 0.0 ;
    // Richardson iteration
    double res_nrm = athens::richardson( poisson_scaled(), 0.5*f, x, 1.e-4, 1000 ) ;
    {
        glas::dense_vector<double> r( size(x) ) ;
        athens::poisson_1d( x, r ) ;
        std::cout << "res_nrm = " << norm_2( f - r ) << std::endl ;
        std::cout << "f = " << f << std::endl ;
        std::cout << "x = " << x << std::endl ;
    }
    return 0 ;
}
We multiply right-hand side and matrix vector product by 0.5 to make sure the Richardson<br />
method converges.<br />
The output looks like<br />
res_nrm = 0.000195164<br />
f = (10)[0.0484811,0.822283,0.102721,0.436631,0.46112,0.0475317,0.864644,0.0772845,0.920099,0.105434]<br />
x = (10)[1.85463,3.66081,4.64473,5.52601,5.97071,5.9544,5.8906,4.96226,3.95668,2.03105]<br />
Note that the Richardson method converges very slowly. For the Poisson equation, there exist<br />
much faster methods.<br />
14.1.3 LAPACK tridiagonal solver<br />
The LAPACK [?] software package contains routines for solving linear systems with a symmetric positive definite tridiagonal matrix. This package is written in FORTRAN 77. The corresponding functions are

• Factorization: A = LDL^T by
  SUBROUTINE DPTTRF( N, D, E, INFO )
• Linear solve: Ax = b using LDL^T x = b by
  SUBROUTINE DPTTRS( N, NRHS, D, E, B, LDB, INFO )

In order to solve Au = f, A is first factorized as A = LDL^T, where L is a matrix consisting of a main diagonal of ones and one diagonal below the main diagonal, and D is a diagonal matrix. Once the factorization is performed, the solution is computed as u = L^{-T}(D^{-1}(L^{-1}f)). Note that the inverses of L and L^T are not computed explicitly. For example, L^{-1}f is computed as a linear solve with L. Linear solves with triangular matrices are easy to program. This is what DPTTRS does for us.
A C++ interface to DPTTRF and DPTTRS is available from BoostSandbox.Bindings. For our application, we can solve a linear system as follows.
1. Given an approximate eigenvector x_0.
2. Normalize: x_0 = x_0/||x_0||_2.
3. For i = 1, . . . , m:
   3.1. Solve A y_i = x_{i-1}.
   3.2. Compute the eigenvalue estimate: λ_i = Σ x_{i-1} / Σ y_i.
   3.3. x_i = y_i/||y_i||_2.
#include <...>        // Lapack binding
#include <...>        // glas binding
#include <...>        // glas vectors
#include <algorithm>  // for std::fill
#include <cassert>    // for assert
#include <iostream>   // for cout and endl

int main() {
    int const n = 10 ;
    glas::dense_vector< double > d(n) ;    // Main diagonal
    glas::dense_vector< double > e(n-1) ;  // Lower/upper diagonal
    std::fill( begin(d), end(d), 2.0 ) ;
    std::fill( begin(e), end(e), -1.0 ) ;
    glas::dense_vector< double > rhs( n ) ;
    std::fill( begin(rhs), end(rhs), 3.0 ) ;
    int info = boost::numeric::bindings::lapack::pttrf( d, e ) ;
    assert( !info ) ;
    std::cout << rhs << std::endl ;
    info = boost::numeric::bindings::lapack::pttrs( 'L', d, e, rhs ) ;
    std::cout << rhs << std::endl ;
    // Solution is in rhs
}
14.1.4 The inverse iteration method<br />
The inverse iteration method computes an eigenvalue of a matrix A. The method converges to the eigenvector associated with the eigenvalue nearest zero. The method works as shown in the algorithm above; there, Σ x_i means the sum of the elements of x_i. For the solution of the linear systems, we can use the Richardson iteration.
Write a function with the following header:

template <typename Op, typename DenseVectorCollection, typename Float>
void inverse_iteration( Op const& op, DenseVectorCollection& x, int m, Float& lambda ) ;

where Op is a model of BinaryFunction that solves y from x, x is the eigenvector estimate on input and output, and m is the number of iterations. The estimated eigenvalue is returned in lambda.
First, we set conceptual conditions on all arguments.<br />
• Op is a model of BinaryFunction, i.e. the following are valid expressions <strong>for</strong> op of type Op:<br />
– op(x,y) where x and y are instances of type X where X is a model of the concept<br />
glas::DenseVectorCollection.<br />
• DenseVectorCollection is a model of glas::DenseVectorCollection.<br />
• Float is a real number type, i.e. it is float, double, or long double.
The implementation <strong>for</strong> inverse iteration could be as follows:<br />
#ifndef athens_inverse_iteration_hpp
#define athens_inverse_iteration_hpp

#include <cassert>
#include <glas/glas.hpp> // GLAS headers (exact path may differ)

namespace athens {

template <typename Op, typename DenseVectorCollection, typename Float>
void inverse_iteration( Op const& op, DenseVectorCollection& x, int m, Float& lambda ) {
    glas::dense_vector< typename glas::value_type<DenseVectorCollection>::type > y( glas::size(x) ) ;
    x = x / norm_2( x ) ;            // 2.
    for ( int i=0; i<m; ++i ) {      // 3.
        op( x, y ) ;                 // 3.1. solve A y = x
        lambda = sum( x ) / sum( y ) ; // 3.2. eigenvalue estimate
        x = y / norm_2( y ) ;        // 3.3.
    }
}

} // namespace athens
#endif

The function is tested with the following program, where the linear solves are performed by the Richardson iteration on B = 0.5A:

struct solve {
    template <typename X, typename Y>
    void operator()( X const& x, Y& y ) const {
        y = 0.0 ;
        athens::richardson( poisson_scaled(), 0.5*x, y, 1.e-8, 1000 ) ;
    }
} ;

int main() {
    typedef glas::dense_vector<double> v_type ;
    v_type x( 10 ) ;
    glas::random_seed seed ;
    glas::random( x, seed ) ;
    double lambda ;
    athens::inverse_iteration( solve(), x, 100, lambda ) ;
    std::cout << "lambda = " << lambda << std::endl ;
    std::ofstream xf( "x.out" ) ;
    for ( int i=0; i<glas::size(x); ++i )
        xf << (i+1.0)/(glas::size(x)+1.0) << " " << x(i) << std::endl ;
    return 0 ;
}
Figure 14.1: First eigenvector of the 1D Poisson operator<br />
With the LAPACK tridiagonal solver of §14.1.3 wrapped in a functor solve, the program becomes:

int main() {
    typedef glas::dense_vector<double> v_type ;
    int n = 10 ;
    v_type x( n ) ;
    glas::random_seed seed ;
    glas::random( x, seed ) ;
    v_type d( n ) ; std::fill( begin(d), end(d), 2.0 ) ;
    v_type e( n-1 ) ; std::fill( begin(e), end(e), -1.0 ) ;
    solve< v_type, v_type > solver( d, e ) ;
    double lambda ;
    athens::inverse_iteration( solver, x, 100, lambda ) ;
    std::cout << "lambda = " << lambda << std::endl ;
    std::ofstream xf( "x.out" ) ;
    for ( int i=0; i<n; ++i )
        xf << (i+1.0)/(n+1.0) << " " << x(i) << std::endl ;
    return 0 ;
}

The eigenvector can be plotted with Gnuplot:

gnuplot> plot "x.out" w l
14.2 The 2D Poisson equation<br />
The 2D Poisson equation is

    -∂^2u/∂x^2 - ∂^2u/∂y^2 = f

where u(x, y) is the solution, f the excitation, and (x, y) ∈ [0, 1] × [0, 1]. We impose the boundary conditions

    u(0, y) = u(1, y) = u(x, 0) = u(x, 1) = 0 .

We discretize x as x_j = jh for j = 1, . . . , n and h = 1/n; similarly, y_j = jh. We use finite differences for the second order derivatives. This produces the equation

    (1/h^2) (-u(x_{i-1}, y_j) - u(x_i, y_{j-1}) - u(x_{i+1}, y_j) - u(x_i, y_{j+1}) + 4u(x_i, y_j)) = f(x_i, y_j)   for i, j = 1, . . . , n .
This leads to the algebraic system of equations Au = f with n^2 rows and columns.
Recall the example exercise of §14.1. We do exactly the same exercise. Since the matrix is not tridiagonal, we cannot use the LAPACK routine pttrf any longer. We use the LAPACK routine sytrf for a full matrix instead. See the documentation in boost-sandbox/libs/numeric/bindings/lapack/doc/index.html.
For a 2D problem the solution vector u can be represented as a matrix. The row index corresponds to the variable x and the column index to the variable y.
In particular, you develop the functions inverse_iteration, poisson_2d for the matrix-vector product, scaled_poisson for the scaled matrix-vector product, and richardson. Give for each templated argument the conceptual conditions. Make a plot of the eigenvector using Gnuplot's splot (for plotting surfaces).
14.3 The solution of a system of differential equations<br />
In this exercise, we write a function <strong>for</strong> the computation of a time step of a system of differential<br />
equations using Runge-Kutta methods.<br />
14.3.1 Explicit time integration<br />
Methods for the solution of the differential equation

    du/dt = f(u) ,   u(0) = u_0

operate time step by time step, i.e. the time is discretized, and given the solution at time step t_j, we compute the solution at time step t_{j+1} = t_j + h where h is small.
The method that we use here is the Runge-Kutta 4 method: the solution at time step t_{j+1} is computed as

    u_{j+1} = u_j + (h/6) (k_1 + 2k_2 + 2k_3 + k_4)

with

    k_1 = f(u_j)
    k_2 = f(u_j + (h/2) k_1)
    k_3 = f(u_j + (h/2) k_2)
    k_4 = f(u_j + h k_3) .

14.3.2 Software

Write a generic function

template <typename U, typename F, typename T>
void rk4( U& u, F& f, T const& h ) ;
that computes one time step with the Runge-Kutta 4 method. On input, the argument u is the solution at time t; on output, it is the solution at time t + h. The argument f is the functor that evaluates the function f(u). The argument u is a vector.
When the implementation is finished, write the concepts <strong>for</strong> U and F in comment lines in the<br />
code.<br />
14.3.3 The van der Pol oscillator<br />
Differential equations appear in the study of physical phenomena. The Van der Pol oscillator is described by the following equation:

    d^2x/dt^2 - µ(1 - x^2) dx/dt + x = 0                              (14.3)

with initial values x(0) and x'(0). This is a non-linear second order differential equation with a parameter µ. When µ = 0, we have a purely harmonic solution (cos and sin). When µ > 0, the solution evolves to a limit cycle.
Second order differential equations are usually solved by writing them as a system of first order differential equations:

    d/dt [ dx/dt ]   [ -µ(1 - x^2)  1 ] [ dx/dt ]
         [   x   ] + [     -1       0 ] [   x   ] = 0 .

In matrix form, with u = [dx/dt, x]^T, the equation can be written as

    du/dt + A(u) u = 0

where

    A(u) = [ -µ(1 - u_2^2)  1 ]
           [      -1        0 ] ,
Figure 14.2: An example of a web with only four pages. An arrow from page A to page B<br />
indicates a link from page A to page B.<br />
or

    du/dt = f(u)

with

    f(u) = -A(u) u .

14.3.4 Exercise
Use the Runge-Kutta 4 method to integrate the Van der Pol equation for µ = 0, µ = 0.1, and µ = 1 in the time interval [0, 10] with time step h = 0.001. Also try smaller and larger time steps.
Plot the results using gnuplot.<br />
14.4 Google’s Page rank<br />
We all use Google for web searching. In this exercise, we try to understand a particular tool used by Google to rank pages, called PageRank.
The basic idea behind the Google page ranking algorithm is that the importance of a webpage is determined by the number of references made to it. We would like to compute a score x_k reflecting the importance of page k. A simple-minded approach would be just to count the number of links to each page. This approach does not reflect the fact that some pages might be more significant than others, therefore rendering their votes more important. It also leaves open the possibility of artificially inflating the rank of a particular page by generating other trivial or advertising pages whose only function is to promote the importance of a particular page. Significant refinements are:
• Weight each in-link by the importance of the page which links to it.<br />
• Give each page a total vote of 1. If page j contains n_j links, one of which links to page k, then page k's score is boosted by x_j/n_j.
Taking the new refinements into account, we can compute the importance score x_k of a page k as follows:

    x_k = Σ_{j ∈ L_k} x_j / n_j                                       (14.4)

where L_k denotes the set of pages with a link to page k. Consider the simple example of Figure 14.2. Using formula (14.4) we get the following equations for the importance scores of the pages in this example:
    x_1 = x_3 + x_4 / 2
    x_2 = x_1 / 3
    x_3 = x_1 / 3 + x_2 / 2 + x_4 / 2
    x_4 = x_1 / 3 + x_2 / 2
These linear equations can be written as Ax = x, where x = [x_1 x_2 x_3 x_4]^T and

    A = [  0    0    1   1/2 ]
        [ 1/3   0    0    0  ]
        [ 1/3  1/2   0   1/2 ]
        [ 1/3  1/2   0    0  ] .
This transforms the web ranking problem into the standard problem of finding an eigenvector x with eigenvalue 1 for the square matrix A. This eigenvector can be found iteratively using the power method with a threshold τ:

1. v^(0) = some vector with ||v^(0)|| = 1
2. Repeat for k = 1, 2, ...:
   2.1. Apply A: w = A v^(k−1).
   2.2. Normalize: v^(k) = w / ||w||.
3. Until ||v^(k−1) − v^(k)|| < τ

The power method converges to the eigenvector corresponding to the dominant eigenvalue λ_1. The matrix A is called a column stochastic matrix, since it is a square matrix with positive entries and the entries in each column sum to one. In the case of a column stochastic matrix, this dominant eigenvalue is 1.
14.4.1 Software

Write a generic function:

    template <typename V, typename Function>
    void power_iteration( V& v, Function& f, double tau );

that computes the power iteration algorithm 5 for a matrix A with starting vector v. The resulting eigenvector should be stored in v. The argument f is a functor that returns the result of the matrix-vector product. Also write documentation and specify the conceptual constraints for the arguments.
14.4.2 Dictionary application

The page ranking algorithm described above can also be used to rank the words in a dictionary. Consider the following small dictionary:

    backwoods = bush, jungle
    bush = backwoods, jungle, shrub, plant, hedge
    flower = plant
    hedge = bush
    jungle = bush, backwoods
    plant = bush, shrub, flower, weed
    shrub = bush, plant, tree
    tree = shrub
    weed = plant
Construct a graph linking every word with the words in its explanation. The first line of the dictionary, for example, would link bush and jungle to backwoods. The graph can be constructed on paper. Use equation (14.4) to construct the sparse column stochastic matrix A and use your power method to rank the words.
14.5 The bisection method for finding the zero of a function in an interval

In this exercise, we make a programming exercise on a root finding method, called the bisection method.
14.5.1 Functions in one variable

Suppose we are given a function f in one variable and we want to compute its unique zero in the interval [a, b]. A method that can be used is the bisection method. It only requires function evaluations and is thus widely applicable.

The method computes a small interval that contains the zero. This small interval is obtained by repeatedly splitting the interval [a, b] in two parts [a, m] and [m, b], where

    m = (a + b) / 2 .        (14.5)

The method works as follows:

1. Given the interval [a, b] for which f(a)f(b) < 0.
2. Repeat until b − a < τ:
   2.1. Compute m from (14.5).
   2.2. If f(m)f(a) < 0: b = m.
   2.3. Else: a = m.
14.5.2 Software

The task is to first develop the function

    template <typename T, typename Function>
    void bisection( T& a, T& b, Function& f, double tau );

that computes the bisection Algorithm 6. The object f is a functor that returns the function value for a single argument x. The type T is a floating-point type, i.e. float or double.

Write documentation for the function and describe the conceptual conditions on Function.
14.5.3 The growth and downfall of a caterpillar population

Everyone knows caterpillars grow up to be beautiful butterflies. But before they reach that stage of their life, they need lots of food to grow. A large population will not grow at the same rate as a smaller one, because of a shortage of food. Furthermore, most birds enjoy a juicy caterpillar as a snack, so they are responsible for the premature death of several members of the caterpillar population. These relationships can be modelled mathematically by the following equation:

    dN/dt = rN(1 − N/K) − αN² / (β + N²)

In this equation, rN(1 − N/K) models the growth of the population, where N equals the number of caterpillars, r is the growing rate of the population and K is the maximum number of caterpillars that can inhabit the area. The second term of the equation models the death of the caterpillars. Here α is the maximum rate at which a bird can eat caterpillars when N is large and β is a parameter that indicates the intensity of the bird attacks. We want to know when there exists an equilibrium between the growth and death rate in the caterpillar population, i.e. when dN/dt equals zero.
Use the function bisection to compute the number of caterpillars N for which the following populations are at an equilibrium in the intervals [0.1, 10], [10, 20] and [20, 100]:

• Population 1: r = 1.3, K = 100, α = 20 and β = 50
• Population 2: r = 2.0, K = 80, α = 25 and β = 10

Show the resulting roots in a table.
14.5.4 Computing eigenvalues using the bisection method

In this exercise, we use the function bisection to compute the eigenvalues of a real symmetric dense matrix A with real eigenvalues. The problem is to compute λ so that

    det(A − λI) = 0 .

The determinant is computed using the QR factorization (which is available in LAPACK). The QR factorization computes an orthogonal matrix Q (Q^T Q = I) and an upper triangular matrix R so that

    A − λI = QR .
We use the property that

    det(A − λI) = det(R) .

Since R is upper triangular, the determinant is the product of the diagonal elements of R.

The matrices A are constructed as follows. Start with a simple case: the diagonal matrix with elements 1, 2, ..., n on the main diagonal. Then do the tests for the same matrix multiplied on the left and right by a random orthogonal matrix X, as in A = XDX^T, where D is a diagonal matrix.
14.6 The Newton-Raphson method for finding the minimum of a convex function

This exercise is a programming exercise on Newton's method. First, we explain the method for a function with a single variable, then we discuss the case of multivariate functions, and finally, we show a small application.
14.6.1 Functions in one variable

For a differentiable function f, the minimum x̃ is attained for f′(x̃) = 0. So, we must find the zero of the first order derivative. When f is a second order polynomial, we have

    f = p := α + βx + γx²
    f′ = p′ := β + 2γx ,

then an extreme value of f is attained for

    x̃ = −β / (2γ) ,        (14.6)

which is a minimum when f′′ ≡ 2γ > 0.
For an arbitrary function, we do not have such simple explicit formulae. We can use an iterative method, which is called Newton's method. It is an iterative approach, i.e. we start from an initial guess x̃ and improve this value until it has converged to the minimum of the function. On each iteration we approximate the function by a degree two polynomial, for which the simple formula (14.6) can be used. One way to compute such a degree two polynomial is to start from the Taylor expansion of f around x̃:

    f(x) = f(x̃) + f′(x̃)(x − x̃) + (1/2) f′′(x̃)(x − x̃)² + ··· .

If we approximate f by the first 3 terms (i.e. a degree two polynomial), then we have

    f(x) ≈ p(x) := f(x̃) + f′(x̃)(x − x̃) + (1/2) f′′(x̃)(x − x̃)² .
If x is close to x̃, |f(x) − p(x)| is small. The first order derivative is

    f′(x) ≈ p′(x) = f′(x̃) + f′′(x̃)(x − x̃) .
Then p′(x) = 0 for

    x = x̃ − f′(x̃) / f′′(x̃) .

The Newton method goes as follows; in this algorithm, τ is a tolerance for the stopping criterion.

1. Given initial x̃ = x^(0).
2. Repeat for j = 1, 2, ...
   2.1. Compute x^(j) = x^(j−1) − f′(x^(j−1)) / f′′(x^(j−1))
3. Until |f′(x^(j−1)) / f′′(x^(j−1))| < τ

The iteration stops when the derivative is much smaller than the second order derivative. What happens when f′′(x^(j−1)) = 0?
14.6.2 Multivariate functions

For multivariate functions, the principle is the same, but it is more complicated. A multivariate function f has an argument x ∈ Rⁿ, i.e. a vector of size n. For example, f = sin(x_1) + x_2 cos(x_1) is a multivariate function in the variables x_1 and x_2.

We use the same idea as for one variable. That is, we use the Taylor expansion to approximate the function:

    f(x) ≈ f(x̃) + ∇f(x̃)^T (x − x̃) + (1/2)(x − x̃)^T H(f(x̃)) (x − x̃)
with

    ∇f(x̃) = ( ∂f/∂x_1, ..., ∂f/∂x_n )^T

    H(f) = [ ∂²f/∂x_1∂x_1  ···  ∂²f/∂x_1∂x_n ]
           [      ⋮                  ⋮       ]
           [ ∂²f/∂x_n∂x_1  ···  ∂²f/∂x_n∂x_n ]

where ∇f(x̃) is called the gradient vector and H(f) the Hessian matrix.
The derivative becomes

    f′(x) = ∇f(x̃) + H(f(x̃))(x − x̃) ,

so, the derivative is zero when

    x = x̃ − {H(f(x̃))}⁻¹ ∇f(x̃) .
This requires the solution of an n × n linear system on each iteration. The Newton algorithm is very similar to the univariate case:

1. Given initial x̃ = x^(0) ∈ Rⁿ.
2. Repeat for j = 1, 2, ...
   2.1. Compute d = {H(f(x^(j−1)))}⁻¹ ∇f(x^(j−1))
   2.2. Compute x^(j) = x^(j−1) − d.
3. Until ||d||₂ < τ

14.6.3 Software for uni-variate functions

The task is to first develop the function

    template <typename X, typename Function, typename Derivative, typename SecondDerivative>
    void newton_raphson( X& x, Function& f, Derivative& d, SecondDerivative& s, double tau );

that computes the Newton-Raphson Algorithm 7. f, d, and s are functors that return the function value and the derivatives for the single argument x.

Write documentation for the function and describe the conceptual conditions on Function, Derivative, and SecondDerivative.
Next, you use this function to compute the minima for the following functions:

• f = x² − 2x + 4
• f = x¹⁰
• f = x + 5
• f = −x² − 2x + 4
14.6.4 Software for multi-variate functions

The task is to first develop the function

    template <typename X, typename Function, typename Gradient, typename Hessian>
    void newton_raphson( X& x, Function& f, Gradient& d, Hessian& h, double tau );

that computes the Newton-Raphson Algorithm 8. Note that in this case, d and h should return the resulting gradient vector and Hessian matrix, respectively. Also write documentation and specify the conceptual constraints for the arguments.
14.6.5 Application

The following is an application for the multivariate case. Given a symmetric matrix L ∈ Rⁿˣⁿ, we want to solve the following optimization problem:

    min (1/2) x^T L x
    s.t. x^T x = 1

We first introduce a Lagrange multiplier λ and rewrite this problem in the following form. Find x and λ so that

    min f(x, λ) = (1/2) x^T L x − (1/2) λ (x^T x − 1)
The gradient and Hessian are:

    ∇f = [ Lx − λx  ]
         [ x^T x − 1 ]

    H(f) = [ L − λI  −x ]
           [  −x^T    0 ]
One can prove that the solution of this optimization problem is the smallest eigenvalue λ and the associated normalized eigenvector x. This is a method for computing eigenvalues of large matrices.

For solving a linear system with the Hessian, you can use the direct solver MUMPS or the iterative solver toolbox from GLAS.
14.7 Sequential noise reduction of real-time measurements by least squares

Suppose we want to measure a function f(t) for given time snapshots t_1, ..., t_m. We know that the function is a polynomial of a given degree n − 1, but due to measurement errors, the data are noisy. If f is a polynomial,

    f(t) = ∑_{j=1}^{n} ξ_j t^{j−1} .

We could have a more general series, e.g.

    f(t) = ∑_{j=1}^{n} ξ_j φ_j(t) ,

where φ_j is the jth base function. With

    b = ( f(t_1), ..., f(t_m) )^T ,   x = ( ξ_1, ..., ξ_n )^T ,

    A = [ φ_1(t_1)  φ_2(t_1)  ···  φ_n(t_1) ]
        [    ⋮                        ⋮     ]
        [ φ_1(t_m)  φ_2(t_m)  ···  φ_n(t_m) ] ,

we have

    Ax = b .        (14.7)
Note that (14.7) is an m × n linear system, where usually m ≫ n. This system is overdetermined, and so, due to errors in the data, it cannot be solved exactly. However, we can solve the system in a least squares sense, i.e. find x so that

    min_x ||Ax − b||₂ .        (14.8)
When measurements come in sequentially, i.e. at time steps t_1, t_2, ..., we receive at time step t_j the jth row of A and the jth element of b. The algorithms we now discuss exploit this sequential structure.
14.7.1 The least squares QR algorithm

A numerically stable method for solving (14.8) is based on the QR factorization. The QR factorization of the m × n matrix A is

    A = QR

with Q ∈ Rᵐˣⁿ having orthonormal columns (Q^T Q = I) and R ∈ Rⁿˣⁿ upper triangular. If A has full rank, the diagonal elements of R are non-zero. Suppose we have computed the solution for

    min ||A_k x − b_k||₂ ,

where

    ||A_k x − b_k||₂ = ||Q_k R_k x − b_k||₂ = ||R_k x − Q_k^T b_k||₂ .
We have to solve an upper triangular linear system. We can develop a 'sequential' method for this QR decomposition without storing Q, but we will not discuss this any further.
14.7.2 The least squares method via the normal equations

One method to achieve this are the normal equations. That is, multiply (14.7) on the left by A^T; then we obtain

    A^T A x = A^T b .        (14.9)

If A has full column rank, the solution x is unique and satisfies (14.8).
14.7.3 Least squares Kalman filtering

The Kalman filter is a method to solve the normal equations (14.9) in a step-by-step way, i.e. the measurements come in one time step at a time. The Kalman filter adapts the least squares solution to the newly arrived data.

Suppose we have computed the least squares solution of

    A_k x_k = b_k ,

where A_k are the first k rows of A and b_k the first k elements of b, with k ≥ n. Then we want to compute the least squares solution of

    A_{k+1} x_{k+1} = b_{k+1} .

Since

    A_{k+1} = [   A_k    ]        b_{k+1} = [   b_k    ]
              [ a_{k+1}^T ] ,               [ f(t_{k+1}) ] ,
we have, with g_k = A_k^T b_k, that

    A_{k+1}^T A_{k+1} x_{k+1} = g_{k+1}
    (A_k^T A_k + a_{k+1} a_{k+1}^T) x_{k+1} = g_k + a_{k+1} f(t_{k+1}) .

With M_k = (A_k^T A_k)⁻¹ ∈ Rⁿˣⁿ, we derive from the Sherman-Morrison formula that

    M_{k+1} := (A_k^T A_k + a_{k+1} a_{k+1}^T)⁻¹
             = M_k − (M_k a_{k+1} a_{k+1}^T M_k) / (1 + a_{k+1}^T M_k a_{k+1}) ,

and we also have that

    x_{k+1} = M_{k+1} (g_k + a_{k+1} f(t_{k+1}))
            = M_k g_k + M_k a_{k+1} f(t_{k+1})
              − (M_k a_{k+1} a_{k+1}^T) / (1 + a_{k+1}^T M_k a_{k+1}) · (M_k g_k + M_k a_{k+1} f(t_{k+1}))
            = x_k + (M_k a_{k+1}) / (1 + a_{k+1}^T M_k a_{k+1}) · (f(t_{k+1}) − a_{k+1}^T x_k) .

The Kalman method works as follows:

1. Solve A_n x_n = b_n by taking the first n rows of A and b.
2. Let M_n = A_n⁻¹ A_n⁻ᵀ.
3. For k = n, n + 1, ..., m − 1 do:
   3.1. Compute the Kalman gain vector k_{k+1} = M_k a_{k+1} / (1 + a_{k+1}^T M_k a_{k+1}).
   3.2. Update step: x_{k+1} = x_k + k_{k+1} (f(t_{k+1}) − a_{k+1}^T x_k).
   3.3. M_{k+1} = M_k − k_{k+1} a_{k+1}^T M_k.

We can use the LAPACK subroutine DGESV for computing A_n⁻¹.
14.7.4 Software

The goal is to write a function that computes the Kalman filter least squares. Because of the sequential character, we suggest to make a class with the following specification:
template <typename T>
class kalman {
  public:
    // Creation of the Kalman filter
    kalman( int n );

    // Compute the first n observations and initialize the Kalman
    // filter (Steps 1 and 2 in the algorithm)
    // BaseFun is a binary functor.
    template <typename VIt, typename BaseFun, typename F>
    void initialize( VIt t_begin, VIt const& t_end, BaseFun& base_fun, F& f ) {
        ...
    }

    template <typename Base, typename F>
    void step( T const& t, Base& base, F const& f ) {
        ...
    }

  public:
    // Return the solution
    typedef ... x_type;
    x_type const& x() const { ... }

  private:
    ...
};
14.7.5 Test problems

We now solve the following test problems. First, consider the following expansion:

    f(t) = ξ_1 + ξ_2 cos t + ξ_3 sin t + ξ_4 cos 2t + ξ_5 sin 2t

We compute the coefficients following the least squares criterion for the function

    f(t) = 2 − 5 cos t .

Print the solution x for each step of the Kalman filter and see how it changes. It should be very close to the function.

Then apply random noise with relative size 0.0001:

    f(t) = (2 − 5 cos t)(1 + 0.0001 ε) ,

where ε is a random number in [−1, 1]. Print the solution x for each step of the Kalman filter and see how it changes. It should be close to the solution of the function with ε ≡ 0.

Plot the results using gnuplot.
Chapter 15

Programming Projects

The following notes apply to all projects.

• The projects are preferably carried out in teams of 2 students.

• Each team gets a repository in an MTL4 branch.

• This also means that every course participant has to learn the version control software "subversion", see http://subversion.tigris.org/. The lecture by Greg Wilson gives a sufficient introduction to subversion, see http://software-carpentry.org/. I will give a short introduction myself in the 2nd exercise session (April 19).

• The projects should be built (compiled, linked) with a single command. If possible, use "cmake".¹ cmake ships with every reasonable Linux distribution and should also be available in the computer pool. It even exists for Windows, where it can generate the project files for Visual Studio.

• Write tests for new features first, before you implement them.

• Try to limit your questions to the exercise sessions.

• Write doxygen documentation for your classes and functions (in English). Write as many examples as possible. (These may well be derived from your tests.)

  – Create formulas preferably with the commands for LaTeX insertions (\f[ and the like). On this occasion one often gets to know one's Linux installation better, since doxygen does not always find LaTeX. It is no shame to ask hacker friends for help here.
15.1 Matrix powers A^x

Implement algorithms for A^x for different matrix types and for x ∈ Q as well as x ∈ R.

¹ If necessary, plain "make" (see e.g. http://software-carpentry.org/build.html).
15.2 Matrix exponential e^A

Implement algorithms for e^A for different matrix types, in particular sparse matrices. Use the algorithms available in MTL4 for solving systems of equations. See the article by Cleve Moler, "19 dubious ways ...".
15.3 LU factorization for m × n matrices

    A = P · L · U        (15.1)

    m, n  | L                | U
    m = n | lower triangular | upper triangular
    m > n | trapezoidal      | upper triangular
    m < n | lower triangular | trapezoidal

The diagonal of L is fixed to 1 and is therefore not stored. Compute the solution of a system of equations and subsequently compute the error.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_getrf.htm.
15.4 Bunch-Kaufman factorization

For symmetric A (A = A^T), implement the factorization

    A = P · U · D · U^T · P^T .        (15.2)

• Overwriting A in place,
• and develop functions for extracting P, U and D from the resulting A.
• Copy A, compute the factorization and return P, U and D as a tuple.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_sytrf.htm.
15.5 Condition number (reciprocal)

• Use LU in the general case.
  – Cholesky when symmetric.
    ∗ If necessary, Bunch-Kaufman ...
See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_gecon.htm.
15.6 Matrix scaling

Compute row and column scaling factors for dense and sparse matrices, such that the largest matrix entry in every row and column is 1.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lle/functn_geequ.htm.
15.7 QR with overwriting

Implement a factorization

    A = QR        (15.3)

with Q orthogonal/unitary for real/complex A. Realize:

• An overwriting factorization as in LAPACK,
• Functions for extracting Q and R,
• A version that copies A and returns Q and R as a pair.
• Write tests or applications.

See http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_orgqr.htm, http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_ungqr.htm, http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_ormqr.htm, http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/lse/functn_unmqr.htm.
15.8 Direct solver for sparse matrices

Implement a direct solver recursively.

• The matrix should be represented hierarchically as a quad-tree.
• The operations should also be performed recursively on blocks:
  – Matrix addition and subtraction,
  – Matrix multiplication,
  – Inverses of subtrees,
  – Pivoting on
    ∗ column,
    ∗ row, or
    ∗ diagonal,
    depending on what seems most suitable.
  – The pivoting must of course be represented by a permutation.
• Apply the solution to a vector recursively as well, if possible.
  – This means implementing the triangular solver recursively, too.

Figure 15.1: Hierarchical approach.

This project is the biggest challenge of all, and significant partial results will also count as a success.
15.9 Applying MTL4 to interval arithmetic types

Write applications of matrices and vectors for suitable interval arithmetic types, e.g. boost::interval.
15.10 Applying MTL4 to higher-precision types

Write applications of matrices and vectors for suitable higher-precision types, e.g. GNU Multiple Precision (GMP).
15.11 Applying MTL4 to automatic differentiation types

Write applications of matrices and vectors for suitable automatic differentiation types with operator-overloading-based derivatives.
Chapter 16

Acknowledgement

Special thanks to Josef Weinbub, Carlos Giani, and Franz Stimpfl. These people were instrumental in the design and development of GSSE and this book. Thanks also go to Michael Spevak for the development of some basic concepts and text parts for an early version of GSSE.

Thanks to Andrey Chesnokov, Yvette Vanberghen, Kris Demarsin and Yao Yue, and to the students of the class "C++ für Wissenschaftler" at Technische Universität Dresden for many fruitful discussions.