
Algorithmic Differentiation in Python with Application Examples

Sebastian F. Walter, Humboldt-Universität zu Berlin

Wednesday, 10.07.2010


Part I: Intro to Algorithmic Differentiation
  Comparison to Symbolic/Numerical Differentiation
  The Forward Mode by Taylor Arithmetic
  The Reverse Mode

Part II: Advanced Application Examples
  Differentiation of Differential Equations
  Differentiation of Numerical Linear Algebra Functions
  Optimum Experimental Design

Standard Reference:
  Griewank, Evaluating Derivatives

[Figures: x(t; p) versus t; control function u(t), state x(t) and sensitivities dx/dp1(t), dx/dp2(t); x1, x2, dx1/dp, dx2/dp, d²x1/dp², d²x2/dp² at p = 1.0; h(t, x, p) and dh/dp(t, x, p); state x(t) versus time t [sec]]

Research Community Website:
www.autodiff.org


What is Algorithmic Differentiation (AD)

Name confusion: Algorithmic Differentiation aka Automatic Differentiation aka Computational Differentiation aka AD
Considered one of the most important algorithmic techniques "invented" in the 20th century [1]
Can be used to differentiate large-scale problems, e.g. in PDE-constrained optimization.
Generally much more efficient than symbolic/numerical differentiation, and accurate to nearly machine precision.

[1] Nick Trefethen, http://www.comlab.ox.ac.uk/nick.trefethen/inventorstalk.pdf


Software Used in this Talk:

Name        Description                                Status  LOC
algopy      forward/reverse UTPM in Python             alpha   10388
            www.github.com/b45ch1/algopy
pysolvind   Python bindings to SolvIND/DAESOL-II       alpha    9743
pyadolc     Python bindings to ADOL-C (C++)            stable   6895
            www.github.com/b45ch1/pyadolc
pycppad     Python bindings to CppAD (C++)             stable   1334
            www.github.com/b45ch1/pycppad
taylorpoly  ANSI-C with Python bindings                alpha    9276
            www.github.com/b45ch1/taylorpoly
easyodoe    Opt. Exp. prototype                        alpha    8345


Why not Symbolic Differentiation
A Raytracing Example

[Figure: beam path inside the unit circle with points labeled 0 to 9; both axes range from -1.0 to 1.0]

Cylindrical mirror described by $0 = g(x) = x_1^2 + x_2^2 - 1$; a laser beam enters at $x = (0, -1)$ with direction $v$.
Recursive algorithm for the next reflection point $x^+$ and direction $v^+$:

\[
\begin{pmatrix} x^+ \\ v^+ \end{pmatrix} = F(x, v) =
\begin{pmatrix}
x + v \left( \sqrt{ \left( \frac{x^T v}{\|v\|^2} \right)^2 - \frac{\|x\|^2 - 1}{\|v\|^2} } - \frac{x^T v}{\|v\|^2} \right) \\
P(x^+)\, v
\end{pmatrix}
\]

where $P(x^+) = I - 2 \frac{w w^T}{\|w\|^2}$, $w = w(x^+) = \nabla_x g(x^+)$.

Goal: compute the sensitivity of the 10th reflection point $x^{(10)}$ w.r.t. the initial direction $v^{(0)}$, i.e. $\frac{dx^{(10)}}{dv^{(0)}}$.


Why not Symbolic Differentiation (cont.)

Compute $x^{(k)}, v^{(k)} = F(x^{(k-1)}, v^{(k-1)})$ recursively as a symbolic expression.
Use the sum, product and chain rules to compute the wanted derivative.
Problem: expression swell (show live example)

    import numpy; from numpy import dot
    import sympy; from sympy import sqrt

    def F(x, v):
        """computes next reflection point x and direction v"""
        c = dot(v, v)
        # step length along v to the next intersection with the unit circle
        s = sqrt((dot(x, v)/c)**2 - (dot(x, x) - 1.)/c) - dot(x, v)/c
        x2 = [x[0] + v[0]*s,
              x[1] + v[1]*s]
        # w is the gradient of g at x2 up to a factor 2, which cancels in P
        w = x2
        v2 = [(v[0] - 2*w[0]*dot(w, v)/dot(w, w)),
              (v[1] - 2*w[1]*dot(w, v)/dot(w, w))]
        return x2, v2

    x1, x2, v1, v2 = sympy.symbols('x1 x2 v1 v2')
    x = [x1, x2]; v = [v1, v2]
    x, v = F(x, v)
    print('x, v =\n', x, v)
    # a second reflection already produces a much larger expression:
    # x, v = F(x, v)
    # print('x, v =\n', x, v)


Why not Finite Differences

Problem: finite precision arithmetic
$f(x)$ true value, $\tilde f(x)$ numerically computed value (assume $x = \tilde x$)

\[
\begin{aligned}
d\tilde f(x; v) &= \frac{\tilde f(x + tv) - \tilde f(x)}{t}
 = \frac{f(x + tv) + \delta_1 - f(x) + \delta_2}{t} \\
&= \frac{f(x + tv) - f(x)}{t} + \frac{\delta_1 + \delta_2}{t} \\
&= df(x; v) - \frac{r(x; tv)}{t} + \frac{\delta_1 + \delta_2}{t},
\end{aligned}
\]

where $\delta_1$ and $\delta_2$ are random errors due to finite precision arithmetic.

Difference between the numerical and the true derivative:

\[
d\tilde f(x; v) - df(x; v) =
\underbrace{-\frac{r(x; tv)}{t}}_{\to 0 \text{ as } t \to 0}
+ \underbrace{\frac{\delta_1 + \delta_2}{t}}_{\to \infty \text{ as } t \to 0}
\]

Question: what is the best $t \in \mathbb{R}$?
If $f \in C^2(\mathbb{R})$, then $r(x; tv)/t = f''(\xi)\, t$ and therefore $t = \sqrt{\frac{\delta_1 + \delta_2}{f''(\xi)}}$.


Why not Finite Differences (cont.)

[Figure: absolute FD error versus step width t, log-log axes, for the test function f(x) = 1 + sin(x) at x = 1; curves: FD 1st order, FD 2nd order, FD 3rd order]

machine EPS ≈ 10^-16 for 64-bit IEEE-754 floats
the error of higher-order derivatives computed by FD quickly gets too large
the best t is not known a priori and often has to be guessed by careful tests
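The trade-off between truncation error and rounding error can be reproduced in a few lines. The following is a minimal sketch using only the standard library and the slide's test function f(x) = 1 + sin(x) at x = 1; the helper name `forward_difference` and the chosen range of step widths are illustrative assumptions, not part of the talk.

```python
import math

def f(x):
    return 1.0 + math.sin(x)

def forward_difference(func, x, t):
    """First-order (one-sided) finite-difference approximation of func'(x)."""
    return (func(x + t) - func(x)) / t

x0 = 1.0
exact = math.cos(x0)  # analytic derivative of 1 + sin(x)

# Sweep the step width t over many orders of magnitude.
errors = {}
for exponent in range(0, 17):
    t = 10.0 ** (-exponent)
    errors[exponent] = abs(forward_difference(f, x0, t) - exact)

for exponent, err in errors.items():
    print("t = 1e-%02d   abs. error = %.3e" % (exponent, err))

best = min(errors, key=errors.get)
print("smallest error at t = 1e-%02d" % best)
```

For large t the truncation term dominates, for very small t the rounding term dominates, so the error is smallest at an intermediate step width.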


Part I:
Intro Algorithmic Differentiation


Computational Model and the Evaluation Trace

All computer programs are a sequence of elementary functions $\phi_l \in \{+, -, *, /, \sin, \exp, \dots\}$
Symbolic dependency is resolved at each elementary function: pushforward of numerical values $v_{j \prec l}$

Example: evaluate the function f(3, 7):

\[
f : \mathbb{R}^2 \to \mathbb{R}, \qquad x \mapsto y = f(x) = \sin(x_1 + \cos(x_2) \cdot x_1)
\]

Computational Trace:

independent $v_{-1} = x_1 = 3$
independent $v_0 = x_2 = 7$
$v_1 = \phi_1(v_0) = \cos(v_0)$
$v_2 = \phi_2(v_1, v_{-1}) = v_1 v_{-1}$
$v_3 = \phi_3(v_{-1}, v_2) = v_{-1} + v_2$
$v_4 = \phi_4(v_3) = \sin(v_3)$

Computational Graph:

[Figure: computational graph with nodes 0 Id, 1 Id, 2 cos, 3 __mul__, 4 __add__, 5 sin]
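The trace above can be written out by hand as straight-line code. This is a small sketch with plain floats; storing the intermediates in a dictionary called `trace` is an illustrative choice, not how PYADOLC or ALGOPY record a tape.

```python
import math

def f(x1, x2):
    return math.sin(x1 + math.cos(x2) * x1)

# Evaluation trace for f(3, 7): every line is one elementary function phi_l.
trace = {}
trace[-1] = 3.0                        # independent v_{-1} = x_1
trace[0] = 7.0                         # independent v_0 = x_2
trace[1] = math.cos(trace[0])          # v_1 = cos(v_0)
trace[2] = trace[1] * trace[-1]        # v_2 = v_1 * v_{-1}
trace[3] = trace[-1] + trace[2]        # v_3 = v_{-1} + v_2
trace[4] = math.sin(trace[3])          # v_4 = sin(v_3) = y

for index in sorted(trace):
    print("v_%d = %.6e" % (index, trace[index]))
```

The last entry of the trace equals the direct evaluation f(3, 7); the intermediate values are exactly what an AD tool records.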


Code Tracing with PYADOLC and ALGOPY

    import adolc; import algopy; import numpy; from numpy import sin, cos

    def f(x):
        return sin(x[0] + cos(x[1])*x[0])

    # trace f with PYADOLC on tape 1 and print the tape
    adolc.trace_on(1)
    x = adolc.adouble([3, 7]); adolc.independent(x)
    y = f(x)
    adolc.dependent(y); adolc.trace_off()
    adolc.tape_to_latex(1, [3, 7], [0.])

    # trace f with ALGOPY and plot the computational graph
    cg = algopy.CGraph()
    x = [algopy.Function(3.), algopy.Function(7.)]
    y = f(x)
    cg.trace_off()
    cg.independentFunctionList = [x[0], x[1]]; cg.dependentFunctionList = [y]
    cg.plot('cgraph_simple_function.svg')

Tape written by PYADOLC (the right-hand columns are cut off in the source; cut-off values are marked with …):

code  op             loc  loc  loc  double value   double value
33    start of tape
39    take stock op  2    0         3.000000e+00   n…
1     assign ind     0              3.000000e+00
1     assign ind     1              7.000000e+00
20    cos op         1    3    2    7.000000e+00   6.569866e-…
15    mult a a       2    0    3    7.539023e-01   3.000000e+…
11    plus a a       0    3    4    3.000000e+00   2.261707e+…
21    sin op         4    6    5    5.261707e+00   5.221055e-…
2     assign dep     5
0     death not      0    6         -8.528809e-…
32    end of tape


PART I.1:
The Forward Mode of AD by
Univariate Taylor Polynomial (UTP) Arithmetic


Univariate Taylor Polynomial Arithmetic (UTP)

Basic Observation 1: Let $f : \mathbb{R}^N \to \mathbb{R}$, then

\[
\left. \frac{d}{dt} f(x + e_i t) \right|_{t=0} = (\nabla_x f(x))^T \cdot e_i = \frac{\partial f}{\partial x_i}
\]

Basic Observation 2: Hessian

\[
\left. \frac{d^2}{dt_1\, dt_2} f(x + e_i t_1 + e_j t_2) \right|_{t_1 = t_2 = 0} = e_i^T \nabla_x^2 f(x)\, e_j = \frac{\partial^2 f}{\partial x_i \partial x_j}
\]

$e_i = (0, \dots, 1, \dots, 0)$ is the i-th Cartesian basis vector.
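As a worked instance of both observations, take the example function $f(x) = \sin(x_1 + \cos(x_2) x_1)$ from the evaluation trace. The abbreviation $a$ is introduced here only for readability.

```latex
% Abbreviation (introduced for this example): a = x_1 (1 + \cos x_2)
\begin{aligned}
\left.\frac{d}{dt} f(x + e_1 t)\right|_{t=0}
  &= \left.\frac{d}{dt} \sin\bigl((x_1 + t)(1 + \cos x_2)\bigr)\right|_{t=0}
   = \cos(a)\,(1 + \cos x_2) = \frac{\partial f}{\partial x_1}, \\
\left.\frac{d^2}{dt_1\, dt_2} f(x + e_1 t_1 + e_2 t_2)\right|_{t_1 = t_2 = 0}
  &= \frac{\partial}{\partial x_2}\bigl[\cos(a)\,(1 + \cos x_2)\bigr] \\
  &= x_1 \sin(x_2)\,(1 + \cos x_2)\,\sin(a) - \sin(x_2)\,\cos(a)
   = \frac{\partial^2 f}{\partial x_1 \partial x_2}.
\end{aligned}
```

Each partial derivative is thus an ordinary derivative of a univariate function of t, which is the reason univariate Taylor arithmetic suffices.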


Univariate Taylor Polynomial Arithmetic (UTP) (cont.)

The problem can be formulated as arithmetic on univariate Taylor polynomials (UTP)

\[
[x]_D = [x_0, \dots, x_{D-1}] = \sum_{d=0}^{D-1} x_d T^d \in \mathbb{R}[T]/(T^D),
\]

$T$ is an indeterminate, i.e. a formal parameter
$x_d \in \mathbb{R}$ is called Taylor coefficient

Define the extension of functions $f : \mathbb{R} \to \mathbb{R}$, $y = f(x)$:

\[
\begin{aligned}
E_D(f) : \mathbb{R}[T]/(T^D) &\to \mathbb{R}[T]/(T^D) \\
[x]_D &\mapsto [y]_D := \sum_{d=0}^{D-1}
\underbrace{\left. \frac{1}{d!} \frac{d^d}{dt^d} f\Bigl( \sum_{k=0}^{D-1} x_k t^k \Bigr) \right|_{t=0}}_{\equiv\, y_d} T^d
\end{aligned}
\]


Univariate Taylor Polynomial Arithmetic (UTP) (cont.)

Let $f(x) = (h \circ g)(x) = h(g(x))$ be a composite function, then

\[
E_D(f) = E_D(h) \circ E_D(g) .
\]

I.e. $E_D$ is a homomorphism that preserves the function composition.

Therefore: need algorithms to compute

\[
[y_0, \dots, y_{D-1}] = E_D(\phi)([x_0, \dots, x_{D-1}])
\]

only for the elementary functions $\phi \in \{+, -, *, /, \dots\}$!

Suggests implementation by function and operator overloading, i.e. univariate Taylor polynomial (UTP) arithmetic.
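The composition property can be tried out with a few lines of operator overloading. The class `UTP` below is a hypothetical teaching sketch that only supports + and *; it is not the API of taylorpoly or algopy, and the functions `g` and `h` are made up for the illustration.

```python
class UTP:
    """Truncated univariate Taylor polynomial [x]_D = [x_0, ..., x_{D-1}].

    Hypothetical teaching class, not the API of taylorpoly or algopy.
    """
    def __init__(self, coeffs):
        self.c = [float(v) for v in coeffs]

    def _lift(self, other):
        # Promote a plain number to a constant polynomial of the same length.
        if isinstance(other, UTP):
            return other
        return UTP([other] + [0.0] * (len(self.c) - 1))

    def __add__(self, other):
        other = self._lift(other)
        return UTP([a + b for a, b in zip(self.c, other.c)])

    __radd__ = __add__

    def __mul__(self, other):
        other = self._lift(other)
        D = len(self.c)
        # Truncated convolution: z_d = sum_{k=0}^{d} x_k y_{d-k}
        return UTP([sum(self.c[k] * other.c[d - k] for k in range(d + 1))
                    for d in range(D)])

    __rmul__ = __mul__


def g(x):
    return x * x + 1

def h(u):
    return u * u * u

x = UTP([2.0, 1.0, 0.0])      # x_0 = 2, first-order coefficient 1, D = 3
y = h(g(x))                   # E_D(h) applied to E_D(g)([x]_D)
print(y.c)                    # [f(2), f'(2), f''(2)/2] for f(x) = (x^2 + 1)^3
```

Because only the elementary operations are overloaded, the unmodified Python functions `g` and `h` propagate Taylor coefficients without any change to their source.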


Algorithms for Univariate Taylor Polynomials over Scalars (UTPS)

The tilde denotes scaled coefficients, $\tilde x_d := d\, x_d$.

binary operations

z = φ(x, y)   d = 0, …, D                                                                      OPS       MOVES
x + cy        $z_d = x_d + c\, y_d$                                                            2D        3D
x × y         $z_d = \sum_{k=0}^{d} x_k y_{d-k}$                                               D²        3D
x / y         $z_d = \frac{1}{y_0} \bigl[ x_d - \sum_{k=0}^{d-1} z_k y_{d-k} \bigr]$           D²        3D

unary operations

y = φ(x)      d = 0, …, D                                                                      OPS       MOVES
ln(x)         $\tilde y_d = \frac{1}{x_0} \bigl[ \tilde x_d - \sum_{k=1}^{d-1} x_{d-k} \tilde y_k \bigr]$      D²        2D
exp(x)        $\tilde y_d = \sum_{k=1}^{d} y_{d-k} \tilde x_k$                                 D²        2D
√x            $y_d = \frac{1}{2 y_0} \bigl[ x_d - \sum_{k=1}^{d-1} y_k y_{d-k} \bigr]$         ½ D²      3D
x^r           $\tilde y_d = \frac{1}{x_0} \bigl[ r \sum_{k=1}^{d} y_{d-k} \tilde x_k - \sum_{k=1}^{d-1} x_{d-k} \tilde y_k \bigr]$      2D²       2D
sin(v)        $\tilde s_d = \sum_{j=1}^{d} \tilde v_j c_{d-j}$                                 2D²       3D
cos(v)        $\tilde c_d = \sum_{j=1}^{d} -\tilde v_j s_{d-j}$
tan(v)        $\tilde \phi_d = \sum_{j=1}^{d} w_{d-j} \tilde v_j$
              $\tilde w_d = 2 \sum_{j=1}^{d} \phi_{d-j} \tilde \phi_j$
arcsin(v)     $\tilde \phi_d = w_0^{-1} \bigl( \tilde v_d - \sum_{j=1}^{d-1} w_{d-j} \tilde \phi_j \bigr)$
              $\tilde w_d = - \sum_{j=1}^{d} v_{d-j} \tilde \phi_j$
arctan(v)     $\tilde \phi_d = w_0^{-1} \bigl( \tilde v_d - \sum_{j=1}^{d-1} w_{d-j} \tilde \phi_j \bigr)$
              $\tilde w_d = 2 \sum_{j=1}^{d} v_{d-j} \tilde v_j$
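Three of the recurrences from the table can be checked against series that are known in closed form. The sketch below stores a polynomial as a plain list of coefficients; the function names `utps_mul`, `utps_div` and `utps_exp` are made up for this illustration and do not come from taylorpoly.

```python
import math

def utps_mul(x, y):
    """z = x * y:  z_d = sum_{k=0}^{d} x_k y_{d-k}"""
    D = len(x)
    return [sum(x[k] * y[d - k] for k in range(d + 1)) for d in range(D)]

def utps_div(x, y):
    """z = x / y:  z_d = (x_d - sum_{k=0}^{d-1} z_k y_{d-k}) / y_0"""
    z = []
    for d in range(len(x)):
        z.append((x[d] - sum(z[k] * y[d - k] for k in range(d))) / y[0])
    return z

def utps_exp(x):
    """y = exp(x):  d*y_d = sum_{k=1}^{d} y_{d-k} * (k*x_k)"""
    y = [math.exp(x[0])]
    for d in range(1, len(x)):
        y.append(sum(y[d - k] * k * x[k] for k in range(1, d + 1)) / d)
    return y

D = 6
t = [0.0, 1.0] + [0.0] * (D - 2)            # the polynomial "t"
one = [1.0] + [0.0] * (D - 1)               # the constant 1
one_minus_t = [1.0, -1.0] + [0.0] * (D - 2)

exp_t = utps_exp(t)                          # coefficients of exp(t), i.e. 1/d!
geometric = utps_div(one, one_minus_t)       # coefficients of 1/(1 - t)
back_to_one = utps_mul(one_minus_t, geometric)

print(exp_t)
print(geometric)
print(back_to_one)
```

Each recurrence needs only the already computed lower-order coefficients, which is why the cost per operation grows like D² rather than exponentially in the derivative order.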


Live Example: Directional Derivatives using TAYLORPOLY

Interpretation: extract derivatives from Taylor coefficients.
If $[x]_D = [x_0, 1, 0, 0, \dots, 0]$, then

\[
y_d = \left. \frac{1}{d!} \frac{d^d}{dt^d} f\Bigl( \sum_{k=0}^{D-1} x_k t^k \Bigr) \right|_{t=0} = \frac{1}{d!} \frac{d^d f}{dx^d}(x_0) .
\]

Example: $f : \mathbb{R}^2 \to \mathbb{R}$, $x \mapsto y = f(x) = \sin(x_1 + \cos(x_2) x_1)$

Compute

\[
\frac{df}{dx_1}(3, 7) = \left. \frac{d}{dt} f\left( \begin{pmatrix} 3 \\ 7 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} t \right) \right|_{t=0}
\]

    import numpy; from numpy import sin, cos; from taylorpoly import UTPS

    def f(x):
        return sin(x[0] + cos(x[1])*x[0]) + x[1]*x[0]

    # seed the direction e_1: x_1 = 3 + 1*t, x_2 = 7 + 0*t
    x = [UTPS([3, 1]), UTPS([7, 0])]
    y = f(x)
    print('normal function evaluation y_0 = f(x_0) =', y.data[0])
    print('gradient evaluation df/dx_1 =', y.data[1])


PART I.2:
The Reverse Mode of AD


The Reverse Mode by Hand:

Recall: $y = f(x) = \sin(x_1 + \cos(x_2) x_1)$

independent $v_{-1} = x_1 = 3$
independent $v_0 = x_2 = 7$
$v_1 = \phi_1(v_0) = \cos(v_0)$
$v_2 = \phi_2(v_1, v_{-1}) = v_1 v_{-1}$
$v_3 = \phi_3(v_{-1}, v_2) = v_{-1} + v_2$
$v_4 = \phi_4(v_3) = \sin(v_3)$
dependent $y = v_4$

Reverse Mode by Hand: Successive Pullbacks

\[
\begin{aligned}
dy &= d\phi_4(v_3) = \left. \frac{\partial \phi_4(z)}{\partial z} \right|_{z = v_3} dv_3
    = \underbrace{\cos(v_3)}_{= \bar v_3}\, dv_3 \\
   &= \bar v_3\, d\phi_3(v_{-1}, v_2)
    = \underbrace{\bar v_3}_{= \bar v_{-1}}\, dv_{-1} + \underbrace{\bar v_3}_{= \bar v_2}\, dv_2 \\
   &= \underbrace{(\bar v_{-1} + \bar v_2 v_1)}_{= \bar v_{-1}}\, dv_{-1} + \underbrace{\bar v_2 v_{-1}}_{= \bar v_1}\, dv_1 \\
   &= \bar v_{-1}\, dv_{-1} + \underbrace{(-\bar v_1 \sin(v_0))}_{= \bar v_0}\, dv_0
\end{aligned}
\]

Interpretation: $\bar v_{-1} \equiv \frac{df}{dx_1}$ and $\bar v_0 \equiv \frac{df}{dx_2}$

Need to store $v_{-1}, v_0, v_1, v_3$ for the reverse mode!
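The successive pullbacks translate directly into a forward sweep followed by a reverse sweep. The following sketch uses plain floats and the standard library only; the variable names mirror the trace (vm1 stands for v_{-1}) and are otherwise an arbitrary choice.

```python
import math

x1, x2 = 3.0, 7.0

# Forward sweep: evaluate and keep the intermediate values.
vm1, v0 = x1, x2
v1 = math.cos(v0)
v2 = v1 * vm1
v3 = vm1 + v2
v4 = math.sin(v3)

# Reverse sweep: start with ybar = 1 and pull back through each operation.
v4bar = 1.0
v3bar = v4bar * math.cos(v3)      # v4 = sin(v3)
vm1bar = v3bar                    # v3 = vm1 + v2
v2bar = v3bar
v1bar = v2bar * vm1               # v2 = v1 * vm1
vm1bar += v2bar * v1
v0bar = -v1bar * math.sin(v0)     # v1 = cos(v0)

gradient = (vm1bar, v0bar)
print("df/dx1 =", gradient[0], " df/dx2 =", gradient[1])
```

One reverse sweep yields both partial derivatives at once, and it reads only the stored values vm1, v0, v1 and v3.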


Semi-Automatic Forward/Reverse Mode by Manual Tracing

    import numpy; from numpy import sin, cos; from taylorpoly import UTPS

    # two directions (P=2): e_1 for x1 and e_2 for x2
    x1 = UTPS([3, 1, 0], P=2); x2 = UTPS([7, 0, 1], P=2)

    # forward mode
    vm1 = x1; v0 = x2
    v1 = cos(v0)
    v2 = v1 * vm1
    v3 = vm1 + v2
    y = v4 = sin(v3)

    # reverse mode
    v4bar = UTPS([0, 0, 0], P=2); v3bar = UTPS([0, 0, 0], P=2)
    v2bar = UTPS([0, 0, 0], P=2); v1bar = UTPS([0, 0, 0], P=2)
    v0bar = UTPS([0, 0, 0], P=2); vm1bar = UTPS([0, 0, 0], P=2)
    v4bar.data[0] = 1.
    v3bar += v4bar * cos(v3)
    vm1bar += v3bar; v2bar += v3bar
    v1bar += v2bar * vm1; vm1bar += v2bar * v1
    v0bar -= v1bar * sin(v0)

    g1 = y.data[1:]; g2 = numpy.array([vm1bar.data[0], v0bar.data[0]])
    print('forward gradient g(x_0)=\n', g1, '\nreverse gradient g(x_0)=\n', g2)
    print('Hessian H(x_0)=\n', numpy.vstack([vm1bar.data[1:], v0bar.data[1:]]))

This can be automated using a code tracer or source code transformation, e.g. with PYADOLC or ALGOPY.


Forward Mode vs Reverse Mode

Task: compute the Jacobian $J = \frac{dF}{dx}$ for $F : \mathbb{R}^N \to \mathbb{R}^M$

Forward Mode:
\[
J = \frac{dF}{dx} \cdot S , \quad \text{where } S = I \in \mathbb{R}^{N \times N} .
\]

Reverse Mode:
\[
J = \bar S^T \cdot \frac{dF}{dx} , \quad \text{where } \bar S = I \in \mathbb{R}^{M \times M} .
\]

Gradient: The number of arithmetic operations (OPS) for the gradient evaluation $\nabla f(x) \in \mathbb{R}^N$ is only a small constant multiple of the OPS for the function f itself.

Example: If $f : \mathbb{R}^{2500} \to \mathbb{R}$ and runtime(f) = 30 sec, then symbolic differentiation or finite differences (SD/FD) would require about 2500 * 30 sec ≈ 21 hours, but AD only a couple of minutes.

Mode      Operations     Memory
Forward   ∝ N OPS(F)     MEM(J)   N MEM(F)
Reverse   ∝ M OPS(F)     MEM(J)   ∝ OPS(F)


Minimal Surface Problem with PYADOLC
Example where the Reverse Mode Excels

(discretized) objective function:

\[
u : [0, 1]^2 \to \mathbb{R} , \quad u \in C^1
\]

\[
u \mapsto \int_0^1 \int_0^1 \sqrt{ 1 + \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 } \, dx\, dy
\approx \sum_{i=0}^{m-1} \sum_{j=0}^{m-1} \tilde O_{ij}(u)
\]

\[
\tilde O_{ij}(u) := h^2 \left[ 1 + \frac{ (u_{i+1,j+1} - u_{i,j})^2 + (u_{i,j+1} - u_{i+1,j})^2 }{4} \right]
\]

Nonlinear program with box constraints:

\[
u^* \in \mathbb{R}^{m \times m} = \operatorname{argmin}_u \tilde O(u)
\]

Therefore $\nabla_u \tilde O(u) \in \mathbb{R}^{m \times m}$; e.g. m = 50 yields a gradient with 2500 elements ⇒ use reverse mode.


A Minimal Surface Problem

part of the unit tests of pyadolc: /pyadolc/tests/complicated_tests.py

    import adolc; import numpy

    def O_tilde(u):
        """objective function of the minimal surface problem"""
        M = numpy.shape(u)[0]
        h = 1./(M-1)
        # the following line is cut off at the right margin in the source
        return M**2*h**2 + numpy.sum(0.25*((u[1:,1:] - u[0:-1,0:-1])**2 + (u[1:,0:-1 …

    # boundary values of u on the unit square
    M = 50; h = 1./M; u = numpy.zeros((M,M), dtype=float)
    u[0,:] = [numpy.sin(numpy.pi*j*h/2.) for j in range(M)]
    u[-1,:] = [numpy.exp(numpy.pi/2)*numpy.sin(numpy.pi*j*h/2.) for j …   # cut off in the source
    u[:,0] = 0
    u[:,-1] = [numpy.exp(i*h*numpy.pi/2.) for i in range(M)]

    # trace the objective function
    adolc.trace_on(1)
    au = adolc.adouble(u)
    adolc.independent(au)
    ay = O_tilde(au)
    adolc.dependent(ay)
    adolc.trace_off()

    # compute gradient
    g_AD = adolc.gradient(1, numpy.ravel(u)).reshape(numpy.shape(u))
    g_AD[:,0] = 0; g_AD[0,:] = 0; g_AD[:,-1] = 0; g_AD[-1,:] = 0  # on the ed…

    # compute dot(Hessian, v), v random vector
    Hv_AD = adolc.hess_vec(1, numpy.ravel(u), numpy.random.rand(u.size))


PART II:
Advanced Application Examples


Optimum Experimental Design in Chemical Engineering

[Figure: reaction scheme with the species tetramethylcyclohexadiene, maleic anhydride, catalyst (Cat), pi-complex, deactivated catalyst and Diels-Alder product; rate labels k1, k2, k3, λ]

non-catalyzed and catalyzed reaction path
deactivation of the catalyst
batch process
measurements: product mass concentration
control of reactant molar numbers, catalyst concentration, temperature profile
five unknown model parameters

\[
\begin{aligned}
\dot n_1 &= -k \cdot \frac{n_1 \cdot n_2}{m_{tot}} , & n_1(0) &= n_{a1} \\
\dot n_2 &= -k \cdot \frac{n_1 \cdot n_2}{m_{tot}} , & n_2(0) &= n_{a2} \\
\dot n_3 &= k \cdot \frac{n_1 \cdot n_2}{m_{tot}} , & n_3(0) &= 0
\end{aligned}
\]

\[
k = k_1 \cdot \exp\left( -\frac{E_1}{R} \cdot \left( \frac{1}{T} - \frac{1}{T_{ref}} \right) \right)
  + k_{kat} \cdot c_{kat} \cdot \exp(-\lambda \cdot t) \cdot \exp\left( -\frac{E_{kat}}{R} \cdot \left( \frac{1}{T} - \frac{1}{T_{ref}} \right) \right)
\]

\[
n_4 = n_{a4} , \qquad T = \vartheta + 273 , \qquad
m_{tot} = n_1 \cdot M_1 + n_2 \cdot M_2 + n_3 \cdot M_3 + n_4 \cdot M_4
\]


Objective Function of Opt. Exp. Design

Part I: Computation of $J_1$ and $J_2$

\[
J_1[n_{mts}, :] = \frac{\sqrt{w_{n_{mts}}}}{\sigma_{n_{mts}}\bigl(x(t_{n_{mts}}; s, u(t_{n_{mts}}; q), p)\bigr)}
\, \frac{d}{d(p, s)} h\bigl(t_{n_{mts}}, x(t_{n_{mts}}; s, u(t_{n_{mts}}; q), p)\bigr)
\]

\[
J_2 = \frac{d}{d(p, s)} r(q, p, s)
\]

Part II: Numerical Linear Algebra

\[
C(J_1, J_2) = (I, 0) \begin{pmatrix} J_1^T J_1 & J_2^T \\ J_2 & 0 \end{pmatrix}^{-1} \begin{pmatrix} I \\ 0 \end{pmatrix}
= Q_2^T (Q_2 J_1^T J_1 Q_2^T)^{-1} Q_2
\]

\[
\Phi = \lambda_1(C) , \quad \text{max. eigenvalue,}
\]

where $J_2^T = (Q_1^T, Q_2^T)(L, 0)^T$

Computational Graph

[Figure: computational graph from the independent variables [p], [q], [s] via the states [x_0], [x_1], …, [x_{N_mts}] (state x(t) at the measurement times) to [h], [r], then [J_1], [J_2], then [C] and the dependent variable [Φ]]

N_mts: number of measurement times, w: measurement weight, σ: standard deviation of a measurement, q: controls, p: nature-given parameters, s: pseudo-parameters (e.g. initial values), u: control functions


Algorithm: Forward UTPM of the Rectangular QR Decomposition

input:  $[A]_D = [A_0, \dots, A_{D-1}]$, where $A_d \in \mathbb{R}^{M \times N}$, $d = 0, \dots, D-1$, $M \ge N$.
output: $[Q]_D = [Q_0, \dots, Q_{D-1}]$ matrix with orthonormal column vectors, where $Q_d \in \mathbb{R}^{M \times N}$, $d = 0, \dots, D-1$
output: $[R]_D = [R_0, \dots, R_{D-1}]$ upper triangular, where $R_d \in \mathbb{R}^{N \times N}$, $d = 0, \dots, D-1$

$Q_0, R_0 = \mathrm{qr}(A_0)$
for d = 1 to D - 1 do
    $\Delta F = A_d - \sum_{k=1}^{d-1} Q_{d-k} R_k$
    $S = -\frac{1}{2} \sum_{k=1}^{d-1} Q_{d-k}^T Q_k$
    $P_L \circ X = P_L \circ (Q_0^T \Delta F R_0^{-1} - S)$
    $X = P_L \circ X - (P_L \circ X)^T$
    $R_d = Q_0^T \Delta F - (S + X) R_0$
    $Q_d = (\Delta F - Q_0 R_d) R_0^{-1}$
end
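The loop above can be transcribed almost line by line into NumPy. The following is a sketch under stated assumptions: the function name `qr_utpm_forward`, the array layout (D, M, N) and the random test data are choices made for this illustration, $P_L$ is taken to be the strictly lower triangular mask, and this is not algopy's implementation.

```python
import numpy as np

def qr_utpm_forward(A):
    """A has shape (D, M, N) with M >= N; returns Q (D, M, N) and R (D, N, N)."""
    D, M, N = A.shape
    Q = np.zeros((D, M, N))
    R = np.zeros((D, N, N))
    Q[0], R[0] = np.linalg.qr(A[0])            # reduced QR at the base point
    R0_inv = np.linalg.inv(R[0])
    PL = np.tril(np.ones((N, N)), -1)          # strictly lower triangular mask
    for d in range(1, D):
        dF = A[d] - sum(Q[d - k] @ R[k] for k in range(1, d))
        S = -0.5 * sum(Q[d - k].T @ Q[k] for k in range(1, d))
        PLX = PL * (Q[0].T @ dF @ R0_inv - S)  # elementwise (Hadamard) product
        X = PLX - PLX.T                        # antisymmetric part of Q_0^T Q_d
        R[d] = Q[0].T @ dF - (S + X) @ R[0]
        Q[d] = (dF - Q[0] @ R[d]) @ R0_inv
    return Q, R

rng = np.random.default_rng(0)
D, M, N = 3, 5, 3
A = rng.standard_normal((D, M, N))
Q, R = qr_utpm_forward(A)

checks = []
for d in range(D):
    prod = sum(Q[d - k] @ R[k] for k in range(d + 1))      # coefficient d of Q(t) R(t)
    gram = sum(Q[d - k].T @ Q[k] for k in range(d + 1))    # coefficient d of Q(t)^T Q(t)
    target = np.eye(N) if d == 0 else np.zeros((N, N))
    checks.append((np.allclose(prod, A[d]),
                   np.allclose(gram, target),
                   np.allclose(np.tril(R[d], -1), 0.0)))
    print(d, checks[-1])
```

The three checks per degree test the defining properties coefficient by coefficient: Q(t) R(t) reproduces A(t), Q(t) has orthonormal columns, and every R_d is upper triangular.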


Example for Differentiation of Numerical Linear Algebra

Compute directional derivatives of the largest eigenvalue,

\[
\nabla_q \lambda_{\max} \left( (I, 0) \begin{pmatrix} J_1^T J_1 & J_2^T \\ J_2 & 0 \end{pmatrix}^{-1} \begin{pmatrix} I \\ 0 \end{pmatrix} \right) .
\]

    import numpy
    from algopy import CGraph, Function, UTPM, dot, inv, zeros, eigh

    def Phifcn(C):
        """return max eigenvalue"""
        return eigh(C)[0][-1]

    def Cfcn(J1, J2):
        """compute covariance matrix"""
        Np = J1.shape[1]; Nr = J2.shape[0]
        tmp = zeros((Np+Nr, Np+Nr), dtype=J1)
        tmp[:Np, :Np] = dot(J1.T, J1)
        tmp[Np:, :Np] = J2
        tmp[:Np, Np:] = J2.T
        return inv(tmp)[:Np, :Np]

    # D Taylor coefficients, P directions, Nm measurements, Np parameters, Nr constraints
    D, P, Nm, Np, Nr = 2, 1, 2000, 6, 3
    cg = CGraph()
    J1 = Function(UTPM(numpy.random.rand(D, P, Nm, Np)))
    J2 = Function(UTPM(numpy.random.rand(D, P, Nr, Np)))
    Phi = Phifcn(Cfcn(J1, J2))
    cg.independentFunctionList = [J1, J2]; cg.dependentFunctionList = [Phi]
    print('objective function Phi =\n', Phi)
    # cg.plot('odoe_cgraph.svg')




Summary: Software Used in this Talk:

Name        Description                                Status  LOC
algopy      forward/reverse UTPM in Python             alpha   10388
            www.github.com/b45ch1/algopy
pysolvind   Python bindings to SolvIND/DAESOL-II       alpha    9743
pyadolc     Python bindings to ADOL-C (C++)            stable   6895
            www.github.com/b45ch1/pyadolc
pycppad     Python bindings to CppAD (C++)             stable   1334
            www.github.com/b45ch1/pycppad
taylorpoly  ANSI-C with Python bindings                alpha    9276
            www.github.com/b45ch1/taylorpoly
easyodoe    Opt. Exp. prototype                        alpha    8345

API is fairly well documented, about 30% of the LOCs
quite complete unit tests and many examples (included in LOCs)
ready to be used!
