Numerical Methods for Optimal Control Problems.
Part I: Hamilton-Jacobi-Bellman Equations and Pontryagin Minimum Principle

Ph.D. course in OPTIMAL CONTROL
Emiliano Cristiani (IAC–CNR)
e.cristiani@iac.cnr.it
March 2013

E. Cristiani (IAC–CNR), Numerical Methods for Optimal Control Pbs, March 2013, 1 / 18
1 References
2 Optimal control problems
3 Finite horizon problem
   - HJB approach
   - Find optimal trajectories from HJB
   - PMP approach
   - HJB↔PMP connection
   - Numerical approximation
     - Direct methods
     - HJB
     - PMP
     - Exploiting HJB↔PMP connection
4 Minimum time problem with target
   - HJB approach
   - PMP approach
   - Numerics for HJB
   - Numerics for PMP
References

Main references

1. M. Bardi, I. Capuzzo Dolcetta, Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations, Birkhäuser, Boston, 1997.
2. J. T. Betts, Practical methods for optimal control and estimation using nonlinear programming, SIAM, 2010.
3. E. Cristiani, P. Martinon, Initialization of the shooting method via the Hamilton-Jacobi-Bellman approach, J. Optim. Theory Appl., 146 (2010), 321-346.
4. L. C. Evans, An introduction to mathematical optimal control theory, http://math.berkeley.edu/~evans/
5. E. Trélat, Contrôle optimal: théorie et applications, http://www.ljll.math.upmc.fr/~trelat/
Optimal control problems
Introduction

Controlled nonlinear dynamical system

    ẏ(s) = f(y(s), α(s)),   s > t
    y(t) = x ∈ R^n

Solution: y^{x,α}(s)

Admissible controls: α ∈ A := {α : [t, +∞) → A},   A ⊂ R^m

Regularity assumptions: are they meaningful from the numerical point of view? Discussion.
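The controlled system above can be integrated numerically once a control signal is fixed. A minimal sketch (the dynamics, control, and step count below are illustrative choices, not from the slides):

```python
import numpy as np

def simulate(f, alpha, x, t, T, N):
    """Explicit Euler for y'(s) = f(y(s), alpha(s)) on [t, T], y(t) = x."""
    ds = (T - t) / N
    s = np.linspace(t, T, N + 1)
    y = np.empty((N + 1, len(x)))
    y[0] = x
    for n in range(N):
        y[n + 1] = y[n] + ds * f(y[n], alpha(s[n]))
    return s, y

# Toy check: pure integrator f(y, a) = a with constant control a = 1,
# starting from x = 0 at t = 0, so y(1) should approximate 1.
s, y = simulate(lambda y, a: np.array([a]), lambda s: 1.0,
                x=np.array([0.0]), t=0.0, T=1.0, N=100)
```

Forward Euler is the simplest choice; higher-order Runge-Kutta steps improve accuracy when the control is smooth.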
Optimal control problems
Payoff

We want to solve
    max_{α∈A} J_{x,t}[α]
where the payoff J is one of the following.

Infinite horizon problem
    J_{x,t}[α] = ∫_t^{+∞} r(y^{x,α}(s), α(s)) e^{−μs} ds,   μ > 0

Finite horizon problem
    J_{x,t}[α] = ∫_t^T r(y^{x,α}(s), α(s)) ds + g(y^{x,α}(T))

Target problem
    J_{x,t}[α] = ∫_t^τ r(y^{x,α}(s), α(s)) ds,   τ := min{s : y^{x,α}(s) ∈ T }
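Given a discretized trajectory, each payoff can be approximated by a quadrature rule. A sketch for the finite horizon case (the rectangle rule, the Euler stepping, and the toy data f, r, g below are illustrative assumptions, not from the slides):

```python
def payoff(f, r, g, alpha, x, t, T, N):
    """Approximate J = int_t^T r(y, alpha) ds + g(y(T)) along an Euler path."""
    ds = (T - t) / N
    y, J = x, 0.0
    for n in range(N):
        s = t + n * ds
        a = alpha(s)
        J += r(y, a) * ds        # rectangle rule for the running payoff
        y = y + ds * f(y, a)     # Euler step for the dynamics
    return J + g(y)              # add the terminal payoff g(y(T))

# Toy check: frozen state (f = 0), r = 1, g = 0 on [0, 1] gives J close to 1.
J = payoff(lambda y, a: 0.0, lambda y, a: 1.0, lambda y: 0.0,
           lambda s: 0.0, x=0.0, t=0.0, T=1.0, N=100)
```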
Finite horizon problem
HJB approach
HJB equation

Value function
    v(x, t) := max_{α∈A} J_{x,t}[α],   x ∈ R^n, t ∈ [0, T]

Theorem (HJB equation)
Assume that v ∈ C^1. Then v solves
    v_t(x, t) + max_{a∈A} { f(x, a) · ∇_x v(x, t) + r(x, a) } = 0,   x ∈ R^n, t ∈ [0, T)
with the terminal condition
    v(x, T) = g(x),   x ∈ R^n.

What about cost functionals to be minimized? Use sup{J} = −inf{−J}.
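As a sanity check (a worked example added here, not on the original slide), take n = 1, f(x, a) = a, A = [−1, 1], r ≡ 0, g(x) = −x²: the best strategy drives y toward 0 at unit speed, and the resulting value function satisfies the theorem:

```latex
v(x,t) = -\Bigl(\max\{0,\;|x|-(T-t)\}\Bigr)^{2}.
% For |x| > T-t (elsewhere v \equiv 0 and the equation is trivial):
v_t = -2\bigl(|x|-(T-t)\bigr), \qquad
\max_{|a|\le 1} a\,v_x = |v_x| = 2\bigl(|x|-(T-t)\bigr),
% hence v_t + \max_{a}\{f\,v_x + r\} = 0, with v(x,T) = -x^2 = g(x).
```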
Finite horizon problem
Find optimal trajectories from HJB

Find α* by means of v

Given v(x, t) for any x ∈ R^n and t ∈ [0, T], we define
    α*_feedback(x, t) := arg max_{a∈A} { f(x, a) · ∇_x v(x, t) + r(x, a) }
or, coming back to the original variables,
    α*_feedback(y, s) := arg max_{a∈A} { f(y, a) · ∇_x v(y, s) + r(y, a) }.
Then, the optimal control is
    α*(s) = α*_feedback(y*(s), s)
where y*(s) is the solution of
    ẏ*(s) = f(y*(s), α*_feedback(y*(s), s)),   s > t
    y*(t) = x.
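The feedback synthesis can be sketched numerically. Below, a 1-D example with f(x, a) = a, A = [−1, 1], r ≡ 0, g(x) = −x² (illustrative data, not from the slides), for which v(x, t) = −(max{0, |x| − (T − t)})²; the argmax is taken over a discretized control set:

```python
import numpy as np

T = 1.0

def grad_v(x, t):
    # d/dx of v(x, t) = -(max(0, |x| - (T - t)))^2 for this toy problem
    m = abs(x) - (T - t)
    return 0.0 if m <= 0 else -2.0 * m * np.sign(x)

def feedback(x, t, controls=np.linspace(-1.0, 1.0, 21)):
    # alpha*_feedback(x, t) = arg max_a { f(x, a) * v_x + r } = arg max_a a * v_x
    return controls[np.argmax([a * grad_v(x, t) for a in controls])]

# Integrate y' = alpha*_feedback(y, s) from x = 2, t = 0 with explicit Euler.
N = 1000
ds = T / N
y, s = 2.0, 0.0
for n in range(N):
    y += ds * feedback(y, s)
    s = (n + 1) * ds
# The feedback steers y from 2 toward 0 as far as time allows, so y(T) is near 1.
```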
Finite horizon problem
PMP approach
PMP

Theorem (Pontryagin Minimum Principle)
Assume α* is optimal and y* is the corresponding trajectory. Then there exists a function p* : [t, T] → R^n (costate) such that
    ẏ*(s) = f(y*(s), α*(s))
    ṗ*(s) = −∇_x f(y*(s), α*(s)) · p*(s) − ∇_x r(y*(s), α*(s))
    α*(s) = arg max_{a∈A} { f(y*(s), a) · p*(s) + r(y*(s), a) }
with initial condition y*(t) = x and terminal condition p*(T) = ∇g(y*(T)).

PMP can fail!
Along the optimal trajectory the Hamiltonian H(y*, p*, a) = f · p* + r may not depend explicitly on the control, so the maximality condition does not single out α* (singular arcs).
Finite horizon problem
HJB↔PMP connection

Theorem
If v ∈ C^2, then
    p*(s) = ∇_x v(y*(s), s),   s ∈ [t, T].

The gradient of the value function gives the optimal value of the costate all along the optimal trajectory, in particular for s = t!
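In the toy example f(x, a) = a, A = [−1, 1], r ≡ 0, g(x) = −x² (an illustration added here, not on the original slide), the connection can be checked by hand for x > T − t > 0:

```latex
% Optimal trajectory: full speed toward 0, \; y^*(s) = x - (s - t).
% Costate: \dot p^* = 0, \quad p^*(T) = g'(y^*(T)) = -2\bigl(x-(T-t)\bigr),
% so p^*(s) \equiv -2\bigl(x-(T-t)\bigr).
% Value function: v(y,s) = -\bigl(\max\{0,\,|y|-(T-s)\}\bigr)^2, hence
\nabla_x v(y^*(s),s) = -2\bigl(y^*(s)-(T-s)\bigr)
                     = -2\bigl(x-(T-t)\bigr) = p^*(s).
```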
Finite horizon problem
Numerical approximation
Direct methods

The control problem is entirely discretized and written in the form

Discrete problem
Find ξ* such that J(ξ*) = max_{ξ∈R^n} J(ξ), with J : R^n → R.

Fix a grid (s_1, ..., s_n, ..., s_N) in [t, T] with s_n − s_{n−1} = ∆s. A discrete control function α is characterized by the vector (α_1, ..., α_N) with α_n = α(s_n).

Given (α_1, ..., α_N), the ODE is discretized, for example, by
    y_{n+1} = y_n + ∆s f(y_n, α_n),
and so is the payoff, for example
    J(α) = Σ_n r(y_n, α_n) ∆s + g(y_N) + penalization for control and state constraints.
Then, a gradient method is used to maximize J.
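A minimal direct-method sketch (the problem data f(y, a) = a, r ≡ 0, g(y) = −y², x = 2, [t, T] = [0, 1], A = [−1, 1], and the projected finite-difference gradient ascent are illustrative choices, not the slides' specific method):

```python
import numpy as np

def J(alpha, x=2.0):
    """Discrete payoff: Euler trajectory, r = 0, terminal payoff g(y) = -y^2."""
    ds = 1.0 / len(alpha)
    y = x
    for a in alpha:
        y = y + ds * a           # Euler step of y' = a
    return -y**2                 # g(y_N)

def direct_method(N=20, iters=200, lr=0.5, eps=1e-6):
    alpha = np.zeros(N)          # unknown vector (alpha_1, ..., alpha_N)
    for _ in range(iters):
        base = J(alpha)
        grad = np.array([(J(alpha + eps * np.eye(N)[n]) - base) / eps
                         for n in range(N)])
        alpha = np.clip(alpha + lr * grad, -1.0, 1.0)  # ascent + projection on A^N
    return alpha

alpha = direct_method()
# Optimum: alpha_n = -1 for every n, so y_N = 2 - 1 = 1 and J = -1.
```

In practice one would use an NLP solver with an exact (adjoint-based) gradient instead of finite differences; the structure of the loop stays the same.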
Finite horizon problem
Numerical approximation
Semi-Lagrangian discretization of HJB

    v_t(x, t) + max_{a∈A} { f(x, a) · ∇_x v(x, t) + r(x, a) } = 0,   x ∈ R^n, t ∈ [0, T)

Fix a grid in Ω × [0, T], with Ω ⊂ R^n bounded. Steps: ∆x, ∆t. Nodes: {x_1, ..., x_M}, {t_1, ..., t_N}. Discrete solution: w_i^n ≈ v(x_i, t_n).

    (w_i^n − w_i^{n−1})/∆t + max_{a∈A} { [w^n(x_i + ∆t f(x_i, a)) − w_i^n]/∆t + r(x_i, a) } = 0

that is, marching backward in time,

    w_i^{n−1} = max_{a∈A} { w^n(x_i + ∆t f(x_i, a)) + ∆t r(x_i, a) },

where the term w^n(x_i + ∆t f(x_i, a)) has to be interpolated, since the point x_i + ∆t f(x_i, a) is in general not a grid node.

CFL condition (not needed but useful)
    ∆t max_{x,a} |f(x, a)| ≤ ∆x
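A 1-D sketch of the scheme (the data f(x, a) = a, A = {−1, 0, 1}, r ≡ 0, g(x) = −x², Ω = [−2, 2], T = 1 are illustrative assumptions; for this problem the exact value is v(x, 0) = −(max{0, |x| − 1})²):

```python
import numpy as np

M, N = 201, 200
x = np.linspace(-2.0, 2.0, M)          # space nodes on Omega = [-2, 2]
dt = 1.0 / N                           # time step; CFL: dt * max|f| <= dx
controls = (-1.0, 0.0, 1.0)            # discretized control set A

w = -x**2                              # terminal condition w^N_i = g(x_i)
for n in range(N):                     # backward in time: w^{n-1} from w^n
    # w^{n-1}_i = max_a { w^n(x_i + dt f(x_i, a)) + dt r(x_i, a) }, with r = 0;
    # the foot x_i + dt*a is off-grid, so w^n is evaluated by linear
    # interpolation (np.interp clamps at the boundary of Omega).
    w = np.max([np.interp(x + dt * a, x, w) for a in controls], axis=0)
# w now approximates v(., 0) = -(max(0, |x| - 1))^2 on the grid.
```

Linear interpolation keeps the scheme monotone; the price is a first-order numerical diffusion visible near the kinks of v.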
Finite horizon problem
Numerical approximation
Shooting method for PMP

Find the solution of S(p_0) = 0, where
    S(p_0) := p(T) − ∇g(y(T))
and p(T) and y(T) are computed solving the ODEs
    α_n = arg max_{a∈A} { f(y_n, a) · p_n + r(y_n, a) }
    y_{n+1} = y_n + ∆s f(y_n, α_n)
    p_{n+1} = p_n + ∆s (−∇_x f(y_n, α_n) · p_n − ∇_x r(y_n, α_n))
with only initial conditions
    y_1 = x and p_1 = p_0.
The solution of S(p_0) = 0 can be found by means of an iterative method like bisection, Newton, etc.
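A sketch of the shooting loop on a toy problem (f(y, a) = a, A = [−1, 1], r ≡ 0, g(y) = −y², x = 2, T = 1 are illustrative assumptions): here ∇_x f = ∇_x r = 0, so the costate stays constant and α_n = sign(p_n); the shooting function reduces to S(p_0) = p_0 + 2 y(T), and bisection finds its root.

```python
def S(p0, x=2.0, T=1.0, N=100):
    """Shooting function S(p0) = p(T) - grad g(y(T)), with grad g(y) = -2y."""
    ds = T / N
    y, p = x, p0
    for _ in range(N):
        a = 1.0 if p > 0 else -1.0   # arg max_{a in [-1,1]} a * p
        y = y + ds * a               # Euler step for the state
        # costate step: p' = -grad_x f . p - grad_x r = 0 in this example
    return p + 2.0 * y

# Bisection: S(-5) < 0 < S(5), so a sign change is bracketed.
lo, hi = -5.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if S(lo) * S(mid) <= 0:
        hi = mid
    else:
        lo = mid
p0 = 0.5 * (lo + hi)
# Root: p0 = -2, which equals grad_x v(x, 0) from the HJB side.
```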
Finite horizon problem
Numerical approximation
HJB↔PMP for numerics

Idea [CM10]
1 Solve HJB on a coarse grid.
2 Compute p_0 = ∇_x v(x, 0).
3 Use it as initial guess for the shooting method.

Advantages
1 Fast in dimension ≤ 4, feasible in dimension 5-6.
2 Highly accurate.
3 Reasonable guarantee to converge to the global maximum of J.
Minimum time problem with target
HJB approach
HJB equation

Value function
    v(x) := min_{α∈A} J_x[α],   x ∈ R^n   (t = 0)
with cost functional
    J_x[α] = ∫_0^τ r(y^{x,α}(s), α(s)) ds + g(y^{x,α}(τ)),   τ := min{s : y^{x,α}(s) ∈ T }

Theorem (HJB equation)
Assume that v ∈ C^1. Then v solves the stationary equation
    max_{a∈A} { −f(x, a) · ∇_x v(x) − r(x, a) } = 0,   x ∈ R^n \ T
with boundary conditions
    v(x) = g(x),   x ∈ ∂T.
Minimum time problem with target
HJB approach
Eikonal equation

If f(x, a) = c(x)a, A = B(0, 1), r ≡ 1, and g ≡ 0, we get
    max_{a∈B(0,1)} { c(x) a · ∇v(x) } = 1,   x ∈ R^n \ T
or, equivalently,
    c(x)|∇v(x)| = 1,   x ∈ R^n \ T
with boundary conditions
    v(x) = 0,   x ∈ ∂T.

Optimal trajectories ≡ characteristic lines ≡ gradient lines
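For c ≡ 1 the eikonal equation has a classical interpretation (a standard remark added here, not on the original slide):

```latex
% With c \equiv 1 the equation |\nabla v(x)| = 1, \; v = 0 on \partial T,
% is solved (in the viscosity sense) by the distance function
v(x) = \mathrm{dist}(x, \mathcal{T}),
% i.e. the minimum time needed to reach the target at unit speed; its
% gradient lines are the straight segments realizing the distance.
```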
Minimum time problem with target
PMP approach
PMP

Theorem (Pontryagin Minimum Principle)
Assume α* is optimal and y* is the corresponding trajectory. Then there exists a function p* : [0, τ] → R^n (costate) such that
    ẏ*(s) = f(y*(s), α*(s))
    ṗ*(s) = −∇_x f(y*(s), α*(s)) · p*(s) − ∇_x r(y*(s), α*(s))
    α*(s) = arg max_{a∈A} { f(y*(s), a) · p*(s) + r(y*(s), a) }
and
    f(y*(s), α*(s)) · p*(s) + r(y*(s), α*(s)) = 0,   s ∈ [0, τ]   (HJB)
with initial condition y*(0) = x.
Minimum time problem with target
Numerics for HJB
Semi-Lagrangian discretization of HJB

    w_i = min_{a∈A} { w(x_i + ∆t f(x_i, a)) + ∆t r(x_i, a) }

Iterative solution
The fixed-point problem can be solved by iterating the scheme until convergence, starting from any initial guess:

    w_i^{(k+1)} = min_{a∈A} { w^{(k)}(x_i + ∆t f(x_i, a)) + ∆t r(x_i, a) }

    w_i^{(0)} = +∞ for x_i ∈ R^n \ T,   w_i^{(0)} = g(x_i) for x_i ∈ ∂T

CFL condition (not needed but useful)
    ∆t max_{x,a} |f(x, a)| ≤ ∆x
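A 1-D sketch of the fixed-point iteration (illustrative data, not from the slides): f(x, a) = a, A = {−1, 1}, r ≡ 1, g ≡ 0, target T = {0, 1} on Ω = [0, 1]; the exact minimum time is v(x) = min(x, 1 − x):

```python
import numpy as np

M = 101
x = np.linspace(0.0, 1.0, M)
dx = x[1] - x[0]
dt = dx                              # CFL: dt * max|f| <= dx (with equality)
target = (x == 0.0) | (x == 1.0)     # the target T = {0, 1}

BIG = 1e6                            # stands in for +infinity
w = np.where(target, 0.0, BIG)       # w^(0): g = 0 on T, +infinity elsewhere
for _ in range(2 * M):               # iterate the scheme to the fixed point
    # w^(k+1)_i = min_a { w^(k)(x_i + dt f(x_i, a)) + dt r(x_i, a) }, r = 1;
    # the feet x_i + dt*a are evaluated by linear interpolation.
    upd = np.min([np.interp(x + dt * a, x, w) for a in (-1.0, 1.0)],
                 axis=0) + dt
    w = np.where(target, 0.0, upd)   # keep the boundary data on T fixed
# w approximates the minimum time v(x) = min(x, 1 - x).
```

Each sweep propagates the correct values one more cell away from the target, so O(M) iterations suffice here; fast marching or fast sweeping orderings reduce this cost.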
Minimum time problem with target
Numerics for PMP
Shooting method for PMP

Find the solution of S(p_0, τ) = 0, where
    S(p_0, τ) := ( y(τ) − T ,  f(y(τ), α(τ)) · p(τ) + r(y(τ), α(τ)) )
and y(τ), p(τ) and α(τ) are computed solving the ODEs
    α_n = arg max_{a∈A} { f(y_n, a) · p_n + r(y_n, a) }
    y_{n+1} = y_n + ∆t f(y_n, α_n)
    p_{n+1} = p_n + ∆t (−∇_x f(y_n, α_n) · p_n − ∇_x r(y_n, α_n))
with only initial conditions
    y_1 = x and p_1 = p_0.
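A sketch on a toy instance (illustrative assumptions, not from the slides): f(y, a) = a, A = [−1, 1], target T = {0}, x = 1, with the time-optimal payoff written in max form, i.e. r ≡ −1 (a sign convention chosen for this sketch). The costate is constant, so α = sign(p_0) and the shooting map reduces to S(p_0, τ) = (y(τ), α p_0 − 1) with y(τ) = x + α τ; a finite-difference Newton iteration finds the root (p_0, τ) = (−1, 1):

```python
import numpy as np

def S(z, x=1.0):
    """Shooting map: hit the target {0} and annul the Hamiltonian at tau."""
    p0, tau = z
    a = 1.0 if p0 > 0 else -1.0          # arg max_a { a*p0 + r }, r = -1
    y = x + a * tau                      # exact flow of y' = a (p is constant)
    return np.array([y,                  # y(tau) - 0: reach the target {0}
                     a * p0 - 1.0])      # f.p + r = 0 at the final time

# Newton iteration with a finite-difference Jacobian, from a rough guess.
z = np.array([-0.5, 0.5])                # guess for (p0, tau)
for _ in range(20):
    F = S(z)
    Jac = np.column_stack([(S(z + h) - F) / 1e-6 for h in 1e-6 * np.eye(2)])
    z = z - np.linalg.solve(Jac, F)
# Converges to (p0, tau) = (-1, 1): reach 0 from 1 at full speed in time 1.
```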