Numerical Methods for Optimal Control Problems.
Part I: Hamilton-Jacobi-Bellman Equations and Pontryagin Minimum Principle

Ph.D. course in OPTIMAL CONTROL
Emiliano Cristiani (IAC–CNR)
e.cristiani@iac.cnr.it
March 2013

E. Cristiani (IAC–CNR), Numerical Methods for Optimal Control Pbs, March 2013, 1 / 18
1 References
2 Optimal control problems
3 Finite horizon problem
   - HJB approach
   - Find optimal trajectories from HJB
   - PMP approach
   - HJB↔PMP connection
   - Numerical approximation
     - Direct methods
     - HJB
     - PMP
     - Exploiting HJB↔PMP connection
4 Minimum time problem with target
   - HJB approach
   - PMP approach
   - Numerics for HJB
   - Numerics for PMP
References

Main references

1. M. Bardi, I. Capuzzo Dolcetta, Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations, Birkhäuser, Boston, 1997.
2. J. T. Betts, Practical methods for optimal control and estimation using nonlinear programming, SIAM, 2010.
3. E. Cristiani, P. Martinon, Initialization of the shooting method via the Hamilton-Jacobi-Bellman approach, J. Optim. Theory Appl., 146 (2010), 321-346.
4. L. C. Evans, An introduction to mathematical optimal control theory, http://math.berkeley.edu/~evans/
5. E. Trélat, Contrôle optimal: théorie et applications, http://www.ljll.math.upmc.fr/~trelat/
Optimal control problems
Introduction

Controlled nonlinear dynamical system

    ẏ(s) = f(y(s), α(s)),   s > t
    y(t) = x ∈ R^n

Solution: y^{x,α}(s)

Admissible controls: α ∈ A := {α : [t, +∞) → A},   A ⊂ R^m

Regularity assumptions: are they meaningful from the numerical point of view? Discussion.
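The controlled system above can be integrated numerically once a control signal is fixed. A minimal sketch (the dynamics, control, and step count below are illustrative choices, not from the slides):

```python
import numpy as np

def simulate(f, alpha, x, t, T, N):
    """Explicit Euler for y'(s) = f(y(s), alpha(s)) on [t, T], y(t) = x."""
    ds = (T - t) / N
    s = np.linspace(t, T, N + 1)
    y = np.empty((N + 1, len(x)))
    y[0] = x
    for n in range(N):
        y[n + 1] = y[n] + ds * f(y[n], alpha(s[n]))
    return s, y

# Toy check: pure integrator f(y, a) = a with constant control a = 1,
# starting from x = 0 at t = 0, so y(1) should approximate 1.
s, y = simulate(lambda y, a: np.array([a]), lambda s: 1.0,
                x=np.array([0.0]), t=0.0, T=1.0, N=100)
```

Forward Euler is the simplest choice; higher-order Runge-Kutta steps improve accuracy when the control is smooth.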
Optimal control problems
Payoff

We want to solve
    max_{α∈A} J_{x,t}[α]
where the payoff J is one of the following.

Infinite horizon problem
    J_{x,t}[α] = ∫_t^{+∞} r(y^{x,α}(s), α(s)) e^{−μs} ds,   μ > 0

Finite horizon problem
    J_{x,t}[α] = ∫_t^T r(y^{x,α}(s), α(s)) ds + g(y^{x,α}(T))

Target problem
    J_{x,t}[α] = ∫_t^τ r(y^{x,α}(s), α(s)) ds,   τ := min{s : y^{x,α}(s) ∈ T }
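Given a discretized trajectory, each payoff can be approximated by a quadrature rule. A sketch for the finite horizon case (the rectangle rule, the Euler stepping, and the toy data f, r, g below are illustrative assumptions, not from the slides):

```python
def payoff(f, r, g, alpha, x, t, T, N):
    """Approximate J = int_t^T r(y, alpha) ds + g(y(T)) along an Euler path."""
    ds = (T - t) / N
    y, J = x, 0.0
    for n in range(N):
        s = t + n * ds
        a = alpha(s)
        J += r(y, a) * ds        # rectangle rule for the running payoff
        y = y + ds * f(y, a)     # Euler step for the dynamics
    return J + g(y)              # add the terminal payoff g(y(T))

# Toy check: frozen state (f = 0), r = 1, g = 0 on [0, 1] gives J close to 1.
J = payoff(lambda y, a: 0.0, lambda y, a: 1.0, lambda y: 0.0,
           lambda s: 0.0, x=0.0, t=0.0, T=1.0, N=100)
```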
Finite horizon problem
HJB approach
HJB equation

Value function
    v(x, t) := max_{α∈A} J_{x,t}[α],   x ∈ R^n, t ∈ [0, T]

Theorem (HJB equation)
Assume that v ∈ C^1. Then v solves
    v_t(x, t) + max_{a∈A} { f(x, a) · ∇_x v(x, t) + r(x, a) } = 0,   x ∈ R^n, t ∈ [0, T)
with the terminal condition
    v(x, T) = g(x),   x ∈ R^n.

What about cost functionals to be minimized? Use sup{J} = −inf{−J}.
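As a sanity check (a worked example added here, not on the original slide), take n = 1, f(x, a) = a, A = [−1, 1], r ≡ 0, g(x) = −x²: the best strategy drives y toward 0 at unit speed, and the resulting value function satisfies the theorem:

```latex
v(x,t) = -\Bigl(\max\{0,\;|x|-(T-t)\}\Bigr)^{2}.
% For |x| > T-t (elsewhere v \equiv 0 and the equation is trivial):
v_t = -2\bigl(|x|-(T-t)\bigr), \qquad
\max_{|a|\le 1} a\,v_x = |v_x| = 2\bigl(|x|-(T-t)\bigr),
% hence v_t + \max_{a}\{f\,v_x + r\} = 0, with v(x,T) = -x^2 = g(x).
```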
Finite horizon problem
Find optimal trajectories from HJB

Find α* by means of v

Given v(x, t) for any x ∈ R^n and t ∈ [0, T], we define
    α*_feedback(x, t) := arg max_{a∈A} { f(x, a) · ∇_x v(x, t) + r(x, a) }
or, coming back to the original variables,
    α*_feedback(y, s) := arg max_{a∈A} { f(y, a) · ∇_x v(y, s) + r(y, a) }.
Then, the optimal control is
    α*(s) = α*_feedback(y*(s), s)
where y*(s) is the solution of
    ẏ*(s) = f(y*(s), α*_feedback(y*(s), s)),   s > t
    y*(t) = x.
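The feedback synthesis can be sketched numerically. Below, a 1-D example with f(x, a) = a, A = [−1, 1], r ≡ 0, g(x) = −x² (illustrative data, not from the slides), for which v(x, t) = −(max{0, |x| − (T − t)})²; the argmax is taken over a discretized control set:

```python
import numpy as np

T = 1.0

def grad_v(x, t):
    # d/dx of v(x, t) = -(max(0, |x| - (T - t)))^2 for this toy problem
    m = abs(x) - (T - t)
    return 0.0 if m <= 0 else -2.0 * m * np.sign(x)

def feedback(x, t, controls=np.linspace(-1.0, 1.0, 21)):
    # alpha*_feedback(x, t) = arg max_a { f(x, a) * v_x + r } = arg max_a a * v_x
    return controls[np.argmax([a * grad_v(x, t) for a in controls])]

# Integrate y' = alpha*_feedback(y, s) from x = 2, t = 0 with explicit Euler.
N = 1000
ds = T / N
y, s = 2.0, 0.0
for n in range(N):
    y += ds * feedback(y, s)
    s = (n + 1) * ds
# The feedback steers y from 2 toward 0 as far as time allows, so y(T) is near 1.
```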
Finite horizon problem
PMP approach
PMP

Theorem (Pontryagin Minimum Principle)
Assume α* is optimal and y* is the corresponding trajectory. Then there exists a function p* : [t, T] → R^n (costate) such that
    ẏ*(s) = f(y*(s), α*(s))
    ṗ*(s) = −∇_x f(y*(s), α*(s)) · p*(s) − ∇_x r(y*(s), α*(s))
    α*(s) = arg max_{a∈A} { f(y*(s), a) · p*(s) + r(y*(s), a) }
with initial condition y*(t) = x and terminal condition p*(T) = ∇g(y*(T)).

PMP can fail!
Along the optimal trajectory the Hamiltonian H(y*, p*, a) = f · p* + r may not depend explicitly on the control, so the maximality condition does not single out α* (singular arcs).
Finite horizon problem
HJB↔PMP connection

Theorem
If v ∈ C^2, then
    p*(s) = ∇_x v(y*(s), s),   s ∈ [t, T].

The gradient of the value function gives the optimal value of the costate all along the optimal trajectory, in particular for s = t!
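In the toy example f(x, a) = a, A = [−1, 1], r ≡ 0, g(x) = −x² (an illustration added here, not on the original slide), the connection can be checked by hand for x > T − t > 0:

```latex
% Optimal trajectory: full speed toward 0, \; y^*(s) = x - (s - t).
% Costate: \dot p^* = 0, \quad p^*(T) = g'(y^*(T)) = -2\bigl(x-(T-t)\bigr),
% so p^*(s) \equiv -2\bigl(x-(T-t)\bigr).
% Value function: v(y,s) = -\bigl(\max\{0,\,|y|-(T-s)\}\bigr)^2, hence
\nabla_x v(y^*(s),s) = -2\bigl(y^*(s)-(T-s)\bigr)
                     = -2\bigl(x-(T-t)\bigr) = p^*(s).
```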
Finite horizon problem
Numerical approximation
Direct methods

The control problem is entirely discretized and written in the form

Discrete problem
Find ξ* such that J(ξ*) = max_{ξ∈R^n} J(ξ), with J : R^n → R.

Fix a grid (s_1, ..., s_n, ..., s_N) in [t, T] with s_n − s_{n−1} = ∆s. A discrete control function α is characterized by the vector (α_1, ..., α_N) with α_n = α(s_n).

Given (α_1, ..., α_N), the ODE is discretized, for example, by
    y_{n+1} = y_n + ∆s f(y_n, α_n),
and so is the payoff, for example
    J(α) = Σ_n r(y_n, α_n) ∆s + g(y_N) + penalization for control and state constraints.
Then, a gradient method is used to maximize J.
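A minimal direct-method sketch (the problem data f(y, a) = a, r ≡ 0, g(y) = −y², x = 2, [t, T] = [0, 1], A = [−1, 1], and the projected finite-difference gradient ascent are illustrative choices, not the slides' specific method):

```python
import numpy as np

def J(alpha, x=2.0):
    """Discrete payoff: Euler trajectory, r = 0, terminal payoff g(y) = -y^2."""
    ds = 1.0 / len(alpha)
    y = x
    for a in alpha:
        y = y + ds * a           # Euler step of y' = a
    return -y**2                 # g(y_N)

def direct_method(N=20, iters=200, lr=0.5, eps=1e-6):
    alpha = np.zeros(N)          # unknown vector (alpha_1, ..., alpha_N)
    for _ in range(iters):
        base = J(alpha)
        grad = np.array([(J(alpha + eps * np.eye(N)[n]) - base) / eps
                         for n in range(N)])
        alpha = np.clip(alpha + lr * grad, -1.0, 1.0)  # ascent + projection on A^N
    return alpha

alpha = direct_method()
# Optimum: alpha_n = -1 for every n, so y_N = 2 - 1 = 1 and J = -1.
```

In practice one would use an NLP solver with an exact (adjoint-based) gradient instead of finite differences; the structure of the loop stays the same.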
Finite horizon problem
Numerical approximation
Semi-Lagrangian discretization of HJB

    v_t(x, t) + max_{a∈A} { f(x, a) · ∇_x v(x, t) + r(x, a) } = 0,   x ∈ R^n, t ∈ [0, T)

Fix a grid in Ω × [0, T], with Ω ⊂ R^n bounded. Steps: ∆x, ∆t. Nodes: {x_1, ..., x_M}, {t_1, ..., t_N}. Discrete solution: w_i^n ≈ v(x_i, t_n).

    (w_i^n − w_i^{n−1})/∆t + max_{a∈A} { [w^n(x_i + ∆t f(x_i, a)) − w_i^n]/∆t + r(x_i, a) } = 0

that is, marching backward in time,

    w_i^{n−1} = max_{a∈A} { w^n(x_i + ∆t f(x_i, a)) + ∆t r(x_i, a) },

where the term w^n(x_i + ∆t f(x_i, a)) has to be interpolated, since the point x_i + ∆t f(x_i, a) is in general not a grid node.

CFL condition (not needed but useful)
    ∆t max_{x,a} |f(x, a)| ≤ ∆x
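A 1-D sketch of the scheme (the data f(x, a) = a, A = {−1, 0, 1}, r ≡ 0, g(x) = −x², Ω = [−2, 2], T = 1 are illustrative assumptions; for this problem the exact value is v(x, 0) = −(max{0, |x| − 1})²):

```python
import numpy as np

M, N = 201, 200
x = np.linspace(-2.0, 2.0, M)          # space nodes on Omega = [-2, 2]
dt = 1.0 / N                           # time step; CFL: dt * max|f| <= dx
controls = (-1.0, 0.0, 1.0)            # discretized control set A

w = -x**2                              # terminal condition w^N_i = g(x_i)
for n in range(N):                     # backward in time: w^{n-1} from w^n
    # w^{n-1}_i = max_a { w^n(x_i + dt f(x_i, a)) + dt r(x_i, a) }, with r = 0;
    # the foot x_i + dt*a is off-grid, so w^n is evaluated by linear
    # interpolation (np.interp clamps at the boundary of Omega).
    w = np.max([np.interp(x + dt * a, x, w) for a in controls], axis=0)
# w now approximates v(., 0) = -(max(0, |x| - 1))^2 on the grid.
```

Linear interpolation keeps the scheme monotone; the price is a first-order numerical diffusion visible near the kinks of v.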
Finite horizon problem
Numerical approximation
Shooting method for PMP

Find the solution of S(p_0) = 0, where
    S(p_0) := p(T) − ∇g(y(T))
and p(T) and y(T) are computed solving the ODEs
    α_n = arg max_{a∈A} { f(y_n, a) · p_n + r(y_n, a) }
    y_{n+1} = y_n + ∆s f(y_n, α_n)
    p_{n+1} = p_n + ∆s (−∇_x f(y_n, α_n) · p_n − ∇_x r(y_n, α_n))
with only initial conditions
    y_1 = x and p_1 = p_0.
The solution of S(p_0) = 0 can be found by means of an iterative method like bisection, Newton, etc.
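A sketch of the shooting loop on a toy problem (f(y, a) = a, A = [−1, 1], r ≡ 0, g(y) = −y², x = 2, T = 1 are illustrative assumptions): here ∇_x f = ∇_x r = 0, so the costate stays constant and α_n = sign(p_n); the shooting function reduces to S(p_0) = p_0 + 2 y(T), and bisection finds its root.

```python
def S(p0, x=2.0, T=1.0, N=100):
    """Shooting function S(p0) = p(T) - grad g(y(T)), with grad g(y) = -2y."""
    ds = T / N
    y, p = x, p0
    for _ in range(N):
        a = 1.0 if p > 0 else -1.0   # arg max_{a in [-1,1]} a * p
        y = y + ds * a               # Euler step for the state
        # costate step: p' = -grad_x f . p - grad_x r = 0 in this example
    return p + 2.0 * y

# Bisection: S(-5) < 0 < S(5), so a sign change is bracketed.
lo, hi = -5.0, 5.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if S(lo) * S(mid) <= 0:
        hi = mid
    else:
        lo = mid
p0 = 0.5 * (lo + hi)
# Root: p0 = -2, which equals grad_x v(x, 0) from the HJB side.
```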
Finite horizon problem
Numerical approximation
HJB↔PMP for numerics

Idea [CM10]
1 Solve HJB on a coarse grid.
2 Compute p_0 = ∇_x v(x, 0).
3 Use it as initial guess for the shooting method.

Advantages
1 Fast in dimension ≤ 4, feasible in dimension 5-6.
2 Highly accurate.
3 Reasonable guarantee to converge to the global maximum of J.
Minimum time problem with target
HJB approach
HJB equation

Value function
    v(x) := min_{α∈A} J_x[α],   x ∈ R^n   (t = 0)
with cost functional
    J_x[α] = ∫_0^τ r(y^{x,α}(s), α(s)) ds + g(y^{x,α}(τ)),   τ := min{s : y^{x,α}(s) ∈ T }

Theorem (HJB equation)
Assume that v ∈ C^1. Then v solves the stationary equation
    max_{a∈A} { −f(x, a) · ∇_x v(x) − r(x, a) } = 0,   x ∈ R^n \ T
with boundary conditions
    v(x) = g(x),   x ∈ ∂T.
Minimum time problem with target
HJB approach
Eikonal equation

If f(x, a) = c(x)a, A = B(0, 1), r ≡ 1, and g ≡ 0, we get
    max_{a∈B(0,1)} { c(x) a · ∇v(x) } = 1,   x ∈ R^n \ T
or, equivalently,
    c(x)|∇v(x)| = 1,   x ∈ R^n \ T
with boundary conditions
    v(x) = 0,   x ∈ ∂T.

Optimal trajectories ≡ characteristic lines ≡ gradient lines
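For c ≡ 1 the eikonal equation has a classical interpretation (a standard remark added here, not on the original slide):

```latex
% With c \equiv 1 the equation |\nabla v(x)| = 1, \; v = 0 on \partial T,
% is solved (in the viscosity sense) by the distance function
v(x) = \mathrm{dist}(x, \mathcal{T}),
% i.e. the minimum time needed to reach the target at unit speed; its
% gradient lines are the straight segments realizing the distance.
```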
Minimum time problem with target
PMP approach
PMP

Theorem (Pontryagin Minimum Principle)
Assume α* is optimal and y* is the corresponding trajectory. Then there exists a function p* : [0, τ] → R^n (costate) such that
    ẏ*(s) = f(y*(s), α*(s))
    ṗ*(s) = −∇_x f(y*(s), α*(s)) · p*(s) − ∇_x r(y*(s), α*(s))
    α*(s) = arg max_{a∈A} { f(y*(s), a) · p*(s) + r(y*(s), a) }
and
    f(y*(s), α*(s)) · p*(s) + r(y*(s), α*(s)) = 0,   s ∈ [0, τ]   (HJB)
with initial condition y*(0) = x.
Minimum time problem with target
Numerics for HJB
Semi-Lagrangian discretization of HJB

    w_i = min_{a∈A} { w(x_i + ∆t f(x_i, a)) + ∆t r(x_i, a) }

Iterative solution
The fixed-point problem can be solved by iterating the scheme until convergence, starting from any initial guess:

    w_i^{(k+1)} = min_{a∈A} { w^{(k)}(x_i + ∆t f(x_i, a)) + ∆t r(x_i, a) }

    w_i^{(0)} = +∞ for x_i ∈ R^n \ T,   w_i^{(0)} = g(x_i) for x_i ∈ ∂T

CFL condition (not needed but useful)
    ∆t max_{x,a} |f(x, a)| ≤ ∆x
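A 1-D sketch of the fixed-point iteration (illustrative data, not from the slides): f(x, a) = a, A = {−1, 1}, r ≡ 1, g ≡ 0, target T = {0, 1} on Ω = [0, 1]; the exact minimum time is v(x) = min(x, 1 − x):

```python
import numpy as np

M = 101
x = np.linspace(0.0, 1.0, M)
dx = x[1] - x[0]
dt = dx                              # CFL: dt * max|f| <= dx (with equality)
target = (x == 0.0) | (x == 1.0)     # the target T = {0, 1}

BIG = 1e6                            # stands in for +infinity
w = np.where(target, 0.0, BIG)       # w^(0): g = 0 on T, +infinity elsewhere
for _ in range(2 * M):               # iterate the scheme to the fixed point
    # w^(k+1)_i = min_a { w^(k)(x_i + dt f(x_i, a)) + dt r(x_i, a) }, r = 1;
    # the feet x_i + dt*a are evaluated by linear interpolation.
    upd = np.min([np.interp(x + dt * a, x, w) for a in (-1.0, 1.0)],
                 axis=0) + dt
    w = np.where(target, 0.0, upd)   # keep the boundary data on T fixed
# w approximates the minimum time v(x) = min(x, 1 - x).
```

Each sweep propagates the correct values one more cell away from the target, so O(M) iterations suffice here; fast marching or fast sweeping orderings reduce this cost.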
Minimum time problem with target
Numerics for PMP
Shooting method for PMP

Find the solution of S(p_0, τ) = 0, where
    S(p_0, τ) := ( y(τ) − T ,  f(y(τ), α(τ)) · p(τ) + r(y(τ), α(τ)) )
and y(τ), p(τ) and α(τ) are computed solving the ODEs
    α_n = arg max_{a∈A} { f(y_n, a) · p_n + r(y_n, a) }
    y_{n+1} = y_n + ∆t f(y_n, α_n)
    p_{n+1} = p_n + ∆t (−∇_x f(y_n, α_n) · p_n − ∇_x r(y_n, α_n))
with only initial conditions
    y_1 = x and p_1 = p_0.
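A sketch on a toy instance (illustrative assumptions, not from the slides): f(y, a) = a, A = [−1, 1], target T = {0}, x = 1, with the time-optimal payoff written in max form, i.e. r ≡ −1 (a sign convention chosen for this sketch). The costate is constant, so α = sign(p_0) and the shooting map reduces to S(p_0, τ) = (y(τ), α p_0 − 1) with y(τ) = x + α τ; a finite-difference Newton iteration finds the root (p_0, τ) = (−1, 1):

```python
import numpy as np

def S(z, x=1.0):
    """Shooting map: hit the target {0} and annul the Hamiltonian at tau."""
    p0, tau = z
    a = 1.0 if p0 > 0 else -1.0          # arg max_a { a*p0 + r }, r = -1
    y = x + a * tau                      # exact flow of y' = a (p is constant)
    return np.array([y,                  # y(tau) - 0: reach the target {0}
                     a * p0 - 1.0])      # f.p + r = 0 at the final time

# Newton iteration with a finite-difference Jacobian, from a rough guess.
z = np.array([-0.5, 0.5])                # guess for (p0, tau)
for _ in range(20):
    F = S(z)
    Jac = np.column_stack([(S(z + h) - F) / 1e-6 for h in 1e-6 * np.eye(2)])
    z = z - np.linalg.solve(Jac, F)
# Converges to (p0, tau) = (-1, 1): reach 0 from 1 at full speed in time 1.
```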