17.08.2013

# Truncated Gauss-Newton Method with Trust-region: an efficient ... - IFP


Truncated Gauss-Newton Method with Trust-region: an efficient solver for the non-linear tomographic inverse problem

Frédéric Delbos, Jean-Charles Gilbert, Delphine Sinoquet

## Objective

Where is my model solution?

Develop a fast, accurate, robust and automatic solver for the tomographic inverse problem.

## Outline

* Preconditioned Conjugate Gradient Revisited
* Trust-Region method - an alternative to line search
* Singular model analysis

## Terminology

Line search ambiguity:

* The optimization method
* The Gauss-Newton step failure

Abbreviations:

* Gauss-Newton: GN; index k
* Trust-Region: T-R
* Line Search: LS
* Preconditioned Conjugate Gradient: PCG; index i

## Optimization method - GN

* Non-linear cost function $f$
* Linearization of $T$ around $m_k$
* Quadratic model $F_k(\delta m)$ of $f(m_k + \delta m)$

## Optimization method - GN

Simplified expression of the quadratic function $F_k(\delta m)$, built from:

* $f_k$: value of $f$ at $m_k$
* $g_k$: gradient of $f$ at $m_k$
* $H_k$: approximate Hessian of $f$ at $m_k$ (neglecting second-order derivatives of traveltimes)
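For reference, the Gauss-Newton quantities named on this slide can be written out as follows. This is a sketch under assumptions: it takes a plain least-squares misfit between computed traveltimes $T(m)$ and observed traveltimes $T^{obs}$, omits any regularization term, and introduces $J_k$ for the Jacobian of $T$ at $m_k$. The symbols $T^{obs}$ and $J_k$ do not appear in the slides.

```latex
\begin{align*}
f(m) &= \tfrac12 \,\|T(m) - T^{obs}\|^2 && \text{(assumed least-squares cost)} \\
T(m_k + \delta m) &\approx T(m_k) + J_k\,\delta m && \text{(linearization of $T$ around $m_k$)} \\
F_k(\delta m) &= f_k + g_k^T \delta m + \tfrac12\, \delta m^T H_k\, \delta m \\
f_k &= f(m_k), \quad g_k = J_k^T \bigl(T(m_k) - T^{obs}\bigr), \quad H_k = J_k^T J_k
\end{align*}
```

Setting the gradient of $F_k$ to zero gives the Gauss-Newton linear system $H_k\,\delta m = -g_k$.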


## Preconditioning CG - Motivations

* Solve the ill-conditioned linear problem $H_k\,\delta m = -g_k$
* Speed up convergence of the CG algorithm

## Preconditioning CG - Method

Transform the linear system into:

$$P_k^{-1} H_k\,\delta m = -P_k^{-1} g_k$$

with $P_k$ symmetric positive definite and $P_k$ as close as possible to $H_k$.

## Preconditioning CG - Method

Find a matrix $Q_k$ such that $P_k = Q_k Q_k^T$.

Minimize the quadratic function in the new variable $\delta\tilde m$, with

$$\delta\tilde m = Q_k^T\,\delta m, \qquad \tilde g_k = Q_k^{-1} g_k, \qquad \tilde H_k = Q_k^{-1} H_k Q_k^{-T}$$
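A minimal sketch of a preconditioned conjugate gradient loop for the system $H\,\delta m = -g$ may help make the method concrete. It works with the action of $P^{-1}$ directly instead of forming the factor $Q$, which is the usual equivalent formulation. The function names (`pcg`, `apply_P_inv`), the relative residual tolerance and the 2x2 test matrix are illustrative assumptions, not taken from the slides, and the diagonal preconditioner is only a stand-in.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def pcg(H, g, apply_P_inv, tol=1e-10, max_iter=50):
    """Solve H dm = -g by preconditioned CG; returns (dm, iterations used)."""
    dm = [0.0] * len(g)
    r = [-gi for gi in g]            # residual of H dm = -g at dm = 0
    r0_norm = dot(r, r) ** 0.5
    if r0_norm == 0.0:
        return dm, 0                 # gradient already zero, nothing to do
    z = apply_P_inv(r)               # preconditioned residual
    d = list(z)                      # first search direction
    rz = dot(r, z)
    for i in range(1, max_iter + 1):
        Hd = matvec(H, d)
        alpha = rz / dot(d, Hd)      # exact minimizer along d (H s.p.d.)
        dm = [x + alpha * di for x, di in zip(dm, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        if dot(r, r) ** 0.5 <= tol * r0_norm:
            return dm, i
        z = apply_P_inv(r)
        rz_new = dot(r, z)
        d = [zi + (rz_new / rz) * di for zi, di in zip(z, d)]
        rz = rz_new
    return dm, max_iter

# Hypothetical 2x2 example with a diagonal (Jacobi-like) preconditioner.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]
jacobi = lambda r: [ri / H[j][j] for j, ri in enumerate(r)]
dm, n_iter = pcg(H, g, jacobi)
```

Passing `lambda r: list(r)` as `apply_P_inv` gives plain CG, which makes it easy to compare iteration counts with and without preconditioning.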

## Preconditioning CG - Choice of P

Hessian decomposition into velocity and interface blocks, with $H_{v_i,v_i}$ the block of $H$ corresponding to velocity $v_i$ and $H_{z_i,z_i}$ the block of $H$ corresponding to interface $z_i$.

## Preconditioning CG - Choice of P

Hessian decomposition into a sum of block matrices, with $D$ and $L$ defined as:

## Preconditioning CG - Choice of P

$D$ is symmetric positive definite. We use a Cholesky factorization of matrix $D$:

$$D = L_D L_D^T$$

The chosen preconditioners (L. Chauvier et al. 2000):

* Jacobi: $P_k = D$
* Symmetric Gauss-Seidel
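The two preconditioners named here have standard textbook definitions, sketched below. The splitting is an assumption, since the slide's own definitions of $D$ and $L$ did not survive: $D$ is taken to be the block-diagonal part of the Hessian $H$ and $L$ its strictly block-lower-triangular part.

```latex
\begin{align*}
H &= D + L + L^T && \text{(assumed splitting)} \\
P_{\mathrm{Jacobi}} &= D = L_D L_D^T \\
P_{\mathrm{SGS}} &= (D + L)\, D^{-1}\, (D + L)^T = H + L\, D^{-1} L^T
\end{align*}
```

With these definitions, applying $P_{\mathrm{SGS}}^{-1}$ only requires block-triangular solves and solves with $D$, for which the Cholesky factor $L_D$ can be reused.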

## Preconditioning CG - example

* Figure: number of CG iterations over GN iterations
* Figure: relative residuals over CG iterations
* Figure: non-linear cost function over GN iterations

## Preconditioning CG

* Solve ill-conditioned problems more accurately
* Speed up convergence of the CG algorithm

## Outline

* Preconditioned Conjugate Gradient Revisited
** Preconditioners
** CG termination criteria
* Trust-Region method - an alternative to line search
* Singular model analysis

## CG termination criteria - Choice

Objective: minimize the number of CG iterations while obtaining a sufficient reduction of the non-linear cost function.

CG termination criteria:

* Maximum number of CG iterations
* Relative residual criterion
* Final quadratic cost reduction criterion (QCR, experimental)

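## CG termination criteria - QCR

[Equation garbled in extraction. The surviving symbols are the quadratic model values $F_{k,i}$ and $F_{k,i-1}$ at successive CG iterations $i$ of GN iteration $k$.]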
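The three termination criteria can be combined in a single check, sketched below. The exact QCR formula of the slides is not recoverable, so this example substitutes a test of a kind commonly used in truncated-Newton methods: stop when the latest decrease of the quadratic model, scaled by the iteration count, is small relative to the total decrease so far. The function name `cg_stop_reason`, the threshold names `eps` and `eta` and their default values are assumptions.

```python
def cg_stop_reason(i, q_hist, r_norm, r0_norm, max_iter=100, eps=1e-6, eta=0.5):
    """Return the name of the first criterion met at CG iteration i, or None.

    q_hist[j] is the quadratic model decrease F(dm_j) - F(0) after j CG
    iterations, so q_hist[0] == 0 and later values are negative.
    """
    # Relative residual criterion.
    if r_norm <= eps * r0_norm:
        return "relative_residual"
    # Quadratic cost reduction test (assumed form, not the slides' formula).
    if i >= 1 and q_hist[i] < 0.0:
        if i * (q_hist[i] - q_hist[i - 1]) / q_hist[i] <= eta:
            return "qcr"
    # Maximum number of CG iterations.
    if i >= max_iter:
        return "max_iter"
    return None

# Hypothetical model decreases: most of the reduction happens in two iterations.
q_hist = [0.0, -1.0, -1.9, -1.95]
reason_early = cg_stop_reason(1, q_hist, r_norm=0.5, r0_norm=1.0)
reason_late = cg_stop_reason(3, q_hist, r_norm=0.5, r0_norm=1.0)
```

In this example the residual is still large at iteration 3, but the model has almost stopped decreasing, so the reduction test fires before the residual test would.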

## CG termination criteria - example

* Figure: number of PCG iterations for different values of eps in the relative residual criterion
* Figure: non-linear cost for different values of eps in the relative residual criterion


## Outline

* Preconditioned Conjugate Gradient Revisited
* Trust-Region method - an alternative to line search
** Line search
** Trust-region
** Levenberg-Marquardt
* Singular model analysis

## LS method - Theory

Minimize $F_k(\delta m)$:

* Line search subproblem to be solved at each GN iteration with a CG algorithm
* Acceptance of the step: Armijo criterion
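The Armijo acceptance test can be sketched as a backtracking loop: starting from the full step, the step length is reduced until the cost decreases by at least a small fraction of what the gradient predicts. The function name, the constant `c1`, the halving factor and the one-dimensional test function are illustrative assumptions, not values from the slides.

```python
def armijo_backtracking(f, m, dm, g, c1=1e-4, shrink=0.5, max_trials=30):
    """Return (t, m + t*dm) for the first t in 1, shrink, shrink^2, ...
    satisfying f(m + t*dm) <= f(m) + c1 * t * g.dm (Armijo criterion)."""
    f0 = f(m)
    slope = sum(gi * di for gi, di in zip(g, dm))  # negative for a descent direction
    t = 1.0
    for _ in range(max_trials):
        trial = [mi + t * di for mi, di in zip(m, dm)]
        if f(trial) <= f0 + c1 * t * slope:
            return t, trial
        t *= shrink
    raise RuntimeError("no acceptable step length found")

# Hypothetical 1-D example: f(m) = m^4 at m = 1 with an overly long descent step.
f = lambda m: m[0] ** 4
t, m_new = armijo_backtracking(f, m=[1.0], dm=[-4.0], g=[4.0])
```

Here the full step and the half step are both rejected because they do not decrease the cost enough, and the quarter step is accepted.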


## LS method - example

Figure: non-linear cost function over GN iterations for a relative residual criterion at 1E-15

## T-R method - Theory

* Minimize $F_k(\delta m)$ in a region of radius $\Delta_k$ around the current iterate
* Advantage: control of the norm of the perturbation
* Trust-region subproblem to be solved at each GN iteration

## T-R method - Theory

Solve the trust-region subproblem with Steihaug's method:

* Solve the first-order optimality condition with a preconditioned conjugate gradient method ($H$ s.p.d.)
* Ensure that the computed step stays inside the trust region at each CG iteration
** Steihaug's theorem: the $P$-norm of the perturbation increases with CG iterations
** Global convergence: the generalized Cauchy point makes the step at least as good as the steepest descent method
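A minimal sketch of Steihaug's truncated CG is given below. For brevity it assumes no preconditioning ($P = I$), so the trust region is measured in the Euclidean norm instead of the $P$-norm used in the slides. Function names, the tolerance and the 2x2 test data are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def to_boundary(dm, d, radius):
    """Return dm + tau*d with tau >= 0 chosen so that the result has norm radius."""
    a, b, c = dot(d, d), 2.0 * dot(dm, d), dot(dm, dm) - radius ** 2
    tau = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return [x + tau * di for x, di in zip(dm, d)]

def steihaug_cg(H, g, radius, tol=1e-10, max_iter=100):
    """Approximately minimize g.dm + 0.5 dm.H.dm subject to ||dm|| <= radius."""
    dm = [0.0] * len(g)
    r = [-gi for gi in g]
    rr = dot(r, r)
    if rr == 0.0:
        return dm, "converged"
    r0_norm = math.sqrt(rr)
    d = list(r)                       # first direction: steepest descent
    for _ in range(max_iter):
        Hd = matvec(H, d)
        curvature = dot(d, Hd)
        if curvature <= 0.0:          # singular or indefinite direction
            return to_boundary(dm, d, radius), "negative_curvature"
        alpha = rr / curvature
        trial = [x + alpha * di for x, di in zip(dm, d)]
        if math.sqrt(dot(trial, trial)) >= radius:
            return to_boundary(dm, d, radius), "boundary"
        dm = trial
        r = [ri - alpha * hi for ri, hi in zip(r, Hd)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * r0_norm:
            return dm, "converged"
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return dm, "max_iter"

# Hypothetical 2x2 example: a large radius gives the full Gauss-Newton step,
# a small radius stops on the boundary along the first (steepest descent) direction.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]
inner_step, inner_status = steihaug_cg(H, g, radius=10.0)
edge_step, edge_status = steihaug_cg(H, g, radius=0.1)
edge_norm = math.sqrt(dot(edge_step, edge_step))
```

Because the first CG direction is the steepest descent direction, a step truncated at the first iteration is the Cauchy step, which is the link to the global convergence argument above.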

## T-R method - Theory

Concordance ratio $\rho_k$ between the quadratic model and the non-linear cost function:

$$\rho_k = \frac{f(m_k) - f(m_k + \delta m_k)}{F_k(0) - F_k(\delta m_k)}$$

* Numerator: actual reduction (non-linear cost)
* Denominator: predicted reduction (quadratic cost)

Acceptance of the step according to the value of $\rho_k$:

* If $\rho_k$ is negative or close to 0, the step is rejected: decrease $\Delta_k$ and restart the step
* If $\rho_k$ is close to 1, the step is accepted: $m_{k+1} = m_k + \delta m_k$
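The acceptance rule based on the ratio of actual to predicted reduction can be sketched as a small update function. The thresholds (`eta_reject`, `eta_good`) and the shrink and grow factors are common illustrative choices, not values from the slides, and the function name is hypothetical.

```python
def trust_region_update(f_old, f_new, predicted_reduction, radius,
                        eta_reject=0.01, eta_good=0.75, shrink=0.5, grow=2.0):
    """Return (rho, accepted, new_radius) for one trust-region iteration.

    rho = (actual reduction of the non-linear cost)
          / (reduction predicted by the quadratic model).
    """
    if predicted_reduction <= 0.0:
        raise ValueError("the quadratic model must predict a decrease")
    rho = (f_old - f_new) / predicted_reduction
    if rho < eta_reject:
        # Model and cost disagree: reject the step and shrink the region.
        return rho, False, shrink * radius
    if rho > eta_good:
        # Very good agreement: accept the step and allow a larger region.
        return rho, True, grow * radius
    # Acceptable agreement: accept the step and keep the radius.
    return rho, True, radius

good = trust_region_update(f_old=10.0, f_new=9.0, predicted_reduction=1.0, radius=2.0)
bad = trust_region_update(f_old=10.0, f_new=10.5, predicted_reduction=1.0, radius=2.0)
```

Automating the radius in this way is what removes the hand tuning that a fixed damping parameter would require.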


## T-R method - example

Figure: comparison of T-R and LS on the non-linear cost function over GN iterations

## Levenberg-Marquardt - Theory

The Levenberg-Marquardt step is defined by the solution of the following subproblem:

$$(H_k + \lambda^C I)\,\delta m = -g_k$$

This is equivalent to solving the trust-region minimization subproblem (Dennis & Schnabel 1983):

$$\min_{\delta m \in \mathbb{R}^n} F_k(\delta m) \quad \text{subject to} \quad \|\delta m\| \le \Delta^C$$

with $\Delta^C = \|(H_k + \lambda^C I)^{-1} g_k\|$; $\lambda^C$ is interpreted as a Lagrange multiplier.

TR allows an elegant and efficient choice of $\lambda^C$.
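The link between the damping parameter and the trust-region radius can be illustrated numerically: each damping value $\lambda$ yields a step of a definite length, and that length shrinks as $\lambda$ grows. The sketch below solves $(H + \lambda I)\,\delta m = -g$ in closed form for a 2x2 system. The helper `lm_step_2x2` and the test data are hypothetical.

```python
import math

def lm_step_2x2(H, g, lam):
    """Solve (H + lam*I) dm = -g for a 2x2 matrix H using Cramer's rule."""
    a, b = H[0][0] + lam, H[0][1]
    c, d = H[1][0], H[1][1] + lam
    det = a * d - b * c
    return [(-d * g[0] + b * g[1]) / det, (c * g[0] - a * g[1]) / det]

H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]

# Step length obtained for each damping value: the implied trust-region radius.
radii = []
for lam in (0.0, 1.0, 10.0):
    dm = lm_step_2x2(H, g, lam)
    radii.append(math.hypot(dm[0], dm[1]))
```

Reading the relation the other way round, choosing a radius fixes the damping, which is why a trust-region strategy removes the need to tune $\lambda$ by hand.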


## Singular model analysis - discussion

$H$ may be only positive semi-definite.

The reasons:

* small regularization weights
* ill-posed problem: lack of a priori information

The main consequences:

* slow or even no convergence of the GN method
* bad preconditioner: $P$ is also positive semi-definite
* explosion of the final PCG perturbation in $L_2$ norm
* bad direction of the PCG perturbation: the inner product $\langle g, \delta m \rangle$ is close to 0

## Singular model analysis - detection

Detection criteria of singular models:

* $L_2$ norm of the PCG perturbation, $\|\delta m_k\|_2$
* PCG perturbation angle between $g$ and $\delta m$
* Preconditioned norm of the generalized Cauchy point, $\|P^{1/2} S^C\|_2$, with

$$S^C = -\alpha^C P^{-1} g \quad \text{and} \quad \alpha^C = \frac{g^T P^{-1} g}{g^T P^{-1} H P^{-1} g}$$

* L-curve to find a good regularization weight (S. Gomez, 2001): plot the parametrized curve [formula garbled in extraction; the surviving terms are the solution $m^*$ and a quadratic form in the matrix $R$]
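The Cauchy-point indicator can be computed with a few vector operations. In the sketch below the step length along $-P^{-1}g$ is the exact minimizer of the quadratic model in that direction, and the norm is the $P$-norm $\sqrt{S^T P S}$. A diagonal preconditioner is assumed to keep the example short, the angle is measured against $-g$ by convention of this sketch, and all function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def cauchy_point_indicators(H, g, p_diag):
    """Return (alpha_c, s_c, p_norm) for a diagonal preconditioner p_diag."""
    Pinv_g = [gi / pi for gi, pi in zip(g, p_diag)]
    # Exact minimizer of the quadratic model along -P^{-1} g.
    alpha_c = dot(g, Pinv_g) / dot(Pinv_g, matvec(H, Pinv_g))
    s_c = [-alpha_c * v for v in Pinv_g]
    # P-norm of the Cauchy step: ||P^{1/2} s_c||_2 = sqrt(s_c^T P s_c).
    p_norm = math.sqrt(sum(pi * si * si for pi, si in zip(p_diag, s_c)))
    return alpha_c, s_c, p_norm

def angle_to_descent_deg(g, dm):
    """Angle in degrees between -g and dm; values near 90 flag a bad direction."""
    cos_t = -dot(g, dm) / (math.sqrt(dot(g, g)) * math.sqrt(dot(dm, dm)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Hypothetical 2x2 example with the diagonal of H as preconditioner.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]
alpha_c, s_c, p_norm = cauchy_point_indicators(H, g, p_diag=[4.0, 3.0])
angle = angle_to_descent_deg(g, s_c)
```

A very large `p_norm` or an angle close to 90 degrees would both point to a nearly singular model in the sense discussed above.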

## L-curve - example

## Singular model analysis - remedy

* Solve a new problem with more a priori information: constraints, stronger regularization, and/or a new parameterization
* Find a low-frequency solution (largest eigenvalues) of the singular linear problem

## Singular model analysis - low frequency

* Choice of the Trust-Region method: adds a priori information through a constraint on the perturbation norm
* Work on PCG regularization properties: the best determined components are computed in the first CG iterations
** Choosing weaker CG preconditioners (to tune the CG convergence speed)
*** Modified Cholesky factorization: $\tilde P = P + E = L L^T$
*** Convex combination: $\tilde P = (1 - \mu) I + \mu P$, with $\mu \in [0, 1]$
*** No preconditioning: $\tilde P = I$
** Choosing stronger CG stopping criteria (stop earlier in the CG loop)
*** Criterion on the final reduction of the quadratic cost
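The convex-combination idea can be sketched in a few lines: blending a preconditioner $P$ with the identity gives a family that runs from no preconditioning to the full preconditioner. The weight name `mu` and the function name are hypothetical, as the slide's own symbol did not survive extraction.

```python
def convex_combination_preconditioner(P, mu):
    """Return (1 - mu) * I + mu * P for a square matrix P and 0 <= mu <= 1."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    n = len(P)
    return [[mu * P[i][j] + ((1.0 - mu) if i == j else 0.0) for j in range(n)]
            for i in range(n)]

P = [[4.0, 1.0], [1.0, 3.0]]
no_precond = convex_combination_preconditioner(P, 0.0)    # identity
half_precond = convex_combination_preconditioner(P, 0.5)  # weaker than P
full_precond = convex_combination_preconditioner(P, 1.0)  # P itself
```

When $P$ is only positive semi-definite, any `mu` strictly below 1 makes the blend positive definite, since the identity part adds $1 - \mu$ to every eigenvalue.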

## Singular model analysis - Weaker preconditioner

Figure: non-linear cost function, comparison of inversions with the new convex-combination preconditioner and without preconditioning


## Singular model analysis - QCR

* Figure: number of CG iterations, comparison between the final quadratic cost reduction criterion and the relative residual criterion
* Figure: non-linear cost function, comparison between the final quadratic cost reduction criterion and the relative residual criterion

## Conclusions

Trust-region method is:

* More robust (solves singular models)
* Automatic (choice of trust-region parameters)

PCG method:

* Relevant PCG termination criterion
* More flexibility in the choice of preconditioner
* Detects and solves singular model problems

## Perspectives

* Finalization of the work on unconstrained problems for singular models
* Optimization with linear constraints
** Augmented Lagrangian: integration of the new solver
** Feasibility study of an interior point method (comparison with A.L.)