Uncertainty Quantification for Large Scale Inverse ... - MUCM

mucm.ac.uk

Uncertainty Quantification for Large Scale Inverse ... - MUCM

Uncertainty Quantification for Large Scale Inverse

Problems, with Applications to Seismic Inverse Problems

Omar Ghattas

joint work with:

Tan Bui-Thanh, Carsten Burstedde, James Martin

Georg Stadler, Lucas Wilcox

Institute for Computational Engineering & Sciences

Departments of Geological Sciences and Mechanical Engineering

The University of Texas at Austin

Uncertainty in Computer Models 2012

Sheffield, UK, 3 July 2012

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 1 / 63


Outline

1 Connection between Bayesian and deterministic inverse problems

2 Langevin methods and stochastic Newton

3 Low rank Hessian approximation and scalability

4 Specification and manipulation of prior

5 Forward elastic-acoustic wave propagation solver

6 Example: Full waveform global seismic (linearized) inversion

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 2 / 63


Connection between Bayesian and deterministic

formulations of inverse problem

Given parameters x ∈ R n , observations y obs ∈ R m , and the

parameter-to-observable map f(x) ∈ R m a Gaussian additive noise model states:

y obs = f(x) + ε, ε ∼ N(0, Γnoise)

If the prior is taken as Gaussian with mean x pr and covariance Γ pr, then the

posterior can be written


πpost(x) ∝ exp − 1

2 f(x) − yobs 2

Γ −1

noise

Note that maximum a posteriori point (MAP) is given by

x MAP

def

= arg max

x

= arg min

x

π post(x)

− 1

2 x − xpr 2

Γ −1


pr

1

2 f(x) − yobs 2

Γ −1 +

noise

1

2 x − xpr 2

Γ −1

pr

This is an (appropriately weighted) deterministic inverse problem.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 3 / 63


Linear inverse problem

Suppose further the parameter-to-observable map is linear, i.e.

Then the posterior can be written as

y = F x


πpost(x) ∝ exp − 1

2 F x − yobs 2

Γ −1

The posterior is then Gaussian with

noise

x ∼ N(xMAP, Γpost)

− 1

2 x − xpr 2

Γ −1


pr

The covariance is the inverse Hessian of the negative log posterior:

Γ −1

post = F T Γ −1

noiseF + Γ −1

pr

= ∇ 2 xx(− log πpost)

def

= H

i.e., the covariance is given by the inverse Hessian of the regularized misfit

function that deterministic methods seek to minimize

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 4 / 63


A Newton method for finding the MAP point

x MAP = arg min

x

Newton step: Hkp k = −g k

1

2 f(x) − yobs 2

Γ −1 +

noise

1

2 x − xpr 2

Γ−1 pr

xk+1 = xk + αkp k

Solve inexactly (Eisenstat-Walker) by CG; globalize by Armijo line search

where:

and:

Hk := F T (xk)Γ −1

HiΓ

i

−1

noise(xk)(fi(xk) − (yobs)i) + Γ −1

pr

gk := F T (xk)Γ −1

noise (f(xk) − yobs) + Γ −1

pr (xk − xpr) noiseF (xk) +

F (x) := ∇xf(x) (Jacobian of parameter-to-observable map)

Hi(x) := ∇ 2 xxfi(x) (Hessian of parameter-to-ith observable map)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 5 / 63


Outline

1 Connection between Bayesian and deterministic inverse problems

2 Langevin methods and stochastic Newton

3 Low rank Hessian approximation and scalability

4 Specification and manipulation of prior

5 Forward elastic-acoustic wave propagation solver

6 Example: Full waveform global seismic (linearized) inversion

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 6 / 63


Langevin MCMC (Grenander & Miller, 1994)

Given the target density π(x), the associated Langevin SDE is given by:

dXt = −A∇x(− log π)dt + √ 2A 1/2 dW t

Discretize with a timestep ∆t to derive Langevin proposal:

Notes:

x prop

k+1 = xk − A∇x(− log π)∆t + √ 2∆tA 1/2 N(0, I)

Preconditioner A must be symmetric positive definite

Process is ergodic (convergence of time averages)

W t is i.i.d. vector of standard Brownian motions

W t has independent increments given by

W (t+∆t) − W t ∼ N 0, ∆t I

See work by A. Stuart, Y. Efendiev, ...

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 7 / 63


Stochastic Newton’s method

Langevin MCMC proposal given by:

x prop

k+1 = xk − A∇x(− log π)∆t + √ 2∆tA 1/2 N(0, I)

Take A to be the inverse of the (local) Hessian and set ∆t = 1:

A ≡ H(x) −1 = ∇ 2 xx(− log π(x)) −1

= F T Γ −1

noiseF + Γ −1

pr

−1 (e.g. for Gaussian noise and prior)

Then we have the stochastic equivalent of Newton’s method:

x prop

k+1 = xk − H −1 ∇x(− log π) + N(0, H −1 )

Details in: J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, Uncertainty quantification in

inverse problems with stochastic Newton MCMC, SIAM Journal on Scientific Computing,

34(3):A1460-A1487, 2012.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 8 / 63


Stochastic Newton: Optimal sampling of Gaussians

When the target density π(x) is Gaussian, N(µ, Γ):

Apply Stochastic Newton:

− log π(x) ≡ 1

2 x − µ Γ −1

x prop

k+1 = xk − H −1 ∇x(− log π) + H −1/2 N(0, I)

= xk − ΓΓ −1 (xk − µ) + Γ 1/2 N(0, I)

= µ + Γ 1/2 N(0, I)

= N(µ, Γ)

Samples xk act like independent draws from the true pdf

Related work: Geweke & Tanizaki (1999), Qi & Minka (2002), Girolami &

Calderhead (2011)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 9 / 63


Rosenbrock illustration: Gaussian random walk

y

1

0.5

0

−0.5

−0.5 0 0.5 1

x

x prop

k+1 = xk + N(0, I)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 10 / 63


Rosenbrock illustration: Unpreconditioned Langevin

y

1

0.5

0

−0.5

−0.5 0 0.5 1

x

x prop

k+1 = xk − ∆t∇x(− log π) + √ 2∆t N(0, I)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 11 / 63


Rosenbrock illustration: Hessian-based Gaussian proposal

y

1

0.5

0

−0.5

−0.5 0 0.5 1

x

x prop

k+1 = xk + N(0, H −1 )

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 12 / 63


Rosenbrock illustration: Stochastic Newton

y

1

0.5

0

−0.5

−0.5 0 0.5 1

x

x prop

k+1 = xk − H −1 ∇x(− log π) + N(0, H −1 )

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 13 / 63


Inversion for 1D flat-layered earth model

Forward problem: Given elastic modulus a(x), solve 1D wave equation for

Ricker wavelet source g(t) to obtain observed waveform y(0, t) at top

surface:

ρ ∂2 y

∂t 2 − ∂

∂x

∂y

a(x) ∂x = δ(x − 0)g(t)

a ∂y



∂x = 0

x=0



∂y

ρa ∂t = −a

x=1 ∂y



∂x

x=1

y| t=0 = 0

˙y| t=0 = 0

Inverse problem: Given observed noisy waveform and layered

parametrization of elastic medium modulus a(x) with associated prior,

recover layer moduli and associated uncertainty

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 14 / 63


Sample noisy observations

Surface Displacement

3

2

1

0

−1

−2

−4 True Observation

x 10

10 20 30 40 50 60 70 80 90 100 110 120

Sample time

Surface Displacement

3

2

1

0

−1

−2

x 10 −4 Noisy Observation (y obs)

10 20 30 40 50 60 70 80 90 100 110 120

Sample time

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 15 / 63


Inversion for 1D flat-layered earth model: 65-layer prior

density plot of marginal pdfs of prior of elastic moduli of 65 layers

blue curve is “truth” modulus used to synthesize observations

other colors are draws from prior

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 16 / 63


Inversion for 1D flat-layered earth model: 65-layer posterior

density plot of marginal pdfs of posterior of elastic moduli of 65 layers

blue curve is “truth” modulus used to synthesize observations

other colors are draws from posterior

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 17 / 63


Convergence comparison for 65-parameter problem

Multivariate potential scale reduction factor (MPSRF) convergence

statistic for 65-parameter problem

unpreconditioned Langevin vs. stochastic Newton vs. Adaptive Metropolis

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 18 / 63


MPSRF convergence statistic for 1025-parameter problem

MPSRF

60

50

40

30

20

10

Comparison of 65D and 1025D MPSRF

1025D MPSRF Values

65D MPSRF values

0

0 20 40 60 80 100 120 140

Number of Samples

MPSRF statistic for 1025-parameter problem compared with 65-parameter

(1025-parameter results based on fast low-rank implementation)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 19 / 63


Outline

1 Connection between Bayesian and deterministic inverse problems

2 Langevin methods and stochastic Newton

3 Low rank Hessian approximation and scalability

4 Specification and manipulation of prior

5 Forward elastic-acoustic wave propagation solver

6 Example: Full waveform global seismic (linearized) inversion

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 20 / 63


Large-scale local covariance estimates

Covariance estimate for (local) Gaussian approximation given by inverse of

the Hessian. Key idea: never form H; instead:

recognize that H is sum of data misfit term, which is often

equivalent to a compact operator, and (the inverse of) a prior, which

is often equivalent to a differential operator:

H = F T Γ −1

noiseF + Γ −1

pr

use randomized SVD to construct low rank approximation of data

misfit operator; often requires constant number of forward/adjoint

solves, independent of problem size

combine with Sherman-Morrison-Woodbury to invert/factor

Details in: H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and

O. Ghattas, Fast algorithms for Bayesian uncertainty quantification in large-scale linear inverse

problems based on low-rank partial Hessian approximations, SIAM Journal on Scientific

Computing, 33(1):407–432, 2011.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 21 / 63


Low rank approximation of data misfit Hessian

Γpost = H −1


= F T Γ −1

noise

= Γ 1/2

pr

≈ Γ 1/2

pr

= Γ 1/2

pr

−1 F + Γ−1 pr


Γ 1/2

pr F T Γ −1

noiseF Γ1/2 pr + I


V rΛrV T −1 r + I Γ 1/2

pr


I − V rDrV T

n

r + O

i=r+1

−1

Γ 1/2

pr

λi

λi + 1


Γ 1/2

pr

where V r, Λr are truncated eigenvector/values of prior-preconditioned data misfit

Hessian, and Dr = diag(λi/(λi + 1))

Other low rank ideas: Liberty, et al. (2007); Biros & Chaillat (2011); Demanet, et

al. (2011); Halko, Martinsson, Tropp (2011); ...

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 22 / 63


Low-rank-based posterior covariance

Posterior covariance is given by prior covariance less information gained

from data:

Γpost ≈ Γpr − Γ 1/2

pr V rDrV T r Γ 1/2

pr

The data term is the only place where the forward PDE is involved

λi( ˜Hmisfit)

100

80

60

40

20

0

0 500 1000 1500 2000 2500 3000 3500

λi(Γ approx

post )

1.4

1.2

1

0.8

0.6

0.4

0.2

0

0 500 1000 1500 2000 2500 3000 3500

λi(Γ 1/2

pr F T Γ −1

noiseF Γ1/2 pr ) λi(Γpost)

Spectra of prior-preconditioned data misfit spectrum and the posterior covariance

approximation for a large-scale inverse transport problem with n=275,000

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 23 / 63


Computations required for stochastic Newton

Never need to form dense Hessian:

H = Γ −1/2

T

pr V rΛrV r + I Γ −1/2

pr

H −1 g = Γ 1/2

−1 T

pr V r (Λr + Ir) − Ir V r + I Γ 1/2

pr g (Newton step)

H −1/2 x = Γ 1/2

−1/2 T

pr V r (Λr + Ir) − Ir V r + I x (drawing a sample)

det(H 1/2 ) = (det Γpr) −1/2

r

(λi + 1) 1/2

(accept/reject criterion of M-H)

i=1

Complexity of these operations is scalable (i.e. requires a number of forward PDE solves

that is independent of the parameter dimension) when:

prior-preconditioned data misfit Hessian is compact with mesh-independent

dominant spectrum (ill-posed inverse problem)

dominant spectrum is captured in a number of matvecs that is a constant multiple

of number of dominant eigenvalues (e.g., using Lanczos or randomized SVD)

Hessian-vector products carried out matrix-free using adjoint methods

We also need scalable forward (PDE) solver, inner products (V T r z), daxpys (V rz), and

fast elliptic solvers (Γ 1/2

pr z, where Γ 1/2

pr is the inverse of a Laplacian-like operator).

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 24 / 63


Analytical example of Hessian spectrum

1D advection-diffusion, inversion for initial condition with final time observations

min

u0

L

0

(u(T ) − u obs ) 2 dx + β

2

L

u

0

2 0 dx

where: ut − kuxx + vux = 0 in (0, L) × (0, T )

kux(0, t) = kux(L, t) for t ∈ (0, T )

u(0, t) = u(L, t) for t ∈ (0, T )

u = u0 in (0, L) × {t = 0}

Hessian: jth eigenfunction: e 2πijx/L , jth eigenvalue: e −8j2 π 2 kT/L 2

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 25 / 63


Compactness of the Hessian for inverse medium scattering

Theorem

Let (1 − n) ∈ C m,α

0 , where n is the refractive index, m ∈ N ∪ {0} , α ∈ (0, 1).

The Hessian is a compact operator everywhere.

Time harmonic (Helmholtz equation), noise-free

The proof uses Newton potential theory, Riesz-Fredholm theory and

compact embeddings in Hölder spaces.

The theorem holds for both continuous and pointwise observations.

The decay rate of the eigenvalues of the Hessian as a function of the

smoothness of the refractive index n are also obtained.

If the medium refractive index is analytic, the decay is exponential

Details in: T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering

problems. Part I: Inverse shape scattering of acoustic waves. Part II: Inverse medium scattering

of acoustic waves. Inverse Problems, 28(5):055001-055002, 2012.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 26 / 63


An analytical example for inverse shape scattering

Exponential decay of eigenvalues of Gauss-Newton Hessian

λ 1,m

10 0

10 −5

10 −10

ω b = 2

ω b = 10

ω b = 1000

ω b = 5000

asymptotic ω b = 5000

10

0 5 10 15 20 25

−15

m

circular scatterer with radius a, observations on circle with radius b, incident

plane wave with frequency ω, eigenvalue index is m, ωa = 1

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 27 / 63


Outline

1 Connection between Bayesian and deterministic inverse problems

2 Langevin methods and stochastic Newton

3 Low rank Hessian approximation and scalability

4 Specification and manipulation of prior

5 Forward elastic-acoustic wave propagation solver

6 Example: Full waveform global seismic (linearized) inversion

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 28 / 63


Definition of the prior covariance as the inverse of a

differential operator

Gaussian random field prior, µ0 := N(m0, C0) on L 2 (Ω)

when discretized in high dimensions, construction of a (dense) explicit

covariance matrix between pairs of points in space is prohibitive

employ infinite dimensional framework of Andrew Stuart (Acta Numerica,

2010): covariance operator C0 taken as inverse of an elliptic differential

operator A 2 that is sufficiently smoothing (order >3 in R 3 ) ⇒ the resulting

inverse problem is well posed

we choose C0 = (A 2 ) −1 , where A is the generalized anisotropic Poisson

operator, A(m) := −∇ · (Θ∇m) + αm, with anisotropy tensor Θ and

scalar parameter α, both of which are heterogeneous (to reflect spatiallyand

directionally-varying correlation lengths)

to sample proposal density and compute accept/reject Metropolis-hastings

criterion, we need to form products of discretized square-root covariance

matrix (C0) 1

2 with a vector

squared form of prior covariance operator is convenient, since this amounts

to solution of solving elliptic PDE with operator A ⇒ need fast elliptic solver

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 29 / 63


A hybrid geometric-algebraic multigrid fast elliptic solver

based on forest-of-octrees adaptivity

AMG: ideal for unstructured meshes,

but difficulty scaling to extreme core

counts

GMG: demonstrated good scaling to

O(10 5 ) cores, but challenges for

unstructured meshes

hybrid AMG-GMG:

hexahedral coarse mesh to

resolve geometry

GMG using forest of octrees

adaptivity to define finer meshes

and prolongation & restriction

operators

AMG coarse mesh solver

time(sec)→

50

40

30

20

10

10 3

10 4

cores→

AMG strong

GMG strong

10 5

Figure : strong scaling on Jaguar

XK6 for variable-coefficient Poisson

on adapted spherical mesh

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 30 / 63


Weak scalability of hybrid geometric-algebraic multigrid

based on forest-of-octrees adaptivity

64 512 4096 32768 262144

Setup 2.97 2.64 3.1 3.76 8.6

Smoother 289.7 301.5 336.3 391.3 409.1

Transfer 7.45 8.47 11.5 11.35 15.88

Coarse Setup 1.85 2.13 0.82 1.27 1.63

Coarse Solve 24.3 30.8 18.47 30.1 26.01

Total Time 326.3 345.5 370.2 437.8 461.2

weak scaling of Poisson solve with 400K elements per core

Antarctica mesh with 45K octrees forest (base mesh)

4 pre- and post-smoothing steps

ML AMG solver (from Trilinos) used as coarse grid solver

71% parallel efficiency for 4000× increase in problem size/core count

Details in: H. Sundar, G. Stadler, O. Ghattas and G. Biros, Parallel Geometric Multigrid

Methods on Unstructured Forests of Octrees, SC12, submitted.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 31 / 63


Examples of meshes for hybrid GMG-AMG

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 32 / 63


Outline

1 Connection between Bayesian and deterministic inverse problems

2 Langevin methods and stochastic Newton

3 Low rank Hessian approximation and scalability

4 Specification and manipulation of prior

5 Forward elastic-acoustic wave propagation solver

6 Example: Full waveform global seismic (linearized) inversion

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 33 / 63


The seismic inverse problem in global seismology

Joint work with Steve Grand (UT-Austin) and Don Helmberger & Mike Gurnis (Caltech)

50˚

40˚

30˚

−120˚ −110˚ −100˚ −90˚

48˚

47˚

46˚

45˚

44˚

43˚

42˚

41˚

40˚

39˚

30 sec

Black Hills

38˚

-108˚ -107˚ -106˚ -105˚ -104˚ -103˚ -102˚ -101˚ -100˚ -99˚ -98˚ -97˚

Figure : Left. USArray network of 400 broadband seismic stations with ∼70 km spacing

over 1000 km aperture. Past/present stations in green, future stations in blue. Right. S

waves from deep South American earthquake plotted on top of map of arrival time.

Early arrivals in blue, late arrivals in red. (Courtesy D. Helmberger, Caltech)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 34 / 63


Elastic–acoustic wave equation

Governing equations in velocity-strain form

∂E

∂t

ρ ∂v

∂t

E — strain tensor

S — stress tensor

ρ — mass density

1 T

= ∇v + ∇v

2


in B

= ∇ · (λ tr(E)I + 2µE) + ρf in B

Sn = t bc (t) on ∂B

v = v0(x) at t = 0

E = E0(x) at t = 0

v — displacement velocity

f — body force per unit mass

λ and µ — Lamé parameters

I — identity tensor

t bc — traction bc

v0, E0 — initial conditions

t — time

x — point in the body

B — solution body

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 35 / 63


Forward elastic–acoustic wave equation solver

A discontinuous Galerkin method with the following features:

nonconforming hexahedral elements with Kopriva’s mortar approach for

hyperbolic equations

theory in place for general hp-non-conforming dG

implementation for h-non-conforming dG with 2:1 balance

tensor product Lagrange basis on Legendre-Gauss-Lobatto (LGL) nodes

LGL quadrature (gives diagonal mass matrix)

time integration by classical 4-stage/4-order Runge-Kutta

integrated parallel mesh generation/adaptivity

parallel scalability on petascale supercomputers

L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin

method for wave propagation through coupled elastic-acoustic media, Journal of Computational

Physics, 229(24):9373–9396, 2010.

T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral

element method for wave propagations, SIAM Journal on Numerical Analysis, 2012, to appear.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 36 / 63


Jaguar supercomputer at Oak Ridge National Laboratory

Cray XT-5, 224,256 cores (AMD Istanbul 6 cores/proc, 2 proc/node), 2.3 petaflops

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 37 / 63


Global seismic wave propagation

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 38 / 63


Global seismic wave propagation

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 39 / 63


Global seismic wave propagation

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 40 / 63


Global seismic wave propagation

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 41 / 63


Point source approximation of M9 Tohoku earthquake

Animation by Greg Abram, TACC

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 42 / 63


Extreme granularity limits for strong scaling of forward DG

wave propagation solver on (upgraded) Jaguar

#cores cpu per step (ms) elem/core efficiency (%)

256 1630.80 4712 100.0

512 832.46 2356 98.0

1024 411.54 1178 99.1

8192 61.69 148 82.6

65536 11.79 19 54.0

131072 7.09 10 44.9

262144 4.07 5 39.2

table shows wall clock time per time step in ms, elements per core,

and parallel efficiency for 3 orders of magnitude increase in core count

just 1.21 million 3rd order DG elements (694 million unknowns)

parallel efficiency remains at 39% with just 4 or 5 elements/core

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 43 / 63


Adaptive mesh refinement for wave and medium fields

AMR in some cases useful for dynamic tracking of wavefronts

AMR essential for adapting mesh to wavespeed, both statically (for forward

problem) and dynamically (as medium evolves in inverse problem)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 44 / 63


p4est: Parallel forest-of-octrees AMR library

p0 p1

p1 p2

Details in: C. Burstedde, L.C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel

adaptive mesh refinement on forests of octrees, SIAM Journal on Scientific Computing,

33(3):1103–1133, 2011.

Open-source release, interface w/deal.II: W. Bangerth, C. Burstedde, T. Heister, and

M. Kronbichler, Algorithms and data structures for massively parallel generic adaptive finite

element codes, ACM Transactions on Mathematical Software, 30, 2011.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 45 / 63

y0

x0

y1

x1


Weak scalability of p4est-only operations on full Jaguar

Excellent scalability of pure AMR operations over 18,360X range of core count

Percentage of runtime

100

90

80

70

60

50

40

30

20

10

0

Partition Balance Ghost Nodes

12 60 432 3444 27540 220320

Number of CPU cores

Seconds per (million elements / core)

10

8

6

4

2

0

Balance Nodes

12 60 432 3444 27540 220320

Number of CPU cores

Left: Runtime dominated by Balance and Nodes while Partition and Ghost take less than

10% (New and Refine are negligible and not shown).

Right: Weak scaling for 2.3 million elements/core; ideal scaling would result in bars of constant

height. Largest mesh created contains over 513 billion elements and is balanced in 21 s.

Details in: C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton,

L.C. Wilcox, Extreme-Scale AMR, Proceedings of ACM/IEEE SC10 (Gordon Bell Prize Finalist)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 46 / 63


Outline

1 Connection between Bayesian and deterministic inverse problems

2 Langevin methods and stochastic Newton

3 Low rank Hessian approximation and scalability

4 Specification and manipulation of prior

5 Forward elastic-acoustic wave propagation solver

6 Example: Full waveform global seismic (linearized) inversion

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 47 / 63


Gradient and Hessian for full waveform seismic inversion

Would like to compute gradients and Hessian actions w.r.t. C of

J(C) := 1

T

B(x)|v(C) − v

2

obs | 2 dx dt + R(C)

0


where the dependence of v on C is given by solving the forward wave

propagation equations:

−C ∂E

∂t

ρ ∂v

∂t

− ∇ · (CE) = f in Ω × (0, T )

+ 1

2 C(∇v + ∇vT ) = 0 in Ω × (0, T )

ρv = CE = 0 in Ω × {t = 0}

CEn = 0 on Γ × (0, T )

v, E are velocity and strain state variables

C is the fourth rank tensor of uncertain constitutive parameters

ρ and f are known density and seismic source

v obs are observations at receivers, B(x) is an observation operator

R is the prior term

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 48 / 63


The gradient computation

Gradient expression (for general tensor C) given by

T

1

G(C) :=

2 (∇w + ∇wT

) ⊗ E dt + R ′ (C)

0

where v, E satisfy the forward wave propagation equations

−C ∂E

∂t

ρ ∂v

− ∇ · (CE) = f

∂t

in Ω × (0, T )

1

+

2 C(∇v + ∇vT ) = 0 in Ω × (0, T )

ρv = CE = 0 in Ω × {t = 0}

CEn = 0 on Γ × (0, T )

w, D (adjoint velocity, strain) satisfy the adjoint wave propagation equations

C ∂D

∂t

−ρ ∂w

∂t − ∇ · (CD) = −B(v − vobs ) in Ω × (0, T )

+ 1

2 C(∇w + ∇wT ) = 0 in Ω × (0, T )

ρw = CD = 0 in Ω × {t = T }

CDn = 0 on Γ × (0, T )

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 49 / 63


Computation of action of Hessian in given direction

Action of the Hessian operator in direction ˜ C at a point C given by

T

1

H(C)˜C :=

2 (∇ ˜w + ∇ ˜wT ) ⊗ E + 1

2 (∇w + ∇wT ) ⊗ ˜

E dt + R ′′ (C)˜C

0

where ˜v, ˜ E satisfy the incremental forward wave propagation equations

ρ ∂˜v

∂t − ∇ · (C ˜ E) = ∇ · (˜CE) in Ω × (0, T )

−C ∂ ˜ E

∂t

+ 1

2 C(∇˜v + ∇˜vT ) = 0 in Ω × (0, T )

ρ˜v = C ˜ E = 0 in Ω × {t = 0}

C ˜ En = − ˜ CEn on Γ × (0, T )

and ˜w, ˜ D satisfy the incremental adjoint wave propagation equations

∂ ˜w

−ρ

∂t − ∇ · (C ˜ D) = ∇ · (˜CD) − B˜v in Ω × (0, T )

C ∂ ˜ D

∂t

+ 1

2 C(∇ ˜w + ∇ ˜wT ) = 0 in Ω × (0, T )

ρ ˜w = C ˜ D = 0 in Ω × {t = T }

C ˜ Dn = − ˜ CDn on Γ × (0, T )

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 50 / 63


Application to synthetic global seismic inversion

solve for anomaly from radially-varying PREM model (left)

data: from laterally-varying S20RTS model (right)

Details in: T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, L.C. Wilcox,

Extreme-scale UQ for Bayesian inverse problems governed by PDEs, SC12, submitted.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 51 / 63


The prior

±2σ fields prior samples

ground truth

Prior is defined by square of generalized anisotropic Poisson operator

A := −α∇ · Θ∇ + α, with

Θ = β I3 − θ(r)rr T ⎧

⎨1

− θ

with θ(r) := r


2

2

2r − r

if r = 0

0 if r = 0,

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 52 / 63


Synthetic example: Many receivers

inversion field: cp in acoustic wave equation

prior mean: PREM (radially symmetric model)

“truth” model: S20RTS (Ritsema et al.), (laterally heterogeneous)

parameter field representation: piecewise-trilinear on same mesh as

forward/adjoint high-order dG fields

source: smoothed point source in space, Gaussian source-time function

3 simult. sources, 130 receivers, 10km

depth

1200 seconds simulation time

3rd order DG elements for wavefield

72841 elements, 38.6M dofs

67770 inversion parameters

600 Lanczos iterations for low rank Hessian

0.05 Hz frequency

runtime: ∼20 hours on 2400 cores

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 53 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Synthetic example: Many receivers

Comparison of “ground truth” (S20RTS Earth model, left) with MAP solution (right)

black dots = 3 earthquake sources; white dots = 130 receivers

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 54 / 63


Samples from prior and posterior distributions

Bayesian solution of an inverse problem in 3D global seismic wave propagation

1.07 million uncertain earth model parameters

630 million wave propagation unknowns

up to 262K cores on Jaguar at ORNL

2000× reduction in problem dimension (∼500 dominant eigenvectors)

Top row: Samples from prior

Bottom row: Samples from the posterior

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 55 / 63


Synthetic example: Single receiver

inversion field: cp in acoustic wave equation

prior mean: PREM (radially symmetric model)

“truth” model: S20RTS (Ritsema et al.), (laterally heterogeneous)

parameter field representation: piecewise-trilinear on same mesh as

forward/adjoint high-order dG fields

source: smoothed point source in space, Gaussian source-time function

single source and receiver at 10km depth

2000 seconds simulation time

3rd order elements for wavefield

72,841 elements, 42M wavefield dofs

67,770 cp inversion parameters

200 Lanczos iter. for low rank Hessian

339× speedup

0.04 Hz frequency

runtime: ∼15 hours on 960 cores

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 56 / 63


Single receiver: Data-induced reduction of variance

Variance reduced mainly near surface; also at core-mantle boundary

Γpost ≈ Γpr − Γ 1/2

pr V rDrV T r Γ 1/2

pr

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 57 / 63


Eigenvectors of prior-preconditioned misfit Hessian

Eigenvectors corresponding to 1st, 3rd, 5th, 8th, and 13th largest eigenvalues

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 58 / 63


Spectral decay for refined parameter meshes

Spectrum of prior-preconditioned misfit Hessian for global seismic inversion problem

eigenvalue

10 7

10 6

10 5

10 4

10 3

10 2

10 1

10 0

40,842 parameters

67,770 parameters

431,749 parameters

10

0 100 200 300 400 500 600 700

−1

number

largest 700 eigenvalues of prior preconditioned data misfit Hessian for

different discretizations

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 59 / 63


Summary and conclusions

Scalability of UQ for inverse wave propagation in framework of Bayesian inference

requires:

MCMC sampling algorithm that scales independent of problem dimension

(stochastic Newton—but does it?)

Compactness of local data-misfit Hessian operator (shown for frequency

domain wave propagation)

Extraction of low rank approximation of data-misfit Hessian (randomized

SVD)

Hessian matvecs implemented through consistent first and second order

adjoint wave solver

Scalability of forward/adjoint wave propagation solver (explicit discontinuous

Galerkin scaling to 262K cores)

Scalability of AMR algorithms (forest of octrees + space filling curves

scaling to 224 cores)

Scalability of elliptic solve for action of prior operator (hybrid GMG-AMG on

forest of octrees scaling to 262K cores)

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 60 / 63


Summary and conclusions, continued

Demonstrated on linearized inverse problems in 3D global seismology

with 1M earth model parameters on up to 100K cores, 630M forward

unknowns, leading to 3 orders of magnitude dimension reduction

No time to mention discretization issues, but these are critical

Challenges/issues

what can be proven about dimension dependence of number of

samples?

despite low rank approximation, Hessian is still expensive (the constant

can be large); what sort of Hessian reuse is possible?

when third derivative of negative log posterior changes rapidly, local

Gaussian approximation may not be helpful; what can be done in this

case? (analogy with globalization strategies in deterministic

optimization)

can third derivative information be incorporated in a tractable way?

how to incorporate multilevel information?

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 61 / 63


References

T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part II: Inverse medium

scattering of acoustic waves. Inverse Problems, 28(5):055002, 2012.

T. Bui-Thanh and O. Ghattas, Analysis of the Hessian for inverse scattering problems. Part I: Inverse shape scattering

of acoustic waves. Inverse Problems, 28(5):055001, 2012.

T. Bui-Thanh and O. Ghattas, Analysis of an hp-non-conforming discontinuous Galerkin spectral element method for

wave propagation, SIAM Journal on Numerical Analysis, 2012, to appear.

C. Burstedde, L.C. Wilcox, and O. Ghattas, p4est: Scalable algorithms for parallel adaptive mesh refinement on forests

of octrees, SIAM Journal on Scientific Computing, 33(3):1103–1133, 2011.

H.P. Flath, L.C. Wilcox, V. Akcelik, J. Hill, B. van Bloemen Waanders, and O. Ghattas, Fast algorithms for Bayesian

uncertainty quantification in large-scale linear inverse problems based on low-rank partial Hessian approximations, SIAM

Journal on Scientific Computing, 33(1):407–432, 2011.

L.C. Wilcox, G. Stadler, C. Burstedde, and O. Ghattas, A high-order discontinuous Galerkin method for wave

propagation through coupled elastic-acoustic media, Journal of Computational Physics, 229(24):9373–9396, 2010.

C. Burstedde, O. Ghattas, M. Gurnis, T. Isaac, G. Stadler, T. Warburton, L.C. Wilcox, Extreme-Scale AMR,

Proceedings of ACM/IEEE SC10, 2010.

J. Martin, L.C. Wilcox, C. Burstedde, and O. Ghattas, Uncertainty quantification in inverse problems with stochastic

Newton MCMC, SIAM Journal on Scientific Computing, 2012 to appear.

T. Bui-Thanh, O. Ghattas, and D. Higdon, Adaptive Hessian-based non-stationary Gaussian process response surface

method for probability density approximation with application to Bayesian solution of large-scale inverse problems, SIAM

Journal on Scientific Computing, 2011, submitted.

T. Isaac, C. Burstedde, and O. Ghattas, Low-Cost Parallel Algorithms for 2:1 Octree Balance, Proceedings of IPDPS 12,

2012.

T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, L.C. Wilcox, Extreme-scale UQ for Bayesian inverse

problems governed by PDEs, SC12, submitted.

T. Bui-Thanh, C. Burstedde, O. Ghattas, J. Martin, G. Stadler, and L.C. Wilcox, Scalable algorithms for uncertainty

quantification in high dimensional inverse problems, manuscript, 2012.

T. Bui-Thanh, G. Stadler, L.C. Wilcox, C. Burstedde, and O. Ghattas, Analysis of the adjoint for a high-order

discontinuous Galerkin method for wave propagation through coupled acoustic-elastic media, manuscript.

T. Bui-Thanh, L. Demkowicz, and O. Ghattas, A Unified discontinuous Petrov-Galerkin method and its analysis for

Friedrichs’ systems, SIAM Journal on Numerical Analysis, 2011, submitted.

H. Sundar, G. Stadler, O. Ghattas and G. Biros, Parallel Geometric Multigrid Methods on Unstructured Forests of

Octrees, SC12, submitted.

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 62 / 63


Acknowledgements

Research supported by:

AFOSR grant FA9550-09-1-0608 (Computational Math)

NSF grants OCI-0749334 (PetaApps), DMS-0724746 (Collaborations

in Mathematical Geosciences), CMMI-1028889 (CDI), OPP-0941678

(CDI)

DOE grants DE-FG02-08ER25860 (ASCR), DE-SC0006656

(SciDAC), DE-SC0002710 (SciDAC), DE-FC52-08NA28615 (NNSA)

Resources on ORNL Jaguar Cray XT-5/XK-6 supercomputer provided

through ALCC award at ORNL Leadership Computing Facility

Resources on TACC Lonestar and Longhorn systems provided through

award from TACC

Omar Ghattas (ICES, UT-Austin) UQ for Large Scale Inverse Problems UCM 2012 63 / 63

More magazines by this user
Similar magazines