
Introduction to Krylov subspace methods
by Amik St-Cyr
Institute for Mathematics Applied to Geosciences


Overview

• Basic iterative methods:
  • Matrix splitting
  • Minimization
  • Gradient and conjugate gradient
• Projection methods


Why iterative methods?

• Most discretizations of PDEs lead to large linear systems
• LU or other direct factorizations require too much memory: fill-in is a problem...
• Why iterative:
  • Preserve the structure of the matrix
  • Based only on matrix-vector products


Matrix splittings

Iterative method: solve $Au = f$ while preserving the structure of the matrix (not changing it).

Suppose $A = M - N$ (so that $M^{-1}A = I - M^{-1}N$). Then

    $(M - N)u = f$
    $Mu^{n+1} = Nu^n + f$
    $u^{n+1} = M^{-1}Nu^n + M^{-1}f = (I - M^{-1}A)u^n + M^{-1}f = u^n + M^{-1}(f - Au^n)$

Define the residual $r^n = f - Au^n$. Then

    $u^{n+1} = u^n + M^{-1}r^n$ : stationary iterative method


Matrix splittings

Given $u^0$, does $u^n \to u$?

Convergence: define the error $e^n = u^n - u$. From $u^{n+1} = u^n + M^{-1}(f - Au^n)$,

    $e^{n+1} = e^n + M^{-1}(f - Au^n) = e^n + M^{-1}A(u - u^n) = e^n - M^{-1}Ae^n$

    $\therefore\; e^{n+1} = (I - M^{-1}A)e^n$


Matrix splittings

Theorem 5.1. For a square matrix $A$ and $\epsilon > 0$, there exists a norm $\|\cdot\|$ such that $\rho(A) \le \|A\| \le \rho(A) + \epsilon$.

Since $e^n = (I - M^{-1}A)^n e^0$,

    $\|e^n\| \le \|I - M^{-1}A\|^n\,\|e^0\| \le \big(\rho(I - M^{-1}A) + \epsilon\big)^n\|e^0\|
      = \rho^n\Big(1 + \tfrac{\epsilon}{\rho}\Big)^n\|e^0\| \le \rho^n e^{\epsilon n/\rho}\,\|e^0\|.$

Since $\rho\, e^{\epsilon/\rho} \le 1 \iff \tfrac{\epsilon}{\rho} \le -\ln\rho \iff \rho < 1$, to converge we need

    $\rho(I - M^{-1}A) < 1,$

where $A = M - N$ and $\rho(A) = \max\{|\lambda_i| \;\text{s.t.}\; \lambda_i = \text{eigenvalue of } A\}$.


Matrix splittings

• Special splittings of $A = D + L + U$ (diagonal, strictly lower, strictly upper):
  • Jacobi: $M = D$, $-N = L + U$
  • Gauss-Seidel: $M = L + D$, $-N = U$
  • SOR: $M = \tfrac{1}{\omega}D + L$, $-N = U - \tfrac{1-\omega}{\omega}D$

Notice that $u^{n+1} = u^n + M^{-1}(f - Au^n)$ can be organized as the following 4 steps:

    (1) $Mz^n = f - Au^n = r^n$
    (2) $\tilde z^n = Az^n$
    (3) $u^{n+1} = u^n + z^n$
    (4) $r^{n+1} = r^n - \tilde z^n$

($\because\; Au^{n+1} = Au^n + AM^{-1}r^n$; $f - Au^{n+1} = f - Au^n - AM^{-1}r^n$; hence $r^{n+1} = r^n - \tilde z^n$.)

Richardson's iteration (preconditioned version, by $M$):

    $u^{n+1} = u^n + \tau M^{-1}(f - Au^n), \quad \tau \in \mathbb{R}, \quad \rho(I - \tau M^{-1}A) < 1$

where $\tau$ is a relaxation parameter, $0 < \tau < 2/\lambda_{\max}$, with $\lambda_{\min}, \lambda_{\max}$ the extreme eigenvalues of $M^{-1}A$. For Richardson,

    $\tau_{\mathrm{optimal}} = \frac{2}{\lambda_{\min} + \lambda_{\max}}, \qquad
     \rho_{\mathrm{optimal}} = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}}.$

Def. Condition number of an SPD matrix $A$: $\kappa(A) = \lambda_{\max}/\lambda_{\min}$. Then

    $\rho_{\mathrm{optimal}} = \frac{\kappa(M^{-1}A) - 1}{\kappa(M^{-1}A) + 1}.$
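As an illustration (not part of the original slides), here is a minimal NumPy sketch of the 4-step stationary iteration above with a diagonal preconditioner, i.e. Jacobi when $M = \mathrm{diag}(A)$; the test matrix, tolerance, and iteration cap are illustrative choices.

```python
import numpy as np

def stationary_iteration(A, f, M_diag, u0=None, tol=1e-10, maxit=500):
    """Stationary iteration u^{n+1} = u^n + M^{-1}(f - A u^n).

    M is taken diagonal here (Jacobi when M_diag = diag(A)); the 4-step
    structure of the slide is followed: solve M z = r, form z~ = A z,
    update u and r."""
    u = np.zeros_like(f) if u0 is None else u0.copy()
    r = f - A @ u
    for n in range(maxit):
        z = r / M_diag          # (1) solve M z^n = r^n  (M diagonal here)
        zt = A @ z              # (2) z~^n = A z^n
        u = u + z               # (3) u^{n+1} = u^n + z^n
        r = r - zt              # (4) r^{n+1} = r^n - z~^n
        if np.linalg.norm(r) < tol * np.linalg.norm(f):
            break
    return u, n + 1

# Example: a small diagonally dominant system (illustrative)
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
f = np.array([1.0, 2.0, 3.0])
u, its = stationary_iteration(A, f, np.diag(A))
print(its, np.linalg.norm(f - A @ u))
```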


Iterative methods

Non-stationary approach: let the parameter change with the iteration,

    $u^{n+1} = u^n + \tau_n M^{-1}(f - Au^n)$

Multi-step: combine several previous residuals,

    $u^{n+1} = u^n + \sum_{i=0}^{k}\tau^{n+1}_i M^{-1}(f - Au^{n-i})$

Let's call $\tilde A = M^{-1}A$ and $\tilde f = M^{-1}f$. Then (single step, $\tau = 1$)

    $u^{n+1} = u^n + (\tilde f - \tilde A u^n) = u^n + \tilde r^n$
    $\tilde r^{n+1} = \tilde r^n - \tilde A\tilde r^n = (I - \tilde A)\tilde r^n$
    $\therefore\; \tilde r^n = (I - \tilde A)^n\tilde r^0, \qquad \tilde r^n = P_n(\tilde A)\tilde r^0, \quad P_n(0) = I$

where $P_n$ is a polynomial of degree $n$ with $P_n(0) = I$.


Minimization

For an SPD matrix $A$, consider

    $g(x) := \tfrac12\langle Ax, x\rangle - \langle b, x\rangle, \quad\text{where } \langle u, v\rangle := \sum_{i=1}^m u_i v_i.$

Then

    $(\nabla g(x))_k = \frac{dg}{dx_k}
      = \tfrac12\frac{d}{dx_k}\sum_{i,j=1}^m a_{ij}x_i x_j - \frac{d}{dx_k}\sum_{i=1}^m b_i x_i
      = \tfrac12\sum_{i,j}a_{ij}\delta_{ik}x_j + \tfrac12\sum_{i,j}a_{ij}\delta_{jk}x_i - \sum_{i=1}^m b_i\delta_{ik}
      = \tfrac12\sum_j a_{kj}x_j + \tfrac12\sum_i a_{ik}x_i - b_k$

    $\therefore\; (\nabla g)_k = \sum_{j=1}^m a_{kj}x_j - b_k$ (using $A = A^T$)

Condition at the minimum:

    $\nabla g = Ax - b = 0 \iff Ax^* = b, \qquad x^*: \text{solution of } Ax = b$

It means that the solution of $Ax = b$ is also the minimum of $g$.


Minimization

• Simplest way to find a minimum?
• Start with a "guess" $x^0 \in \mathbb{R}^m$,
• a non-zero search direction $p \in \mathbb{R}^m$,
• controlled by a step length $\alpha$ determined in a way to minimize $g$:

    $x^1 = x^0 + \alpha p$

We get:

    $g(x^1) = g(x^0 + \alpha p) = \tfrac12\langle A(x^0 + \alpha p), x^0 + \alpha p\rangle - \langle b, x^0 + \alpha p\rangle$

But:

    $\frac{dg(x^1)}{d\alpha} = \alpha\langle Ap, p\rangle + \langle Ax^0 - b, p\rangle = 0$, and then we find $\alpha$.

Leads to the optimal step size:

    $\alpha = \frac{-\langle Ax^0 - b, p\rangle}{\langle Ap, p\rangle} \quad\therefore\quad \alpha = \frac{\langle r^0, p\rangle}{\langle Ap, p\rangle}$


Gradient method

Notice that

    $\left.\frac{d^2 g}{d\alpha^2}\right|_{x^0 + \alpha p} = \langle Ap, p\rangle = p^T Ap > 0,$

so the division is ensured because of SPDness, and

    $x^1 = x^0 + \frac{\langle r^0, p\rangle}{\langle Ap, p\rangle}\,p \qquad (5.4)$

We don't decide the $p$ yet. Problem if $r^0 \perp p$? The method "stalls".

• What is a good search direction?

    $p = -\nabla g = -(Ax - b) = b - Ax = r$

Also known as (AKA) "steepest descent" (SD).


Steepest descent (SD)

From (5.4), taking $p = r^k$:

    $x^{k+1} = x^k + \frac{\langle r^k, r^k\rangle}{\langle Ar^k, r^k\rangle}\,r^k$

    $Ax^{k+1} = Ax^k + \frac{\langle r^k, r^k\rangle}{\langle Ar^k, r^k\rangle}\,Ar^k
      \;\Rightarrow\; b - Ax^{k+1} = b - Ax^k - \frac{\langle r^k, r^k\rangle}{\langle Ar^k, r^k\rangle}\,Ar^k$

    $r^{k+1} = r^k - \frac{\langle r^k, r^k\rangle}{\langle Ar^k, r^k\rangle}\,Ar^k$

Notice:

    $\langle r^{k+1}, r^k\rangle = \langle r^k, r^k\rangle - \frac{\langle r^k, r^k\rangle}{\langle Ar^k, r^k\rangle}\langle Ar^k, r^k\rangle
      \quad\therefore\; \langle r^{k+1}, r^k\rangle = 0$ : orthogonal

[2D example (2x2 matrix!): contour lines $g(x_1, x_2) = k$ with the SD path starting from $x^0$.]
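A minimal NumPy sketch of steepest descent as written above (not part of the original slides); the 2x2 test system and the stopping criterion are illustrative assumptions.

```python
import numpy as np

def steepest_descent(A, b, x0=None, tol=1e-10, maxit=1000):
    """Steepest descent for SPD A: p = r, alpha = <r,r>/<Ar,r>."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    for k in range(maxit):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)   # optimal step along the residual direction
        x = x + alpha * r
        r = r - alpha * Ar           # r^{k+1} = r^k - alpha * A r^k
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
    return x, k + 1

# 2D example (2x2 SPD matrix), illustrative
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x, its = steepest_descent(A, b)
print(its, x, np.linalg.norm(b - A @ x))
```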


Steepest descent (SD)

• Homework: show that SD converges as

    $\|e^n\|_A \le \left(\frac{\kappa(A) - 1}{\kappa(A) + 1}\right)^n\|e^0\|_A, \quad\text{where } \|v\|_A^2 = v^T A v \text{ with } A \text{ SPD}.$

e.g. Orthogonal basis of $\mathbb{R}^2$: $d^0, d^1$ ($d^1 \perp d^0$), $e^0 = \alpha_0 d^0 + \alpha_1 d^1$.


We would like...

[Figure: contour lines $g(x_1, x_2) = k$ with a path from $x^0$ that reaches the minimum in two iterations.]


Orthogonal basis...

• With an orthogonal basis we could make the algorithm converge in at most m steps!
• However this implies knowing the error, which means knowing the solution!

Basis: $d^0, d^1$.

[Figure: the error $e^0 = \alpha_0 d^0 + \alpha_1 d^1$ decomposed along the orthogonal directions $d^0$, $d^1$; one step along $d^0$ takes $x^0$ to $x^1$ with $e^1 \perp d^0$.]


From graphic...

Taking $p = d^0$: $x^1 = x^0 + \alpha_0 d^0$, with $e^1 \perp d^0$. More generally, require $\langle d^i, e^{i+1}\rangle = 0$:

    $\langle d^i, e^i + \alpha_i d^i\rangle = 0 \;\Rightarrow\; \alpha_i = -\frac{\langle d^i, e^i\rangle}{\langle d^i, d^i\rangle}$ (needs $x$!)

Replace with the inner product based on the matrix:

    $\therefore\; \alpha_i = -\frac{\langle d^i, e^i\rangle_A}{\langle d^i, d^i\rangle_A}
      = -\frac{\langle d^i, Ae^i\rangle}{\langle d^i, Ad^i\rangle}
      = \frac{\langle d^i, r^i\rangle}{\langle Ad^i, d^i\rangle}$ : not error dependent

(Details) because

    $\langle x, y\rangle_A = x^T A y = \langle x, Ay\rangle = (Ax)^T y = \langle Ax, y\rangle$ for SPD $A$,
    $Ae^i = A(x^i - x) = Ax^i - Ax = Ax^i - b = -r^i.$


A-conjugate

Def. Linearly independent vectors $(p^1, p^2, \cdots, p^n)$ with the property

    $\langle p^i, Ap^j\rangle = \langle p^i, p^j\rangle_A = \langle Ap^i, p^j\rangle = 0 \quad \forall i \ne j$

are said to form an A-orthogonal basis w.r.t. $A$.

Goal: convergence in at most m steps...??

For $A = A^T$, we have $D = P^T A P = \mathrm{diag}[\lambda_1\,\lambda_2\cdots\lambda_m]$ with $D$ diagonal, $P$ := matrix of A-orthogonal column vectors, and

    $g(x) = \tfrac12 x^T A x - b^T x, \quad x = Py$
    $\Rightarrow\; g(y) = \tfrac12 y^T P^T A P y - b^T P y = \tfrac12 y^T D y - \tilde b^T y$

Minimizing each decoupled term: $\lambda_i y_i = \tilde b_i \;\Rightarrow\; y_i = \dfrac{\tilde b_i}{\lambda_i}$,

where the $\lambda_i$ are the eigenvalues of $A$.


Conjugate gradient

How to find/generate an A-conjugate basis cheaply?

• Start with one step of SD:

    $x^k = x^{(k-1)} + \alpha^{(k-1)}p$
    $r^k = r^{(k-1)} - \alpha^{(k-1)}Ap$
    $\alpha^{(k-1)} := \frac{\langle r^{(k-1)}, p\rangle}{\langle Ap, p\rangle}$

• Find a recurrence to compute the A-conjugate basis:
  1) Pick the first vector of the basis as $b^0 \equiv p^0 \equiv r^0$ (the first $\alpha$ is nonzero, $\to$ SD).
  2) Choose a recurrence to obtain the next A-conjugate "p".
  (How short will the recurrence be?)

Notice, for all $p$:

    $\langle r^k, p\rangle = \langle r^{k-1}, p\rangle - \alpha^{k-1}\langle Ap, p\rangle = 0 \quad\therefore\; r^k \perp p$

The new residual is orthogonal to the search direction.


Conjugate gradient

• Conjugate Gram-Schmidt procedure:

Suppose $m$ linearly independent vectors $u^0, u^1, u^2, \cdots, u^{m-1}$.
To construct $d^i$, take $u^i$ and subtract out any components that are not A-orthogonal to the previous "d" vectors:

    (1) $d^0 = u^0$
    (2) for $i > 0$: $\quad d^i = u^i + \sum_{k=0}^{i-1}\beta_{ik}d^k$

Using A-orthogonality,

    $\langle d^i, Ad^j\rangle = \Big\langle u^i + \sum_{k=0}^{i-1}\beta_{ik}d^k,\; Ad^j\Big\rangle
      = u_i^T Ad^j + \sum_{k=0}^{i-1}\beta_{ik}(d^k)^T Ad^j
      = u_i^T Ad^j + \beta_{ij}(d^j)^T Ad^j = 0$

    $\therefore\; \beta_{ij} = -\frac{u_i^T Ad^j}{(d^j)^T Ad^j}$

What is $u$? This is not cheap, since we need to keep all the "d" vectors to solve the problem.


Conjugate gradient

• Use the residuals as the u's: $u^i = r^i$ (cheap!)
• The A-conjugate basis is the p's ($p^i = d^i$):

    $\beta_{ij} = -\frac{\langle r^i, p^j\rangle_A}{\langle p^j, p^j\rangle_A} = -\frac{\langle r^i, Ap^j\rangle}{\langle p^j, Ap^j\rangle}$   (A-conj. GS; this is a lower triangular matrix)

From $x^{i+1} = x^i + \alpha_i p^i$: $\;r^{i+1} = r^i - \alpha_i Ap^i$, so

    $\langle r^{j+1}, r^i\rangle = \langle r^j, r^i\rangle - \alpha_j\langle Ap^j, r^i\rangle$
    $\alpha_j\langle Ap^j, r^i\rangle = \langle r^j, r^i\rangle - \langle r^{j+1}, r^i\rangle$
    $\alpha_j\beta_{ij}\langle Ap^j, p^j\rangle = \langle r^{j+1}, r^i\rangle - \langle r^j, r^i\rangle$

Simplifying it leads to a short recurrence:

    $\therefore\; \beta_{ij} = \frac{\langle r^i, r^{j+1}\rangle}{\alpha_j\langle Ap^j, p^j\rangle} - \frac{\langle r^j, r^i\rangle}{\alpha_j\langle Ap^j, p^j\rangle}$


Conjugate gradient

We know that the $p^i$ are linearly independent. From $x^i = x^{i-1} + \alpha_{i-1}p^{i-1}$,

    $e^i = e^{i-1} + \alpha_{i-1}p^{i-1}, \qquad e^i = e^0 + \sum_{j=0}^{i-1}\alpha_j p^j.$

Since the p's are linearly independent, write the initial error as $e^0 = \sum_{j=0}^{m-1}\delta_j p^j$, so

    $e^i = \sum_{j=0}^{m-1}\delta_j p^j + \sum_{j=0}^{i-1}\alpha_j p^j. \qquad (5.5)$


Conjugate gradient

Using A-conjugacy:

    $\langle Ap^k, e^0\rangle = \sum_j\delta_j\langle Ap^k, p^j\rangle = \delta_k\langle Ap^k, p^k\rangle,$

and we get delta:

    $\delta_k = \frac{\langle Ap^k, e^0\rangle}{\langle Ap^k, p^k\rangle}
      = \frac{\langle Ap^k, e^0 + \sum_{j=0}^{k-1}\alpha_j p^j\rangle}{\langle Ap^k, p^k\rangle}$   (A-ortho!)
    $\;= \frac{\langle Ap^k, e^k\rangle}{\langle Ap^k, p^k\rangle}
      = \frac{\langle p^k, Ae^k\rangle}{\langle Ap^k, p^k\rangle}
      = \frac{-\langle p^k, r^k\rangle}{\langle Ap^k, p^k\rangle} = -\alpha_k$

    $\therefore\; \delta_k = -\alpha_k$

Then (5.5) becomes

    $e^i = \sum_{j=i}^{m-1}\delta_j p^j$   ($m$ = size of $A \in \mathbb{R}^{m\times m}$)


Conjugate gradient

We know $\langle Ap^k, e^i\rangle = \sum_{j=i}^{m-1}\delta_j\langle Ap^k, p^j\rangle$ for $k < i$, so

    $\langle Ap^k, e^i\rangle = 0 \quad\therefore\; \langle p^k, Ae^i\rangle = -\langle p^k, r^i\rangle = 0$ for $k < i$.

From the A-conjugate Gram-Schmidt (with $u^i = r^i$),

    $p^i = r^i + \sum_{k=0}^{i-1}\beta_{ik}p^k$
    $\langle p^i, r^j\rangle = \langle r^i, r^j\rangle + \sum_{k=0}^{i-1}\beta_{ik}\langle p^k, r^j\rangle$ for $i < j$ (5.6)

Since the second term of the RHS in (5.6) is zero and $\langle p^i, r^j\rangle = 0$ for $i < j$,

    $\langle r^i, r^j\rangle = 0$ for $i < j$,
    $\langle p^i, r^i\rangle = \langle r^i, r^i\rangle$ for $i = j$.


Conjugate gradient

Therefore

    $\beta_{ij} = \begin{cases}
      \dfrac{\langle r^i, r^{j+1}\rangle}{\alpha_j\langle Ap^j, p^j\rangle}, & i = j + 1\\[6pt]
      0, & i > j + 1
    \end{cases}$

or, as a matrix,

    $\beta = \begin{pmatrix}
      0 & & & & \\
      \beta_{10} & 0 & & \text{\Large$\emptyset$} & \\
      & \beta_{21} & 0 & & \\
      & & \ddots & \ddots & \\
      & & & \beta_{m-1,m-2} & 0
    \end{pmatrix}$

Shortest recurrence possible!


Conjugate gradient

    $\therefore\; \beta_i = \frac{\langle r^i, r^i\rangle}{\langle r^{i-1}, p^{i-1}\rangle}$

(Hestenes and Stiefel, [3]; Lanczos, [4])

    Given $x^0$, $r^0 = b - Ax^0$, $p^0 = r^0$, $k = 0$
    do "until convergence"
        $\alpha_k = \dfrac{\langle r^k, p^k\rangle}{\langle Ap^k, p^k\rangle}$
        $x^{k+1} = x^k + \alpha_k p^k$
        $r^{k+1} = r^k - \alpha_k Ap^k$
        $\beta_{k+1} = \dfrac{\langle r^{k+1}, r^{k+1}\rangle}{\langle r^k, r^k\rangle}$
        $p^{k+1} = r^{k+1} + \beta_{k+1}p^k$
        $k = k + 1$
    end do

Hestenes and Stiefel (1952), Lanczos (1950)
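The algorithm transcribes almost line for line into NumPy; in the sketch below (not from the slides) the stopping test and the 1D Laplacian test system are illustrative assumptions.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, maxit=None):
    """Conjugate gradient (Hestenes-Stiefel) for SPD A, as on the slide."""
    n = len(b)
    maxit = n if maxit is None else maxit
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for k in range(maxit):
        Ap = A @ p
        alpha = rr / (p @ Ap)        # <r^k, p^k>/<Ap^k, p^k> (here <r^k,p^k> = <r^k,r^k>)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol * np.linalg.norm(b):
            break
        beta = rr_new / rr           # <r^{k+1}, r^{k+1}>/<r^k, r^k>
        p = r + beta * p
        rr = rr_new
    return x, k + 1

# Illustrative SPD system: 1D Laplacian
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, its = conjugate_gradient(A, b)
print(its, np.linalg.norm(b - A @ x))
```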


Convergence of CG

• $\mathrm{Span}\{p^0, p^1, \ldots, p^k\} = \mathrm{Span}\{r^0, r^1, \ldots, r^k\} = \mathrm{Span}\{r^0, Ar^0, A^2r^0, \ldots, A^kr^0\} =: K_{k+1}(A; r^0)$, the Krylov subspace. (From $p^0 = r^0$, $p^{k+1} = r^{k+1} + \beta_{k+1}p^k$ and $r^{k+1} = r^k - \alpha_k Ap^k$: e.g. $p^1 = r^1 + \beta_1 r^0$, $p^2 = r^2 + \beta_2 p^1 = r^2 + \beta_2(r^1 + \beta_1 r^0)$.)

• We have shown that $(p^j, x^{k+1} - x^0)_A = (p^j, x - x^0)_A$, $0 \le j \le k$, so $x^{k+1} - x^0$ is the orthogonal projection of $x - x^0$ onto $\mathrm{Span}\{p^0, \ldots, p^k\}$ with respect to $(\cdot,\cdot)_A$; equivalently $r^{k+1} \perp K_{k+1}(A; r^0)$ (Galerkin condition).

• The solution obtained by CG is therefore the best approximation to the solution of $Ax = b$ ($A$ SPD) in the $\|\cdot\|_A$ norm:

    $\|x^{k+1} - x\|_A = \min_{v \in x^0 + K_{k+1}(A;r^0)}\|x - v\|_A
      = \min_{\omega \in K_{k+1}(A;r^0)}\|(x - x^0) - \omega\|_A
      = \min_{p \in P^*_{k+1}}\|p(A)(x - x^0)\|_A,$

  where $P^*_{k+1} := \{p \in P_{k+1} \mid p(0) = 1\}$.

• The minimum is reached by the shifted and scaled Chebyshev polynomial $\hat T_k(t) = T_k\big(1 + 2\tfrac{t-\beta}{\beta-\alpha}\big)\big/T_k\big(1 + 2\tfrac{\gamma-\beta}{\beta-\alpha}\big)$. With $\eta = \dfrac{\lambda_{\min}}{\lambda_{\max} - \lambda_{\min}}$,

    $T_{k+1}(1 + 2\eta) \ge \frac12\left(\frac{\sqrt{\lambda_{\max}} + \sqrt{\lambda_{\min}}}{\sqrt{\lambda_{\max}} - \sqrt{\lambda_{\min}}}\right)^{k+1}
      = \frac12\left(\frac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}\right)^{k+1},$

  which gives the rate

    $\|x^{k+1} - x\|_A \le 2\left(\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}\right)^{k+1}\|x - x^0\|_A.$

• Homework:
  • Algorithm 1 (CG) converges in m steps (exact arithmetic); show the statements above by induction.
  • The practical error bound is what is useful for applications (a result from approximation theory, p.48 in [8]; see also p.86 in [2]).


Projection methods

(p.86 in [2]) We now understand that CG / steepest descent / Richardson are projection methods on a Krylov space.

• Generalization: let $V = [v_1, \cdots, v_m]$ be an $n \times m$ matrix whose columns $\{v_i\}_{i=1}^m$ are a basis of a space $K$, and $W = [w_1, \cdots, w_m]$ a basis of a space $L$.
• The approximate solution is written as

    $x = x^0 + Vy$


Projection methods

    $Ax = b, \quad A \in \mathbb{R}^{n\times n},\; b, x \in \mathbb{R}^n$
    $A(x^0 + Vy) = b$
    $AVy = r^0$
    $W^T A V y = W^T r^0$

If $(W^T A V)^{-1}$ exists,

    $\therefore\; \tilde x = x^0 + V(W^T A V)^{-1}W^T r^0$

Conditions for existence of $(W^T A V)^{-1}$:
(i) $A$ is positive definite and $L = K$
(ii) $A$ is non-singular and $L = AK$
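A minimal sketch (not from the slides) of one generic projection step, forming $\tilde x = x^0 + V(W^TAV)^{-1}W^Tr^0$ explicitly for given basis matrices $V$ and $W$; practical methods never form this so naively, and the random SPD test data below is an illustrative assumption.

```python
import numpy as np

def projection_step(A, b, x0, V, W):
    """One generic projection step: find x~ in x0 + span(V) such that
    the residual b - A x~ is orthogonal to span(W) (Petrov-Galerkin)."""
    r0 = b - A @ x0
    G = W.T @ (A @ V)                      # m x m matrix  W^T A V
    y = np.linalg.solve(G, W.T @ r0)       # assumes (W^T A V) is invertible
    return x0 + V @ y

# Illustrative example, Galerkin case L = K (take W = V), A SPD
rng = np.random.default_rng(0)
n, m = 20, 5
B = rng.standard_normal((n, n))
A_spd = B @ B.T + n * np.eye(n)
b = rng.standard_normal(n)
V = np.linalg.qr(rng.standard_normal((n, m)))[0]   # orthonormal basis of K
x1 = projection_step(A_spd, b, np.zeros(n), V, V)
print(np.linalg.norm(V.T @ (b - A_spd @ x1)))      # ~ 0: Galerkin condition holds
```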


Projection methods

Generate the Krylov subspace from scratch.

Arnoldi's method (1951): $A \to H$ (transform $A$ into upper Hessenberg form using unitary transformations).

Arnoldi's procedure builds an orthonormal basis for the Krylov subspace $K_m(A; v)$. (Exact arithmetic supposed.)

    Pick $v_1$ s.t. $\|v_1\|_2 = 1$
    for $j = 1, 2, \cdots, m$ do
        $h_{ij} = (Av_j, v_i)$ for $i = 1, 2, \cdots, j$
        $w_j = Av_j - \sum_{i=1}^{j}h_{ij}v_i$
        $h_{j+1,j} = \|w_j\|_2$  (if $h_{j+1,j} = 0$, then stop!! (*))
        $v_{j+1} = w_j/h_{j+1,j}$
    end

Def. The grade of a vector $v$ w.r.t. $A$ is the degree of the lowest-degree monic polynomial $p (\ne 0)$ s.t. $p(A)v = 0$. For $A \in \mathbb{R}^{m\times m}$, $v \in \mathbb{R}^m$, the grade of $v$ is always equal to or smaller than $m$. Proof: homework.
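A compact NumPy sketch of the Arnoldi procedure above (written in the modified Gram-Schmidt form that the remark on the next slide says is used in practice); the function name, breakdown tolerance, and random test matrix are illustrative assumptions, not from the slides.

```python
import numpy as np

def arnoldi(A, v, m, tol=1e-12):
    """Build an orthonormal basis V of K_m(A; v) and the (m+1) x m
    upper Hessenberg matrix Hbar with A V_m = V_{m+1} Hbar."""
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):             # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:              # breakdown: grade of v reached
            return V[:, :j + 1], H[:j + 1, :j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

# Illustrative check of the Arnoldi relation A V_m = V_{m+1} Hbar_m
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))
v = rng.standard_normal(30)
V, H = arnoldi(A, v, 8)
print(np.linalg.norm(A @ V[:, :-1] - V @ H))   # ~ machine precision
```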


Consequence

1) $\dim(K_m(A; v)) = \min(m, \mathrm{grade}(v))$
2) (*) (the stopping test in Arnoldi's procedure)

• From lines 4 and 6 of the procedure we have $v_{j+1}h_{j+1,j} = Av_j - \sum_{i=1}^{j}h_{ij}v_i$, then

    $Av_j = \sum_{i=1}^{j+1}h_{ij}v_i = [v_1\; v_2\; \cdots\; v_{j+1}]\,[h_{1j}\; h_{2j}\; \cdots\; h_{j+1,j}]^T$

in matrix form:

    $A[v_1\; v_2\; \cdots\; v_m] = [v_1\; v_2\; \cdots\; v_{m+1}]\,\bar H_m, \qquad
     \bar H_m = \begin{pmatrix}
       h_{11} & h_{12} & \cdots & h_{1m}\\
       h_{21} & h_{22} & \cdots & h_{2m}\\
              & h_{32} & \cdots & h_{3m}\\
              &        & \ddots & \vdots\\
              &        &        & h_{m+1,m}
     \end{pmatrix} \in \mathbb{R}^{(m+1)\times m},$

i.e. $AV_m = V_{m+1}\bar H_m$, and

    $V_m^T A V_m = V_m^T V_{m+1}\bar H_m = H_m, \qquad H_m := \{\bar H_m \text{ with the last row deleted}\}.$

Remark: In practice, a modified Gram-Schmidt (MGS) or Householder version of Arnoldi's algorithm is used.


FOM

Impose the Galerkin condition $L = K = K_m(A; r^0)$, i.e. $b - Ax_m \perp K_m(A; r^0)$.

Taking $x_m = x^0 + V_m y_m$ with $r^0 = \beta v_1$, $\beta = \|r^0\|_2$, we have from Arnoldi's algorithm:

    $V_m^T A V_m\, y_m = V_m^T r^0 = V_m^T(\beta v_1) = \beta\tilde e_1
      \;\Rightarrow\; H_m y_m = \beta\tilde e_1$

    $\Rightarrow\; x_m = x^0 + V_m(H_m^{-1}\beta\tilde e_1)$

This is the FULL ORTHOGONALIZATION METHOD (FOM). Combining MGS/Arnoldi with the above leads to FOM:


FOM

    $r^0 = b - Ax^0$, $\beta = \|r^0\|_2$, $v_1 = r^0/\beta$
    $H_m = \{h_{ij}\}_{i,j=1}^m$: set $H_m = 0$
    for $j = 1, 2, \cdots, m$ do
        $w_j = Av_j$
        for $i = 1, 2, \cdots, j$ do
            $h_{ij} = (w_j, v_i)$
            $w_j = w_j - h_{ij}v_i$
        end
        $h_{j+1,j} = \|w_j\|_2$; if $h_{j+1,j} = 0$ set $m = j$, exit
        $v_{j+1} = w_j/h_{j+1,j}$
    end
    $y_m = H_m^{-1}(\beta\tilde e_1)$, $\quad x_m = x^0 + V_m y_m$
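Assuming the arnoldi() helper from the earlier sketch is in scope (and that no breakdown occurs), FOM is then only a few lines: build $V$ and $\bar H_m$, drop the last row, and solve the small system $H_m y = \beta\tilde e_1$. The test system and the number of steps $m$ below are illustrative assumptions.

```python
import numpy as np

def fom(A, b, x0, m):
    """Full Orthogonalization Method: Galerkin condition on K_m(A; r0).
    Reuses the arnoldi() sketch defined above; assumes no breakdown."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V, Hbar = arnoldi(A, r0 / beta, m)     # V: n x (m+1), Hbar: (m+1) x m
    Hm = Hbar[:-1, :]                      # H_m = V_m^T A V_m (last row deleted)
    rhs = np.zeros(m); rhs[0] = beta       # beta * e_1
    y = np.linalg.solve(Hm, rhs)
    return x0 + V[:, :m] @ y

# Illustrative use: small SPD system, m = 20 Arnoldi steps
n = 50
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
rhs_vec = np.ones(n)
x_fom = fom(T, rhs_vec, np.zeros(n), 20)
print(np.linalg.norm(rhs_vec - T @ x_fom))
```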


FOM

Remarks:
• When A is symmetric, Arnoldi's algorithm reduces to the Lanczos method:
  • $H_m$ becomes a tridiagonal matrix
  • Arnoldi becomes the Lanczos algorithm...
• Noticing the orthogonality of the residuals leads to CG as we derived it, i.e. CG is an efficient implementation of FOM:
  • CG = FOM + Lanczos


GMRES

Instead of the Galerkin condition, impose the Petrov-Galerkin condition

    $(b - Ax^1, v) = 0 \quad \forall v \in AK =: L, \qquad x^1 \in x^0 + K,$

with $R(x) = \|b - Ax\|_2$. This is Petrov-Galerkin, a subcase of a general Galerkin method as seen in the finite-element framework.

Any vector in $x^0 + K_m$ can be written as $x = x^0 + V_m y$ where $y \in \mathbb{R}^m$.

Def. $J(y) = \|b - Ax\|_2 = \|b - A(x^0 + V_m y)\|_2$. Find $y$ minimizing $J$. Using Arnoldi:

    $b - Ax = b - A(x^0 + V_m y) = r^0 - AV_m y = \beta v_1 - V_{m+1}\bar H_m y = V_{m+1}(\beta\tilde e_1 - \bar H_m y),$

where $r^0 = \beta v_1$, $\beta = \|r^0\|_2$.

    $\Rightarrow\; J(y) = \|V_{m+1}(\beta\tilde e_1 - \bar H_m y)\|_2 = \|\beta\tilde e_1 - \bar H_m y\|_2$

Thus, equivalently,

    $y_m = \arg\min_y\|\beta\tilde e_1 - \bar H_m y\|_2, \qquad x_m = x^0 + V_m y_m,$

solved using least squares, e.g. via the normal equations $\bar H_m^T\bar H_m y = \bar H_m^T\beta\tilde e_1$.

GMRES (Saad and Schultz, [7]):


GMRES

    $r^0 = b - Ax^0$, $\beta = \|r^0\|_2$ and $v_1 = r^0/\beta$
    for $j = 1, \cdots, m$ do
        $w_j = Av_j$
        for $i = 1, \cdots, j$ do      (inner loop: MGS)
            $h_{ij} = (w_j, v_i)$
            $w_j = w_j - h_{ij}v_i$
        end
        $h_{j+1,j} = \|w_j\|_2$; if $h_{j+1,j} = 0$, $m = j$, exit
        $v_{j+1} = w_j/h_{j+1,j}$
    end
    $V_m = [v_1, \cdots, v_m]$, $\quad \bar H_m = \{h_{ij}\}_{1\le i\le m+1,\,1\le j\le m}$
    Minimizer $y_m$ of $\|\beta\tilde e_1 - \bar H_m y\|_2$
    $x_m = x^0 + V_m y_m$

Saad and Schultz (1986)

(Aside: if $P$ is an orthogonal projector, then $Px$ and $(I - P)x$ are orthogonal:
$(Px, (I - P)x) = (Px, x) - (Px, Px) = (Px, x) - (x, P^H P x) = (Px, x) - (x, P^2 x) = (Px, x) - (x, Px) = 0$.)
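A self-contained NumPy sketch of GMRES as written above (Arnoldi/MGS plus a small least-squares solve), not from the slides; production codes instead update a QR factorization of $\bar H_m$ with Givens rotations and restart, which this sketch omits. The nonsymmetric test system is an illustrative assumption.

```python
import numpy as np

def gmres_basic(A, b, x0, m):
    """GMRES without restarting: build V_{m+1} and Hbar_m with Arnoldi/MGS,
    then minimize ||beta*e1 - Hbar_m y||_2 and set x_m = x0 + V_m y_m."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1)); V[:, 0] = r0 / beta
    Hbar = np.zeros((m + 1, m))
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                     # MGS orthogonalization
            Hbar[i, j] = w @ V[:, i]
            w = w - Hbar[i, j] * V[:, i]
        Hbar[j + 1, j] = np.linalg.norm(w)
        if Hbar[j + 1, j] == 0.0:                  # lucky breakdown: K_{j+1} invariant
            V, Hbar = V[:, :j + 1], Hbar[:j + 1, :j + 1]
            break
        V[:, j + 1] = w / Hbar[j + 1, j]
    rhs = np.zeros(Hbar.shape[0]); rhs[0] = beta           # beta * e_1
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)         # small least-squares problem
    return x0 + V[:, :Hbar.shape[1]] @ y

# Illustrative use on a (nonsymmetric) well-conditioned system
rng = np.random.default_rng(2)
n = 40
A_ns = np.eye(n) + 0.1 * rng.standard_normal((n, n))
b_ns = rng.standard_normal(n)
x_gm = gmres_basic(A_ns, b_ns, np.zeros(n), 30)
print(np.linalg.norm(b_ns - A_ns @ x_gm))
```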


Parallelism?

• Inner products in parallel (reduction/scatter)
• Matrix-vector products: very efficient ways of doing this exist for FEM and high-order elements
• Avoid default partitioning schemes in matrix libraries


Thank you!
