The Split Bregman Method for L1-Regularized Problems

Tom Goldstein

May 22, 2008


Some Common L1 Regularized Problems

    TV Denoising: min_u ‖u‖_BV + (µ/2)‖u − f‖_2^2

    De-Blurring/Deconvolution: min_u ‖u‖_BV + (µ/2)‖Ku − f‖_2^2

    Basis Pursuit/Compressed Sensing MRI: min_u ‖u‖_BV + (µ/2)‖Fu − f‖_2^2


What Makes these Problems Hard?

◮ Some “easy” problems...

    arg min_u ‖Au − f‖_2^2   (differentiable)
    arg min_u |u|_1 + ‖u − f‖_2^2   (solvable by shrinkage; a small sketch of this operator follows)

◮ Some “hard” problems

    arg min_u |Φu|_1 + ‖u − f‖_2^2
    arg min_u |u|_1 + ‖Au − f‖_2^2

◮ What makes these problems hard is the “coupling” between the L1 and L2 terms
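To make the “solvable by shrinkage” remark concrete, here is a minimal NumPy sketch of the soft-thresholding (shrinkage) operator. The function name `shrink` and the example values are illustrative choices of mine, not from the slides; the formula matches the shrink(·, 1/λ) used later in the deck.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding (shrinkage): sign(x) * max(|x| - t, 0), applied element-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# The "easy" problem arg min_u |u|_1 + (mu/2)||u - f||_2^2 decouples per component;
# its closed-form minimizer is a single shrinkage of the data f by 1/mu.
mu = 2.0
f = np.array([1.5, -0.2, 0.05, -3.0])
u = shrink(f, 1.0 / mu)
```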


A Better Formulation

◮ We want to solve the general L1 regularization problem:

    arg min_u |Φu| + ‖Ku − f‖_2^2

◮ We need to “split” the L1 and L2 components of this energy

◮ Introduce a new variable: let d = Φu

◮ We wish to solve the constrained problem

    arg min_{u,d} ‖d‖_1 + H(u)   such that   d = Φ(u)

  where H(u) = ‖Ku − f‖_2^2 denotes the differentiable (L2) part of the energy


Solving the Constrained Problem

    arg min_{u,d} ‖d‖_1 + H(u)   such that   d = Φ(u)

◮ We add an L2 penalty term to get an unconstrained problem

    arg min_{u,d} ‖d‖_1 + H(u) + (λ/2)‖d − Φ(u)‖_2^2

◮ This splitting was independently introduced by Wang and Yin Zhang (FTVd)

◮ We need a way of modifying this problem to get exact enforcement of the constraint

◮ The most obvious way is to use continuation: let λ_n → ∞

◮ Continuation makes the condition number bad


A Better Solution: Use Bregman Iteration

◮ We group the first two energy terms together,

    E(u, d) = ‖d‖_1 + H(u)

◮ to get...

    arg min_{u,d} E(u, d) + (λ/2)‖d − Φ(u)‖_2^2

◮ We now define the “Bregman distance” of this convex functional as

    D_E^p(u, d; u^k, d^k) = E(u, d) − ⟨p_u^k, u − u^k⟩ − ⟨p_d^k, d − d^k⟩


A Better Solution: Use Bregman Iteration

◮ Rather than solve min_{u,d} E(u, d) + (λ/2)‖d − Φ(u)‖_2^2, we recursively solve

    (u^{k+1}, d^{k+1}) = arg min_{u,d} D_E^p(u, d; u^k, d^k) + (λ/2)‖d − Φ(u)‖_2^2

◮ or

    arg min_{u,d} E(u, d) − ⟨p_u^k, u − u^k⟩ − ⟨p_d^k, d − d^k⟩ + (λ/2)‖d − Φ(u)‖_2^2

◮ where p_u and p_d are in the subgradient of E with respect to the variables u and d


Why does this work?

◮ Because of the convexity of the functionals we are using, it can be shown that

    ‖d^k − Φu^k‖ → 0 as k → ∞

◮ Furthermore, it can be shown that the limiting values u* = lim_{k→∞} u^k and d* = lim_{k→∞} d^k satisfy the original constrained optimization problem

    arg min_{u,d} ‖d‖_1 + H(u)   such that   d = Φ(u)

◮ It therefore follows that u* is a solution to the original L1 regularized problem

    u* = arg min_u |Φu| + ‖Ku − f‖_2^2


Don’t Worry! This isn’t as complicated as it looks

◮ As is done for Bregman iterative denoising, we can get explicit formulas for p_u and p_d, and use them to simplify the iteration

◮ This gives us the simplified iteration

    (u^{k+1}, d^{k+1}) = arg min_{u,d} ‖d‖_1 + H(u) + (λ/2)‖d − Φ(u) − b^k‖_2^2
    b^{k+1} = b^k + (Φ(u^{k+1}) − d^{k+1})

◮ This is the analog of “adding the noise back” when we use Bregman for denoising


Summary of what we have so far

◮ We began with the L1 regularized problem

    u* = arg min_u |Φu| + ‖Ku − f‖_2^2

◮ We form the “Split Bregman” formulation

    min_{u,d} ‖d‖_1 + H(u) + (λ/2)‖d − Φ(u) − b‖_2^2

◮ For some optimal value b* of the Bregman parameter, these two problems are equivalent

◮ We solve the optimization problem by iterating

    (u^{k+1}, d^{k+1}) = arg min_{u,d} ‖d‖_1 + H(u) + (λ/2)‖d − Φ(u) − b^k‖_2^2
    b^{k+1} = b^k + (Φ(u^{k+1}) − d^{k+1})


Why is this better?

◮ We can break this algorithm down into three easy steps

    Step 1: u^{k+1} = arg min_u H(u) + (λ/2)‖d^k − Φ(u) − b^k‖_2^2
    Step 2: d^{k+1} = arg min_d |d|_1 + (λ/2)‖d − Φ(u^{k+1}) − b^k‖_2^2
    Step 3: b^{k+1} = b^k + Φ(u^{k+1}) − d^{k+1}

◮ Because of the decoupled form, Step 1 is now a differentiable optimization problem – we can solve it directly with tools like the Fourier transform, Gauss-Seidel, CG, etc. (see the loop sketched below)

◮ Step 2 can be solved efficiently using shrinkage:

    d^{k+1} = shrink(Φ(u^{k+1}) + b^k, 1/λ)

◮ Step 3 is explicit, and easy to evaluate
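The three steps fit into a short generic loop. The sketch below is mine, not the paper's reference code: the names `split_bregman`, `solve_u`, and `Phi` are illustrative, and the Step 1 solver is deliberately left abstract because it is problem dependent.

```python
import numpy as np

def shrink(x, t):
    """Soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_bregman(solve_u, Phi, u0, lam, n_iter=50):
    """Generic Split Bregman loop (a sketch under illustrative assumptions).

    solve_u(d, b) : user-supplied solver for Step 1, i.e. the differentiable problem
                    arg min_u H(u) + (lam/2)||d - Phi(u) - b||^2
                    (e.g. Gauss-Seidel, CG, or an FFT-based direct solve).
    Phi(u)        : the linear operator defining d = Phi(u).
    """
    u = u0.copy()
    d = np.zeros_like(Phi(u))
    b = np.zeros_like(d)
    for _ in range(n_iter):
        u = solve_u(d, b)                    # Step 1: differentiable sub-problem in u
        d = shrink(Phi(u) + b, 1.0 / lam)    # Step 2: shrinkage in d
        b = b + Phi(u) - d                   # Step 3: Bregman update
    return u
```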


Example: Fast TV Denoising

◮ We begin by considering the anisotropic ROF denoising problem

    arg min_u |∇_x u| + |∇_y u| + (µ/2)‖u − f‖_2^2

◮ We then write down the Split Bregman formulation

    arg min_{u, d_x, d_y} |d_x| + |d_y| + (µ/2)‖u − f‖_2^2
        + (λ/2)‖d_x − ∇_x u − b_x‖_2^2
        + (λ/2)‖d_y − ∇_y u − b_y‖_2^2


Example: Fast TV Denoising

◮ The TV algorithm then breaks down into these steps:

    Step 1: u^{k+1} = G(u^k)
    Step 2: d_x^{k+1} = shrink(∇_x u^{k+1} + b_x^k, 1/λ)
    Step 3: d_y^{k+1} = shrink(∇_y u^{k+1} + b_y^k, 1/λ)
    Step 4: b_x^{k+1} = b_x^k + (∇_x u^{k+1} − d_x^{k+1})
    Step 5: b_y^{k+1} = b_y^k + (∇_y u^{k+1} − d_y^{k+1})

  where G(u^k) represents the result of one Gauss-Seidel sweep for the corresponding L2 optimization problem.

◮ This is very cheap – each step is only a few operations per pixel (a sketch follows below)
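A compact NumPy sketch of this anisotropic TV loop is below. To keep it short I assume periodic boundaries and use a vectorized Jacobi-style relaxation sweep in place of the slide's Gauss-Seidel sweep G(u^k); those choices are mine, not the paper's.

```python
import numpy as np

def grad_x(u):  return np.roll(u, -1, axis=0) - u   # forward difference, periodic
def grad_y(u):  return np.roll(u, -1, axis=1) - u
def grad_xT(v): return np.roll(v, 1, axis=0) - v    # adjoint of grad_x
def grad_yT(v): return np.roll(v, 1, axis=1) - v

def shrink(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def tv_denoise_anisotropic(f, mu, lam, n_iter=30):
    u = f.copy()
    dx = np.zeros_like(f); dy = np.zeros_like(f)
    bx = np.zeros_like(f); by = np.zeros_like(f)
    for _ in range(n_iter):
        # Step 1: one relaxation sweep on (mu*I + lam*grad^T grad) u = rhs
        rhs = mu * f + lam * (grad_xT(dx - bx) + grad_yT(dy - by))
        neighbors = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                     np.roll(u, 1, 1) + np.roll(u, -1, 1))
        u = (rhs + lam * neighbors) / (mu + 4.0 * lam)
        # Steps 2-3: shrinkage updates for d_x, d_y
        dx = shrink(grad_x(u) + bx, 1.0 / lam)
        dy = shrink(grad_y(u) + by, 1.0 / lam)
        # Steps 4-5: Bregman updates
        bx = bx + grad_x(u) - dx
        by = by + grad_y(u) - dy
    return u
```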


Isotropic TV

◮ This method can do isotropic TV using the following decoupled formulation

    arg min_{u, d_x, d_y} √(d_x² + d_y²) + (µ/2)‖u − f‖_2^2
        + (λ/2)‖d_x − ∇_x u − b_x‖_2^2 + (λ/2)‖d_y − ∇_y u − b_y‖_2^2

◮ We now have to solve for (d_x, d_y) using the generalized shrinkage formula (Yin et al.), sketched below:

    d_x^{k+1} = max(s^k − 1/λ, 0) · (∇_x u^k + b_x^k) / s^k
    d_y^{k+1} = max(s^k − 1/λ, 0) · (∇_y u^k + b_y^k) / s^k

  where

    s^k = √((∇_x u^k + b_x^k)² + (∇_y u^k + b_y^k)²)
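The generalized shrinkage in NumPy; the function name and the small `eps` safeguard against division by zero are my additions, the formula itself is the one on the slide.

```python
import numpy as np

def shrink_isotropic(ax, ay, t, eps=1e-12):
    """Joint (vector) shrinkage of (ax, ay) = (grad_x u + b_x, grad_y u + b_y)."""
    s = np.sqrt(ax**2 + ay**2)
    scale = np.maximum(s - t, 0.0) / (s + eps)
    return scale * ax, scale * ay

# Usage inside the TV loop, replacing the two scalar shrinkages:
# dx, dy = shrink_isotropic(grad_x(u) + bx, grad_y(u) + by, 1.0 / lam)
```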


Time Trials

◮ Time trials were done on an Intel Core 2 Duo desktop (3 GHz)
◮ Linux platform, compiled with g++

Anisotropic:
    Image               Time/cycle (sec)    Total time (sec)
    256 × 256 Blocks    0.0013              0.068
    512 × 512 Lena      0.0054              0.27

Isotropic:
    Image               Time/cycle (sec)    Total time (sec)
    256 × 256 Blocks    0.0018              0.0876
    512 × 512 Lena      0.011               0.55




This can be made even faster...

◮ Most of the denoising takes place in the first 10 iterations

◮ “Staircases” form quickly, but then take some time to flatten out

◮ If we are willing to accept a “visual” convergence criterion, we can denoise in about 10 iterations (0.054 sec) for Lena, and 20 iterations (0.024 sec) for the blocky image.




Compressed Sensing for MRI

◮ Many authors (Donoho, Yin, etc.) get superior reconstruction using both TV and Besov regularizers

◮ We wish to solve

    arg min_u |∇u| + |Wu| + (µ/2)‖RFu − f^k‖_2^2

  where R comprises a subset of the rows of the identity, and W is an orthogonal wavelet transform (Haar).

◮ Apply the “Split Bregman” method: let w ← Wu, d_x ← ∇_x u, and d_y ← ∇_y u

    arg min_{u, d_x, d_y, w} √(d_x² + d_y²) + |w| + (µ/2)‖RFu − f‖_2^2
        + (λ/2)‖d_x − ∇_x u − b_x‖_2^2 + (λ/2)‖d_y − ∇_y u − b_y‖_2^2
        + (γ/2)‖w − Wu − b_w‖_2^2


Compressed Sensing for MRI

◮ The optimality condition for u is circulant:

    (µ F^T R^T R F − λ∆ + γI) u^{k+1} = rhs^k

◮ The resulting algorithm is (a sketch of the u-update follows below)

  Unconstrained CS Optimization Algorithm

    u^{k+1} = F^{-1} K^{-1} F rhs^k
    (d_x^{k+1}, d_y^{k+1}) = shrink(∇_x u + b_x, ∇_y u + b_y, 1/λ)
    w^{k+1} = shrink(Wu + b_w, 1/γ)
    b_x^{k+1} = b_x^k + (∇_x u − d_x)
    b_y^{k+1} = b_y^k + (∇_y u − d_y)
    b_w^{k+1} = b_w^k + (Wu − w)

  where K is the diagonal matrix obtained by diagonalizing the circulant operator with the Fourier transform F
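A sketch of the FFT-based u-update. The function name and the assumptions of a real-valued image, a periodic 5-point Laplacian, and a 0/1 k-space mask (the diagonal of R^T R) are mine; under those assumptions the circulant operator is diagonalized by the 2-D FFT exactly as the slide's F^{-1} K^{-1} F rhs^k formula indicates.

```python
import numpy as np

def cs_mri_u_update(rhs, mask, mu, lam, gamma):
    """Exact solve of (mu*F^T R^T R F - lam*Laplacian + gamma*I) u = rhs via the FFT."""
    n, m = rhs.shape
    # Eigenvalues of the negative periodic Laplacian (-Delta) on an n x m grid.
    wx = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(n) / n)
    wy = 2.0 - 2.0 * np.cos(2.0 * np.pi * np.arange(m) / m)
    neg_laplace_eigs = wx[:, None] + wy[None, :]
    K = mu * mask + lam * neg_laplace_eigs + gamma   # diagonal of F M F^{-1}
    return np.real(np.fft.ifft2(np.fft.fft2(rhs) / K))
```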


Compressed Sensing for MRI

◮ To solve the constrained problem

    arg min_u |∇u| + |Wu|   such that   ‖RFu − f‖_2 < σ

  we use “double Bregman” (sketched below)

◮ First, solve the unconstrained problem

    arg min_u |∇u| + |Wu| + (µ/2)‖RFu − f^k‖_2^2

  by performing “inner” iterations

◮ Then, update

    f^{k+1} = f^k + f − RFu^{k+1}

  this is an “outer” iteration
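A minimal sketch of the outer ("double Bregman") loop. The names `solve_inner` and `RF`, the stopping test, and the iteration cap are my assumptions; the update f^{k+1} = f^k + f − RFu^{k+1} is the one on the slide.

```python
import numpy as np

def double_bregman_cs(solve_inner, RF, f, sigma, max_outer=20):
    """Outer Bregman loop for  min |grad u| + |Wu|  s.t.  ||RFu - f|| < sigma.

    solve_inner(fk) runs the inner Split Bregman iterations and returns an
    approximate minimizer of |grad u| + |Wu| + (mu/2)||RFu - fk||^2.
    RF(u) applies the partial Fourier measurement operator.
    """
    fk = f.copy()
    for _ in range(max_outer):
        u = solve_inner(fk)                      # inner iterations
        residual = f - RF(u)
        if np.linalg.norm(residual) < sigma:     # constraint satisfied: stop
            break
        fk = fk + residual                       # outer update: f^{k+1} = f^k + f - RFu^{k+1}
    return u
```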


Compressed Sensing

◮ 256 × 256 MRI of a phantom, 30% sampling [reconstruction figure]


Bregman Iteration vs. Continuation

◮ As λ → ∞, the condition number of each sub-problem goes to ∞

◮ This is okay if we have a direct solver for each sub-problem (such as the FFT)

◮ Drawback: direct solvers are slower than iterative solvers, or may not be available

◮ With Bregman iteration, the condition number stays constant – we can use efficient iterative solvers


Example: Direct Solvers May Be Inefficient

◮ TV-L1:

    arg min_u |∇u| + µ|u − f|

◮ Split Bregman formulation

    arg min_{u,d,v} |d| + µ|v − f| + (λ/2)‖d − ∇u − b_d‖_2^2 + (γ/2)‖u − v − b_v‖_2^2

◮ We must solve the sub-problem

    (µI − λ∆)u = RHS

◮ If λ ≈ µ, then this system is strongly diagonally dominant: use Gauss-Seidel (cheap; a sketch follows below)

◮ If λ ≫ µ, then we must use a direct solver: 2 FFTs per iteration (expensive)
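To illustrate why the iterative route is cheap, here is one Gauss-Seidel sweep for (µI − λ∆)u = RHS with the 5-point Laplacian. Boundary handling is omitted and the function name is mine; this is only a sketch of the per-pixel update, not the paper's implementation.

```python
import numpy as np

def gauss_seidel_sweep(u, rhs, mu, lam):
    """One in-place Gauss-Seidel sweep for (mu*I - lam*Laplacian) u = rhs (interior only)."""
    n, m = u.shape
    diag = mu + 4.0 * lam
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            # Uses already-updated neighbors u[i-1,j], u[i,j-1] (Gauss-Seidel ordering).
            neighbors = u[i - 1, j] + u[i + 1, j] + u[i, j - 1] + u[i, j + 1]
            u[i, j] = (rhs[i, j] + lam * neighbors) / diag
    return u
```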


Example: Direct Solvers May Not Exist

◮ Total-variation based inpainting:

    arg min_u ∫_Ω |∇u| + µ ∫_{Ω\D} (u − f)²

    arg min_u |∇u| + µ‖Ru − f‖_2^2

  where R consists of rows of the identity matrix.

◮ The optimization sub-problem is

    (µR^T R − λ∆)u = RHS

◮ Not circulant! – We have to use an iterative solver (e.g., Gauss-Seidel)


Generalizations

◮ Bregman iteration can be used to solve a wide range of non-L1 problems

    arg min_u J(u)   such that   A(u) = 0

  where J and ‖A(·)‖_2 are convex.

◮ We can use a Bregman-like penalty function (sketched below)

    u^{k+1} = arg min_u J(u) + (λ/2)‖A(u) − b^k‖_2^2
    b^{k+1} = b^k − A(u^{k+1})

◮ Theorem: Any fixed point of the above algorithm is a solution to the original constrained problem

◮ Convergence can be proved for a broad class of problems: if J is strictly convex and twice differentiable, then there exists λ₀ > 0 such that the algorithm converges for any λ < λ₀
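A generic sketch of this Bregman penalty iteration. The names `bregman_penalty` and `J_prox_solver`, the tolerance test, and the iteration cap are illustrative assumptions; how the inner minimization is solved is entirely problem dependent.

```python
import numpy as np

def bregman_penalty(J_prox_solver, A, u0, lam, n_iter=100, tol=1e-8):
    """Bregman penalty iteration for  min J(u)  s.t.  A(u) = 0  (a sketch).

    J_prox_solver(b, lam) must return arg min_u J(u) + (lam/2)||A(u) - b||^2.
    A(u) evaluates the constraint operator.
    """
    u = u0
    b = np.zeros_like(A(u0))
    for _ in range(n_iter):
        u = J_prox_solver(b, lam)            # inner minimization
        r = A(u)
        if np.linalg.norm(r) < tol:          # constraint A(u) = 0 (approximately) met
            break
        b = b - r                            # Bregman update: b^{k+1} = b^k - A(u^{k+1})
    return u
```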


Conclusion

◮ The Split Bregman formulation is a fast tool that can solve almost any L1 regularization problem

◮ Small memory footprint

◮ This method is easily parallelized for large problems

◮ Easy to code


Acknowledgment

We thank Jie Zheng for his helpful discussions regarding MR image processing. This publication was made possible by the support of the National Science Foundation’s GRFP program, as well as ONR grant N000140710810 and the Department of Defense.
