Blind optimization of algorithm parameters for signal ... - IEEE Xplore
2. PROBLEM FORMULATION AND SURE

We adopt the standard vector formulation of a denoising problem: we measure noisy data $y \in \mathbb{R}^N$ given by

$$ y = x + b, \qquad (1) $$

where $x \in \mathbb{R}^N$ is the vector containing the samples of the unknown deterministic noise-free signal, and $b \in \mathbb{R}^N$ is a vector of zero-mean white Gaussian noise of variance $\sigma^2$. We are given a denoising algorithm represented by an operator $f_\lambda : \mathbb{R}^N \to \mathbb{R}^N$ that maps the input data $y$ onto the signal estimate $\tilde{x}$:

$$ \tilde{x} = f_\lambda(y), \qquad (2) $$

where $\lambda$ denotes the set of parameters characterizing $f_\lambda$; these should be adjusted to yield the best estimate of the signal [see Figure 1]. Our primary aim in this work is to optimize $\lambda$ knowing only $y$ and $\tilde{x} = f_\lambda(y)$, as illustrated by the "MSE estimation" box in Figure 1. To achieve this, we propose the use of SURE as a reliable estimate of the true MSE.

Fig. 1. Schematic of the denoising problem: $\tilde{x}$ is obtained by applying the denoising algorithm to the data $y$. The MSE-estimation box then computes an estimate of the MSE of $\tilde{x}$ (i.e., SURE) as a function of $\lambda$, knowing only $y$ and $f_\lambda(y)$.

In the sequel, we assume that $f_\lambda$ is a bounded and continuous operator (i.e., the input-output mapping is continuous, so a small perturbation of the input necessarily yields a small perturbation of the output). In particular, we require that the divergence of $f_\lambda$ with respect to the data $y$, given by

$$ \operatorname{div}_y\{f_\lambda(y)\} = \sum_{k=1}^{N} \frac{\partial f_{\lambda k}(y)}{\partial y_k}, \qquad (3) $$

where $f_{\lambda k}(y)$ and $y_k$ are the $k$th components of the vectors $f_\lambda(y)$ and $y$, respectively, be well-defined in the weak sense. Then the SURE corresponding to $\tilde{x} = f_\lambda(y)$ is the random variable

$$ \eta(f_\lambda(y)) = \frac{1}{N}\,\|y - f_\lambda(y)\|^2 - \sigma^2 + \frac{2\sigma^2}{N}\,\operatorname{div}_y\{f_\lambda(y)\}, \qquad (4) $$

where $\|\cdot\|$ denotes the Euclidean norm. The following theorem, due to Stein, states that $\eta$ is an unbiased estimate of the true MSE given by

$$ \mathrm{MSE}(f_\lambda(y)) = \frac{1}{N}\,\|x - f_\lambda(y)\|^2. \qquad (5) $$

Theorem 1 (cf. [5]) The random variable $\eta(f_\lambda(y))$ is an unbiased estimator of $\mathrm{MSE}(f_\lambda(y))$, that is,

$$ \mathrm{E}_b\{\mathrm{MSE}(f_\lambda(y))\} = \mathrm{E}_b\{\eta(f_\lambda(y))\}, \qquad (6) $$

where $\mathrm{E}_b\{\cdot\}$ denotes the expectation with respect to $b$.

3. MONTE-CARLO SURE

As noted in (4), the divergence term $\operatorname{div}_y\{f_\lambda(y)\}$ plays a pivotal role in the computation of SURE. The divergence can be calculated analytically and has a closed-form expression only in some special cases, such as when $f_\lambda$ is linear or when $f_\lambda$ is a pointwise operator in an orthogonal transform domain [6–8]. For a general $f_\lambda$, the evaluation of the divergence may not be tractable analytically; worse, it may even be numerically infeasible, especially if $f_\lambda$ is implemented in an iterative fashion (as is the case with most variational or PDE-based denoising methods).
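Before presenting our remedy for this difficulty, it may help to see how the terms of (4) are assembled in one of the special cases where the divergence is available in closed form. The sketch below is not from the paper; it is a minimal NumPy illustration using soft-thresholding as a stand-in denoiser $f_\lambda$ — a pointwise operator whose divergence is simply the number of samples exceeding the threshold — and it compares SURE against the oracle MSE over a grid of $\lambda$ values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: sparse noise-free signal x, noisy data y = x + b  (eq. 1).
N, sigma = 4096, 0.5
x = np.zeros(N)
x[rng.choice(N, 80, replace=False)] = rng.normal(0, 4, 80)
y = x + rng.normal(0, sigma, N)

def f_soft(y, lam):
    """Stand-in denoiser f_lambda (eq. 2): pointwise soft-thresholding."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def div_soft(y, lam):
    """Closed-form divergence (eq. 3): df_k/dy_k = 1 wherever |y_k| > lam."""
    return np.count_nonzero(np.abs(y) > lam)

def sure(y, x_hat, div, sigma):
    """SURE (eq. 4): an unbiased MSE estimate knowing only y and f_lambda(y)."""
    N = y.size
    return np.sum((y - x_hat) ** 2) / N - sigma**2 + 2 * sigma**2 * div / N

for lam in [0.5, 1.0, 1.5, 2.0]:
    x_hat = f_soft(y, lam)
    est = sure(y, x_hat, div_soft(y, lam), sigma)
    true = np.sum((x - x_hat) ** 2) / N   # oracle MSE (eq. 5); needs x
    print(f"lam={lam:.1f}  SURE={est:.4f}  MSE={true:.4f}")
```

Minimizing SURE over $\lambda$ thus tracks the minimum of the true MSE without access to $x$. For a general $f_\lambda$, however, no such closed-form counterpart of `div_soft` exists.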
We circumvent this difficulty by proposing a novel technique, based on the following theorem, that allows us to estimate the required divergence (and thus SURE) for an arbitrary $f_\lambda$.

Theorem 2 Let $f_\lambda(z)$ be the output of $f_\lambda$ corresponding to $z = y + b'$, where $b'$ is a zero-mean i.i.d. random vector (independent of $y$) with covariance $\varepsilon^2 I$. Then

$$ \operatorname{div}_y\{f_\lambda(y)\} = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon^2}\, \mathrm{E}_{b'}\!\left\{ b'^{\,T} \left( f_\lambda(z) - f_\lambda(y) \right) \right\}, \qquad (7) $$

provided that $f_\lambda$ admits a well-defined second-order Taylor expansion.

The proof of this theorem will be presented elsewhere. This is a powerful result, since (7) does not require any knowledge of the functional form of $f_\lambda$, which makes it applicable to a wide variety of algorithms. The important point is that $f_\lambda$ is treated as a black box, meaning that we only need the output of the operator, irrespective of how it is implemented. Equation (7) forms the basis of our Monte-Carlo approach for computing SURE for a general $f_\lambda$. Since, in practice, the limit in (7) cannot be implemented due to finite machine precision, we propose to use the following approximation:

$$ \operatorname{div}_y\{f_\lambda(y)\} \approx \frac{1}{\varepsilon^2}\, b'^{\,T} \left( f_\lambda(y + b') - f_\lambda(y) \right). \qquad (8) $$

The idea is to add a small amount of noise (of variance $\varepsilon^2$) to $y$ and evaluate $f_\lambda(y + b')$. The difference $f_\lambda(y + b') - f_\lambda(y)$ is then used according to (8) to obtain an estimate of the divergence. The implementation of the right-hand side of (8) is illustrated schematically in Figure 2.

We will demonstrate numerically that the approximation in (8) is quite reasonable and yields excellent numerical results. The validity of the approximation in (8) depends on how small $\varepsilon$ can be made. In practice, we must select $\varepsilon$ small enough to mimic the limit, but still large enough to avoid errors due to finite machine precision.
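As a concrete illustration of (8) — again a hypothetical sketch rather than the paper's own code — the divergence of the soft-thresholding operator from the previous example can be estimated by probing it purely as a black box with a single realization of $b'$, and then checked against the closed-form count:

```python
import numpy as np

def mc_divergence(f, y, eps=1e-4, rng=None):
    """Monte-Carlo divergence estimate, eq. (8).

    Probes the black-box operator f with one perturbation b' that is
    zero-mean i.i.d. with covariance eps^2 * I, and returns
    (1/eps^2) * b'^T (f(y + b') - f(y)).
    """
    rng = rng or np.random.default_rng()
    b_prime = eps * rng.standard_normal(y.size)  # variance eps^2 per sample
    return b_prime @ (f(y + b_prime) - f(y)) / eps**2

# Check against the closed-form divergence of soft-thresholding
# (illustrative setup; any black-box denoiser f would do).
rng = np.random.default_rng(1)
y = rng.normal(0, 1, 4096)
lam = 0.8
f = lambda v: np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

est = mc_divergence(f, y, eps=1e-4, rng=rng)
exact = np.count_nonzero(np.abs(y) > lam)
print(f"Monte-Carlo divergence = {est:.1f}, closed form = {exact}")
# The two agree to within the Monte-Carlo error of a single probe.
```

Plugging this estimate into (4) yields Monte-Carlo SURE; the choice of `eps` reflects exactly the trade-off discussed above, between fidelity to the limit in (7) and finite-precision errors.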
