
Exploiting Redundancy for Aerial Image Fusion using Convex Optimization ⋆

Stefan Kluckner, Thomas Pock and Horst Bischof

Institute for Computer Graphics and Vision
Graz University of Technology, Austria
{kluckner,pock,bischof}@icg.tugraz.at

Abstract. Image fusion in high-resolution aerial imagery poses a challenging problem due to fine details and complex textures. In particular, color image fusion using virtual orthographic cameras offers a common representation of overlapping yet perspective aerial images. This paper proposes a variational formulation for a tight integration of redundant image data showing urban environments. We introduce an efficient wavelet regularization which enables a natural-appearing recovery of fine details in the images by performing joint inpainting and denoising from a given set of input observations. Our framework is first evaluated on a setting with synthetic noise. Then, we apply our proposed approach to orthographic image generation in aerial imagery. In addition, we discuss an exemplar-based inpainting technique for an integrated removal of non-stationary objects like cars.

1 Introduction

In general, image fusion integrates information from multiple images of the same scene in order to obtain an improved result with respect to noise, outliers, illumination changes, etc. Fusion from multiple observations is an active topic in computer vision and photogrammetry, since scene information can be taken from different viewpoints without additional cost. In particular, modern aerial imaging technology provides multi-spectral images which map every visible spot of urban environments from many overlapping camera viewpoints. Typically, a point on the ground is visible in at least ten cameras. The provided, highly redundant data enables efficient techniques for height field generation [1], but also methods for resolution and quality enhancement [2-6]. On the one hand, taking into account redundant observations of corresponding points in a common 3D world, the localization accuracy can be significantly improved using an integration of range data, e.g. for 3D reconstruction [7]. On the other hand, accurate height fields can also be exploited to align data, such as the corresponding color information, within a common coordinate system.

⋆ This work was financed by the Austrian Research Promotion Agency within the projects vdQA (No. 816003) and APAFA (No. 813397).

Fig. 1. An observed scene with overlapping camera positions. The scene is taken from different viewpoints. We exploit computed range images to transform the point cloud into a common orthographic aerial view. Our joint inpainting and denoising approach takes redundant observations as input data and generates an improved fused image. Note that some observations include many undefined areas (black pixels) caused by occlusions and non-stationary objects.

In our approach we exploit derived range data to compute geometric transformations between the original images

and an orthographic view, which is related to novel view synthesis [2-4, 8]. Due to missing data in the individual height fields (e.g. caused by non-stationary objects or occlusions), the initial alignment causes undefined areas, artifacts or outliers in the novel view. Figure 1 depicts an urban scene taken from different camera positions and a set of redundant images geometrically transformed to a common view. Some image tiles show large areas of missing information and erroneous pixel values. Our task can thus also be interpreted as image fusion from multiple input observations of the same scene by joint inpainting and denoising: while the inpainting fills undefined areas, the denoising removes strong outliers and noise by exploiting the high redundancy in the input data.

This paper makes several contributions. First, we present a novel variational framework for gray and color image fusion which provides a smooth solution over the image domain by exploiting redundant input images. In order to compute natural-appearing images, we further introduce a wavelet transform [9], providing an improved texture prior for regularization, into our convex optimization framework (Section 3). In the experimental section we show that our framework can be successfully applied to image recovery and orthographic image generation in high-resolution aerial imagery (Section 4). In addition, we present results for exemplar-based inpainting, which enables an integrated removal of undesired, non-stationary objects like cars. Finally, Section 5 concludes our work and gives an outlook on future work.

2 Related Work

The challenging task of reconstructing an original image from given (noisy) observations is known to be ill-posed. Although fast mean or median computation over multiple pixel observations will suppress noisy or undefined areas, each pixel in the result is treated independently. A variety of proposed algorithms for fusing redundant information is based on image priors [2, 8], image transforms [10], Markov random field optimization procedures [3] and generative models [4]. Variational formulations are well suited for finding smooth and consistent solutions of the inverse problem by exploiting different types of regularization [7, 11-14]. The quadratic model [11] uses the L2 norm for regularization, which causes smoothed edges. Introducing a total variation (TV) norm instead leads to the edge-preserving denoising model proposed by Rudin, Osher and Fatemi (ROF) [12]. The authors of [13] proposed to also use an L1 norm in the data term to estimate the deviation between the sought solution and the input observation. The resulting TV-L1 model is thus more effective than the ROF model in removing impulse noise containing strong outliers. Zach et al. [7] applied the TV-L1 model to robust range image integration from multiple views. Although TV-based methods are well suited for tasks like range data integration, in texture inpainting the regularization produces results that look unnatural near recovered edges (too much contrast). To overcome this synthetic appearance, natural image priors based on multi-level transforms like wavelets [9, 15, 16] or curvelets [17] can be used within the inpainting and fusion model [14, 18]. These transforms provide a compact yet sparse image representation obtained at low computational cost. Similar to [14], we exploit a wavelet transform for natural regularization within our proposed variational fusion framework, which is capable of handling multiple input observations.

3 Convex Fusion Model

In this section we describe our generic fusion model, which takes into account multiple observations of the same scene. For clarity, we derive our model for gray-valued images; however, the formulation can easily be extended to vector-valued data like color images.

3.1 The Proposed Model

We consider a discrete image domain Ω as a regular grid of size W × H pixels with Ω = {(i, j) : 1 ≤ i ≤ W, 1 ≤ j ≤ H}, where the tuple (i, j) denotes a pixel position in the domain Ω.

Our fusion model, which takes into account multiple observations and a wavelet-based regularization, can be seen as an extension of the TV-L1 denoising model proposed by Nikolova [13]. In the discrete setting, the minimization problem of the common TV-L1 model for an image domain Ω is formulated as

\min_{u \in X} \Big\{ \|\nabla u\|_1 + \lambda \sum_{(i,j) \in \Omega} |u_{i,j} - f_{i,j}| \Big\}, \qquad (1)

where X = R^{WH} is a finite-dimensional vector space equipped with the scalar product ⟨u, v⟩_X = \sum_{i,j} u_{i,j} v_{i,j}, u, v ∈ X. The first term denotes the TV of the sought solution u and enforces the regularization in terms of a smooth solution. The second term accounts for the summed errors between u and the (noisy) input data f. The scalar λ controls the trade-off between data fitting and regularization. In the following we derive our model for the task of image fusion from multiple observations.

As a first modification of the TV-L1 model defined in (1), we extend the convex minimization problem to handle a set of K scene observations (f^1, ..., f^K). Introducing multiple input images can be accomplished by summing the deviations between the sought solution u and the available observations f^k, k = 1 ... K, according to

\min_{u \in X} \Big\{ \|\nabla u\|_1 + \lambda \sum_{k=1}^{K} \sum_{(i,j) \in \Omega} |u_{i,j} - f^k_{i,j}| \Big\}. \qquad (2)

Since orthographic image generation from gray or color information with sampling distances of approximately 10 cm requires an accurate recovery of fine details and complex textures, we replace the TV-based regularization with a dual-tree complex wavelet transform (DTCWT) [9, 16]. The DTCWT is nearly invariant to rotation, which is important for regularization, but also to translation, and can be computed efficiently using separable filter banks. The transform is based on analyzing the signal with two separate wavelet decompositions, where one provides the real-valued part and the other yields the complex part. Due to the redundancy of the proposed decomposition, the directionality is improved compared to standard discrete wavelets [9]. In order to include the linear wavelet-based regularization in our generic formulation, we replace the gradient operator ∇ by the linear transform Ψ : X → C. The space C ⊆ \mathbb{C}^D denotes the real- and complex-valued transform coefficients c ∈ C; the dimensionality D depends directly on parameters like the image dimensions and the number of levels and orientations. The adjoint operator of the transform Ψ, required for signal reconstruction, is denoted Ψ^* and is defined through the identity ⟨Ψu, c⟩_C = ⟨u, Ψ^*c⟩_X.
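As an illustration (not part of the paper's own implementation), the operator pair (Ψ, Ψ^*) can be realized with the open-source dtcwt Python package. We assume its 'near_sym_b' (13,19-tap) and 'qshift_b' (14-tap) filter options correspond to the kernels reported in Section 4.1; and since the DTCWT is an approximately tight frame, we follow the common practice of substituting the inverse transform for the exact adjoint:

```python
import dtcwt  # pip install dtcwt

# 13,19-tap level-1 and Q-shift 14-tap filters: our assumed mapping to
# the kernels named in Section 4.1.
transform = dtcwt.Transform2d(biort='near_sym_b', qshift='qshift_b')

def Psi(u, nlevels=3):
    """Forward DTCWT: image -> pyramid of complex subband coefficients."""
    return transform.forward(u, nlevels=nlevels)

def Psi_adj(pyramid):
    """Approximate adjoint Psi*: pyramid -> image, realized by the
    inverse DTCWT (exact only up to the frame bounds)."""
    return transform.inverse(pyramid)
```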

As the L1 norm in the data term is known to be sensitive to Gaussian noise (of which we expect a small amount), we instead use the robust Huber norm [19] to estimate the error between the sought solution and the observations. The Huber norm is quadratic for small values, which is appropriate for handling Gaussian noise, and linear for larger errors, which amounts to a median-like behavior. It is defined as

|t|_\epsilon = \begin{cases} \dfrac{t^2}{2\epsilon} & : \ 0 \le t \le \epsilon \\ t - \dfrac{\epsilon}{2} & : \ \epsilon < t \end{cases} \qquad (3)
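A minimal element-wise sketch of (3), with ε = 0.1 as used in Section 4.1:

```python
import numpy as np

def huber(t, eps=0.1):
    """Element-wise Huber norm |t|_eps of Eq. (3): quadratic for small
    residuals, linear beyond eps."""
    a = np.abs(t)
    return np.where(a <= eps, a ** 2 / (2.0 * eps), a - eps / 2.0)
```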

Because of the height-field-driven alignment of the appearance information, undefined areas can simply be determined in advance for a geometrically transformed image f^k. Therefore, we support our formulation with a spatially varying weight w^k ∈ {0, 1}^{WH}, which encodes the inpainting domain: the choice w^k_{i,j} = 0 corresponds to pure inpainting at a pixel location (i, j).
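Assuming the convention of Fig. 1 that undefined pixels of a warped observation are marked black, the weight map w^k can be sketched as:

```python
import numpy as np

def inpainting_weights(f_k):
    """w^k of Eq. (4): 0 on undefined (black) pixels of the warped
    observation f^k, 1 where valid color was observed."""
    valid = f_k.max(axis=-1) > 0 if f_k.ndim == 3 else f_k > 0
    return valid.astype(np.float64)
```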

Considering the wavelet-based regularization, the encoded inpainting domain and the Huber norm, our extended energy minimization problem for redundant observations can now be formulated for the image domain Ω as

\min_{u \in X} \Big\{ \|\Psi u\|_1 + \lambda \sum_{k=1}^{K} \sum_{(i,j) \in \Omega} w^k_{i,j} \, |u_{i,j} - f^k_{i,j}|_\epsilon \Big\}. \qquad (4)
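Putting the pieces together, the objective (4) can be evaluated as sketched below (using Psi and huber from the sketches above). The text does not state whether the lowpass band enters the L1 term, so we follow the common choice of leaving it unpenalized:

```python
import numpy as np

def fusion_energy(u, fs, ws, lam=1.2, eps=0.1, nlevels=3):
    """Objective of Eq. (4) for an estimate u, observations fs = [f^1..f^K]
    and weights ws = [w^1..w^K]; Psi and huber as defined above."""
    p = Psi(u, nlevels=nlevels)
    reg = sum(np.abs(h).sum() for h in p.highpasses)   # ||Psi u||_1 (highpass bands)
    data = sum((w * huber(u - f, eps)).sum() for f, w in zip(fs, ws))
    return reg + lam * data
```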

In the following we describe an iterative strategy, based on an optimal first-order primal-dual algorithm, to minimize the non-smooth problem defined in (4).

3.2 Primal-Dual Formulation

Note that the minimization problem given in (4) poses a large-scale (the dimensionality directly depends on the number of image pixels, e.g. 3 × 1600² unknowns for a small color image tile) and non-smooth optimization problem. Following recent trends in convex optimization [20, 21], we use an optimal first-order primal-dual scheme [22, 23] to minimize the energy. Thus we first need to convert the formulation defined in (4) into a classical convex-concave saddle-point problem. The general minimization problem is written as

\min_{x \in X} \max_{y \in Y} \ \langle Kx, y \rangle + G(x) - F^*(y), \qquad (5)

where K is a linear operator, G and F^* are convex functions, and F^* denotes the convex conjugate of the function F. The finite-dimensional vector spaces X and Y provide a scalar product ⟨·, ·⟩ and a norm ‖·‖ = ⟨·, ·⟩^{1/2}. By applying the Legendre-Fenchel transform to (4), we obtain the saddle-point problem

\min_{u} \max_{c, q} \Big\{ \langle \Psi u, c \rangle - \delta_C(c) + \sum_{k=1}^{K} \big( \langle u - f^k, q^k \rangle - \delta_{Q^k}(q^k) - \tfrac{\epsilon}{2} \|q^k\|^2 \big) \Big\}. \qquad (6)

In our case, the convex sets Q^k and C are defined as follows

Q^k = \big\{ q^k \in \mathbb{R}^{WH} : |q^k_{i,j}| \le \lambda w^k_{i,j}, \ (i,j) \in \Omega \big\}, \quad k = 1 \ldots K, \qquad (7)
C = \big\{ c \in \mathbb{C}^D : \|c\|_\infty \le 1 \big\}, \qquad (8)

where the norm on the coefficient vector space C is defined as

\|c\|_\infty = \max_{i,j} |c_{i,j}|, \qquad |c_{i,j}| = \sqrt{(c^1_{i,j})^2 + (c^2_{i,j})^2}. \qquad (9)

Considering (6), we can first identify F^*(c, q) = \delta_C(c) + \sum_{k=1}^{K} \big( \delta_{Q^k}(q^k) + \tfrac{\epsilon}{2} \|q^k\|^2 \big). The functions \delta_C and \delta_{Q^k} are simple indicator functions of the convex sets and are given by

\delta_C(c) = \begin{cases} 0 & \text{if } c \in C \\ +\infty & \text{if } c \notin C \end{cases}, \qquad \delta_{Q^k}(q^k) = \begin{cases} 0 & \text{if } q^k \in Q^k \\ +\infty & \text{if } q^k \notin Q^k \end{cases}. \qquad (10)



Since a closed-form solution for the sum over multiple L1 norms cannot be implemented efficiently, we additionally introduce a dualization of the data term with respect to G, which yields an extended linear term with ⟨Kx, y⟩ = ⟨Ψu, c⟩ + \sum_{k=1}^{K} ⟨u − f^k, q^k⟩. According to [22, 23], the primal-dual algorithm can be summarized as follows: first, we set the primal and dual time steps τ > 0, σ > 0. Additionally, we initialize u^0 ∈ R^{WH}, ū^0 = u^0, c^0 ∈ C and q^k_0 ∈ Q^k. Based on the iterations proposed in [22], the iterative scheme is then given by

\begin{cases}
c^{n+1} = \mathrm{proj}_C\big( c^n + \sigma \Psi \bar{u}^n \big) \\
q^k_{n+1} = \mathrm{proj}_{Q^k}\Big( \dfrac{q^k_n + \sigma(\bar{u}^n - f^k)}{1 + \sigma\epsilon} \Big), \quad k = 1 \ldots K \\
u^{n+1} = u^n - \tau \Big( \Psi^* c^{n+1} + \sum_{k=1}^{K} q^k_{n+1} \Big) \\
\bar{u}^{n+1} = 2u^{n+1} - u^n.
\end{cases} \qquad (11)

In order to iteratively compute the solution of (6) using the primal-dual scheme, point-wise Euclidean projections of the dual variables c and q^k onto the convex sets C and Q^k are required. The projection of the wavelet coefficients c is defined as

\mathrm{proj}_C(\tilde{c}_{i,j}) = \frac{\tilde{c}_{i,j}}{\max(1, |\tilde{c}_{i,j}|)}. \qquad (12)

The corresponding projections for the dual variables q^k with k = 1 \ldots K are given by

\mathrm{proj}_{Q^k}(\tilde{q}^k_{i,j}) = \min\big( \lambda w^k_{i,j}, \ \max(-\lambda w^k_{i,j}, \ \tilde{q}^k_{i,j}) \big). \qquad (13)

Note that the iterative minimization scheme consists mainly of simple point-wise operations; therefore it can be considerably accelerated on massively parallel hardware such as graphics processing units. In the next section we use our model to perform image fusion on synthetic and real image data.
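To make the scheme concrete, the sketch below implements the iterations of Eq. (11) with numpy and the dtcwt operators introduced in Section 3.1; the step sizes follow the setting reported in Section 4.1. This is an illustrative CPU reconstruction under the stated assumptions (inverse DTCWT in place of Ψ^*, lowpass duals held at zero, weighted-average initialization), not the authors' GPU implementation:

```python
import numpy as np
import dtcwt  # pip install dtcwt

def fuse(fs, ws, lam=1.2, eps=0.1, tau=0.05, nlevels=3, n_iters=300):
    """Primal-dual iterations of Eq. (11): fuse K warped observations
    fs (H x W arrays) with validity weights ws by joint inpainting and
    denoising."""
    t = dtcwt.Transform2d(biort='near_sym_b', qshift='qshift_b')
    sigma = 1.0 / (8.0 * tau)                      # sigma = 1/(8 tau), Sec. 4.1
    # initialize the primal variable with a weighted average (our choice)
    u = sum(w * f for f, w in zip(fs, ws)) / np.maximum(sum(ws), 1.0)
    u_bar = u.copy()
    c = t.forward(u, nlevels=nlevels)              # dual variable c (pyramid)
    for h in c.highpasses:
        h[...] = 0.0
    c.lowpass[...] = 0.0                           # lowpass dual held at zero
    qs = [np.zeros_like(u) for _ in fs]            # dual variables q^k
    for _ in range(n_iters):
        # dual ascent in c, then projection onto C, Eq. (12)
        p = t.forward(u_bar, nlevels=nlevels)
        for h, hb in zip(c.highpasses, p.highpasses):
            h += sigma * hb
            h /= np.maximum(1.0, np.abs(h))
        # dual ascent in q^k (Huber prox), then projection onto Q^k, Eq. (13)
        for k, (f, w) in enumerate(zip(fs, ws)):
            q = (qs[k] + sigma * (u_bar - f)) / (1.0 + sigma * eps)
            qs[k] = np.clip(q, -lam * w, lam * w)
        # primal descent and extrapolation
        u_new = u - tau * (t.inverse(c) + sum(qs))
        u_bar = 2.0 * u_new - u
        u = u_new
    return u
```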

4 Experimental Evaluation

In this section we first demonstrate our convex fusion model on synthetic data; then we apply it to real-world aerial images.

4.1 Synthetic Experiments

To show the performance with respect to recovered fine details, our first experiment investigates the fusion and inpainting capability of our proposed model using images with synthetically added noise. We therefore take the gray-valued Barbara image (512 × 512 pixels), which contains fine structures and highly textured areas. In order to imitate the expected noise model, we add a small amount of Gaussian noise (µ = 0, σ = 0.01) and replace a specified percentage of pixels (we use 10% and 50%) with undefined areas, which can be seen as a simulation of occluded regions caused by perspective views. An evaluation in terms of peak signal-to-noise ratio (PSNR) for different amounts of undefined pixels and quantities of observations is shown in Figure 2. We compare our model to the TV-L1 formulation, the mean and the median computation. For the TV-L1 model and ours we present the plots computed for the optimal parameters determined by cross validation. One can see that our joint inpainting and denoising model, using the parameter setting τ = 0.05, σ = 1/(8τ), ε = 0.1, λ = 1.2 and 3 levels of wavelet decomposition (with 13,19-tap and Q-shift 14-tap filter kernels, respectively), performs best in both noise settings. Moreover, an increasing number of input observations significantly improves the result. Compared to the TV-L1 model, the wavelet-based regularization improves the PSNR by an average of 2 dB.

Fig. 2. Quantitative results for the Barbara image: PSNR as a function of the number of input observations for the mean, the median, TV regularization and DTCWT regularization, with 10% and 50% outliers (averaged input noise levels: 14.63 dB and 8.73 dB, respectively). Our proposed model using the wavelet-based regularization yields the best noise suppression.
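The degradation used in this experiment can be reproduced roughly as follows (our reading: undefined pixels are excluded through w^k and appear black in the observations):

```python
import numpy as np

def make_observations(img, K=10, outlier_frac=0.5, noise_sigma=0.01, seed=0):
    """Degrade a reference image (values in [0, 1]) into K observations:
    additive Gaussian noise plus a fraction of pixels marked undefined."""
    rng = np.random.default_rng(seed)
    fs, ws = [], []
    for _ in range(K):
        f = np.clip(img + rng.normal(0.0, noise_sigma, img.shape), 0.0, 1.0)
        w = (rng.random(img.shape) >= outlier_frac).astype(np.float64)
        fs.append(f * w)   # undefined pixels appear black, as in Fig. 1
        ws.append(w)
    return fs, ws

def psnr(u, ref):
    """Peak signal-to-noise ratio in dB for images with peak value 1."""
    return 10.0 * np.log10(1.0 / np.mean((u - ref) ** 2))
```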

4.2 Fusion of Real Images

Our second experiment focuses on orthographic color image fusion in aerial imagery. The images are taken with the Microsoft UltraCam from an aircraft in overlapping strips; each image has a resolution of 11500 × 7500 pixels with a ground sampling distance of approximately 10 cm. Depending on the overlap in the imagery, the mapped area provides up to ten redundant observations of the same scene. To obtain the required range data for each input image we use a dense matching algorithm similar to the method proposed in [1]. Taking into account the ranges and the available camera data, each pixel in the images can be transformed to common 3D world coordinates, forming a large point cloud in which each point is defined by location and color information. Introducing virtual orthographic cameras with a defined pixel resolution (we use the sampling distance of the original images) enables a projection of the point cloud of each scene observation to the ground plane (we simply set the height coordinate to a fixed value). Computed fusion results for different dimensions are shown in Figure 3. The obtained results show an improved natural appearance, resulting from the wavelet-based regularization in our fusion model.

Fig. 3. Some fusion results. The first column shows results obtained with the TV-L1 model (three input observations). The second column depicts corresponding images computed with our fusion model using a DTCWT regularization, which yields images with an improved natural appearance (λ = 1.0). Larger fusion results are given in the third column, where we exploit a redundancy of ten input images. The color fusion (1600 × 1600 pixels) can be obtained within two minutes. Best viewed in color.
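Assuming each pixel of an observation has already been lifted to 3D world coordinates via its range value and the camera data (that photogrammetric pipeline is not published), the projection into the orthographic grid amounts to a splatting operation, sketched here without the z-buffering a full implementation would need:

```python
import numpy as np

def splat_to_ortho(colors, world_xy, x0, y0, gsd, W, H):
    """Splat 3D points (ground-plane coordinates world_xy, in meters,
    with per-point colors) into a W x H orthographic grid anchored at
    (x0, y0) with ground sampling distance gsd (~0.10 m). Returns the
    warped observation f^k and its validity mask w^k; cells hit by no
    point stay undefined (w^k = 0)."""
    f = np.zeros((H, W, colors.shape[1]))
    w = np.zeros((H, W))
    j = ((world_xy[:, 0] - x0) / gsd).astype(int)
    i = ((world_xy[:, 1] - y0) / gsd).astype(int)
    ok = (i >= 0) & (i < H) & (j >= 0) & (j < W)
    f[i[ok], j[ok]] = colors[ok]   # last point wins per cell in this sketch
    w[i[ok], j[ok]] = 1.0
    return f, w
```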

4.3 Removal of Non-Stationary Objects

Non-stationary objects such as cars are disturbing in orthographic image generation. We therefore use our model to remove cars by simultaneous inpainting. Car detection masks can be obtained efficiently with an approach as described in [24]. In order to fill the detected car areas, our strategy is inspired by the work of Hays and Efros [25]: we perform scene completion with respect to the detection mask using a pool of potential exemplars. To do so, we randomly collect image patches (the dimension is adapted to a common car length) and apply image transformations like rotation and translation in order to synthetically increase the pool. To find the best matching candidate for each detected car we compute a sum of weighted color distances between a masked detection and each exemplar. The weighting additionally prefers pixel locations near the mask boundary and is derived using a distance transform. The detection mask with overlaid exemplars is then used as an additional input observation within the fusion model. Obtained removal results are shown in Figure 4.

Fig. 4. Inpainting results using a car detection mask. From left to right: the car detection mask, the fusion result computed without using a car detection mask, the result obtained by pure inpainting, and the inpainting with supporting exemplars. The car areas are successfully removed in both cases; however, the exemplar-based fill-in appears more natural. Best viewed in color.
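A sketch of the boundary-weighted exemplar score described in this section; the exact decay of the weights is not specified in the text, so the 1/(1 + d) falloff below is our assumption:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def exemplar_cost(patch, exemplar, mask):
    """Sum of weighted color distances between a masked detection
    (patch, with mask == 1 inside the car region) and a candidate
    exemplar. The distance transform gives each inside pixel its
    distance to the mask boundary, so nearby pixels dominate."""
    d = distance_transform_edt(mask)
    weight = mask / (1.0 + d)          # assumed decay toward the interior
    diff = np.sum((patch - exemplar) ** 2, axis=-1)
    return np.sum(weight * diff)
```

The lowest-cost exemplar is overlaid on the detection mask and enters the fusion model of Section 3 as an additional input observation.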



5 Conclusion

We have presented a novel variational method to fuse redundant gray and color images using wavelet-based priors for regularization. To compute the solution of our large-scale optimization problem we exploit an optimal first-order primal-dual algorithm, which can be accelerated using parallel computation techniques. We have shown that our fusion method is well suited for orthographic image generation in high-resolution aerial imagery, but also for an integrated exemplar-based fill-in to remove e.g. non-stationary objects like cars. Future work will concentrate on synthetic view generation in ground-level imagery, similar to the idea of [3], and on computing super-resolution from many redundant observations.

References

1. Hirschmüller, H.: Stereo vision in structured environments by consistent semi-global matching. In: Proc. Conf. on Comp. Vision and Pattern Recognition. (2006)
2. Fitzgibbon, A., Wexler, Y., Zisserman, A.: Image-based rendering using image-based priors. In: Proc. Int. Conf. on Comp. Vision. (2003)
3. Agarwala, A., Agrawala, M., Cohen, M., Salesin, D., Szeliski, R.: Photographing long scenes with multi-viewpoint panoramas. ACM Trans. on Graphics (SIGGRAPH) 25(3) (2006)
4. Strecha, C., Van Gool, L., Fua, P.: A generative model for true orthorectification. Int. Archives of Photogrammetry and Remote Sensing 37 (2008) 303-308
5. Goldluecke, B., Cremers, D.: A superresolution framework for high-accuracy multiview reconstruction. In: Proc. Pattern Recognition DAGM. (2009)
6. Unger, M., Pock, T., Werlberger, M., Bischof, H.: A convex approach for variational super-resolution. In: Proc. Pattern Recognition DAGM. (2010)
7. Zach, C., Pock, T., Bischof, H.: A globally optimal algorithm for robust TV-L1 range image integration. In: Proc. Int. Conf. on Comp. Vision. (2007)
8. Woodford, O.J., Reid, I.D., Torr, P.H.S., Fitzgibbon, A.W.: On new view synthesis using multiview stereo. In: Proc. British Machine Vision Conf. (2007)
9. Selesnick, I.W., Baraniuk, R.G., Kingsbury, N.G.: The dual-tree complex wavelet transform. Signal Processing Magazine 22(6) (2005) 123-151
10. Pajares, G., de la Cruz, J.M.: A wavelet-based image fusion tutorial. Pattern Recognition 37(9) (2004) 1855-1872
11. Tikhonov, A.N.: On the stability of inverse problems. Doklady Akademii Nauk SSSR 39(5) (1943) 195-198
12. Rudin, L., Osher, S.J., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D 60 (1992) 259-268
13. Nikolova, M.: A variational approach to remove outliers and impulse noise. Journal of Mathematical Imaging and Vision 20(1-2) (2004) 99-120
14. Carlavan, M., Weiss, P., Blanc-Féraud, L., Zerubia, J.: Complex wavelet regularization for solving inverse problems in remote sensing. In: Proc. Geoscience and Remote Sensing Society. (2009)
15. Portilla, J., Simoncelli, E.P.: A parametric texture model based on joint statistics of complex wavelet coefficients. Int. Journal of Comp. Vision 40 (2000) 49-71
16. Fadili, M., Starck, J.L., Murtagh, F.: Inpainting and zooming using sparse representations. Computer Journal 52(1) (2009)




17. Candès, E., Demanet, L., Donoho, D., Ying, L.: Fast discrete curvelet transforms. Multiscale Modeling and Simulation 5(3) (2006) 861-899
18. Starck, J.L., Elad, M., Donoho, D.: Image decomposition via the combination of sparse representations and a variational approach. Trans. on Image Processing 14 (2004) 1570-1582
19. Huber, P.: Robust Statistics. Wiley, New York (1981)
20. Nesterov, Y.: Smooth minimization of non-smooth functions. Mathematical Programming Series A 103 (2005) 127-152
21. Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. Journal on Optimization 15(1) (2004) 229-251
22. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Technical report, TU Graz (2010)
23. Esser, E., Zhang, X., Chan, T.: A general framework for a class of first order primal-dual algorithms for TV minimization. Technical Report 67, UCLA (2009)
24. Grabner, H., Nguyen, T.T., Grabner, B., Bischof, H.: On-line boosting-based car detection from aerial images. J. of Photogr. and R. Sensing 63(3) (2008) 382-396
25. Hays, J., Efros, A.A.: Scene completion using millions of photographs. ACM Trans. on Graphics (SIGGRAPH) 26(3) (2007)
