Research statement - UCLA Department of Mathematics

Dr. Dominique Zosso — Summary of Research Interests

Abstract

Images and imaging are ubiquitous today. Nearly every mobile phone has an integrated camera, and many users are quasi-professional "photoshoppers". There is, however, more to image understanding than applying a canvas texture filter, removing red eyes, or correcting motion blur (although that last one is already a pretty hard problem). Images are more than just the output of cheap phone cameras or expensive high-end gear: they are also essential in remote sensing, medicine, biology, computer vision, and beyond. Moreover, the role of images is not only to picture a scene aesthetically, but to convey information. In medical imaging, for example, we extract information from images about a physical reality that is hardly accessible to inspection otherwise, such as intra-cranial MRI. Image understanding encompasses individual problems such as denoising, segmentation, classification, and decomposition, on which scientists have been working for decades. And still, many of the associated research questions have not yet been answered to full satisfaction. Mathematically speaking, most image processing tasks are instances of inverse problems, where one looks for an underlying, hidden layer of information given a set of derived measurements. New applications, new types of images, and new questions to be answered through imaging come up every day. My research interest is to ride this highly dynamic wave at the cutting edge of image understanding. I want to contribute new mathematical models and efficient schemes for their computation, to provide new and better ways of getting information out of images, and to create new possibilities of imaging in the first place. Taking images is easy. Understanding them is the challenge.

Motivation

Ever since the first cave paintings, images and imaging have played important roles in human civilization. Early "research" was driven by the urge to perfect the fidelity of those images. The first successful use of heliography by Joseph Nicéphore Niépce in 1826 ultimately marks the beginning of "automated painting" as an entirely physical imaging process, which shortcuts the need for human understanding of the depicted object altogether.

The discovery of X-rays, commonly attributed to Wilhelm Conrad Röntgen in 1895, sparked the extension of imaging beyond the boundaries of normal human visual perception. From then on, it became possible to "see" information about objects that was otherwise hidden and inaccessible. The spectrum of imaging modalities has not stopped broadening since, each modality picturing a different physical property of the underlying object. Modern-day imaging is now heavily challenged by the backward problem of gaining insight into the physical reality of an object, given one or several derived images. Think of clinical health care without the assistance of non-invasive imaging techniques such as computed tomography or magnetic resonance imaging, or of modern life-science research without appropriate imaging instrumentation.

If we continue the journey through "imaging history", we realize that nowadays, thanks to the exploding number of image acquisition devices and the availability of cheap (electronic) computing power, not only the process of imaging itself, but increasingly also the interpretation of acquired images, is to a large extent left to non-human automata. The computational problems related to this task are generally referred to as image processing and computer vision, and their ultimate goal is to achieve image understanding. Typical tasks include, but are not limited to: image restoration, segmentation, registration, classification, decomposition, and stereo and multi-view scene reconstruction. The range of applications is enormous: medical imaging, biological imaging, non-destructive testing, remote sensing, surveillance and monitoring, robotics, and even consumer electronics.



Recent Graduate Research

During my PhD thesis at EPFL, I worked on a number of image processing problems. I would like to summarize the most important ones.

Geometric image registration

In this project, I first developed a novel geometric framework called geodesic active fields (GAF) for image registration in general. In image registration, one looks for the underlying deformation field that best maps one image onto another; see figure 1. This is a classic ill-posed inverse problem, which is usually solved by adding a regularization term. Here, instead, I proposed a multiplicative coupling between the registration term and the regularization term, the Beltrami energy of the embedded deformation field; see figure 2. This approach turns out to be equivalent to embedding the deformation field in a weighted minimal surface problem. The deformation field is then driven by a minimization flow toward a harmonic map corresponding to the solution of the registration problem. The proposed approach shares close similarities with the well-known geodesic active contours model in image segmentation, where the segmentation term (the edge detector function) is likewise coupled with the regularization term (the length functional) via multiplication. In fact, our geometric model is the exact mathematical generalization to vector fields of the weighted length problem for curves and surfaces.

Compared to specialized state-of-the-art methods tailored for specific applications, our geometric framework makes several important contributions. Firstly, our general formulation for registration works on any parametrizable, smooth, and differentiable surface, including non-flat and multiscale images. In the latter case, multiscale images are registered at all scales simultaneously, and the relations between space and scale are intrinsically accounted for. Secondly, this method is, to the best of our knowledge, the first re-parametrization-invariant registration method introduced in the literature. Thirdly, the multiplicative coupling between the registration term, i.e. the local image discrepancy, and the regularization term naturally results in a data-dependent tuning of the regularization strength. Finally, by choosing the metric on the deformation field, one can freely interpolate between classic Gaussian and more interesting anisotropic, TV-like regularization.

I then provided an efficient numerical scheme that uses a splitting approach: the data and regularity terms are optimized over two distinct deformation fields that are constrained to be equal via an augmented Lagrangian approach. This fast scheme is called FastGAF.
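The splitting idea behind FastGAF can be sketched in a few lines. The following is a toy illustration only, not the actual scheme: it assumes a 1-D field, a quadratic data term pulling toward an observed field f, and a simple quadratic smoothness term in place of the real image discrepancy and weighted Beltrami energy; all parameter values are arbitrary.

```python
import numpy as np

# Sketch of the splitting idea: minimize data(u) + regularity(v) subject to
# u = v, via an augmented Lagrangian (ADMM-style) iteration. The data term
# here is |u - f|^2 and the regularity term is quadratic smoothing of v;
# the actual FastGAF method uses the image discrepancy and the weighted
# Beltrami energy instead (this simplification is an assumption).

def split_registration(f, mu=1.0, lam=0.1, iters=200):
    u = f.copy()            # field optimized against the data term
    v = f.copy()            # field optimized against the regularity term
    b = np.zeros_like(f)    # scaled Lagrange multiplier enforcing u = v
    for _ in range(iters):
        # u-subproblem: min_u |u - f|^2 + (mu/2)|u - v + b|^2  (closed form)
        u = (2 * f + mu * (v - b)) / (2 + mu)
        # v-subproblem: smoothness vs. agreement with u + b, solved here by
        # one Jacobi-style averaging step (an illustrative shortcut)
        avg = 0.5 * (np.roll(v, 1) + np.roll(v, -1))
        v = (mu * (u + b) + 2 * lam * avg) / (mu + 2 * lam)
        # multiplier update: penalize the remaining gap between u and v
        b = b + u - v
    return u, v

# toy 1-D "deformation field": smooth signal plus noise
rng = np.random.default_rng(0)
f = np.sin(np.linspace(0, 2 * np.pi, 100)) + 0.1 * rng.standard_normal(100)
u, v = split_registration(f)
print("max |u - v| residual:", float(np.max(np.abs(u - v))))
```

At convergence the two copies of the field agree, so the split problem is equivalent to the original coupled one; each subproblem on its own is simple and fast to solve.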

Registration of cortical feature maps and automatic parcellation

Much like a tree, the human brain exhibits a highly convoluted and irregular structure, with a great deal of complexity and variability: sulci and gyri vary a lot between subjects. On the other hand, high-level structures of the brain, the "big picture", are highly conserved, such as the two hemispheres, the lobes, and the main folds. A hierarchical representation of these structures is important, for example, in the context of intersubject registration: considering the complexity of the cortical surface, directly involving local small-scale features would mislead the registration into getting trapped in local minima. A robust method needs to rely on large-scale features describing the main landmarks of the cortex, such as the main gyri or sulci. Subsequently, the features are iteratively refined to drive the registration more locally and reach the desired precision.

Here, I proposed a scale-space that lends itself to a meaningful hierarchical representation of structures on cortical feature maps on the sphere. The scale-space is expected to produce "generic" brain images at coarse scales, adding more and more individual details at finer scales, as shown in figure 3. Finally, I provided an implementation of the FastGAF method on the sphere, so as to register the triangulated cortical feature maps. We built an automatic parcellation algorithm for the human cerebral cortex, which combines the delineations available on a set of atlas brains in a Bayesian approach to automatically delineate the corresponding regions on a subject brain given its feature map (figure 4).



Figure 1: Image registration. The human skull is registered to chimpanzee and baboon by finding the deformation fields u(x), s.t. human features match chimpanzee/baboon at x + u(x).

Figure 2: The Swiss view of the Beltrami framework. A flat map corresponds to a higher-dimensional reality. Here, the intensity information (false-color) of the rectangular topographic map of the Matterhorn region translates directly into a three-dimensional terrain model. The surface of the Matterhorn, as measured by the Beltrami energy of its map, is a direct measure of its "regularity". (geodata © swisstopo)

Figure 3: Scale-space of cortical feature maps. (a)–(b) The equalized spherical map and projection on the partially inflated surface, at scales k. (c) Median thresholding illustrates the simplification of structures. Main structures, e.g. the central sulcus (red), were well preserved at coarser scales. (d)–(e) Magnitude and orientation of the gradient of the map.

Figure 4: Automatic parcellation (panels: Ground Truth, Automatic, Error on GT, Error on Feature). After registering a subject to the brains in the atlas, we perform automatic labelling. The first and second rows show manual ("ground truth") and automatic parcellation results on the pial and inflated cerebro-cortical surface. The incorrect regions are marked in the third row.


Geometric image segmentation with Harmonic Active Contours

In the meantime, I also worked on using the Beltrami energy as the single term in a variational framework for image segmentation. As a further extension of the Beltrami framework, we propose a segmentation method based on the geometric representation of images as surfaces embedded in a higher-dimensional space, enabling us to work naturally with multichannel images. The segmentation is based on an active contour, described as the zero level set of the level set function, which is embedded in the image manifold along with a set of image features.

Hence, both the data-fidelity and regularity terms of the active contour are jointly optimized by minimizing a single, unweighted Beltrami energy representing the hyper-surface of this embedded manifold. The approach is purely geometric and does not require additional weighting of the energy functional to drive the segmentation toward the image contours. Indeed, the joint embedding of the image features and the level set both regularizes the level set and couples it to the image features. We have implemented both gradient-based and region-based criteria. The potential of such a geometric approach lies in the general definition of Riemannian manifolds, enabling the use of the proposed technique in scale-space methods, volumetric data, or catadioptric camera images. This constitutes a more general framework for image segmentation, defining the optimal segmentation function as the harmonic map minimizing the hyper-surface of the manifold. The proposed technique is therefore called Harmonic Active Contours (HAC).

Efficient algorithm for the level set method

The level set method, as also used in the aforementioned HAC framework, is a popular technique for tracking moving interfaces in several disciplines, including fluid dynamics and computer vision. However, despite its high flexibility, the level set method is limited by two important numerical issues. Firstly, the level set algorithm is slow, because the time step is limited by the standard CFL condition, which is essential to the numerical stability of the iterative scheme. Secondly, the level set method does not implicitly preserve the level set function as a distance function, which is necessary to ensure a good estimate of the contour normal and the curvature during the evolution. We therefore propose a new algorithm that overcomes these two fundamental problems of the level set method. The algorithm is based on recent advances in ℓ1 optimization techniques, some of which have already been employed in the fast algorithm for optimizing the weighted Beltrami energy, above. The main idea is to split the original hard problem into sub-problems that are well known and easy to solve, and to combine them using an augmented Lagrangian approach that guarantees equivalence with the original problem. This idea is borrowed from recent efficient ℓ1 minimization methods originally applied to compressed sensing.
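To make the first issue concrete, here is a minimal explicit level-set evolution: an interface expanding with unit normal speed, integrated with an upwind scheme. This is a sketch under simple assumptions (toy grid, arbitrary sizes and CFL factor), not the proposed algorithm; it exhibits both limitations named above: the stable time step shrinks with the grid spacing, and the evolved function drifts away from a signed distance function.

```python
import numpy as np

# Explicit level-set evolution phi_t + |grad phi| = 0 (outward unit speed),
# discretized with a Godunov upwind scheme. The grid size, radius, step
# count, and the CFL safety factor are arbitrary demo choices.

n = 64
h = 1.0 / n
x = np.linspace(-0.5, 0.5, n)
X, Y = np.meshgrid(x, x, indexing="ij")
phi = np.sqrt(X**2 + Y**2) - 0.2      # signed distance to a circle, r = 0.2

dt = 0.5 * h                          # explicit step is CFL-limited: dt ~ h
area0 = int((phi < 0).sum())          # cells inside the initial interface
for _ in range(25):
    dxm = (phi - np.roll(phi, 1, axis=0)) / h   # backward differences
    dxp = (np.roll(phi, -1, axis=0) - phi) / h  # forward differences
    dym = (phi - np.roll(phi, 1, axis=1)) / h
    dyp = (np.roll(phi, -1, axis=1) - phi) / h
    # Godunov upwind gradient magnitude for outward motion (speed F = +1)
    grad = np.sqrt(np.maximum(dxm, 0)**2 + np.minimum(dxp, 0)**2
                   + np.maximum(dym, 0)**2 + np.minimum(dyp, 0)**2)
    phi -= dt * grad                  # phi decreases: the interface expands

print("inside cells before/after:", area0, int((phi < 0).sum()))
```

Note that after the evolution phi is no longer a distance function (e.g. a kink persists at the center), which is precisely what forces classical implementations to re-initialize periodically.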

Current Postdoctoral Research

Currently, I am a researcher at the Department of Mathematics at UCLA, on a fellowship of the Swiss National Science Foundation (SNF). This fellowship gives me great independence in my research.

A Unifying Geometric Framework for Image Processing

My main focus, the topic for which my current fellowship was granted, is research on a "unifying geometric framework" for image processing. We believe that the Beltrami framework has great potential as a generalized regularizing functional, in particular if equipped with a weighting function that allows multiplicative coupling of the data term with the regularization term of the specific inverse problem. Indeed, the resulting energy functional has very interesting fundamental properties, such as geometric regularization interpolating between Gaussian and TV regularization, re-parametrization invariance, applicability to any Riemannian manifold, and intrinsic, automatic, data-dependent modulation of the local regularization strength.
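For concreteness, a minimal sketch of the functional in question, in the standard Beltrami form; the simple gray-level graph embedding and the constant aspect ratio β are illustrative assumptions (the general framework works with arbitrary embeddings into Riemannian manifolds):

```latex
% Weighted Beltrami energy, sketched for a gray-level image I(x,y)
% embedded as the graph surface X = (x, y, \beta I(x,y)); the weight f
% provides the multiplicative coupling with the data term, and f \equiv 1
% recovers the plain (area-type) Beltrami energy.
E[I] \;=\; \int_\Omega f\bigl(x, I(x)\bigr)\,\sqrt{\det g}\;\mathrm{d}x,
\qquad
\sqrt{\det g} \;=\; \sqrt{1 + \beta^2\,\lvert\nabla I\rvert^{2}}.
% Small \beta yields near-quadratic (Gaussian) smoothing, while large
% \beta approaches TV-like regularization: the interpolation noted above.
```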



Research in related fields, such as compressed sensing and optimization theory, has yielded very efficient optimization algorithms, which can be applied to the weighted Beltrami framework as well. We postulate that the weighted Beltrami framework represents an important step toward a unifying variational framework for geometric image processing, with a high degree of generality and a multitude of beneficial properties. We therefore investigate if and how other inverse problems in computer vision, image processing, and related domains can be generalized by the weighted Beltrami framework, and develop robust and fast numerical schemes to optimize them.

Beyond this, I have developed two different generalizations of the Beltrami energy that make the functional applicable to more interesting and more powerful regularization problems.

Beltrami diffusion in the space of patches

First, I am working on weighted Beltrami regularization in the space of patches. Recently, the use of patches has significantly gained importance in various image processing applications. Indeed, the intensity or color information contained in a single local pixel is often not sufficient to completely characterize that pixel. Neighborhood information is required in order to better differentiate between textural features and noise, and diffusion on the space of patches was proposed mainly by Tschumperlé. Patch-based embeddings of the "Beltrami kind" were proposed as texture-aware edge indicators and for denoising. Only very recently was a computationally more interesting minimization scheme proposed.

Here, I propose a novel model for image restoration based on anisotropic diffusion on the space of patches, using the Beltrami embedding. We derive a local multiplicative coupling from the standard additive scheme and show how this automatically introduces an edge-aware pre-conditioner for the diffusion. We propose a splitting scheme that naturally handles the patch overlap and the different non-linearities in a very elegant and efficient way.

Graph-based and non-local Beltrami energy

Beyond patch diffusion, (patch-based) non-local regularization currently produces very promising results. For example, the current denoising state of the art is achieved by sparsification in a patch-group transform domain (BM3D). Currently, I am working on rendering the Beltrami energy fully non-local, by extending its definition to non-local operators as introduced, e.g., by Osher and Gilboa. This extension makes the benefits of Beltrami regularization, such as the intrinsic inter-channel coupling in vectorial or color images, readily available for data defined on graphs. This applies to non-local regularization, where the graph edge weights are defined by patch distances, but we equally see important uses in color-image processing, where node distances are defined by various color distances instead. Moreover, the anisotropic, inherently multichannel Beltrami regularization thereby becomes available for any graph-based inverse problem, such as clustering or segmentation, with immediate applications in machine learning.
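As a toy illustration of the non-local operators involved (in the style of the Gilboa–Osher operators referenced above; the patch size, the scale h, and the periodic test signal are all assumptions), the following builds patch-distance graph weights for a 1-D signal and applies the corresponding non-local Laplacian:

```python
import numpy as np

# Non-local graph operators on a 1-D signal: edge weights from patch
# distances, then the graph Laplacian (Lx)_i = sum_j w_ij (x_i - x_j).
# On a self-similar signal, the non-local Laplacian stays near zero even
# across distant repetitions, unlike a local finite-difference operator.

def patch_weights(x, radius=2, h=0.5):
    # row i of P is the (wrap-around) patch of 2*radius+1 samples around i
    P = np.stack([np.roll(x, -s) for s in range(-radius, radius + 1)], axis=1)
    d2 = ((P[:, None, :] - P[None, :, :])**2).sum(axis=2)  # patch distances
    W = np.exp(-d2 / h**2)
    np.fill_diagonal(W, 0.0)        # no self-loops
    return W

def nonlocal_laplacian(x, W):
    return W.sum(axis=1) * x - W @ x

x = np.tile([0.0, 0.0, 1.0, 1.0], 8)   # periodic, self-similar signal
W = patch_weights(x)
lap = nonlocal_laplacian(x, W)
print("max |non-local Laplacian|:", float(np.max(np.abs(lap))))
```

Replacing the patch distance with a color distance between nodes gives the color-image variant mentioned above, and any graph with such weights can carry the same regularization.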

Non-local Retinex

Retinex is a theory of human visual perception, introduced in 1971 by Edwin Land. It was an attempt to explain how the human visual system, as a combination of processes in both the retina and the cortex, is capable of adaptively coping with illumination that varies spatially in both intensity and color. In image processing, the retinex theory has been implemented in various flavors, each particularly adapted to specific tasks, including color balancing, contrast enhancement, dynamic range compression, and shadow removal in consumer electronics and imaging, bias field correction in medical imaging, and even illumination normalization, e.g. for face detection.

In this project, I develop a unifying framework for retinex that is able to reproduce many of the existing retinex implementations within a single model, including all gradient-fidelity-based models, variational models, and kernel-based models. The fundamental assumption, shared with many retinex models, is that the observed image is a locally pointwise multiplication between the illumination and the true underlying reflectance of the object. Taking the logarithm splits the observed image into a sum of contributions from reflectance and illumination. Starting from Morel's 2010 PDE model for retinex, where the illumination is assumed to vary smoothly and the reflectance is thus recovered from a hard-thresholded Laplacian of the observed image via a Poisson equation, we define our retinex model in two similar but more general steps.

The first step looks for a filtered gradient that is the solution of an optimization problem consisting of two terms: the first term is a sparsity prior on the reflectance, such as the TV or H1 norm; the second term is a quadratic fidelity prior on the reflectance gradient with respect to the observed image gradients. Since this filtered gradient is almost certainly not a consistent image gradient, we then look for a reflectance whose actual gradient comes close, while possibly respecting other priors and constraints on the reflectance or illumination.
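The two steps can be illustrated on a 1-D toy example. This sketch replaces the general optimization of step one with simple hard thresholding of the gradient, in the spirit of the Morel-style PDE model, and step two with direct reintegration; the signal and the threshold value are assumptions.

```python
import numpy as np

# 1-D retinex toy: in the log domain the observation is s = r + l, with
# piecewise-constant log-reflectance r and smooth log-illumination l.
# Step 1 filters the gradient (hard thresholding: small gradients are
# attributed to illumination); step 2 reintegrates the filtered gradient
# to recover the reflectance up to an additive constant.

n = 200
t = np.linspace(0, 1, n)
r = np.where(t > 0.5, 1.0, 0.0)      # true log-reflectance: one sharp edge
l = 0.8 * t                          # smooth log-illumination ramp
s = r + l                            # observed log-image

ds = np.diff(s)                      # observed gradient
eps = 0.05                           # threshold (assumed value)
ds_f = np.where(np.abs(ds) > eps, ds, 0.0)         # step 1: filtered gradient

r_hat = np.concatenate([[0.0], np.cumsum(ds_f)])   # step 2: reintegration
r_hat -= r_hat.min()                 # fix the free additive constant

print("max reconstruction error:", float(np.max(np.abs(r_hat - r))))
```

In the full model, the thresholding is replaced by the sparsity-plus-fidelity optimization, the reintegration by a least-squares (Poisson-type) problem, and the finite differences by non-local operators.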

Our framework makes extensive use of non-local differential operators, of which the classical local finite-difference operators are just special cases. Moreover, the freedom to choose different sparsity and fidelity norms in the first and second steps, respectively, allows us to reproduce many of the existing retinex models, as illustrated in figure 5. As validation and illustration, we provide extensive comparisons of the existing models with their equivalents in the proposed unifying retinex framework.

More importantly, by using different and more interesting non-local weightings for the sparsity and fidelity priors, we are able to derive entirely novel retinex formulations. This allows us to define new retinex instances particularly suited for drop-shadow removal, texture-preserving shadow removal, cartoon-texture decomposition, color image enhancement, and even shadow removal in hyperspectral images, all within a single framework, just by selecting different norms and non-local weights.

Subvoxel segmentation in diffusion-weighted magnetic resonance images

Precise segmentation of diffusion-weighted MR images (DWI) is a difficult task due to their very limited resolution (currently around 2.2 × 2.2 × 3 mm³). In brain tractography and subsequent connectivity analysis, however, precise segmentation of the CSF-grey-matter and white-matter-grey-matter interfaces is an important task in order to achieve robust fiber-tracking performance and consistent parcellation of the cortical surface across subjects. The diffusion-weighted acquisitions often suffer from severe distortions due to increased local field inhomogeneities, in particular in the anterior part of the brain and along the phase-encoding direction. On the other hand, anatomical MR images (e.g. T1- and T2-weighted acquisitions) can be produced with substantially better resolution (1 mm³ or less), relatively low distortion, and well-understood tissue contrast. Also, many excellent state-of-the-art methods are readily available to perform high-quality segmentation of the intracranial structures, including the important CSF-grey-matter and white-matter-grey-matter interfaces.

Here, we work on the segmentation problem in the poorly defined diffusion space by exploiting the same-subject anatomy easily extracted from T1-weighted images as strong shape priors. We reformulate the segmentation problem as an inverse problem, where we look for an underlying deformation field (the distortion) mapping from T1 space into diffusion space, such that the structures identified with high confidence in the T1 image optimally align with structures in diffusion space.

Variational adaptive mode decomposition

Many real-world signals are superpositions of underlying components of different spectral "color". The different colors of light, radio waves, electrical currents, etc., are all examples of this. In many cases, one is interested in isolating a particular component out of this blend, like the modulated carrier signal of a desired radio station. In other cases, we wish to recover several or even all of the individual components, less the noise, that form the observed signal, with applications in many domains, including image decomposition. In the case of frequency-separated modes,



Figure 5: Unifying non-local retinex model (columns: Input, Reflectance, Illumination). Retinex is a model attempting to explain the capacity of the human visual system to normalize inhomogeneous illumination, and many particular, application-specific implementations exist. Our unifying model decomposes an input image into underlying reflectance and estimated illumination, and successfully reproduces basic retinex behavior (grey squares and checkerboard). The same model allows dynamic range compression and local contrast enhancement (radiography), shadow detection and removal (tiles), texture-cartoon decomposition (checkerboard), and illumination-invariant feature extraction for face detection. We are working on extensions to color and hyperspectral images.



filter banks or wavelet decompositions are widely used techniques. The drawback here, however, is that one computes coefficients for far more components than are actually present, and that the shape of the components is often overly restricted and predefined. Adaptive band decomposition techniques have therefore gained importance, such as the empirical mode decomposition (EMD) algorithm.

We are currently working on a variational model for adaptive mode decomposition, in which we estimate from the input signal a number of modes that are mostly (but not strictly) band-separated, and whose carrier frequency and bandwidth are determined online, adaptively. Indeed, we perform Wiener filtering on the demodulated input signal for each individual mode, and the resulting optimization problems are a mixture of adaptive Gabor-wavelet filtering and spectral Gaussian mixture models.
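The core update suggested by this description can be sketched as follows. This is a simplified, illustrative implementation, not the full variational model: a fixed number of modes, no noise slack or exact-reconstruction multiplier, and assumed values for the bandwidth parameter alpha and the iteration count.

```python
import numpy as np

# Sketch of adaptive mode decomposition: each mode's spectrum is updated by
# Wiener filtering the residual around its current center frequency, and the
# center frequency is re-estimated as the spectral center of mass of the
# mode, so carriers and bandwidths adapt to the signal online.

def mode_decomposition_sketch(x, K=2, alpha=2000.0, iters=50):
    n = x.size
    f = np.fft.rfftfreq(n)                 # normalized frequencies in [0, 0.5]
    X = np.fft.rfft(x)
    U = np.zeros((K, f.size), dtype=complex)    # mode spectra
    omega = 0.5 * (np.arange(K) + 1) / (K + 1)  # spread initial centers
    for _ in range(iters):
        for k in range(K):
            residual = X - U.sum(axis=0) + U[k]
            # Wiener filter of the residual around the current center omega_k
            U[k] = residual / (1.0 + alpha * (f - omega[k])**2)
            power = np.abs(U[k])**2
            # update the carrier: spectral center of mass of the mode
            omega[k] = (f * power).sum() / max(power.sum(), 1e-16)
    modes = np.fft.irfft(U, n=n)
    return modes, omega

# two well-separated tones at normalized frequencies 0.05 and 0.2
t = np.arange(1000)
x = np.cos(2 * np.pi * 0.05 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, omega = mode_decomposition_sketch(x, K=2)
print("recovered center frequencies:", np.sort(omega))
```

The quadratic penalty around each center frequency is exactly a Wiener filter on the demodulated mode, which is what ties this sketch to the Gabor-wavelet and spectral Gaussian mixture interpretation above.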

Future Research: Vision and Objectives

My research is driven by the quest for better models and tools for image understanding. I want to continue working on a broad spectrum of image processing and computer vision problems that all have useful applications in science and society. Image segmentation, decomposition, denoising, restoration, illumination normalization: these tasks are all routinely performed in medical image scanners as well as in consumer electronics, but the research is far from over. As imaging modalities become more and more complex, so do the questions to be answered by images. And the computational complexity of the inverse problems involved keeps growing.

More challenging forms of images, such as omni-directional images or maps on more complicated surfaces such as the human cortex, are becoming more important, and most of the established image processing tools do not directly translate to these new domains. Generalizing even further, data points on graphs can be considered images, and many machine learning tasks, such as clustering or classification, have a structure very similar to image processing problems.

Answering upcoming medical imaging problems still requires a great deal of original research in applied mathematics, a great deal of mathematical modeling intuition, and the synthesis of many powerful existing concepts, such as convex optimization tools, compressed sensing, Beltrami regularization, and non-local operators, coupled with new insights from optimization, numerical methods, and scientific computing. I am strongly committed to further enlarging my toolbox in applied mathematics and to dedicating myself to research on new image understanding solutions.

Dec. 2012, Dominique Zosso<br />

