Graz University of Technology

Institute for Computer Graphics and Vision

Master’s Thesis

Globally Optimal

TV-L1 Shape Prior Segmentation

Manuel Werlberger

Graz, Austria, May 2008

Thesis supervisor

Univ. Prof. DI Dr. Horst Bischof

Instructor

DI Dr. Thomas Pock


Abstract

Interpreting an image is a common and challenging task in computer vision. A human observer does not only use intensity or color information or other basic features when looking for region boundaries but also takes prior knowledge into account. This increases the robustness of the segmentation result for most images. The main intention of our work is to propose a globally optimal segmentation algorithm that incorporates prior knowledge in the form of a geometric shape. The proposed energy is based on a weighted Total Variation energy and is optimized with fast numerical approaches such as the projected gradient descent method. The GPU-based implementation is able to achieve real-time performance for the presented applications. We show how the proposed energy model relates to earlier variational methods such as the well-known edge-preserving restoration model of Rudin, Osher and Fatemi and to methods that incorporate prior information into classical segmentation models. Different applications are realized with the proposed energy. First of all, a semi-automatic, interactive segmentation tool is implemented. The user can either define a shape prior on the fly using the weighted Total Variation as geodesic active contour or load a predefined geometric shape. Next, the energy model can be used to align two shapes with each other or to optimize the alignment of a shape to an underlying edge function. Consequently, a tracking approach is introduced with the ability to optimize the incorporated shape information according to consecutive frames. This position update is also used when processing 3D data sets with a 2D prior, which is particularly useful for segmenting tubular structures in medical data sets with a single constraint on the first slice.

Keywords.

Segmentation, Geodesic Active Contour, Prior Knowledge, Shape Prior,

Total Variation, Variational Methods, globally optimal, GPU



Acknowledgments

First of all, I would like to thank my family for always supporting me and giving me the

opportunity to follow the educational career to my liking. I am grateful to Prof. Horst

Bischof for supervising my master’s thesis. Special thanks go to Dr. Thomas Pock for

the guidance, for all the inspirational discussions and the time spent answering all my

questions and proof-reading my thesis. Without him this thesis would not have been

possible in this form. Many thanks to all the members of the Institute of Computer

Graphics and Vision who always had time for discussions and suggestions for my work.

In particular, these are Markus Unger, who shared an office with me for several

months, Martin Urschler, Bernhard Kainz and Werner Trobin.

During my studies I got to know many different people and gained some sincere

friendships. I am thankful for the support of my friends. I am much obliged to Michael

Rabatscher for the successful collaboration in many lectures and the supporting talks

we had. Finally, I would like to thank Julia for her love and all her suggestions for

improvements after reading my thesis.



Contents

1 Introduction
1.1 Motivation and Problem Statement
1.2 Organization of the Master’s Thesis
1.3 Digital Image and its Continuous Space
1.4 Well- or Ill-Posed Problem?
1.5 What Makes Image Segmentation a Hard Task?
1.6 Shape Prior

2 Related Work
2.1 Image Segmentation
2.2 Edge-Based Segmentation
2.2.1 Edge Detectors
2.2.2 Scale Space Theory
2.2.2.1 Canny-Edge Detector
2.2.3 Snakes
2.2.4 Balloons
2.2.5 Geodesic Active Contours
2.3 Region-Based Segmentation
2.3.1 Thresholding
2.3.2 Region splitting and merging
2.3.2.1 Region Merging
2.3.2.2 Region Splitting
2.3.2.3 Splitting and Merging
2.3.3 Mumford-Shah
2.3.4 Chan-Vese
2.4 Shape Prior Segmentation
2.4.1 Introduction to the Level Set Framework
2.4.2 Leventon et al. approach
2.4.3 Diffusion Snakes
2.4.4 Chen et al. approach
2.4.5 Global-to-Local Shape Registration
2.4.6 Shape Prior Driven Mumford-Shah Functional
2.5 Discussion

3 Geodesic Active Contour with L1 Shape Prior
3.1 Geodesic Active Contours with Shape Information
3.2 Review of Total Variation Models
3.2.1 Rudin Osher and Fatemi - Noise Removal
3.2.2 The L1 Data Fidelity Term
3.2.3 Weighted Total Variation
3.3 Geometric Similarity
3.3.1 Distance Measures
3.3.2 L1 Shape Prior
3.3.3 Rigid Shape Transformation φ ◦ s
3.4 TV-L1 Shape Prior Segmentation
3.5 Solutions to Variational Models
3.5.1 Calculus of Variation
3.5.2 Explicit Solution of the ROF Model
3.5.3 Dual Formulation
3.5.4 Solving the TV-L1 Model
3.6 Solving the Shape Prior Segmentation Model
3.6.1 Minimize u for fixed v and φ: Projected Gradient Descent
3.6.2 Minimize v for fixed u and φ
3.6.3 Optimize the Rigid Transformation φ for fixed u and v
3.6.3.1 Discussion
3.7 Iterative Scheme to Solve the Segmentation Model

4 Implementation
4.1 GPU design
4.2 CUDA
4.3 Is there an alternative to CUDA?
4.4 Implementation Details

5 Applications and Results
5.1 Applications
5.2 Interactive Medical Image Segmentation
5.3 Shape Alignment
5.4 Tracking Application
5.5 Processing 3D CT/MR Data

6 Conclusion and Outlook
6.1 Conclusion
6.2 Outlook

Bibliography


List of Figures

1.1 Example of an ill-posed problem.
1.2 Importance of prior knowledge.
2.1 The task of image segmentation depends on the given image and its pictured objects.
2.2 Two different edge profiles and their first and second order derivative
2.3 Edge detection on different scales.
2.4 Histogram of a grayscale image with a clear bound to do a fore- and background segmentation.
2.5 Segmentation with image thresholding
2.6 The Ambrosio-Tortorelli approximation of the Mumford-Shah functional.
2.7 Chan-Vese segmentation result.
2.8 Defining level set properties
2.9 Contour topology change during the evolution of the level set function.
2.10 Contour optimization with respect to statistical shape priors
2.11 Segmentation of an epicardium
2.12 Global-to-Local Registration
2.13 Cardiac Tracking
2.14 Comparison of Chan-Vese Model to shape prior segmentation proposed by Bresson et al.
3.1 Denoising with TV-L2.
3.2 TV-L2 filter applied to a resolution test chart.
3.3 TV-L1 filter applied to a resolution test chart.
3.4 Shape similarity measure.
3.5 Evaluation of different λ settings.
4.1 Schematic sequence of shader units in a traditional GPU rendering pipeline.
4.2 Principle of unified shader model.
4.3 Block Diagram of GeForce 8800 GTX.
4.4 Provided memory of the GeForce 8 Series.
4.5 Thread organization in grids and blocks for kernel execution.
5.1 Workflow of shape prior segmentation.
5.2 Segmentation of the first phalanx of an index finger and a metacarpal bone of a ring finger.
5.3 Segmentation of a vertebra.
5.4 Segmentation of a hand.
5.5 Shape alignment with user interaction.
5.6 Alignment of a given shape to a metacarpal bone of a ring finger.
5.7 Alignment of a given shape to the second phalanx of an index finger.
5.8 Segmentation of the left ventricle.
5.9 Position optimization for a shape of the right atrium.
5.10 Alignment of a single vertebra (sagittal).
5.11 Alignment of a single vertebra (coronal).
5.12 Alignment of multiple vertebrae.
5.13 Hand shape alignment.
5.14 Real-time tracking example.
5.15 Daimler-Chrysler tracking sequence.
5.16 Aorta segmentation.
5.17 3D visualization of aorta segmentation.


Chapter 1

Introduction

Contents

1.1 Motivation and Problem Statement
1.2 Organization of the Master’s Thesis
1.3 Digital Image and its Continuous Space
1.4 Well- or Ill-Posed Problem?
1.5 What Makes Image Segmentation a Hard Task?
1.6 Shape Prior

1.1 Motivation and Problem Statement

For the human eye it is natural to partition the field of view into distinguishable regions. For this purpose, different types of features such as edges, texture or appearance are taken into account. The human observer also takes region descriptions, such as object shapes, into account. This master’s thesis incorporates a shape description into a geodesic active contour segmentation model to partition an image into non-overlapping regions.

In modern computer vision many kinds of segmentation algorithms exist that make use of different local features such as gradients, intensity or color. The main drawback of using basic features is that missing intensity information or image distortion may lead to wrong segmentation results for more complex images. Model-based vision in the form of shape models (e.g. active shape models introduced by Cootes et al. [20]) is used to fit a pre-learned model onto a nearby object. For this purpose it is necessary to train the model with hand-labeled samples in advance. In addition, the initialization is very important for the segmentation outcome. Our proposed model can be defined on the fly and used for real-time segmentation.

1.2 Organization of the Master’s Thesis

Sections 1.3 and 1.4 discuss some fundamentals of digital images and the character of computer vision problems. This master’s thesis is concerned with image segmentation, and therefore Section 1.5 shows the difficulties of obtaining a valid segmentation. Section 1.6 focuses on the introduction of prior knowledge in the form of a shape prior.

Chapter 2 reviews classical segmentation approaches. First, Section 2.2 presents edge-based segmentation methods. For this purpose the fundamentals of edge detectors and their behavior for objects of different scales are reviewed. More sophisticated approaches such as the Snake model [29] and geodesic active contours [8, 30, 31] are presented in Sections 2.2.3-2.2.5. After that, region-based approaches are examined in Section 2.3. Sections 2.3.1 and 2.3.2 focus on basic methods such as thresholding and region splitting and merging. Sections 2.3.3 and 2.3.4 discuss the segmentation model of Mumford and Shah [39] and its variant by Chan and Vese [15, 58]. To improve the robustness of such segmentation methods, prior knowledge has been incorporated. Various methods that make use of shape models for the segmentation task are reviewed in Section 2.4.

In Chapter 3 we introduce the idea of a variational shape prior segmentation method. Section 3.2 reviews the corresponding total variation methods. How shape information is incorporated into the segmentation system is presented in Section 3.3.2. In Section 3.4 the energy model for the shape prior segmentation is proposed. Methods for solving variational models are then reviewed in Section 3.5, and Section 3.6 focuses on the solution of the shape prior segmentation model.

Chapter 4 deals with the GPU-based implementation. Details on GPU design and

the used graphics hardware are given in Section 4.1. Section 4.2 introduces the GPGPU

programming framework CUDA from NVidia which is used for the implementation.

Chapter 5 is devoted to different applications and their results. First, a program for interactive image segmentation is presented in Section 5.2, together with an evaluation on hand-labeled reference data. Some examples of shape alignment are shown in Section 5.3, which is evaluated on the same reference data, among others. Next, the adaptation of shape alignment to image sequences is shown in a tracking application in Section 5.4. Section 5.5 shows one way to process 3D data with the help of a 2D shape prior.

Finally, Chapter 6 draws conclusions on the presented algorithm and applications and gives an outlook on possible future work and algorithm enhancements.

1.3 Digital Image and its Continuous Space

A digital image is a mapping from a real-world scene to a representation that is readable by a digital device (e.g. digital camera, computer, embedded system). For this purpose the continuous space of the analog image has to be mapped onto a discrete space by sampling and quantization. Since computers have become powerful enough, it is possible to process large amounts of data and apply complex algorithms. Vision algorithms have been developed mainly by computer scientists, electrical engineers and mathematicians. The main intention is to obtain information from digital images. The applied functions can be categorized into continuous and discrete functions, each of which may suit a given application well. There have been different approaches to solving vision-based problems. The computer science community may prefer discrete operations that fit the computer representation well, whereas mathematicians tend to work in a continuous space with well-defined mathematical models. For electrical engineers, representing the image as a 2D signal allows the application of signal-processing techniques. Fundamental algorithms and filters have been developed by these means.
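The mapping from a continuous image to a discrete one can be illustrated with a minimal sketch. It assumes a continuous intensity function on the unit square with values in [0, 1), sampled at pixel centers and quantized to 256 gray levels. The names `sample_and_quantize` and `ramp` are hypothetical and chosen only for this illustration.

```python
def sample_and_quantize(f, width, height, levels=256):
    """Sample f(x, y) on the unit square at pixel centers and quantize to gray levels."""
    image = []
    for row in range(height):
        y = (row + 0.5) / height                  # sampling position in y
        line = []
        for col in range(width):
            x = (col + 0.5) / width               # sampling position in x
            value = f(x, y)                       # continuous intensity in [0, 1)
            level = int(value * levels)           # quantization to an integer level
            line.append(min(levels - 1, max(0, level)))
        image.append(line)
    return image


def ramp(x, y):
    """Assumed continuous test image: a horizontal intensity ramp."""
    return x


image = sample_and_quantize(ramp, width=4, height=2)
print(image)  # [[32, 96, 160, 224], [32, 96, 160, 224]]
```

A finer grid (larger `width` and `height`) reduces the sampling error, and more gray levels reduce the quantization error.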

Nowadays more sophisticated algorithms are proposed, and one of the mainstream approaches emerged from one of the most important fields of mathematical analysis, namely the field of partial differential equations (PDEs). In general a PDE is an equation involving functions and their partial derivatives with respect to independent variables. PDEs originate from physics and became more common in other fields over time: first physics and biology, afterwards finance, and now computer vision benefit from this versatile mathematical model. As PDEs are stated in continuous settings, the solution has to be established in the continuous space, and afterwards the calculation can be applied to the discrete space to find a numerical solution for the respective problem.

1.4 Well- or Ill-Posed Problem?

The French mathematician Jacques Hadamard [26] stated that physical problems have to be solved with well-posed mathematical models. For a problem to be well-posed its solution has to

1. exist,

2. be unique,

3. depend continuously on the initial data (for example, be robust against noise).

Figure 1.1: Example of an ill-posed problem. (a) Original image x. (b) Blurred image y = Ax. (c) Blurred image with additive noise, y = Ax + n.

Computer vision is often described as inverse optics. Most of the problems that arise are inverse problems, which means that model parameters have to be estimated from given data. Consider, e.g., y = Ax + n with an operator A and noise n. Obtaining the data y from the given parameters x is the direct problem and is usually well-posed, whereas the inverse problem of calculating x when the data y is given is usually ill-posed. As a concrete example we look at an image x (Figure 1.1a) that was blurred with a function A (Figure 1.1b). In addition we add some noise n, which results in the observed image y (Figure 1.1c). It is obviously difficult to restore the original image x if only the observed image y and little information on the distortions A and n are available. Keeping sharp edges at the right place is a key difficulty of such an ill-posed problem. Tikhonov and Arsenin [54, 55] developed a well-known regularization technique to solve such ill-posed problems. The main idea is to restrict the space of acceptable solutions and define a model that minimizes the defined function.
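The effect of regularization on an inverse problem y = Ax + n can be sketched on a toy example. The following code is an illustration under assumptions, not the method used in this thesis. It uses a hypothetical, nearly singular 2x2 operator as a stand-in for a blur, and the common quadratic form of Tikhonov regularization, which minimizes |Ax - y|^2 + λ|x|^2. All names (`solve2`, `tikhonov2`, `lam`) are chosen for illustration only.

```python
def solve2(M, b):
    """Solve the 2x2 linear system M x = b with Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - b[0] * M[1][0]) / det]


def tikhonov2(A, y, lam):
    """Minimize |A x - y|^2 + lam * |x|^2 via (A^T A + lam I) x = A^T y."""
    AtA = [[sum(A[k][i] * A[k][j] for k in range(2)) + (lam if i == j else 0.0)
            for j in range(2)] for i in range(2)]
    Aty = [sum(A[k][i] * y[k] for k in range(2)) for i in range(2)]
    return solve2(AtA, Aty)


A = [[1.0, 1.0], [1.0, 1.0001]]      # assumed, nearly singular operator
x_true = [1.0, 1.0]
y_noisy = [2.0, 2.0001 + 0.001]      # A x_true plus a small perturbation n

x_naive = solve2(A, y_noisy)               # direct inversion amplifies the noise
x_reg = tikhonov2(A, y_noisy, lam=1e-3)    # regularized solution stays near x_true

print(x_naive)  # far away from [1, 1]
print(x_reg)    # close to [1, 1]
```

The perturbation of 0.001 in the data moves the directly inverted solution far from the true parameters, which violates the third condition of well-posedness. The penalty term restricts the acceptable solutions and keeps the estimate stable in this example.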

1.5 What Makes Image Segmentation a Hard Task?

Figure 1.2: Importance of prior knowledge. (a) Dalmatian. (b) Teddy bear. The images are taken from a talk by Pylyshyn about cognitive science [48].

In Figure 1.2 humans can identify the shown objects without a problem, although edges are broken or not even existent, as in the image with the dalmatian (Figure 1.2a), or regions are scrambled, as in the teddy bear image (Figure 1.2b). The human visual system has no difficulty identifying the important cues for a reasonable segmentation. However, a person who has never seen a dalmatian or a teddy bear before would find it hard to see the objects, because the prior information for the segmentation is lacking. People who are aware of the objects will be able to identify them fairly reliably, which suggests incorporating previous knowledge into segmentation systems.

To build (semi-)automated vision systems, the introduction of prior information is a key component. To identify a known object in an unseen image or scene, previous knowledge can be essential. Humans can easily handle this task, even for very difficult images, as stated before. The robustness of an automated segmentation system will improve the more prior knowledge is available. The problem is that the computer has to interpret this vast amount of data, which will badly affect the runtime. Our idea is to offer only shape information to help the segmentation model find a valid result.

1.6 Shape Prior

A reasonable partitioning of a complex scene with basic algorithms is hardly ever possible.

In our work we use higher-level information in combination with approximate

knowledge of the desired object shape to get the desired segmentation. The outcome is

a very versatile segmentation algorithm that can be used not only for a segmentation

task but also for shape alignment, tracking and automated analysis of 3D data. In general

our intention is not to build a fully automated segmentation system. Especially

with medical image data a specialist has to supervise the application and if needed

interact with the system. Therefore we have three prime concerns:

1. Real-time ability

2. User interaction (semi-automated approach)

3. Finding a globally optimal solution

To meet these demands we use variational methods for the image segmentation and implement the algorithms with the help of GPGPU programming. This yields a speedup of these rather complicated algorithms and enables the application to react to user input in real time.


Chapter 2

Related Work

Contents

2.1 Image Segmentation
2.2 Edge-Based Segmentation
2.3 Region-Based Segmentation
2.4 Shape Prior Segmentation
2.5 Discussion

2.1 Image Segmentation

One of the most important tasks in computer vision is to divide the image into multiple regions and to detect objects and region borders. A person can easily separate the constituent parts of a scene, but for computers this is highly ambiguous. The solution of the segmentation problem is not unique and depends on the available image data, as in Figure 2.1. For the left image the result would be a segmentation into the single objects, but for the right one there is no such obvious split. To obtain the final result different features can be used. Objects are often described by their contours, and therefore edges are frequently used features for segmentation. For object detection one often prefers to detect complete regions that correspond to objects. In medical image analysis it is often necessary to detect a certain structure (e.g. bone, tumor, vessels). Especially in medicine the exact position is very important when identifying an object. In the following we give an outline of basic edge- and region-based segmentation models.

Figure 2.1: The task of image segmentation depends on the given image and its pictured objects. (a) The single objects can be identified easily in this scene. (b) Segmentation is not a unique problem: the result depends on the region one wants to detect (e.g. tree, clouds, lake).

2.2 Edge-Based Segmentation

Edge information can be used to extract boundary segments from an image. There are many kinds of edge detectors that can be used to create an edge image. But edges alone are not enough to cluster the image into distinguishable regions. The detected edges therefore have to be combined into chains that may describe object borders. Another popular method to obtain region boundaries are active contour models. The term active is used because the segmentation evolves over time, and for certain models the contour can even change its topology. The goal of this process is to fit a curve to the boundary of an object.

The main problems of edge-based segmentation are edges which do not represent object borders. They can be caused by noise or object texture and should not be taken into account when searching for an object or region boundary. Missing edges due to occlusion have a similarly negative effect on the resulting segmentation.



2.2.1 Edge Detectors

Edge detectors make use of local changes in the intensity function. A large change results in a stronger edge than a small one. A threshold applied to the edge strength can suppress weak edges. Examples of edge detectors are the well-known Sobel filter [52], which searches for maxima in the first derivative, and the Marr-Hildreth [38] and Laplacian of Gaussian (LoG) algorithms [28], which detect zero-crossings in the second derivative. Figure 2.2 illustrates how the edge strength depends on the intensity change in the original image: a larger intensity change results in a stronger edge.

Figure 2.2: Two different edge profiles and their first and second order derivatives. (a) Intensity profile. (b) 1st derivative. (c) 2nd derivative.
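The relation between intensity change, edge strength and thresholding can be sketched with the standard 3x3 Sobel kernels. The code below is a minimal illustration on a synthetic step image. The helper names (`sobel_magnitude`, `threshold`, `step_image`) and the threshold value are assumptions made for this example.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude of a list-of-lists image (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


def threshold(mag, t):
    """Binary edge image: 1 where the edge strength reaches the threshold t."""
    return [[1 if v >= t else 0 for v in row] for row in mag]


def step_image(low, high, rows=5):
    """Synthetic image with a vertical step edge from low to high."""
    return [[low] * 3 + [high] * 3 for _ in range(rows)]


weak = sobel_magnitude(step_image(10, 20))      # small intensity change
strong = sobel_magnitude(step_image(10, 110))   # large intensity change

print(max(map(max, weak)), max(map(max, strong)))  # 40.0 400.0
print(threshold(weak, 100)[2])    # [0, 0, 0, 0, 0, 0]
print(threshold(strong, 100)[2])  # [0, 0, 1, 1, 0, 0]
```

The tenfold larger intensity step produces a tenfold stronger response, and the threshold removes the weak edge while keeping the strong one.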

2.2.2 Scale Space Theory

Real-world objects normally appear at a range of different sizes, whereas a digital representation of an object only incorporates a single scale. For further processing it is sometimes essential that the scale of the representation fits the real object. In [36, 37] Lindeberg explored the field of scale space theory with respect to computer vision tasks. In most cases the term scale space refers to a linear (Gaussian) one. A collection of gradually smoothed images is the basis of such a representation, which should make vision algorithms scale invariant.

But how do different scale representations relate to edge detection? Objects have features, textures and sub-parts of different sizes, and therefore some features may vanish in a scaled representation. An example of different scales of a gray-value image is presented in the left column of Figure 2.3, whereas the right column shows the corresponding edge images. Blurring can be used to eliminate disturbing edges that arise from object texture. Only sufficiently strong edges will be taken into account after the Gaussian filtering with

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}. \tag{2.1}$$
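How Gaussian smoothing removes fine texture while keeping coarse object borders can be sketched in one dimension. The 2D Gaussian of Eq. (2.1) is separable, so a sampled 1D kernel is sufficient for the illustration. The kernel radius of 3σ, the periodic boundary handling and all function names are assumptions made for this sketch.

```python
import math


def gaussian_kernel_1d(sigma):
    """Sampled and normalized 1D Gaussian, truncated at a radius of 3 sigma."""
    radius = int(math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth(signal, sigma):
    """Convolve a signal with the Gaussian kernel (periodic boundary)."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    n = len(signal)
    return [sum(k[j + r] * signal[(i + j) % n] for j in range(-r, r + 1))
            for i in range(n)]


def max_abs_diff(signal):
    """Largest jump between neighboring samples, a simple edge strength."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


texture = [10.0 * (i % 2) for i in range(40)]   # fine texture: 0, 10, 0, 10, ...
step = [0.0] * 20 + [10.0] * 20                 # one coarse object border

# Before smoothing both signals contain jumps of height 10.
print(max_abs_diff(texture), max_abs_diff(step))
# After smoothing with sigma = 2 the texture jumps almost vanish,
# while the coarse border still yields a clear response.
print(max_abs_diff(smooth(texture, 2.0)), max_abs_diff(smooth(step, 2.0)))
```

Repeating the smoothing for increasing values of `sigma` gives the stack of gradually smoothed signals that a linear scale space is built from.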

2.2.2.1 Canny-Edge Detector

An edge detector that uses the theory of scale spaces was proposed by Canny in 1983 [7]. The main intention of the edge detector was to fulfill the following three conditions:

good detection: This requirement expresses that important edges ought to be found.

good localization: The detection should be accurate and the real edge position should

hardly vary from the detected edge position.

definite detection: The third constraint says that a single edge should not produce

multiple responses.

To suppress unimportant edges, the image f is filtered with a Gaussian convolution G as in scale space theory. Afterwards the normal direction n of the edges is estimated for each pixel with the help of

$$n = \frac{\nabla (G * f)}{|\nabla (G * f)|}. \tag{2.2}$$

To suppress weak edges a non-maximum suppression is applied, and the location of the edges is evaluated with

$$\frac{\partial^2 G}{\partial n^2} * f = 0. \tag{2.3}$$

Next the edge strength s is evaluated as the magnitude of the gradient of the image intensity function:

$$s = |\nabla (G * f)| \tag{2.4}$$

Figure 2.3: The left column shows the original, unfiltered image (a) and two Gaussian filtered versions with σ = 1 (c) and σ = 5 (e). The corresponding edge images (b, d, f), produced with a Sobel filter, are presented on the right.

To remove spurious edge responses, a thresholding with hysteresis is applied: a high edge response is marked as a definite edge, and a response below the threshold is removed and classified as noise. To increase the robustness of the algorithm the steps are repeated for increased smoothing and the results are combined. Canny proposed this approach as feature synthesis.
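The thresholding with hysteresis can be sketched as follows. The code assumes the common two-threshold formulation, in which responses above a high threshold are definite edges and responses above a low threshold are kept only if they are connected to a definite edge. The text above mentions a single threshold, so the second threshold, the 8-neighborhood and the function name `hysteresis` are assumptions of this sketch.

```python
def hysteresis(strength, low, high):
    """Keep pixels >= high and pixels >= low that are 8-connected to them."""
    h, w = len(strength), len(strength[0])
    edge = [[False] * w for _ in range(h)]
    # Seeds: definite edges with a response of at least `high`.
    stack = [(y, x) for y in range(h) for x in range(w)
             if strength[y][x] >= high]
    for y, x in stack:
        edge[y][x] = True
    # Grow the definite edges along neighbors with a response of at least `low`.
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edge[ny][nx]
                        and strength[ny][nx] >= low):
                    edge[ny][nx] = True
                    stack.append((ny, nx))
    return edge


strength = [
    [0, 0, 0, 0, 0, 0],
    [9, 5, 4, 0, 0, 4],   # strong start, weak continuation, isolated weak pixel
    [0, 0, 0, 0, 0, 0],
]
edges = hysteresis(strength, low=3, high=8)
print(edges[1])  # [True, True, True, False, False, False]
```

The weak responses 5 and 4 survive because they continue a strong edge, whereas the isolated response of 4 at the end of the row is classified as noise.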

2.2.3 Snakes

One of the earliest active contour models is the Snake model proposed by Kass, Witkin and Terzopoulos [29]. A Snake is defined as an energy-minimizing spline C(s) which is guided through the image by internal and external forces. Snakes belong to the family of parametric models, as borders are represented in parametric form. The energy that is minimized was proposed by Kass et al. in the form

$$E_{\text{Snake}} = \int_0^1 E_{\text{internal}}(C(s)) + E_{\text{image}}(C(s)) + E_{\text{constraints}}(C(s))\, ds. \tag{2.5}$$

As the energy is non-convex, a globally optimal solution cannot be guaranteed. Snakes are very sensitive to initialization, so different initializations can lead to different segmentation results. The energy minimization is influenced by image forces that pull the contour towards edges, so the contour will snap to nearby line structures.

$E_{\text{image}}$: The image forces are modelled by a combination of three energy terms:

$$E_{\text{image}} = w_{\text{line}} E_{\text{line}} + w_{\text{edge}} E_{\text{edge}} + w_{\text{term}} E_{\text{term}} \tag{2.6}$$

$E_{\text{line}}$ can be a simple intensity value pushing the Snake towards contours with the specified intensity. The edge term $w_{\text{edge}} E_{\text{edge}}$, as its name suggests, attracts the segmentation boundary towards edges. The termination term $w_{\text{term}} E_{\text{term}}$ stops the evolution process if no continuous contour is available, so that the segmentation will also take line segments into account.

$E_{\text{internal}}$: The internal spline energy models a bending energy. The Snake can act as a membrane or a thin plate. The required behavior can be adjusted with the two parameters $\alpha(s)$ and $\beta(s)$.

$$E_{\text{internal}} = \frac{1}{2}\left(\alpha(s)\left|\frac{\partial C}{\partial s}\right|^2 + \beta(s)\left|\frac{\partial^2 C}{\partial s^2}\right|^2\right) \tag{2.7}$$


$E_{\text{constraints}}$: The third term of the energy proposed by Kass et al. incorporates user input or other high-level features. $E_{\text{constraints}}$ can guide the Snake to a meaningful segmentation result with the help of additional constraints.

In [29] Kass et al. propose an image energy that pulls the contour towards nearby

edges. In Section 2.2.2 the problem of edge scales is mentioned. Because of the

Gaussian convolution the following image energy only takes edges of a certain scale σ

into account.

$$E_{image} = - \left| \nabla G_\sigma * I(x, y) \right|^2 \tag{2.8}$$

Using this image energy, ignoring the user-driven term and using constant parameters

α and β leads to a simplified version of the energy in the form of

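$$E_{Snake} = \frac{\alpha}{2} \int_0^1 \left| \frac{\partial C(s)}{\partial s} \right|^2 ds + \frac{\beta}{2} \int_0^1 \left| \frac{\partial^2 C(s)}{\partial s^2} \right|^2 ds - \left| \nabla G_\sigma * I(x, y) \right|^2. \tag{2.9}$$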
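The two integrals of the simplified energy (2.9) can be approximated on a closed polygon with finite differences. The following is only a minimal sketch under stated assumptions: the contour is a list of (x, y) points sampled uniformly in s, the image term is left out, and the function name `snake_internal_energy` is hypothetical rather than taken from [29].

```python
def snake_internal_energy(points, alpha, beta):
    """Discrete stretching and bending terms of (2.9) for a closed polygon.

    points: list of (x, y) tuples, assumed to sample the curve uniformly in s.
    The image term of (2.9) is intentionally omitted in this sketch.
    """
    n = len(points)
    h = 1.0 / n  # parameter step, since s runs over [0, 1]
    stretch = 0.0
    bend = 0.0
    for i in range(n):
        x_prev, y_prev = points[i - 1]
        x, y = points[i]
        x_next, y_next = points[(i + 1) % n]
        # forward difference approximates the first derivative dC/ds
        dx, dy = (x_next - x) / h, (y_next - y) / h
        # central second difference approximates d^2C/ds^2
        ddx = (x_next - 2 * x + x_prev) / h ** 2
        ddy = (y_next - 2 * y + y_prev) / h ** 2
        stretch += (dx * dx + dy * dy) * h
        bend += (ddx * ddx + ddy * ddy) * h
    return 0.5 * alpha * stretch + 0.5 * beta * bend


big = [(0, 0), (1, 0), (1, 1), (0, 1)]
small = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
print(snake_internal_energy(big, 1.0, 0.0), snake_internal_energy(small, 1.0, 0.0))
```

In this sketch the smaller square has the lower energy, which illustrates why a Snake without image forces tends to contract.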

2.2.4 Balloons

In 1993 an extension to Snakes was proposed by Cohen and Cohen in [19] by adding

an additional term to the energy:

$$E_{balloons} = E_{Snake} + w_b \int_{\Omega_{in}} dx \qquad (\Omega_{in} \ldots \text{region inside the contour}) \tag{2.10}$$

Snakes tend to converge to smaller regions because the contour is driven by optimizing

the length of the boundary. The so-called Balloons favor regions of specific sizes and by

adjusting the parameter w b the region can either shrink or expand to find the solution.

The added term can be interpreted as additional constraint E constraints for the snake

model. The advantage of Balloons is their versatility: the boundary can either shrink

or grow to detect the region boundaries. The disadvantage is that the positions of the

initial contour and of the final object have to be known so that the weight w b can

be set correctly.
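For a polygonal contour the added term of (2.10) is simply w b times the enclosed area. The sketch below assumes a simple (non self-intersecting) polygon and uses the shoelace formula; the helper names are hypothetical and not part of [19].

```python
def polygon_area(points):
    """Area enclosed by a simple polygon (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def balloon_term(points, w_b):
    """Additional energy w_b * |Omega_in| from (2.10)."""
    return w_b * polygon_area(points)


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(balloon_term(square, -1.0))  # a negative weight rewards larger regions
```

With a negative w b the term decreases as the region grows, so the contour expands; a positive w b makes it shrink.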

2.2.5 Geodesic Active Contours

Based on the Snake model of Kass et al.

(Section 2.2.3), Caselles et al. [8] and

Kichenassamy et al. [30, 31] proposed an energy that is invariant with respect to

reparametrization of the curve, unlike the Snake model. The Geodesic Active Contour

(GAC) (in 3D the model is called minimal surfaces) is defined as the energy optimization

$$\min_C \{ E_{GAC}(C) \} = \min_C \left\{ \int_0^{|C|} g\left( |\nabla I(C(s))| \right) ds \right\}. \tag{2.11}$$

|C| describes the Euclidean length of the curve C and the function g models an edge

detector. The edge strength has to be restricted to an interval g ∈ (0, 1]. Caselles et al.

used the edge function

$$g(|\nabla I|) = \frac{1}{1 + |\nabla (G_\sigma * I)|^p}, \quad \text{with } p = 1 \text{ or } 2 \tag{2.12}$$

for detecting the object boundaries.

Another possibility for modelling g is using a

measurement optimized for natural images, proposed by Huang et al. in [27]:

$$g(|\nabla I|) = e^{-\eta |\nabla I|^{\kappa}}, \quad \text{e.g. with } \kappa = 0.55 \tag{2.13}$$
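Both edge functions map a gradient magnitude to a value in (0, 1] that is close to one in flat regions and small at strong edges. The following sketch assumes the gradient magnitude of the (smoothed) image has already been computed; the default η = 1 is an assumption, since no value is given in the text.

```python
import math


def edge_caselles(grad_mag, p=2):
    """Edge function (2.12): 1 / (1 + |grad|^p), with p = 1 or 2."""
    return 1.0 / (1.0 + grad_mag ** p)


def edge_exponential(grad_mag, eta=1.0, kappa=0.55):
    """Edge function (2.13): exp(-eta * |grad|^kappa); eta = 1 is assumed."""
    return math.exp(-eta * grad_mag ** kappa)


for magnitude in (0.0, 1.0, 3.0, 10.0):
    print(magnitude, edge_caselles(magnitude), edge_exponential(magnitude))
```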

The general intention of the Snake model is to locate the curve at points with a

high edge strength and keep a certain smoothness in the curve. Caselles et al. proved

in [8] that these properties are still given when β → 0 in the energy equation (2.9).

The main advantage of GACs is the profound mathematical framework that makes the

model very versatile for different applications. The user has to incorporate constraints

because C = 0 is always a minimizer for the GAC energy. This leads over to the major

handicap of the model. Due to its non-convex energy the minimization task is not guaranteed to

find a globally optimal solution but can get stuck in local minima and therefore the result

depends on the contour initialization.

2.3 Region-Based Segmentation

In the previous section the intention was to find region borders. The next logical step

is to directly detect the regions themselves. Region borders can be

reconstructed from existing regions and vice versa. The main idea is to

find coherent regions that have something in common. The decision whether a pixel belongs

to a certain region is based on a homogeneity criterion H(R i ) with respect to gray values,

texture measurements, color, etc., which gives a measure of the similarity and/or spatial

proximity among pixels.



A complete split up of an image R into disjoint regions R 1 , R 2 , . . . , R S defines the

idea of segmentation and can be formulated with the help of set theory:

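$$R = \bigcup_{i=1}^{S} R_i, \qquad R_i \cap R_j = \emptyset \quad \forall\, i \neq j \tag{2.14}$$

$$H(R_i) = \text{TRUE}, \quad \forall\, i = 1, 2, \ldots, S \tag{2.15}$$

$$H(R_i \cup R_j) = \text{FALSE}, \quad \forall\, i \neq j \tag{2.16}$$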

Region-based segmentation methods are normally more robust on noisy data where

edges are difficult to detect. Also highly textured objects can be found with the use of

texture properties for the homogeneity criterion. The drawbacks are over- and undersegmented

results. In addition most region-based algorithms are not able to detect

objects that span over several disconnected regions.

2.3.1 Thresholding

Some segmentation problems can be solved with simple algorithms. An example is a

simple thresholding operation on gray levels to obtain regions with a certain intensity

value, which is one of the simplest segmentation algorithms. The outcome of this

is a separation of foreground and background. No model-based information is

used and no prior knowledge contributes to the final result. An example of a threshold

operation applied to two different images is shown in Figure 2.5. For Figure 2.5a the

foreground contains all the objects whereas the segmentation of the x-ray image in

Figure 2.5b fails for certain image regions.

The resulting image is normally a binary representation b that depends on the choice

of the threshold T :

$$b(x, y) = \begin{cases} 1 & \forall\, I(x, y) \geq T \\ 0 & \forall\, I(x, y) < T \end{cases} \tag{2.17}$$

This method can be extended with multiple thresholds T 1 , . . . , T S to get a result that

is not binary but contains all segmented regions R 1 , . . . , R S . To optimize the threshold

values it is often a good idea to have a look at the image histogram. Figure 2.4 shows

an image and its histogram, from which a specific threshold can be read off to

segment the presented object from the background. A segmentation of this example is



done in Figure 2.5a.
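The binary rule (2.17) and its extension to several thresholds can be sketched in a few lines. The example assumes the image is a nested list of gray values; the function names are hypothetical, and the label of a pixel in the multi-threshold case is taken to be the number of thresholds it reaches.

```python
from bisect import bisect_right


def threshold(image, t):
    """Binary image b of (2.17): 1 where I >= t, 0 otherwise."""
    return [[1 if value >= t else 0 for value in row] for row in image]


def multi_threshold(image, thresholds):
    """Label each pixel with the number of thresholds it reaches."""
    ordered = sorted(thresholds)
    return [[bisect_right(ordered, value) for value in row] for row in image]


image = [[10, 200], [120, 50]]
print(threshold(image, 100))
print(multi_threshold(image, [100, 150]))
```

With a single threshold the labelling coincides with the binary image, so `multi_threshold` can be read as a direct generalization of (2.17).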

(a) Image with homogeneous background

and several objects as foreground.

(b) Intensity histogram

Figure 2.4: Histogram of a grayscale image with a clear bound to do a

fore- and background segmentation.

(a) Threshold image of Figure 2.1a and the

corresponding segmentation result.

(b) Detail of a thresholding segmentation applied to an X-ray image. The various gray levels do not allow a reasonable segmentation result to be obtained with a simple thresholding operation.

Figure 2.5: Segmentation with image thresholding



2.3.2 Region splitting and merging

2.3.2.1 Region Merging

Starting at the finest level where each pixel represents a single region, these regions

are merged as long as equation (2.15) is satisfied. The merging process connects two

adjacent regions that fulfill the same homogeneity criterion.

2.3.2.2 Region Splitting

Region splitting is the opposite process of region merging. Here the whole image

is the starting region, which is split into sub-regions until the criteria of

equations (2.14), (2.15) and (2.16) are met.

Although the algorithms of merging and splitting seem to be very similar, they

do not yield the same segmentation result. There are cases where the splitting

process is stopped for homogeneous regions whereas the merging process would not find

this region because of a restriction at an earlier step.

2.3.2.3 Splitting and Merging

To take advantage of both, the splitting and merging process, the two approaches

are combined. The split-and-merge algorithm uses a pyramidal image structure. On

any level of the pyramid regions are split into four sub-regions when the homogeneity

criterion is not satisfied. If any of these four parts are coherent, they are merged together

again. If none of the regions can be split or merged any more, regions that do not have the

same parent or are in different pyramid levels are taken into account. If those regions

are homogeneous they are merged together too. Small-sized regions can be evaluated

separately at the end and merged together if appropriate.
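The splitting phase of this scheme can be sketched as a recursive quadtree decomposition. The example below is only an illustration under stated assumptions: the image is square with a power-of-two side length, the homogeneity criterion is a simple gray value range test with tolerance `tol`, and the merging phase is omitted.

```python
def is_homogeneous(image, r0, c0, size, tol):
    """Assumed criterion H: gray value range within the block is at most tol."""
    values = [image[r][c]
              for r in range(r0, r0 + size)
              for c in range(c0, c0 + size)]
    return max(values) - min(values) <= tol


def split(image, r0, c0, size, tol):
    """Recursively split a block into four sub-blocks until H holds."""
    if size == 1 or is_homogeneous(image, r0, c0, size, tol):
        return [(r0, c0, size)]
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks += split(image, r0 + dr, c0 + dc, half, tol)
    return blocks


image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 0, 9],
         [0, 0, 0, 0]]
print(split(image, 0, 0, 4, 0))
```

Each returned tuple is (row, column, side length) of a homogeneous block; a subsequent merge step would join adjacent blocks that together still satisfy the criterion.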

2.3.3 Mumford-Shah

Most of the time the presented basic methods, such as the thresholding scheme,

are not applicable to real-world problems because the images are too complex.

Mumford and Shah proposed a model [39] to approximate an observed image u 0 with



a function u by minimizing the energy

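$$\min_{u,C} \{ E_{MS}(u, C) \} = \min_{u,C} \left\{ \frac{1}{2} \int_{\Omega} (u_0 - u)^2 \, d\Omega + \frac{\lambda^2}{2} \int_{\Omega \setminus C} |\nabla u|^2 \, d\Omega + \nu H^{d-1} \right\}. \tag{2.18}$$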

The approximation u is a piecewise smooth function and λ defines the

scale at which the smoothing is done. Starting with the last part of the equation, ν is a

tuning parameter for the Hausdorff measure H where d stands for the dimensionality.

For the 2D case H represents the length of the discontinuity set C. The first integral

represents the fidelity term using an L 2 -norm which ensures that u is similar to u 0 . More

details on the fidelity term are discussed in the course of the variational restoration approach

of Rudin, Osher and Fatemi in Section 3.2.1. The second integral, the regularization

term, ensures the smoothness of the segmentation result but not across discontinuities

which enables the method to process open structures.

The Mumford-Shah segmentation model does not pick up textured objects directly

because they are not composed of one smooth region. Only objects featuring a homogeneous

region inside the boundary can be modelled with the proposed energy and the

segmentation for these will be valid. A benefit of the model is that a test image of two

objects can be segmented (Figure 2.6). Most standard active contour models will fail

with this class of images and return a single contour that encloses both objects. The

main drawback appears with textured objects because u 0 cannot be well approximated

by a piecewise smooth function u due to the occurring discontinuities.

Ambrosio and Tortorelli proposed an approximation of the Mumford-Shah functional

in [1]. They introduced a dual variable v which will represent the discontinuities

of the processed image. In Figure 2.6b the edge set v is shown for a synthetic image.

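$$\min_{u,v} \{ E_{AT}(u, v) \} = \min_{u,v} \left\{ \rho \int_{\Omega} |\nabla v|^2 \, d\Omega + \alpha \int_{\Omega} \left( v^2 |\nabla u|^2 + \frac{(v - 1)^2}{4 \alpha \rho} \right) d\Omega + \beta \int_{\Omega} |u - u_0|^2 \, d\Omega \right\}. \tag{2.19}$$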

2.3.4 Chan-Vese

Many variations on the Mumford-Shah energy were proposed over the years. The one

we want to focus on is the approach of Chan and Vese [15, 58]. They took up the

challenge of solving the minimization problem of equation (2.18). The energy has to be

reformulated to be solvable in a mathematically correct way. The method makes it possible to



(a) Test image with two objects. (b) Edge set v.

Figure 2.6: The Ambrosio-Tortorelli approximation of the Mumford-Shah

functional is able to find both objects.

find objects whose boundaries are not necessarily defined by gradients. Chan and Vese

propose the following energy. Its main idea is to separate two different intensity

values c 1 and c 2 , which splits the approximation of u 0 into two regions and defines u as the

average of u 0 over the sub-regions inside and outside of C:

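$$\min_{c_1, c_2, C} \{ E_{CV}(c_1, c_2, C) \} = \min_{c_1, c_2, C} \Big\{ \underbrace{\mu \cdot H^1(C) + \nu \cdot H^2(C)}_{\text{regularization terms}} + \underbrace{\lambda_1 \int_{\Omega_{in}} |u_0 - c_1|^2 \, d\Omega_{in} + \lambda_2 \int_{\Omega_{out}} |u_0 - c_2|^2 \, d\Omega_{out}}_{\text{fitting term}} \Big\} \tag{2.20}$$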

The set Ω in is defined as the region inside the contour C and Ω out as the region outside:

Ω in ∪ Ω out ∪ C = Ω. Using a variational formulation for the level set

representation as proposed in [15] the energy E CV (c 1 , c 2 , C) can be reformulated to



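$$E(c_1, c_2, \phi) = \mu \int_{\Omega} \delta(\phi) |\nabla \phi| \, d\Omega + \nu \int_{\Omega} H(\phi) \, d\Omega + \lambda_1 \int_{\Omega} |u_0 - c_1|^2 H(\phi) \, d\Omega + \lambda_2 \int_{\Omega} |u_0 - c_2|^2 (1 - H(\phi)) \, d\Omega. \tag{2.21}$$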

Here H denotes the Heaviside function and δ the Dirac delta function. For further details on level set representations see Section 2.4.1. The solution for u using

the Chan-Vese model as a particular case of the Mumford-Shah approach is proposed in



[15] and the constants c 1 and c 2 are interpreted as the average of u 0 over the resulting

sub-region:

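$$u = c_1 H(\phi) + c_2 (1 - H(\phi)) \tag{2.22}$$

$$c_1(\phi) = \frac{\int_{\Omega} u_0 H(\phi) \, d\Omega}{\int_{\Omega} H(\phi) \, d\Omega} \tag{2.23}$$

$$c_2(\phi) = \frac{\int_{\Omega} u_0 (1 - H(\phi)) \, d\Omega}{\int_{\Omega} (1 - H(\phi)) \, d\Omega} \tag{2.24}$$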

The segmentation model by Chan and Vese is often referred to as “Active Contours

without Edges” due to the name of their paper [15] and the corresponding, often reproduced

segmentation results on blurred images or objects without well-defined borders.

To show this effect of the Chan-Vese segmentation algorithm a synthetic image without

clear borders and edges is processed and the result is presented in Figure 2.7.
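For a fixed level set function the constants c 1 and c 2 of (2.23) and (2.24) are plain region averages. The sketch below assumes a sharp Heaviside function (H(φ) = 1 for φ > 0, otherwise 0) instead of a regularized one, and images given as nested lists; the function names are hypothetical.

```python
def chan_vese_means(u0, phi):
    """Averages of u0 where phi > 0 (c1) and where phi <= 0 (c2)."""
    sum_in = sum_out = 0.0
    n_in = n_out = 0
    for row_u, row_phi in zip(u0, phi):
        for value, level in zip(row_u, row_phi):
            if level > 0:
                sum_in += value
                n_in += 1
            else:
                sum_out += value
                n_out += 1
    # an empty region gets mean 0.0 to avoid a division by zero
    c1 = sum_in / n_in if n_in else 0.0
    c2 = sum_out / n_out if n_out else 0.0
    return c1, c2


def piecewise_constant(phi, c1, c2):
    """Approximation u of (2.22) with the sharp Heaviside function."""
    return [[c1 if level > 0 else c2 for level in row] for row in phi]


u0 = [[10, 12], [1, 3]]
phi = [[1.0, 1.0], [-1.0, -1.0]]
c1, c2 = chan_vese_means(u0, phi)
print(c1, c2, piecewise_constant(phi, c1, c2))
```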

Figure 2.7: Segmentation result of a synthetic test image using the Chan-

Vese model.

2.4 Shape Prior Segmentation

In this section we want to review segmentation methods that identify structures of

known geometric shape. The shape is used to guide the segmentation to the desired

object. The form of the shape representation depends on the field of application. Many

approaches use statistical models to represent a shape model whereas others only use

a single spline or binary image to integrate prior information into the segmentation



model. Some of the reviewed methods use a level set approach to optimize the proposed

energies. Therefore we present a short introduction of level sets in the following.

2.4.1 Introduction to the Level Set Framework

Introduced by Stanley Osher and James A. Sethian in the year 1988 [44] level set methods

have been enhanced over the years and therefore many variations and extensions

to the algorithm exist. The idea is to evolve a closed curve Γ within a plane called zero

level set. Γ is called the interface and bounds an open region Ω. The idea of Osher

and Sethian is to define a higher dimensional embedding function φ that represents the

interface. Therefore the curve is defined at every point where the higher dimensional

level set function φ crosses the zero level set:

$$C = \{ x \mid \phi(x) = 0 \} \tag{2.25}$$

The level set properties depending on the interface position are represented in

equation (2.26) and Figure 2.8.

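$$\begin{aligned} \forall x \in \Omega^+ &: \phi > 0 \\ \forall x \in \Omega^- &: \phi < 0 \\ \forall x \in \partial\Omega = \Gamma &: \phi = 0 \end{aligned} \tag{2.26}$$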

Figure 2.8: Defining level set properties

During the evolution the connectivity of the contour may change, i.e. the contour may undergo a topological

change (see Figure 2.9). A single level set initialization can handle a split-up



into multiple regions like in Figure 2.9b and 2.9c which would not be feasible with parametric

models like Snakes. The general evolution of the implicit function φ is given by

the partial differential equation (PDE)

$$\frac{\partial \phi}{\partial t} + \nabla \phi \cdot \frac{\partial C}{\partial t} = 0. \tag{2.27}$$

This equation is often referred to as level set equation and models the motion of φ

where φ (C (t) , t) = 0.

The dot-product takes only the normal component of the

contour velocity into account. Therefore a force F describes this normal component

and leads to the PDE

$$\frac{\partial \phi}{\partial t} + F |\nabla \phi| = 0. \tag{2.28}$$
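One explicit time step of (2.28) can be sketched on a regular grid. This is a simplified illustration and not a production scheme: it assumes unit grid spacing, a constant speed F, central differences for the gradient (practical implementations usually use upwind differences for stability) and leaves the border pixels unchanged.

```python
import math


def level_set_step(phi, speed, dt):
    """Explicit Euler update phi <- phi - dt * F * |grad phi| on interior pixels."""
    rows, cols = len(phi), len(phi[0])
    new_phi = [row[:] for row in phi]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dx = 0.5 * (phi[r][c + 1] - phi[r][c - 1])
            dy = 0.5 * (phi[r + 1][c] - phi[r - 1][c])
            new_phi[r][c] = phi[r][c] - dt * speed * math.hypot(dx, dy)
    return new_phi


# phi is the signed distance to the vertical line at column 2
phi = [[float(c - 2) for c in range(5)] for _ in range(5)]
stepped = level_set_step(phi, 1.0, 0.5)
print(stepped[2])
```

In this example the interior values decrease by 0.5, so the zero level set moves half a pixel towards larger column indices.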

(a) Initialization (b) (c) (d) Final Result

Figure 2.9: Contour topology change during the evolution of the level set

function.



2.4.2 Leventon et al. approach

Leventon et al. devised a method in [33, 34] where a segmentation task is done with

respect to a shape model. For this purpose a PCA is applied on the signed distance functions

(SDFs) of the contour model. SDFs are preferred over parametric models because SDFs

provide more tolerance against slight misalignment during the aligning step. First a

statistical model is trained over a set of curves representing the shape information.

The segmentation process itself is done by evolving a geodesic active contour using

local image features like image gradients and curvature (see level sets in Section 2.4.1).

In addition to evolving the level set and the image term an additional term that forms a

shape force is added. To estimate the globally optimal shape pose and position a shape

parameter α and a pose parameter p are introduced using a maximum a posteriori

(MAP) approach:

$$(\alpha_{MAP}, p_{MAP}) = \operatorname*{argmax}_{\alpha, p} \{ P(\alpha, p \mid \phi, \nabla I) \} \tag{2.29}$$

The combination of the shape pose and parameter describes a specific shape C ∗ = (α, p).

Expanding this with Bayes’ rule leads to:

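$$P(\alpha, p \mid \phi, \nabla I) = \frac{P(\phi, \nabla I \mid \alpha, p) \, P(\alpha, p)}{P(\phi, \nabla I)} = \frac{P(\phi \mid \alpha, p) \, P(\nabla I \mid \alpha, p, \phi) \, P(\alpha) \, P(p)}{P(\phi, \nabla I)} \tag{2.30}$$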

P (φ | α, p): Probability of a certain evolution interface φ given a shape pose (α, p).

P (∇I | α, p, φ): This gradient term represents the probability of certain image gradients

given the contour. When the contour is aligned with the image border, P (∇I | α, p, φ)

is maximized. In [34] Leventon et al. show that moving along the normal of the

object border, the probability can be modelled as a Gaussian distribution.

P (α) P (p): These two terms define the maximum a posteriori estimators of shape and

pose. Therefore the priors are essential to estimate the final segmentation result.

Due to runtime issues not all probabilities are evaluated in every step. P (α) and

P (p) are only evaluated near the current level set result at every evolution step.

To maximize this a posteriori probability two independent optimization steps are

needed. The equation for evolving the surface φ with a shape prior C ∗ is presented in



[34, 35]:

$$\phi(t + 1) = \phi(t) + \lambda_1 \big( g \, (c + \kappa) \, |\nabla \phi(t)| + \nabla \phi(t) \cdot \nabla g \big) + \lambda_2 \big( C^*(t) - \phi(t) \big) \tag{2.31}$$

λ 1 defines the update step size and λ 2 ∈ [0, 1] is a linear coefficient defining how much

to trust the maximum a posteriori estimate. The λ 1 weighted term represents the

“classical” geodesic active contour energy. To take shape knowledge into account the

λ 2 weighted part drives the shape of the evolved contour in direction of the estimated

prior. The final evolution equation is not a PDE since two separated, independent steps

evolve the final result.

2.4.3 Diffusion Snakes

In [23, 25] Cremers et al. propose a framework that integrates statistical shape knowledge

into the Mumford-Shah segmentation model [39, 40]. This modification of the

Mumford-Shah model allows an explicit parametrization of the contour. The

main advantage of representing segmentation and shape energy in one term is the possibility

of calculating the contour evolution and shape optimization in one task. All

previously mentioned approaches could only solve this with a two-step algorithm. In

[24] the combination of contour energy and shape optimization is introduced by Cremers

et al. using the equation

$$E(u, C^*) = E_{MS}(u, C^*) + \alpha E_c(C^*) \tag{2.32}$$

where the parameter α controls whether the energy favors contours that are similar

to the learnt shape contours (E c ) or the Mumford-Shah segmentation.

E MS represents the fit of the current segmentation to the gray value information in

form of the Mumford-Shah model reviewed in Section 2.3.3. A PCA representation is

chosen for shape representation and the mean shape C ∗ µ and the covariance matrix Σ

are determined by averaging over all shapes C ∗ i from the set of shapes {C ∗ 1 , C ∗ 2 , . . . }:

$$C^*_\mu = \operatorname{mean}[C^*_i] \tag{2.33}$$

$$\Sigma = \operatorname{mean}\left[ (C^*_i - C^*_\mu) (C^*_i - C^*_\mu)^t \right] \tag{2.34}$$



The covariance matrix can be used to model a Gaussian probability distribution of

shapes P (C ∗ ) that can be used to define the energy as proposed in [24]:

$$P(C^*) \propto e^{-\frac{1}{2} (C^* - C^*_\mu)^t \Sigma^{-1} (C^* - C^*_\mu)} \tag{2.35}$$

$$P(C^*) \propto e^{-E_c(C^*)} \tag{2.36}$$

$$E_c(C^*) = -\log\left( P(C^*) \right) + \text{const.} \tag{2.37}$$

$$E_c(C^*) = \frac{1}{2} (C^* - C^*_\mu)^t \Sigma^{-1} (C^* - C^*_\mu) \tag{2.38}$$

With respect to the contour energy (2.38) the Mumford-Shah energy (2.18) can be adapted

to

$$E(u, C(C^*)) = E_{MS}\left( u, C(C^*) \right) + \alpha \frac{1}{2} (C^* - C^*_\mu)^t \Sigma^{-1} (C^* - C^*_\mu) \tag{2.39}$$
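The shape statistics (2.33), (2.34) and the quadratic shape energy (2.38) can be sketched without external libraries. The example makes several assumptions that are not taken from [24]: each shape is a flat list of control point coordinates, the linear system is solved with a small Gauss-Jordan routine instead of forming the inverse explicitly, and a small value `eps` is added to the diagonal as a simple regularization in place of a pseudo-inverse.

```python
def mean_shape(shapes):
    """Component-wise mean of all training shapes, see (2.33)."""
    n = len(shapes)
    return [sum(s[k] for s in shapes) / n for k in range(len(shapes[0]))]


def covariance(shapes, mu):
    """Covariance matrix of the training shapes, see (2.34)."""
    n, d = len(shapes), len(mu)
    return [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in shapes) / n
             for b in range(d)] for a in range(d)]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def shape_energy(shape, mu, sigma, eps=1e-6):
    """E_c of (2.38); eps on the diagonal is an assumed regularization."""
    diff = [s - m for s, m in zip(shape, mu)]
    d = len(diff)
    reg = [[sigma[a][b] + (eps if a == b else 0.0) for b in range(d)]
           for a in range(d)]
    x = solve(reg, diff)
    return 0.5 * sum(di * xi for di, xi in zip(diff, x))


shapes = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
mu = mean_shape(shapes)
sigma = covariance(shapes, mu)
print(mu, sigma, shape_energy([2.0, 1.0], mu, sigma))
```

The energy is zero at the mean shape and grows quadratically with the deviation, weighted by the inverse variance along each direction.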

For the stated equation it is assumed that the covariance matrix is of full rank so

that the inverse Σ −1 exists. If this is not the case the inverse can be substituted with

the pseudo inverse Σ ∗ defined in [24]. A segmentation example using this model is

reproduced in Figure 2.10.

In [22] Cremers et al. proposed a nonlinear alternative for the linear shape model

in Equation (2.38).

The linear shape prior does not perform accurately for training

data which is not appropriately aligned. The introduced nonlinear kernel PCA maps

the Gaussian density estimation into a kernel space. The original data is

non-linearly transformed into feature space where the distribution is estimated by a

Gaussian density. The corresponding density estimate in the original space, however, is

non-Gaussian. This method offers the possibility to model versatile forms of distributions.

The only drawback that arises is that the resulting energy is no longer convex.

Minimization with a gradient descent method will therefore end up in a local minimum.

Cremers et al. proposed to minimize the energy with a two-part algorithm where the

image energy is optimized first and afterwards the best fitting shape is searched by

optimizing the shape energy.

2.4.4 Chen et al. approach

The main idea of Chen et al. [16–18] is to propagate an active contour by a velocity

that depends on image gradients and shape prior information. Leventon et al. use

mean and variance to describe the shape prior statistics whereas Chen et al. only use



(a) init (b) (c)

(d) (e) result (f) training

Figure 2.10: Results of contour optimization with respect to statistical

shape priors proposed by Cremers et al. in [23–25]. Figures 2.10b–2.10d

show the evolution of the contour. All figures are reprinted from [25]

the first geometric moment (mean). The shape prior is built out of the different shapes’

contours and provided as a mean shape. Unlike the approach of Leventon et al. the

proposed model of Chen et al. proves the mathematical existence of a solution to the

energy minimization problem [17, 18]. The proposed energy lets the contour stick to

high image gradients and tries to form a shape related to the given prior:

$$E(C, \mu, R, T) = \int_0^1 \left( g\big( |\nabla I| (C(p)) \big) + \frac{\lambda}{2} \, d^2\big( \mu R C(p) + T \big) \right) |C'(p)| \, dp \tag{2.40}$$

In equation (2.40) a curve C is searched together with the transformation parameters

scale µ, rotation R and translation T . The transformed curve µRC + T should be

closely related to the shape prior C ∗ . The gradient information is realized with the edge

detector g and therefore the first term measures how well the contour C

follows high image gradients. The second term is responsible for the closeness to the shape prior.

d 2 represents the squared distance of a point P (x, y) ∈ C from the prior C ∗ and is also



referred to as d 2 (x, y) = d 2 (C ∗ , (x, y)) in the literature. The energy in equation (2.40)

is minimized using a gradient descent scheme. In Figure 2.11 a segmentation of an

epicardium in an ultrasound image is shown.

(a) training and average shape (b) initialization (c) final contour

Figure 2.11: Segmentation of an epicardium using the model presented

in [17, 18] by Chen et al. Figure 2.11a shows a cluster of 79 curves and

the mean shape C ∗ (dotted contour). The initialization in Figure 2.11b

leads to a contour (solid curve) in 2.11c. The dotted line in the final

result represents the epicardium segmented by an expert. All figures are

reproduced from [17].

2.4.5 Global-to-Local Shape Registration

A different approach is pursued by Paragios, Rousson and Ramesh in [45, 46]. They

handle the segmentation problem as a global-to-local registration task. The shapes are

modelled using a Euclidean distance map in the form of a level set representation.

The global registration is done via a rigid transformation whereas the local changes

are handled using a deformation field. For the global rigid registration Paragios et al.

propose the following energy term to register shape source S and shape target D with

the registration parameter A = (µ, R, T ) (scale = µ, rotation = R and translation = T ):

$$E(A) = \iint_{\Omega} \big( \mu \phi_D(x, y) - \phi_S(A(x, y)) \big)^2 \, d\Omega \tag{2.41}$$

The shape source and target (S, D) are represented as two signed distance functions φ S

and φ D . Parts of the shape should be registered in a non-rigid way after a coarse alignment.

Therefore a local deformation field ( U(x, y), V (x, y) ) is introduced. The resulting



equation was proposed in [46] and restates equation (2.41) with some enhancements.

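$$E(A, (U, V)) = \underbrace{\alpha \iint_{\Omega} N_\delta(\phi_D, \phi_S) \big( \mu \phi_D - \phi_S(A) \big)^2 \, d\Omega}_{\text{Global model-based registration}} + \underbrace{(1 - \alpha) \iint_{\Omega} N_\delta(\phi_D, \phi_S) \big( \mu \phi_D - \phi_S(A + (U, V)) \big)^2 \, d\Omega}_{\text{Pixel-wise local deformation}} \tag{2.42}$$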

The parameter α is introduced to balance the setting between global motion and local

deformation. The binary function N δ takes the value one if min { |φ S | , |φ D | } ≤ δ and zero otherwise.

It takes pixels into account that are in the range of distance δ away from the

shape. An example of this global to local registration task is shown in Figure 2.12.
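The effect of the narrow band function N δ can be illustrated by evaluating the global term of (2.42) without the weight α on two small distance maps. The sketch assumes that the source distance function has already been resampled under the transformation A, so that both maps are given on the same grid; the function names are hypothetical.

```python
def narrow_band(phi_s, phi_d, delta):
    """N_delta: 1 if min(|phi_S|, |phi_D|) <= delta, else 0."""
    return 1 if min(abs(phi_s), abs(phi_d)) <= delta else 0


def registration_energy(phi_d, phi_s_warped, scale, delta):
    """Sum of N_delta * (scale * phi_D - phi_S(A))^2 over all pixels."""
    energy = 0.0
    for row_d, row_s in zip(phi_d, phi_s_warped):
        for d, s in zip(row_d, row_s):
            energy += narrow_band(s, d, delta) * (scale * d - s) ** 2
    return energy


phi_d = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
phi_s = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]  # same shape, shifted by one pixel
print(registration_energy(phi_d, phi_d, 1.0, 0.5))
print(registration_energy(phi_d, phi_s, 1.0, 0.5))
```

Pixels farther than δ from both shapes do not contribute, which is why the last column is ignored in the shifted example.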

Figure 2.12: This example, reproduced from [46], shows the approach

on first applying a global rigid transformation to a given shape source S

iteratively converging to a shape target D. When the rigid transformation

is done fine adjustments can be made by using the local deformation field in

a non-rigid way to fit the shape representation to the desired segmentation

result.

Thereupon Rousson, Paragios and Deriche proposed a method in [49, 50] where the

level set framework is used to evolve a segmentation that is guided by an active shape

model proposed by Cootes et al. in [20, 21]. Applying a PCA on the level set functions

allows a representation of shape variations with complex topologies. An example of

tracking a cardiac cycle is presented in Figure 2.13. Prior information is modelled using

a level set formulation like it is done in equation (2.41). If the representation of the

shape source belongs to the class of training shapes it can be derived from the principal

mode of the shape representation model (see equation (2.43)). As a consequence the

energy can be reformulated as presented in [50] where the model was applied to 3D



MRI data:

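$$\phi_S = \phi_M + \sum_{j=1}^{m} \lambda_j U_j \tag{2.43}$$

$$E(\phi, A, \lambda) = \int_{\Omega} \delta_\epsilon(\phi) \left( \mu \phi - \left( \phi_M(A) + \sum_{j=1}^{m} \lambda_j U_j(A) \right) \right)^2 d\Omega \tag{2.44}$$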

In equations (2.43) and (2.44) the mode weights λ = (λ 1 , . . . , λ m ) and the eigenvectors

(modes of variation) U = (U 1 , . . . , U m ) are used to model the shape variation with

the use of a PCA. The proposed energy has to be minimized in two separate steps.

One for optimizing the level set function φ and the other for the rigid transformation

parameters A. The segmentation and registration task can be combined by using the

variational model proposed in [49, 50]. The result is a system of linear equations. By

solving these the shape weights λ can be estimated.
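The shape model (2.43) and the estimation of the weights λ can be illustrated in a strongly simplified setting. The sketch assumes flattened level set functions, orthonormal modes U j and an identity transformation A; under these assumptions the linear system decouples and each weight is an inner product. This is not the full scheme of [49, 50].

```python
def synthesize(phi_mean, modes, weights):
    """phi_S = phi_M + sum_j lambda_j * U_j, see (2.43)."""
    return [m + sum(w * u[k] for w, u in zip(weights, modes))
            for k, m in enumerate(phi_mean)]


def project(phi, phi_mean, modes):
    """Least squares weights for orthonormal modes: lambda_j = <phi - phi_M, U_j>."""
    diff = [p - m for p, m in zip(phi, phi_mean)]
    return [sum(d * u_k for d, u_k in zip(diff, u)) for u in modes]


phi_mean = [1.0, 1.0, 1.0, 1.0]
modes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
phi = synthesize(phi_mean, modes, [2.0, -3.0])
print(phi, project(phi, phi_mean, modes))
```

Projecting a synthesized shape returns the weights it was built from; for a general level set function the projection gives the closest shape within the model space.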

Figure 2.13: Cardiac tracking using the model of Rousson et al. The first

row shows the curve evolution (red) and the projection to the model space

(yellow) in the first frame. The segmentation results of a cardiac cycle are

presented in the second row. The figures are reproduced from [49].

2.4.6 Shape Prior Driven Mumford-Shah Functional

In [6] Bresson et al. propose a method to combine active contour segmentation, statistically

learned parametric shape prior and the Mumford-Shah energy into a variational

level set framework which can be seen as an extension of the model of Leventon et al.

presented in Section 2.4.2. The energy is divided into three parts combining gradient



information with a shape model and with global and local image information:

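$$E = \beta_s \cdot E_{shape} + \beta_b \cdot E_{boundary} + \beta_r \cdot E_{region} \tag{2.45}$$

with:

$$\begin{aligned} E_{shape} &= \oint_0^1 \hat{\phi}^2\big( x_{pca}, h_{x_T}(C(q)) \big) \, |C'(q)| \, dq, \\ E_{boundary} &= \oint_0^1 g\big( |\nabla u_0(C(q))| \big) \, |C'(q)| \, dq, \\ E_{region} &= \int_{\Omega_{in}} \big( |u_0 - u_{in}|^2 + \mu |\nabla u_{in}|^2 \big) \, d\Omega + \int_{\Omega_{out}} \big( |u_0 - u_{out}|^2 + \mu |\nabla u_{out}|^2 \big) \, d\Omega. \end{aligned} \tag{2.46}$$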

(a) Chan-Vese Model

(b) Bresson et al.

Figure 2.14: A comparison of the Chan-Vese segmentation model with the

method of Bresson et al. shows that with a shape prior the left ventricle is

fitted correctly. Without guidance by the prior the segmentation includes

the occlusion. The figure is reproduced from [6].

The shape term in Equation (2.46) guides the active contour to fit the shape model

with the shape function ˆφ, the vector of PCA eigencoefficients x pca (shape vector), h xT

considers the geometric transformation of the shape model and C models the active

contour. The detection of object boundaries from image gradients is implemented using

the boundary term. The last two integrals describe the global alignment of shape prior



and active contour in the sense of the Mumford-Shah model (Section 2.3.3) that drives

the segmentation to a homogeneous intensity region. Ω in and Ω out are the inside and

outside region delimited by the zero level set. The emphasis of shape, boundary and

region term is implemented with the constants β s , β b and β r . A comparison of a Chan-

Vese segmentation of an occluded left ventricle in a brain MRI image with Bresson’s

approach is done in Figure 2.14.

2.5 Discussion

In this chapter we presented various methods of image segmentation, ranging from

edge-based methods to region-based methods to more high-level approaches that incorporate

prior knowledge in form of shape models. The drawbacks of simple edge- and

region-based segmentation have already been shown in the introduction in Section 1.5.

Therefore prior knowledge was introduced to make these methods more robust. A

disadvantage of the presented shape prior segmentation models is the lack of efficient

methods to yield the desired results. In addition all approaches have problems

overcoming local minima and are not solvable in a globally optimal way.


Chapter 3

Geodesic Active Contour with L 1 Shape Prior

Contents

3.1 Geodesic Active Contours with Shape Information

3.2 Review of Total Variation Models

3.3 Geometric Similarity

3.4 TV-L 1 Shape Prior Segmentation

3.5 Solutions to Variational Models

3.6 Solving the Shape Prior Segmentation Model

3.7 Iterative Scheme to Solve the Segmentation Model

3.1 Geodesic Active Contours with Shape Information

Our goal is to incorporate shape information in a segmentation model. For the segmentation

energy we decided to use a geodesic active contour (GAC) model. Adding an

additional function which describes a shape force to the GAC model will result in an

energy of the form

$$\min \{ E_{sp} \} = \min \Big\{ \underbrace{\int_0^{|C|} g(C) \, ds}_{\text{GAC}} + \underbrace{\int P(C, S)}_{\text{Shape Prior}} \Big\}. \tag{3.1}$$


The first term of equation (3.1) models a GAC whereas the second integral should drive

the segmentation towards the desired shape. The main drawback of the GAC model

is the presence of local minima within the active contour energy, which makes a good initialization

essential for the desired segmentation. Finding a globally optimal solution is one of the

main demands on our segmentation model. The other two are real-time performance

and the possibility for user interaction as defined in Section 1.6. Approaches based on

the calculus of variations have had great success in recent years and have been shown

to parallelize well. We decided to have a closer

look at variational approaches and realize a GPU-based implementation to meet the

real-time requirement. Nikolova et al. review methods to obtain global minimizers

for the classical computer vision problems of denoising and segmentation in [41].

In the following we give a short overview on classical and important variational

models. In Section 3.2.1 the well known model of Rudin, Osher and Fatemi (ROF

Model) and a modification with a L 1 data fidelity term in Section 3.2.2 is reviewed

emphasizing the application of noise removal. To return to the task of segmentation

the weighted total variation (TV) and its connection to GACs is presented. The segmentation

model proposed by Mumford and Shah and a variation by Chan and Vese

have already been presented in Section 2.3.3 and 2.3.4.

3.2 Review of Total Variation Models

Although continuous approaches using variational models and partial differential equations (PDEs) have been available much longer, the first image processing approach that dropped the idea of a discrete image representation was the segmentation model of Mumford and Shah [39]. For image restoration, the edge-preserving denoising method of Rudin, Osher and Fatemi (ROF model), proposed in 1992 in [51], was one of the earliest variational methods. The main advantage of combining variational models and PDEs is that PDEs are well-defined mathematical formulations. In addition, variational methods are defined over a continuous space, so problems that were previously transformed from continuous domains onto discrete grids can be solved directly in the continuous setting. In recent years variational image restoration has become a popular and successful field, and the methods have been extended to in-painting, segmentation and other vision problems. A selection of seminal methods is given by Chan et al. in [12]. The main problem of variational models has been runtime, since the optimization tasks are computationally intensive. However, the algorithms parallelize well, and GPGPU programming can overcome this drawback.

3.2.1 Rudin Osher and Fatemi - Noise Removal

As stated before, Rudin, Osher and Fatemi were the first to apply a variational method to an image restoration task. In [51] a denoising model with the ability to preserve edges was presented. The main idea is to reconstruct an intensity function u(x, y) from an observed function u_0(x, y), which can e.g. be an image corrupted with Gaussian noise or a signal that is disturbed by various means. Normally the noise is unknown, so we get an ill-posed problem as described in the introduction in Section 1.4:

$$u_0(x, y) = u(x, y) + n(x, y), \quad \text{e.g. for additive noise } n. \tag{3.2}$$

For a continuous representation the problem is restated as a least squares problem to find an approximation of u:

$$\min_u \int_\Omega |u_0 - Au|^2 \, dx, \qquad \Omega \ldots \text{image domain}. \tag{3.3}$$

The operator A models the occurring distortions. In the field of image processing u_0 represents an observed image containing texture and noise. Texture can be described as a repeating and meaningful structure, whereas noise is characterized as an uncorrelated random pattern. In the classic literature the rest of the image, without texture and noise but still containing hues and sharp edges, is called cartoon. In equation (3.2) u represents the cartoon and n texture and noise. The ROF model recovers u by minimizing its total variation

$$\min_u \int_\Omega |\nabla u| \, d\Omega$$

with constraints

$$\underbrace{\int_\Omega u \, d\Omega = \int_\Omega u_0 \, d\Omega}_{\text{noise has zero mean}} \quad \text{and} \quad \underbrace{\int_\Omega (u - u_0)^2 \, d\Omega = \sigma^2}_{\text{and standard deviation } \sigma}. \tag{3.4}$$

The method of ROF denoising was taken up by Chambolle et al. and restated as a continuous energy minimization in [11]. With the modification that the constraint $\int_\Omega (u - u_0)^2 \, d\Omega = \sigma^2$ is replaced by $\int_\Omega (u - u_0)^2 \, d\Omega \le \sigma^2$, this leads to the unconstrained energy minimization

$$\min_u \{E_{ROF}\} = \min_u \Big\{ \int_\Omega |\nabla u| \, d\Omega + \frac{1}{2\lambda} \int_\Omega (u - u_0)^2 \, d\Omega \Big\}, \tag{3.5}$$

where λ > 0 is a Lagrange multiplier. The first part, $\int_\Omega |\nabla u| \, d\Omega$, is the essential novelty of the ROF model. It is called the regularization term and measures the variation of u without penalizing discontinuities. The second integral, $\int_\Omega (u - u_0)^2 \, d\Omega$, is the data fidelity term, which forces u to be close to u_0. Because of the L2-norm in the data fidelity term the model is often referred to as the TV-L2 model. A denoising example of this model is presented in Figure 3.1. The ROF model comes with certain limitations. The main issue is the loss of contrast, which can be observed even on noise-free images (see Figure 3.2). In [53] Strong and Chan analyzed this problem in more detail in the field of signal processing.

(a) Input image corrupted with Gaussian noise. (b) Filtered image with λ = 0.06.

Figure 3.1: Denoising example with the help of the ROF energy minimization (TV-L2).

3.2.2 The L 1 Data Fidelity Term

Due to the known drawbacks of the ROF model, research turned to more versatile methods. Aujol et al. give a good overview in [3] of different methods that were derived from the ROF approach using different norms for the fidelity term. We concentrate on reviewing the TV-L1 approach presented by Chan et al. [13] and Yin et al. [59], which leads to the energy

$$\min_u \{E_{TV\text{-}L^1}\} = \min_u \Big\{ \int_\Omega |\nabla u| \, d\Omega + \lambda \int_\Omega |u - u_0| \, d\Omega \Big\}. \tag{3.6}$$

(a) unfiltered image (b) λ = 0.3 (c) λ = 1.0 (d) λ = 5.0

Figure 3.2: TV-L2 filter applied to a synthetic test image with line thickness of 0.5, 1, 2 and 4 pt. The two gray gradients are 10 and 20 pt thick whereas the circles have a diameter of 20, 40, 60, 80 and 100 pt. The drawback of contrast loss is obvious for increasing filter strength.

In Figure 3.3 the TV-L1 model is applied to a test image. Structures of a certain size (depending on λ) are removed and the contrast persists. The TV-L1 approach is well suited for removing impulse noise, as in applications of in-painting, or for selecting features of a certain scale as done by Yin et al. in [59]. The main drawback of the TV-L1 energy in equation (3.6) in comparison to the ROF model is that it is not strictly convex. More than one global minimizer may therefore exist, which makes the optimization task more difficult.

(a) λ = 0.55 (b) λ = 0.2 (c) λ = 0.06

Figure 3.3: TV-L1 filter applied to a resolution test chart. Contrast persists and structures fade away according to size.

3.2.3 Weighted Total Variation

In [4, 5] Bresson et al. proposed a modification of the ROF model:

$$\min_u \Big\{ \int_\Omega g(x) \, |\nabla u| \, d\Omega + \lambda \int_\Omega |u - u_0| \, d\Omega \Big\} \tag{3.7}$$

The modified data fidelity term has already been discussed in the previous section on the L1 approach. The main novelty of Bresson's approach is the introduction of the weight g(x) within the regularization term. This so-called weighted TV-norm preserves the geometry of the original features in comparison to the classical L1 or L2 approach. The main drawback of this model is the lack of global minimizers when optimizing its energy. Therefore Leung and Osher in [32], and later Bresson et al. in [5], presented a method to find the global minima of the model: if an edge detector is used for the weighting function g(x) and u is a characteristic function 1_C that is allowed to vary smoothly in the interval [0, 1], the weighted total variation describes a geodesic active contour model and minimizes the same energy as in equation (2.11). With these constraints the weighted TV-norm

$$TV_g(u) = \int_\Omega g(x) \, |\nabla u| \, d\Omega \tag{3.8}$$

becomes a convex function and a globally optimal solution can be calculated.

In addition Leung and Osher demonstrated the flexibility of the energy in equation (3.7) when the parameter λ is allowed to vary over the image, λ = λ(x). For λ(x) = 0 the model can be used for total variation inpainting to recover damaged image regions. With λ(x) = λ_0 the energy can be used for a denoising task, and for λ(x) → ∞ u remains unchanged.

Coming back to the desired shape prior segmentation model, the weight g(x) is modelled with an edge function that represents the edge strength at each pixel, since we want the GAC energy to stick to strong edges. We use the measure for natural images presented by Huang and Mumford in [27], which has already been presented in the course of the GAC model in Section 2.2.5. With a characteristic function u that is allowed to vary smoothly in the interval [0, 1], the first integral of equation (3.1) can be modelled with the help of the weighted TV-norm:

$$\min \{E_{sp}\} = \min \Big\{ \underbrace{\int_\Omega g(x) \, |\nabla u| \, d\Omega}_{\text{GAC}} + \underbrace{\int P(C, S)}_{\text{Shape Prior}} \Big\}. \tag{3.9}$$

3.3 Geometric Similarity

Similarity defines the strength of the relationship between two objects. What “similar” means is application dependent; for our purpose only the shape is relevant and the appearance can be left out. Two geometrical objects can be called similar if one is congruent to a rigid transformation of the other. There are different possibilities to measure the similarity of two objects with various distances. A selection of methods to describe similarities is given in Section 3.3.1. In Section 3.3.2 we define an L1 measure that can be incorporated into the energy equation (3.1).

3.3.1 Distance Measures

Hamming Distance: The Hamming distance measures the area of the symmetric difference between two polygons. Only the area covered by exactly one of the two polygons contributes, so any overlap reduces the distance. The distance is therefore zero when the two objects are identical and properly aligned. If one of the two has a slight variation at the boundary, due to noise or simply because two different objects are compared, the Hamming distance will be small. For a complete misalignment the result is the sum of the two polygons' areas.

Hausdorff Distance: This measure is the largest distance from a point in one set to its closest point in the other set. In this directed form the Hausdorff distance is not symmetric. Often an image is pre-processed with an edge detector and the resulting binary image is processed. The difference between the Hamming and the Hausdorff distance can be illustrated well with a single outlier. If a shape has a single but grave outlier, the Hausdorff distance changes significantly while the Hamming distance is hardly affected. Conversely, the Hamming distance increases for a small change of the whole boundary (e.g. shrinking of the complete shape), which has little effect on the Hausdorff measure.

Comparing Skeletons: A completely different approach to measuring shape similarity extracts a skeleton from each object. With the use of thinning a tree-like structure can be derived, and the shape comparison is reduced to processing two skeletons. To compare the two graphs, the topology of the tree and the length of the edges can be used.

Manifold Clustering of Shapes: A manifold is an abstract mathematical space that allows more complicated structures to be expressed in terms of the properties of simpler spaces. It is for example possible to describe a set of shapes in an infinite-dimensional vector space (manifold). The distance between two shapes can then be calculated, and the resulting minimal distance is a measure of the similarity of the examined shapes.
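The contrast between the Hamming and the Hausdorff distance described above can be reproduced on small pixel sets. The following is only an illustrative sketch: shapes are assumed to be given as sets of integer pixel coordinates, and the names `hamming`, `directed_hausdorff` and `hausdorff` are chosen here and do not come from the thesis.

```python
import math


def hamming(a, b):
    """Area (pixel count) of the symmetric difference of two pixel sets."""
    return len(a ^ b)


def directed_hausdorff(a, b):
    """Largest distance from a point of a to its nearest point of b."""
    return max(min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric variant: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# A 5x5 square, the same square with one far-away outlier pixel,
# and the square shrunk by one pixel on every side.
square = {(x, y) for x in range(5) for y in range(5)}
with_outlier = square | {(12, 2)}
shrunk = {(x, y) for x in range(1, 4) for y in range(1, 4)}

print("outlier:", hamming(square, with_outlier), hausdorff(square, with_outlier))
print("shrunk: ", hamming(square, shrunk), hausdorff(square, shrunk))
```

The single outlier changes the Hamming distance by one pixel but dominates the Hausdorff distance, whereas shrinking the whole shape removes many pixels while every remaining boundary point stays close to the original boundary.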

3.3.2 L 1 Shape Prior

To incorporate shape information into the segmentation model we have to define a distance between the shape and the current GAC segmentation in the form of an energy. We describe the shape measure as the symmetric difference of regions, which can be formulated as an L1 distance and is therefore well suited for integration into a variational approach:

$$E_{sim} = \int_\Omega |u(x, y) - s| \tag{3.10}$$

For a two-dimensional image the energy of equation (3.10) measures how well a specific shape s fits a function u(x, y) at every pixel (x, y). For two identical functions s and u the resulting energy is zero. For a good alignment the similarity measure remains low, E_sim → 0, whereas for misaligned shapes the energy increases. In the energy plots of Figure 3.4 the energy is evaluated at each pixel for different alignments. For this purpose the shape s and the underlying function u(x) are represented by the same binary image of a geometric shape.

(a) E = 15 (b) E = 1188 (c) E = 1490 (d) E = 1600 (e) E = 1847

Figure 3.4: Shape similarity measure of different alignments. For the image u and shape representation s the same binary image is used. In the first row the energy is evaluated for each single pixel. In the second row the contours of the alignment are marked in red for u and in green for the shape prior s for better illustration. In each column the resulting energy of equation (3.10) is given, which can be interpreted as a measure for the alignment.
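A discrete version of the energy in equation (3.10) is simply the sum of absolute pixel differences. The sketch below assumes binary images stored as nested lists and a square test shape; the helper names `shape_energy` and `square_mask` are illustrative and not taken from the thesis.

```python
def shape_energy(u, s):
    """Discrete E_sim: sum over all pixels of |u - s|."""
    return sum(abs(a - b) for row_u, row_s in zip(u, s) for a, b in zip(row_u, row_s))


def square_mask(size, x0, y0, side):
    """Binary size x size image containing an axis-aligned square."""
    return [[1 if x0 <= x < x0 + side and y0 <= y < y0 + side else 0
             for x in range(size)] for y in range(size)]


# u and s show the same 3x3 square; s is shifted horizontally by 0..4 pixels.
u = square_mask(12, 2, 4, 3)
energies = {shift: shape_energy(u, square_mask(12, 2 + shift, 4, 3))
            for shift in range(5)}
print(energies)
```

For binary images the energy equals the area of the symmetric difference: it is zero for perfect alignment, grows with the shift, and saturates at the sum of both areas once the shapes no longer overlap.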

3.3.3 Rigid Shape Transformation φ ◦ s

The introduced L1 shape prior gives a measure of the current alignment of a fixed geometric shape. For the final segmentation method we want to be able to optimize the shape position. This optimization task is equivalent to searching for a global minimum of the proposed L1 shape distance. To allow the shape geometry to be modified during the minimization, a rigid transformation φ = {R, t, s} with the transformation parameters R for rotation, t for translation and s for the scale is introduced into the energy equation (3.10):

$$E_{sim} = \int_\Omega |u(x, y) - \varphi \circ s| \tag{3.11}$$

We offer two possibilities to optimize the segmentation result with the help of the shape position in our applications. First, the user is able to modify the alignment to get different segmentation results and choose an appropriate one. Secondly, an automatic position optimization provides the optimal alignment of the shape s towards some function u.

3.4 TV-L 1 Shape Prior Segmentation

Starting from the proposal to combine a GAC energy with a shape model, equation (3.1) can be restated as a variational segmentation energy using the weighted TV-norm for the GAC part (see equation (3.8)) and an L1 shape prior (see equation (3.11)):

$$\min_{u \in [0,1], \varphi} \{E_{sp}\} = \min_{u \in [0,1], \varphi} \Big\{ \int_\Omega g(x) \, |\nabla u| \, d\Omega + \lambda \int_\Omega |u - \varphi \circ s| \, d\Omega \Big\} \tag{3.12}$$

For the edge weight g(x) the measure for natural images $g(|\nabla I|) = e^{-\eta |\nabla I|^\kappa}$ is used again. λ is a parameter that balances the segmentation energy against the shape force. For a low λ the result of the GAC is preferred, whereas for increasing λ the shape prior is increasingly taken into account. Figure 3.5 shows the effect of the parameter λ.

(a) λ = 0.05 (b) λ = 0.075 (c) λ = 0.1 (d) λ = 0.125 (e) λ = 0.15

Figure 3.5: Evaluation of different λ settings. The results show that the GAC is attracted to edges which are shown as input image. The shape information is presented as green and the segmentation result as red contour.
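The edge weight used in the energy maps the gradient magnitude of the input image I to a value in (0, 1] that is small on strong edges. A minimal sketch is given below. It assumes forward differences for the image gradient and a grayscale image stored as nested lists, and the parameter values for η and κ are placeholders rather than the settings used in this work.

```python
import math


def edge_weight(image, eta, kappa):
    """g = exp(-eta * |grad I|^kappa) per pixel, using forward differences."""
    h, w = len(image), len(image[0])
    g = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][x + 1] - image[y][x] if x < w - 1 else 0.0
            dy = image[y + 1][x] - image[y][x] if y < h - 1 else 0.0
            g[y][x] = math.exp(-eta * math.hypot(dx, dy) ** kappa)
    return g


# Three identical rows with a vertical step edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
g = edge_weight(image, eta=10.0, kappa=1.0)  # placeholder parameters
for row in g:
    print(["%.5f" % value for value in row])
```

In homogeneous regions the weight is one, so the weighted TV behaves like the ordinary TV there, while on the step edge the weight drops close to zero and a contour placed on the edge becomes cheap.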


The variational model of equation (3.12) can be solved in a globally optimal way for a specific shape position. To account for the variability of the shape position, a rigid transformation can be applied to the shape prior, which makes it possible to find a locally optimal alignment of the prior with respect to the given edge image. This allows objects to be segmented in images where the GAC alone would fail due to weak edges, occlusion or noise. To optimize the proposed energy model we first look at solving variational models in general in Section 3.5.

3.5 Solutions to Variational Models

The solutions presented in this section are only a selection of the existing methods for solving total variation models. The reviewed approaches focus on solving the Euler-Lagrange equation. The main difficulty in solving TV models is the L1 norm, which is non-differentiable at zero. This problem is discussed in more detail in Section 3.5.2 by means of the ROF model.

3.5.1 Calculus of Variation

The calculus of variations is a mathematical field concerned with finding stationary values of given functionals. Most of the time such a value is a minimum or maximum of the functional. Extrema appear at locations where the first variation vanishes, which results in solving the Euler-Lagrange equation. If an integral of the form

$$I = \int f(t, y, y') \, dt, \qquad y' = \frac{dy}{dt} \tag{3.13}$$

is given, stationary values of I are possible if the Euler-Lagrange equation (3.14) is fulfilled.

$$\frac{\partial f}{\partial y} - \frac{d}{dt} \left( \frac{\partial f}{\partial y'} \right) \overset{!}{=} 0 \tag{3.14}$$
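As a worked illustration, which is an addition for orientation and not part of the original derivation, the Euler-Lagrange equation (3.14) can be applied to a one-dimensional analogue of the ROF energy (3.5). The independent variable is written x instead of t, the unknown is u(x), and u' ≠ 0 is assumed so that the derivative of |u'| exists:

```latex
\begin{align*}
f(x, u, u') &= |u'| + \frac{1}{2\lambda}\,(u - u_0)^2, \\
\frac{\partial f}{\partial u} &= \frac{1}{\lambda}\,(u - u_0), \qquad
\frac{\partial f}{\partial u'} = \frac{u'}{|u'|}, \\
\frac{\partial f}{\partial u} - \frac{d}{dx}\left(\frac{\partial f}{\partial u'}\right)
  &= \frac{1}{\lambda}\,(u - u_0) - \frac{d}{dx}\left(\frac{u'}{|u'|}\right) \overset{!}{=} 0 .
\end{align*}
```

The term u'/|u'| is undefined wherever u' = 0. This is the one-dimensional form of the non-differentiability of the L1 norm at zero mentioned at the beginning of Section 3.5.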

3.5.2 Explicit Solution of the ROF Model

Rudin, Osher and Fatemi solved the proposed energy (3.5) via its Euler-Lagrange equation (3.15) with an explicit time marching method (equation (3.16)) that is iterated until a steady state is reached.

$$-\nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) + \frac{1}{\lambda} (u - u_0) = 0, \qquad \frac{\partial u}{\partial n} = 0 \text{ on the boundary} \tag{3.15}$$

$$u^{n+1} = u^n - dt \left[ -\nabla \cdot \left( \frac{\nabla u^n}{|\nabla u^n|} \right) + \frac{1}{\lambda} (u^n - u_0) \right] \tag{3.16}$$

As stated before, the Euler-Lagrange equation is not defined at ∇u = 0. Therefore the denominator |∇u| is replaced by $|\nabla u|_\epsilon = \sqrt{|\nabla u|^2 + \epsilon}$, which ensures that the term does not become zero. The major drawback of this approximation is that convergence is slow when |∇u| is small, and if ε is large the edges get blurred. Because of the highly non-linear fraction ∇u / |∇u| a closed-form solution is very unlikely.
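The explicit time marching scheme with the regularized denominator $\sqrt{|\nabla u|^2 + \epsilon}$ can be sketched for a one-dimensional signal as follows. This is an illustrative sketch only: the signal, the forward/backward difference discretization and the values of ε, dt and the number of steps are assumptions made here, with dt chosen small enough for the explicit update to remain stable for this ε.

```python
import math


def grad(u):
    """Forward differences; the last entry is zero (Neumann boundary)."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def div(p):
    """Backward differences, the negative adjoint of grad."""
    n = len(p)
    return [(p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            for i in range(n)]


def total_variation(u):
    return sum(abs(d) for d in grad(u))


def rof_time_marching(u0, lam, eps=1e-2, dt=0.02, steps=2000):
    """Explicit gradient descent on the epsilon-regularised ROF energy."""
    u = list(u0)
    for _ in range(steps):
        # grad u / sqrt(|grad u|^2 + eps): the regularised unit gradient
        flux = [d / math.sqrt(d * d + eps) for d in grad(u)]
        curvature = div(flux)
        u = [ui - dt * (-ci + (ui - fi) / lam)
             for ui, ci, fi in zip(u, curvature, u0)]
    return u


noisy = [0.0, 0.1, -0.1, 0.05, 0.0, 1.0, 0.9, 1.1, 1.0, 0.95]
smooth = rof_time_marching(noisy, lam=0.2)
print(["%.3f" % value for value in smooth])
print(total_variation(noisy), total_variation(smooth))
```

The small step size that keeps the scheme stable is exactly what makes it slow: reducing ε to sharpen edges forces an even smaller dt.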

3.5.3 Dual Formulation

To overcome the degeneracy in the case |∇u| → 0, Chan, Golub and Mulet introduced a dual formulation for solving the total variation minimization problem in [14]. Chambolle applied a similar method to remove singularities in [9, 10]. The main intention is to remove the singularity by replacing the term ∇u / |∇u| in the Euler-Lagrange equation (3.15) with the dual variable p. The minimization task can be formulated with the following two equations:

$$p = \frac{\nabla u}{|\nabla u|} \quad \rightarrow \quad p \, |\nabla u| - \nabla u = 0 \tag{3.17}$$

$$-\nabla \cdot p + \frac{1}{\lambda} (u - u_0) = 0 \tag{3.18}$$

Rearranging equation (3.18) with respect to u and substituting the result into equation (3.17) yields the following equations:

$$u = u_0 + \lambda \nabla \cdot p \tag{3.19}$$

$$p \, |\nabla (u_0 + \lambda \nabla \cdot p)| - \nabla (u_0 + \lambda \nabla \cdot p) = 0 \tag{3.20}$$

Applying a gradient descent algorithm to equation (3.20) leads to an iterative method for the dual variable p. The timestep is chosen as dt = τ/λ.

$$\begin{aligned}
p^{n+1} &= p^n - \frac{\tau}{\lambda} \Big[ \big| \nabla (\lambda \nabla \cdot p^n + u_0) \big| \cdot p^{n+1} - \nabla (\lambda \nabla \cdot p^n + u_0) \Big] \\
p^{n+1} &= \frac{p^n + \frac{\tau}{\lambda} \nabla (\lambda \nabla \cdot p^n + u_0)}{1 + \frac{\tau}{\lambda} \big| \nabla (\lambda \nabla \cdot p^n + u_0) \big|}
\end{aligned} \tag{3.21}$$

Finally, solving equation (3.19) gives the solution u. Chambolle arrives at the same algorithm in [9] with a slightly different derivation. In [10] he shows that convergence of the gradient descent algorithm can be guaranteed for τ ≤ 1 /D for a D-dimensional problem.
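The fixed-point update of the dual variable and the final recovery of u via equation (3.19) can be sketched in one dimension as below. The discretization (forward differences for the gradient, their negative adjoint for the divergence), the test signal and the step size τ = 0.25 are assumptions made for this sketch. The step size is a conservative choice for this particular one-dimensional discretization and is not a value taken from the thesis.

```python
def grad(u):
    """Forward differences; the last entry is zero (Neumann boundary)."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def div(p):
    """Backward differences, the negative adjoint of grad."""
    n = len(p)
    return [(p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            for i in range(n)]


def total_variation(u):
    return sum(abs(d) for d in grad(u))


def rof_dual(u0, lam, tau=0.25, steps=500):
    """Semi-implicit dual update followed by u = u0 + lam * div p."""
    p = [0.0] * len(u0)
    for _ in range(steps):
        # gradient of (lam * div p^n + u0), i.e. of the current estimate of u
        gw = grad([lam * d + f for d, f in zip(div(p), u0)])
        p = [(pi + tau / lam * gi) / (1.0 + tau / lam * abs(gi))
             for pi, gi in zip(p, gw)]
    return [f + lam * d for f, d in zip(u0, div(p))]


noisy = [0.0, 0.1, -0.1, 0.05, 0.0, 1.0, 0.9, 1.1, 1.0, 0.95]
denoised = rof_dual(noisy, lam=0.2)
print(["%.3f" % value for value in denoised])
```

No ε-regularization is needed here: the division by 1 + (τ/λ)|∇(·)| keeps |p| ≤ 1 automatically, so flat regions and sharp jumps are both representable.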

3.5.4 Solving the TV-L1 Model

As mentioned in Section 3.2.2, the TV-L1 model (3.6) is not strictly convex. The second L1-norm in the data fidelity term prohibits a closed-form solution like the dual approach for the ROF model. As a first consequence Chan and Esedoglu applied in [13] an additional replacement, analogous to the one used for the explicit solution of the ROF model in Section 3.5.2. The Euler-Lagrange equation of the TV-L1 model (3.22) contains another L1 norm |u − u_0| in a denominator, which is replaced with $|u - u_0|_\delta = \sqrt{|u - u_0|^2 + \delta}$. For the TV-norm |∇u| the replacement from Section 3.5.2 is retained with the mentioned drawbacks, which gives equation (3.23):

$$-\nabla \cdot \left( \frac{\nabla u}{|\nabla u|} \right) + \lambda \frac{u - u_0}{|u - u_0|} = 0 \tag{3.22}$$

$$-\nabla \cdot \left( \frac{\nabla u}{|\nabla u|_\epsilon} \right) + \lambda \frac{u - u_0}{|u - u_0|_\delta} = 0 \tag{3.23}$$

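Due to the slow convergence of the iterative scheme, Aujol et al. [3] approximated the TV-L1 energy defined in equation (3.6) with the convex formulation

$$\min_{u,v} \{E_{TV\text{-}L^1}\} = \min_{u,v} \Big\{ \int_\Omega |\nabla u| \, d\Omega + \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega + \lambda \int_\Omega |v - u_0| \, d\Omega \Big\}, \tag{3.24}$$

where θ > 0 is a small constant that keeps the auxiliary variable v close to u. The first two terms are solved as a TV-L2 model, whereas the last two terms lead to a thresholding scheme.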


As illustrated in equation (3.24), the energy minimization with respect to u and v is done in two steps:

1. Minimization in terms of u with a fixed v:

$$\min_u \Big\{ \int_\Omega |\nabla u| \, d\Omega + \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega \Big\} \tag{3.25}$$

2. Minimization in terms of v with a fixed u:

$$\min_v \Big\{ \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega + \lambda \int_\Omega |v - u_0| \, d\Omega \Big\} \tag{3.26}$$

The two steps are iterated alternately until convergence. The task in step 1 can be solved with the Chambolle algorithm reviewed in Section 3.5.3. The second task is a pointwise convex minimization problem. The resulting Euler-Lagrange formulation for equation (3.26) is given in equation (3.27). The outcome is the soft thresholding scheme in equation (3.28).

$$\frac{1}{\theta} (v - u) + \lambda \cdot \operatorname{sgn}(v - u_0) \quad \rightarrow \quad \frac{1}{\theta} (v - u) + \lambda \cdot \frac{v - u_0}{|v - u_0|} \overset{!}{=} 0 \tag{3.27}$$

$$v = \begin{cases} u - \lambda\theta & \text{if } v - u_0 > 0 \\ u + \lambda\theta & \text{if } v - u_0 < 0 \\ u_0 & \text{if } v - u_0 = 0 \end{cases} \tag{3.28}$$

Expressed in terms of the known quantity u, the three cases correspond to u − u_0 > λθ, u − u_0 < −λθ and |u − u_0| ≤ λθ, respectively.
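The pointwise minimization in v has the standard soft-thresholding (shrinkage) solution: v is pulled from u towards u_0 by λθ and clamps to u_0 when u is within λθ of it. The sketch below states the case distinction in terms of the known quantity u − u_0. The function names and the sample values are illustrative assumptions.

```python
def soft_threshold(u, u0, lam, theta):
    """Pointwise minimiser of (u - v)^2 / (2 theta) + lam * |v - u0| over v."""
    t = lam * theta
    v = []
    for ui, fi in zip(u, u0):
        if ui - fi > t:
            v.append(ui - t)   # u lies far above u0: step down by lam*theta
        elif ui - fi < -t:
            v.append(ui + t)   # u lies far below u0: step up by lam*theta
        else:
            v.append(fi)       # u is within lam*theta of u0: snap to u0
    return v


def pointwise_energy(v, u, u0, lam, theta):
    """Energy of a single pixel in the v-subproblem."""
    return (u - v) ** 2 / (2.0 * theta) + lam * abs(v - u0)


u = [1.0, -1.0, 0.1]
u0 = [0.0, 0.0, 0.0]
v = soft_threshold(u, u0, lam=1.0, theta=0.5)
print(v)
```

Because the problem decouples per pixel, this step is trivially parallel, which is one reason the splitting is attractive for a GPU implementation.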

3.6 Solving the Shape Prior Segmentation Model

The Euler-Lagrange equation of the proposed segmentation model reads

$$\frac{\partial E_{sp}}{\partial u} = -\nabla \cdot \left( g \frac{\nabla u}{|\nabla u|} \right) + \lambda \frac{u - \varphi \circ s}{|u - \varphi \circ s|} \overset{!}{=} 0. \tag{3.29}$$

The methods reviewed in Section 3.5 for solving the ROF and L1 models can be used to optimize the proposed energy (3.12), i.e. to solve the Euler-Lagrange equation (3.29). However, since we want to avoid approximating the total variation with $|\nabla u|_\epsilon$ and the second L1-norm with $|u - \varphi \circ s|_\delta$ as in Section 3.5.2, solutions based on a dual approach come to the fore. Introducing a second variable v leads to a convex, solvable formulation $\min_{u,v,\varphi} \{E_{sp}\}$. This minimization task is similar to the optimization of $\min_{u,v} \{E_{TV\text{-}L^1}\}$ from Section 3.5.4.

$$\min_{u,v,\varphi} \{E_{sp}\} = \min_{u,v,\varphi} \Big\{ \int_\Omega g \, |\nabla u| \, d\Omega + \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega + \lambda \int_\Omega |v - \varphi \circ s| \, d\Omega \Big\} \tag{3.30}$$

The minimization with respect to u and v can be split up into separate optimization steps:

1. Minimization in terms of u with a fixed v and φ:

$$\min_u \Big\{ \int_\Omega g \, |\nabla u| \, d\Omega + \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega \Big\} \tag{3.31}$$

2. Minimization in terms of v with a fixed u and φ:

$$\min_v \Big\{ \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega + \lambda \int_\Omega |v - \varphi \circ s| \, d\Omega \Big\} \tag{3.32}$$

3. Optimize rigid transformation parameters φ with a fixed u and v:

$$\min_\varphi \Big\{ \int_\Omega |v - \varphi \circ s| \Big\} \tag{3.33}$$

4. Iterate until convergence.

3.6.1 Minimize u for fixed v and φ – Projected Gradient Descent

The optimization problem (3.31) resembles the ROF approach. The resulting Euler-Lagrange equation (3.34) differs from equation (3.15) by the weight function g.

$$-\nabla \cdot \left( g \frac{\nabla u}{|\nabla u|} \right) + \frac{1}{\theta} (u - v) = 0 \tag{3.34}$$

To avoid an approximation with $|\nabla u|_\epsilon$ a duality-based algorithm is applied. To present an alternative to the algorithm of Chambolle (see Section 3.5.3), the projected gradient descent approach is introduced in the following. For this purpose the TV-norm, and consequently the weighted TV-norm, is rewritten with a dual variable p:

$$|\nabla u| = \max_p \big\{ p \cdot \nabla u : \|p\| \le 1 \big\} \tag{3.35}$$

$$g \, |\nabla u| = \max_p \big\{ p \cdot \nabla u : \|p\| \le g \big\} \tag{3.36}$$

The projection of p can be seen even more clearly in the dual formulation of Chambolle (see Section 3.5.3), where p would be defined directly by the substitution in the Euler-Lagrange equation (3.29), which results in p = g ∇u / |∇u|. This can be interpreted as a projection of p to a maximal length of g.

Inserting the relation (3.36) into the minimization problem (3.31) leads again to an optimization task in two variables:

$$\min_u \max_{\|p\| \le g} \Big\{ \int_\Omega p \cdot \nabla u \, d\Omega + \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega \Big\}. \tag{3.37}$$

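For further calculations the first integral $\int_\Omega p \cdot \nabla u \, d\Omega$ is rewritten with the divergence theorem (sometimes referred to as Gauss' theorem in the literature):

$$\int_\Omega p \cdot \nabla u \, d\Omega = -\int_\Omega (\nabla \cdot p) \, u \, d\Omega \tag{3.38}$$

Because the energy is convex in u and concave in p, minimization and maximization can be exchanged. This allows a minimization using the first derivative with respect to u, which leads to a combined maximization task with respect to p and a minimization with respect to u. Note that the divergence theorem is applied again to the first integral in equation (3.42), $-\int_\Omega v \, \nabla \cdot p \, d\Omega = \int_\Omega \nabla v \cdot p \, d\Omega$, which leads to the maximization problem stated in equation (3.43).

$$\max_{\|p\| \le g} \min_u \Big\{ -\int_\Omega u \, \nabla \cdot p \, d\Omega + \frac{1}{2\theta} \int_\Omega (u - v)^2 \, d\Omega \Big\} \tag{3.39}$$

$$\frac{\partial E}{\partial u} = -\nabla \cdot p + \frac{1}{\theta} (u - v) \overset{!}{=} 0 \quad \rightarrow \quad u = v + \theta \nabla \cdot p \tag{3.40}$$

$$\max_{\|p\| \le g} \Big\{ -\int_\Omega \nabla \cdot p \, (v + \theta \nabla \cdot p) \, d\Omega + \frac{1}{2\theta} \int_\Omega (v + \theta \nabla \cdot p - v)^2 \, d\Omega \Big\} \tag{3.41}$$

$$\max_{\|p\| \le g} \Big\{ -\int_\Omega v \, \nabla \cdot p \, d\Omega - \theta \int_\Omega (\nabla \cdot p)^2 \, d\Omega + \frac{1}{2\theta} \int_\Omega (\theta \nabla \cdot p)^2 \, d\Omega \Big\} \tag{3.42}$$

$$\max_{\|p\| \le g} \Big\{ \int_\Omega \nabla v \cdot p \, d\Omega - \frac{\theta}{2} \int_\Omega (\nabla \cdot p)^2 \, d\Omega \Big\} \tag{3.43}$$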


Next, the problem is converted into a minimization task by simply flipping the sign of the maximization objective. The resulting optimization problem in the form of a dual formulation can be solved in the continuous domain using the Euler-Lagrange equation (3.45).

$$\min_{\|p\| \le g} \Big\{ -\int_\Omega \nabla v \cdot p \, d\Omega + \frac{\theta}{2} \int_\Omega (\nabla \cdot p)^2 \, d\Omega \Big\} \tag{3.44}$$

$$\frac{\partial E}{\partial p} = -\nabla (v + \theta \, \nabla \cdot p) \overset{!}{=} 0, \qquad \|p\| \le g \tag{3.45}$$

To account for the constraint that restricts p to a maximal length of g, the minimization is split into two steps. The name of the method comes from the iterative scheme, in which a gradient descent step first yields a temporary dual variable $\tilde p$ and a subsequent re-projection restricts p to the length of g.

$$\tilde p^{\,n+1} = p^n + \frac{\tau}{\theta} \nabla (v + \theta \, \nabla \cdot p^n) \tag{3.46}$$

$$p^{n+1} = \frac{\tilde p^{\,n+1}}{\max \left\{ 1, \frac{\|\tilde p^{\,n+1}\|}{g} \right\}} \tag{3.47}$$
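The two-step update, a gradient step on the dual variable followed by a re-projection onto ‖p‖ ≤ g, can be sketched for a one-dimensional signal as follows. The data term of this subproblem is v, so v is the quantity inside the gradient. The discretization, the step size τ = 0.25 and the test data are assumptions made for this sketch, not values from the thesis.

```python
def grad(u):
    """Forward differences; the last entry is zero (Neumann boundary)."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def div(p):
    """Backward differences, the negative adjoint of grad."""
    n = len(p)
    return [(p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            for i in range(n)]


def weighted_rof_pgd(v, g, theta, tau=0.25, steps=500):
    """Projected gradient descent on the dual of the weighted ROF problem."""
    p = [0.0] * len(v)
    for _ in range(steps):
        u = [vi + theta * d for vi, d in zip(v, div(p))]
        # gradient step on the dual variable
        p_tilde = [pi + tau / theta * d for pi, d in zip(p, grad(u))]
        # re-projection onto |p| <= g (p is forced to zero where g is zero)
        p = [pt / max(1.0, abs(pt) / gi) if gi > 0.0 else 0.0
             for pt, gi in zip(p_tilde, g)]
    return [vi + theta * d for vi, d in zip(v, div(p))]


v = [0.0] * 5 + [1.0] * 5            # step between samples 4 and 5
uniform = weighted_rof_pgd(v, [1.0] * 10, theta=1.0)
g_edge = [1.0] * 10
g_edge[4] = 0.05                      # small weight exactly on the step
weighted = weighted_rof_pgd(v, g_edge, theta=1.0)
print(uniform[5] - uniform[4], weighted[5] - weighted[4])
```

With a uniform weight the step loses contrast, as in the ordinary ROF model, while a small weight on the edge leaves the step almost untouched. This is the mechanism by which g attracts the contour to image edges.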

3.6.2 Minimize v for fixed u and φ

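For the second minimization problem (3.32) a thresholding scheme can be derived from the corresponding Euler-Lagrange equation (3.48):

$$\frac{1}{\theta} (v - u) + \lambda \operatorname{sgn}(v - \varphi \circ s) \overset{!}{=} 0 \tag{3.48}$$

Three different cases can be distinguished for the direct solution:

$$v = \begin{cases} u - \lambda\theta & \text{if } v - \varphi \circ s > 0 \\ u + \lambda\theta & \text{if } v - \varphi \circ s < 0 \\ \varphi \circ s & \text{if } v - \varphi \circ s = 0 \end{cases} \tag{3.49}$$

Expressed in terms of the known quantity u, the three cases correspond to u − φ ◦ s > λθ, u − φ ◦ s < −λθ and |u − φ ◦ s| ≤ λθ, respectively.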


3.6.3 Optimize the Rigid Transformation φ for fixed u and v

Exhaustive Search

The simplest, but computationally most costly, search method is to test each possible shape alignment sequentially. In theory every possible position has to be evaluated to find the globally optimal position. The only drawback is the runtime. For a two-dimensional rigid transformation, parameters for translation in x and y direction, rotation and scaling have to be optimized. This results in four nested for-loops, which are costly to evaluate for a larger search region. Testing all possible positions is not realistic, and therefore the parameters of φ are restricted to a certain domain. Optimizing within this region guarantees that the optimal position within this domain is found. In combination with user interaction, the coarse position can be specified by the user and the local position optimization is done automatically. This also fits our semi-automatic approach.
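A restricted exhaustive search can be sketched as nested loops over the transformation parameters that keep the alignment with the lowest L1 shape energy. To stay short, the sketch below searches integer translations only. Rotation and scale would add two further nested loops. The helper names and the binary test images are assumptions made here.

```python
def translate(mask, dx, dy):
    """Shift a binary mask by (dx, dy); pixels leaving the image are dropped."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] and 0 <= y + dy < h and 0 <= x + dx < w:
                out[y + dy][x + dx] = 1
    return out


def l1_distance(a, b):
    """Discrete L1 shape energy: sum of |a - b| over all pixels."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def exhaustive_search(v, s, max_shift):
    """Test every translation in [-max_shift, max_shift]^2, keep the best."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            energy = l1_distance(v, translate(s, dx, dy))
            if best is None or energy < best[0]:
                best = (energy, dx, dy)
    return best


# Shape prior: a 3x3 square. The function v shows the same square moved by (3, 1).
s = [[1 if 2 <= x <= 4 and 2 <= y <= 4 else 0 for x in range(10)] for y in range(10)]
v = translate(s, 3, 1)
print(exhaustive_search(v, s, max_shift=4))
```

The cost grows with the product of the parameter ranges, which is why restricting the domain, for example around a position given by the user, is essential for real-time use.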

Binary Search

A more sophisticated search algorithm is the binary search, in which a predefined interval is successively subdivided. For each alignment the energy equation (3.12) is evaluated and the position with the lowest energy is taken. The main drawback of the algorithm is that the optimal position may be skipped, which occurs especially with difficult alignment problems. It is therefore not guaranteed that the position with the optimal alignment is reached. Instead the binary search may stop at a local minimum.

Consecutive Search

Splitting the four nested loops of an exhaustive search into consecutive searches results in a method that is much faster, but it can no longer be guaranteed that the optimal position within the search region is found. We implemented some variations of this method. A parallel search over the translation parameters x and y, followed by consecutive searches for the optimal rotation and scale, turned out to be a good compromise between speed and accuracy.



Gradient Descent

Another optimization method to find a local minimum is to take the negative gradient of the function at the current position and use it as the step direction towards the minimum. Often a timestep variable is added to control the step size. The main advantage is the speed of the optimization. The additional timestep variable can have negative effects on the stability: when the value is too high the alignment may start to jitter, and for low values the optimization slows down. However, the main drawback of this optimization method is that it cannot guarantee a global optimum. Instead the gradient descent search may get stuck in local minima.

3.6.3.1 Discussion

We showed that there are several methods to optimize the shape position. For the interactive segmentation tool presented in Section 5.2 the exhaustive search is used because it guarantees the globally optimal solution within the search region. With user interaction the search region can be restricted and the application preserves real-time performance. For tracking slow-moving objects the exhaustive search can also achieve a sufficient frame rate. For faster movements, however, the position optimization has to be done with a gradient descent approach. The drawback, as stated above, is that the globally optimal solution may not be found. For movements along a fixed plane the consecutive search can be a good compromise.

3.7 Iterative Scheme to Solve the Segmentation Model

In the following we combine the steps derived in Section 3.6 into an iterative update scheme to give a better overview of the proposed algorithm, which uses a projected gradient descent method to minimize the energy.

1. $\tilde p^{\,n+1} = p^n + \frac{\tau}{\theta} \nabla (v^n + \theta \, \nabla \cdot p^n)$

2. $p^{n+1} = \dfrac{\tilde p^{\,n+1}}{\max \left\{ 1, \frac{\|\tilde p^{\,n+1}\|}{g} \right\}}$

3. $u^{n+1} = v^n + \theta \, \nabla \cdot p^{n+1}$

4. $v^{n+1} = \begin{cases} u^{n+1} - \lambda\theta & \text{if } u^{n+1} - \varphi \circ s > \lambda\theta \\ u^{n+1} + \lambda\theta & \text{if } u^{n+1} - \varphi \circ s < -\lambda\theta \\ \varphi \circ s & \text{if } |u^{n+1} - \varphi \circ s| \le \lambda\theta \end{cases}$

5. Optimize φ for an optimal shape alignment.

6. Go to 1 until convergence.

In practice it turns out that the numbers of iterations need not be balanced. Two to three iterations are sufficient for the sub-optimization of the dual variable (steps 1 and 2) to provide a result for the calculation of u (step 3) and the thresholding step (step 4), so that the whole algorithm converges to a stable result.
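The update scheme can be sketched end to end on a one-dimensional toy problem. This sketch makes several assumptions that are not part of the thesis: the signal is one-dimensional, the shape prior is kept fixed (step 5 is skipped), the thresholding in step 4 is written in terms of u − s, and all parameter values, including the number of outer iterations, are chosen only for this example.

```python
def grad(u):
    """Forward differences; the last entry is zero (Neumann boundary)."""
    return [u[i + 1] - u[i] for i in range(len(u) - 1)] + [0.0]


def div(p):
    """Backward differences, the negative adjoint of grad."""
    n = len(p)
    return [(p[i] if i < n - 1 else 0.0) - (p[i - 1] if i > 0 else 0.0)
            for i in range(n)]


def shrink(u, prior, t):
    """Step 4: pointwise soft thresholding of u towards the shape prior."""
    v = []
    for ui, si in zip(u, prior):
        if ui - si > t:
            v.append(ui - t)
        elif ui - si < -t:
            v.append(ui + t)
        else:
            v.append(float(si))
    return v


def segment(g, prior, lam, theta=0.25, tau=0.25, outer=3000, inner=3):
    """Alternate a few dual updates (steps 1-3) with thresholding (step 4)."""
    p = [0.0] * len(prior)
    v = [float(x) for x in prior]
    u = list(v)
    for _ in range(outer):
        for _ in range(inner):
            u = [vi + theta * d for vi, d in zip(v, div(p))]
            p_tilde = [pi + tau / theta * d for pi, d in zip(p, grad(u))]    # step 1
            p = [pt / max(1.0, abs(pt) / gi) for pt, gi in zip(p_tilde, g)]  # step 2
        u = [vi + theta * d for vi, d in zip(v, div(p))]                      # step 3
        v = shrink(u, prior, lam * theta)                                     # step 4
        # Step 5 (optimising phi) is skipped: the prior stays where it is.
    return u


n = 20
g = [1.0] * n
g[4] = g[14] = 0.05          # strong image edges around samples 5..14
prior = [1 if 6 <= i <= 13 else 0 for i in range(n)]   # prior is one sample too small
u = segment(g, prior, lam=0.2)
labels = [1 if ui > 0.5 else 0 for ui in u]
print(labels)
```

In this toy setting the prior only roughly marks the object, and the weighted TV term moves the boundary of the thresholded result from the prior onto the nearby strong edges. The parameter λ decides how far the result may deviate from the prior.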


Chapter 4

Implementation

Contents

4.1 GPU design
4.2 CUDA
4.3 Is there an alternative to CUDA?
4.4 Implementation Details

To reach real-time performance we have to compute the iterative solution as fast as possible. Because variational algorithms parallelize well and programming techniques for graphics hardware have improved, we implement the method using GPGPU programming with the help of NVIDIA's CUDA [43]. Since the introduction of the GeForce 8 series, GPGPU programming has become more accessible because of available program feedback and unified shading hardware. More details on unified shaders and the general GPU design are provided in Section 4.1.

4.1 GPU design

To give a better understanding of the programming techniques used, we give a short overview of modern graphics hardware and how its design differs from previous models. Because we work on NVidia hardware, we concentrate on the main differences between the GeForce 8 Series (and newer) and older GeForce GPUs. Former rendering pipelines (Figure 4.1) were fixed and split into different parts, with each task implemented on a different processor.

Figure 4.1: Schematic sequence of shader units in a traditional GPU

rendering pipeline.

Figure 4.2: Principle of unified shader model.

In this pipeline, data (vertices, attributes, etc.) is passed from the CPU to the GPU and traverses the major processing stages linearly from left to right in Figure 4.1. To change this sequential flow into a loop-oriented model, the unified shader architecture was introduced with the release of the GeForce 8 Series [42]. A schematic overview is given in Figure 4.2. Unified Stream Processors (SPs) can process any kind of data, so the balance between the different stages is no longer fixed. If a program or a loaded scene requires more processing at the pixel level than, for example, at the vertex level, the workload is distributed on demand. The data processing is loop oriented.



Input is passed to the unified shader and results are redirected from local registers to the processing unit for the next operation. When all shading operations are done, the resulting pixel fragment is handed over to the ROP unit. SPs can be grouped together to provide highly parallel computing capabilities. This can be used for GPGPU computing, which can be done with the CUDA framework from NVidia (see Section 4.2). The principle of SPs is illustrated in Figure 4.3. This block diagram of a GeForce 8800 GTX gives an overview of the composition of the graphics device. The SPs and texture units (TFs) are combined into blocks on which single threads can run. These blocks offer some amount of local memory (L1 cache) and can exchange data on a shared memory level (L2 cache). A better overview of the provided memory is given in Figure 4.4, and of the CUDA programming techniques in Section 4.2. Each SP unit can be assigned to a specific shader task and its output can be redirected very efficiently as input to a different SP.

Figure 4.3: Block Diagram of GeForce 8800 GTX. The figure is reprinted

from [42]

4.2 CUDA

Figure 4.4: Provided memory of the GeForce 8 Series that can be used in programming with CUDA. This figure is taken from the CUDA Programming Guide [43].

GPUs have been used for non-graphics computation for several years. With the introduction of the unified shader model, GPUs offer the opportunity to build a framework that flattens the learning curve of GPU computing. Since the introduction of the GeForce 8 Series, the thread management for the different shaders can be interpreted as a single management facility for thread handling in GPGPU computing. With the CUDA framework [43], NVidia provides a GPGPU technology based on the syntax of the C programming language. One main advantage of the new GPU design is the generic SPs in combination with generic addressing of the device memory. In former GPU series it was not possible to write to arbitrary addresses in memory. Generally, a GPU is specialized for highly parallel, compute-intensive workloads. GPUs can be regarded as coprocessors to the CPU and are implemented as a set of multiprocessors. Because of this, GPUs are well suited to applying identical operations to varying data. The threads are organized as a grid of thread blocks (see Figure 4.5):



Figure 4.5: Thread organization in grids and blocks for kernel execution.

The figure is taken from [43].

Thread Blocks: A group of threads that can communicate via shared memory and can be synchronized at certain points. The thread ID is visible inside the kernel and can be used to determine the position inside the processed data.

Grid of Thread Blocks: Due to the limited number of threads within a block, the thread blocks that execute the same kernel are joined into a grid. The main disadvantage compared to threads inside a block is the lack of fast communication facilities. Again, a block ID makes it possible to determine the current position.
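How thread and block IDs translate into a position in the processed data can be sketched by emulating the usual CUDA index computation (block index times block dimension plus thread index) in plain Python. The 16 × 16 block size and the function names are assumptions chosen for illustration, not values taken from the implementation.

```python
import math


def launch_config(width, height, block_dim=(16, 16)):
    """Number of blocks per grid dimension needed to cover the image."""
    return (math.ceil(width / block_dim[0]), math.ceil(height / block_dim[1]))


def pixel_of_thread(block_idx, thread_idx, block_dim):
    """Emulates x = blockIdx.x * blockDim.x + threadIdx.x (same for y)."""
    x = block_idx[0] * block_dim[0] + thread_idx[0]
    y = block_idx[1] * block_dim[1] + thread_idx[1]
    return x, y


def covered_pixels(width, height, block_dim=(16, 16)):
    """Count how often each pixel is visited by the emulated grid."""
    grid = launch_config(width, height, block_dim)
    hits = {}
    for by in range(grid[1]):
        for bx in range(grid[0]):
            for ty in range(block_dim[1]):
                for tx in range(block_dim[0]):
                    x, y = pixel_of_thread((bx, by), (tx, ty), block_dim)
                    # Threads that fall outside the image do nothing.
                    if x < width and y < height:
                        hits[(x, y)] = hits.get((x, y), 0) + 1
    return hits


if __name__ == "__main__":
    print(launch_config(40, 25))  # grid size for a 40 x 25 image
```

Because the image size is rarely a multiple of the block size, the grid is rounded up and the bounds check keeps surplus threads from touching memory outside the image.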



Figure 4.4 shows which memory is accessible to the individual threads. As for CPUs, registers are the fastest and closest form of local memory access. In addition, local memory is available for kernel execution. To share data between threads, shared memory can be used, which brings a high speedup when accessed correctly. To achieve high memory bandwidth, bank conflicts have to be avoided. Shared memory is divided into memory banks, each of which can handle one access per clock cycle. When multiple threads access the same bank, the accesses have to be serialized. At best the bandwidth is as high as for register access. The only way to share data between the blocks of a grid is through device memory. Write access is only provided for global memory, which is not cached. Constant and texture memory, in contrast, are cached but read-only for kernel functions.
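The effect of the access pattern on bank conflicts can be made concrete with a small model. The sketch below assumes 16 banks of 32-bit words and 16 threads accessing shared memory together, which matches the GeForce 8 generation as far as we know; these numbers and the function name are assumptions of the example. It also ignores the special case where all threads read the same word, which the hardware can serve as a broadcast.

```python
from collections import Counter


def conflict_degree(stride_words, num_threads=16, num_banks=16):
    """Largest number of threads that hit the same bank.

    Thread t accesses word t * stride_words; consecutive words are
    assumed to lie in consecutive banks. A result of 1 means no
    conflict, a result of k means the access is serialized k times.
    """
    banks = Counter((t * stride_words) % num_banks
                    for t in range(num_threads))
    return max(banks.values())


if __name__ == "__main__":
    for stride in (1, 2, 16, 17):
        print(stride, conflict_degree(stride))
```

In this model, odd strides spread the threads over all banks, while a stride equal to the number of banks sends every thread to the same bank. Padding a 16-word row to 17 words is the usual way to turn the second case into the first.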

The CPU communicates with the GPU through the global, constant and texture memory. Shared and local device memory cannot be accessed from the host side. Because the bandwidth between host and device memory is lower, data transfer between the two should be minimized to gain optimal performance. One should take advantage of the high bandwidth between the device and device memory.

4.3 Is there an alternative to CUDA?

Anything proposed by the graphics manufacturer NVidia normally has its counterpart from AMD/ATI and vice versa. CUDA is no exception. AMD proposed “Close To Metal” (CTM) in 2006 [2]. As in CUDA, CTM offers gather and scatter memory operations. The main idea of CTM is to give more direct control over the underlying hardware. In contrast to CUDA, which offers a high-level “C-style” syntax, CTM is closer to an assembly language, and therefore the learning curve of AMD’s GPGPU framework is much steeper. Please refer to [2] for more details on CTM.

4.4 Implementation Details

General guidelines for gaining performance with GPGPU programming have already been discussed in Section 4.2. The iterative approach to solving our segmentation model (see Section 3.7) can be processed on the GPU by utilizing shared memory. The algorithm only needs the neighboring pixels to update the current position. Therefore the image data can be loaded as patches that fit the corresponding block sizes, and the problem can be solved on each patch without fetching new data from global memory. The final result is written back to global memory and the next patch can be processed. This method is well suited to benefit from the principle of SPs.
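The patch-wise processing can be sketched as follows. This is a host-side Python illustration of the idea, not the CUDA kernel: the image is cut into tiles matching an assumed 16 × 16 block, and each tile is loaded together with a one-pixel border (a halo) so that neighboring pixels are available locally. The tile size, the halo width and the function names are assumptions of this sketch.

```python
def tiles(width, height, tile=16, halo=1):
    """Yield (core, padded) rectangles as (x0, y0, x1, y1), end-exclusive.

    core   : pixels this tile is responsible for updating
    padded : core plus a halo of neighbors, clipped to the image
    """
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            x1 = min(x0 + tile, width)
            y1 = min(y0 + tile, height)
            core = (x0, y0, x1, y1)
            padded = (max(x0 - halo, 0), max(y0 - halo, 0),
                      min(x1 + halo, width), min(y1 + halo, height))
            yield core, padded


def load_patch(image, padded):
    """Copy the padded rectangle out of the image (a list of rows)."""
    px0, py0, px1, py1 = padded
    return [row[px0:px1] for row in image[py0:py1]]


if __name__ == "__main__":
    image = [[y * 40 + x for x in range(40)] for y in range(25)]
    for core, padded in tiles(40, 25):
        patch = load_patch(image, padded)
        print(core, len(patch[0]), "x", len(patch))
```

The copy produced by `load_patch` plays the role of the shared-memory patch; only the core region would be written back to global memory.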


Chapter 5

Applications and Results

Contents

5.1 Applications

5.2 Interactive Medical Image Segmentation

5.3 Shape Alignment

5.4 Tracking Application

5.5 Processing 3D CT/MR Data

5.1 Applications

The main intention was to build a semi-automatic segmentation system for medical image data, as described in Section 5.2. This gives a medical doctor the opportunity to interact with the segmentation result. For very difficult data with poor edge information, the model can be used to favor the prior and just find an appropriate alignment for the shape model. In that case the shape alignment of Section 5.3 gains full control over the final segmentation. Because the segmentation and the alignment task are strongly linked, the results often cannot be assigned to one specific application. Results that relate to both are presented together with the shape alignment in Section 5.3. In the next application, the shape alignment is used to build a tracking system, which is described in Section 5.4. The intention is to detect the selected object in the next frame, which amounts to optimizing the shape position with respect to the position in the previous frame. In Section 5.5 the tracking is modified to process 3D data sets. Using CT/MR slices as frame data makes it possible to reconstruct 3D objects with the help of a given 2D prior.

To simplify user interaction, a graphical user interface was designed to cover the needs of the individual applications. The specific tools are presented in the following sections along with the application descriptions. The GUI is implemented with Trolltech’s Qt.

5.2 Interactive Medical Image Segmentation

As a basis for the implementation, we chose the segmentation model of Pock and

Unger [47, 56, 57]. They proposed the energy function (5.1) to integrate user interaction

into a geodesic active contour model.

$$\min_{u \in [0,1]} \left\{ \int_\Omega g(x) \, |\nabla u| \, d\Omega + \frac{1}{2} \int_\Omega \lambda(x) \, (u - f)^2 \, d\Omega \right\} \qquad (5.1)$$

The first term is again the g-weighted TV of u, whereas f defines the user input: foreground seeds are marked with 1 and background seeds with 0 within f. With the parameter λ the user can regulate how strongly the seeds are taken into account, or whether the edge information of the GAC model with the g-weighted TV is preferred.
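To make the roles of the two terms in (5.1) concrete, the following sketch evaluates a discretized version of the energy on a small image stored as nested lists. The forward-difference discretization, the convention that λ(x) is zero wherever the user has placed no seed, and the function name are assumptions of this example.

```python
import math


def interactive_energy(u, f, g, lam):
    """Discrete version of energy (5.1) for 2D lists of equal size.

    u   : current segmentation, values in [0, 1]
    f   : user seeds (1 = foreground, 0 = background)
    g   : edge weights of the weighted TV term
    lam : per-pixel weight of the seeds (0 where no seed was set)
    """
    h, w = len(u), len(u[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            # Forward differences, zero at the right and bottom border.
            dx = u[y][x + 1] - u[y][x] if x + 1 < w else 0.0
            dy = u[y + 1][x] - u[y][x] if y + 1 < h else 0.0
            total += g[y][x] * math.hypot(dx, dy)
            total += 0.5 * lam[y][x] * (u[y][x] - f[y][x]) ** 2
    return total


if __name__ == "__main__":
    u = [[0.0, 1.0], [0.0, 1.0]]
    ones = [[1.0, 1.0], [1.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(interactive_energy(u, zeros, ones, zeros))
```

A large λ at a seed pixel makes any deviation of u from the seed value expensive, whereas a small g along an image edge makes a jump of u at that location cheap.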

The segmentation model of Pock and Unger is used to facilitate the creation of a shape prior on the fly. The user can set foreground and background seeds and define the resulting segmentation as prior for the further segmentation task. A click on the button “set Shape” takes the shape and switches to our proposed energy model. There the user can again choose with the parameter λ, consistent with the energy equation (3.12), whether the segmentation should be attracted to the shape prior (higher λ) or to the edge function by means of the g-weighted TV (lower λ). A workflow of shape definition with the corresponding user interface is shown in Figure 5.1.

The second phalanx of the middle finger was segmented using foreground (on the bone) and background (image border) seeds in Figure 5.1a, which shows the complete GUI. The other images only show the relevant parts of the program. The segmentation of Figure 5.1a is used as a shape prior, which results in a binary representation as shown in Figure 5.1b. Switching to the edit mode “move shape” allows the user to position this shape onto another finger. In Figure 5.1c the shape is represented as a green contour after it was moved onto the ring finger. Because the shape is slightly different, the segmentation result should also take the edge image into account; the result in Figure 5.1d therefore segments the second phalanx with λ = 0.035. The GPU-based implementation enables real-time interaction. When the shape position is modified, the segmentation is updated immediately.

(a) Shape initialization.

(b) (c) New shape position. (d) Segmentation with λ = 0.035.

Figure 5.1: Workflow of shape prior segmentation with a shape prior

defined on the fly.



The second possibility to take a shape prior into account is to load a binary representation of the shape. For this, an image with black and white regions has to be prepared in advance, where zero (black) defines the inside region of the shape and one (white) the outside. Hence hand-labeled data can be used as easily as shape prior definitions made at runtime, which makes the application versatile for different uses.

(a) Thresholding result with

T = 0.3.

(b) Segmentation with pure

GAC energy.

(c) Shape prior segmentation

with λ = 0.15.

(d) Thresholding result with

T = 0.3.

(e) Segmentation with pure

GAC energy.

(f) Shape prior segmentation

with λ = 0.15.

Figure 5.2: Segmentation of the first phalanx of an index finger (5.2a–

5.2c) and a metacarpal bone of a ring finger (5.2d–5.2f). The red curve

represents the final segmentation and the green points mark the reference

segmentation by an expert.

A labeled data set was available for the metacarpal bones and the proximal (first) phalanges in an X-ray image of the hand. As an example we used the first phalanx of the index finger and the metacarpal bone of the ring finger from a left hand. Figure 5.2 uses this data set to compare the results of thresholding, pure geodesic active contour energy and the proposed shape prior segmentation. Thresholding fails for both examples. The GAC segmentation with the image borders as background and some hand-labeled foreground seeds performs well in regions with good contrast but fails in regions where no good edges are available. This is especially the case where the metacarpal bone meets the carpal bones. When shape information is incorporated, the result is equivalent to the ground-truth segmentation labels. The segmentation results in Figures 5.2c and 5.2f assume an accurate alignment of the shape. For misaligned data the results are shown in the next section in Figures 5.6 and 5.7.

(a) Thresholding result with

T = 0.3.

(b) Segmentation with pure

GAC energy.

(c) Shape prior segmentation

with λ = 0.15.

Figure 5.3: Segmentation of a vertebra in an X-ray image of the spine. Due to very bad contrast the segmentation without prior fails. The prior was prepared by us and therefore cannot be considered as reference data.

A more complex example with low-contrast images is presented in Figure 5.3. Thresholding fails completely at segmenting a single vertebra in the sagittal X-ray image of a spine. The pure geodesic active contour also does not provide a reasonable segmentation unless the user sets a large number of constraints. By incorporating shape information it is possible to obtain a reasonable segmentation (Figure 5.3c).



Figure 5.4 shows a data set recorded with a camera in a more natural setup. The aim is to segment the hand in the image. The basic segmentation with thresholding in Figure 5.4a again fails for the provided data. With GACs, including foreground and background seeds, the segmentation is possible, but the user has to provide constraints to separate the fingers. With a shape prior in the form of the hand the segmentation succeeds for this image.

(a) Thresholding result with

T = 0.7.

(b) Segmentation with pure

GAC energy.

(c) Shape prior segmentation

with λ = 0.15.

Figure 5.4: Segmentation of a hand.

The provided examples always require an accurate alignment of the shape prior to achieve the shown segmentation. For misaligned data the prior has to be aligned before the segmentation is done. This problem is discussed in more detail in Section 5.3.

Exact time measurements are not very meaningful for the segmentation tool because they depend heavily on the input data and the amount of user interaction. In addition, the energy equation is evaluated in the area of the shape prior and therefore also depends on the size of the shape representation. Nevertheless, Table 5.1 lists some values for the iterations processed per second and the resulting frame rate to convey the efficiency of the proposed method and its GPU-based implementation.

| Data set | Image size [px] | Shape size [px] | Iterations/s | Frames/s |
|---|---|---|---|---|
| X-ray (spine) | 512 × 800 | 153 × 111 | 30800 | 205 |
| X-ray (spine) | 512 × 800 | 512 × 512 | 17100 | 57 |
| Hand (index finger) | 512 × 512 | 256 × 256 | 25300 | 113 |
| Hand (ring finger) | 512 × 512 | 256 × 512 | 20700 | 93 |
| Ultrasound (heart) | 800 × 640 | 384 × 384 | 23200 | 102 |

Table 5.1: Performance evaluation of the shape prior segmentation tool



5.3 Shape Alignment

If an image has very weak edges or many disturbing edges, the segmentation cannot be based solely on an edge function. In that case the user can rely on shape alignment. The ability to take a predefined shape into the segmentation system makes it possible to stick to the shape by using a high λ for the segmentation task. The difficulty is that the alignment has to be accurate so that the desired object borders can be found. The following application proposes a method to automatically align the shape to the optimal position for segmenting the desired object. The search methods for optimizing the parameters of the rigid shape transformation φ ◦ s are discussed in Section 3.3.3. The linear search is guaranteed to find the optimal position within the search region. Because of its runtime (the linear search over the rigid parameters results in four nested for-loops), the search region is limited in the application. It is up to the user to define the allowed shape variation. As an alternative search method, a gradient descent algorithm is implemented which offers a timestep variable for each optimization variable (translation, rotation and scale). Again the user has to define how fast the model is allowed to change the segmentation prior. In general the application is again an optimization task: the optimal position of the shape prior corresponds to the global minimum of the proposed energy equation (3.12).
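The four nested loops of the linear search can be sketched as below. The callable `energy` is a stand-in for evaluating the energy (3.12) at a given rigid transformation; its signature, the parameter grids and the toy energy used in the usage example are assumptions made for illustration only.

```python
def exhaustive_search(energy, tx_range, ty_range, angles, scales):
    """Evaluate every parameter combination and keep the best one.

    The result is optimal only with respect to the sampled search
    region; the cost is the product of the four range sizes.
    """
    best_params, best_energy = None, float("inf")
    for tx in tx_range:
        for ty in ty_range:
            for angle in angles:
                for scale in scales:
                    e = energy(tx, ty, angle, scale)
                    if e < best_energy:
                        best_params = (tx, ty, angle, scale)
                        best_energy = e
    return best_params, best_energy


def toy_energy(tx, ty, angle, scale):
    """Hypothetical energy with its minimum at (2, -1, 5, 1.0)."""
    return (tx - 2) ** 2 + (ty + 1) ** 2 + (angle - 5) ** 2 \
        + (scale - 1.0) ** 2


if __name__ == "__main__":
    params, value = exhaustive_search(
        toy_energy, range(-3, 4), range(-3, 4),
        range(-10, 11, 5), [0.9, 1.0, 1.1])
    print(params, value)
```

In this toy setup the search already needs 7 · 7 · 5 · 3 = 735 energy evaluations, which illustrates why the allowed shape variation has to be limited by the user.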

For the alignment it is also possible that some parameters of the rigid transformation are provided by user interaction or at least suggested by a coarse positioning. Figure 5.5 shows an example where the user gives the coarse position and the local optimization of translation, rotation and scaling is determined with a line search optimization. The green contour describes the shape boundary and the red contour represents the current segmentation. For λ a value of 0.1 was chosen, which is a good balance between weighting the shape information and edge attraction. In Figure 5.5c the shape is aligned to the underlying bone structure. When the shape prior is positioned on a noticeably different structure, as in Figure 5.5d, the rigid transformation tries to optimize the energy (3.12) for the best fit.

Figure 5.5: Shape alignment with user interaction (panels a to d).

Figures 5.6 and 5.7 show an example of shape position optimization for the hand-labeled data that has already been used for evaluating the segmentation task in Section 5.2. The rigid transformation parameters are optimized to get a reasonable alignment of the shape prior to the calculated edge function. Assuming that the optimal position lies within the given transformation restrictions, the shape is optimally aligned and the resulting segmentation is equal to the hand-labeled contour.

In medical image segmentation various image modalities are available. For the examples in Figures 5.8 and 5.9 two ultrasound images of the heart are processed. Due to the noisy output of ultrasound scanners, segmentation based on edges is difficult. By incorporating a shape prior and optimizing the shape position, the desired results can be achieved. Figure 5.8 shows the segmentation of the left ventricle in an echocardiogram. In Figure 5.8a the green contour represents the initial shape prior and the red border in Figure 5.8b shows the corresponding segmentation. In Figure 5.8c the shape is aligned to the optimal position found. There the optimized shape prior (green) and the segmentation (red) are shown within the same image. Figure 5.8d shows the result when a TV-L1 filtering (see Section 3.2.2) is done before the segmentation. In Figure 5.9a the shape of a right atrium is incorporated into the segmentation. Figure 5.9b shows an intermediate state and Figure 5.9c the final segmentation using the optimized shape prior position. The contour in Figure 5.9d is the result of a segmentation that puts more weight on the shape force. With λ = 0.1 the segmented borders correspond more closely to the proposed shape geometry. In this case the segmentation result seems to be smoother, but this depends on the incorporated shape prior.

Figure 5.6: Alignment of a given shape to a metacarpal bone of a ring finger. The red curve represents the current segmentation and the green points mark the reference segmentation by an expert.

Figure 5.7: Alignment of a given shape to the second phalanx of an index finger. The red curve represents the current segmentation and the green points mark the reference segmentation by an expert.

(a) Initial shape position.

(b) Segmentation with λ = 0.02.

(c) Optimized shape alignment and segmentation with λ = 0.02.

(d) Segmentation with λ = 0.1 with previous TV-L1 filtering λ = 0.3.

Figure 5.8: Segmentation of the left ventricle in an echocardiogram. The green contour represents the pre-defined shape prior and the red contour the resulting segmentation.

The data sets of Figures 5.10–5.12 show sagittal and coronal X-ray images of a spine. These are examples of very low-contrast images, which are important for long-term studies where the radiation should be as low as possible. Thanks to the shape information, the local alignment is able to find a reasonable position for a vertebra in Figure 5.10 for the sagittal image and in Figure 5.11 for the coronal one. In the example of Figure 5.12 the shape consists of multiple vertebrae and the final alignment is again feasible.

An example of non-medical image data is presented in Figure 5.13. An image of a hand is used, where the second half of the pictures contains some occlusion. Using a pre-defined shape prior, the segmentation of the hand is possible in both cases with the help of local alignment optimization.



(a) Segmentation with the initialized shape position

(λ = 0.02)

(b) Segmentation with an intermediate shape position

(λ = 0.02)

(c) Segmentation after final alignment (λ = 0.02)

(d) Segmentation that favors the shape force with λ = 0.1, which results in a smoother contour.

Figure 5.9: Segmentation of the right atrium in an echocardiogram with

the help of position alignment of a shape prior.


Figure 5.10: Alignment of a single vertebra in a sagittal X-ray image of the spine.

Figure 5.11: Alignment of a single vertebra in a coronal X-ray image of the spine.

Figure 5.12: Alignment of multiple vertebrae in a sagittal X-ray image of the spine.

Figure 5.13: Alignment of a hand shape to an image. This example shows that the proposed method is robust against occlusion.

5.4 Tracking Application

Consequently the shape alignment can be applied to video sequences to track an object with the aid of shape information. For this, the object position has to be initialized in the first frame. This can be done either by stopping the video and segmenting the object with the help of the provided tools, or by loading a shape prior that describes the object. For the guided initialization with a known prior, the parameters for translation, rotation and scale can be adjusted so that the segmentation is valid as an initialization. An automated alignment with the tools of Section 5.3 is another possibility to provide a valid segmentation for the first frame. For the tracking itself a shape alignment is applied from one frame to the next. The disadvantage of searching for the globally optimal position with a line search method is obvious. For faster movement the allowed transformation parameters have to be increased, which slows down the tracking. For a restricted domain of parameters, problems occur when the object moves faster, rotates more, or gets smaller or bigger than the specified domain allows. The tracking then becomes incorrect or, in the worst case, loses the object. The tracking application can instead resort to the gradient descent search method, which results in a much higher frame rate. The drawback is that the tracking begins to jitter if the timestep gets too high, while for lower values the segmentation will not catch up with fast object movement. For a restricted search region the tracking application can reach real-time performance despite the very costly search method.
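The trade-off between timestep and tracking behaviour can be illustrated with a minimal gradient descent sketch that uses one timestep per transformation parameter. The central finite differences used to estimate the gradient, the callable `energy`, and the toy energy in the usage example are assumptions of this sketch and not necessarily what the implementation does.

```python
def gradient_descent_align(energy, params, timesteps,
                           iterations=100, eps=1e-3):
    """Descend on (tx, ty, angle, scale) with per-parameter timesteps."""
    params = list(params)
    for _ in range(iterations):
        grads = []
        for k in range(len(params)):
            # Central finite difference for parameter k.
            hi = params[:k] + [params[k] + eps] + params[k + 1:]
            lo = params[:k] + [params[k] - eps] + params[k + 1:]
            grads.append((energy(*hi) - energy(*lo)) / (2.0 * eps))
        params = [p - dt * gk
                  for p, dt, gk in zip(params, timesteps, grads)]
    return params


def toy_energy(tx, ty, angle, scale):
    """Hypothetical energy with its minimum at (2, -1, 5, 1.0)."""
    return (tx - 2) ** 2 + (ty + 1) ** 2 + (angle - 5) ** 2 \
        + (scale - 1.0) ** 2


if __name__ == "__main__":
    start = [0.0, 0.0, 0.0, 1.5]
    print(gradient_descent_align(toy_energy, start, [0.2, 0.2, 0.2, 0.05]))
```

For this quadratic toy energy each update multiplies the remaining error of a parameter by (1 − 2·timestep). Small timesteps therefore approach the optimum slowly, and timesteps above 1 overshoot with growing amplitude, which loosely mirrors the lag and the jitter described above.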

In Figure 5.14 a real-time tracking example of a cup is shown. The espresso cup is moved at a speed for which a search region of a few pixels is sufficient, so the position update can be done in real-time. Switching to a search method like gradient descent brings a performance boost, but as mentioned in Section 3.3.3 it cannot guarantee to find the globally optimal position. For a sequence like the one in Figure 5.14 the result would be satisfactory because there are no disturbing edges near the object.

In Figure 5.15 a more complex example is shown. This one is not processed in real-time but as a recorded video sequence, because of disturbing edges on the driving lane caused by the shadows of the trees and by illumination changes. The sequence is pre-processed with some Gaussian smoothing so that disturbing edges are reduced to a certain extent. However, the dominant edge of the shadow and the illumination change when entering the tunnel remain. Despite the difficult conditions the tracking provides an appropriate position estimate and the car is followed through the whole sequence. Slight misalignments in some frames due to heavy illumination changes are compensated after a few frames.

Figure 5.14: Real-time tracking of an espresso cup using an exhaustive

search for position optimization and a restricted domain of allowed transformation

parameters.



Figure 5.15: Daimler-Chrysler sequence of a car entering a tunnel. The sequence is very difficult to track based on edge information because of many disturbing edges on the lane and the illumination changes. However, the shape prior segmentation with position optimization did a fairly good job and did not lose the car when entering the tunnel.



5.5 Processing 3D CT/MR Data

As a specialized tracking and shape alignment application, the algorithm is applied to 3D image data. The idea is to track an object through the slices of an MR or CT volume. First, an object is segmented on a starting slice according to its shape. Second, the prior position is optimized with respect to the shape change of the object while iterating through the volume slices. From the individual segmentation results a three-dimensional representation can be built. Here the main emphasis does not lie on real-time performance, and therefore the search region can be enlarged, which makes the result more stable. In Figure 5.17 the abdominal aorta is tracked through the MR scan of an abdomen.

Figure 5.16: Aorta segmentation with the help of 3D data processing with a 2D shape prior. The single MR slices are used as consecutive frames and the shape position is optimized according to the aorta.



Figure 5.17: 3D visualization of aorta segmentation. Different views show

the tubular structure in the volumetric data. The segmentation result is

shown in red.


Chapter 6

Conclusion and Outlook

Contents

6.1 Conclusion

6.2 Outlook

6.1 Conclusion

In this master’s thesis an interactive segmentation model that incorporates prior information was presented. In Sections 2.2 and 2.3 we introduced some basic methods for image segmentation. We showed that especially low-level methods like thresholding or simple edge chains have their drawbacks on complex data sets. However, more advanced edge-based methods like the Snake model by Kass et al. [29] or the geodesic active contours by Caselles et al. [8] also depend on basic features like edges and do not incorporate high-level knowledge in their original implementations. The corresponding optimization techniques are not able to find a global optimum and get stuck in a local one. This drawback also applies to more sophisticated region-based approaches like the Mumford-Shah segmentation model [39] or its modification by Chan and Vese [15, 58].

To handle more difficult tasks, methods were developed that incorporate prior knowledge in a segmentation framework. In Section 2.4 we gave a selection of methods that use shape models to improve the segmentation results. Level set methods by Leventon et al. [33, 34] or Chen et al. [16–18] are difficult to optimize and the result is not guaranteed to be globally optimal. The two-part method of Paragios et al. [45, 46] combines a global registration task with a local deformation field. The last two approaches by Cremers et al. [23, 25] and Bresson et al. [6] incorporate shape information into the Mumford-Shah method.

The proposed variational shape prior segmentation is presented in Chapter 3. The approach is based on a segmentation with geodesic active contours that prefers a defined geometric shape. The geodesic active contour is implemented with a weighted TV-norm and the shape force is integrated in the data-fidelity term, which is modelled with an L1-norm. This is stated in Sections 3.3.2 and 3.4. The method to obtain a globally optimal solution and to optimize the given parameters is shown in Section 3.6.

The implementation itself is done with the help of NVidia’s GPGPU framework CUDA. The optimization of the variational models can be accelerated and the methods reach real-time performance. Details on graphics hardware and GPGPU computing are given in Chapter 4.

Chapter 5 was devoted to the implemented applications and the corresponding

results. First an interactive segmentation application was reviewed and evaluated with

hand-labeled image data. This data was also used to review the task of shape alignment

in Section 5.3. Next a tracking application was implemented which makes use of the

shape alignment on consecutive image frames. Finally the tracking was modified to

process 3D image data with the use of a 2D shape prior.

6.2 Outlook

For further development several sub-parts may be enhanced. First, the shape prior representation could be implemented in a more flexible way. For fast treatment of new data the binary shape prior makes it possible to process the image without a previous learning step. However, the segmentation is based on only one fixed shape prior. By introducing an optional statistical prior, or a combination of a binary shape and a learnable prior, it would be possible to incorporate different shapes and choose the best adapted one for the particular situation. Another idea is to combine multiple priors with some kind of forces. This would make it possible to favor an alignment of the shapes towards each other, with the forces acting on the alignment among the shapes themselves.



Another idea is to have a 3D representation of an object as shape prior. This

would make the segmentation algorithm much more versatile to different viewpoints.

A position estimation could be done beforehand and therefore very accurate priors

could be used for the final segmentation.

For the alignment and tracking applications, the position optimization is the main performance bottleneck. Especially for fast object movement in the tracking application, real-time capability is lost when using the exhaustive search method. Although exhaustive search is guaranteed to find the globally optimal position within the search region, the frame rate drops as the range of allowed transformations grows. A global estimate of the object movement would therefore provide a performance boost. One idea is to estimate the coarse movement with optical flow and to perform the accurate alignment only within a small neighborhood. Optical flow has already been implemented on the GPU by Pock and Zach [47, 60], who demonstrated that the flow field can be computed in real time.
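The coarse-then-fine idea can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis: the L1 cost between a shifted binary shape and a binary target, the restriction to integer translations, the flow layout `(H, W, 2)` and all function names are assumptions chosen for illustration, and the flow field is synthetic rather than computed by a TV-L1 method.

```python
import numpy as np

def shift(mask, dy, dx):
    """Translate a 2D array by integer offsets (wraps around at the borders)."""
    return np.roll(mask, (dy, dx), axis=(0, 1))

def l1_cost(shape, target, dy, dx):
    """Illustrative alignment cost: L1 distance between shifted shape and target."""
    return float(np.abs(shift(shape, dy, dx) - target).sum())

def coarse_motion_from_flow(flow, mask):
    """Round the mean flow vector inside the object mask to an integer shift.

    `flow` is assumed to have shape (H, W, 2), storing (dy, dx) per pixel.
    """
    v = flow[mask > 0].mean(axis=0)
    return int(np.rint(v[0])), int(np.rint(v[1]))

def local_exhaustive_search(shape, target, coarse, radius):
    """Evaluate every integer translation within `radius` of the coarse estimate."""
    cy, cx = coarse
    best = None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            c = l1_cost(shape, target, dy, dx)
            if best is None or c < best[0]:
                best = (c, dy, dx)
    return best  # (cost, dy, dx)

# Toy data: a square shape prior and the same square moved by (17, -6).
shape = np.zeros((64, 64))
shape[10:20, 10:20] = 1.0
target = shift(shape, 17, -6)

# Synthetic, slightly inaccurate flow field standing in for a real estimate.
flow = np.zeros((64, 64, 2))
flow[..., 0] = 16.2
flow[..., 1] = -5.4

coarse = coarse_motion_from_flow(flow, shape)
cost, dy, dx = local_exhaustive_search(shape, target, coarse, radius=2)
print(coarse, (dy, dx), cost)
```

With a radius of 2 the local search evaluates only 25 candidate translations, whereas an exhaustive search that has to cover a displacement of this size without a coarse estimate would need a much larger search window.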



Bibliography

[1] Ambrosio, L. and Tortorelli, V. (1992). On the approximation of free discontinuity

problems. Boll. Un. Mat. Ital., B(7),6(1):105–123.

[2] AMD Graphics Products Group (2006). ATI CTM Guide 1.01. Technical report,

AMD Corp., Sunnyvale, CA, USA.

[3] Aujol, J. F., Gilboa, G., Chan, T., and Osher, S. J. (2006). Structure-texture

image decomposition: Modeling, algorithms, and parameter selection. International

Journal of Computer Vision, 67(1):111–136.

[4] Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J. P., and Osher, S. J. (2005).

Global minimizers of the active contour/snake model. International Conference on

Free Boundary Problems: Theory and Applications (FBP).

[5] Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J. P., and Osher, S. J. (2007).

Fast global minimization of the active contour/snake model. Journal of Mathematical

Imaging and Vision, 28(2):151–167.

[6] Bresson, X., Vandergheynst, P., and Thiran, J. P. (2006). A variational model

for object segmentation using boundary information and shape prior driven by the

Mumford-Shah functional. International Journal of Computer Vision, 68(2):145–162.

[7] Canny, J. F. (1983). Finding edges and lines in images. Master’s thesis, Massachusetts

Institute of Technology, Dept. of Electrical Engineering and Computer

Science. Supervisor: J. Michael Brady.

[8] Caselles, V., Kimmel, R., and Sapiro, G. (1997). Geodesic active contours. International

Journal of Computer Vision, 22(1):61–79.

[9] Chambolle, A. (2004). An algorithm for total variation minimization and applications.

Journal of Mathematical Imaging and Vision, 20(1-2):89–97.

[10] Chambolle, A. (2005). Total variation minimization and a class of binary MRF

models. In Energy Minimization Methods in Computer Vision and Pattern Recognition,

pages 136–152.

[11] Chambolle, A. and Lions, P.-L. (1997). Image recovery via total variation minimization

and related problems. Numerische Mathematik, 76(2):167–188.



[12] Chan, T., Esedoglu, S., Park, F., and Yip, A. (2006). Recent developments in total

variation image restoration. In Paragios, N., Chen, Y., and Faugeras, O., editors,

Handbook of Mathematical Models in Computer Vision, pages 17–31. Springer.

[13] Chan, T. F. and Esedoglu, S. (2005). Aspects of total variation regularized L1 function approximation. SIAM Journal on Applied Mathematics, 65(5):1817–1837.

[14] Chan, T. F., Golub, G. H., and Mulet, P. (1999). A nonlinear primal-dual method

for total variation-based image restoration. SIAM Journal on Scientific Computing,

20(6):1964–1977.

[15] Chan, T. F. and Vese, L. A. (2001). Active contours without edges. IEEE Trans.

Image Processing, 10(2):266–277.

[16] Chen, Y., Guo, W., Huang, F., Wilson, D. C., and Geiser, E. A. (2003). Using prior

shape and points in medical image segmentation. In Rangarajan, A., Figueiredo, M.

A. T., and Zerubia, J., editors, Energy Minimization Methods in Computer Vision

and Pattern Recognition, 4th International Workshop, EMMCVPR 2003, Lisbon,

Portugal, July 7-9, 2003, Proceedings, volume 2683 of Lecture Notes in Computer

Science, pages 291–305. Springer.

[17] Chen, Y., Tagare, H. D., Thiruvenkadam, S., Huang, F., Wilson, D., Gopinath,

K. S., Briggs, R. W., and Geiser, E. A. (2002). Using prior shapes in geometric active

contours in a variational framework. International Journal of Computer Vision,

50(3):315–328.

[18] Chen, Y., Thiruvenkadam, S., Tagare, H. D., Huang, F., Wilson, D., and Geiser,

E. (2001). On the incorporation of shape priors into geometric active contours. In

Variational and Level Set Methods in Computer Vision, pages 145–152.

[19] Cohen, L. and Cohen, I. (1993). Finite-element methods for active contour models

and balloons for 2-D and 3-D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1131–1147.

[20] Cootes, T. and Taylor, C. (1999). Statistical models of appearance for computer

vision. Technical report, University of Manchester, Wolfson Image Analysis Unit,

Imaging Science and Biomedical Engineering, Manchester, United Kingdom.

[21] Cootes, T. F., Taylor, C. J., Cooper, D. H., and Graham, J. (1995). Active shape

models: Their training and application. Computer Vision and Image Understanding,

61(1):38–59.



[22] Cremers, D., Kohlberger, T., and Schnörr, C. (2003). Shape statistics in kernel

space for variational image segmentation. Pattern Recognition, 36(9):1929–1943.

[23] Cremers, D., Schnörr, C., and Weickert, J. (2001). Diffusion snakes: Combining

statistical shape knowledge and image information in a variational framework. In

Paragios, N., editor, IEEE First Int. Workshop on Variational and Level Set Methods,

pages 137–144, Vancouver.

[24] Cremers, D., Schnörr, C., Weickert, J., and Schellewald, C. (2000). Diffusion snakes

using statistical shape knowledge. In Sommer, C. and Zeevi, Y., editors, Algebraic

Frames for the Perception-Action Cycle, volume 1888 of LNCS, pages 164–174, Kiel,

Germany. Springer.

[25] Cremers, D., Tischhäuser, F., Weickert, J., and Schnörr, C. (2002). Diffusion

snakes: Introducing statistical shape knowledge into the Mumford–Shah functional.

International Journal of Computer Vision, 50(3):295–313.

[26] Hadamard, J. (1902). Sur les problèmes aux dérivées partielles et leur signification

physique. Princeton Univ. Bull., 13:49–52.

[27] Huang, J. and Mumford, D. (1999). Statistics of natural images and models.

Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 1:541–547.

[28] Jähne, B. (1993). Digital Image Processing - Concepts, Algorithms and Scientific

Applications. Springer-Verlag, second edition.

[29] Kass, M., Witkin, A., and Terzopoulos, D. (1988). Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331.

[30] Kichenassamy, S., Kumar, A., Olver, P., Tannenbaum, A., and Yezzi, A. (1996).

Conformal curvature flows: From phase transitions to active vision. Archive for

Rational Mechanics and Analysis, pages 275–301.

[31] Kichenassamy, S., Kumar, A., Olver, P. J., Tannenbaum, A. R., and Yezzi, Jr.,

A. J. (1995). Gradient flows and geometric active contour models. In International

Conference on Computer Vision, pages 810–815.



[32] Leung, S. and Osher, S. (2005). Global minimization of the active contour model

with TV-inpainting and two-phase denoising. In Paragios, N., Faugeras, O. D.,

Chan, T., and Schnörr, C., editors, Variational, Geometric, and Level Set Methods

in Computer Vision, Third International Workshop, VLSM 2005, Beijing, China,

volume 3752 of Lecture Notes in Computer Science, pages 149–160. Springer.

[33] Leventon, M., Faugeras, O., and Grimson, W. (2000a). Level set based segmentation

with intensity and curvature priors. In Proceedings Workshop on Mathematical

Methods in Biomedical Image Analysis, pages 4–11.

[34] Leventon, M., Grimson, W., and Faugeras, O. (2000b). Statistical shape influence

in geodesic active contours. In Proceedings IEEE Conference on Computer Vision

and Pattern Recognition, pages 316–323, Los Alamitos. IEEE.

[35] Leventon, M. E. (2000). Statistical models in medical image analysis. PhD thesis,

Massachusetts Institute of Technology. Supervisors: W. Eric Grimson and Olivier

D. Faugeras.

[36] Lindeberg, T. (1993). Scale-Space Theory in Computer Vision. Kluwer.

[37] Lindeberg, T. (1994). Scale-space theory: A basic tool for analysing structures at

different scales. Journal of Applied Statistics, 21(2):224–270.

[38] Marr, D. and Hildreth, E. (1980). Theory of edge detection. Proceedings of the Royal Society of London, Series B, 207:187–217.

[39] Mumford, D. and Shah, J. (1985). Boundary detection by minimizing functionals.

In Proceedings IEEE Computer Society Conference on Computer Vision and Pattern

Recognition, San Francisco, CA, June 10–13, pages 22–26. IEEE.

[40] Mumford, D. and Shah, J. (1989). Optimal approximations by piecewise smooth functions and variational problems. Comm. on Pure and Applied Math., XLII(5):577–685.

[41] Nikolova, M., Esedoglu, S., and Chan, T. F. (2006). Algorithms for finding global

minimizers of image segmentation and denoising models. SIAM Journal on Applied

Mathematics, 66(5):1632–1648.

[42] NVidia Corp. (2006). NVIDIA GeForce 8800 GPU architecture overview. Technical

report, Nvidia Corp., Santa Clara, CA, USA.



[43] NVidia Corp. (2007). NVIDIA CUDA Compute Unified Device Architecture –

programming guide 1.1. Technical report, Nvidia Corp., Santa Clara, CA, USA.

[44] Osher, S. and Sethian, J. A. (1988). Fronts propagating with curvature-dependent

speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational

Physics, 79:12–49.

[45] Paragios, N., Rousson, M., and Ramesh, V. (2002). Matching distance functions:

A shape-to-area variational approach for global-to-local registration. In European

Conference on Computer Vision, volume II, page 775 ff.

[46] Paragios, N., Rousson, M., and Ramesh, V. (2003). Non-rigid registration using

distance functions. Computer Vision and Image Understanding, 89(2-3):142–165.

[47] Pock, T. (2008). Fast Total Variation for Computer Vision. PhD thesis, Institute

for Computer Graphics and Vision, Graz University of Technology. Supervisors:

Prof. Dr. Horst Bischof and Prof. Dr. Daniel Cremers.

[48] Pylyshyn, Z. (1986). Spring and fall fashions in cognitive science. Presented at the Cognitive Science Society’s Eighth Annual Conference, Amherst, Massachusetts.

[49] Rousson, M., Paragios, N., and Deriche, R. (2003). Active shape models from a

level set perspective. Technical report, I.N.R.I.A.

[50] Rousson, M., Paragios, N., and Deriche, R. (2004). Implicit active shape models

for 3D segmentation in MR imaging. In Barillot, C., Haynor, D. R., and Hellier,

P., editors, Medical Image Computing and Computer-Assisted Intervention–MICCAI

2004, volume 3216 of Lecture Notes in Computer Science, pages 209–216. Springer.

[51] Rudin, L. I., Osher, S. J., and Fatemi, E. (1992). Nonlinear total variation based

noise removal algorithms. Physica D: Nonlinear Phenomena, 60:259–268.

[52] Sobel, I. and Feldman, G. (1968). A 3x3 isotropic gradient operator for image processing. Presented at a talk at the Stanford Artificial Intelligence Project in 1968; unpublished but often cited (e.g. in Duda, R. and Hart, P., Pattern Classification and Scene Analysis, John Wiley and Sons, 1973, pp. 271–272).

[53] Strong, D. M. and Chan, T. F. (2000). Edge-preserving and scale-dependent

properties of total variation regularization. Technical report.



[54] Tikhonov, A. N. (1963). Regularization of incorrectly posed problems. Soviet

Mathematics, 4:1624–1627.

[55] Tikhonov, A. N. and Arsenin, V. Y. (1977). Solutions of Ill-posed Problems. W.

H. Winston, Washington, D.C.

[56] Unger, M. (2008). An interactive framework for globally optimal image segmentation

with local constraints. Master’s thesis, Institute for Computer Graphics and

Vision, Graz University of Technology. Supervisor: Prof. Dr. Horst Bischof, Instructor:

Dr. Thomas Pock.

[57] Unger, M., Pock, T., and Bischof, H. (2008). Continuous Globally Optimal Image

Segmentation with Local Constraints. In Perš, J., editor, Computer Vision Winter

Workshop 2008, Moravske Toplice, Slovenia.

[58] Vese, L. A. and Chan, T. F. (2002). A multiphase level set framework for image

segmentation using the Mumford and Shah model. International Journal of Computer

Vision, 50(3):271–293.

[59] Yin, W., Goldfarb, D., and Osher, S. (2005). Image cartoon-texture decomposition

and feature selection using the total variation regularized L1 functional. In Paragios,

N., Faugeras, O. D., Chan, T., and Schnörr, C., editors, Variational, Geometric,

and Level Set Methods in Computer Vision, Third International Workshop, VLSM

2005, Beijing, China, October 16, 2005, volume 3752 of Lecture Notes in Computer

Science, pages 73–84. Springer.

[60] Zach, C., Pock, T., and Bischof, H. (2007). A duality based approach for realtime

TV-L1 optical flow. In German Pattern Recognition Symposium, pages 214–223.
