
Graz University of Technology

Institute for Computer Graphics and Vision

Master’s Thesis

Globally Optimal

TV-L1 Shape Prior Segmentation

Manuel Werlberger

Graz, Austria, May 2008

Thesis supervisor

Univ. Prof. DI Dr. Horst Bischof

Instructor

DI Dr. Thomas Pock

Abstract

Interpreting an image is a common and challenging task in computer vision. A human observer does not only use intensity or color information or other basic features when looking for region boundaries but also takes prior knowledge into account. This increases the robustness of the segmentation result for most images. The main intention of our work is to propose a globally optimal segmentation algorithm that incorporates prior knowledge in the form of a geometric shape. The proposed energy is based on a weighted Total Variation energy and is optimized with fast numerical approaches like the projected gradient descent method. The GPU-based implementation is able to achieve real-time performance for the presented applications. We show the relationship of the proposed energy model to former variational methods like the well-known edge-preserving restoration model of Rudin, Osher and Fatemi and to methods that incorporate prior information into classical segmentation models. Different applications are realized with the proposed energy. First of all, a semi-automatic, interactive segmentation tool is implemented. The user can either define a shape prior on the fly, using the weighted Total Variation as geodesic active contour, or load a predefined geometric shape. Next, the energy model can be used to align two shapes with each other or to optimize the alignment of a shape to an underlying edge function. Consequently, a tracking approach was introduced with the ability to optimize the incorporated shape information according to consecutive frames. This position update is also used when processing 3D data sets with a 2D prior, which is particularly useful for segmenting tubular structures in medical data sets with a single constraint on the first slice.

Keywords. Segmentation, Geodesic Active Contour, Prior Knowledge, Shape Prior, Total Variation, Variational Methods, globally optimal, GPU


Acknowledgments

First of all, I would like to thank my family for always supporting me and giving me the opportunity to follow the educational career to my liking. I am grateful to Prof. Horst Bischof for supervising my master's thesis. Special thanks go to Dr. Thomas Pock for the guidance, for all the inspirational discussions and the time spent answering all my questions and proof-reading my thesis. Without him this thesis would not have been possible in this form. Many thanks to all the members of the Institute for Computer Graphics and Vision who always had time for discussions and suggestions on my work. In particular these people are Markus Unger, who shared an office with me for several months, Martin Urschler, Bernhard Kainz and Werner Trobin.

During my studies I got to know many different people and formed some sincere friendships. I am thankful for the support of my friends. I am much obliged to Michael Rabatscher for the successful collaboration in many lectures and the supporting talks we had. Finally, I would like to thank Julia for her love and all her suggestions for improvements after reading my thesis.


Contents

1 Introduction
  1.1 Motivation and Problem Statement
  1.2 Organization of the Master's Thesis
  1.3 Digital Image and its Continuous Space
  1.4 Well- or Ill-Posed Problem?
  1.5 What Makes Image Segmentation a Hard Task?
  1.6 Shape Prior

2 Related Work
  2.1 Image Segmentation
  2.2 Edge-Based Segmentation
    2.2.1 Edge Detectors
    2.2.2 Scale Space Theory
      2.2.2.1 Canny-Edge Detector
    2.2.3 Snakes
    2.2.4 Balloons
    2.2.5 Geodesic Active Contours
  2.3 Region-Based Segmentation
    2.3.1 Thresholding
    2.3.2 Region splitting and merging
      2.3.2.1 Region Merging
      2.3.2.2 Region Splitting
      2.3.2.3 Splitting and Merging
    2.3.3 Mumford-Shah
    2.3.4 Chan-Vese
  2.4 Shape Prior Segmentation
    2.4.1 Introduction to the Level Set Framework
    2.4.2 Leventon et al. approach
    2.4.3 Diffusion Snakes
    2.4.4 Chen et al. approach
    2.4.5 Global-to-Local Shape Registration
    2.4.6 Shape Prior Driven Mumford-Shah Functional
  2.5 Discussion

3 Geodesic Active Contour with L1 Shape Prior
  3.1 Geodesic Active Contours with Shape Information
  3.2 Review of Total Variation Models
    3.2.1 Rudin, Osher and Fatemi - Noise Removal
    3.2.2 The L1 Data Fidelity Term
    3.2.3 Weighted Total Variation
  3.3 Geometric Similarity
    3.3.1 Distance Measures
    3.3.2 L1 Shape Prior
    3.3.3 Rigid Shape Transformation φ ◦ s
  3.4 TV-L1 Shape Prior Segmentation
  3.5 Solutions to Variational Models
    3.5.1 Calculus of Variation
    3.5.2 Explicit Solution of the ROF Model
    3.5.3 Dual Formulation
    3.5.4 Solving the TV-L1 Model
  3.6 Solving the Shape Prior Segmentation Model
    3.6.1 Minimize u for fixed v and φ – Projected Gradient Descent
    3.6.2 Minimize v for fixed u and φ
    3.6.3 Optimize the Rigid Transformation φ for fixed u and v
      3.6.3.1 Discussion
  3.7 Iterative Scheme to Solve the Segmentation Model

4 Implementation
  4.1 GPU design
  4.2 CUDA
  4.3 Is there an alternative to CUDA?
  4.4 Implementation Details

5 Applications and Results
  5.1 Applications
  5.2 Interactive Medical Image Segmentation
  5.3 Shape Alignment
  5.4 Tracking Application
  5.5 Processing 3D CT/MR Data

6 Conclusion and Outlook
  6.1 Conclusion
  6.2 Outlook

Bibliography

List of Figures

1.1 Example of an ill-posed problem.
1.2 Importance of prior knowledge.
2.1 The task of image segmentation depends on the given image and its pictured objects.
2.2 Two different edge profiles and their first and second order derivatives.
2.3 Edge detection on different scales.
2.4 Histogram of a grayscale image with a clear bound to do a fore- and background segmentation.
2.5 Segmentation with image thresholding.
2.6 The Ambrosio-Tortorelli approximation of the Mumford-Shah functional.
2.7 Chan-Vese segmentation result.
2.8 Defining level set properties.
2.9 Contour topology change during the evolution of the level set function.
2.10 Contour optimization with respect to statistical shape priors.
2.11 Segmentation of an epicardium.
2.12 Global-to-Local Registration.
2.13 Cardiac Tracking.
2.14 Comparison of the Chan-Vese model to the shape prior segmentation proposed by Bresson et al.
3.1 Denoising with TV-L2.
3.2 TV-L2 filter applied to a resolution test chart.
3.3 TV-L1 filter applied to a resolution test chart.
3.4 Shape similarity measure.
3.5 Evaluation of different λ settings.
4.1 Schematic sequence of shader units in a traditional GPU rendering pipeline.
4.2 Principle of the unified shader model.
4.3 Block diagram of the GeForce 8800 GTX.
4.4 Provided memory of the GeForce 8 Series.
4.5 Thread organization in grids and blocks for kernel execution.
5.1 Workflow of shape prior segmentation.
5.2 Segmentation of the first phalanx of an index finger and a metacarpal bone of a ring finger.
5.3 Segmentation of a vertebra.
5.4 Segmentation of a hand.
5.5 Shape alignment with user interaction.
5.6 Alignment of a given shape to a metacarpal bone of a ring finger.
5.7 Alignment of a given shape to the second phalanx of an index finger.
5.8 Segmentation of the left ventricle.
5.9 Position optimization for a shape of the right atrium.
5.10 Alignment of a single vertebra (sagittal).
5.11 Alignment of a single vertebra (coronal).
5.12 Alignment of multiple vertebrae.
5.13 Hand shape alignment.
5.14 Real-time tracking example.
5.15 Daimler-Chrysler tracking sequence.
5.16 Aorta segmentation.
5.17 3D visualization of aorta segmentation.

Chapter 1

Introduction

Contents

1.1 Motivation and Problem Statement
1.2 Organization of the Master's Thesis
1.3 Digital Image and its Continuous Space
1.4 Well- or Ill-Posed Problem?
1.5 What Makes Image Segmentation a Hard Task?
1.6 Shape Prior

1.1 Motivation **and** Problem Statement

For the human eye it is a natural procedure to partition the field of view into distinguishable regions. For this, different types of features like edges, texture or appearance are taken into account. The human observer also takes region descriptions, like object shapes, into account. This master's thesis incorporates a shape description into a geodesic active contour segmentation model to partition an image into non-overlapping regions.

In modern computer vision many kinds of segmentation algorithms exist that make use of different local features like gradients, intensity, color, etc. The main drawback of using basic features is that a lack of intensity information or image distortions may lead to wrong segmentation results for more complex images. Model-based vision in the form of shape models (e.g. active shape models introduced by Cootes et al. [20]) is used to fit a pre-learned model onto a nearby object. For this it is necessary to train the model with hand-labeled samples in advance. In addition, the initialization is very important for the segmentation outcome. Our proposed model can be defined on the fly and used for real-time segmentation.

1.2 Organization of the Master’s Thesis

In Sections 1.3 and 1.4 some fundamentals about digital images and the nature of computer vision problems are discussed. This master's thesis is concerned with image segmentation, and therefore we show the difficulties of obtaining a valid segmentation in Section 1.5. Section 1.6 focuses on the introduction of prior knowledge in the form of a shape prior.

Chapter 2 reviews classical segmentation approaches. First, edge-based segmentation methods are presented in Section 2.2. For this, the fundamentals of edge detectors and their behavior on differently scaled objects are reviewed. More sophisticated approaches like the Snake model [29] and geodesic active contours [8, 30, 31] are presented in Sections 2.2.3-2.2.5. After that, region-based approaches are examined in Section 2.3. Sections 2.3.1 and 2.3.2 focus on basic methods like thresholding and region splitting and merging. In Sections 2.3.3 and 2.3.4 we discuss the segmentation model of Mumford and Shah [39] and its variation by Chan and Vese [15, 58]. To improve the robustness of such segmentation methods, prior knowledge was incorporated. Various methods that make use of shape models for the segmentation task are reviewed in Section 2.4.

In Chapter 3 we introduce the idea of a variational shape prior segmentation method. In Section 3.2 corresponding total variation methods are reviewed. How shape information is incorporated into the segmentation system is presented in Section 3.3.2. In Section 3.4 the energy model for the shape prior segmentation is proposed. Methods for solving variational models are then reviewed in Section 3.5, and Section 3.6 focuses on the solution of the shape prior segmentation model.

Chapter 4 deals with the GPU-based implementation. Details on GPU design and the graphics hardware used are given in Section 4.1. Section 4.2 introduces the GPGPU programming framework CUDA from NVidia, which is used for the implementation.

Chapter 5 is devoted to different applications and their results. First, a program for interactive image segmentation is presented in Section 5.2, together with an evaluation on hand-labeled reference data. Some examples of shape alignments, evaluated on the same reference data among others, are shown in Section 5.3. Next, the adaption of shape alignment to sequences is shown in a tracking application in Section 5.4. Section 5.5 shows how to process 3D data with the help of a 2D shape prior.

Finally, in Chapter 6 we give a conclusion on the presented algorithm and applications and an outlook on possible future work and algorithm enhancements.

1.3 Digital Image **and** its Continuous Space

A digital image is a mapping from a real-world scene to a representation that is readable by a digital device (e.g. digital camera, computer, embedded system, ...). For this, the continuous space of the analog image has to be mapped onto a discrete space by sampling and quantization. Since computers are powerful enough, it is possible to process large amounts of data and apply complex algorithms. Vision algorithms have been developed mainly by computer scientists, electrical engineers and mathematicians, with the main intention of obtaining information from digital images. The applied functions can be categorized into continuous and discrete ones, each of which may represent the application well. There have been different approaches to solving vision-based problems: the computer science community may prefer discrete operations that fit the computer representation well, mathematicians specialize on a continuous space for well-defined mathematical models, and for electrical engineers representing the image as a 2D signal allows the application of signal-processing techniques. Fundamental algorithms and filters have been developed by these means.

Nowadays more sophisticated algorithms are proposed, and one of the mainstream approaches emerged from one of the most important fields of mathematical analysis, namely partial differential equations (PDEs). In general, a PDE is an equation involving functions and their partial derivatives with respect to independent variables. PDEs originally came from physics and became more common in other fields over time: first physics and biology, afterwards finance, and now computer vision benefit from this versatile mathematical tool. As PDEs are stated in continuous settings, the solution has to be proven in continuous space; afterwards the calculation can be applied in discrete space to find a numerical solution for the respective problem.

1.4 Well- or Ill-Posed Problem?

The French mathematician Jacques Hadamard [26] postulated that physical problems have to be modeled by well-posed mathematical problems. A problem is well-posed if its solution

1. exists,

2. is unique,

3. and depends continuously on the initial data (is, for example, robust against noise).

Figure 1.1: Example of an ill-posed problem. (a) original image x, (b) blurred image y = Ax, (c) blurred image with additive noise y = Ax + n.

Computer vision is often stated as inverse optics. Most of the upcoming problems are inverse ones, which means that model parameters have to be estimated from given data. Consider e.g. y = Ax + n with a known operator A and noise n. Obtaining the data y from the given parameters x is the direct problem and usually well-posed, whereas the inverse problem of calculating x when the data y is given is usually ill-posed. As a concrete example, consider an image x (Figure 1.1a) that was blurred with a function A (Figure 1.1b). In addition, we add some noise n, which results in the observed image y (Figure 1.1c). It is obviously a difficult task to restore the original image x if only the observed image y and little information on the distortions A and n are available. Keeping sharp edges at the right place is a key difficulty of such an ill-posed problem. Tikhonov and Arsenin [54, 55] developed a well-known regularization technique to solve such ill-posed problems. The main idea is to restrict the space of acceptable solutions and define a model that minimizes the defined function.
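To make the ill-posedness concrete, the following sketch (not taken from the thesis; all sizes, noise levels and the regularization weight are illustrative choices) sets up a one-dimensional version of y = Ax + n with a Gaussian blur matrix A and compares naive inversion with a quadratic Tikhonov-regularized solution:

```python
import numpy as np

# Demonstrate that naive inversion of y = A x + n amplifies the noise,
# while Tikhonov regularization recovers a stable estimate.
rng = np.random.default_rng(0)
n_pix = 64
x = np.zeros(n_pix)
x[20:40] = 1.0                       # a simple "step" signal

# Gaussian blur as an explicit matrix operator A
idx = np.arange(n_pix)
sigma = 2.0
A = np.exp(-(idx[None, :] - idx[:, None]) ** 2 / (2 * sigma ** 2))
A /= A.sum(axis=1, keepdims=True)    # normalize rows

y = A @ x + 0.01 * rng.standard_normal(n_pix)   # blurred + noisy data

x_naive = np.linalg.solve(A, y)      # direct inversion: unstable
lam = 0.1                            # Tikhonov regularization weight
x_tik = np.linalg.solve(A.T @ A + lam * np.eye(n_pix), A.T @ y)

err_naive = np.linalg.norm(x_naive - x)
err_tik = np.linalg.norm(x_tik - x)
print(err_naive > err_tik)           # regularized estimate is far closer to x
```

The regularizer restricts the space of acceptable solutions exactly in the sense described above: among all x that roughly explain the data, the one with small norm is preferred.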

1.5 What Makes Image Segmentation a Hard Task?

Figure 1.2: Importance of prior knowledge. (a) Dalmatian, (b) Teddy Bear. The images are taken from a talk of Pylyshyn about cognitive science [48].

In Figure 1.2 humans can identify the shown objects without a problem. Although edges are broken or not even existent, as in the image with the dalmatian (Figure 1.2a), or regions are scrambled, as in the teddy bear image (Figure 1.2b), it is no problem for the human visual system to identify the important cues for a reasonable segmentation. However, a human who has never seen a dalmatian or a teddy bear before would find it hard to see the objects because of the lacking prior information for the segmentation. People that are aware of the objects will be able to identify them fairly reliably, which suggests incorporating previous knowledge into segmentation systems.

To build (semi-)automated vision systems, the introduction of prior information is a key component. To identify a known object in an unseen image or scene, previous knowledge can be essential. Humans can easily handle this task, even for very difficult images as stated before. The robustness of an automated segmentation system improves the more prior knowledge is available. The problem is that the computer has to interpret this vast amount of data, which badly affects runtime. Our idea is to offer only shape information to help the segmentation model to find a valid result.

1.6 Shape Prior

A reasonable partitioning of a complex scene with basic algorithms is hardly ever possible. In our work we use higher-level information in combination with approximate knowledge of the desired object shape to obtain the desired segmentation. The outcome is a very versatile segmentation algorithm that can be used not only for a segmentation task but also for shape alignment, tracking and automated analysis of 3D data. In general, our intention is not to build a fully automated segmentation system. Especially with medical image data, a specialist has to supervise the application and, if needed, interact with the system. Therefore we have three prime concerns:

1. Real-time ability

2. User interaction (semi-automated approach)

3. Finding a globally optimal solution

To meet our demands we use variational methods for the image segmentation and implement the algorithms with the help of GPGPU programming. This allows us to speed up these rather complicated algorithms and enables the application to react to user input in real time.

Chapter 2

Related Work

Contents

2.1 Image Segmentation
2.2 Edge-Based Segmentation
2.3 Region-Based Segmentation
2.4 Shape Prior Segmentation
2.5 Discussion

2.1 Image Segmentation

One of the most important tasks in computer vision is to divide the image into multiple regions and to detect objects and region borders. A person can easily separate the constituent parts of a scene, but for computers this is highly ambiguous. The segmentation problem is not unique and depends on the available image data, as illustrated in Figure 2.1. For the left image the result would be a segmentation of the single objects, but for the right one there is no such obvious split-up. To obtain the final result, different features can be used. An often used element to describe certain objects is their contour, and therefore edges are frequently used features for segmentation. For object detection one often prefers to detect complete regions that correspond to objects. In medical image analysis it is often necessary to detect a certain structure (e.g. bone, tumor, vessels, ...). Especially in medicine the exact position is very important when identifying an object. In the following we give an outline of basic edge- and region-based segmentation models.

Figure 2.1: The task of image segmentation depends on the given image and its pictured objects. (a) The single objects can be identified easily in this scene. (b) Demonstrating that segmentation is not a unique problem; the result differs depending on the region that is to be detected (e.g. tree, clouds, lake, ...).

2.2 Edge-Based Segmentation

Edge information is one possibility to extract boundary segments from an image. There are many kinds of edge detectors that can be used to create an edge image, but edges alone are not enough to cluster the image into distinguishable regions. Therefore the found edges have to be combined into chains that may describe object borders. Another popular method to obtain region boundaries are active contour models. The term active is used because the segmentation evolves over time, and for certain models it is even possible to change its topology. The goal of this process is to fit a curve to the boundary of an object.

The main problem of edge-based segmentation are edges which do not represent object borders. They can be caused by noise or object texture and should not be taken into account when searching for an object or region boundary. Missing edges due to occlusion have the same negative effect on the resulting segmentation.


2.2.1 Edge Detectors

Edge detectors make use of local changes in the intensity function. A big change results in a stronger edge than a small one, and a threshold applied to the edge strength can suppress weak edges. Examples of edge detectors are the well-known Sobel filter [52], which searches for maxima in the first derivative, and the Marr-Hildreth [38] and Laplacian of Gaussian (LoG) algorithms [28], which detect zero-crossings in the second derivative. See Figure 2.2, where the strength of the edges is modelled depending on the intensity change in the original image: a higher intensity change results in a stronger edge.

Figure 2.2: Two different edge profiles and their first and second order derivatives. (a) Intensity Profile, (b) 1st Derivative, (c) 2nd Derivative.
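As an illustration of the thresholded gradient-magnitude idea, here is a minimal sketch of a Sobel-based edge detector (the synthetic test image and the threshold value are illustrative, not from the thesis):

```python
import numpy as np

# Convolve with the two Sobel kernels, take the gradient magnitude as
# edge strength, and suppress weak responses with a threshold.
def sobel_edges(img, thresh):
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):          # interior pixels only
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    strength = np.hypot(gx, gy)        # edge strength |∇I|
    return strength * (strength >= thresh)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                       # vertical step edge at column 4
edges = sobel_edges(img, thresh=1.0)
print(edges[4, :].nonzero()[0])        # only columns next to the step respond
```

The big intensity change at the step yields a strong response; everywhere else the strength is zero or falls below the threshold.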

2.2.2 Scale Space Theory

Real-world objects normally exist over a range of sizes, while a digital representation of an object only captures a single scale. For further processing it is sometimes essential that the scale of the representation fits the real object. In [36, 37] Lindeberg explored the field of scale space theory with reference to computer vision tasks. Mostly when talking about scale spaces, a linear (Gaussian) one is meant. A collection of gradually smoothed images is the basis of such a representation, which should make vision algorithms scale invariant.

But how are the different scale representations connected with edge detection? Objects have features, textures and sub-parts of different size, and therefore in a scaled representation some features may vanish. An example of different scale-space representations of a gray-value image is presented in the left column of Figure 2.3, whereas the right column shows the corresponding edge images.

Blurring can be used to eliminate disturbing edges that arose from object texture. Only strong enough edges will be taken into account after the Gaussian filtering with

$$G(x, y) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}. \tag{2.1}$$
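A direct way to see Eq. (2.1) in action is to sample it as a discrete kernel and convolve with it. The sketch below (illustrative image size and σ values, not code from the thesis) shows that a larger σ suppresses a fine texture "speckle" more strongly, which is exactly why fine-scale edges vanish in coarser scale-space levels:

```python
import numpy as np

# Sample the Gaussian of Eq. (2.1) on a (2r+1)x(2r+1) grid.
def gaussian_kernel(sigma, radius):
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return g / g.sum()                 # normalize the sampled kernel

# Smooth an image by direct convolution with the sampled kernel.
def smooth(img, sigma):
    r = int(3 * sigma)
    k = gaussian_kernel(sigma, r)
    h, w = img.shape
    pad = np.pad(img, r, mode='edge')
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(k * pad[i:i + 2 * r + 1, j:j + 2 * r + 1])
    return out

img = np.zeros((16, 16))
img[8, 8] = 1.0                        # a single texture "speckle"
print(smooth(img, 1.0).max() < smooth(img, 0.5).max())  # stronger blur flattens it
```

After blurring, a fixed threshold on the edge strength will keep the strong object boundaries but drop the flattened texture responses.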

2.2.2.1 Canny-Edge Detector

An edge detector which uses the theory of scale spaces was proposed by Canny in 1983 [7]. The main intention of the edge detector was to fulfill the following three conditions:

good detection: Important edges ought to be found.

good localization: The detection should be accurate, and the real edge position should hardly deviate from the detected edge position.

definite detection: A single edge should not produce multiple responses.

To suppress unimportant edges, the image f is filtered with a Gaussian convolution G as in scale space theory. Afterwards the normal direction n of the edges is estimated for each pixel as

$$n = \frac{\nabla (G * f)}{|\nabla (G * f)|}. \tag{2.2}$$

To suppress weak edges a non-maximum suppression is applied, and the location of the edges is evaluated with

$$\frac{\partial^2 G}{\partial n^2} * f = 0. \tag{2.3}$$

Next the edge strength s is evaluated as the magnitude of the gradient of the image intensity function:

$$s = |\nabla (G * f)| \tag{2.4}$$

To remove spurious edge responses, a thresholding with hysteresis is applied: a high edge response is marked as a definite edge, and a response below the threshold is removed and classified as noise. To increase the robustness of the algorithm, the steps are repeated for increased smoothing and the results are combined. Canny proposed this approach as feature synthesis.

Figure 2.3: The left column shows the original image (a) and two Gaussian filtered versions with σ = 1 (c) and σ = 5 (e). The corresponding edge images, produced with a Sobel filter, are presented on the right (b, d, f).
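The hysteresis step can be sketched in a few lines. In the sketch below the array of edge strengths and both thresholds are made up for illustration (in the actual detector the strengths come from Eq. (2.4)): strong responses seed edges, and weak responses are kept only when connected to a strong one.

```python
import numpy as np
from collections import deque

# Hysteresis thresholding: start from pixels above the high threshold
# and grow edges through 8-connected pixels above the low threshold.
def hysteresis(strength, low, high):
    h, w = strength.shape
    edge = strength >= high            # definite edges (seeds)
    weak = strength >= low
    q = deque(zip(*np.nonzero(edge)))
    while q:                           # breadth-first growth
        i, j = q.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and weak[ni, nj] and not edge[ni, nj]:
                    edge[ni, nj] = True
                    q.append((ni, nj))
    return edge

s = np.array([[0.9, 0.4, 0.4, 0.1],
              [0.0, 0.0, 0.0, 0.4]])
e = hysteresis(s, low=0.3, high=0.8)
print(e.astype(int))   # weak pixels connected to the 0.9 seed survive
```

Note how the isolated weak response at the lower right is kept only because it touches the grown chain diagonally; with no strong seed at all, every response would be discarded.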

2.2.3 Snakes

One of the earliest active contour models is the Snake, proposed by Kass, Witkin and Terzopoulos [29]. A Snake is defined as an energy minimizing spline C(s) which is guided through the image by internal and external forces. Snakes belong to the family of parametric models, as borders are represented in parametric form. The energy that is minimized was proposed by Kass et al. in the form

$$E_{Snake} = \int_0^1 E_{internal}(C(s)) + E_{image}(C(s)) + E_{constraints}(C(s)) \, ds. \tag{2.5}$$

As the energy term is non-convex, there is no global solution to this problem. Snakes are very sensitive to the initialization, which can lead to different segmentation results. The energy minimization is influenced by image forces that pull the contour towards edges, so the contour will snap to nearby line structures.

$E_{image}$: The image forces are modelled by a combination of three energy terms:

$$E_{image} = w_{line} E_{line} + w_{edge} E_{edge} + w_{term} E_{term} \tag{2.6}$$

$E_{line}$ can be a simple intensity value, pushing the Snake towards contours with the specified intensity. The edge term $w_{edge} E_{edge}$, as its name already says, attracts the segmentation boundary towards edges. The termination term $w_{term} E_{term}$ stops the evolution process if no continuous contour is available, so that the segmentation will also take line segments into account.

E internal : The internal spline energy models a bending energy. The Snake can act as a membrane or a thin plate; the desired behavior can be adjusted with the two parameters α(s) and β(s).

E internal = 1/2 ( α(s) |∂C/∂s|² + β(s) |∂²C/∂s²|² ) (2.7)


E constraints : The third data term of the energy proposed by Kass et al. incorporates user input or other high level features. E constraints can guide the Snake to a meaningful segmentation result with the help of additional constraints.

In [29] Kass et al. propose an image energy that pulls the contour towards nearby edges. In Section 2.2.2 the problem of edge scales is mentioned. Because of the Gaussian convolution the following image energy only takes edges of a certain scale σ into account.

E image = −|∇G σ ∗ I(x, y)|² (2.8)

Using this image energy, ignoring the user-driven term and using constant parameters α and β leads to a simplified version of the energy in the form of

E Snake = α/2 ∫₀¹ |∂C(s)/∂s|² ds + β/2 ∫₀¹ |∂²C(s)/∂s²|² ds − |∇G σ ∗ I(x, y)|². (2.9)
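For a discrete, closed contour the internal part of equation (2.9) can be approximated with finite differences. The following NumPy sketch (our illustration, not the thesis implementation) evaluates it for a polygonal contour:

```python
import numpy as np

def internal_energy(pts, alpha=1.0, beta=1.0):
    """Discrete internal term of eq. (2.9) for a closed polygonal contour:
    finite differences approximate dC/ds and d^2C/ds^2."""
    pts = np.asarray(pts, float)
    d1 = np.roll(pts, -1, axis=0) - pts                                # ~ C'
    d2 = np.roll(pts, -1, axis=0) - 2 * pts + np.roll(pts, 1, axis=0)  # ~ C''
    return 0.5 * alpha * (d1 ** 2).sum() + 0.5 * beta * (d2 ** 2).sum()
```

The α term penalizes contour length (membrane behavior), the β term penalizes curvature (thin-plate behavior), matching the roles of the two parameters described above.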

2.2.4 Balloons

In 1993 an extension to Snakes was proposed by Cohen and Cohen in [19] by adding an additional term to the energy:

E balloons = E Snake + w b ∫_{Ω in} dx (Ω in . . . region inside the contour) (2.10)

Snakes tend to converge to smaller regions because the contour is driven by optimizing the length of the boundary. The so-called Balloons favor regions of specific sizes, and by adjusting the parameter w b the region can either shrink or expand to find the solution. The added term can be interpreted as an additional constraint E constraints for the Snake model. The advantage of Balloons is the versatility that the boundary can either shrink or grow to detect the region boundaries. The disadvantage is that the relative position of the initialized contour and the final object has to be known so that the weight w b can be set the right way.

2.2.5 Geodesic Active Contours

Based on the Snake model of Kass et al. (Section 2.2.3), Caselles et al. [8] and Kichenassamy et al. [30, 31] proposed an energy that, unlike the Snake model, is invariant with respect to reparametrization of the curve. The Geodesic Active Contour (GAC) (in 3D the model is called minimal surfaces) is defined as the energy optimization

min_C { E GAC (C) } = min_C { ∫₀^{|C|} g( |∇I(C(s))| ) ds }. (2.11)

|C| describes the Euclidean length of the curve C and the function g models an edge detector. The edge strength has to be restricted to the interval g ∈ (0, 1]. Caselles et al. used the edge function

g(|∇I|) = 1 / (1 + |∇(G σ ∗ I)|^p), with p = 1 or 2 (2.12)

for detecting the object boundaries.

Another possibility for modelling g is a measurement optimized for natural images, proposed by Huang et al. in [27]:

g(|∇I|) = e^(−η|∇I|^κ), e.g. with κ = 0.55 (2.13)
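Both edge functions reduce to pointwise operations on a smoothed gradient magnitude. A small NumPy/SciPy sketch (parameter defaults and function name are our choice):

```python
import numpy as np
from scipy import ndimage

def edge_function(image, sigma=1.0, p=1, eta=None, kappa=0.55):
    """Edge indicator g in (0, 1]: small on strong edges, 1 in flat regions."""
    smoothed = ndimage.gaussian_filter(np.asarray(image, float), sigma)
    gy, gx = np.gradient(smoothed)
    mag = np.sqrt(gx ** 2 + gy ** 2)          # |grad(G_sigma * I)|
    if eta is None:
        return 1.0 / (1.0 + mag ** p)         # Caselles-style, eq. (2.12)
    return np.exp(-eta * mag ** kappa)        # Huang-style, eq. (2.13)
```

In both variants g is exactly 1 in perfectly homogeneous regions and decays towards 0 where the smoothed gradient is large, which is the interval restriction stated above.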

The general intention of the Snake model is to locate the curve at points with a high edge strength and keep a certain smoothness in the curve. Caselles et al. proved in [8] that these properties are still given when β → 0 in the energy equation (2.9). The main advantage of GACs is the profound mathematical framework that makes the model very versatile for different applications. The user has to incorporate constraints because C = 0 is always a minimizer of the GAC energy. This leads over to the major handicap of the model. Due to its non-convex energy the minimization task will not find a globally optimal solution but get stuck in local minima, and therefore the result depends on the contour initialization.

2.3 Region-Based Segmentation

In the previous section the intention was to find region borders. The next logical step is to directly detect the regions themselves. It is no problem to reconstruct region borders out of existing regions and vice versa. The main idea is to find coherent regions that have something in common. The decision whether a pixel belongs to a certain region is based on a homogeneity criterion H(R i ) with respect to gray values, texture measurements, color, etc., which gives a measure of the similarity and/or spatial proximity among pixels.


A complete split-up of an image R into disjoint regions R 1 , R 2 , . . . , R S defines the idea of segmentation and can be formulated with the help of set theory:

R = ⋃_{i=1}^{S} R i , R i ∩ R j = ∅ ∀ i ≠ j (2.14)

H(R i ) = TRUE, ∀ i = 1, 2, . . . , S (2.15)

H(R i ∪ R j ) = FALSE, ∀ i ≠ j (2.16)

Region-based segmentation methods are normally more robust on noisy data where edges are difficult to detect. Also highly textured objects can be found when texture properties are used for the homogeneity criterion. The drawbacks are over- and undersegmented results. In addition most region-based algorithms are not able to detect objects that span several disconnected regions.

2.3.1 Thresholding

Some segmentation problems can be solved with simple algorithms. An example is a simple thresholding operation on gray levels to obtain regions with a certain intensity value, which is one of the simplest segmentation algorithms. The outcome is a separation of foreground and background. No model based information is used and no prior knowledge contributes to the final result. An example of a threshold operation applied to two different images is shown in Figure 2.5. For Figure 2.5a the foreground contains all the objects whereas the segmentation of the x-ray image in Figure 2.5b fails for certain image regions.

The resulting image is normally a binary representation b that depends on the choice

of the threshold T :

b(x, y) = 1 ∀ I(x, y) ≥ T
b(x, y) = 0 ∀ I(x, y) < T (2.17)
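Equation (2.17) translates directly into array operations. A minimal NumPy sketch, including the multi-threshold extension discussed next (function names are ours):

```python
import numpy as np

def threshold_segment(image, T):
    """Binary map b of eq. (2.17): 1 where I(x, y) >= T, 0 otherwise."""
    return (np.asarray(image) >= T).astype(np.uint8)

def multi_threshold(image, thresholds):
    """Multi-threshold extension: label image with regions R_1 ... R_S
    delimited by sorted thresholds T_1 < ... < T_{S-1}."""
    return np.digitize(np.asarray(image), np.asarray(thresholds))
```

With S − 1 sorted thresholds, `np.digitize` assigns each pixel the index of the intensity interval it falls into, i.e. one of S region labels.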

This method can be extended with multiple thresholds T 1 , . . . , T S−1 to get a result that is not binary but contains all segmented regions R 1 , . . . , R S . To optimize the threshold values it is often a good idea to have a look at the image histogram. In Figure 2.4 an image and its histogram are shown. There a specific threshold can be found to segment the presented object from the background. A segmentation of this example is


done in Figure 2.5a.

Figure 2.4: Histogram of a grayscale image with a clear bound to do a fore- and background segmentation. (a) Image with homogeneous background and several objects as foreground. (b) Intensity histogram.

Figure 2.5: Segmentation with image thresholding. (a) Threshold image of Figure 2.1a and the corresponding segmentation result. (b) Detail of a thresholding segmentation applied to an X-ray image. The various gray levels do not allow to obtain a reasonable segmentation result out of a simple thresholding operation.


2.3.2 Region splitting and merging

2.3.2.1 Region Merging

Starting at the finest level, where each pixel represents a single region, these regions are merged as long as equation (2.15) is satisfied. The merging process connects two adjacent regions that fulfill the same homogeneity criterion.

2.3.2.2 Region Splitting

Region splitting defines the opposite process of region merging. Here the whole image represents the starting region, which is split into smaller regions until the criteria of equations (2.14), (2.15) and (2.16) are met.

Although the merging and splitting algorithms seem very similar, they do not produce the same segmentation result. There are cases where the splitting process is stopped for homogeneous regions whereas the merging process would not find this region because of a restriction at an earlier step.

2.3.2.3 Splitting and Merging

To take advantage of both the splitting and the merging process, the two approaches are combined. The split-and-merge algorithm uses a pyramidal image structure. On any level of the pyramid regions are split into four sub-regions when the homogeneity criterion is not satisfied. If any of these four parts are coherent, they are merged together again. If none of the regions can be split or merged any more, regions that do not have the same parent or are in different pyramid levels are taken into account. If those regions are homogeneous they are merged together too. Small-sized regions can be evaluated separately at the end and merged together if appropriate.
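The splitting half of the scheme can be sketched as a recursive quadtree decomposition. Here a variance bound serves as the homogeneity criterion H; the threshold values are our illustrative choice, not from the thesis:

```python
import numpy as np

def split_region(region, max_var=25.0, min_size=2):
    """Recursively split a block into four sub-blocks while the
    homogeneity criterion (low intensity variance) is violated."""
    h, w = region.shape
    if region.var() <= max_var or min(h, w) <= min_size:
        return [region]                       # homogeneous or too small: stop
    h2, w2 = h // 2, w // 2
    blocks = []
    for quad in (region[:h2, :w2], region[:h2, w2:],
                 region[h2:, :w2], region[h2:, w2:]):
        blocks.extend(split_region(quad, max_var, min_size))
    return blocks
```

A full split-and-merge implementation would additionally merge adjacent homogeneous blocks across quadtree boundaries, as described above.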

2.3.3 Mumford-Shah

Most of the time the presented basic methods, like for example the thresholding scheme, are not applicable to real-world problems because the images are too complex. Mumford and Shah proposed a model [39] to approximate an observed image u 0 with


a function u by minimizing the energy

min_{u,C} { E MS (u, C) } = min_{u,C} { 1/2 ∫_Ω (u 0 − u)² dΩ + λ²/2 ∫_{Ω\C} |∇u|² dΩ + ν H^{d−1} }. (2.18)

The approximation u is a piecewise smooth function and λ defines the scale on which the smoothing is done. Starting with the last part of the equation, ν is a tuning parameter for the Hausdorff measure H, where d stands for the dimensionality. For the 2D case H represents the length of the discontinuity set C. The first integral represents the fidelity term using an L 2 -norm which ensures that u is similar to u 0 . More details on the fidelity term are discussed in the course of the variational restoration approach of Rudin, Osher and Fatemi in Section 3.2.1. The second integral, the regularization term, ensures the smoothness of the segmentation result but not across discontinuities, which enables the method to process open structures.

The Mumford-Shah segmentation model does not pick up textured objects directly because they are not composed of one smooth region. Only objects featuring a homogeneous region inside the boundary can be modelled with the proposed energy, and the segmentation for these will be valid. A benefit of the model is that a test image of two objects can be segmented (Figure 2.6). Most standard active contour models will fail on this class of images and return a single contour that encloses both objects. The main drawback appears with textured objects because u 0 cannot be well approximated by a piecewise smooth function u due to the occurring discontinuities.

Ambrosio and Tortorelli proposed an approximation of the Mumford-Shah functional in [1]. They introduced a dual variable v which represents the discontinuities of the processed image. In Figure 2.6b the edge set v is shown for a synthetic image.

min_{u,v} { E AT (u, v) } = min_{u,v} { ρ ∫_Ω |∇v|² dΩ + α ∫_Ω ( v² |∇u|² + (v − 1)²/(4αρ) ) dΩ + β ∫_Ω |u − u 0 |² dΩ }. (2.19)

2.3.4 Chan-Vese

Many variations on the Mumford-Shah energy were proposed over the years. The one we want to focus on is the approach of Chan and Vese [15, 58]. They took up the challenge of solving the minimization problem of equation (2.18). The energy has to be reformulated to be solvable in a mathematically correct way. The method makes it possible to


Figure 2.6: The Ambrosio-Tortorelli approximation of the Mumford-Shah functional is able to find both objects. (a) Test image with two objects. (b) Edge set v.

find objects whose boundaries are not necessarily defined by gradients. Chan and Vese propose the following energy with the main idea of separating two different intensity values c 1 and c 2 , which splits the approximation of u 0 into two regions, defining u as the average of u 0 over the sub-regions inside and outside of C:

min_{c 1 ,c 2 ,C} { E CV (c 1 , c 2 , C) } = min_{c 1 ,c 2 ,C} { µ · H¹(C) + ν · H²(Ω in ) (regularization terms)

+ λ 1 ∫_{Ω in} |u 0 − c 1 |² dΩ + λ 2 ∫_{Ω out} |u 0 − c 2 |² dΩ } (fitting term) (2.20)

The set Ω in is defined as the region inside the contour C and Ω out as the region outside: Ω in ∪ Ω out ∪ C = Ω. Using a variational formulation for the level set representation as proposed in [15], the energy E CV (c 1 , c 2 , C) can be reformulated to

E(c 1 , c 2 , φ) = µ ∫_Ω δ(φ) |∇φ| dΩ + ν ∫_Ω H(φ) dΩ + λ 1 ∫_Ω |u 0 − c 1 |² H(φ) dΩ + λ 2 ∫_Ω |u 0 − c 2 |² (1 − H(φ)) dΩ. (2.21)

For further details on level set representations see Section 2.4.1. The solution for u using the Chan-Vese model as a particular case of the Mumford-Shah approach is proposed in [15] and the constants c 1 and c 2 are interpreted as the averages of u 0 over the resulting sub-regions:

u = c 1 H(φ) + c 2 (1 − H(φ)) (2.22)

c 1 (φ) = ∫_Ω u 0 H(φ) dΩ / ∫_Ω H(φ) dΩ (2.23)

c 2 (φ) = ∫_Ω u 0 (1 − H(φ)) dΩ / ∫_Ω (1 − H(φ)) dΩ (2.24)
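In a discrete implementation equations (2.23) and (2.24) reduce to weighted means over the image, with H replaced by a smoothed Heaviside function, a common practical choice. The ε-regularization below is our assumption, not taken from the thesis:

```python
import numpy as np

def heaviside(phi, eps=1.0):
    """Smoothed Heaviside function of the level set phi."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def region_averages(u0, phi, eps=1.0):
    """c1, c2 of eqs. (2.23)/(2.24): mean of u0 inside / outside
    the zero level set of phi."""
    H = heaviside(phi, eps)
    c1 = (u0 * H).sum() / H.sum()
    c2 = (u0 * (1.0 - H)).sum() / (1.0 - H).sum()
    return c1, c2
```

As ε → 0 the smoothed Heaviside approaches the sharp indicator of Ω in , and c 1 , c 2 become the exact region means.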

The segmentation model by Chan and Vese is often referred to as “Active Contours without Edges” due to the title of their paper [15] and the corresponding, often reproduced segmentation results on blurred images or objects without well-defined borders. To show this effect of the Chan-Vese segmentation algorithm a synthetic image without clear borders and edges is processed and the result is presented in Figure 2.7.

Figure 2.7: Segmentation result of a synthetic test image using the Chan-Vese model.

2.4 Shape Prior Segmentation

In this section we review segmentation methods that identify structures of known geometric shape. The shape is used to guide the segmentation to the desired object. The form of the shape representation depends on the field of application. Many approaches use statistical models to represent a shape model whereas others only use a single spline or binary image to integrate prior information into the segmentation


model. Some of the reviewed methods use a level set approach to optimize the proposed energies. Therefore we present a short introduction to level sets in the following.

2.4.1 Introduction to the Level Set Framework

Introduced by Stanley Osher and James A. Sethian in 1988 [44], level set methods have been enhanced over the years and therefore many variations and extensions of the algorithm exist. The idea is to evolve a closed curve Γ within a plane, the zero level set. Γ is called the interface and bounds an open region Ω. The idea of Osher and Sethian is to define a higher dimensional embedding function φ that represents the interface. Therefore the curve is defined at every point where the higher dimensional level set function φ crosses the zero level set:

C = {φ (x) = 0} (2.25)

The level set properties depending on the interface position are represented in equation (2.26) and Figure 2.8.

∀ x ∈ Ω⁺ : φ > 0
∀ x ∈ Ω⁻ : φ < 0
∀ x ∈ ∂Ω = Γ : φ = 0 (2.26)

Figure 2.8: Defining level set properties

During the evolution of the contour, the connectivity may change and the contour may undergo a topological change (see Figure 2.9). A single level set initialization can handle a split-up


into multiple regions like in Figures 2.9b and 2.9c, which would not be feasible with parametric models like Snakes. The general evolution of the implicit function φ is given by the partial differential equation (PDE)

∂φ/∂t + ∇φ · ∂C/∂t = 0. (2.27)

This equation is often referred to as the level set equation and models the motion of φ where φ(C(t), t) = 0. The dot product takes only the normal component of the contour velocity into account. Therefore a force F describes this normal component and leads to the PDE

∂φ/∂t + F |∇φ| = 0. (2.28)
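A single explicit time step of equation (2.28) is straightforward with finite differences. The following sketch uses central differences and no upwinding, so it is only illustrative, not a numerically robust scheme:

```python
import numpy as np

def evolve_step(phi, F, dt=0.1):
    """One explicit Euler step of eq. (2.28): phi <- phi - dt * F * |grad phi|."""
    gy, gx = np.gradient(phi)
    grad_norm = np.sqrt(gx ** 2 + gy ** 2)
    return phi - dt * F * grad_norm
```

With F > 0 the region {φ < 0} grows, i.e. the embedded contour expands outward, and topology changes of the zero level set come for free.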

Figure 2.9: Contour topology change during the evolution of the level set function: (a) initialization, (b)–(c) intermediate steps, (d) final result.


2.4.2 Leventon et al. approach

Leventon et al. devised a method in [33, 34] where a segmentation task is performed with respect to a shape model. Therefore a PCA is applied to the signed distance functions (SDFs) of the contour model. SDFs are preferred over parametric models because SDFs provide more tolerance against slight misalignment during the alignment step. First a statistical model is trained over a set of curves representing the shape information. The segmentation process itself is done by evolving a geodesic active contour using local image features like image gradients and curvature (see level sets in Section 2.4.1). In addition to the level set evolution and the image term, an additional term that forms a shape force is added. To estimate the globally optimal shape pose and position, a shape parameter α and a pose parameter p are introduced using a maximum a posteriori (MAP) approach:

(α MAP , p MAP ) = argmax_{α,p} P (α, p | φ, ∇I) (2.29)

The combination of the shape pose and parameter describes a specific shape C ∗ = (α, p). Expanded by Bayes' rule this leads to:

P (α, p | φ, ∇I) = P (φ, ∇I | α, p) P (α, p) / P (φ, ∇I)
= P (φ | α, p) P (∇I | α, p, φ) P (α) P (p) / P (φ, ∇I) (2.30)

P (φ | α, p): Probability of a certain evolution interface φ given a shape pose (α, p).

P (∇I | α, p, φ): This gradient term represents the probability of certain image gradients given the contour. Aligning the contour along the image border maximizes P (∇I | α, p, φ). In [34] Leventon et al. show that, moving along the normal of the object border, the probability can be modelled as a Gaussian distribution.

P (α) P (p): These two terms define the priors of shape and pose. Therefore the priors are essential to estimate the final segmentation result. Due to runtime issues not all probabilities are evaluated in every step; P (α) and P (p) are only evaluated near the current level set result at every evolution step.

To maximize this a posteriori probability two independent optimization steps are needed. The equation for evolving the surface φ with a shape prior C ∗ is presented in [34, 35]:

φ(t + 1) = φ(t) + λ 1 ( g (c + κ) |∇φ(t)| + ∇φ(t) · ∇g ) + λ 2 ( C ∗ (t) − φ(t) ) (2.31)

λ 1 defines the update step size and λ 2 ∈ [0, 1] is a linear coefficient defining how much to trust the maximum a posteriori estimate. The λ 1 weighted term represents the “classical” geodesic active contour energy. To take shape knowledge into account, the λ 2 weighted part drives the shape of the evolved contour in the direction of the estimated prior. The final evolution equation is not a PDE since two separate, independent steps evolve the final result.

2.4.3 Diffusion Snakes

In [23, 25] Cremers et al. propose a framework that integrates statistical shape knowledge into the Mumford-Shah segmentation model [39, 40]. This modification of the Mumford-Shah model allows to create an explicit parametrization of the contour. The main advantage of representing segmentation and shape energy in one term is the possibility of calculating the contour evolution and shape optimization in one task. All previously mentioned approaches could only solve this with a two-step algorithm. In [24] the combination of contour energy and shape optimization is introduced by Cremers et al. using the equation

E(u, C ∗ ) = E MS (u, C ∗ ) + αE c (C ∗ ) (2.32)

which gives the possibility to control with the parameter α whether the energy favors contours that are similar to the learnt shape contours E c or the Mumford-Shah segmentation. E MS represents the fit of the current segmentation to the gray value information in form of the Mumford-Shah model reviewed in Section 2.3.3. A PCA representation is chosen for the shape representation, and the mean shape C ∗ μ and the covariance matrix Σ are determined from the set of shapes {C ∗ 1 , C ∗ 2 , . . . }:

C ∗ μ = mean[ C ∗ i ] (2.33)

Σ = mean[ (C ∗ i − C ∗ μ )(C ∗ i − C ∗ μ ) t ] (2.34)


The covariance matrix can be used to model a Gaussian probability distribution of shapes P (C ∗ ) that can be used to define the energy as proposed in [24]:

P (C ∗ ) ∝ e^{ −1/2 (C ∗ − C ∗ μ ) t Σ⁻¹ (C ∗ − C ∗ μ ) } (2.35)

P (C ∗ ) ∝ e^{ −E c (C ∗ ) } (2.36)

E c (C ∗ ) = − log P (C ∗ ) + const. (2.37)

E c (C ∗ ) = 1/2 (C ∗ − C ∗ μ ) t Σ⁻¹ (C ∗ − C ∗ μ ) (2.38)

With respect to the contour energy (2.38), the Mumford-Shah energy (2.18) can be adapted to

E(u, C(C ∗ )) = E MS ( u, C(C ∗ ) ) + α 1/2 (C ∗ − C ∗ μ ) t Σ⁻¹ (C ∗ − C ∗ μ ) (2.39)

For the stated equation it is assumed that the covariance matrix is of full rank so that the inverse Σ⁻¹ exists. If this is not the case the inverse can be substituted with the pseudo-inverse Σ ∗ defined in [24]. A segmentation example using this model is reproduced in Figure 2.10.
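Equation (2.38) together with the pseudo-inverse fallback is only a few lines of NumPy. This sketch treats a shape as a flat coefficient vector, which is our simplification for illustration:

```python
import numpy as np

def shape_energy(C, C_mu, Sigma):
    """Quadratic shape energy E_c of eq. (2.38); np.linalg.pinv handles
    a rank-deficient covariance matrix via the pseudo-inverse."""
    d = np.asarray(C, float) - np.asarray(C_mu, float)
    return 0.5 * d @ np.linalg.pinv(Sigma) @ d
```

The energy is zero at the mean shape and grows quadratically with the Mahalanobis distance from it, penalizing deviations along low-variance modes most strongly.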

In [22] Cremers et al. proposed a nonlinear alternative to the linear shape model in equation (2.38). The linear shape prior does not perform accurately for training data which is not appropriately aligned. The introduced nonlinear kernel PCA maps the Gaussian density estimation into a kernel space. Therefore the original data is non-linearly transformed into feature space where the distribution is estimated by a Gaussian density. However, the corresponding density estimate in the original space is non-Gaussian. This method offers the possibility to model versatile forms of distributions. The only drawback that arises is that the resulting energy is no longer convex. Minimization with a gradient descent method will therefore end up in a local minimum. Cremers et al. proposed to minimize the energy with a two-part algorithm where the image energy is optimized first and afterwards the best fitting shape is searched by optimizing the shape energy.

2.4.4 Chen et al. approach

The main idea of Chen et al. [16–18] is to propagate an active contour by a velocity that depends on image gradients and shape prior information. Leventon et al. use mean and variance to describe the shape prior statistics whereas Chen et al. only use


Figure 2.10: Results of contour optimization with respect to statistical shape priors proposed by Cremers et al. in [23–25]: (a) initialization, (b)–(d) evolution of the contour, (e) result, (f) training shapes. All figures are reprinted from [25].

the first geometric moment (mean). The shape prior is built from the different shapes' contours and provided as a mean shape. Unlike the approach of Leventon et al., the proposed model of Chen et al. proves the mathematical existence of a solution to the energy minimization problem [17, 18]. The proposed energy lets the contour stick to high image gradients and tries to form a shape related to the given prior:

E(C, µ, R, T ) = ∫₀¹ ( g( |∇I(C(p))| ) + λ/2 d²( µRC(p) + T ) ) |C′(p)| dp (2.40)

In equation (2.40) a curve C = µRC + T with the rigid transformation parameters scale µ, rotation R and translation T is searched. The result C should be closely related to the shape prior C ∗ . The gradient information is realized with the edge detector g, so the first term measures the amount of high gradients along the contour C. The second term is responsible for the closeness to the shape prior. d² represents the squared distance of a point P (x, y) ∈ C from the prior C ∗ and is also


referred to as d²(x, y) = d²(C ∗ , (x, y)) in the literature. The energy in equation (2.40) is minimized using a gradient descent scheme. In Figure 2.11 a segmentation of an epicardium in an ultrasound image is shown.

Figure 2.11: Segmentation of an epicardium using the model presented in [17, 18] by Chen et al.: (a) training and average shape, (b) initialization, (c) final contour. Figure 2.11a shows a cluster of 79 curves and the mean shape C ∗ (dotted contour). The initialization in Figure 2.11b leads to a contour (solid curve) in 2.11c. The dotted line in the final result represents the epicardium segmented by an expert. All figures are reproduced from [17].

2.4.5 Global-to-Local Shape Registration

A different approach is traced by Paragios, Rousson and Ramesh in [45, 46]. They handle the segmentation problem as a global-to-local registration task. The shapes are modelled using a Euclidean distance map in the form of a level set representation. The global registration is done via a rigid transformation whereas the local changes are handled using a deformation field. For the global rigid registration Paragios et al. propose the following energy term to register the shape source S and the shape target D with the registration parameters A = (µ, R, T ) (scale µ, rotation R and translation T ):

E(A) = ∫∫_Ω ( µ φ D (x, y) − φ S (A(x, y)) )² dΩ (2.41)

The shape source and target (S, D) are represented as two signed distance functions φ S and φ D . Parts of the shape should be registered in a non-rigid way after a coarse alignment. Therefore a local deformation field ( U(x, y), V (x, y) ) is introduced. The resulting


equation was proposed in [46] and restates equation (2.41) with some enhancements.

E(A, (U, V )) = α ∫∫_Ω N δ ( φ D , φ S ) ( µφ D − φ S (A) )² dΩ (global model-based registration)

+ (1 − α) ∫∫_Ω N δ ( φ D , φ S ) ( µφ D − φ S (A + (U, V )) )² dΩ (pixel-wise local deformation) (2.42)

The parameter α is introduced to balance between global motion and local deformation. The binary function N δ is one if min{ |φ S | , |φ D | } ≤ δ and zero otherwise; it takes only pixels into account that lie within distance δ of the shape. An example of this global-to-local registration task is shown in Figure 2.12.

Figure 2.12: This example, reproduced from [46], shows the approach of first applying a global rigid transformation to a given shape source S, iteratively converging to a shape target D. When the rigid transformation is done, fine adjustments can be made by using the local deformation field in a non-rigid way to fit the shape representation to the desired segmentation result.
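Discretized, the global term of equation (2.42) is a masked sum of squared differences between the two distance maps. A minimal NumPy sketch (the function signature is our choice):

```python
import numpy as np

def registration_energy(phi_D, phi_S_A, mu=1.0, delta=None):
    """Sum of squared differences of the signed distance maps, optionally
    restricted to the band N_delta around the shapes (cf. eq. (2.42))."""
    diff = (mu * phi_D - phi_S_A) ** 2
    if delta is not None:
        band = np.minimum(np.abs(phi_S_A), np.abs(phi_D)) <= delta
        diff = diff * band
    return float(diff.sum())
```

Restricting the sum to the band keeps the alignment driven by pixels near the contours, where the signed distance values are most meaningful.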

Thereupon Rousson, Paragios and Deriche proposed a method in [49, 50] where the level set framework is used to evolve a segmentation that is guided by an active shape model as proposed by Cootes et al. in [20, 21]. Applying a PCA to the level set functions allows a representation of shape variations with complex topologies. An example of tracking a cardiac cycle is presented in Figure 2.13. Prior information is modelled using a level set formulation as in equation (2.41). If the representation of the shape source belongs to the class of training shapes it can be derived from the principal modes of the shape representation model (see equation (2.43)). As a consequence the energy can be reformulated as presented in [50], where the model was applied to 3D


MRI data:

φ S = φ M + Σ_{j=1}^{m} λ j U j (2.43)

E(φ, A, λ) = ∫_Ω δ ɛ (φ) ( µφ − ( φ M (A) + Σ_{j=1}^{m} λ j U j (A) ) )² dΩ (2.44)

In equations (2.43) and (2.44) the mode weights λ = (λ 1 , . . . , λ m ) and the eigenvectors (modes of variation) U 1 , . . . , U m are used to model the shape variation with the use of a PCA. The proposed energy has to be minimized in two separate steps, one for optimizing the level set function φ and the other for the rigid transformation parameters A. The segmentation and registration task can be combined by using the variational model proposed in [49, 50]. The result is a system of linear equations; by solving these the shape weights λ can be estimated.
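Equation (2.43) is a plain linear combination of the mean level set and the weighted eigenmodes. The following NumPy sketch reconstructs a shape from given modes (array shapes are our assumption):

```python
import numpy as np

def shape_from_modes(phi_M, modes, weights):
    """Eq. (2.43): phi_S = phi_M + sum_j lambda_j * U_j."""
    phi_S = np.asarray(phi_M, float).copy()
    for U_j, lam_j in zip(modes, weights):
        phi_S += lam_j * np.asarray(U_j, float)
    return phi_S
```

Choosing the weights λ then amounts to selecting one shape from the learnt subspace, which is exactly what solving the linear system mentioned above does.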

Figure 2.13: Cardiac tracking using the model of Rousson et al. The first row shows the curve evolution (red) and the projection to the model space (yellow) in the first frame. The segmentation results of a cardiac cycle are presented in the second row. The figures are reproduced from [49].

2.4.6 Shape Prior Driven Mumford-Shah Functional

In [6] Bresson et al. propose a method to combine active contour segmentation, a statistically learned parametric shape prior and the Mumford-Shah energy in a variational level set framework, which can be seen as an extension of the model of Leventon et al. presented in Section 2.4.2. The energy is divided into three parts combining gradient


information with a shape model and with global and local image information:

E = β s · E shape + β b · E boundary + β r · E region (2.45)

with:

E shape = ∮₀¹ φ̂²( x pca , h xT (C(q)) ) |C′(q)| dq,

E boundary = ∮₀¹ g( |∇u 0 (C(q))| ) |C′(q)| dq,

E region = ∫_{Ω in} ( |u 0 − u in |² + µ |∇u in |² ) dΩ + ∫_{Ω out} ( |u 0 − u out |² + µ |∇u out |² ) dΩ. (2.46)

Figure 2.14: A comparison of the Chan-Vese segmentation model (a) with the method of Bresson et al. (b) shows that using a shape prior the left ventricle is fitted correctly. Without guidance by the prior the segmentation includes the occlusion. The figure is reproduced from [6].

The shape term in equation (2.46) guides the active contour to fit the shape model: φ̂ is the shape function, x pca is the vector of PCA eigencoefficients (shape vector), h xT accounts for the geometric transformation of the shape model and C models the active contour. The detection of object boundaries from image gradients is implemented using the boundary term. The last two integrals describe the global alignment of shape prior


and active contour in the sense of the Mumford-Shah model (Section 2.3.3), which drives the segmentation to a homogeneous intensity region. Ω in and Ω out are the inside and outside regions delimited by the zero level set. The emphasis of the shape, boundary and region terms is controlled with the constants β s , β b and β r . A comparison of a Chan-Vese segmentation of an occluded left ventricle in a brain MRI image with Bresson's approach is given in Figure 2.14.

2.5 Discussion

In this chapter we presented various methods of image segmentation, reaching from edge based methods over region based methods to more high-level approaches that incorporate prior knowledge in form of shape models. The drawbacks of simple edge and region based segmentation have already been shown in the introduction in Section 1.5. Therefore prior knowledge was introduced to make these methods more robust. A disadvantage of the presented shape prior segmentation models is the lack of efficient methods to yield the desired results. In addition all approaches have problems to overcome local minima and are not solvable in a globally optimal way.

Chapter 3

Geodesic Active Contour with

L 1 Shape Prior

Contents

3.1 Geodesic Active Contours with Shape Information
3.2 Review of Total Variation Models
3.3 Geometric Similarity
3.4 TV-L 1 Shape Prior Segmentation
3.5 Solutions to Variational Models
3.6 Solving the Shape Prior Segmentation Model
3.7 Iterative Scheme to Solve the Segmentation Model

3.1 Geodesic Active Contours with Shape In**for**mation

Our goal is to incorporate shape in**for**mation in a segmentation model. For the segmentation

energy we decided to use a geodesic active contour (GAC) model. Adding a

additional function which describes a shape **for**ce to the GAC model will result in an

energy of the **for**m

}

min

{E sp

{ ∫ |C|

= min g(C)

}

0

{{ }

GAC

33

∫

+λ

}

P (C, S) . (3.1)

} {{ }

Shape Prior

34 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

The first term of equation (3.1) models a GAC whereas the second integral should drive

the segmentation towards the desired shape. The main drawback of the GAC model

are local minima within the active contour energy which makes a good initialization

essential **for** the desired segmentation. To find a globally optimal solution is one of the

main dem**and**s on our segmentation model. The other two are real-time per**for**mance

**and** the possibility **for** user interaction as defined in Section 1.6. Approaches based on

the calculus of variation have had great success in the last years **and** it is shown that

variational approaches show good parallelization attributes. We decided to have a closer

look on variational approaches **and** realize a GPU-based implementation to achieve the

dem**and** of real-time ability. Nikolova et al. review methods to obtain global minimizers

**for** the classical computer vision problems of denoising **and** segmentation in [41].

In the following we give a short overview on classical **and** important variational

models. In Section 3.2.1 the well known model of Rudin, Osher **and** Fatemi (ROF

Model) **and** a modification with a L 1 data fidelity term in Section 3.2.2 is reviewed

emphasizing the application of noise removal. To return to the task of segmentation

the weighted total variation (TV) **and** its connection to GACs is presented. The segmentation

model proposed by Mum**for**d **and** Shah **and** a variation by Chan **and** Vese

have already been presented in Section 2.3.3 **and** 2.3.4.

3.2 Review of Total Variation Models

Although continuous calculation approaches using variational models **and** partial differential

equations (PDEs) have been available much longer, the first approach of image

processing that drops the idea of discrete image representation was the segmentation

model of Mum**for**d **and** Shah [39]. For an image restoration task the method of edge

preserving image denoising from Rudin, Osher **and** Fatemi (ROF Model), proposed in

the year 1992 in [51], was one of the earliest variational methods. The main advantages

of combining variational models **and** PDEs is that PDEs are well-defined mathematical

**for**mulations. In addition variational methods are defined over a continuous space **and**

there**for**e problems that were previously trans**for**med from continuous domains onto

discrete grids can be solved directly in their continuous world. Over the last years

variational image restoration became a popular **and** successful field **and** the methods

got extended to solve in-painting, segmentation **and** other vision problems. A selection

of seminal methods is given by Chan et al. in [12]. The main problem of variational

3.2. Review of Total Variation Models 35

models have been runtime issues. The optimization tasks are runtime intensive algorithms.

However the algorithms can be well parallelized **and** with the use of GPGPU

programming one can overcome the runtime drawback **for** the approaches.

3.2.1 Rudin Osher **and** Fatemi - Noise Removal

As stated be**for**e, Rudin Osher **and** Fatemi were the first who applied a variational

method to an image restoration task. In [51] a denoising model with the ability to

preserve edges was presented. The main idea is to reconstruct an intensity function

u(x, y) that has been corrupted with noise from an observed function u 0 (x, y) which

can e.g. be an image corrupted with Gaussian noise or a signal that is disturbed by

various means. Normally we are not aware of the noise so we get an ill-posed problem

as described in the introduction in Section 1.4:

u 0 (x, y) = u(x, y) + n(x, y) e.g. **for** additive noise n (3.2)

For a continuous representation the problem is restated as a least squares representation

to find an approximation of u.

min |u 0 − Au|

u

∫Ω

2 dx, Ω . . . Image domain (3.3)

The function A models the occurring distortions. In the field of image processing u 0

represents an observed image containing texture **and** noise. Texture can be described as

a repeating **and** meaningful structure whereas noise is characterized as an uncorrelated

r**and**om pattern. In classic literature the rest of the image, removing texture **and**

noise, but still containing hues **and** sharp edges, is called cartoon. In equation (3.2) u

represents the cartoon **and** n texture **and** noise.

with constraints:

min |∇u| dΩ

u

∫Ω

∫ ∫

u dΩ = u 0 dΩ

Ω

} {{

Ω

}

noise has zero means

**and**

∫

(u − u 0 ) 2 dΩ = σ 2

Ω

} {{ }

**and** st**and**ard deviation σ

(3.4)

The method of ROF denoising was taken up by Chambolle et al. **and** restated as

a continuous energy minimization in [11]. With the modification that the constraint

∫

Ω (u − u 0) 2 dΩ = σ 2 is replaced by ∫ Ω (u − u 0) 2 dΩ σ 2 this leads to the uncon-

36 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

strained energy minimization

{ }

min E ROF = min

u

u

{ ∫ Ω

|∇u| dΩ + 1 ∫

( ) } 2 u − u0 dΩ , (3.5)

2λ Ω

where λ > 0 is a Lagrange multiplier. The first part, the essential novelty of the ROF

model, ∫ Ω

|∇u| dΩ is called the regularization term that measures the variation of u

without penalizing discontinuities. The second integral ∫ ) 2

Ω(

u − u0 dΩ is named data

fidelity term which **for**ces u to be close to u 0 .

Because of the L 2 -norm in the data

fidelity term it is often restated as TV-L 2 model. A denoising example of this model

is presented in Figure 3.1. The proposed ROF model comes with certain limitations.

The main issue is the loss of contrast that can be observed on even noise free images

(see Figure 3.2). In [53] Strong **and** Chan did a more detailed analysis of this problem

in the field of signal processing.

(a) Input image corrupted with Gaussian

noise.

(b) Filtered image with λ = 0.06.

Figure 3.1: Denoising example with the help of the ROF energy minimization

(TV-L 2 )

3.2.2 The L 1 Data Fidelity Term

Due to the known drawbacks of the ROF model, research was pushed to investigate

more versatile methods. Aujol et al. give a good overview in [3] on different methods

which were derived from the ROF approach using different norms calculating the fidelity

term. We concentrate on reviewing the TV-L 1 approach presented by Chan et al. [13]

3.2. Review of Total Variation Models 37

(a) unfiltered image (b) λ = 0.3

(c) λ = 1.0 (d) λ = 5.0

Figure 3.2: TV-L2 filter applied to a synthetic test image with line thickness

of 0.5, 1, 2 **and** 4 pt. The two gray gradients are 10 **and** 20 pt thick

whereas the circles have a diameter of 20, 40, 60, 80 **and** 100 pt. The

drawback of contrast loss is obvious **for** increasing filter strength.

**and** Yin et al. [59] which lead to the energy

{ }

min E

u

TV-L 1

= min

u

{ ∫ ∫

∣

∣∇u∣ dΩ + λ

Ω

∣

∣ ∣∣

∣u − u 0 dΩ

}. (3.6)

In Figure 3.3 the TV-L 1 model is applied to a test image. Structures of a certain size

(depending on λ) are removed **and** the contrast persists. The TV-L 1 approach is wellsuited

**for** removing impulse noise like in applications of in-painting or to select features

of a certain scale done by Yin et al. in [59]. The main drawback of the TV-L 1 energy

Ω

38 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

in equation (3.6) in comparison to the ROF model is that it is not strictly convex **and**

there**for**e more than one global minimum exist which makes the optimization task more

difficult.

(a) λ = 0.55 (b) λ = 0.2 (c) λ = 0.06

Figure 3.3: TV-L1 filter applied to a resolution test chart. Contrast

persists **and** structures fade away according to size.

3.2.3 Weighted Total Variation

In [4, 5] Bresson et al. proposed a modification of the ROF-model:

{ ∫ ∫

min g(x) |∇u| dΩ + λ

u

Ω

∣ }

∣ ∣∣

∣u − u 0 dΩ

Ω

(3.7)

The modified data fidelity term has already been discussed in the previous section

of the L 1 approach. The main novelty of Bressons approach is the introduction of the

weight g(x) within the regularization term. This so-called weighted TV-norm preserves

the geometry of the original features in comparison to the classical L 1 - or L 2 -approach.

The main drawback **for** this model is the lack of global minimizers when optimizing its

energy. There**for**e in [32] Leung **and** Osher **and** later Bresson et al. in [5] presented a

method to find the global minima **for** the presented model: Using an edge detector **for**

the weighting function g(x) **and** if u is a characteristic function 1 C that is allowed to

vary smoothly in the interval [0, 1] the weighted Total Variation describes a Geodesic

Active Contour model **and** minimizes the same energy as in equation (2.11). With the

mentioned constraints the weighted TV-norm (see equation (3.8)) becomes a convex

3.3. Geometric Similarity 39

function **and** a globally optimal solution can be calculated.

∫

T V g (u) =

Ω

g(x) |∇u| dΩ (3.8)

In addition Leung **and** Osher presented the flexibility of the energy in equation (3.7).

For a λ(x) = 0 the model can be used **for** Total Variation inpainting to recover destructed

image regions. With a λ(x) = λ 0 the energy can be used **for** a denoising task **and**

**for** λ(x) → ∞ u remains unchanged.

To come back again to the desired shape prior segmentation model the weight g(x)

is modelled with an edge function that represents the edge strength **for** each pixel as

we want to stick to strong edges with the help of the GAC energy.

We decided to

use the measure **for** natural images presented by Huang **and** Mum**for**d in [27] which

has already been presented in the course of the GAC-model in Section 2.2.5. With an

characteristic function u that is allowed to vary smoothly in the interval [0, 1], the first

integral of equation (3.1) can be modelled with the help of the weighted TV-norm:

}

min

{E sp

{ ∫

= min

Ω

g(x) |∇u| dΩ

} {{ }

GAC

∫

+λ

}

P (C, S) . (3.9)

} {{ }

Shape Prior

3.3 Geometric Similarity

Similarity defines the strength of relationship between two objects. What “similar”

means is application dependent **and** **for** our purpose only the shape is relevant **and** the

appearance can be left out. Two geometrical objects can be denoted as similar if one

is congruent to a rigid trans**for**mation of the other. There are different possibilities to

measure the similarity of two objects with various distances. A selection of methods to

describe similarities is given in Section 3.3.1. In Section 3.3.2 we define a L 1 measure

that can be incorporated into the energy Equation (3.1).

3.3.1 Distance Measures

Hamming Distance: The Hamming distance measures the area of symmetric difference

between two polygons. The area that is overlapping is taken into account.

There**for**e the distance will be zero when the two objects are identical **and** properly

40 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

aligned. If one of the two has a slight variation at the boundary, due to noise or simply

a comparison of two different objects, the Hamming distance will be small. For a

complete misalignment the result is the sum of the two polygons areas.

Hausdorff Distance This measurement calculates the longest distance from one

point to another one in the second subset. The Hausdorff distance is not symmetrical.

Often an image is pre-processed with an edge detector to process the resulting binary

image. The difference of the Hamming **and** Hausdorff distance can be shown well with

a single outlier. If a shape has a single but grave outlier the Hausdorff distance will

change significantly but won’t have a large effect on the Hamming distance. In return

the Hamming distance will increase **for** a little change of the whole boundary (e.g.

shrinking of the complete shape) **and** have little effect on the Hausdorff measure.

Comparing Skeletons A completely different approach to measure shape similarity

extracts a skeleton from each object. With the use of thinning a tree-like structure

can be developed **and** the shape comparison is reduced to process two skeletons. To

compare two graphs the topology of the tree **and** the length of the edges can be used.

Manifold Clustering of Shapes A manifold is an abstract mathematical space

which allows to express more complicated structures in terms of properties of simpler

spaces. It is **for** example possible to describe a set of shapes in an infinite dimensional

vector space (manifold). There**for**e the distance between two shapes can be calculated

**and** the resulting minimal distance is a measure **for** the similarity of the examined

shapes.

3.3.2 L 1 Shape Prior

To incorporate shape in**for**mation in the segmentation model we have to define a distance

**for** the shape towards the current GAC segmentation in **for**m of an energy equation.

We decided to describe the shape measurement as symmetric difference of regions

which can be **for**mulated as L 1 -distance **and** there**for**e suits well **for** integrating into a

variational approach:

∫

E sim = |u(x, y) − s| (3.10)

Ω

3.3. Geometric Similarity 41

For a two-dimensional image example the energy of equation (3.10) defines how well a

specific shape s fits to a function u(x, y) at every pixel (x, y). For two identical functions

s **and** u the resulting energy is zero. For a good alignment the similarity measure

remains low E sim → 0, whereas **for** misaligned shapes the energy will increase. In the

energy plots of Figure 3.4 the energy according to different alignments are evaluated

at each pixel. There**for**e the shape s **and** the underlying function u(x) are represented

by the same binary image of a geometric shape.

(a) E = 15 (b) E = 1188 (c) E = 1490 (d) E = 1600 (e) E = 1847

Figure 3.4: Shape similarity measure of different alignments. For the

image u **and** shape representation s the same binary image is used. In the

first row the energy is evaluated **for** each single pixel. In the second row

the contours of the alignment are marked in red **for** u **and** in green **for** the

shape prior s **for** better illustration. In each column the resulting energy

of equation (3.10) is given which can be interpreted as a measure **for** the

alignment.

3.3.3 Rigid Shape Trans**for**mation φ ◦ s

The introduced L 1 shape prior gives a measure of the current alignment of a fixed

geometric shape. For the final segmentation method we want to have a possibility

to optimize the shape position. This optimization task is equivalent with searching a

global minimum of the proposed L 1 shape distance. To facilitate the modification of

the shape geometry during the minimization task, a rigid trans**for**mation φ = {R, t, s}

with the trans**for**mation parameters R **for** rotation, t **for** translation **and** s **for** the scale

42 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

is introduced into the energy equation (3.10):

∫

E sim = |u(x, y) − φ ◦ s| (3.11)

Ω

We will offer two possibilities to optimize the segmentation result with the help of the

shape position in our applications. First the user will be able to modify the alignment

to get different segmentation results **and** choose a appropriate result. Secondly an

automatic position optimization should provide the optimal alignment of the shape s

towards some function u.

3.4 TV-L 1 Shape Prior Segmentation

Starting with the proposal on combining a GAC energy with a shape model,

equation (3.1) can be restated with the weighted TV-norm **for** the GAC part

(see equation (3.8)) **and** a L 1

segmentation energy:

shape prior (see equation (3.11)) to a variational

}

min

{E sp = min

u∈[0,1],φ

u∈[0,1],φ{ ∫ ∫

}

g(x) |∇u| dΩ + λ |u − φ ◦ s| dΩ

Ω

Ω

For the edge weight g(x) again the measure **for** natural images g(|∇I|) = e −η|∇I|κ

(3.12)

used. λ represents a parameter **for** the balance of segmentation energy towards shape

**for**ce. For a low λ the result of the GAC will be preferred, whereas **for** increasing λ the

shape prior will be taken into account. In Figure 3.5 the effect of the parameter λ is

shown.

is

(a) λ = 0.05 (b) λ = 0.075 (c) λ = 0.1 (d) λ = 0.125 (e) λ = 0.15

Figure 3.5: Evaluation of different λ settings. The results show that the

GAC is attracted to edges which are shown as input image. The shape in**for**mation

is presented as green **and** the segmentation result as red contour.

3.5. Solutions to Variational Models 43

The introduced variational model of equation (3.12) can be solved globally optimal

**for** a specific shape position. To integrate the variability of the shape position a rigid

trans**for**mation can be applied to the shape prior which offers the possibility to find

a locally optimal alignment **for** the prior in respect to the given edge image. This

allows to segment objects within images where the GAC would fail due to weak edges,

occlusion or noise. To optimize the proposed energy model we first have a look at

solving variational models in general in Section 3.5.

3.5 Solutions to Variational Models

In this section the presented solutions are only an extract from existing methods to

solve total variation models. The main emphasis of the reviewed approaches lies on

solving the Euler-Lagrange equation. The main issue on solving TV models is the

L 1 norm that is non-differentiable at zero. This problem is stated in more detail in

Section 3.5.2 by means of the ROF model.

3.5.1 Calculus of Variation

Calculus of variation is a mathematical field that finds stationary values **for** given

functions. Most of the time this describes a minimum or maximum of the function.

Ideally the extrema appears at locations where the derivative vanishes which results in

solving the Euler-Lagrange equation. If an integral of the **for**m

∫

I =

f ( t, y, y ′) dt,

y ′ = dy

dt

(3.13)

is given, stationary values of I are possible if the Euler-Lagrange equation (3.14) is

fulfilled.

∂f

∂y − d ( ) ∂f !

dt ∂y ′ = 0 (3.14)

3.5.2 Explicit Solution of the ROF Model

Rudin, Osher **and** Fatemi presented a method on solving the proposed energy (3.5) **and**

its resulting Euler-Lagrange equation (3.15) with an explicit time marching method

44 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

(equation (3.16)) that is iterated till a steady state is reached.

( ) ∇u

−∇ · + 1 |∇u| λ (u − u 0) + ∂u

∂n = 0 (3.15)

[ ( ) ∇u

u n+1 = u n n

− dt ∇ ·

|∇u n + 1 ]

| λ (un − u 0 )

(3.16)

As stated be**for**e, the Euler-Lagrange equation √ is not defined at ∇u = 0. There**for**e the

denominator |∇u| is replaced by |∇u| ɛ

= |∇u| 2 + ɛ which ensures that the term will

not become zero. The major drawback of this approximation is that convergence is

slow when |∇u| is small **and** if ɛ is large the edges get blurred. Because of the highly

non-linear fraction ∇u

|∇u|

a close **for**m solution is very unlikely.

3.5.3 Dual Formulation

To overcome the problem of a degeneration in the case |∇u| → 0 Chan, Golub **and** Mulet

introduced a dual **for**mulation **for** solving the total variation minimization problem in

[14]. Chambolle applied a similar method to remove singularities in [9, 10]. The main

intention is to remove the singularity by replacing the term ∇u

|∇u|

in the Euler-Lagrange

equation (3.15) with the dual variable p. The minimization task can be **for**mulated

with the following two equations:

p = ∇u

|∇u|

→ p |∇u| − ∇u = 0 (3.17)

−∇ · p + 1 λ (u − u 0) = 0 (3.18)

Rearranging the equation (3.18) with respect to u **and** substituting the result into

equation (3.17) yields to the following equations:

u = u 0 + λ∇ · p (3.19)

p |∇ (u 0 + λ∇ · p)| − ∇ (u 0 + λ∇ · p) = 0 (3.20)

3.5. Solutions to Variational Models 45

Applying the equation (3.20) to a gradient descent algorithm leads to an iterative

method solving the dual variable p. As timestep dt = τ /λ is chosen.

p n+1 = p n − τ [ ∣∣∣∇ (

λ∇ · p n ) ∣ + u 0 ∣ · p n+1 − ∇ ( λ∇ · p n ) ]

+ u 0

λ

p n + τ (∇ ( λ∇ · p n ) )

+ u 0

(3.21)

p n+1 = λ

1 + τ ∣

∣∇ ( ) ∣

λ∇ · p

λ

n + u 0 ∣

Finally with solving equation (3.19) one gets the final solution of u. Chambolle comes

to the same algorithm in [9] with a slightly different derivation. In [10] he shows that

a convergence of the gradient descent algorithm can be guaranteed **for** τ ≤ 1 /D **for** a

D-dimensional problem.

3.5.4 Solving the TV-L1 Model

As mentioned in the second half of Section 3.2.1 the TV-L 1 model (3.6) is not strictly

convex. The second L 1 -norm in the data fidelity term prohibits a closed **for**m solution

like the dual approach **for** the ROF model. As a first consequence Chan **and** Esedoglu

applied an additional replacement in [13] as **for** the explicit solution of the ROF model in

Section 3.5.2. The Euler-Lagrange equation **for** solving the TV-L 1 model √ holds another

L 1 norm |u − u 0 | in a denominator that is replaced with |u − u 0 | δ

= |u − u 0 | 2 + δ. For

the TV-norm |∇u| the replacement from Section 3.5.2 is retained with the mentioned

drawbacks:

( ∇u

∇ ·

|∇u|

∇ ·

)

+ λ u − u 0

|u − u 0 | = 0 (3.22)

( ∇u

|∇u| ɛ

)

+ λ u − u 0

|u − u 0 | δ

= 0 (3.23)

Due to the slow convergence of the iterative model Aujol et al. [3] approximated the

TV-L 1 energy defined in equation (3.6) with the convex **for**mulation

{ }

min E

u,v TV-L 1

= min

u,v

solving as TV-L 2 model

{ }} {

{ ∫

∣

∣∇u∣ dΩ + 1 ∫ ( ) 2 ∫

2θ

u − v dΩ + λ

Ω

Ω

} {{ }

thresholding scheme

Ω

∣ }

∣ ∣∣

∣v − u 0 dΩ . (3.24)

46 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

As illustrated in equation (3.24) the energy minimization regarding to u **and** v is done

in two steps:

1. Minimization in terms of u with a fixed v:

{ ∫

min |∇u| dΩ + 1 ∫

( ) } 2 u − v dΩ

u

Ω

2θ Ω

2. Minimization in terms of v with a fixed u:

{ ∫ 1

( ) ∫

2

min u − v dΩ + λ

v 2θ Ω

∣ }

∣ ∣∣

∣v − u 0 dΩ

Ω

(3.25)

(3.26)

The two steps are iterated consecutively till convergence. For the task in step 1

the Chambolle algorithm is reviewed in Section 3.5.3.

The second task is a pointwise

convex minimization problem.

equation (3.26) is given in equation (3.27).

The outcome of this is the soft thresholding

scheme in equation (3.28).

The resulting Euler-Lagrange **for**mulation **for**

1

θ (u − v) + λ · sgn (v − u 0) → 1 θ (u − v) + λ · v − u 0

|v − u 0 |

!

= 0 (3.27)

⎧

u − λθ | v − u 0 < 0

⎪⎨

v = u + λθ | v − u 0 > 0

⎪⎩

u 0 | v − u 0 = 0

(3.28)

3.6 Solving the Shape Prior Segmentation Model

The following Euler-Lagrange equation depicts an explicit solution to the proposed

segmentation model.

(

∂E sp

∂u = ∇ · g ∇u )

+ λ u − φ ◦ s

|∇u| |u − φ ◦ s|

The methods reviewed in Section 3.5 **for** solving the ROF- **and** L 1

!

= 0 (3.29)

model can be

used to optimize the proposed equation (3.12), respectively solve the Euler-Lagrange

equation (3.29). However we want to avoid approximations of the total variation with

|∇u| ɛ

**and** the second L 1 -norm with |u − φ ◦ s| δ

as in Section 3.5.2, solutions based on a

3.6. Solving the Shape Prior Segmentation Model 47

dual approach come to the **for**e. Introducing a second variable v leads to a convex, solvable

**for**mulation min u,v,φ {E sp }. This minimization task is similar to the optimization

of min u,v {E TV-L 1} from Section 3.5.4.

}

min

{E sp = min

u,v,φ

u,v,φ{ ∫ Ω

∣

g∣∇u∣ dΩ + 1 ∫

2θ

Ω

( ) ∫

2

u − v dΩ + λ

∣

∣v − φ ◦ s∣ dΩ

Ω

}

(3.30)

The minimization in respect of u **and** v can be split up into two separate optimization

steps:

1. Minimization in terms of u with a fixed v **and** φ:

{ ∫

min g |∇u| dΩ + 1 ∫

( ) } 2 u − v dΩ

u

Ω 2θ Ω

(3.31)

2. Minimization in terms of v with a fixed u **and** φ:

{ ∫ 1

( ) ∫

2

min u − v dΩ + λ

v 2θ Ω

∣

∣v − φ ◦ s∣ dΩ

Ω

}

(3.32)

3. Optimize rigid trans**for**mation parameters φ with a fixed u **and** v:

{ ∫ }

∣

min ∣v − φ ◦ s∣

φ Ω

(3.33)

4. Iterate until convergence.

3.6.1 Minimize u **for** fixed v **and** φ – Projected Gradient Descent

The optimization problem (3.31) resembles the ROF approach. The resulting Euler-

Lagrange equation (3.34) differs from equation (3.15) by the weight function g.

(

− ∇ · g ∇u )

+ 1 (u − v) = 0 (3.34)

|∇u| θ

To avoid an approximation with |∇u| ɛ

a duality based algorithm like is applied. To

present an alternative to the algorithm of Chambolle (see Section 3.5.3) the projected

gradient descent approach is introduced in the following. There**for**e the TV-norm **and**

48 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

as a consequent the weighted TV-norm is replaced with a dual variable p:

{

}

|∇u| = max p · ∇u : ||p|| ≤ 1

{

}

g |∇u| = max p · ∇u : ||p|| ≤ g

(3.35)

(3.36)

The projection of p can be seen even more clearly in the dual **for**mulation of Chambolle

(see Section 3.5.3) where p would be directly defined with the substitution in the Euler-

Lagrange equation (3.29) which results in p = g ∇u

projection of p to a maximal length of g.

|∇u| .

This can be interpreted as a

Inserting the relation (3.36) into the minimization problem (3.31) leads again to an

optimization task towards two variables:

min

u

max

||p||≤g{ ∫ p · ∇u dΩ + 1 ∫

Ω 2θ

Ω

(

u − v

) 2 dΩ

}

. (3.37)

For further calculations the first integral ∫ Ω

p·∇u dΩ can be replaced by the divergence

theorem (sometimes related to as Gauss’ theorem in the literature):

∫

Ω

∫

p · ∇u dΩ = − ∇p · u dΩ (3.38)

Ω

Because u is a convex function the minimization **and** maximization relation can be

exchanged. This allows a minimization using the first derivative with respect to u which

leads to a combined maximization task according to p **and** a minimization due to u.

Note that the divergence theorem is again applied to the first integral in equation (3.42)

∫

−v∇p =

∫

∇v · p, which leads to a maximization problem stated in equation (3.43).

{ ∫

max min − u · ∇p dΩ + 1 ∫

( ) } 2 u − v dΩ

||p||≤g u

Ω 2θ Ω

(3.39)

∂E

∂u = −∇p + 1 θ (u − v) = ! 0 → u = v + θ∇p (3.40)

{ ∫

max − ∇p · (v + θ∇p) dΩ + 1 ∫

( ) } 2 v + θ∇p − v dΩ (3.41)

||p||≤g Ω 2θ Ω

{ ∫

∫

max − v · ∇p dΩ − θ (∇p) 2 dΩ + 1 ∫

}

(θ∇p) 2 dΩ (3.42)

||p||≤g Ω

Ω

2θ Ω

max

||p||≤g{ ∫ ∇v · p dΩ − θ ∫ }

(∇p) 2 dΩ

(3.43)

Ω 2

Ω

3.6. Solving the Shape Prior Segmentation Model 49

Next, the problem is converted into a minimization task by simply inverting

the prefix from the maximization task.

The resulting optimization problem in

**for**m of a dual **for**mulation can be solved in the continuous domain using the

Euler-Lagrange equation (3.45).

{ ∫

min −

||p||≤g

∫

}

(∇p) 2 dΩ

(3.44)

∇v · p dΩ + θ

Ω 2 Ω

∂E

∂p = −∇ (v + θ · ∇p) = ! 0 , ||p|| ≤ g (3.45)

To consider the constraint of projecting p to a length of g, the minimization task

has to be split up into two steps. The name of the method comes from the iterative

scheme where first a gradient descent method is used to get a temporary dual variable

˜p **and** with a trailed re-projection p is restricted to the length of g.

˜p n+1 = p n + τ θ ∇ (u 0 + θ∇p n ) (3.46)

p n+1 =

max

˜p n+1

{1, ||˜p||n+1

g

} (3.47)

3.6.2 Minimize v **for** fixed u **and** φ

For the second minimization problem (3.32) a thresholding scheme can be derived from

the corresponding Euler-Lagrange equation (3.32):

1

θ (u − v) + λ sgn (v − φ ◦ s) ! = 0 (3.48)

Three different cases can be distinguished **for** the direct solution:

⎧

u − λθ | v − φ ◦ s < 0

⎪⎨

v = u + λθ | v − φ ◦ s > 0

⎪⎩

φ ◦ s | v − φ ◦ s = 0

(3.49)

50 Chapter 3. Geodesic Active Contour with L 1 Shape Prior

3.6.3 Optimize the Rigid Trans**for**mation φ **for** fixed u **and** v

Exhaustive Search

The simplest, but computationally most costly search method is to simply test each

possible shape alignment sequentially. In theory every possible position ought to be

evaluated to find the globally optimal position. The only drawback is a runtime issue.

For a two dimensional rigid trans**for**mation a parameter **for** translation in x **and** y

direction, a rotation **and** scaling has to be optimized. This results into four nested

**for**-loops which are costly to solve **for** a larger search region. For all possible positions

this is not realistic **and** there**for**e the parameters of φ are restricted to a certain domain.

Optimizing within this region guarantees to find the optimal position **for** this domain.

In combination with user-interaction the coarse position can be stated by the user

**and** the local position optimization is done automatically. This approach also fits our

semi-automatic approach.

Binary Search

A more sophisticated search algorithm is the binary search. There**for**e a predefined

interval is divided consecutively. For each alignment the energy equation (3.12) is

evaluated **and** the position with the lowest energy is taken. The main drawback of

the algorithm is that the optimal position is skipped which may occur especially with

difficult alignment problems. In addition it is not guaranteed that the position with

the optimal alignment will be reached. Instead the binary search will stop at a local

minima.

Consecutive Search

The result of splitting the nested four loops of an exhaustive search into a consecutively

algorithm will result into a search method that is much faster on the one h**and** but on

the other h**and** it cannot be guaranteed that the optimal position is found within the

search region.We implemented some variations of this method. The possibility of a

parallel search of the translation parameters x **and** y **and** afterwards a consecutive

search **for** a optimal rotation **and** scale trans**for**mations encountered that this method

is a quite good compromise on speed **and** accuracy.

3.6. Solving the Shape Prior Segmentation Model 51

Gradient Descent

Another optimization method to detect a local minima is to take the negative of the

gradient of the function at the current position **and** use it as a step size approach towards

the minimum. Often a timestep variable is added to control the size of the step size.

The main advantage is the speed of the optimization. The additional timestep variable

can have negative effects on the stability. When the value is too high the alignment

task may start to jitter **and** **for** low values the speed of the optimization will slow down.

However the main drawback of this optimization method is the lack of finding a global

optimum. Instead the gradient descent search will get stuck in local minima.

3.6.3.1 Discussion

We showed that there are several methods to optimize the shape position. For the

interactive segmentation tool that will be presented in Section 5.2 the exhaustive search

is used because it guarantees to find the globally optimal solution. With the possibility

of user-interaction the search region can be restricted **and** the application preserves

real-time per**for**mance. For tracking slow moving objects also the exhaustive search

method can achieve a sufficient frame rate. Though **for** faster movements the position

optimization has to be done with a gradient descent approach. The drawback, as

already stated be**for**eh**and**, is the lack of finding the globally optimal solution. For

movements along a fixed plane the consecutive search can be a good compromise.


3.7 Iterative Scheme to Solve the Segmentation Model

In the following we want to combine the steps derived in Section 3.6 into an iterative update scheme to give a better overview of the proposed algorithm and its solution using a projected gradient descent method to minimize the energy equation.

1. Dual ascent step:

   p̃^{n+1} = p^n + (τ/θ) ∇(v^n + θ div p^n)

2. Reprojection onto the constraint |p| ≤ g:

   p^{n+1} = p̃^{n+1} / max{ 1, |p̃^{n+1}| / g }

3. Primal update:

   u^{n+1} = v^n + θ div p^{n+1}

4. Thresholding step:

   v^{n+1} = u^{n+1} − λθ   if u^{n+1} − φ ◦ s > λθ
   v^{n+1} = u^{n+1} + λθ   if u^{n+1} − φ ◦ s < −λθ
   v^{n+1} = φ ◦ s          if |u^{n+1} − φ ◦ s| ≤ λθ

5. Optimize φ for an optimal shape alignment.

6. Go to 1. until convergence.

In practice it turns out that the number of iterations does not need to be balanced: two to three iterations of the sub-optimization of the dual variable (steps 1 and 2) are sufficient to provide a result for the calculation of u (step 3) and the thresholding step (4), so that the whole algorithm converges to a stable result.
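The scheme above can be made concrete with a minimal NumPy sketch of steps 1-4, run here on a fixed shape (the alignment step 5 is omitted). The discretization (forward-difference gradient, backward-difference divergence) and the parameter values are our choices for illustration, not taken from the thesis implementation:

```python
import numpy as np

def grad(u):
    # Forward differences with Neumann boundary conditions.
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    # Backward differences, the discrete adjoint of the gradient above.
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:] = px[:, 1:] - px[:, :-1]
    dy[0, :] = py[0, :]; dy[1:, :] = py[1:, :] - py[:-1, :]
    return dx + dy

def shape_prior_segmentation(g, shape, lam=1.0, theta=0.25, tau=0.125,
                             n_outer=300, n_inner=2):
    # v: primal variable, (px, py): dual variable of the g-weighted TV.
    v = shape.astype(np.float64).copy()
    px = np.zeros_like(v); py = np.zeros_like(v)
    for _ in range(n_outer):
        for _ in range(n_inner):
            # Steps 1 and 2: dual ascent and reprojection onto |p| <= g.
            ux, uy = grad(v + theta * div(px, py))
            px_t = px + (tau / theta) * ux
            py_t = py + (tau / theta) * uy
            denom = np.maximum(1.0, np.hypot(px_t, py_t) / np.maximum(g, 1e-8))
            px, py = px_t / denom, py_t / denom
        # Step 3: primal update.
        u = v + theta * div(px, py)
        # Step 4: TV-L1 thresholding against the (here fixed) shape.
        r = u - shape
        v = np.where(r > lam * theta, u - lam * theta,
                     np.where(r < -lam * theta, u + lam * theta, shape))
    return np.clip(u, 0.0, 1.0)
```

Note that, as stated above, only two inner iterations of the dual sub-optimization are performed per outer iteration; the dual variable keeps improving across outer iterations because v changes slowly.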

Chapter 4

Implementation

Contents

4.1 GPU design
4.2 CUDA
4.3 Is there an alternative to CUDA?
4.4 Implementation Details

To reach real-time performance we have to compute the iterative solution as fast as possible. Due to the good parallelization attributes of variational algorithms and the enhancements of programming techniques on graphics hardware, we decided to implement the method using GPGPU programming with the help of NVidia's CUDA [43]. Since the introduction of the GeForce 8 series, GPGPU programming has become easier to access because of available program feedback and unified shading hardware. More details on unified shaders and the general GPU design are provided in Section 4.1.

4.1 GPU design

To give a better understanding of the used programming techniques we want to give a short overview of modern graphics hardware and how its design differs from previous models. Because we work on NVidia hardware, we will concentrate on the main differences between the GeForce 8 Series and newer GPUs on the one hand and older GeForce GPUs on the other. Former rendering pipelines (Figure 4.1) were fixed and split into different parts, implementing each task on a different processor.

Figure 4.1: Schematic sequence of shader units in a traditional GPU

rendering pipeline.

Figure 4.2: Principle of unified shader model.

This data flow passes data (vertices, attributes, etc.) from the CPU to the GPU and traverses the major processing stages linearly from left to right in Figure 4.1. To change this sequential flow into a loop-oriented model, the unified shader architecture was introduced with the release of the GeForce 8 Series [42]. A schematic overview is given in Figure 4.2. Unified Stream Processors (SPs) can process any kind of data and therefore the balance between the different stages is no longer fixed. If a program or a loaded scene has to do more processing on pixel level than, for example, on vertices, the workload will be divided on demand. The data processing is loop oriented.


Input is passed to the unified shader and results are redirected from local registers to the processing unit for the next operation. When all shading operations are done, the resulting pixel fragment is handed over to the ROP unit. SPs can be grouped together to provide highly parallel computing capabilities. This can be used for GPGPU computing, which can be done with the CUDA framework from NVidia (see Section 4.2). The principle of SPs is illustrated in Figure 4.3. This block diagram of a GeForce 8800 GTX gives an overview of the composition of the graphics device. The SPs and texture units (TFs) are combined into blocks on which single threads can run. These blocks offer some amount of local memory (L1 cache) and can exchange data on a shared memory level (L2 cache). A better overview of the provided memory is given in Figure 4.4 and of the CUDA programming techniques in Section 4.2. Each SP unit can be assigned to a specific shader task and its output can be redirected as input to a different SP very efficiently.

Figure 4.3: Block diagram of the GeForce 8800 GTX. The figure is reprinted from [42].

4.2 CUDA

Figure 4.4: Memory provided by the GeForce 8 Series that can be used when programming with CUDA. This figure is taken from the CUDA Programming Guide [43].

GPUs have been used for non-graphics computation for several years. With the introduction of the unified shader model, GPUs offer the opportunity to build a framework that flattens the learning curve of GPU computing. Since the introduction of the GeForce 8 Series, the thread management for the different shaders can be interpreted as a single management facility for thread handling in GPGPU computing. With the CUDA framework [43] NVidia provides a GPGPU technology based on the syntax of the C programming language. One main advantage of the new GPU design is the generic SPs in combination with the ability to address the device memory generically. In former GPU series it was not possible to write to arbitrary addresses in memory. Generally, a GPU is specialized for highly parallel, computationally intensive workloads. It can be regarded as a coprocessor to the CPU and is implemented as a set of multiprocessors. Due to this, GPUs are predestined to apply identical operations to varying data. The threads are organized as a grid of thread blocks (see Figure 4.5):


Figure 4.5: Thread organization in grids and blocks for kernel execution. The figure is taken from [43].

Thread Blocks: A bunch of threads that can communicate via shared memory and can be synchronized at certain points. The thread ID is visible inside the kernel and can be used for position sensing inside the processed data.

Grid of Thread Blocks: Due to the limited number of threads within a block, the thread blocks that execute the same kernel are joined into a grid. The main disadvantage compared to threads inside a block is the lack of fast communication facilities. Again, a block ID provides the opportunity to ascertain the current position.


In Figure 4.4 the memory accessibility is shown with respect to the single threads. As for CPUs, registers are the fastest and closest form of local memory. In addition, local memory is available during kernel execution. To share data between threads, shared memory can be used, which brings a high speedup when accessed correctly. To achieve high memory bandwidth, bank conflicts have to be avoided: shared memory is divided into memory banks, each of which can handle one access per clock cycle. When multiple threads access the same bank, the accesses have to be serialized. At best, the bandwidth is as high as for register access. There is no way to share data between grid blocks other than through device memory. Write access is only provided for global memory, which is not cached, unlike constant and texture memory, which are cached but read-only for kernel functions.

The CPU can communicate with the GPU through the global, constant and texture memory. There is no possibility to access the device's shared or even local memory from the host side. Due to the slower bandwidth between host and device memory, data transfers between the two should be minimized to gain optimal performance. One should take advantage of the high bandwidth between the device and its memory.

4.3 Is there an alternative to CUDA?

Anything proposed by the graphics manufacturer NVidia normally has its counterpart from AMD/ATI, and vice versa. CUDA is no exception: AMD proposed "Close To Metal" (CTM) in 2006 [2]. Like CUDA, CTM offers gather and scatter memory operations. The main idea behind CTM is to gain more direct control over the underlying hardware. Contrary to CUDA, which offers a high-level "C-style" syntax, CTM is more of an assembly-like language and therefore the learning curve is much steeper with AMD's GPGPU framework. Please refer to [2] for more details on CTM.

4.4 Implementation Details

General guidelines for gaining performance with GPGPU programming have already been discussed in Section 4.2. The iterative approach to solving our segmentation model (see Section 3.7) can be processed on the GPU by utilizing shared memory. The algorithm only needs the neighboring pixels to update the current position. Therefore the image data can be loaded as patches that fit the corresponding block sizes, and the problem is solved on each patch without fetching new data from global memory. The final result is written back to global memory and the next patch can be processed. This method is well suited to benefit from the principle of SPs.
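The patch-wise scheme can be mimicked on the CPU to make the idea concrete. This Python sketch stands in for the CUDA kernel: the copy of each tile plus a one-pixel apron plays the role of the shared-memory load, and the write of the tile interior plays the role of the global-memory store. Function names and the example stencil are ours, not the thesis code:

```python
import numpy as np

def process_in_tiles(img, tile, step_fn):
    h, w = img.shape
    padded = np.pad(img, 1, mode='edge')   # "global memory" with boundary handling
    out = np.empty_like(img)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            ht, wt = min(tile, h - y), min(tile, w - x)
            # "Shared memory" load: the tile plus a one-pixel apron,
            # so the per-pixel update needs no further global reads.
            patch = padded[y:y + ht + 2, x:x + wt + 2]
            # Apply the update and write the interior back.
            out[y:y + ht, x:x + wt] = step_fn(patch)
    return out

def laplacian_step(patch):
    # Example per-pixel update that only needs the 4-neighbourhood.
    return (patch[:-2, 1:-1] + patch[2:, 1:-1] +
            patch[1:-1, :-2] + patch[1:-1, 2:] - 4 * patch[1:-1, 1:-1])
```

Because every pixel of the update depends only on its 4-neighbourhood, the tiled result is identical to processing the whole image at once, which is what makes the shared-memory decomposition on the GPU valid.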

Chapter 5

Applications and Results

Contents

5.1 Applications
5.2 Interactive Medical Image Segmentation
5.3 Shape Alignment
5.4 Tracking Application
5.5 Processing 3D CT/MR Data

5.1 Applications

The main intention was to build a semi-automatic segmentation system for medical image data as described in Section 5.2. This gives a medical doctor the opportunity to interact with the segmentation result. For very difficult data with bad edge information the model can be used to favor the prior and just find an appropriate alignment for the shape model. In this case the shape alignment of Section 5.3 gains full control over the final segmentation. Due to the strong linking of the segmentation and the alignment task, the results often cannot be assigned to a specific application; results related to both are presented in the course of the shape alignment in Section 5.3. In the next application the shape alignment is utilized to build a tracking system, which is presented in Section 5.4. The intention is to detect the selected object in the next frame, which is nothing else than optimizing the shape position with respect to the position within the previous frame. In Section 5.5 the tracking is modified to process 3D data sets. Using CT/MR slices as frame data offers the possibility to reconstruct 3D objects with the help of a given 2D prior.

To provide easier user interaction, a graphical user interface was designed to cover the demands of the single applications. The specific tools are presented in the following sections in the course of the application descriptions. The GUI is implemented with Trolltech's Qt.

5.2 Interactive Medical Image Segmentation

As a basis for the implementation, we chose the segmentation model of Pock and Unger [47, 56, 57]. They proposed the energy function (5.1) to integrate user interaction into a geodesic active contour model.

min_{u∈[0,1]} { ∫_Ω g(x) |∇u| dΩ + (1/2) ∫_Ω λ(x) (u − f)² dΩ }     (5.1)

The first term is again the g-weighted TV of u, whereas f defines the user input: foreground seeds are marked with 1 and background seeds with 0 within f. With the parameter λ the user can regulate how much the user seeds should be taken into account, or whether the edge information is preferred with the aid of the GAC model with the g-weighted TV.

The segmentation model of Pock and Unger is used to facilitate the creation of a shape prior on the fly. The user can set foreground and background seeds and define the resulting segmentation as prior for the further segmentation task. A click on the button "set Shape" takes the shape and switches to our proposed energy model. There the user can again choose with the parameter λ, consistent with the energy equation (3.12), whether the segmentation should be attracted to the shape prior (higher λ) or to the edge function via the g-weighted TV (lower λ). A workflow of shape definition with the corresponding user interface is shown in Figure 5.1.

The second phalanx of the middle finger was segmented using foreground (on the bone) and background (image border) seeds in Figure 5.1a, which shows the complete GUI. The other images only show the relevant parts of the program. The segmentation of Figure 5.1a is used as a shape prior, which results in a binary representation as shown in Figure 5.1b. Switching to the edit mode "move shape" allows positioning this shape onto another finger. In Figure 5.1c the shape is represented as a green contour and it was moved onto the ring finger. Because of a slightly different shape, the segmentation result should also include the edge image, and therefore the result in Figure 5.1d segments the second phalanx with λ = 0.035. The GPU-based implementation enables real-time interaction: when the shape position is modified the segmentation is updated immediately.

(a) Shape initialization. (b) Binary shape representation. (c) New shape position. (d) Segmentation with λ = 0.035.

Figure 5.1: Workflow of shape prior segmentation with a shape prior defined on the fly.


The second possibility to take a shape prior into account is to load a binary representation of the shape. Therefore an image with black and white regions has to be prepared in advance, where zero (black) defines the inside region of the shape and one (white) the outside. Hence hand-labeled data can be used as easily as shape prior definitions made at runtime, which makes the application versatile for different uses.

(a) Thresholding result with T = 0.3. (b) Segmentation with pure GAC energy. (c) Shape prior segmentation with λ = 0.15. (d) Thresholding result with T = 0.3. (e) Segmentation with pure GAC energy. (f) Shape prior segmentation with λ = 0.15.

Figure 5.2: Segmentation of the first phalanx of an index finger (5.2a–5.2c) and a metacarpal bone of a ring finger (5.2d–5.2f). The red curve represents the final segmentation and the green points mark the reference segmentation by an expert.

A labeled data set was available for the metacarpal bones and the proximal (first) phalanges in an X-ray image of the hand. As an example we used the first phalanx of the index finger and the metacarpal bone of the ring finger of a left hand. In Figure 5.2 this data set is used to show the different results when using thresholding, pure geodesic active contour energy, and the proposed shape prior segmentation. As a result we can see that thresholding fails for both examples. The GAC segmentation with the image borders as background and some hand-labeled foreground seeds performs well in regions with good contrast but fails in regions where no good edges are available. This is especially the case where the metacarpal bone meets the carpal bones. Incorporating shape information, the result is equivalent to the ground-truth segmentation labels. The segmentation results in Figures 5.2c and 5.2f assume an accurate alignment of the shape. For misaligned data the results are shown in the next section in Figures 5.7 and 5.6.

(a) Thresholding result with T = 0.3. (b) Segmentation with pure GAC energy. (c) Shape prior segmentation with λ = 0.15.

Figure 5.3: Segmentation of a vertebra in an X-ray image of the spine. Due to very bad contrast the segmentation without prior fails. The definition of the prior was prepared by us and therefore cannot be considered as reference data.

A more complex example with low-contrast images is presented in Figure 5.3. Thresholding totally fails for segmenting a single vertebra in the sagittal X-ray image of a spine. The pure geodesic active contour also does not provide a reasonable segmentation unless a large number of constraints is set by the user. Incorporating shape information, it is possible to obtain a reasonable segmentation (Figure 5.3).


Figure 5.4 shows a data set recorded with a camera in a more natural setup. The aim is to segment the imaged hand. The basic segmentation with thresholding in Figure 5.4a again fails for the provided data. With GACs, including fore- and background seeds, the segmentation is possible but the user has to provide constraints to obtain a splitting of the fingers. With a shape prior in the form of the hand, the segmentation succeeds for this image.

(a) Thresholding result with T = 0.7. (b) Segmentation with pure GAC energy. (c) Shape prior segmentation with λ = 0.15.

Figure 5.4: Segmentation of a hand.

The provided examples always require an accurate alignment of the shape prior to achieve the shown segmentation. For misaligned data the prior has to be aligned before the segmentation is done. This problem is discussed in more detail in Section 5.3.

Exact time measurements are not very meaningful for the segmentation tool because the runtime highly depends on the input data and the amount of user interaction. In addition, the energy equation is evaluated in the area of the shape prior and therefore also depends on the size of the shape representation. However, we give some values for the iterations processed per second and the resulting frame rate in Table 5.1 to convey the efficiency of the proposed method and its GPU-based implementation.

Data set              Image Size [px]   Shape Size [px]   Iterations/s   Frames/s
X-ray (spine)         512 × 800         153 × 111         30800          205
X-ray (spine)         512 × 800         512 × 512         17100          57
Hand (index finger)   512 × 512         256 × 256         25300          113
Hand (ring finger)    512 × 512         256 × 512         20700          93
Ultrasound (heart)    800 × 640         384 × 384         23200          102

Table 5.1: Performance evaluation of the shape prior segmentation tool.


5.3 Shape Alignment

If an image has very bad edge strength or many disturbing edges, the segmentation cannot be based solely on an edge function. In this case the user has the possibility to rely on shape alignment. The ability to take a predefined shape into the segmentation system establishes the possibility to stick to the shape with a high λ for the segmentation task. The problem arises that the alignment has to be accurate so that the desired object borders can be found. The following application proposes a method to automatically align the shape towards the optimal position to segment the desired object. The search methods for optimizing the parameters of the rigid shape transformation φ ◦ s are discussed in Section 3.3.3. The linear search guarantees to find the optimal position within the search region. Due to runtime issues – the linear search of rigid parameters results in four nested for-loops – the search region is limited in the application. It is up to the user to define the allowed shape variation. As an alternative search method, a gradient descent algorithm is implemented which offers a timestep variable for each optimization variable (translation, rotation and scale). Again the user has to define how fast the model is allowed to change the segmentation prior. Generally, the application again describes an optimization task: the optimal position of the shape prior is equivalent to the global minimum of the proposed energy equation (3.12).

For the alignment it is also possible that some parameters of the rigid transformation are provided by user interaction or at least suggested by a coarse positioning. Figure 5.5 shows an example where the user gives the coarse position and the local optimization of translation, rotation and scaling is determined with a line search optimization. The green contour describes the shape boundary and the red one represents the current segmentation. For λ a value of 0.1 was chosen, which is a good balance between the shape information and the edge attraction. In Figure 5.5c the shape is aligned to the underlying bone structure. When the shape prior is positioned on a rather different structure, like in Figure 5.5d, the rigid transformation tries to optimize the energy (3.12) for the best fit.

Figures 5.6 and 5.7 show an example of shape position optimization on the hand-labeled data that has already been used for evaluating the segmentation task in Section 5.2. The rigid transformation parameters are optimized to get a reasonable alignment of the shape prior to the calculated edge function. Assuming that the optimization region lies within the given transformation restrictions, the shape gets optimally aligned and the resulting segmentation is equal to the hand-labeled contour.

Figure 5.5: Shape alignment with user interaction.

In medical image segmentation various image modalities are available. For the examples in Figures 5.8 and 5.9 two ultrasound images of the heart are processed. Due to the noisy output of ultrasound scanners, segmentation based on edges is difficult. Incorporating a shape prior and optimizing the shape position, the desired results can be achieved. In Figure 5.8 the segmentation of the left ventricle in an echocardiogram is shown. In Figure 5.8a the green contour represents the initial shape prior and the red border in Figure 5.8b shows the corresponding segmentation. In Figure 5.8c the shape is aligned to the optimal found position; there the optimized shape prior (green) and the segmentation (red) are shown within the same image. Figure 5.8d shows the result when a TV-L1 filtering (see Section 3.2.2) is done before the segmentation. In Figure 5.9a the shape of a right atrium is incorporated into the segmentation. Figure 5.9b shows an intermediate state and Figure 5.9c the final segmentation with the use of the optimized shape prior position. The contour in Figure 5.9d is the result of a segmentation that puts more weight on the shape force: with λ = 0.1 the segmented borders correspond more to the proposed shape geometry. In this case the segmentation result seems smoother, but this depends on the incorporated shape prior.

Figure 5.6: Alignment of a given shape to a metacarpal bone of a ring finger. The red curve represents the current segmentation and the green points mark the reference segmentation by an expert.

Figure 5.7: Alignment of a given shape to the second phalanx of an index finger. The red curve represents the current segmentation and the green points mark the reference segmentation by an expert.

(a) Initial shape position. (b) Segmentation with λ = 0.02. (c) Optimized shape alignment and segmentation with λ = 0.02. (d) Segmentation with λ = 0.1 with previous TV-L1 filtering (λ = 0.3).

Figure 5.8: Segmentation of the left ventricle in an echocardiogram. The green contour represents the pre-defined shape prior and the red contour the resulting segmentation.

The data set of Figures 5.10–5.12 shows sagittal and coronal X-ray images of a spine. These are examples of very low contrast images, which are important for long-term studies where the radiation dose should be as low as possible. Due to the shape information, the local alignment is able to find a reasonable position for a vertebra in Figure 5.10 for the sagittal image and Figure 5.11 for the coronal one. In the example of Figure 5.12 the shape consists of multiple vertebrae and the final alignment is feasible again.

An example of non-medical image data is presented in Figure 5.13. An image of a hand is used where the second half of the pictures contains some occlusion. Using a pre-defined shape prior, the segmentation of the hand is possible in both cases with the help of local alignment optimization.


(a) Segmentation with the initialized shape position (λ = 0.02). (b) Segmentation with an intermediate shape position (λ = 0.02). (c) Segmentation after final alignment (λ = 0.02). (d) Segmentation that favors the shape force with λ = 0.1, which results in a smoother contour.

Figure 5.9: Segmentation of the right atrium in an echocardiogram with the help of position alignment of a shape prior.


Figure 5.10: Alignment of a single vertebra in a sagittal X-ray image of the spine.

Figure 5.11: Alignment of a single vertebra in a coronal X-ray image of the spine.


Figure 5.12: Alignment of multiple vertebrae in a sagittal X-ray image of the spine.


Figure 5.13: Alignment of a hand shape to an image. This example shows that the proposed method is robust against occlusion.


5.4 Tracking Application

Consequently, the shape alignment can be applied to video sequences and thus used to track an object with the aid of shape information. The object position has to be initialized at a certain position in the first frame. This can be done either by stopping the video and segmenting the object with the help of the provided tools, or by just loading a shape prior that describes the object. For the guided initialization with a known prior, the parameters for translation, rotation and scale can be adjusted so that the segmentation is valid for an initialization. An automated alignment with the tools of Section 5.3 is another possibility to provide a valid segmentation for the first frame. For the tracking itself a shape alignment is applied from one frame to the next. The disadvantage when searching for the globally optimal position with a line search method is obvious: for faster movement the allowed transformation parameters have to be increased, which slows down the tracking, and for a restricted domain of parameters problems occur when the object moves faster, rotates more or gets smaller or bigger than the specified domain allows. Then the tracking becomes inaccurate or, in the worst case, loses the object. Alternatively, the tracking application can resort to the gradient descent search method, which results in a much higher frame rate. The drawback is that the tracking begins to jitter if the timestep gets too high, while for lower values the segmentation will not catch up with fast object movement. For a restricted search region the tracking application can reach real-time performance despite the very costly search method.
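The frame-to-frame structure described above reduces to a simple loop. In this hypothetical sketch, `align_step` stands in for either the restricted exhaustive search or the gradient descent alignment, and the parameters found in one frame initialize the search in the next:

```python
def track(frames, init_params, align_step):
    # Track by re-aligning the shape in each frame, warm-started with the
    # optimal parameters of the previous frame.
    trajectory = []
    params = init_params
    for frame in frames:
        params = align_step(frame, params)   # search restricted around params
        trajectory.append(params)
    return trajectory
```

The warm start is what keeps the search region small: as long as the inter-frame motion stays within the allowed transformation domain, the tracker follows the object.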

In Figure 5.14 a real-time tracking example of a cup is shown. The espresso cup is moved at a speed for which a search region of a few pixels is sufficient and the position update can be done in real-time. Switching to a search method like gradient descent brings a performance boost, but as mentioned in Section 3.3.3 it cannot guarantee to find the globally optimal position. For a sequence like the one in Figure 5.14 the result would still be satisfying due to the lack of disturbing edges near the object.

In Figure 5.15 a more complex example is shown. This one is not processed in real-time but as a video sequence, due to disturbing edges on the driving lane from the shadows of the trees and illumination changes. The sequence is pre-processed with some Gaussian smoothing so that disturbing edges are reduced to a certain extent. However, the dominant edge of the shadow and the illumination change when entering the tunnel remain. Despite the difficult conditions the tracking provides an appropriate position estimate and the car is followed through the whole sequence. Slight misalignments in some frames due to heavy illumination changes are compensated after a few frames.

Figure 5.14: Real-time tracking of an espresso cup using an exhaustive search for position optimization and a restricted domain of allowed transformation parameters.


Figure 5.15: Daimler-Chrysler sequence of a car approaching a tunnel. The sequence is very difficult to track based on edge information because of many disturbing edges on the lane and the illumination changes. However, the shape prior segmentation with position optimization did a fairly good job and did not lose the car when entering the tunnel.


5.5 Processing 3D CT/MR Data

As a specialized tracking and shape alignment application, the algorithm is applied to 3D image data. The idea is to track an object through the slices of an MR or CT volume. First an object is segmented on a starting slice according to its shape, and second the prior position is optimized with regard to the shape change of the object while iterating through the volume slices. With the help of such a segmentation, a three-dimensional representation can be built out of the single segmentation results. Here the main emphasis does not lie on real-time performance and therefore the search region can be enlarged, which results in a more stable result. In Figure 5.16 the abdominal aorta is tracked through the MR scan of an abdomen.

Figure 5.16: Aorta segmentation with the help of 3D data processing with a 2D shape prior. The single MR slices are used as consecutive frames and the shape position is optimized according to the aorta.


Figure 5.17: 3D visualization of aorta segmentation. Different views show

the tubular structure in the volumetric data. The segmentation result is

shown in red.

Chapter 6

Conclusion and Outlook

Contents

6.1 Conclusion
6.2 Outlook

6.1 Conclusion

In this master's thesis an interactive segmentation model that incorporates prior information was presented. In Sections 2.2 and 2.3 we introduced some basic methods for image segmentation. We showed that especially low-level methods like thresholding or simple edge chains have their drawbacks on complex data sets. However, also more advanced edge-based methods like the Snake model by Kass et al. [29] or the geodesic active contours by Caselles et al. [8] depend on basic features like edges and do not incorporate high-level knowledge in their original implementations. The corresponding optimization techniques are not able to find a global optimum and get stuck in a local one. This drawback also applies to more sophisticated region-based approaches like the Mumford-Shah segmentation model [39] or its modification by Chan and Vese [15, 58].

To handle more difficult tasks, methods were developed that incorporate prior knowledge into a segmentation framework. In Section 2.4 we give an overview of methods that use shape models to improve the segmentation results. Level set methods by Leventon et al. [33, 34] or Chen et al. [16–18] are difficult to optimize and the result is not guaranteed to be globally optimal. The two-part method of Paragios et al. [45, 46] combines a global registration task with a local deformation field. The last two approaches, by Cremers et al. [23, 25] and Bresson et al. [6], incorporate shape information into the Mumford-Shah method.

The proposed variational shape prior segmentation is presented in Chapter 3. The approach is based on a segmentation with geodesic active contours that prefers a defined geometric shape. The geodesic active contour is implemented with a weighted TV-norm, and the shape force is integrated into the data-fidelity term, which is modelled with an L1-norm. This is stated in Sections 3.3.2 and 3.4. The method to obtain a globally optimal solution and to optimize the given parameters is shown in Section 3.6.
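To make the structure of this energy concrete, the following is a minimal CPU sketch of minimizing the weighted TV-L1 energy E(u) = ∫ g|∇u| + λ ∫ |u − f| over u ∈ [0, 1]. It uses a generic first-order primal-dual scheme rather than the exact projected gradient descent of the thesis, and all function and parameter names here are our own illustration, assuming f is a gray-value image in [0, 1] and g an edge-indicator weight:

```python
import numpy as np

def grad(u):
    """Forward differences with Neumann boundary (last row/col gradient = 0)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Backward differences; the negative adjoint of grad above."""
    dx = np.zeros_like(px)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def weighted_tvl1(f, g, lam=1.0, iters=500, sigma=0.3, tau=0.3):
    """Minimize E(u) = sum g*|grad u| + lam*sum|u - f| over u in [0, 1]
    with a first-order primal-dual scheme (dual p for TV, dual q for L1)."""
    u = f.copy()
    ubar = u.copy()
    px = np.zeros_like(u)
    py = np.zeros_like(u)
    q = np.zeros_like(u)
    for _ in range(iters):
        # dual ascent: project p onto |p| <= g and q onto [-lam, lam]
        gx, gy = grad(ubar)
        px, py = px + sigma * gx, py + sigma * gy
        scale = np.maximum(1.0, np.sqrt(px**2 + py**2) / np.maximum(g, 1e-8))
        px, py = px / scale, py / scale
        q = np.clip(q + sigma * (ubar - f), -lam, lam)
        # primal descent with projection onto the box constraint [0, 1]
        u_new = np.clip(u + tau * (div(px, py) - q), 0.0, 1.0)
        ubar = 2.0 * u_new - u
        u = u_new
    return u
```

Thresholding the relaxed result at 0.5 gives the binary segmentation; the contrast invariance of the TV-L1 model removes structures whose perimeter-to-area ratio exceeds λ while keeping larger regions intact.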

The implementation itself is done with the help of NVIDIA's GPGPU framework CUDA. The optimization of the variational models can be accelerated and the methods reach real-time performance. Details on graphics hardware and GPGPU computing are given in Chapter 4.

Chapter 5 was devoted to the implemented applications and the corresponding results. First, an interactive segmentation application was reviewed and evaluated with hand-labeled image data. This data was also used to review the task of shape alignment in Section 5.3. Next, a tracking application was implemented which makes use of the shape alignment on consecutive image frames. Finally, the tracking was modified to process 3D image data with the use of a 2D shape prior.

6.2 Outlook

For further development, several sub-parts may be enhanced. First, the shape prior representation could be implemented in a more flexible way. For fast treatment of new data, the binary shape prior offers the possibility to process the image without a previous learning step. However, the segmentation is then only based on one fixed shape prior. By introducing an optional statistical prior, or a combination of a binary shape and a learnable prior, it would be possible to incorporate different shapes and choose the best adapted one for the particular situation. Another idea is to couple multiple priors with some kind of forces, which would make it possible to favor an alignment of the shapes towards each other, with the forces acting on the alignment among themselves.


Another idea is to use a 3D representation of an object as shape prior. This would make the segmentation algorithm much more versatile with respect to different viewpoints. A position estimation could be done beforehand, and therefore very accurate priors could be used for the final segmentation.

For the alignment and tracking applications, the position optimization is the main performance bottleneck. Especially for fast object movement in the tracking application, the real-time ability gets lost when using the exhaustive search method. It may guarantee to find the globally optimal position within the search region, but as the allowed transformation grows, the frame rate will drop. Therefore, a global estimation of the object movement would provide a welcome performance boost. An idea is to estimate the coarse movement with optical flow and only perform the accurate alignment within a small neighborhood. Optical flow has already been implemented on the GPU by Pock and Zach in [47, 60], and the ability to calculate the flow field in real time was proven.
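The coarse-then-fine idea can be illustrated with a toy translation estimator. Here phase correlation merely stands in for the GPU optical flow suggested above, and all names are our own; the coarse estimate restricts the exhaustive search to a small window around it:

```python
import numpy as np

def coarse_shift(prev, curr):
    """Estimate the dominant integer translation between two frames by
    phase correlation (normalized cross-power spectrum)."""
    F = np.fft.fft2(prev) * np.conj(np.fft.fft2(curr))
    corr = np.abs(np.fft.ifft2(F / (np.abs(F) + 1e-8)))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    # unwrap the circular peak position into a signed shift
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx

def refine_shift(prev, curr, dy, dx, radius=2):
    """Exhaustive search in a small window around the coarse estimate,
    minimizing the squared frame difference."""
    best, best_err = (dy, dx), np.inf
    for ddy in range(-radius, radius + 1):
        for ddx in range(-radius, radius + 1):
            shifted = np.roll(curr, (dy + ddy, dx + ddx), axis=(0, 1))
            err = np.sum((shifted - prev) ** 2)
            if err < best_err:
                best, best_err = (dy + ddy, dx + ddx), err
    return best
```

With the coarse estimate in place, the search radius stays constant regardless of how fast the object moves, so the exhaustive refinement no longer dominates the frame time.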


Bibliography

[1] Ambrosio, L. and Tortorelli, V. (1992). On the approximation of free discontinuity problems. Boll. Un. Mat. Ital., B(7), 6(1):105–123.

[2] AMD Graphics Products Group (2006). ATI CTM Guide 1.01. Technical report, AMD Corp., Sunnyvale, CA, USA.

[3] Aujol, J. F., Gilboa, G., Chan, T., and Osher, S. J. (2006). Structure-texture image decomposition: Modeling, algorithms, and parameter selection. International Journal of Computer Vision, 67(1):111–136.

[4] Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J. P., and Osher, S. J. (2005). Global minimizers of the active contour/snake model. International Conference on Free Boundary Problems: Theory and Applications (FBP).

[5] Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J. P., and Osher, S. J. (2007). Fast global minimization of the active contour/snake model. Journal of Mathematical Imaging and Vision, 28(2):151–167.

[6] Bresson, X., Vandergheynst, P., and Thiran, J. P. (2006). A variational model for object segmentation using boundary information and shape prior driven by the Mumford-Shah functional. International Journal of Computer Vision, 68(2):145–162.

[7] Canny, J. F. (1983). Finding edges and lines in images. Master's thesis, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science. Supervisor: J. Michael Brady.

[8] Caselles, V., Kimmel, R., and Sapiro, G. (1997). Geodesic active contours. International Journal of Computer Vision, 22(1):61–79.

[9] Chambolle, A. (2004). An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20(1-2):89–97.

[10] Chambolle, A. (2005). Total variation minimization and a class of binary MRF models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 136–152.

[11] Chambolle, A. and Lions, P.-L. (1997). Image recovery via total variation minimization and related problems. Numerische Mathematik, 76(2):167–188.


[12] Chan, T., Esedoglu, S., Park, F., and Yip, A. (2006). Recent developments in total variation image restoration. In Paragios, N., Chen, Y., and Faugeras, O., editors, Handbook of Mathematical Models in Computer Vision, pages 17–31. Springer.

[13] Chan, T. F. and Esedoglu, S. (2005). Aspects of total variation regularized L1 function approximation. SIAM Journal of Applied Mathematics, 65(5):1817–1837.

[14] Chan, T. F., Golub, G. H., and Mulet, P. (1999). A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing, 20(6):1964–1977.

[15] Chan, T. F. and Vese, L. A. (2001). Active contours without edges. IEEE Trans. Image Processing, 10(2):266–277.

[16] Chen, Y., Guo, W., Huang, F., Wilson, D. C., and Geiser, E. A. (2003). Using prior shape and points in medical image segmentation. In Rangarajan, A., Figueiredo, M. A. T., and Zerubia, J., editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, 4th International Workshop, EMMCVPR 2003, Lisbon, Portugal, July 7-9, 2003, Proceedings, volume 2683 of Lecture Notes in Computer Science, pages 291–305. Springer.

[17] Chen, Y., Tagare, H. D., Thiruvenkadam, S., Huang, F., Wilson, D., Gopinath, K. S., Briggs, R. W., and Geiser, E. A. (2002). Using prior shapes in geometric active contours in a variational framework. International Journal of Computer Vision, 50(3):315–328.

[18] Chen, Y., Thiruvenkadam, S., Tagare, H. D., Huang, F., Wilson, D., and Geiser, E. (2001). On the incorporation of shape priors into geometric active contours. In Variational and Level Set Methods in Computer Vision, pages 145–152.

[19] Cohen, L. and Cohen, I. (1993). Finite-element methods for active contour models and balloons for 2-D and 3-D images. PAMI, 15(11):1131–1147.

[20] Cootes, T. and Taylor, C. (1999). Statistical models of appearance for computer vision. Technical report, University of Manchester, Wolfson Image Analysis Unit, Imaging Science and Biomedical Engineering, Manchester, United Kingdom.

[21] Cootes, T. F., Taylor, C. J., Cooper, D. H., and Graham, J. (1995). Active shape models: Their training and application. Computer Vision and Image Understanding, 61(1):38–59.


[22] Cremers, D., Kohlberger, T., and Schnörr, C. (2003). Shape statistics in kernel space for variational image segmentation. Pattern Recognition, 36(9):1929–1943.

[23] Cremers, D., Schnörr, C., and Weickert, J. (2001). Diffusion snakes: Combining statistical shape knowledge and image information in a variational framework. In Paragios, N., editor, IEEE First Int. Workshop on Variational and Level Set Methods, pages 137–144, Vancouver.

[24] Cremers, D., Schnörr, C., Weickert, J., and Schellewald, C. (2000). Diffusion snakes using statistical shape knowledge. In Sommer, C. and Zeevi, Y., editors, Algebraic Frames for the Perception-Action Cycle, volume 1888 of LNCS, pages 164–174, Kiel, Germany. Springer.

[25] Cremers, D., Tischhäuser, F., Weickert, J., and Schnörr, C. (2002). Diffusion snakes: Introducing statistical shape knowledge into the Mumford–Shah functional. International Journal of Computer Vision, 50(3):295–313.

[26] Hadamard, J. (1902). Sur les problèmes aux dérivées partielles et leur signification physique. Princeton Univ. Bull., 13:49–52.

[27] Huang, J. and Mumford, D. (1999). Statistics of natural images and models. Proceedings IEEE Conference on Computer Vision and Pattern Recognition, 1:541–547.

[28] Jähne, B. (1993). Digital Image Processing - Concepts, Algorithms and Scientific Applications. Springer-Verlag, second edition.

[29] Kass, M., Witkin, A., and Terzopoulos, D. (1988). Snakes: Active contour models. International Journal of Computer Vision, 1(4):321–331.

[30] Kichenassamy, S., Kumar, A., Olver, P., Tannenbaum, A., and Yezzi, A. (1996). Conformal curvature flows: From phase transitions to active vision. Archive for Rational Mechanics and Analysis, pages 275–301.

[31] Kichenassamy, S., Kumar, A., Olver, P. J., Tannenbaum, A. R., and Yezzi, Jr., A. J. (1995). Gradient flows and geometric active contour models. In International Conference on Computer Vision, pages 810–815.


[32] Leung, S. and Osher, S. (2005). Global minimization of the active contour model with TV-inpainting and two-phase denoising. In Paragios, N., Faugeras, O. D., Chan, T., and Schnörr, C., editors, Variational, Geometric, and Level Set Methods in Computer Vision, Third International Workshop, VLSM 2005, Beijing, China, volume 3752 of Lecture Notes in Computer Science, pages 149–160. Springer.

[33] Leventon, M., Faugeras, O., and Grimson, W. (2000a). Level set based segmentation with intensity and curvature priors. In Proceedings Workshop on Mathematical Methods in Biomedical Image Analysis, pages 4–11.

[34] Leventon, M., Grimson, W., and Faugeras, O. (2000b). Statistical shape influence in geodesic active contours. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, pages 316–323, Los Alamitos. IEEE.

[35] Leventon, M. E. (2000). Statistical models in medical image analysis. PhD thesis, Massachusetts Institute of Technology. Supervisors: W. Eric Grimson and Olivier D. Faugeras.

[36] Lindeberg, T. (1993). Scale-Space Theory in Computer Vision. Kluwer.

[37] Lindeberg, T. (1994). Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, 21(2):224–270.

[38] Marr, D. and Hildreth, E. (1979). Theory of edge detection. Proceedings Royal Society of London Bulletin, 204:301–328.

[39] Mumford, D. and Shah, J. (1985). Boundary detection by minimizing functionals. In Proceedings IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, June 10–13, pages 22–26. IEEE.

[40] Mumford, D. and Shah, J. (1988). Optimal approximations by piecewise smooth functions and variational problems. Comm. on Pure and Applied Math., XLII(5):577–685.

[41] Nikolova, M., Esedoglu, S., and Chan, T. F. (2006). Algorithms for finding global minimizers of image segmentation and denoising models. SIAM Journal of Applied Mathematics, 66(5):1632–1648.

[42] NVidia Corp. (2006). NVIDIA GeForce 8800 GPU architecture overview. Technical report, NVidia Corp., Santa Clara, CA, USA.


[43] NVidia Corp. (2007). NVIDIA CUDA Compute Unified Device Architecture – programming guide 1.1. Technical report, NVidia Corp., Santa Clara, CA, USA.

[44] Osher, S. and Sethian, J. A. (1988). Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. Journal of Computational Physics, 79:12–49.

[45] Paragios, N., Rousson, M., and Ramesh, V. (2002). Matching distance functions: A shape-to-area variational approach for global-to-local registration. In European Conference on Computer Vision, volume II, page 775 ff.

[46] Paragios, N., Rousson, M., and Ramesh, V. (2003). Non-rigid registration using distance functions. Computer Vision and Image Understanding, 89(2-3):142–165.

[47] Pock, T. (2008). Fast Total Variation for Computer Vision. PhD thesis, Institute for Computer Graphics and Vision, Graz University of Technology. Supervisors: Prof. Dr. Horst Bischof and Prof. Dr. Daniel Cremers.

[48] Pylyshyn, Z. (1986). Spring and fall fashions in cognitive science. Presented to the Cognitive Science Society's Eighth Annual Conference, Amherst, Massachusetts.

[49] Rousson, M., Paragios, N., and Deriche, R. (2003). Active shape models from a level set perspective. Technical report, I.N.R.I.A.

[50] Rousson, M., Paragios, N., and Deriche, R. (2004). Implicit active shape models for 3D segmentation in MR imaging. In Barillot, C., Haynor, D. R., and Hellier, P., editors, Medical Image Computing and Computer-Assisted Intervention–MICCAI 2004, volume 3216 of Lecture Notes in Computer Science, pages 209–216. Springer.

[51] Rudin, L. I., Osher, S. J., and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60:259–268.

[52] Sobel, I. and Feldman, G. (1968). A 3x3 isotropic gradient operator for image processing. Presented at a talk at the Stanford Artificial Project in 1968; unpublished but often cited (e.g. in Pattern Classification and Scene Analysis, Duda, R. and Hart, P., John Wiley and Sons, 1973, pp. 271–272).

[53] Strong, D. M. and Chan, T. F. (2000). Edge-preserving and scale-dependent properties of total variation regularization. Technical report.


[54] Tikhonov, A. N. (1963). Regularization of incorrectly posed problems. Soviet Mathematics, 4:1624–1627.

[55] Tikhonov, A. N. and Arsenin, V. Y. (1977). Solutions of Ill-posed Problems. W. H. Winston, Washington, D.C.

[56] Unger, M. (2008). An interactive framework for globally optimal image segmentation with local constraints. Master's thesis, Institute for Computer Graphics and Vision, Graz University of Technology. Supervisor: Prof. Dr. Horst Bischof, Instructor: Dr. Thomas Pock.

[57] Unger, M., Pock, T., and Bischof, H. (2008). Continuous globally optimal image segmentation with local constraints. In Perš, J., editor, Computer Vision Winter Workshop 2008, Moravske Toplice, Slovenia.

[58] Vese, L. A. and Chan, T. F. (2002). A multiphase level set framework for image segmentation using the Mumford and Shah model. International Journal of Computer Vision, 50(3):271–293.

[59] Yin, W., Goldfarb, D., and Osher, S. (2005). Image cartoon-texture decomposition and feature selection using the total variation regularized L1 functional. In Paragios, N., Faugeras, O. D., Chan, T., and Schnörr, C., editors, Variational, Geometric, and Level Set Methods in Computer Vision, Third International Workshop, VLSM 2005, Beijing, China, October 16, 2005, volume 3752 of Lecture Notes in Computer Science, pages 73–84. Springer.

[60] Zach, C., Pock, T., and Bischof, H. (2007). A duality based approach for realtime TV-L1 optical flow. In German Pattern Recognition Symposium, pages 214–223.