
Course Content of "Bildanalyse und Computergrafik" (Image Analysis and Computer Graphics)

Franz Leberl

28 January 2002


Contents

0 Introduction
0.1 Using Cyber-Cities as an Introduction
0.2 Introducing the Lecturer
0.3 From images to geometric models
0.4 Early Experiences in Vienna
0.5 Geometric Detail
0.6 Automation
0.7 Modeling Denver
0.8 The Inside of Buildings
0.9 As-Built Documentation: Modeling the Inside of Things in Industry
0.10 Modeling Rapidly
0.11 Vegetation
0.12 Coping with Large Datasets
0.13 Non-optical sensing
0.14 The Role of the Internet
0.15 Two Systems for Smart Imaging
0.16 International Center of Excellence for City Modeling
0.17 Applications
0.18 Telecom Applications of City Models

1 Characterization of Images
1.1 The Digital Image
1.2 The Image as a Raster Data Set
1.3 System Concepts
1.4 Displaying Images on a Monitor
1.5 Images as Raster Data
1.6 Operations on Binary Raster Images
1.7 Algebraic Operations on Images

2 Sensing
2.1 The Most Important Sensors: The Eye and the Camera
2.2 What is a Sensor Model?
2.3 Image Scanning
2.4 The Quality of Scanning
2.5 Non-Perspective Cameras
2.6 Heat Images or Thermal Images
2.7 Multispectral Images
2.8 Sensors to Image the Inside of Humans
2.9 Panoramic Imaging
2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images
2.11 Making Images with Sound
2.12 Passive Radiometry
2.13 Microscopes and Endoscopes Imaging
2.14 Objects-Scanners
2.15 Photometry
2.16 Data Garments
2.17 Sensors for Augmented Reality
2.18 Outlook

3 Raster-Vector-Raster Convergence
3.1 Drawing a straight line
3.2 Filling of Polygons
3.3 Thick lines
3.4 The Transition from Thick Lines to Skeletons

4 Morphology
4.1 What is Morphology
4.2 Dilation and Erosion
4.3 Opening and Closing
4.4 Morphological Filter
4.5 Shape Recognition by a Hit or Miss Operator
4.6 Some Additional Morphological Algorithms

5 Color
5.1 Gray Value Images
5.2 Color images
5.3 Tri-Stimulus Theory, Color Definitions, CIE-Model
5.4 Color Representation on Monitors and Films
5.5 The 3-Dimensional Models
5.6 CMY-Model
5.7 Using CMYK
5.8 HSI-Model
5.9 YIQ-Model
5.10 HSV and HLS Models
5.11 Image Processing with RGB versus HSI Color Models
5.12 Setting Colors
5.13 Encoding in Color
5.14 Negative Photography
5.15 Printing in Color
5.16 Ratio Processing of Color Images and Hyperspectral Images

6 Image Quality
6.1 Introduction
6.2 Definitions
6.3 Gray Value and Gray Value Resolutions
6.4 Geometric Resolution
6.5 Geometric Accuracy
6.6 Histograms as a Result of Point Processing or Pixel Processing

7 Filtering
7.1 Images in the Spatial Domain
7.2 Low-Pass Filtering
7.3 The Frequency Domain
7.4 High-Pass Filter - Sharpening Filters
7.5 The Derivative Filter
7.6 Filtering in the Spectral Domain / Frequency Domain
7.7 Improving Noisy Images
7.8 The Ideal and the Butterworth High-Pass Filter
7.9 Anti-Aliasing
7.9.1 What is Aliasing?
7.9.2 Aliasing by Cutting-off High Frequencies
7.9.3 Overcoming Aliasing with an Unweighted Area Approach
7.9.4 Overcoming Aliasing with a Weighted Area Approach

8 Texture
8.1 Description
8.2 A Statistical Description of Texture
8.3 Structural Methods of Describing Texture
8.4 Spectral Representation of Texture
8.5 Texture Applied to Visualisation
8.6 Bump Mapping
8.7 3D Texture
8.8 A Review of Texture Concepts by Example
8.9 Modeling Texture: Procedural Approach

9 Transformations
9.1 About Geometric Transformations
9.2 Problem of a Geometric Transformation
9.3 Analysis of a Geometric Transformation
9.4 Discussing the Rotation Matrix in two Dimensions
9.5 The Affine Transformation in 2 Dimensions
9.6 A General 2-Dimensional Transformation
9.7 Image Rectification and Resampling
9.8 Clipping
9.8.1 Half Space Codes
9.8.2 Trivial acceptance and rejection
9.8.3 Is the Line Vertical?
9.8.4 Computing the slope
9.8.5 Computing the Intersection A in the Window Boundary
9.8.6 The Result of the Cohen-Sutherland Algorithm
9.9 Homogeneous Coordinates
9.10 A Three-Dimensional Conformal Transformation
9.11 Three-Dimensional Affine Transformations
9.12 Projections
9.13 Vanishing Points in Perspective Projections
9.14 A Classification of Projections
9.15 The Central Projection
9.16 The Synthetic Camera
9.17 Stereopsis
9.18 Interpolation versus Transformation
9.19 Transforming a Representation
9.19.1 Presenting a Curve by Samples and an Interpolation Scheme
9.19.2 Parametric Representations of Curves
9.19.3 Introducing Piecewise Curves
9.19.4 Rearranging Entities of the Vector Function Q
9.19.5 Showing Examples: Three Methods of Defining Curves
9.19.6 Hermite's Approach
9.20 Bezier's Approach
9.21 Subdividing Curves and Using Spline Functions
9.22 Generalization to 3 Dimensions
9.23 Graz and Geometric Algorithms

10 Data Structures
10.1 Two-Dimensional Chain-Coding
10.2 Two-Dimensional Polygonal Representations
10.3 A Special Data Structure for 2-D Morphing
10.4 Basic Concepts of Data Structures
10.5 Quadtree
10.6 Data Structures for Images
10.7 Three-Dimensional Data
10.8 The Wire-Frame Structure
10.9 Operations on 3-D Bodies
10.10 Sweep-Representations
10.11 Boundary-Representations
10.12 A B-Rep Data Structure
10.13 Spatial Partitioning
10.14 Binary Space Partitioning BSP
10.15 Constructive Solid Geometry, CSG
10.16 Mixing Vectors and Raster Data
10.17 Summary

11 3-D Objects and Surfaces
11.1 Geometric and Radiometric 3-D Effects
11.2 Measuring the Surface of An Object (Shape from X)
11.3 Surface Modeling
11.4 Representing 3-D Objects
11.5 The z-Buffer
11.6 Ray-tracing
11.7 Other Methods of Providing Depth Perception

12 Interaction of Light and Objects
12.1 Illumination Models
12.2 Reflections from Polygon Facets
12.3 Shadows
12.4 Physically Inspired Illumination Models
12.5 Regressive Ray-Tracing
12.6 Radiosity

13 Stereopsis
13.1 Binocular Vision
13.2 Stereoscopic Vision
13.3 Stereo Imaging
13.4 Stereo-Visualization
13.5 Non-Optical Stereo
13.6 Interactive Stereo-Measurements
13.7 Automated Stereo-Measurements

14 Classification
14.1 Introduction
14.2 Object Properties
14.3 Features, Patterns, and a Feature Space
14.4 Principle of Decisions
14.5 Bayes Theorem
14.6 Supervised Classification
14.7 Real Life Example
14.8 Outlook

15 Resampling
15.1 The Problem in Examples of Resampling
15.2 A Two-Step Process
15.2.1 Manipulation of Coordinates
15.2.2 Gray Value Processing
15.3 Geometric Processing Step
15.4 Radiometric Computation Step
15.5 Special Case: Rotating an Image by Pixel Shifts

16 About Simulation in Virtual and Augmented Reality
16.1 Various Realisms
16.2 Why simulation?
16.3 Geometry, Texture, Illumination
16.4 Augmented Reality
16.5 Virtual Environments

17 Motion
17.1 Image Sequence Analysis
17.2 Motion Blur
17.3 Detecting Change
17.4 Optical Flow

18 Man-Machine-Interfacing
18.1 Visualization of Abstract Information
18.2 Immersive Man-Machine Interactions

19 Pipelines
19.1 The Concept of an Image Analysis System
19.2 Systems of Image Generation
19.3 Revisiting Image Analysis versus Computer Graphics

20 Image Representation
20.1 Definition of Terms
20.1.1 Transparency
20.1.2 Compression
20.1.3 Progressive Coding
20.1.4 Animation
20.1.5 Digital Watermarking
20.2 Common Image File Formats
20.2.1 BMP: Microsoft Windows Bitmap
20.2.2 GIF: Graphics Interchange Format
20.2.3 PICT: Picture File Format
20.2.4 PNG: Portable Network Graphics
20.2.5 RAS: Sun Raster File
20.2.6 EPS: Encapsulated PostScript
20.2.7 TIFF: Tag Interchange File Format
20.2.8 JPEG: Joint Photographic Expert Group
20.3 Video File Formats: MPEG
20.4 New Image File Formats: Scalable Vector Graphics - SVG

A Algorithms and Definitions

B Question Overview
B.1 Group 1
B.2 Group 2
B.3 Group 3


Chapter 0<br />

Introduction<br />

0.1 Using Cyber-Cities as an Introduction

We introduce the subject of "digital processing of visual information", also denoted as "digital image processing" and "computer graphics". We introduce the subject by means of one particular application, namely 3D computer modelling of our cities. This is part of the wider topic of the so-called "virtual habitat". "Modelling cities", what do we mean by that? The example in Slide 0.7 shows a traditional representation of a city, in this particular example the "Eisernes Tor" in Graz. In two dimensions we see the streetcar tracks, the Mariensäule, buildings and vegetation. This is the status quo of current urban 2-D computer graphics.

The new approach is to represent this in three dimensions, as shown in Slide 0.8. The two-dimensional map of the city is augmented to include the third dimension, thus the elevations, and in order to render, represent or visualise the city we add photographic texture to create as realistic a model of the city as possible. Once we have that we can stroll through the city, we can inspect the buildings, we can read the signs and derive from them what is inside the buildings.

The creation of the model for this city is a subject of "image processing". The rendering of the model is the subject of "computer graphics". These two belong together and constitute a field denoted as "digital processing of visual information".

The most sophisticated recent modelling of a city was achieved for a section of Philadelphia. This employed a software package called "Microstation" and was done by hand with great detail. In this case the detail includes vegetation, the virtual trees, water fountains and people. I am attempting here to illustrate the concepts of "computer graphics" and "image processing" by talking about Cyber-Cities, namely how to create them from sensor data and how to visualise them. And this is the subject of this introduction.

0.2 Introducing the Lecturer

Before we go into the material, permit me to introduce myself. I have roots both in Graz and in Boulder (Colorado, USA). Since 1992 my affiliation has been with the Technische Universität Graz, where I am a Professor of Computer Vision and Graphics. But since 1985 I have also been affiliated with a company in the United States called Vexcel Corporation. In both places, the Vexcel Corporation and the University, cyber-cities play a role in the daily work. Vexcel Corporation in the US operates in four technical fields:

1. It builds systems to process radar images

2. It deals with satellite receiving stations, to receive large quantities of images that are transmitted from satellites

3. It deals with close range photogrammetry for "as-built" documentation and

4. It deals with images from the air

Slide 0.19 is an example showing a remote sensing satellite ground receiving station installed in Hiroshima (Japan), carried on a truck to be moveable. Slide 0.20 shows a product of the Corporation, namely a software package to process certain radar images interferometrically. Towards the end of this class we will talk briefly about this interferometry. What you see in Slide 0.20 are interferometric "fringes" obtained from images, using the phase differences between the two images. The fringes indicate the elevation of the terrain, in this particular case Mt. Fuji in Japan.

Another software package models the terrain and renders realistic-looking images by superimposing the satellite images over the shape of the terrain with its mountains and valleys. Slide 0.22 shows another software package to convert aerial photography to so-called "ortho-photos", a concept we will explain later in this class. Then we have an application, a software package called Foto-G, which supports the modelling of existing plants, performing a task called "as-built documentation". You take images of a facility or plant, extract from the image geometry the location and dimensions of pipes and valves, and obtain in a "reverse engineering mode" so-called CAD (computer-aided design) drawings of the facility.

0.3 From images to geometric models

We proceed to a series of sub-topics to discuss the ideas of city modeling. I would like to convey an idea of what the essence of "digital processing of visual information" is. What we see in Slide 0.25 is, on the left, part of an aerial photograph of a new housing development, and on the right we see information extracted from the image on the left using a process called "stereoscopy", representing the small area that is marked in red on the right side. We are observing here a transition from images of an object to a model of that object.

Images such as the one in Slide 0.26 show so-called "human scale objects" like buildings, fences, trees, roads. But images may show our entire planet. There have been various projects in Graz to address the extraction of information from images, and there is a bundle of problems available as topics for a Diplomarbeit or a Dissertation, for example to address the optimum geometric scale and geometric resolution needed for a specific task at hand. If I want to model a building, what is the required optimum image resolution? We review in Slide 0.29 the downtown of Denver at 30 cm per pixel. Slide 0.30 is the same downtown at 1.20 m per pixel. Finally in Slide 0.31 we have 4 meters per pixel. Can we map the buildings, and what accuracy can we get in mapping them?

0.4 Early Experiences in Vienna

Our Institute at the Technical University in Graz got involved in city modelling in 1994, when we were invited by the Magistrat of Vienna to model a city block consisting of 29 buildings inside the block and another 25 buildings surrounding the block. The block is defined by 4 streets in the 7th district in Vienna. The work was performed by 2 students in two diploma theses, and the initial results were of course a LEGO-type representation of each building. The building itself can not be recognised, as seen in the example of a generic building. It can be recognised only if we apply the photographic texture. We can take this either from a photograph taken from street level or from aerial photography taken from an airplane. The entire city block was modelled, but some photographic texture was missing. In particular the photographic texture was missing in the courtyards, and so they are shown black or grey here. When this occurs, the representation is without photographic texture, and is instead in the form of a flat-shaded representation.

Slide 0.37 looks at the roof scape and we see that perhaps we should model the chimneys as shown here. However, the skylights were not modeled. What can we do with these data? We can walk or fly through the cities. We can assess changes, for example by removing a building and replacing it with a new one. We call this "virtual reality", but scientists often prefer the expression "virtual environment", since "virtual" and "reality" represent a contradiction in terms. This differs of course from photographic reality, which is more detailed and more realistic by showing great geometric detail, showing wires, dirt on the road, cars, the effect of weather. There is yet another type of reality, namely "physical reality", when we are out there in a city and we feel the wetness in our shoes, we feel the cold in the air, we hear the noise of birds, the screeching of cars. So we see various levels of reality: physical, photographic and virtual reality.

0.5 Geometric Detail

What geometric detail do we need when we model a city? Let us take the example of a roof. Slide 0.44 is a roof shape extracted for the Vienna example. We have not applied photographic texture to the roof, but instead some generic computer texture. We will talk later about texture, and I will try to explain different types of texture for use in rendering for computer graphics. If we apply this kind of generic texture, we lose all information about the specific characteristics of this roof. What we would like to have is the roof shown with chimneys. Maybe we need skylights as well for the fire guard, in order to direct people to an exit through the roof in the case of a catastrophe. There is a topic here for a Diplomarbeit or Dissertation theme to study the amount of geometric detail needed in the presence of photographic texture: the trade-off between photographic texture and geometric detail.

To illustrate this further, let us take a look at the same roof with its skylights and chimneys and now use photographic texture to illustrate what this roof looks like. If we take photographic texture, and if we have some chimneys, and if we render this roof from another perspective than that from which the photograph was taken, the chimneys will look very unnatural. So we need to do some work and create the geometric model of the chimneys. If we employ that model and now superimpose the photographic texture over it, we see that we have sunshine casting shadows, and certain areas of the roof are covered by pixels from the shadows left by the chimneys. If the sunshine is from another side, say in the morning, but the picture was taken in the afternoon, we have wrong shadows. So we need to fix this by eliminating the shadows in the texture. We introduce the shadow in a proper rendering by a computation. We also need to fill in those pixels that are covered by the perspective distortion of the chimneys, and use generic pixels of the roof to fill in the areas where no picture exists. Slide 0.50 is the final result: we have removed the shadow, we have filled in the pixels. We now have the best representation of that roof with its chimneys, and we can render it correctly in the morning and in the afternoon, with rain or with sunshine.

0.6 Automation<br />

All of this modeling of cities is expensive, bec<strong>aus</strong>e it is based on manual work. In order to<br />

reduce the cost of creating such models one needs to automate their creation. Automation is a<br />

large topic <strong>and</strong> is available <strong>for</strong> many Diplomarbeiten <strong>and</strong> many Dissertations. Let me illustrate<br />

automation <strong>for</strong> about our city-models in Graz. There already exist 2-dimensional descriptions<br />

so the task of automating here is to achieve the transition from two to three dimensions. Slide<br />

0.52 is a two-dimensional so-called geographic in<strong>for</strong>mation system (GIS) of a certain area around<br />

the Schlossberg in Graz. Lets take a look at this particular building in Slide 0.53. We have a<br />

total of five aerial photographs, 3 of them are shown of that particular building in Slide 0.54.


14 CHAPTER 0. INTRODUCTION<br />

The five photographs can be converted into so called edge images, a classical component of image<br />

processing. There are topics hidden here <strong>for</strong> more Diplomarbeiten <strong>and</strong> Dissertationen.<br />

We also convert an input GIS data into an output edge image. This edge image from the GISvectors<br />

can now be the basis <strong>for</strong> a match between these five edge images <strong>and</strong> the two dimensional<br />

GIS image. They will not fit, bec<strong>aus</strong>e those edges of the roof as shown here are elevated <strong>and</strong><br />

there<strong>for</strong>e perspectively distorted as the other polygon is the representation of the footprint of the<br />

building.<br />

Algorithm 1 Affine matching

1: Read in and organize one or more digital photos with their camera information
2: Compute an edge image for each of the photos
3: Read in and organize the polygons of each building footprint
4: Project the polygon into each photo's edge image
5: Vector-raster convert the polygon in each edge image, creating a polygon image
6: Compute a distance transform for each polygon image
7: repeat
8:   Compute the distance between each edge image and its polygon image using the distance transform
9:   Change the geometry of the polygon image
10: until distance no longer gets reduced

There is a process called "affine matching" which allows one to match the edge images computed from the aerial photos with the representation that originally was a vector data structure. Affine matching is a Graz innovation: its purpose is to match two different data structures, namely raster and vector, which in addition are geometrically different; the footprint of the house is in an orthographic projection, while the roofline of the house is in a central perspective projection. Affine matching overcomes these differences and finds the best possible matches between the data structures. The result in Slide 0.58 shows how the footprint was used to match the roofline of the building using this affine matching technique. The algorithm itself is described rather simply (see Algorithm 1). Now, the same idea of matching vectors with images is shown in the illustration of Slide 0.59, where we see in yellow the primary position of a geometric shape, typically the footprint, and in red the roofline. We need to match the roofline with the footprint. Slide 0.60 is another example of these matches, and Slide 0.61 is the graphic representation of the roofline.
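To make the distance-transform step of Algorithm 1 concrete, the following is a minimal sketch in Python using numpy and scipy; the function name, the array conventions and the use of scipy's Euclidean distance transform are assumptions for illustration, not part of the original algorithm.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def matching_cost(edge_image, polygon_image):
        # edge_image: binary array, nonzero where the photo's edge detector fired.
        # polygon_image: binary array, nonzero on the projected, raster-converted footprint polygon.
        # Distance transform: every pixel receives its distance to the nearest polygon pixel.
        distances = distance_transform_edt(polygon_image == 0)
        # Sum the distances at the photo's edge pixels; the smaller the sum,
        # the better the projected polygon fits the observed edges.
        return distances[edge_image > 0].sum()

The repeat loop of Algorithm 1 would then perturb the affine projection of the footprint, re-rasterize it, and keep the new geometry whenever this cost decreases.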

0.7 Modeling Denver

We talk about a method to model all the buildings of a city like Denver (Colorado, USA). There is an aerial photographic coverage of the entire city. Slide 0.63 is the downtown area of Denver. From overlapping aerial photographs we can automatically create a digital elevation model (DEM) by a process called stereo matching. A DEM is a representation of the z-elevation for each (x, y) at a regular grid mesh of points. So we have a set of regularly spaced (x, y) locations where we know the z-value of the terrain. We invite everybody to look into a Diplomarbeit or Dissertation topic of taking this kind of digital elevation model and creating from it what is called the "Bald Earth". One needs to create a filter which will take the elevation model and erase all the trees and all the buildings, so that the only thing that is left is the Bald Earth. What is being "erased" are towers, trees, buildings. That process needs an intelligent low-pass filter. We will talk about low-pass filters later in this class. Slide 0.67 is the result, a so-called Bald Earth DEM (das DEM der kahlen Erde).

The difference between the two DEMs, namely the Bald Earth DEM and the full DEM, is of course the elevation of the vertical objects that exist on top of the Bald Earth. These are the buildings, the cars, the vegetation. This is another topic one could study. Now we need to look at the difference DEM and automatically extract the footprints of buildings. We can do that by some morphological operations, where we close the gaps, straighten the edges of buildings, and then compute the contours of the buildings. Finally we obtain the buildings and place them on top of the Bald Earth.
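As a rough illustration of this chain (difference DEM, morphological clean-up, footprint extraction), here is a small Python sketch with numpy and scipy; the height threshold and the structuring-element size are invented for illustration and are not values from the Denver project.

    import numpy as np
    from scipy import ndimage

    def building_footprints(full_dem, bald_dem, min_height=3.0):
        # Difference DEM: everything that sticks up above the Bald Earth.
        diff = full_dem - bald_dem
        mask = diff > min_height                                    # assumed threshold in metres
        # Morphological closing fills small gaps and straightens ragged edges.
        mask = ndimage.binary_closing(mask, structure=np.ones((5, 5), dtype=bool))
        # Label connected regions; each label is one candidate building footprint.
        labels, count = ndimage.label(mask)
        return labels, count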

When we have done that, we can superimpose the photographic texture over the geometric shapes of the building "boxes" (the box models). We get a photorealistic model of all of Denver, all generated automatically from aerial photographs. There exist multiple views of the same area of Denver.

0.8 The Inside of Buildings

City models are not only a subject of the outside of buildings, but also of their inside. Slide 0.74 is the Nationalbibliothek in Vienna, in which there is a Representation Hall (Prunksaal). If one takes the architect's drawings of that building, one can create a wire-mesh representation as illustrated in Slide 0.75, consisting of arcs and nodes. We can render this without removal of the hidden surfaces and hidden lines to obtain this example.

We can go inside this structure, take pictures and use photographic texture to photo-realistically render the inside of the Prunksaal in a manner that a visitor to the Prunksaal will never see. We can not fly into the Prunksaal like a bird. We can also see the Prunksaal in the light that computer rendering permits us to create. We can even go back a hundred years and show the Prunksaal as it was a hundred years ago, before certain areas were converted into additional shelf space for books. There is a Diploma and Dissertation topic hidden in developments to produce images effectively and efficiently inside a building. An example is shown in Slide 0.80 and Slide 0.81 of the ceiling, imaging it efficiently in all its detail and colorful glory.

Yet another subject is how to model objects inside a room, like this statue of emperor Charles VI. He is shown in Slide 0.82 as a triangulated mesh created from a point cloud. We will talk a little bit about triangulated meshes later. Slide 0.82 is based on 20,000 points that are triangulated in a non-trivial process. Slide 0.83 is a photo-realistic rendering of the triangulated point cloud, with each triangle being superimposed by the photographic texture that was created from photographs. A good scientific topic for Diplomarbeiten or Dissertationen is the transition from point clouds to surfaces. A non-trivial problem exists when we look at the hand of the emperor. We need to make sure to connect points in the triangles that should topologically be connected. And we do not want the emperor to have hands like the feet of a duck.

0.9 As-Built Documentation: Modeling the Inside of Things in Industry

There exist not only cultural monuments, but also industrial plants. This goes back to the idea of "inverse" or "reverse engineering" to create drawings of a facility or a building, for example of a refinery. The refinery may have been built 30 or 40 years ago and the drawings are no longer available, since there was no CAD at that time. We take pictures of the inside of a building, using perhaps thousands of pictures. We re-establish relationships between the pictures. We need to know from where they were taken. One picture overlaps with another picture. Which pictures show the same objects and which do not? That is determined by developing the graph in Slide 0.89. Each node of the graph is a "postage stamp" of a picture, and the arcs between these nodes describe the relationships. If there is no arc, then there is no relationship. Any image can be called up on a monitor. Also pairs of images can be set up. We can point to a point in one image, and a process will look for the corresponding point in the other overlapping image or images. The three-dimensional location of the point we have pointed at in only one image will be shown in the three-dimensional rendering of the object. So again, "from images to objects" means in this case "reverse engineering" or "as-built documentation". Again there are plenty of opportunities for research and study in the area of automation of all these processes.

A classical topic is the use of two pictures of some industrial structure to find correspondences of the same object in both images without any knowledge about the camera or object. By eye we can point to the same feature in two images, but this is not trivial to do by machine if we have no geometric relationships established between the two images that would limit the search areas. One idea is to find many candidate features in both images and then determine by some logic which of those features might be identical. So we find one group of features in one image, and another group in the other image. Then we decide which points or objects belong together. The result is shown as highlighted circles.
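A minimal sketch of this candidate-and-decide idea in Python, using the OpenCV library (ORB features and brute-force descriptor matching stand in for the feature logic described above; OpenCV is not mentioned in the lecture and is only an assumed tool here):

    import cv2

    def candidate_correspondences(image_path_a, image_path_b, keep=50):
        img_a = cv2.imread(image_path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(image_path_b, cv2.IMREAD_GRAYSCALE)
        # Find candidate features independently in both images.
        orb = cv2.ORB_create()
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        # The "logic" deciding which candidates belong together: nearest
        # descriptors with a cross-check, sorted by match quality.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:keep]
        return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches]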

A similar situation is illustrated in Slide 0.95, however with test targets to calibrate a camera system for as-built documentation. We automatically extract all the test objects (circles) from the images. We can see a three-dimensional pattern of these calibration targets in Slide 0.96 and Slide 0.97.

Now the same approach can also be applied to the outside of buildings, as shown in Slide 0.98 with three photographs of a railroad station. The three images are input to an automatic algorithm to find edges; the edges get pruned and reduced so that we are only left with significant edges that represent windows, doors, awnings and the roofline of the building. This of course can also be converted into three dimensions. There is yet another research topic here, namely "automated mapping of geometric details of facades". Slide 0.100 and Slide 0.101 are the three-dimensional renderings of those edges that are found automatically in 3-D.

0.10 Modeling Rapidly

We not only want to create these data at a low cost, we also want to get them rapidly. Slide 0.103 is an example: a village has been imaged from a helicopter with a handheld camera; looking out to the horizon, we see an oblique, panoramic image. "Give us a model of that village by tomorrow" may be the task, particularly when it concerns catastrophes, disasters, military or anti-terror operations and so forth. The topic which is hidden here is that these photos were not taken with a well-controlled camera but accidentally and hastily from a helicopter and with an average amateur camera. The research topic here is the "use of uncalibrated cameras". A wire-mesh representation of the geometry can be created by a stereo process. We can then place the buildings on top of the surface, much like in the Denver example discussed earlier, and we can render it in a so-called flat-shaded representation. We can now look at it and navigate in the data set, but this is not visually as easy to interpret as it would be if we had photography superimposed, which is the case in Slide 0.109 and Slide 0.110. Now we can rehearse an action needed because of a catastrophe or because of a terrorist attack in one of those buildings. We can fly around, move around and so forth.

0.11 Vegetation

"Vegetation" is a big and important topic in this field. Vegetation is difficult to map, difficult to render and difficult to remove. Vegetation, as in the Graz example, may obscure facades. If we take pictures to map the buildings and to get the photographic texture, then these trees, pedestrians and cars are a nuisance. What can we do? We need to eliminate the vegetation, and this is an interesting research topic. The vegetation is eliminated with a lot of manual work. How can we automate that? There are ways and ideas to automate this kind of separation of objects that are at a different depth from the viewer using multiple images.

Using vegetation for rendering, like in the picture of the Schloßberg in Slide 0.115, is not trivial either. How do we model vegetation in this virtual habitat? The Schloßberg example is based on vegetation that is photographically collected and then pasted onto flat surfaces that are mounted on tree trunks. This is acceptable for a still image like Slide 0.117, but if we have some motion, then vegetation produces a very irritating effect, because the trees move as we walk by. Another way, of course, is to really have a three-dimensional rendering of a tree, but such trees typically are either very expensive or they look somewhat artificial, like the tree in the example of Slide 0.118. Vegetation rendering is thus also an important research topic.

0.12 Coping with Large Datasets

We have a need to cope with large data sets in the administration, rendering and visualization of city data. The example of modeling Vienna with its 220,000 buildings in real time illustrates the magnitude of the challenge. Even if one compresses the 220,000 individual buildings into 20,000 "blocks", thus on average combining 10 buildings into a single building block, one still has to cope with a time-consuming rendering effort that cannot be achieved in real time. A recent doctoral thesis by M. Kofler (1998) reported on algorithms to accelerate the rendering on an unaided computer by a factor of 100, simply by using an intelligent data structure.

If the geometric data are augmented by photographic texture, then the quantity of data gets even more voluminous. Just assume that one has 220,000 individual buildings consisting of 10 facades each, each facade representing roughly 10 m × 10 m, with photographic texture at a resolution of 5 cm × 5 cm per pixel. You are invited to compute the quantity of data that results from this consideration.
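A rough estimate, assuming one byte of gray value per texture pixel (the byte count is an assumption, not stated in the text): each facade covers 10 m / 0.05 m = 200 pixels on a side, hence 200 × 200 = 40,000 pixels; with 220,000 buildings × 10 facades this amounts to 2,200,000 × 40,000 = 8.8 × 10^10 pixels, roughly 88 gigabytes of texture, and about three times that amount for RGB color.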

Kofler's thesis proposed a clever data structure called the "LOD/R-tree". "LOD" stands for level of detail, and R-tree stands for rectangular tree. The author took the entire city of Vienna and defined a rectangle for each building. These are permitted to overlap. In addition, separate rectangles represent groups of buildings, and even the districts are represented by one rectangle each. Actually, the structure was generalized to 3D, thus we are not dealing with rectangles but with cubes.

Now as this is being augmented by photographic texture, one needs to select the appropriate data structure to be superimposed over the geometry. As one uses the data one defines the so-called "frustum" as the instantaneous cone of view. At the front of the viewing cone one has high resolution, whereas in the back one employs low resolution. The idea is to store the photographic texture and the geometry at various levels of detail and then call up those levels of detail that are relevant at a certain distance from the viewer. This area of research is still rapidly evolving, and "fast visualization" is therefore another subject of on-going research for Diplomarbeiten and Dissertationen. The actual fly-over of Vienna using the 20,000 building blocks in real time is now feasible on a regular personal computer, producing about 10 to 20 frames per second as opposed to 10 seconds per frame prior to the LOD/R-tree data structure. Slide 0.129 and Slide 0.130 are two views computed with the LOD/R-tree. The same LOD/R-tree data structure can also be used to fly over regular DEMs - recall that these are regular grids in (x, y) to which a z-value is attached at each grid intersection to represent terrain elevations. These meshes are then associated with photographic texture as shown in three sequential views. We generally call this "photorealistic rendering of outdoor environments".
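The frustum and level-of-detail idea can be sketched in a few lines of Python; the node layout, the distance thresholds and the function names below are invented for illustration and are not taken from Kofler's thesis.

    from dataclasses import dataclass

    @dataclass
    class LODNode:
        bounding_box: tuple      # (xmin, ymin, zmin, xmax, ymax, zmax)
        meshes: dict             # level of detail ("high"/"medium"/"low") -> geometry and texture handle
        children: list           # child LODNode objects of the tree

    def select_level(distance):
        # Assumed thresholds: near objects get fine detail, distant objects coarse detail.
        if distance < 100.0:
            return "high"
        if distance < 1000.0:
            return "medium"
        return "low"

    def collect_visible(node, frustum_contains, distance_to_viewer):
        # Walk the tree: skip whole subtrees whose bounding box lies outside the
        # viewing frustum, and pick a level of detail for the nodes that remain.
        if not frustum_contains(node.bounding_box):
            return []
        drawn = [node.meshes[select_level(distance_to_viewer(node.bounding_box))]]
        for child in node.children:
            drawn += collect_visible(child, frustum_contains, distance_to_viewer)
        return drawn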

Another view of a Digital Elevation Model (DEM), superimposed with a higher resolution aerial photograph, is shown in Slide 0.135 and Slide 0.136.



0.13 Non-optical sensing

Non-photographic, and therefore non-optical, sensors can also be used for city modeling. Recall that we model cities from sensor data, then render the cities using the models as input, and potentially augment those by photographic texture. Which non-optical sensors can we typically consider? A first example is radar imagery. We can use imagery taken with microwaves at wavelengths between 1 mm and 25 cm or so. That radiation penetrates fog, rain and clouds and is thus capable of "all-weather" operation. The terrain is illuminated actively, like with a flash light, supporting "day & night" operation. An antenna transmits microwave radiation, this gets reflected on the ground, and echoes come back to the antenna, which is now switched to receive. We will discuss radar imaging in a later section of this class. Let us take a look at two images. One image of Slide 0.138 has the illumination from the top, the other has the illumination from the bottom. Each image point or pixel covers 30 cm × 30 cm on the ground, representing a geometric resolution of 30 cm. Note that the illumination causes shadows to exist and how the shadows fall differently in the two images.

The radar images can be associated with a direct observation of the digital elevation of the terrain. Slide 0.139 is such an example, associated with the previous two images, of the area of the Sandia National Laboratories in Albuquerque (New Mexico, USA). About 6,000 people work at Sandia. The individual buildings are visible in this dataset, which is in itself rather noisy. But it becomes a very powerful dataset when it is combined with the actual images. We have found here a non-stereo way of directly mapping the shape of the Earth in three dimensions.

Another example with 30 cm × 30 cm pixels is a small village, the so-called MOUT site (Military Operations in Urban Terrain). Four looks from the four cardinal directions show shadows and other image phenomena that are difficult to understand and are the subject of later courses. We will not discuss those phenomena much further in this course. Note simply that we have four images of one and the same village, and those phenomena look very different in the four images. Just study those images in detail and consider how the shadows fall and how the roofs are being imaged, and note in particular one object, namely the church as marked. This church can be reconstructed using eleven measurements. There are about 47 measurements one can take from those four images, so that we have a set of redundant observations of these dimensions to describe the church. The model of the church is shown in Slide 0.141 and is compared to an actual photograph of the same church in Slide 0.142. This demonstrates that one can model a building not only from optical photography, but from various types of sensor data. We have seen radar images in combination with interferometry. There is ample opportunity to study "building reconstruction from radar images" in the form of Diploma and Doctoral theses.

Another sensor is the laser scanner. Slide 0.144 is an example of a laser scanner result from downtown Denver. How does a laser scanner operate? An airplane carries a laser device. It shoots a laser ray to the ground. The ray gets reflected, and the time it takes for the roundtrip is measured. Over an elevation the roundtrip time is shorter than over a depression. The direction into which the laser "pencil" looks changes rapidly from left to right to create a "scanline". Scanlines are added up by the forward motion of the plane. The scanlines accrue into an elevation map of the ground.
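The underlying relation is simple: with c the speed of light and Δt the measured roundtrip time of the pulse, the range R and the terrain height follow as below (a simplified nadir-looking sketch that ignores the scan angle and atmospheric effects):

```latex
R = \frac{c\,\Delta t}{2}, \qquad
z_{\text{terrain}} \approx z_{\text{aircraft}} - R
```

A roundtrip time of about 6.7 µs, for example, corresponds to a range of roughly 1 km.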

The position of the airplane itself is determined using a Global Positioning System (GPS) receiver carried on the airplane. That position might have a systematic error. But by employing a second, simultaneously observed GPS position on the ground one really observes the relative motion between the airplane GPS and the stationary GPS platform on the ground. This leads to a position error in the cm-range for the airplane and to a very small error, also in the cm-range, for the distance between the airplane and the ground. Laser measurements are a very hot topic in city modeling, and there are advantages as well as disadvantages vis-à-vis building models from images. To study this issue could be the subject of Diploma and Doctoral theses.

Note that as the airplane flies along, only a narrow strip of the ground gets mapped. In order to cover a large area one has to combine individual strips. Slide 0.147 illustrates how the strips need to be merged and how any discrepancies between the strips, particularly in their overlaps, need to be removed by some computational measure. In addition, one needs to know points on the ground with their true coordinates in order to remove any uncertainties that may remain from the airplane observations. So finally we have a matched, merged, cleaned-up data set, and we can now do the same thing that we did with the DEM from aerial photography, namely merge the elevation data obtained from the laser scanner with simultaneously collected video imagery taken from the same airplane: we obtain a laser-scan-plus-phototexture product.

0.14 The Role of the Internet

It is of increasing interest to look at a model of a city from remote locations. An example is so-called "armchair tourism", vacation planning and the like. Slide 0.152 is an example of work done for a regional Styrian tourism group. They contracted to have a mountain-biking trail advertised on the Internet using a VRML model of the terrain. Slide 0.153 shows a map near Bad Mitterndorf in Styria and a vertical view of a mountain-biking trail. Slide 0.154 is a perspective view of that mountain-bike trail superimposed onto a digital elevation model that is augmented with photographic texture obtained from a satellite. This is actually available today via the Internet. The challenge is to compress the data without significant loss of information and to offer that information via the Internet at attractive real-time rates. Again, Diploma and Doctoral thesis topics could address the Internet and how it can help to transport more information faster, in more detail, and of course in all three dimensions.

Another example of the same idea is an advertisement for the Grazer Congress on the Internet. The inside of the Grazer Congress was to be viewable for far-away potential organizers of conferences. They obtain a VRML view of the various interior spaces. Because of the need to compress those spaces, the data are geometrically very simple, but they carry the actual photographic texture that is available through photographs taken inside the Grazer Congress.

The Internet is a source of a great variety of image information. An interesting variation of the city models relates to the so-called "orthophoto", namely photographs taken from the air or from space that are geometrically corrected to take on the geometry of a map. The example of Slide 0.158 shows downtown Washington, D.C. with the U.S. Capitol (where the parliament resides). This particular web site is called "City Scenes".

0.15 Two Systems for Smart Imaging

We have already talked about imaging by regular cameras, by radar and other non-optical sensing, and by laser. Let's go a step further: specific smart sensing developed for city mapping. As part of a doctoral thesis in Graz, a system was developed to be carried on the roof of a car, with a number of cameras that allow one to reconstruct the facades of buildings in the city. Images are produced by driving with this system along those buildings. At the core of the system is a so-called linear detector array consisting of 6,000 CCD elements in color. These elements are combined with two or three optical systems, so that 3,000 elements are exposed through one lens and another 3,000 elements through another lens. By properly arranging the lenses and the CCDs one obtains a system whereby one lens collects a straight line of the facade looking forward and the other lens collects a straight line either looking backwards or looking perpendicularly at the building.

In Slide 0.163 we see the car with the camera rig driving by a few buildings in the Kopernikusgasse in Graz. Slide 0.164 shows two images, with various details from those images in Slide 0.165, in particular images collected of the Krones-Hauptschule. Simultaneously with the linear detector array collecting images line by line as the car moves forward (this is also called "push-broom imaging"), one can take images with a square-array camera. So we have the lower-resolution square-array camera with maybe 700 × 500 pixels, augmented by the linear detector array images with 3,000 pixels in one line and an unlimited number of lines as the car drives by. The opportunity exists here as well to perform work for Diploma or Doctoral theses to develop the advantages and disadvantages of square-array versus line-array cameras.

A look at an image from the linear array shows its poor geometry, because as the car drives there are lots of motions going on. In the particular doctoral thesis, the candidate developed software and algorithms to fix the geometric deformations in the images. Use is made of the fact that many of the features are rectilinear, for example edges of windows and details on the wall. This can help to automatically produce good images. If two images are produced, one can produce a stereo rendering of the cityscape. The human observer can obtain a 3-dimensional impression using stereo glasses, as we will discuss later.

That linear detector array, carried in a car as a rigid arrangement without any moving camera parts, was also used by the same author to create a panoramic camera. What is a panoramic camera? It is a camera that sweeps (rotates) across the area of interest with an open shutter, producing a very wide angle of view, in this case 360 degrees in the horizontal dimension and maybe 90 degrees in the vertical direction. We can use two such images for stereoscopy by taking photos from two different positions. The example shown in Slide 0.172 has two images taken of an office space that combine into a stereo pair, which can be used to recreate a complete digital 3-D model of the office space. These are the two raw images in which the "panoramic sweep" across 360° is presented as a flat image.

What is the geometry of such a panoramic camera? It is rather complex. We have a projection center O that is located on a rotation axis, which in turn defines a z-coordinate axis. The rotation axis passes through the center of an imaging lens. The CCD elements are arranged vertically at location z_CCD. An object point P_Obj is imaged onto the imaging surface at location z_CCD. The distance between O and the vertical line through the CCD is called the "focal distance" f_CCD. An image is created by rotating the entire arrangement around the z-axis and collecting vertical rows of pixels of the object space, and as we rotate we assemble many rows into a continuous image. One interesting topic about this type of imaging would be to find out what the most efficient and smartest ways would be to image indoor spaces (more potential topics for Diploma and Doctoral research). To conclude, Slide 0.175 is an image of an office space with a door, umbrella and a bookshelf that was created from the panoramic view in Slide 0.172 by geometrically "fixing" it to make it look like a photo from a conventional camera. The Congress Center in Graz has also been imaged with a panoramic sweep in Slide 0.176; a separate sweep was made to see how the ceiling looks when swept with a panoramic camera.
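As a rough sketch of this imaging geometry (a simplified cylindrical model chosen for illustration, not necessarily the exact calibration of the thesis camera): let an object point have coordinates (X, Y, Z) relative to the projection center O, with the z-axis along the rotation axis. Its image column u follows from the azimuth of the point, and its vertical image coordinate v from a perspective projection onto the rotating column of CCD elements:

```latex
\varphi = \operatorname{atan2}(Y, X), \qquad
u = \frac{\varphi}{2\pi}\,W, \qquad
v = f_{CCD}\,\frac{Z}{\sqrt{X^{2} + Y^{2}}}
```

where W is the number of columns collected in the full 360° sweep.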

0.16 International Center of Excellence for City Modeling

Who in the world is interested in research on city models? What are the "centers of excellence"? In any endeavour that is new and "hot" you always want to know who is doing what and where. In Europe there were several conferences on this subject in recent years. One of these was in Graz, one in Ascona in Switzerland, and one in Bonn. Ascona was organized by the ETH Zürich, Bonn by the University of Bonn, and the Graz meeting by our Institute.

The ETH Zürich is home to considerable work in this area, so much so that some university people even started a company, Cybercity AG. The work in Zürich addresses details of residential homes and led to the organisation of the two workshops in Ascona, for which books have been published by Birkhäuser-Verlag. One can see in the examples of Slides 0.182 through 0.186 that they find edges and use those to segment the roof into its parts. They use multiple images of the same building to verify that the segmentation is correct and to improve it if errors are found. The typical example from which they work is aerial photography at large scales (large scales are around 1:1,500; small scales around 1:20,000). Large models have been made, for example of Zürich as shown in Slide 0.186.



The most significant amount of work in this area of city modeling has probably been performed at the University of Bonn. The image in Slide 0.188 is an example of an area in Munich. The method used in Bonn is fairly complex and encompasses an entire range of procedures that would typically be found in many chapters of books on image processing or pattern recognition. One calls the diagram shown in Slide 0.189 an "image processing pipeline".

The data processed in Bonn are the same as those used in Zürich. There exists an international data set for research, so that various institutions have the ability to practice their skill and compare the results. We will later go through the individual work steps that are listed in the pipeline. One result from Bonn using the international images shows edges and, from the edges, finds matching points and corners in separate images of the same object. This indicates the top of a roof. The illustration in Slide 0.190 explains the principle of the work done in Bonn. Another Bonn approach is to first create corners and then topologically connect the corners so that roof segments come into existence. These roof segments are then merged into the largest possible areas that might represent roofs, as shown in this example.

Another approach is to start the modelling of a building not from the image itself, nor from its edges and corners, but to create point clouds by stereo measurements. This represents a dense digital elevation model, as we explained earlier in the Denver example. Digital elevations are illustrated here by encoding the elevation as brightness values, with dark being low and white being high. One can now try to fit planes to the elements of the digital elevation model. Slide 0.193 is an intermediate result, where it looks as if some roofs have been found. The digital elevation model here invites one to compute planes to define the roofs and the sides of buildings.

In North America the work on city modeling is typically sponsored by the Defense Advanced Research Projects Agency (DARPA). Their motivation is the military application, for example fighting urban wars, having robots move through cities, or facing terrorists. DARPA programs typically address university research labs. The most visible ones were the University of Massachusetts in Amherst, the University of Southern Colorado, Carnegie Mellon University and the Stanford Research Institute (SRI), a well-known research lab that originated at Stanford University and is separately organised as a foundation.

In the US there are other avenues towards modeling of cities which are not defense oriented. One is architecture. In Los Angeles there is the architecture department of the University of California at Los Angeles. They are building a model of the entire city of Los Angeles using students and manual work.

0.17 Applications

Let me come to a conclusion of city modeling. Why do people create such models? Developing an answer presents another opportunity to do application studies for Diploma and Doctoral theses. Let me illustrate some of those applications of city models. They certainly include city planning, architectural design and (car) navigation; there is engineering reconstruction of buildings that have been damaged and need to be repaired; then infotainment (entertainment); there is simulation and training for fire fighters and for disaster preparedness. Applications can also be found in telecom or in the military. A military issue is the guidance of robot soldiers and the targeting and guiding of weapons. In telecom we may need to transmit data from roof to roof as one form of broadband wireless access. In infotainment we might soon have 3-dimensional phone books.

0.18 Telecom Applications of City Models

A particular computer graphics and image processing issue which should be of specific interest to Telematics people is "the use of building models for telecom and how these building models are made". Slide 0.202 shows a three-dimensional model of downtown Montreal. The purpose of this model is to plan the placement of antennas on the roofs of tall buildings. Those antennas would serve as hubs to illuminate other buildings and to receive data from other buildings, in a system that is called a Local Multipoint Distribution System (LMDS). This is a broadband wireless access technology that competes with fibre optics in the ground and with satellite communication. We will see how the technologies shake out, but LMDS is evolving everywhere; it is very scalable, since one can build up the system sequentially hub by hub, and one can increase the performance sequentially as more and more users in buildings sign up.

Slide 0.204 is a model of a large section of Vancouver, where the buildings are modeled in support of an LMDS project. In order to decide where to place a hub one can use software that automatically selects the best location for a hub. For example, if we place an antenna on a high building we can then determine which buildings are illuminated from that antenna and which are not.

We use examples from a Canadian project to map more than 60 cities. One delivers to the telecom company so-called "raster data", but also so-called "vector data", and also non-graphic data, namely addresses. We will talk later about raster and vector data structures, and we will discuss how they are converted into one another.

The geometric accuracy of the shape of these buildings should be in the range of ±1 meter in x, y and z in order to be useful for the optimum placement of antennas.

How many buildings are in a square kilometer? In downtown Montreal this was about 1,000 buildings per km². Because the data need to be delivered quickly (telecom companies need them "now"), one cannot always have perfect images to extract buildings from. So one must be able to mix pre-existing photography and new aerial sources and work from what is there. For this reason one's procedures need to be robust with respect to the type of photography. The question often is: from what altitude was that photography taken, and therefore what is the scale of the photographs?

Some telecom companies want all buildings (commercial and residential), while others only need the commercial buildings. Most of the companies want all addresses. Multiple addresses must even be provided in the case of an apartment building. There is always a need to be quick and inexpensive. Companies expect that a hundred km² can be modeled per week, which means a hundred thousand buildings per week. One cannot achieve this by hand; one has to do it by machine.

One challenge might be that one is faced with aerial photography that is flown at too large a scale. The high-rise building in Slide 0.207 looks different in that view than in the other stereoscopic view in Slide 0.208. For a high-rise building we may not even see a certain side of the building in one photograph, yet we see that side in the other. Our procedure must cope with these dissimilarities. Slide 0.209 shows a set of polygons extracted from an image, and one can already see that some polygons are not visible in that particular photograph. Clearly those data were extracted from another photograph, as shown in Slide 0.210. The same situation is illustrated again in the second example of Slide 0.211. Finally we have a raster representation of the buildings in Slide 0.212. So we have an (x, y)-grid on the ground, and for each (x, y)-grid cell we have a z-elevation. The images shown before were the source of the building in the center of this particular raster representation.

But we also want a vector representation of the building footprints and of the details of the roofs, as in the example of downtown Montreal. These vectors are needed because addresses can be associated with polygons describing a building, whereas one has a harder time associating addresses with a raster representation. However, the signal-propagation computation needs raster data, as shown here.

The entire area of central Montreal has 400,000 buildings, as shown in Slide 0.217. Zooming in on the green segment permits one to see city blocks. Zooming in further produces individual buildings. A very complex building is the cathedral, which on an aerial photograph looks like Slide 0.220.



Let us summarize: the data sets used for this telecom wave-propagation modeling in the LMDS application consist first of all of vector data of the buildings (Slide 0.222), but also of the vegetation, because the vegetation may block the intervisibility of antennas; we also show the combination of both. Of course the same building data are needed in a raster format, and finally a combination of raster and vector data to include the trees. And we must not forget the addresses. Again, there may be one address per building, or multiple addresses for each building. The addresses are linked to the geometric data by address locators that are placed inside the polygons. As a result the addresses are associated with the polygons and thus with the buildings.

What do such telecom data sets go for in terms of price? A building may cost between $1 and $25. A square kilometer may go for $100 to $600. However, if there are 1,000 buildings per km², then obviously an individual building may cost less than one dollar. A metropolis such as Montreal may cover 4,000 km², but the interest is focused on 800 km². On average, of course, there are fewer than 1,000 buildings per km²; one might find more typically 200 or so buildings per km² over larger metropolitan regions.

...


[Slides 0.1 to 0.259: thumbnail reproductions of the lecture slides]


Chapter 1

Characterization of Images

1.1 The Digital Image

Images can be generated from at least two sources. The first is the creation of an image from the measurements taken by a sensor; we would call this a "natural image". In contrast, an image may also be generated by a computer describing an object or a situation that may or may not exist in the real world. Such images are "computer generated" (CGI, computer-generated imagery).

All digital images have a coordinate system associated with them. Slide 1.5 is an original and typical image with two dimensions and has a rectangular (Cartesian) coordinate system with axes x and y. Therefore a location in the image can be defined by its coordinates x and y. Properties of the image can now be associated with that location. In that sense the image is an algebraic function f(x, y). When we deal with digital images we discretize this continuous function and replace the continuous image by rows and columns of image elements or pixels. A pixel is typically a square or rectangular entity. More realistically, of course, the sensor that produced the image may have an instantaneous field of view that is not rectangular or square; it is oftentimes a circle. We present an image digitally as an arrangement of square pixels, although the machinery which creates the digital image may not produce square pixels.

Digital images are fairly simple arrangements of numbers that are associated with gray values, as illustrated in Slide 1.7. It shows four different gray values between 0 and 30, with 0 being white and 30 being black. A very simple type of image is the so-called "binary image" or binary mask. That is an image whose pixels have gray values of either 0 as white or 1 as black. Such a binary image may be obtained by thresholding a gray value image. We may have a threshold

Algorithm 2 Threshold image
1: create a binary output image with the same dimensions as the input image
2: for all pixels p of the input image do
3:   retrieve gray value v of pixel p from the image
4:   find the pixel p′ of the output image corresponding to p
5:   if v ≥ v_t then {compare gray value v with threshold v_t}
6:     set p′ to white
7:   else
8:     set p′ to black
9:   end if
10: end for




that declares all pixel values between 15 and 25 to be black (or 1), while all other gray values are set to white (or 0).
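As a minimal sketch of the thresholding idea (using NumPy, with an assumed 8-bit grayscale array; the band limits 15 and 25 follow the example above):

```python
import numpy as np

def threshold(image: np.ndarray, v_t: int) -> np.ndarray:
    """Binary mask as in Algorithm 2: 1 (white) where the gray value >= v_t."""
    return (image >= v_t).astype(np.uint8)

def band_threshold(image: np.ndarray, low: int = 15, high: int = 25) -> np.ndarray:
    """Variant from the text: 1 (black) for gray values in [low, high], 0 otherwise."""
    return ((image >= low) & (image <= high)).astype(np.uint8)

# Tiny example on a 2 x 3 "image"
img = np.array([[3, 17, 42],
                [25, 14, 20]], dtype=np.uint8)
print(threshold(img, 20))    # [[0 0 1] [1 0 1]]
print(band_threshold(img))   # [[0 1 0] [1 0 1]]
```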

An immediate question to ask is why this technology has been developed to take continuous gray values and convert them into digital pixel arrays. Let's discuss a few advantages. A very significant one is "quantification". In a digital environment we are not reduced to judging an image with our opinions; one has actual measurements. This can be illustrated by the example of a gray area embedded either in a dark or in a white background. Subjectively, our eye will tell us that the gray area is brighter when embedded in a dark environment and darker when embedded in a brighter environment. But in reality the two gray values are identical. The eye can objectively differentiate only a limited number of gray values. In a chaotic image we may be able to separate only 16 to 64 gray values. Relatively, though, namely in situations where we have two areas adjacent to one another, our eyes become very sensitive to the differences. But we cannot compare a gray tone in one corner of an image to a gray tone in another corner of the same image and be certain which one is brighter or darker. That can easily be accomplished in a digital environment.

There is a whole host of other advantages that will not be discussed at the same level of detail. First, a very important one is the automation of the visual sense. We can give the computer eyes and can process the visual information by machine, thereby taking the work of interpreting various visual inputs away from the human. Examples are quality control in a factory environment or in inaccessible, dangerous areas. Second is "flexibility". We have options that we do not have in an analog environment or with the natural visual sense when configuring very flexible sensing systems for very specific tasks. Third, the ability to store, retrieve, transfer and publish visual information at very little cost is another advantage if the information is digital. We all have experience with multimedia information on the web, and we all know that duplication and transfer are available at almost no cost. Fourth is the advantage of enhancing the visual sense of the human with an array of sensors, for example underwater imaging, sound imaging, x-ray imaging, microwave imaging. We will address sensors in more detail.

Fifth, digital processing of sensor data is essentially independent of the specifics of the sensor. We may have algorithms and software that are applicable to a variety of sensors; that is an advantage of a digital environment. Sixth is cost: digital images are inexpensive. This was already mentioned in the context of storage, transfer and publication. Expensive-looking color images can be rendered on a computer monitor, and yet we have no direct costs for those images. This is quite a difference from going to a photo lab and getting quality paper prints or diapositives.

The seventh advantage of digital images needs an example to explain. There exist numerous satellites orbiting the Earth and carrying Earth-observing sensors. One such system is from the US NASA and is called "Landsat". Slide ?? is an example of a Landsat image of the Ennstal with its rows and columns. What makes this image interesting is the color presentation of what the sensor in orbit "sees". The presentation is made from 7 separate spectral channels, not from simple red/green/blue color photography. Very typical of the flexibility and versatility of digital sensors and digital image processing is this ability to extend the visual capabilities of humans and to operate with many more images than a human can "see" or cope with.

Prüfungsfragen (exam questions):

• What is meant by a "threshold image", and for what purpose is it used?
• What advantages do digital images have over analog images?
• What is meant by a multi-channel or multispectral image, and what is it used for?



1.2 The Image as a Raster Data Set

A digital image is an array of pixels. It was already mentioned that in principle images are continuous functions f(x, y). A very simple "image model" states that f(x, y) is the product of two separate functions. One function is the illumination I, and the other function describes the properties of the object that is being illuminated, namely the reflection R. The reflection function may vary between 0 and 1, whereas the illumination function may vary between 0 and ∞.
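Written as a formula, this simple image model is:

```latex
f(x, y) = I(x, y)\,\cdot\, R(x, y), \qquad
0 < I(x, y) < \infty, \qquad
0 \le R(x, y) \le 1
```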

We now need to discretize this continuous function in order to end up with a digital image. We might create 800 by 1000 pixels, a very typical arrangement of pixels for the digital sensing environment. So we sample our continuous function f(x, y) into an N × M matrix with N rows and M columns. Typically our image dimensions are powers of two, 2^n; so our number of rows may be 64, 128, 512, 1024, etc. We not only discretize or sample the image (x, y)-locations; we also have to take the gray value at each location and discretize it. We do that at 2^b levels, with b typically being small, producing 2, 4, 8, 12 or 16 bits per pixel.

Definition 1 Amount of data in an image
To calculate the amount of data of an image, the geometric and the radiometric resolution of the image must be given. Say we have an image with N columns and M rows (geometric resolution) and a radiometric resolution of R bits per pixel. The amount of data b of the image is then calculated using the formula
b = N · M · R (bits).

A very simple question is shown in Slide 1.20. Suppose we create an image of an object and need to recognize a certain detail of the object from the image, say a speck of dirt on a piece of wood of 60 cm by 60 cm, and that speck can be as small as 0.08 mm². What size must the image have so that we can be sure to recognize all the dirt spots?
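One plausible back-of-the-envelope answer, assuming a pixel must be no larger than the smallest dirt speck (a stricter sampling argument would halve the pixel size again):

```latex
\text{pixel size} \le \sqrt{0.08\ \text{mm}^2} \approx 0.28\ \text{mm}
\;\Rightarrow\;
N = M \ge \frac{600\ \text{mm}}{0.28\ \text{mm}} \approx 2143
```

So roughly a 2150 × 2150 pixel image, or about 4.6 million pixels; at R = 8 bits per pixel the amount of data would be b ≈ 37 Mbit ≈ 4.6 MB.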

The resolution of an image is a widely discussed issue. When we talk about the geometric resolution of an image, we typically associate with it the size of a pixel on the object and the number of pixels in the image. When we talk about radiometric resolution, we describe the number of bits we have per pixel. Let us take the example of geometric resolution. We have in Slide 1.22 and Slide 1.23 a sequence of images of a rose that begins with a resolution of 1000 by 1000 pixels. From there we go down ultimately to 64 by 64 or even 32 by 32 pixels. Clearly, at 32 by 32 pixels we cannot recognize the rose any more.

Let us take a look at the radiometric resolution. We have in Slide 1.24 a black-and-white image of that rose at 8 bits per pixel. We reduce the number of bits, and in the extreme case we have one bit only, resulting in a binary image (either black or white). In the end we may have a hard time interpreting what we are looking at unless we already know what to expect. As we will see later, image processing of 8-bit black & white images is very common. A radiometric resolution with more bits per black & white pixel is needed, for example, in radiology; in medicine it is not uncommon to use 16 bits per pixel. With 8 bits we obviously get 256 gray values; with 12 bits we have 4096 gray values.

The color representation is more complex; we will talk about that extensively. In that case we do not have one 8-bit number per color pixel, but typically three numbers, one each for red, green and blue, thus 24 bits in total per color pixel.



Prüfungsfragen (exam questions):

• In image processing there is the idea of a so-called "image model". What is meant by it, and which formula is used to represent the image model?
• Describe the process of discretization in the transition from an analog to a digital image.
• What is meant by sampling, and which problems can occur? You are invited to use formulas in your answer.
• What do the terms "geometric" and "radiometric" resolution of an image mean? Try to illustrate your answer with a sketch.

1.3 System Concepts

We talk about image analysis, image processing or pattern recognition, and about computer graphics. What are their basic ideas? Image processing goes from the image to a model of an object, and from there to an understanding of the object. In [GW92] an image analysis system is described in the introductory chapter. One always begins with (a) sensors, thus with the image acquisition step, the creation of an image by a camera, a radar system, or by sound. Once the image is acquired it is, so to speak, "in the can". We can now (b) improve the image; this is called "pre-processing". Improving means fixing errors in the image and making the image look good for the eye if a human needs to inspect it. Preprocessing produces a new, improved image.

We now want to decompose the image into its primitives. We would like to (c) segment it into areas or fields, edges, lines, regions. From the pre-processed image as it has been seen visually, this creates a new image in which the original pixels are substituted by image regions, contours, edges. We denote this as "segmentation". After segmentation we need to create a (d) representation and a description of the image contents. And finally we want to use the image contents and (e) interpret their meaning. What do the objects look like? This phase is called recognition and interpretation. All of this is based on (f) knowledge about the problem domain, about the sensor, about the object, and about the application of the information. Once the object information has been interpreted, we can use the information extracted from the image for action. We may make a decision, e.g. to move a robot, to dispose of a defective part, or to place an urban waste dump, and so forth.

The typical ideas at the basis of computer graphics are slightly different. We start out from the computer, in which we store data about objects, and create an image as a basis for actions. So we have a database and an application model. We have a program to take the data from the database and to feed them into a graphics system for display. The object of computer graphics is the visual impression of a human user. However, what may seem like two different worlds, image processing versus computer graphics, is really largely one and the same world. Image processing creates from images of the real world a model of that real world. Computer graphics takes a model of objects and creates from it an image of those objects. So in terms of the real world, computer graphics and image processing are entirely complementary: image processing goes from the real world to a model of the real world, and computer graphics takes the objects of the real world and creates an image of them.

Where those two areas do diverge is in the non-real world. There is no sensing and no image analysis of a non-real world. What is computer graphics of a non-real world? Just look at cartoons and the movies. So there is a point of view that says that image processing and computer graphics belong together. A slightly different point of view is to say that image processing and computer graphics overlap in areas addressing the real world, and that there are areas that are separate.



Prüfungsfragen (exam questions):

• Sketch the process of image recognition as a chain of processes leading from the scene to the scene description.

1.4 Displaying Images on a Monitor

The customary situation today is a refresh buffer in which we store numbers that represent the image. We use a display controller that manages this buffer based on data and software residing on a host computer. And we have a video controller that takes what is in the buffer and presents this information on a computer monitor. In the buffer we might have a binary image at 1 bit per pixel, or we may have a color image at 24 bits per pixel. These are the typical arrangements for refresh buffers. The refresh buffer is typically larger than the information shown on the computer monitor: the monitor may display 800 by 1000 pixels, while the refresh buffer might hold 2000 by 2000 pixels. An image is displayed on the monitor using a cathode-ray tube or an LCD arrangement. On a cathode-ray tube the image is painted line by line onto the phosphor surface, going from top to bottom.

Then the ray gets turned off. So it moves from left to right with the beam on, right to left with the beam off, top down with the beam on, and bottom to top with the beam off. An image like the one in Slide "Wiedergabe bildhafter Information" is a line drawing. How could this be represented on a monitor? In the early days this was done by a vector scan, so the cathode ray was used to actually paint vectors on the monitor. Very expensive vector display monitors were built, perhaps as late as the mid-80's. Television monitors became very inexpensive while vector monitors remained expensive, and so a transition took place from vector monitors to raster monitors; today everything is represented in this raster. We could have a raster display present only the contours of an object, but we can also fill the object in the raster data format.

Not all representations on a monitor deal with the 3-dimensional world. Many representations in image form can be of an artificial world or of technical data, thus of non-image information. This is typically denoted by the concept of "visualization". Slide "Polyline" is a visualization of data in one dimension. Associated with this very simple idea are concepts such as polylines (here representing a bow tie), with a table of points 0 to 6 representing the polyline. There are also concepts such as "markers", which are symbols that represent particular values in a two-dimensional array. This was once a significant element of the computer graphics literature but no longer represents a big issue today.

Prüfungsfragen (exam questions):

• Describe the components needed in a computer for the output and for the interactive manipulation of a digital raster image.
• Using a sketch, describe how a digital raster image is built up on the luminous surface of a cathode-ray tube.
• What is the difference between the vector and the raster representation of a digital image? Illustrate your answer with a simple example and describe the advantages and disadvantages of both approaches.
• Using a sketch, explain the timing of the image build-up on a cathode-ray tube.



Algorithm 3 Simple raster image scaling by pixel replication
1: widthratio ⇐ newimagewidth / oldimagewidth
2: heightratio ⇐ newimageheight / oldimageheight
3: for all y such that 0 ≤ y < newimageheight do
4:   for all x such that 0 ≤ x < newimagewidth do
5:     newimage[x, y] ⇐ oldimage[floor(x / widthratio), floor(y / heightratio)] {floor keeps the source index inside the old image}
6:   end for
7: end for

Algorithm 4 Image resizing
1: widthratio ⇐ newgraphicwidth / oldgraphicwidth
2: heightratio ⇐ newgraphicheight / oldgraphicheight
3: for all points p in the graphic do
4:   p.x ⇐ p.x × widthratio
5:   p.y ⇐ p.y × heightratio
6: end for
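A minimal NumPy sketch of the pixel-replication scaling of Algorithm 3 (array indexing is assumed to be [row, column], i.e. [y, x]):

```python
import numpy as np

def scale_by_replication(old: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbour scaling: every new pixel copies the old pixel it falls onto."""
    old_h, old_w = old.shape[:2]
    # Map every new row/column index back to a source index (floor keeps it in range).
    src_y = np.arange(new_h) * old_h // new_h
    src_x = np.arange(new_w) * old_w // new_w
    return old[np.ix_(src_y, src_x)]

# Example: enlarge a 2 x 2 image to 4 x 4 -> each pixel becomes a 2 x 2 block
img = np.array([[0, 1],
                [2, 3]], dtype=np.uint8)
print(scale_by_replication(img, 4, 4))
```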

1.5 Images as Raster Data

We deal with a continuous world of objects, such as curves or areas, and we have to convert them into pixel arrays. Slide "Rasterkonvertiertes Objekt" shows the representation of a certain figure in a raster image. If we enlarge this, we obtain a larger figure with the exact same shape but a larger size of the object's elements: if we enlarge the image by a factor of two, what was one pixel before is now taking up four pixels. The same shape that we had before would look identical but smaller if we had smaller pixels. Suppose we make a transition to pixels that are only a quarter as large as before. If we now enlarge the image, starting from the smaller pixels, we get back the same shape we had before. However, if we reconvert from the vector to a raster format, then the original figure really will produce a different result at the higher resolution. So we need to understand what pixel size and geometric resolution do in the transition from a vector world to a raster world.

Prüfungsfragen (exam questions):

• What is meant by "raster conversion", and which problems can occur?

1.6 Operations on Binary Raster Images

There is an entire world of interesting mathematics dealing with binary images and operations on such binary images. These ideas have to do with neighborhoods, connectivity, edges, lines, and regions. This type of mathematics was developed in the 1970's. A very important contributor was Prof. Azriel Rosenfeld, who with Prof. Avi Kak wrote the original book on pattern recognition and image processing.

What is a neighborhood? Remember that a pixel at location (x, y) has a neighborhood of four pixels: those above, below, left and right of the pixel in the middle. We call this the N_4 neighborhood or the 4-neighbors. We can also have the diagonal neighbors N_D, namely the lower-left, lower-right, upper-right and upper-left neighbors. We add these N_D neighbors to the N_4 neighbors to obtain the N_8 neighbors. This is further illustrated as Prof. Rosenfeld did in 1970: Slide 1.56 presents the N_4-neighbors and the N_8-neighbors and associates them with the movements of the king in a chess game. We may also have the oblique neighbors N_v and the springer neighbors N_sp, which are analogous to the chess movements of the knight (Springer). Another diagonal neighborhood would derive from the game of checkers ("Dame").

We have neighborhoods of the first order, which are the neighbors of a pixel x. The neighbors of the neighbors are "neighbors of second order" with respect to the pixel at x. We could increase the order further by taking the neighbors of the neighbors of the neighbors.

Definition 2 Connectivity
Two pixels are connected if they are each other's neighbors and have the same connectivity property V.

4-connectivity:
1: if q is an N_4-neighbor of p then {Def. 5}
2:   pixels p and q are connected
3: else
4:   pixels p and q are not connected
5: end if

m-connectivity:
1: if (N_4(p) ∩ N_4(q)) = ∅ then {N_4(x): set of N_4-neighbors of x}
2:   if (q is an N_4-neighbor of p) or (q is an N_D-neighbor of p) then {Def. 5}
3:     pixels p and q are connected
4:   else
5:     pixels p and q are not connected
6:   end if
7: else
8:   pixels p and q are not connected
9: end if

Connectivity is defined for two pixels belonging together: they are "connected" if they are one another's neighbors. So we need a neighbor relationship to define connectivity. Depending on whether we use the 4-neighborhood, the 8-neighborhood, or a springer neighborhood, we can define various types of connectivity. We therefore say that two pixels p and q are connected if they are neighbors under a given neighborhood relationship.

This becomes quite interesting and useful once we start to do character recognition and need to figure out which pixels belong together and create certain shapes. We may have an example of three-by-three pixels of which four pixels are black and five pixels are white. Connections can now be established between those four black pixels under various connectivity rules. A connectivity with eight neighbors creates a more complex shape than a connectivity via the so-called m-neighbors, where the m-neighbors have been defined previously in Slide "Zusammenhaengende Pixel".
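To make the neighborhood definitions concrete, here is a small Python sketch (my own illustration, following the common textbook formulation of m-connectivity rather than reproducing the lecture's pseudocode verbatim) that lists the neighbor offsets and tests whether two foreground pixels of a binary image are m-connected:

```python
# Offsets of the basic neighborhoods of a pixel (row, column)
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
ND = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # the four diagonal neighbors
N8 = N4 + ND

def neighbors(p, offsets):
    """Set of pixel coordinates reached from p via the given offsets."""
    return {(p[0] + dr, p[1] + dc) for dr, dc in offsets}

def m_connected(image, p, q):
    """True if foreground pixels p and q are m-connected (image: dict (row, col) -> 0/1)."""
    if not (image[p] and image[q]):
        return False
    if q in neighbors(p, N4):
        return True
    common = neighbors(p, N4) & neighbors(q, N4)
    common_foreground = any(image.get(c, 0) for c in common)
    return q in neighbors(p, ND) and not common_foreground

# Tiny example: two diagonal foreground pixels whose shared 4-neighbors are background
img = {(0, 0): 1, (1, 1): 1, (0, 1): 0, (1, 0): 0}
print(m_connected(img, (0, 0), (1, 1)))   # True
```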

Definition 3 Distance
Given: points p(x, y) and q(s, t)
1: D_e(p, q) = √((x − s)² + (y − t)²) (Euclidean distance)
2: D_4(p, q) = |x − s| + |y − t| (city block distance)
3: D_8(p, q) = max(|x − s|, |y − t|) (chessboard distance)

The neighborhood and connectivity relationships can be used to establish distances between pixels, to define edges, lines and regions in images, to define contours of objects, to find a path between any two locations in an image, and perhaps to eliminate pixels as noise if they are not connected to any other pixels. A quick example of a distance addresses two pixels P and Q with a distance depending on the neighborhood relationships that we have defined. The Euclidean distance of course is simply obtained as the Pythagorean sum of the coordinate differences. But if we take a 4-neighborhood as the base for distance measurements, then we have a "city block distance": two blocks up, two blocks over. Or if we take the 8-neighborhood, then we have a "chessboard type of distance".
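A small Python sketch of the three distance measures (the helper functions are invented for this illustration):

import math

def euclidean(p, q):
    # De: straight-line distance
    (x, y), (s, t) = p, q
    return math.hypot(x - s, y - t)

def d4(p, q):
    # city block distance: number of 4-neighbor steps
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):
    # chessboard distance: number of 8-neighbor (king) steps
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

print(euclidean((0, 0), (2, 2)), d4((0, 0), (2, 2)), d8((0, 0), (2, 2)))
# 2.828..., 4, 2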

Let's define an "edge". This is important because there is a mathematical definition that is a little different from what one would define an edge to be in a casual way. An edge e in an image is a property of a pair of pixels which are neighbors of one another. It is thus a property of a pair of pixels, and one needs to consider two pixels to define it. It is important that the two pixels are neighbors under a neighborhood relationship. Any pair of pixels that are neighbors of one another represents an edge. The edge has a "direction" and a "strength". Clearly the strength of the edge is what is important to us. The edge is defined on an image B, and an edge image is obtained by taking the edge value at each pixel. We can apply a threshold to the weight and the direction of the edge. All edges with a weight beyond a certain value become 1 and all edges less than that value become 0. In that case we have converted our image into a binary edge image.
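As a sketch of this idea, the following Python/NumPy fragment treats each pair of horizontally and vertically neighboring pixels as an edge, uses the gray-value difference as the edge strength, and thresholds it into a binary edge image (the threshold value and the use of a simple difference are assumptions for this illustration):

import numpy as np

def binary_edge_image(img, threshold=50):
    # edge strength between 4-neighbors: absolute gray-value difference
    horiz = np.abs(np.diff(img.astype(int), axis=1))  # edges between (i, j) and (i, j+1)
    vert  = np.abs(np.diff(img.astype(int), axis=0))  # edges between (i, j) and (i+1, j)
    # an output pixel is set to 1 if any edge incident to it exceeds the threshold
    out = np.zeros(img.shape, dtype=np.uint8)
    out[:, :-1] |= (horiz >= threshold).astype(np.uint8)
    out[:, 1:]  |= (horiz >= threshold).astype(np.uint8)
    out[:-1, :] |= (vert >= threshold).astype(np.uint8)
    out[1:, :]  |= (vert >= threshold).astype(np.uint8)
    return out

img = np.array([[10, 10, 200],
                [10, 10, 200],
                [10, 10, 200]], dtype=np.uint8)
print(binary_edge_image(img))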

What is a line? A line is a finite sequence of edges ei, i = 1, . . . , n. It is a sequence of edges where the edges need to be one another's neighbors under a neighborhood relationship; the edges must be connected. A line has a length, and the length is the number of edges that form the line.

What is a region in the image? A region is a connected set R of pixels from an image B. A region has a contour. A contour is a line composed of edges, and those edges are defined by the property of two neighboring pixels P and Q: P must be part of the region R, Q must not be. This all sounds pretty intuitive, but it gets pretty complicated once one starts doing operations.
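To make the notion of a region concrete, here is a hedged Python sketch that labels 4-connected regions of foreground pixels with a simple flood fill (the function name and the choice of 4-connectivity are assumptions for this example; real systems would typically use a library routine for connected-component labeling):

import numpy as np
from collections import deque

def label_regions_4(img):
    # assign a label to every 4-connected region of pixels with value 1
    labels = np.zeros(img.shape, dtype=int)
    next_label = 0
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            if img[r, c] == 1 and labels[r, c] == 0:
                next_label += 1
                labels[r, c] = next_label
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if 0 <= nr < rows and 0 <= nc < cols \
                           and img[nr, nc] == 1 and labels[nr, nc] == 0:
                            labels[nr, nc] = next_label
                            queue.append((nr, nc))
    return labels

img = np.array([[1, 1, 0],
                [0, 0, 0],
                [0, 1, 1]])
print(label_regions_4(img))   # two separate regions under 4-connectivity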

Exam questions:

• When we have to specify a "distance" between two pixels in a digital image, several distance measures are available. List the distance measures you know. You are invited to use formulas in your answer.

• When considering pixels, there are "neighborhoods" of pixels. List all kinds of neighborhoods treated in the lecture and describe each of these neighborhoods with a sketch.

• What possibilities are there to define pixels in a digital raster image as connected? Explain each definition with a sketch.

• For what purposes does one define neighborhood and connectivity relations between pixels in digital raster images?

• State the definitions of the terms "edge", "line" and "region" in a digital raster image.

1.7 Algebraic Operations on Images

We can add two images, subtract, multiply or divide them, we can compare images by some logical operations, and we can look at one image using a second image as a mask. Suppose we have a source image, an operator and a destination image. Depending on the operator we obtain a resulting image. We take a particular source and destination image and make our operator the function "replace", the function "or", the function "xor" or the function "and" to obtain different results. We may also have mask operations; in this case we take an image A as a mask.



Algorithm 5 Logical mask operations

This is an example for a mask operation. Two images are linked with the Boolean OR operator, pixel by pixel.

1: for i = 0 to N − 1 do
2: for j = 0 to M − 1 do
3: B[i, j] = A[i, j] OR B[i, j]
4: end for
5: end for


For this we may have an input frame buffer A and an output frame buffer B. We may be able to process everything that is in these two buffers in 1/30 of a second, so we can perform an operation on all N × M pixels within 1/30 of a second, as illustrated in Slide "Operationen".
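A brief Python/NumPy sketch of such algebraic and mask operations on two images (the array contents and the 0/255 convention are assumptions for this illustration):

import numpy as np

A = np.array([[0, 255], [255, 0]], dtype=np.uint8)   # source image
B = np.array([[255, 255], [0, 0]], dtype=np.uint8)   # destination image / mask

added   = np.clip(A.astype(int) + B.astype(int), 0, 255).astype(np.uint8)  # arithmetic
or_img  = A | B                  # Boolean OR, pixel by pixel (as in Algorithm 5)
xor_img = A ^ B                  # "xor" operation
and_img = A & B                  # "and" operation
masked  = np.where(B > 0, A, 0)  # look at A only where the mask B is set

print(or_img)
print(masked)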

Exam questions:

• Given the two binary images in Figure ??: what result is obtained by a logical combination of the two images with an "xor" operation? Please use a sketch.

• Explain, using a few examples, what is meant by algebraic operations with two images.

• Explain the terms "mask", "filter" and "window" in the context of algebraic operations with two images. Illustrate your answer with a sketch.




Chapter 2

Sensing

2.1 The Most Important Sensors: The Eye and the Camera

The eye is the primary sensor of a human. It is certainly important to understand how it operates in order to understand how a computer can mimic the eye, and how certain new ideas in computer vision and also in computer graphics have developed by taking advantage of the specific properties of the eye.

In Slide 2.5 we show an eye and define an optical axis of the eye's lens. This optical axis intersects the retina at a place called the fovea, which is the area of highest geometric and radiometric resolution. The lens can change its focal length using muscles that pull on the lens and change its shape. As a result a human can focus on objects that are nearby, for example at a 25 cm distance, which is typical for reading a newspaper or book, or focus at infinity, looking out into the world.

The light that is projected from the world through the lens onto the retina gets converted into signals that are then fed by nerves into the brain. The place where the nerve leaves the eye is called the blind spot; that is a location where no image can be sensed. The optical system of the eye consists, apart from the lens, of the so-called vitreous humor 1; in front of the lens is a protective layer called the cornea 2, and between the lens and the cornea is a space filled with liquid called the anterior chamber. Therefore the optical system of the eye consists essentially of four optically active bodies: 1. the cornea, 2. the anterior chamber, 3. the lens and 4. the vitreous humor.

The conversion of light into nerve signals is accomplished by means of rods and cones that are embedded in the retina. The rods 3 are black-and-white sensors. The eye has about 75 million of them, and they are distributed widely over the retina.

If there is very little light, the rods will still be able to receive photons and convert them into recognizable nerve signals. To see color, we need the cones 4. We have only about 6 million of those, and they are not as evenly distributed as the rods are. They are concentrated at the fovea, so that the fovea has about 150,000 cones per square millimeter. That number is important to remember for a discussion of resolution later on.

We take a look at the camera as an analog of the eye. A camera may produce black-and-white or color images, or even false-color images. The slide shows a typical color image taken from an airplane of a set of buildings (see these images also in the previous Chapter 0).

1 in German: Glaskörper
2 in German: Hornhaut
3 in German: Stäbchen
4 in German: Zapfen


This color photograph is built from three component images: first the red channel, second the green channel, followed by the blue channel. We can combine those red/green/blue channels into a true color image.

In terms of technical imaging, a camera is capable of producing a single image or an entire image sequence. When we have multiple images or image sequences, we typically denote them as multi-images.

A first case may be in the form of multi-spectral images: if we break up the entire range of electromagnetic radiation from ultraviolet to infrared into individual bands and produce a separate image for each band, we call the sum of those images multi-spectral. If we have many of those bands we might call the images hyper-spectral. Typical hyper-spectral cameras produce 256 separate images simultaneously, not just red/green/blue!

A second case is to have the camera sit somewhere and make images over and over, always in the same color but observing changes in the scene. We call that multi-temporal. A third case is to observe a scene or an object from various positions. A satellite may fly over Graz and take images once as the satellite barely arrives over Graz, and a moment later as the satellite already leaves Graz. We call these multi-position images.

And then finally, a fourth case might have images taken not only by one sensor but by multiple sensors, not just by a regular optical camera, but perhaps also by radar or other sensors as we will discuss later. That approach will produce multi-sensor images.

This multiplicity of images presents a very interesting challenge in image processing. Particularly when we need to merge images that are taken at separate times, from separate positions and with different sensors, and when we want to automatically extract information about an object from many images of that object, we have a good challenge. Multiple digital images of a particular object location result in multiple pixels per given location.

Those pixels can be stacked on top of one another and then represent "a vector", with the actual gray values in each individual image being the "elements" of that vector. We can now apply the tools of vector algebra to these multi-image pixels. Such a vector may be called a feature vector, with the features being the color values of the pixel to which the vector belongs.
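A short Python/NumPy sketch of this stacking idea (the band images and their size are invented for this illustration):

import numpy as np

# three co-registered band images of the same scene (e.g. red, green, blue)
red   = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
green = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
blue  = np.random.randint(0, 256, (4, 4), dtype=np.uint8)

# stack the bands: every pixel location now carries a feature vector
multi = np.stack([red, green, blue], axis=-1)     # shape (4, 4, 3)

feature_vector = multi[2, 1]   # the 3-element vector of pixel (row 2, col 1)
print(feature_vector)

# vector algebra on the feature vectors, e.g. Euclidean distance between two pixels
d = np.linalg.norm(multi[2, 1].astype(float) - multi[0, 0].astype(float))
print(d)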

Exam questions:

• What is meant in sensing by single and multiple images? Give some examples of multiple images!

2.2 What is a Sensor Model?

So far we have only talked about one particular sensor, the camera, as an analog of the eye. In image processing we describe each sensor by a so-called sensor model. What does a sensor model do? It replaces the physical image and the process of its creation by a geometric description of the image's creation. We stay with the camera: this is designed to reconstruct the geometric ray passing through the perspective center of the camera, from there through the image plane and out into the world.

Slide 2.11 illustrates that in a camera's sensor model we have a perspective center O, we have an image plane P, we have image coordinates ξ and η, and we have an image of the perspective center, H, at the location that is obtained by dropping a line perpendicular from the perspective center onto the image plane. We find that our image coordinate system ξ, η and its origin M do not necessarily have to coincide with location H.

So what, then, is a sensor model? It is a set of techniques and of mathematical equations that allow us to take an image point P′ as shown in Slide 2.11 and define a geometric ray going from location O (the perspective center) through P′ into the world.


Definition 4 Perspective camera

Definition 10 (modeling a perspective camera, see Section 2.2):

Goal: to establish a relationship between the perspective center and the world. Tool: the perspective transformation (projects 3D points onto a plane); it is a non-linear transformation.

Description of Slide 2.12:

One works with two coordinate systems: 1. the image coordinate system (x, y, z), 2. the world coordinate system (X, Y, Z). A ray from a point w in 3D object space hits the image plane (x, y) at the image point c. The center of this image plane is the coordinate origin, from which an additional z-axis runs normal to the plane; it is identical to the optical axis of the camera lens. Where the ray intersects this z-axis lies the so-called lens center, which has the coordinates (0, 0, L); for a focused camera, L is comparable to the focal length. Condition:

Z > L,

i.e., all points of interest lie beyond the lens.

The vector w0 gives the position of the rotation axes in 3D space, from the origin of the world coordinate system to the center of the camera mount (gimbal).

The vector r defines where the image origin is, taking into account the rotation axes (X0, Y0, Z0) which can rotate the camera up and down; it runs from the center of the mount to the center of the image plane,

r = (r1, r2, r3)T.

Perspective transformation: relationship between (x, y) and (X, Y, Z).

Tool: similar triangles.

x : L = (−X) : (Z − L) = X : (L − Z)
y : L = (−Y) : (Z − L) = Y : (L − Z)

'−X' and '−Y' mean that the image points appear inverted (geometry).

x = L · X / (L − Z)
y = L · Y / (L − Z)

Homogeneous coordinates of a point given in Cartesian coordinates:

wkar = (X, Y, Z)T
whom = (kX, kY, kZ, k)T = (whom1, whom2, whom3, whom4)T, k = const. ≠ 0

Conversion back to Cartesian coordinates:

wkar = (whom1/whom4, whom2/whom4, whom3/whom4)T

Perspective transformation matrix:

P = ( 1  0  0     0 )
    ( 0  1  0     0 )
    ( 0  0  1     0 )
    ( 0  0  −1/L  1 )
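The following Python/NumPy sketch applies this perspective transformation matrix to a world point in homogeneous coordinates; the numeric values for L and for the world point are invented for the example:

import numpy as np

L = 0.05                      # lens constant, comparable to the focal length (assumed 50 mm)
P = np.array([[1, 0, 0,      0],
              [0, 1, 0,      0],
              [0, 0, 1,      0],
              [0, 0, -1.0/L, 1]])

w_cart = np.array([2.0, 1.0, 10.0])          # world point (X, Y, Z), with Z > L
w_hom  = np.append(w_cart, 1.0)              # homogeneous coordinates, k = 1

c_hom  = P @ w_hom                           # apply the perspective transformation
c_cart = c_hom[:3] / c_hom[3]                # back to Cartesian coordinates

x, y = c_cart[0], c_cart[1]                  # image coordinates
print(x, y)                                  # equals L*X/(L-Z), L*Y/(L-Z)
print(L * w_cart[0] / (L - w_cart[2]), L * w_cart[1] / (L - w_cart[2]))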


What the sensor model does not tell us is where the camera is and how this camera is oriented in space. So we do not, from the sensor model, find the world point P in three-dimensional space (X, Y, Z). We only take a camera and an image with its image point P′, and from that we can project a ray back into the world; but where that ray intersects the object point in the world requires something that goes beyond the sensor model. We need to know where the camera is in a world system and how it is oriented in 3D space.

In computer vision and in computer graphics we do not always deal with cameras that are carried in aircraft, looking vertically down and therefore having a horizontal image plane. Commonly, we have cameras that are in a factory environment or a similar situation, and they look horizontally or obliquely at something that is nearby.

Slide 2.12 illustrates the relationships between a perspective center and the world. We have an image plane which is defined by the image coordinate axes x and y (was ξ and η before), and a ray from the object space denoted as W will hit the image plane at location C. The center of the image plane is defined by the coordinate origin. Perpendicular to the image plane (which was defined by x and y) is the z-axis, which may in this case be identical to the optical axis of the lens. In this particular case we would not have a difference between the image point of the perspective center (was H before) and the origin of the coordinate system (was M before).

Now, in this robotics case we have two more vectors that define this particular camera. We have a vector r that defines where the image origin is with respect to the rotation axis that would rotate the camera. And we have a vector w0 that gives us the position of that particular rotation axis in 3D space. We still need to define for that particular camera its rotation axis that will rotate the camera up and down and that is oriented in a horizontal plane. We will talk about angles and positions of cameras later in the context of transformations; let us therefore not pursue this subject here. All we need to say at this point is that a sensor model relates to the sensor itself, and in robotics one might understand the sensor model to include some or all of the exterior paraphernalia that position and orient the camera in 3D space (the pose). In photogrammetry, exactly those latter data are part of the so-called exterior orientation of the camera.

Exam questions:

• Explain the term "sensor model"!

2.3 Image Scanning

Images on film need to be stored in a computer, but before they can be stored they need to be scanned. On film an image is captured in an emulsion. The emulsion contains chemistry, and as light falls onto the emulsion the material gets changed under the effect of photons. Those changes are very volatile; they need to be preserved by developing the film. The emulsion is protected from the environment by supercoats. The emulsion itself is applied to a film base, so the word "film" really applies to just the material on which the emulsion is fixed. There is a substrate that holds the emulsion onto the film base, and the film base often has a backing layer on its back. That is the structure of a black-and-white film.

With color film we have more than one emulsion: we have three of those layers on top of one another. We are dealing mostly with digital images, so analog film, photo labs and chemical film development are not of great interest to us. But we need to understand a few basic facts about film and the appearance of objects on film.

Slide 2.15 illustrates that appearance. The ordinate of the diagram records the density that results from the reflections of the world onto the emulsion. The density is 0 when there is no light and the film remains totally transparent (negative film!). And as more and more light falls onto the film, the film will get more exposed and the density will get higher until



the negative is totally black. Now this negative film is exposed by the light that is emitted from the object through a lens onto the film. Typically, the relationship between the density recorded on film and the light emitted from an object is a logarithmic one. As the logarithm of the emitted light increases along the abscissa, the density will typically increase linearly, and that is the most basic relationship between the light falling onto a camera and the density recorded on film, except in the very bright and the very dark areas. When there is almost no light falling on the film, the film will still show what is called a gross fog; film typically will never be completely unexposed, and there will always appear to be an effect as if a little bit of light had fallen onto the film. When a lot of light comes in, we lose the linear relationship again and we come to the "shoulder" of the gradation curve: as additional light comes in, the density of the negative does not increase any more.

Note that the slope of the linear region is denoted here by tan(α) and is called the gamma of the film. This distinguishes more or less sensitive films; the sensitivity has to do with the slope of that linear region. If a lot of light is needed to change the density, we call this a slow or "low sensitivity" film. If a small change in light causes a large change in density, then we call this a "very sensitive film", and the linear region is steeper. The density range that we can record on film is often between about 0 and 2. However, in some technical applications, in the graphic arts and in the printing industry, densities may go up to 3.6, and in medicine X-ray film densities go as high as 4.0. Again, we will talk more about density later, so keep in mind those numbers; note that they are dimensionless numbers. We will interpret them later.
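A small sketch of this characteristic (gradation) curve in Python; the gross-fog level, the gamma and the shoulder density used here are invented numbers purely for illustration:

import numpy as np

def film_density(exposure, gamma=0.8, fog=0.2, d_max=2.0, log_e0=0.0):
    # linear relation between density and log10(exposure) in the mid region,
    # clipped to the gross fog at the low end and to the shoulder at the high end
    d = gamma * (np.log10(exposure) - log_e0) + fog
    return np.clip(d, fog, d_max)

exposures = np.array([0.01, 0.1, 1.0, 10.0, 100.0, 1000.0])
print(film_density(exposures))   # rises linearly with log exposure, then saturates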

We need to convert film to digital images. This is based on one of three basic technologies.

First, so-called drum scanners have the transparent film mounted on a drum; inside the drum is a fixed light source. The drum rotates, the light source illuminates the film, and the light that comes through the film is collected by a lens and put on a photo detector (photo-multiplier 5). The detector sends electric signals which get A/D converted and produce, at rapid intervals, a series of numbers per rotation of the drum. We thus get a row of pixels per drum rotation. That approach has been very popular but has recently become obsolete because this device has sensitive and rapid mechanical movements; it is difficult to keep these systems calibrated.

Second, a much simpler way of scanning uses not a single dot but a whole array of dots, namely a CCD (charge-coupled device). We put it in a scan head and collect light that is, for example, coming from below the table, shining through the film; it gets collected through the lens and projected onto a series of detectors. There may be 6,000, 8,000, 10,000 or even 14,000 detectors, and these detectors collect the information about one single line of film. The detector charges are read out, and an A/D converter produces one number for each detector element. Again, the entire row of detectors creates in one instant a row of pixels. How do we get a continuous image? Of course by moving the scan head and collecting the charges built up row by row into an image (push-broom technology).

Third, we can have a square array detector field. The square CCD is mounted in the scan head, and the scan head "grabs" a square. How do we get a complete image that is much larger than a single square?

By stepping the camera, stopping it, staring at the object, collecting 1000 by 1000 pixels, reading them out, storing them in the computer, moving the scan head, stopping it again, taking the next square and so on. That technology is called step and stare. An array CCD is used to cover a large document by individual tiles, which are then assembled into a seamless image.

We find the push-broom single-path linear CCD array scanner typically in desktop, household, H.P.-, Microtec-, Mostec-, UMAX-type products. Those create an image in a single swath and are limited by the length of the CCD array. If we want to create an image larger than the length of a CCD array allows, then we need to assemble image segments.

5 in German: Sekundärelektronenvervielfacher


To do that, we create a swath by one movement of the scan head, then step the scan head over and repeat the swath in the new location. This is called the multiple-path linear CCD scanner. Another name for this is xy-stitching: the scan head moves in x and y, individual segments are collected, and they are then "stitched" together.

Exam questions:

• Sketch three different methods for scanning two-dimensional originals (e.g. photographs)!

2.4 The Quality of Scanning

People are interested in how geometrically accurate scanning is. The assessment is typically based on scanning a grid and comparing the grid intersections in the digital image with the known grid intersection coordinates of the film document. A second issue is the geometric resolution. We check that by imaging a test pattern.

Slide 2.22 shows what is called a US Air Force Resolution Target; each of its patterns has a very distinct width of the black lines and of the intervals between those black lines. As those black lines get narrower and closer together, we challenge the imaging system more and more.

If we take a look at an area that is beyond the resolution of the camera, then we will see that we cannot resolve the individual bars anymore. The limiting case that we can just resolve is used to describe the resolution capability of the imaging system. That may describe the performance of a scanner, but it may just as well describe the resolution of a digital camera.

These resolution targets come with tables that describe what each element resolves. For example, we have groups (they are called Group 1, 2, 3, 4, 5, 6), and within each group we find six elements.

In the example shown in Slide 2.24 one sees how the resolution is designated in line pairs per millimeter. However, we have pixels, and the pixels have a side length. How do we relate the line pairs per millimeter to the pixel diameter? We will discuss this later.
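As a rough preview of that relation: one line pair needs at least two pixels (one for the black bar, one for the white gap), so a resolution of R line pairs per millimeter corresponds to a pixel size of at most 1/(2R) millimeters. A tiny Python sketch of this rule of thumb (the sample values are invented):

def max_pixel_size_mm(line_pairs_per_mm):
    # one line pair = one black + one white bar -> at least two pixels per line pair
    return 1.0 / (2.0 * line_pairs_per_mm)

for lp in (10, 25, 50):
    print(lp, "lp/mm ->", max_pixel_size_mm(lp) * 1000, "micrometer pixels")
# 10 lp/mm -> 50 µm, 25 lp/mm -> 20 µm, 50 lp/mm -> 10 µm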

The next subject for evaluating a digital image and developing a scanner is the gray value performance. We have a Kodak gray wedge that has been scanned. On the bright end the density is 0, on the dark end the density is 3.4. We have individual steps of 0.1, and we can judge whether those steps get resolved both in the bright as well as in the dark area. On a monitor we cannot really see all thirty-four individual steps in intervals of 0.1 D from 0 to 3.4. We can use Photoshop and apply a function called histogram equalization, whatever that means, to each segment of this gray wedge. As a result we see that all the elements have been resolved in this particular case.

Exam questions:

• How is the geometric resolution of a film scanner specified, and with what procedure can it be determined?

2.5 Non-Perspective Cameras

Cameras per se have been described as having a lens projecting light onto film, after which we scan the film. We might also have, instead of film, a digital square-array CCD in the film plane to get a direct digital image; in that case we do not go through a film scanner. We can also have a camera on a tripod with a linear array, moving the linear array while the light is falling on the image plane and collecting the pixels in a sequential motion, much like a scanner would. There are also stranger cameras yet, which do not have a square array in the film plane and avoid a regular perspective lens. These are non-perspective cameras.

First let us look at a linear CCD array: a slide shows a CCD array with 6000 elements that are arranged side by side, each element having a surface of about 12 µm × 12 µm. These are read out very rapidly so that a new line can be exposed as the array moves forward. For example, an interesting arrangement with two lenses is shown in Slide 2.28: the two lenses expose one single array in the back. Half of the array looks in one direction, half in the other direction. By moving the whole scan head we can now assemble two digital strip images. Such a project to build this camera was completed as part of a PhD thesis in Graz. The student built a rig on top of his car, mounted this camera on it, and drove through the city collecting images of building facades, as we have seen earlier (see Chapter 0).

Exam questions:

• What advantages and disadvantages do non-perspective (optical, i.e. line, thermal or panoramic) cameras have compared to conventional (perspective) cameras?

2.6 Heat Images or Thermal Images

Heat images collect electromagnetic radiation in the middle to far infrared, not in the near infrared, so it is not next to visible light in the electromagnetic spectrum. That type of sensing can be accomplished by a mirror that looks at essentially one small instantaneous field of view (IFOV), in the form of a circular area on the ground, collects the radiation from there, projects it onto a detector, and makes sure that in a rapid sequence one can collect a series of those circular areas on the ground.

What we have here is an instantaneous angle of view α. We have the center of a cone that relates the sensor to the ground, and the axis of the cone is at an angle to the vertical called "A". In the old days, say in the sixties and seventies, oftentimes the recording was not digital but on film. Slide 2.35 illustrates the old-fashioned approach. We have infrared light coming from the ground. It is reflected off a mirror and goes through an optical system that focuses the light on the IR detector, which converts the incoming photons into an electric signal; that signal is then used to modulate the intensity of a light source which is projected via another lens and a mirror onto a piece of curved film.

Slide 2.36 was collected in 1971 or 1972 in Holland. These thermal images were taken from an airplane over regularly patterned Dutch landscapes. What we see here is the geometric distortion of fields, a result of the airplane wobbling in the air as the individual image lines are collected row by row. Each image line is appended to the previous one by the forward motion of the airplane. A closer look shows that there are areas that are bright and others that are dark. If it is a positive, then the bright things are warm and the dark things are cold.

Exam questions:

• What advantages and disadvantages do non-perspective (optical, i.e. line, thermal or panoramic) cameras have compared to conventional (perspective) cameras?


2.7 Multispectral Images

We already saw the concept of multi-spectral images. In principle they are, or in the past have been, collected by a rotating mirror that reflects the light from the ground onto a refraction prism. The refraction prism splits the white light coming from the ground into its color components. For each color we have a detector; this could be three for red/green/blue, or 226 for hyper-spectral systems. Detectors convert the incoming light into an electric signal, and the signals either get A/D converted or are directly recorded. In the old days recording was onto a magnetic tape unit; today we record everything on a digital disc with a so-called direct capture system (DCS).

When one does these measurements with sensors, one really is into a lot of open-air physics. One needs to understand what light is: electromagnetic radiation. When energy comes from the sun, a lot of it is in the visible range, somewhat less in the ultraviolet, somewhat less in the infrared. The sun's energy is augmented by energy that the Earth itself radiates as an active body; however, that energy is in the longer wavelengths. The visible light goes, of course, from blue via green to red. The infrared goes from the near infrared to the middle and far infrared. As the wavelengths get longer we leave the infrared and go into the short waves, microwaves, long microwaves and radio waves.

When we observe in a sensor the radiation that comes in from the surface of the Earth, we don't get an even distribution of the energy as the sun has sent it to the Earth; we get the reflection of the surface, and those reflections depend on what is on the ground, but also on what the atmosphere does to the radiation. A lot of that radiation gets blocked by the atmosphere, in particular from the infrared on. There are a few windows at 10 micrometers and at 14 micrometers wavelength where the energy gets blocked less and we can obtain infrared radiation. In the visible and near infrared the atmosphere lets this radiation through unless, of course, the atmosphere contains a lot of water in the form of clouds, rain or snow: that will block the visible light just as well as it blocks a lot of the longer wavelengths. The blocking of the light in the atmosphere is also a measure of the quality of the atmosphere.

In imaging the Earth's surface, the atmosphere is a "nuisance": it reduces the ability to observe the Earth's surface. However, the extent to which we have blockage by the atmosphere tells us something about pollution, moisture etc. So something that can be a nuisance to one application can also be useful in another.

We are really talking here about the ideas that are at the base of a field called remote sensing. A typical image of the Earth's surface is shown in Slide 2.42.

A color photograph has no problem with the atmosphere: we have the energy from the sun illuminating the ground, we have the red/green/blue colors of a film image, it can be scanned and put into the computer, and the computer can use the colors to assess what is on the ground.

Exam questions:

• Sketch the functional principle of a "multispectral scanning system" (multispectral scanner). You are invited to use a graphical sketch in your answer.

2.8 Sensors to Image the Inside of Humans

Sensors cover a very wide field, and imaging is a subset of sensing (think also of acoustics, temperature, salinity and things like that). Very well known are so-called CAT scans (computed axial tomography). The technique was invented in the early 1970s, and in 1979 the inventors, Hounsfield in England and Cormack in the USA, received the Nobel prize. It was the fastest recognition of a breakthrough ever. It revolutionized medicine because it allowed medical people to look at the inside of humans at a resolution and accuracy that was previously unavailable, without having to open up that human.

Slide 2.44 illustrates the idea of the CAT scan: it represents the transmissivity of little cubes of tissue inside the human. While a pixel is represented in two dimensions, here each gray value represents how much radiation was transmitted through a volume element. Therefore those gray values do not associate well with a 2D pixel but with a 3D voxel, or volume element. A typical CAT image that may appear in 2D really reflects in x and y a 1 mm × 1 mm base, but in z it may reflect a 4 mm depth.

Exam questions:

• Explain how a three-dimensional volume model of the inside of the human body is obtained with the help of computed tomography.

2.9 Panoramic Imaging

We talked in Chapter 0 about the increasingly popular panoramic images. They used to be produced by spy satellites, spy airplanes, and spacecraft imaging other planets or the Earth. The reason why we are interested in these images is that we would like to have a high geometric resolution and a very wide swath, thus a wide field of view at high resolution. Those two things are in conflict: a wide-angle lens gives an overview image, whereas a telephoto lens gives a very detailed image, but only of a small part of the object. How can we have both the very high resolution of a telephoto lens and still the coverage of a wide-angle lens? That is obtained by moving the telephoto lens, by sweeping it to produce a panoramic image (compare the material from Chapter 0).

Exam questions:

• What advantages and disadvantages do non-perspective (optical, i.e. line, thermal or panoramic) cameras have compared to conventional (perspective) cameras?

2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images

Slide 2.49 is an image of an area in Tirol's Ötztal, taken from a European Space Agency (ESA) satellite called ERS-1. There exists a second image, so that the two together permit us to see a three-dimensional model in stereo. We will talk about the topic of stereo later. How is a radar image produced? Let us assume we deal with an aircraft sensor.

Because we are making images with radiation that is far beyond the infrared, namely microwaves (wavelengths from one millimeter to two meters, but typically 3 to 5 cm), we cannot use glass lenses to focus that radiation. We need to use something else, namely antennas. A wave gets generated and travels through a waveguide to an antenna. The antenna transmits a small burst of energy, a pulse, which travels through the atmosphere to the ground. It illuminates an area on the ground with a footprint that is a function of the shape of the antenna. The ground reflects it back, the antenna goes into listening mode and "hears" the echo. The echo comes from the nearby objects first and from the far-away objects last. It gets amplified, A/D converted and sampled, and produces a row of pixels, in this case radar image pixels. The aircraft moves forward, and the same process repeats itself 3000 times per second. One


obtains a continuous image of the ground. Since we illuminate the ground by means of the sensor, we can image day and night. Since we use microwaves, we can image through clouds, snow and rain (all weather).
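A small Python sketch of the geometry behind one such echo: the delay of the echo gives the (slant) range to the reflecting object, because the pulse travels to the ground and back at the speed of light. The sample delays below are invented for the illustration:

C = 299_792_458.0          # speed of light in m/s

def slant_range(echo_delay_s):
    # the pulse travels out and back, hence the factor 1/2
    return C * echo_delay_s / 2.0

# echoes sampled at increasing delays map to increasingly distant ground objects,
# i.e. to successive pixels within one image row
for delay_us in (20.0, 25.0, 30.0):
    print(delay_us, "microseconds ->", round(slant_range(delay_us * 1e-6)), "m")
# 20 µs -> ~2998 m, 25 µs -> ~3747 m, 30 µs -> ~4497 m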

Exam questions:

• Describe the principle of image acquisition by radar! What advantages and disadvantages does this method offer?

• With the help of radar waves one can produce digital images from aircraft and satellites, from which a topographic model of the terrain (an elevation model) can be derived from a single image acquisition. Describe the physical effects of electromagnetic radiation that are exploited for this purpose!

2.11 Making Images with Sound

There is a very common technique to map the floor of the oceans; really only one technique is widely applicable right now: under-water SONAR. SONAR stands for Sound Navigation and Ranging. It is a complete analogy to radar, except that we do not use antennas and electromagnetic energy but vibrating membranes that emit sound pulses, and we need water for the sound to travel. The sound pulse travels through the water, hits the ground and gets reflected; the membrane goes into a listening mode for the echoes. These get processed and create one line of pixels. As the ship moves forward, line by line is accrued into a continuous image.

The medical ultrasound technology is similar to under-water imaging, but there are various different approaches. Some methods of sound imaging employ the Doppler effect. We will not discuss medical ultrasound in this class, but defer it to later classes in the "image processing track".

Exam questions:

• Name applications of sound waves in digital imaging!

2.12 Passive Radiometry

We mentioned earlier that the Earth is active: it transmits radio waves without being illuminated by the sun. This can be measured by passive radiometry. We have an antenna, not a lens; it "listens" to the ground. The antenna receives energy which comes from a small circular area on the ground. That radiation is collected by the antenna and processed, and creates an image point. By moving the antenna we can move that point on the ground and thereby have a scanning motion, producing an image scan that gets converted into a row of pixels. By moving the aircraft forward we accumulate rows of pixels for a continuous image. Passive radiometry is the basis of weather observations from space, where large areas are being observed, for example the arctic regions.

Exam questions:

• What is meant by "passive radiometry"?


2.13 Microscope and Endoscope Imaging

The most popular microscopes for digital imaging are so-called scanning electron microscopes (SEM) or X-ray microscopes. Endoscopes are optical devices using light to look "inside things". Most users are in medicine, to look into humans. There is a lens system and a light to illuminate the inside of the human. The lens collects the light and brings it back out; the image goes into the computer, and on the monitor the medical staff can see the inside of the human: the inside of the heart, the inside of arteries and so forth. The endoscopes often take the shape of thick "needles" that can be inserted into a human.

The same approach is used in mechanical engineering to inspect the inside of engines, for example to find out what happens while an explosion takes place inside a cylinder chamber of an engine.

Exam questions:

• Describe at least two methods or devices that are used in medicine to acquire digital raster images!

2.14 Object Scanners

The task is to model a 3D object: a head, a face, an engine, a chair. We would like to have a representation of that object in the computer. This could already be the result of a complete image processing system, of which the sensor is only a component, as is suggested in Slide 2.58. The sensor produces a 3D model from images of the entire object. This can be done in various ways. One way is to do it with a linear array camera that is moved over the object and obtains a strip image. This is set up properly in the scanner to produce a surface patch. Multiple patches must be assembled; this is done automatically by making various sweeps of the camera over the object as it gets rotated.

We can also have a person sit down on a rotating chair, and a device will optically (by means of an infrared laser) scan the head and produce a 3D replica of the head. Or the object is fixed and the IR laser is rotating.

The next technique would be to scan an object by projecting a light pattern onto its surface. That is called structured light (in German: Lichtschnitte). Finally we can scan an object by having something touch it with a touch-sensitive pointer, where the pointer is under a force that keeps its tip on the object as it moves; another approach is to have a pointer move along the surface and track the pointer with one of many tracking technologies (optical, magnetic, sound; see also Augmented Reality later on).

Exam questions:

• What purpose does a so-called "object scanner" serve? Name three different methods by which an object scanner can work without physical contact!

2.15 Photometry

We are now already at the borderline between sensors and image processing/image analysis. In photometry we do not only talk about sensors; however, photometry is a particular type of sensor arrangement. We image a 3D object with one camera taking multiple images, like in a time series, but each image is taken with a different illumination. So we may have four or ten lamps at different positions. We take one image with lamp 1, a second image with lamp 2, a third image with lamp 3, etc. We collect these multiple images, thereby producing a multi-illumination image dataset. The shape reconstruction is based on a model of the surface reflection properties: from those properties, the radiometry of the images yields the object shape.
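A hedged Python sketch of this idea in its simplest textbook form, often called photometric stereo: assuming a Lambertian surface and known lamp directions (both assumptions, not stated in the lecture), the pixel brightnesses under the different lamps determine the surface normal by a least-squares solve. The lamp directions and intensities below are invented:

import numpy as np

# unit direction vectors of three lamps (assumed known from the lab setup)
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.714],
              [0.0, 0.7, 0.714]])

# observed brightness of one pixel under each lamp (invented measurements)
I = np.array([0.9, 0.4, 0.6])

# Lambertian model: I = L @ (rho * n); solve for the scaled normal in the
# least-squares sense, then normalize
g, *_ = np.linalg.lstsq(L, I, rcond=None)
rho = np.linalg.norm(g)        # albedo (reflectance) of the surface point
n = g / rho                    # unit surface normal at that point
print(rho, n)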

2.16 Data Garments

Developments attributed to computer graphics concern so-called data garments. We need to sense not only properties of the objects of interest, but also where an observer is, because we may want to present him or her with a view of an object in the computer from specific places and directions. The computer must know in these cases where we are. This is achieved with data gloves and head-mounted displays (HMDs). For tracking the display's pose, we may have magnetic tracking devices to track where our head is and in which direction we are looking. There is also optical tracking, which is more accurate and less sensitive to electric noise, and there may be acoustic tracking of the position and attitude of the head using ultrasound.

Exam questions:

• What is meant by "data garments"? Name at least two devices in this category!

2.17 Sensors for Augmented Reality

In order to understand what the sensor needs to provide for augmented reality, we first need to understand what augmented reality is. Let us take a simple view: augmented reality is the simultaneous visual perception by a human being of the real environment, of course by looking at it, and of virtual objects and visual data superimposed onto that real environment although they are not physically present in it.

How do we do this? We provide the human with transparent glasses which double as computer monitors. So we use one monitor for the left eye and another monitor for the right eye. The monitors show a computer-generated image, but they are transparent (or better, semi-transparent): we not only see what is on the monitor, we also see the real world. The technology is called head-mounted displays, or HMDs. Now, for an HMD to make any sense, the computer needs to know where the eyes are and in what direction they are looking. Therefore we need to combine the HMD with a way of detecting the exterior orientation, or pose.

That is usually accomplished by means of magnetic positioning. Magnetic positioning, however, is fairly inaccurate and heavily affected by magnetic fields that might exist in a facility with computers. Therefore we tend to augment magnetic positioning with optical positioning, as suggested in Slide 2.63. A camera looking at the world is mounted rigidly with the HMD. Grabbing an image, one derives from the image where the camera is and in which direction it is pointed, and one thereby also knows where the eyes are and in which direction they are looking. Now we have the basis for the computer to feed into the glasses the proper object in the proper position and attitude, so that the objects are where they should be. As we review augmented reality, we immediately see the option of viewing the real world via the cameras and feeding the eyes not with the direct view of reality, but indirectly with the camera's views. This reduces the calibration effort in optical tracking.

Exam questions:

• Explain the operating principle of two tracking methods frequently used in augmented reality, and discuss their advantages and disadvantages!

Answer:

Tracking   Advantages      Disadvantages
magnetic   robust, fast    short range, inaccurate
optical    accurate        demands on the environment, elaborate setup

2.18 Outlook

The topic of imaging sensors is wide, and naturally we have to skip a number of items. However, some of these topics will be visited in other classes for those interested in image processing or computer graphics, and they also appear in other courses of our school. Two examples may illustrate this. The first is interferometry, a sensing technology combined with a processing technology that allows one to make very accurate reconstructions of 3D shapes by taking two images and measuring the phase of the radiation that gave rise to each pixel. We will deal with this off and on throughout "image processing".

Second, there is the large area of medical imaging, with a dedicated course. This is a rapidly growing area where today there are ultrafast CAT scanners producing thousands of images of a patient in a very short time. It becomes a real challenge for the doctor to take advantage of these images and reconstruct the objects of which those images were taken. This very clearly needs a sophisticated level of image processing and computer graphics to help human analysts understand what is in the images and to reconstruct the relevant objects in 3D. A clear separation of the field into Image Processing/Computer Vision and Computer Graphics/Visualization is not really useful or feasible.




Chapter 3

Raster-Vector-Raster Convergence

Algorithm 7 Digital differential analyzer

1: dy = y2 − y1
2: dx = x2 − x1
3: m = dy/dx
4: y = y1
5: for x = x1 to x2 do
6: draw(x, round(y))
7: y = y + m {step y by the slope m}
8: end for

3.1 Drawing a straight line

We introduce the well-known Bresenham algorithm from 1965. The task is to draw a straight line on a computer monitor, i.e. to replace the vector representation of a straight line running from a beginning point to an end point by a raster representation in the form of pixels. As we zoom in on a straight line shown on a computer monitor, we notice that we are really looking at the irregular edge of an area that represents the straight line: the closer we look, the less straight that edge appears. Conceptually, we need to find those pixels in a raster representation that will represent the straight line, as shown in Slide 3.4. The simplest method of assigning pixels to the straight line is the so-called DDA algorithm (Digital Differential Analyzer). Conceptually we intersect the straight line with the columns that pass through the centers of the pixels. The intersection coordinates are (x_i, y_i), and at the next column of pixels they are (x_i + 1, y_i + m). The DDA algorithm (see Algorithm 7) finds the nearest pixel simply by rounding the y-coordinate. Slide ?? illustrates the operation of the DDA algorithm graphically; Slide 3.6 is a conventional procedure doing what was just described. The straight line's beginning point is (x_0, y_0) and its end point is (x_1, y_1); for simplicity's sake we assume that x is an integer value. We then define auxiliary values dx, dy, y and m as real numbers and loop column by column over the pixels, rounding to find those pixels that will represent the straight line.
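As a minimal, illustrative Python sketch of Algorithm 7, restricted to lines with a slope between 0 and 1 (the function name dda_line and the returned pixel list are our own choices, not part of the lecture):

def dda_line(x1, y1, x2, y2):
    """Rasterize a line from (x1, y1) to (x2, y2); integer endpoints, 0 <= slope <= 1 assumed."""
    dy = y2 - y1
    dx = x2 - x1
    m = dy / dx              # slope; dx != 0 is assumed here
    y = float(y1)
    pixels = []
    for x in range(x1, x2 + 1):
        pixels.append((x, round(y)))   # "draw" the pixel nearest to the line in this column
        y += m                         # step y by the slope m
    return pixels

print(dda_line(0, 0, 5, 3))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]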

The DDA algorithm is slow because it uses rounding operations. In 1965 Bresenham proposed an algorithm that was exceedingly fast and outperformed the DDA algorithm by far, and in 1967 Pitteway proposed the Midpoint Line Algorithm (see Algorithm ??). These algorithms avoid the rounding operations and operate with decision variables only. For a long time, the vector-to-raster conversion implemented by Bresenham and Pitteway was only applicable to straight lines; it was as late as 1985 that this idea of very fast vector-to-raster conversion was extended to circles and ellipses. In this class we will not go beyond straight lines.

The next six illustrations address the Bresenham algorithm. We begin by defining the sequence of pixels visited by the algorithm as East (E) and North-East (NE) of the previous pixel, and we introduce an auxiliary position M halfway between the NE and the E pixel. The actual intersection of the straight line with the column of pixels to be visited is denoted by Q. Essentially Bresenham says: given that we know the previous pixel, we must decide whether to assign to the straight line the pixel NE or the pixel E. The approach as described applies to straight lines with directions between 0 and 45 degrees; for directions between 45 and 90 degrees and so forth the same ideas apply with minimal modifications. Slide 3.10 and Slide 3.11 describe the procedure used for the Midpoint Line Algorithm with the beginning point (x_0, y_0) and end point (x_1, y_1); it returns the set of raster pixels that describe the straight line. Again we have dx and dy, increments for E and NE, an auxiliary variable b, and the variables x and y. The algorithm itself is largely self-explanatory, so we do not need much text to explain it; the reader is invited to work through it.

The next two slides, Slide 3.12 and Slide 3.13, explain the basic idea behind the Midpoint Line Algorithm. We introduce an auxiliary point M with the coordinates (x_p + 1, y_p + 1/2). The implicit equation of a straight line is

a·x_m + b·y_m + c = 0;

a point that is not on the straight line produces a value that is either greater or less than zero, and the sign tells us on which side of the line the point lies. We can also write the equation of the straight line in explicit form as

y = (dy/dx)·x + B,

with intercept B. Rearranging as shown in Slide 3.13 gives the implicit form with a = dy, b = -dx and c = B·dx. Evaluating it at the midpoint M yields a decision variable d that is greater than, equal to, or less than zero:

d = dy·(x_p + 1) - dx·(y_p + 1/2) + c.

If d is greater than zero, the pixel of interest is NE; otherwise the pixel of interest is E. If E is selected as the next pixel, we compute a new value d_new by inserting the coordinates of the next midpoint, (x_p + 2, y_p + 1/2), into the equation; this is nothing other than the old value of d plus dy. If instead we select NE, the next midpoint has the coordinates (x_p + 2, y_p + 3/2), and d_new is the old value of d plus dy - dx. For the very first midpoint the equation evaluates to a + b/2 = dy - dx/2, and since we do not want to divide anything by two we simply multiply everything by a factor of 2 and start with

2d = 2·dy - dx.

Bresenham's trick was thus to avoid multiplications and divisions and simply to decide whether a value is greater or less than zero, adding one increment in the first case and another in the second, working along the straight line from pixel to pixel. This made for a very fast, creative algorithm.
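Algorithm ?? is referenced but not reproduced in this text, so the following Python sketch illustrates the decision-variable idea described above for integer endpoints and a slope between 0 and 1 (names such as midpoint_line, incr_e and incr_ne are ours, not the lecture's):

def midpoint_line(x0, y0, x1, y1):
    """Midpoint line rasterization; integer endpoints, 0 <= slope <= 1 assumed."""
    dx = x1 - x0
    dy = y1 - y0
    d = 2 * dy - dx          # initial decision variable, scaled by 2 to stay integer
    incr_e = 2 * dy          # added to d when the E pixel is chosen
    incr_ne = 2 * (dy - dx)  # added to d when the NE pixel is chosen
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        if d <= 0:           # choose E: stay in the same row
            d += incr_e
        else:                # choose NE: step up one row
            d += incr_ne
            y += 1
        x += 1
        pixels.append((x, y))
    return pixels

print(midpoint_line(0, 0, 8, 3))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 3), (8, 3)]

Only additions and a sign test are performed inside the loop, which is exactly the speed advantage over the rounding of the DDA approach.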

There is a problem, however. A horizontal line consists of pixels that are spaced one pixel diameter apart; in Slide 3.15, line a would appear as a dark line. If we incline that line by 45 degrees, the pixels assigned to it are spaced one pixel diameter times the square root of 2 apart. Across the entire length of the straight line we therefore have fewer pixels, and the same line appears less dark. We will address this and related subjects later in Section 3.3.

Exam questions:

• Describe in words the essential improvement of the Bresenham algorithm over the DDA algorithm.

• In Figure B.9, mark the pixels that the Bresenham algorithm generates when the two marked pixels are connected by an (approximated) straight line. Also give the computation steps that lead to the pixels you selected.

• The square Q in normalized screen coordinates from Example B.2 is transformed into a rectangle R of size 10 × 8 in screen coordinates. Draw the connection of the two points p′1 and p′2 in Figure B.20 and determine graphically the pixels that the Bresenham algorithm would choose to approximate the connection discretely.

3.2 Filling of Polygons

Another issue when converting from the vector world to the raster world is dealing with areas whose boundaries are given as polygons. Such polygons may be convex or concave, they may intersect themselves, and they may have islands. It very quickly becomes a non-trivial problem to take a polygon from the vector world, create a raster representation from it, and fill the area inside the polygon. Slide 3.17 illustrates the issue: instead of finding pixels along the polygon, we now have the task of finding the pixels inside the polygon represented by a sequence of vectors. We define a scan line as a row of pixels going from left to right. The illustrations in Slide 3.17 show that the first pixel is assigned where the scan line intersects the first vector; every time we find along the scan line an intersection with a vector of the polygon, we switch from assigning pixels to not assigning pixels, and vice versa.
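As a minimal Python sketch of this scan-line idea, assuming a simple, non-self-intersecting polygon given by its vertices (the function name scanline_fill and the choice of sampling each row at its center y + 0.5 to sidestep vertex special cases are ours):

def scanline_fill(vertices):
    """Return the interior pixels of a simple polygon given as a list of (x, y) vertices."""
    ys = [y for _, y in vertices]
    filled = []
    for y in range(min(ys), max(ys) + 1):
        yc = y + 0.5                      # sample the scan line through the pixel row center
        xs = []
        n = len(vertices)
        for i in range(n):
            (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
            if (y0 <= yc) != (y1 <= yc):  # this edge crosses the scan line
                xs.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        xs.sort()
        for left, right in zip(xs[0::2], xs[1::2]):   # fill between pairs of intersections
            filled.extend((x, y) for x in range(round(left), round(right)))
    return filled

# a small axis-parallel rectangle as a test polygon
print(scanline_fill([(0, 0), (4, 0), (4, 3), (0, 3)]))

The parity rule (toggling between "inside" and "outside" at each intersection) is exactly the switching behaviour described above.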

A second approach, shown in Slide 3.18, is to use the Bresenham algorithm to rasterize all the vectors defining the polygon and then to go along the scan lines, take the pairs of pixels produced by the Bresenham algorithm, and fill the space in between with additional pixels. As the example shows, that approach may produce pixels whose centers lie outside of the actual polygon. There is yet another variant, which takes the boundary pixels lying on the inside of the polygon; that is different from the previous application of the Bresenham algorithm.

Slide 3.21 illustrates for the first time a concept that we will address in a moment: if we have a very narrow polygon, such as a thin triangle, we may get a very irregular pattern of pixels, and looking at this kind of pattern we notice a severe case of aliasing. Aliasing is a topic of interest in computer graphics.

Exam questions:

• A polygon is given by the list of its vertices. How can the polygon be displayed filled (i.e. together with its interior) on a raster screen? Which problems occur if the polygon has very "pointed" corners (i.e. interior angles close to zero)?



3.3 Thick lines

A separate subject is the various ways to plot thick lines: not simply applying a Bresenham algorithm to a mathematically infinitely thin line, but drawing a fat line. One way is to apply the Bresenham algorithm and then replicate the pixels along the columns, saying that, having found a pixel according to Bresenham, we now make, say, five pixels out of it. If we do that, the thickness of the line becomes a function of the slope of the straight line. A second way of plotting a thick line is to take the Bresenham pixels and apply at each pixel location a rectangular pen, as in the example of Slide 3.23 with a pen size of 5 × 5, i.e. 25 pixels per position (see Algorithm 8).

Algorithm 8 Thick lines using a rectangular pen
1: procedure drawThickLine2(x1, y1, x2, y2, thickness, color);
2: var x, i: integer;
3:     p1x, p1y, p2x, p2y: integer;
4:     dx, dy, y, m: real;
5: begin
6:   dy := y2 - y1;
7:   dx := x2 - x1;
8:   m := dy/dx;
9:   y := y1;
10:  for x := x1 to x2 do
11:    p1x := x - (thickness div 2); {upper left point}
12:    p1y := Round(y) + (thickness div 2);
13:    p2x := x + (thickness div 2); {lower right point}
14:    p2y := Round(y) - (thickness div 2);
15:    drawFilledRectangle(p1x, p1y, p2x, p2y, color); {rectangle spanned by p1 and p2}
16:    y := y + m;
17:  end for;
18: end; {drawThickLine2}

{Note: drawFilledRectangle draws a rectangle given by the upper left and the lower right point. If you want to use a circular pen, simply replace the rectangle with drawFilledCircle(x, y, (thickness div 2), color). Syntax: drawFilledCircle(mx, my, radius, color)}

The difficulty with fat lines becomes evident when we draw circles. Assume in Slide 3.25 that we use pixel replication: we use Bresenham to assign pixels to the circle and then add one pixel above and one pixel below each of them. We quickly see that the thickness of the line describing the circle is fine at zero and ninety degrees, but narrower at 45 degrees, where the thickness t at 0 and 90 degrees reduces to t divided by the square root of two. This problem goes away if we use a moving pen of 3 × 3 pixels; in that case the variation disappears. Yet another approach is to apply a vector-to-raster conversion algorithm to two contours, obtained by changing the radius of the circle, and then to fill the area between the two contours with pixels; this too avoids the change in line thickness.

Exam questions:

• Name different techniques for drawing "thick" lines (e.g. line segments or circular arcs).


Definition 5 Skeleton
The skeleton of a region R contains all points p which have more than one nearest neighbour on the border-line of R. The points p are the centers of maximal discs which touch the border-line b in two or more points.

The detection of skeletons is useful for shape recognition and runs in O(n²) for concave polygons and O(n log n) for convex polygons.

3.4 The Transition from Thick Lines to Skeletons

The best-known algorithm for making the transition from a thick line or an area to a representation by the area's skeleton is by Blum, from the year 1967. We define a region R and its borderline B. The basic idea of the medial axis transform (see Definition 5) is to take a region, as shown in Slide 3.30, and replace it by those pixels (a string of individual pixels) which have more than one nearest neighbor along the boundary of the area. Looking at the area in example (a) of the slide, we quickly recognize that every point along the dashed lines has two nearest points on the border, either on the left and right borders or on the left and top borders, and so on. When we create such a pattern and there is a disturbance, as in image (b) of that slide, we immediately obtain a branch of the center line leading towards the disturbance. Example (c) shows the pattern this idea of finding pixels with two nearest neighbors along the borderline produces when the area itself is not rectangular but L-shaped. Slide 3.31 summarizes in words how we go from a region R to its boundary line b, and from the boundary line to the pixels p which have more than a single nearest neighbor on b; the pixels p form the so-called medial axis of region R.

Finding the medial axis in this way is expensive, because distances need to be computed between all the pixels within the region R and all the pixels on the boundary line B, and a lot of sorting goes on. For this reason, Blum considered a different approach: the transition from the region to the skeleton, or medial axis, is better achieved by means of a thinning algorithm. We therefore start from the edge of the region and delete contour pixels. What is a pixel on the contour? A pixel on the contour is part of the region R, has a value of 1 in a binary representation, and has at least one zero among its eight neighbors, i.e. at least one neighbor that does not belong to region R. Slide 3.32 explains the basic idea of a thinning algorithm. We have a pixel p1 and its eight neighbors p2 through p9. We associate with p1 the number N(p1) of non-zero neighbors, obtained by simply adding up the values of the eight neighbors. We compute a second auxiliary number S(p1), the number of transitions from zero to one in the ordered, circular sequence of the values p2, p3, ..., p9. The decision whether a pixel p1 gets deleted or not depends on the outcome of four tests (also shown in Slide 3.34).

Pixel p1 is deleted in the first pass if all of the following hold:
(a) 2 ≤ N(p1) ≤ 6,
(b) S(p1) = 1,
(c) p2 · p4 · p6 = 0,
(d) p4 · p6 · p8 = 0.
Pixel p1 is also deleted, in the second pass, if (a) and (b) hold together with
(c′) p2 · p4 · p8 = 0 and (d′) p2 · p6 · p8 = 0.


In the example on the slides we can see which pixels have been deleted after the initial iteration through all pixels; after five iterations the result shown in Slide 3.36 is obtained.
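As a sketch, one sub-iteration of such a thinning pass can be written in Python as follows. The deletion conditions used here are the standard Zhang-Suen-style rules that match N(p1) and S(p1) above; we assume they correspond to the conditions on the slides, so treat them as a reconstruction rather than a transcript:

def thinning_pass(img, first_subiteration):
    """One thinning sub-iteration on a binary image given as a list of lists of 0/1 values.
    The exact deletion conditions are assumed (standard Zhang-Suen rules), not quoted from the slides."""
    h, w = len(img), len(img[0])
    to_delete = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if img[y][x] == 0:
                continue
            # neighbors p2..p9, starting north of p1 and proceeding clockwise
            p = [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                 img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]
            n = sum(p)                                    # N(p1): number of non-zero neighbors
            s = sum(p[i] == 0 and p[(i + 1) % 8] == 1     # S(p1): 0->1 transitions around p1
                    for i in range(8))
            p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
            if first_subiteration:
                cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
            else:
                cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
            if 2 <= n <= 6 and s == 1 and cond:
                to_delete.append((y, x))
    for y, x in to_delete:
        img[y][x] = 0
    return len(to_delete)   # number of contour pixels removed in this sub-iteration

Alternating the two sub-iterations until no pixels are removed yields the skeleton, as in the example above that converged after five iterations.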

We have now dealt with the issue of converting a given vector into a set of binary pixels and have called that vector-to-raster conversion; it is also called scan conversion, and it occurs whenever vector data are represented on a raster monitor. What we have not yet talked about is the inverse issue: given a raster pattern, we would like to obtain vectors from it. We have touched upon replacing a raster pattern by a medial axis or skeleton, but we have not yet really come out of that conversion with a set of vectors. Yet raster-to-vector conversion is an important element of object recognition. A particular example was hinted at in Slide 3.36, because it clearly represents an example from character recognition: the letter H in a binary raster image is described by many pixels. Recognizing it as an H might be based on a conversion to a skeleton, a replacement of the skeleton by a set of vectors, and then on submitting those vectors to a set of rules that tell us which letter we are dealing with.

Exam questions:

• Apply Blum's medial axis transformation to the object on the left of Figure B.39. You may enter the result directly in the right part of Figure B.39.




Chapter 4

Morphology

Exam questions:

• Given the pixel arrangement shown in Figure B.56, describe graphically, by a formula, or in words an algorithm for determining the centroid of this pixel arrangement.

4.1 What is Morphology

This is an interesting subject, not very difficult, yet not to be underestimated. We talk about the shape and the structure of objects in images. It is a topic of binary image processing: recall that binary images have pixels that are only either black or white. Objects typically are described by a group of black pixels, and the background consists of all white pixels. So one has a two-dimensional space of integer coordinates to which, in morphology, we apply set theory.

Let us take an object, which we call A, hinged at a location designated in Slide 4.5 by a little round symbol. Morphology says that A is a set of pixels in this two-dimensional space. A separate object B is also defined by a set of pixels. We now translate A by a distance x and obtain a new set called (A)_x. The translation is described by two numbers, x_1 and x_2, for the two dimensions of the translation.

We can write the expression in Slide 4.6 to define the result after the translation: (A)_x consists of all pixels c such that c = a + x, where a runs over all the pixels of the set A. Geometrically we can illustrate the translation very simply by the two distances x_1 and x_2 of Slide 4.7; instead of A we have (A)_x. A very simple concept for humans becomes a somewhat complex equation in the computer.

Morphology also talks about "reflection". We have a pixel set B and reflect it into a set B̂, which is the set of all pixels x such that x = -b, where b runs over the pixels of the set B. The interpretation of -b is needed: geometrically, B̂ is the mirror image of B, mirrored over the hinge point (the point of reflection).

The next concept we look at is "complementing" a set A into a set A^C. A^C is the set of all pixels x that do not belong to the set A. If an object composed of all the pixels inside a contour is called A, then A^C is the background.

Next we can take two objects A and B and build a difference A - B. The difference is the set of all pixels x such that x belongs to set A but not to set B. We can describe this with a new symbol and say it is the intersection of two sets, namely of set A and the complement B^C of B.


Definition 6 Difference
Given two objects A and B as sets of pixels (points of the 2D integer space), the difference of the two sets A and B is defined as

A - B = {x | x ∈ A, x ∉ B} = A ∩ B^C.

Slide 4.14 shows A, B and A - B, which is A reduced by the part of A covered by B.
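Because the text treats objects as sets of pixel coordinates, these basic operations can be sketched directly with Python sets (the helper names and the small example objects are ours, purely for illustration):

A = {(1, 1), (2, 1), (2, 2), (3, 2)}        # object A as a set of (column, row) pixels
B = {(0, 0), (1, 0)}                        # a second pixel set

def translate(S, x):
    """(S)_x: every pixel of S shifted by the vector x = (x1, x2)."""
    return {(a1 + x[0], a2 + x[1]) for (a1, a2) in S}

def reflect(S):
    """S-hat: mirror reflection of S about the origin (the hinge point)."""
    return {(-a1, -a2) for (a1, a2) in S}

def complement(S, width, height):
    """S^C within a finite width x height window standing in for the infinite plane."""
    return {(i, j) for i in range(width) for j in range(height)} - S

def difference(S, T):
    """S - T, equal to S intersected with the complement of T."""
    return S - T

print(translate(A, (2, 0)))
print(reflect(B))
print(difference(A, B))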

Exam questions:

• What is "morphology"?
Answer: the application of non-linear operators to the shape of an object.

4.2 Dilation and Erosion

"Dilation" means that we make something bigger (in German: Blähung). The symbol we use to describe the dilation of a set A using a "structure element" B is shown in Slide ??. A dilated by B is the set of all positions x at which the reflected and translated structure element B̂ overlaps A in at least one pixel, i.e. their intersection is not empty.

This sounds fairly difficult, but looked at geometrically it is very simple. Let A be a square with side length d, and let B be another square with a diameter of d/4. If we reflect B about a reflection point at the center of the square, the reflection is the same as the original structure element; the reflection thus has no effect here, and we simply shift B to a pixel x. As we go to each pixel x of the set A, we place the (reflected) structure element there, translating B̂ to the location x, and collect all pixels covered by the union of A and the translated B̂. What we do is add a little fringe around the area A, obtained by moving the structure element over and through all the pixels of A: B̂ extends beyond the edge of A, so we make A a little larger. If our structure element is not a square but a rectangle with dimension d in one direction and d/4 in the other, we obtain an enlargement of area A that is significant in one direction and less significant in the other.

Algorithm 9 Dilation
1: X = ∅ {result set}
2: for all x do
3:   Y = Translate(Reflect(B), x)
4:   for all y ∈ Y do
5:     if (y ∈ A) and (x ∉ X) then
6:       Insert(X, x)
7:     end if
8:   end for
9: end for
10: return X

Dilation has a sister operation called "erosion" (in German: Abmagerung). The erosion is the opposite of a dilation, and the symbol designating an erosion is a little circle with a minus sign in it, shown in Slide 4.18. The result of an erosion consists of all those positions x at which the structure element B, shifted to x, lies completely within the set A. What does this look like geometrically?
does this look like geometrically?


Definition 7 Erosion

X ⊖ B = {d ∈ E² : B_d ⊆ X}

B ... binary erosion matrix (the structure element)
B_d ... B translated by d
X ... binary image matrix

Starting from this equation we arrive at the following equivalent expression:

X ⊖ B = ⋂_{b ∈ B} X_{-b}

In Slide 4.19 we have subtracted from set A a fringe that has been deleted as if with an eraser of the size of B. Doing this with a non-square, rectangular structure element we obtain a result that, in the particular case of Slide 4.19, reduces set A to merely a linear element, because only one row of pixels satisfies the erosion condition with a structure element of dimensions d and d/4.

There is a duality between erosion and dilation, because we can express the erosion of a set A by a structure element B as a dilation: we take the complement A^C of A and dilate it with the reflection B̂ of B. This is demonstrated in Slide 4.21, where we start from the erosion definition of A by B: the complement of the eroded object is the set of all positions x at which B, placed over x, does not lie completely within A, i.e. at which it overlaps the complement of A. Working through our previous definitions, Slide 4.21 shows that we end up with the dilation of the complement A^C of set A with the reflection B̂ of structure element B.
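As a compact Python sketch of these two operations on pixel sets, following the definitions above (helper names are ours, and a finite window stands in for the infinite plane Z²), including a check of the duality (A ⊖ B)^C = A^C ⊕ B̂ inside that window:

# a finite window standing in for the infinite plane Z^2
WINDOW = {(x, y) for x in range(-10, 16) for y in range(-10, 16)}

def dilate(A, B):
    """A dilated by B: the union of B translated to every pixel of A."""
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    """A eroded by B: all positions d in the window at which B translated by d fits inside A."""
    return {(x, y) for (x, y) in WINDOW if all((x + bx, y + by) in A for (bx, by) in B)}

def reflect(B):
    return {(-x, -y) for (x, y) in B}

A = {(x, y) for x in range(0, 6) for y in range(0, 4)}   # a 6 x 4 rectangle
B = {(0, 0), (1, 0), (0, 1), (1, 1)}                     # a 2 x 2 structure element

eroded = erode(A, B)
print(sorted(eroded))                                    # the rectangle shrunk by the 2 x 2 element

# duality check inside the window: (A erode B)^C  ==  A^C dilate B-hat
lhs = WINDOW - eroded
rhs = dilate(WINDOW - A, reflect(B)) & WINDOW
print(lhs == rhs)                                        # True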

Exam questions:

• Explain morphological "erosion" using a sketch and a formula.

• The morphological operation "erosion" is to be applied to the binary image shown at the top left of Figure B.65. Show how the duality between erosion and dilation can be used to reduce an erosion to a dilation. (In other words: instead of the erosion, other morphological operations are to be used which, executed one after the other in a suitable order, yield the same result as an erosion.) Enter your result (and your intermediate results) in Figure B.65 and name the operations marked with the numbers 1, 2 and 3. The structuring element to be used is also shown in Figure B.65.
Hint: note that the binary image shown is only a small section of the domain Z².
Answer: The morphological erosion can be replaced by the following sequence of operations (see Figure 4.1): 1. complement, 2. dilation, 3. complement.

Figure 4.1: Morphological erosion as the sequence complement → dilation → complement

• The duality of erosion and dilation with respect to complementation and reflection can be formulated by the equation
(A ⊖ B)^C = A^C ⊕ B̂.
Why is the reflection (B̂) important in this equation?

• Suppose you had to apply the morphological operations "erosion" or "dilation" to a binary image, but you only have a conventional image processing package available that does not support these operations directly. Show how erosion and dilation can be emulated by a convolution followed by thresholding.
Hint: the convolution operation in question is most comparable to a low-pass filter.
Answer: Treat the desired kernel of the morphological operation as a filter mask (with "1" for every set "pixel" of the kernel, "0" otherwise) and convolve the binary image with this mask. The result image then contains values g(x, y), where
- g(x, y) ≥ 1 if at least one mask pixel coincided with a set pixel of the input image (dilation), and
- g(x, y) ≥ K if all mask pixels coincided with set pixels of the input image (erosion), where K is the number of set mask pixels.

4.3 Opening and Closing

We now have more complex operations that are sequences of the previously defined ones; we call them "opening" and "closing". Let us take the question of opening first. We may have two objects, one to the left and one to the right, connected by a thin bridge, perhaps because of a mistake in sensing and preprocessing of the data. We can separate those two objects by an operation called "opening". Opening a set A by means of a structure element B is denoted by the symbol shown in Slide 4.24, an open little circle. It consists of first eroding A with the structure element B and subsequently dilating the result with the same structure element B. So we first shrink, then we enlarge again; but in shrinking we get rid of certain things that are no longer there when we enlarge.

Definition 8 Opening
A ◦ B = (A ⊖ B) ⊕ B
◦ ... opening, ⊖ ... erosion, ⊕ ... dilation; B is a circular structure element

Slide 4.25 shows the circular structure element B and the original object A. As we erode object A we obtain a shrunk result, and object A is broken up into two eroded, smaller objects: the bridge between the two parts of the original set A is narrower than the structure element, so the structure element erases it like an eraser. Now we want to go back to the original size, so we dilate with the structure element B again, and what we obtain is the separation of the thinly connected objects.

Slide 4.27 and Slide 4.28 summarize the opening operation.

We proceed to the "closing" operation. Closing a set A with the help of a structure element B is denoted by a little filled circle. We first dilate A by B and then erode the result by B, i.e. we do the opposite of opening. The process removes little holes: it does not break objects apart but connects them, fills in gaps, and removes noise.

Definition 9 Closing
A • B = (A ⊕ B) ⊖ B
⊕ Dilation: close all holes and gaps smaller than the structure element B
⊖ Erosion: restore the original size, except for the structures that were closed
Closing a set A with structure element B means to first dilate A by B and afterwards erode the result by structure element B.

Slides 4.30 through 4.33 feature a complex shape that seems to break apart where it really should not. We take the original figure and dilate it (make it larger); as it grows, small gaps and details are filled in or removed, and the resulting object is less detailed than we had before.

Closing an object A using the structure element B can again be shown to be the dual of opening with respect to complementation and reflection: closing A with B and taking the complement of the result is the same as opening the complement of A with the mirror reflection of the structure element B.
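A small Python sketch of opening and closing as compositions of erosion and dilation on pixel sets (helper names and the toy objects are ours; a finite window again stands in for Z²). The example separates two blobs joined by a one-pixel bridge and fills a one-pixel hole:

WINDOW = {(x, y) for x in range(-5, 20) for y in range(-5, 20)}

def dilate(A, B):
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    return {(x, y) for (x, y) in WINDOW if all((x + bx, y + by) in A for (bx, by) in B)}

def opening(A, B):
    """A opened by B: erosion followed by dilation with the same structure element."""
    return dilate(erode(A, B), B)

def closing(A, B):
    """A closed by B: dilation followed by erosion with the same structure element."""
    return erode(dilate(A, B), B)

# two 3 x 3 blobs connected by a thin one-pixel bridge
blob1 = {(x, y) for x in range(0, 3) for y in range(0, 3)}
blob2 = {(x, y) for x in range(7, 10) for y in range(0, 3)}
bridge = {(x, 1) for x in range(3, 7)}
A = blob1 | blob2 | bridge
B = {(0, 0), (1, 0), (0, 1), (1, 1)}        # 2 x 2 structure element

opened = opening(A, B)
print(bridge & opened)                      # set(): the bridge is gone after opening
print(blob1 <= opened and blob2 <= opened)  # True: the two blobs themselves survive

ring = {(x, y) for x in range(0, 3) for y in range(0, 3)} - {(1, 1)}  # a blob with a one-pixel hole
print((1, 1) in closing(ring, B))           # True: closing fills the small hole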

Exam questions:

• Explain morphological "opening" using a sketch and a formula.

Figure 4.2: Morphological opening with the given structuring element

• To strengthen the effect of morphological opening (A ◦ B), one can (apart from enlarging the structuring element B) execute the underlying operations (erosion and dilation) repeatedly. Which of the following two procedures leads to the desired result?
1. The erosion is executed n times first and then the dilation n times, i.e. (((A ⊖ B) ⊖ ... ⊖ B) ⊕ B) ⊕ ... ⊕ B with n erosions followed by n dilations.
2. The erosion is executed and then the dilation, and this sequence is repeated n times, i.e. (((A ⊖ B) ⊕ B) ... ⊖ B) ⊕ B with n alternating ⊖/⊕ pairs.
Justify your answer and explain why the other procedure fails.
Answer: Procedure 1 is correct; with procedure 2 the object remains unchanged after the first ⊖/⊕ iteration.

• Apply the morphological operation "opening" with the given structuring element to the binary image on the left of Figure B.31. Which effect typical of morphological opening also occurs in this example? White pixels count as logical "0", gray pixels as logical "1". You may enter the result on the right in Figure B.31.
Answer: see Figure 4.2; the typical effect is the separation of regions that are connected by a narrow "bridge".

4.4 Morphological Filter

Definition 10 Morphological filter
A morphological filter consists of one or more morphological operations, such as dilation, erosion, opening, closing, and hit-or-miss, that are applied sequentially to an input image.

A very simple application is morphological filtering. Say we have an object such as an ice floe on the ocean, with several small things floating around it, and we would like to recognize and map the large ice floe. We would like to isolate this object, measure its surface and its contour, and see where it is. In an automated process we need to remove all the clutter around it, fill the holes, and get rid of the extraneous details on the open water.

Morphological filtering is illustrated in Slide 4.38 and Slide 4.39. We choose a structure element that is a little larger than the elements we would like to remove. We first let that structure element run over the image and perform an erosion: every object smaller than the structure element disappears, but the holes inside the object get bigger. We follow the erosion with a dilation; that combination is the opening operation. We have now removed all the small items outside the object, but the holes inside the object are still there. We then perform the opposite operation, the closing: we take the opening result and dilate it, which enlarges the object in such a way that it also closes up all the holes, and then we shrink it again with an erosion using the same structure element B. The sequence of opening followed by closing produces a clean object without extraneous detail: we have applied a morphological filter.

Exam questions:

• Figure B.55 shows a rectangular object together with several smaller disturbing objects. Explain a morphological filtering procedure that eliminates the disturbing objects. Use formulas and show the course of the procedure with graphical sketches. Also show the result image.

• Explain the process of morphological filtering using an example.

4.5 Shape Recognition by a Hit-or-Miss Operator

Morphology can recognize shapes in an image with the hit-or-miss operator. Assume we have three small objects X, Y and Z and we would like to find object X, as shown in Slide 4.41. The union of X, Y and Z is denoted as the auxiliary object A. Now we define a structure element W, and from it a second structure element as the difference of W and the shape X we are looking for; that gives an interesting structure element which in this case looks like the frame of a window. We also build the complement A^C of A, which is the background without the objects X, Y and Z.

If we erode A with X, any object smaller than X is wiped out, any object larger than X shows up as an area resulting from the erosion by X, and for X itself we obtain a single pixel in Slide 4.42. The automated process has thus produced pixels that are candidates for the object of interest, X, and we need to know which pixel to choose. We therefore go through the operation again, but use A^C as the object and W - X as the structure element. The erosion of A^C by W - X produces the background with an enlarged hole for the three objects X, Y and Z, plus two auxiliary objects, namely the single pixel where our X is located and a pattern consisting of several pixels for the small objects in Slide 4.43. We intersect the two erosion results, A eroded by X and A^C eroded by W - X. The intersection produces a single pixel at the location of our object X. This is the so-called hit-or-miss method of finding an instance where object X exists; all other objects, whether bigger or smaller, disappear.

The process and the formula are shown in Slide 4.46, which summarizes the hit-or-miss process illustrated above. The operator is written with a symbol containing a circle and a little asterisk: A is eroded by X, the complement of A is eroded by W - X, and the two results are intersected. We thus have two structure elements, X and W - X.


Definition 11 Hit-or-Miss Operator
A ⊗ W = (A ⊖ W1) ∩ (A^C ⊖ W2)
with the two structure elements W1 = X (the shape we are looking for) and W2 = W - X (its local background).


Slide 4.46 shows that the equation can be rewritten in various forms.
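A small Python sketch of the hit-or-miss idea on pixel sets, following the formula above (the shapes, the window W and all names are illustrative assumptions, not the slide example):

WINDOW = {(x, y) for x in range(-2, 15) for y in range(-2, 15)}   # finite stand-in for Z^2

def erode(A, B):
    return {(x, y) for (x, y) in WINDOW if all((x + bx, y + by) in A for (bx, by) in B)}

def hit_or_miss(A, X, W):
    """Find positions where the shape X occurs together with the local background W - X."""
    return erode(A, X) & erode(WINDOW - A, W - X)

# image A: one copy of the 2 x 2 shape X we are looking for, plus a larger 3 x 3 object elsewhere
X = {(0, 0), (1, 0), (0, 1), (1, 1)}
A = {(x + 2, y + 2) for (x, y) in X} | {(x, y) for x in range(8, 11) for y in range(8, 11)}
W = {(x, y) for x in range(-1, 3) for y in range(-1, 3)}          # a window one pixel larger than X

print(hit_or_miss(A, X, W))   # {(2, 2)}: exactly one hit, at the position of the 2 x 2 shape

The larger object is rejected because its surrounding frame test (the erosion of A^C by W - X) fails, exactly as described in the text.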

Exam questions:

• How is the "hit-or-miss" operator A ⊛ B defined? Explain how it works for recognizing structures in binary images.
Answer: It holds that
A ⊛ B = (A ⊖ B) ∩ [A^C ⊖ (W - B)],
where W is a structuring element larger than B. When A is eroded with B, all parts of A that are smaller than B disappear, while a part with exactly the shape of B remains as an isolated pixel. When A^C is eroded with W - B, all holes of A^C that are larger than B are widened, while parts with the shape of B again yield a single pixel. The set intersection therefore produces a set pixel exactly where a part of A is identical to B.

4.6 Some Additional Morphological Algorithms

Morphological algorithms in common use deal with finding the contour of an object, finding the skeleton of an object, filling regions, and cutting branches off skeletons. The whole world of morphological algorithms is clearly applicable to character recognition, particularly for handwriting. It is applied in those cases where the object of interest can be described in a binary image, where we need neither color nor gray values but simply object and non-object.


Given an object A as in Slide 4.48 and Slide 4.49, we are looking for the contour of A, denoted b(A). We use a structure element B to find the contour: the contour of region A is obtained by subtracting from A an eroded version of A. The erosion should be by just one pixel, so structure element B is a 3 × 3 window.

Definition 12 Contour
We present the formal definition of a contour; it is the digital counterpart of the boundary of an analog set. We are looking for the contour of A as b(A):

b(A) = A - (A ⊖ B)    (4.1)

We use a structure element B to find the contour. The contour of region A is obtained by subtracting from A an eroded version of A. The erosion should just be by one pixel; structure element B is a 3 × 3 window:

B = ⎡ a11 a12 a13 ⎤
    ⎢ a21 a22 a23 ⎥    (4.2)
    ⎣ a31 a32 a33 ⎦

The contour of a connected set of points R is defined as the points of R having at least one neighbor not in R. The contour is the outline or visible edge of a mass, form or object.

Slide 4.49 shows the erosion of region A and the difference with region A that yields the contour pixels.
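A short Python sketch of equation (4.1), b(A) = A - (A ⊖ B), on a pixel-set representation (names and the toy object are ours; B is taken as the full 3 × 3 neighborhood including the origin):

def erode(A, B):
    """Erosion on pixel sets: keep positions of A at which every offset of B stays inside A."""
    return {(x, y) for (x, y) in A if all((x + bx, y + by) in A for (bx, by) in B)}

def contour(A):
    """b(A) = A - (A erode B) with B the 3 x 3 structure element."""
    B = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    return A - erode(A, B)

A = {(x, y) for x in range(0, 5) for y in range(0, 4)}    # a filled 5 x 4 rectangle
boundary = contour(A)
print(sorted(boundary))    # the one-pixel-wide boundary ring of the rectangle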

Region filling is the opposite operation: we start from a binary representation of a contour and want to fill its interior. The particular contour considered here is continuous and non-interrupted under an 8-neighborhood relationship (recall: up, down, left, right plus all diagonal neighbors). We build the complement A^C of the contour A. The structure element B is again a 3 × 3 matrix, but using only the 4-neighbors. Region filling is an iterative process according to Slide 4.51: with a running index k that increases with the iterations, each step dilates the previous result by the structure element B and intersects it with the complement A^C of A,

X_k = (X_{k-1} ⊕ B) ∩ A^C,

and we repeat this step by step until no new pixels are added; the filled region is then the union of the final X_k with the contour A. The issue is the starting point X_0, an arbitrary pixel inside the contour from which we start the process.
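A minimal Python sketch of this iterative region filling, again on pixel sets within a finite window (the seed point, the window size and the names are our own choices):

def fill_region(contour, seed, width, height):
    """Fill the interior of a closed contour, starting from a seed pixel inside it."""
    window = {(x, y) for x in range(width) for y in range(height)}
    complement = window - contour                     # A^C within the finite window
    B = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}    # 4-neighborhood structure element
    X = {seed}
    while True:
        dilated = {(x + bx, y + by) for (x, y) in X for (bx, by) in B}
        X_next = dilated & complement                 # X_k = (X_{k-1} dilate B) intersect A^C
        if X_next == X:                               # stop when no new pixels are added
            return X_next | contour                   # filled region = interior plus contour
        X = X_next

# an 8-connected rectangular contour and a seed inside it
contour = {(x, 0) for x in range(5)} | {(x, 4) for x in range(5)} | \
          {(0, y) for y in range(5)} | {(4, y) for y in range(5)}
print(sorted(fill_region(contour, (2, 2), 6, 6)))     # the 5 x 5 square, completely filled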

A final illustration of the usefulness of morphology deals with the automated recognition of hand-written zip codes. Slide 4.52 and Slide 4.53 present a hand-written address that is being imaged. Through some pre-processing, the hand-writing has been converted into a binary image; the first step might be to threshold the gray-tone image. After having thresholded the address we need to find the area containing the zip code.

Let us address the task of extracting all connected components in the area that comprises the address field. From a segmentation into components, one finds rectangular boxes, each containing a connected object. One would assume that each digit is now separate from all the other digits. However, if two digits are connected, as in this example with a digit 3 and a digit 7, we would misread them as one single digit. We can help ourselves by considering the shape of the rectangular box, plus using knowledge about how many digits a zip code has: in the United States it is basically five, so one needs to have five digits, and one can look for joined characters by measuring the relative widths of the boxes that enclose the characters, since we must expect certain dimensions for the box surrounding a single digit. Opening and closing operations can then separate digits that should be separate, or merge broken elements that should describe a single digit. Actual character recognition (OCR, for "Optical Character Recognition") then takes each binary image window with one digit and seeks to find which value between 0 and 9 it could be. This can be based on a skeleton of each segment and a count of its structure in terms of nodes and arcs. We will address this topic later in this class.

As a short outlook beyond the morphology of binary images, let's just state that there is a variation of morphology applied to gray value images. Gray-tone images can be filtered with morphology, and an example is presented in Slide 4.55.

Exam questions:

• Given the pixel arrangement shown in Figure ??, describe graphically and by a formula the procedure for morphologically determining the outline of the object shown, using a structuring element of your own choosing.

• Using morphological operations, describe a procedure for determining the border of a region. Apply this procedure to the region drawn in Figure B.23 and state the 3 × 3 structuring element you used. Figure B.23 provides space for the final result as well as for intermediate results.




Chapter 5

Color

5.1 Gray Value Images

A precursor to color images is of course the black & white image, and some basic issues can be studied with black & white images before we proceed to color. A regular gray value image is shown in Slide 5.3. We need to characterize a gray value image by its densities, by the way it may challenge our eyes, by the manner in which it captures the physics of illumination and reflection, and by how it is presented to the human viewer. We have discussed such concepts as the density of film, the intensity of the light that is reflected from objects, and the quality of an image in terms of its histogram.

Intensity describes energy, light or brightness. When an intensity value is zero, we are talking about darkness, no light; if the intensity is bright, a large value describes it. The opposite is true for film: a film with a density of zero is completely transparent, whereas a film at a density of 4 is totally opaque and will not let any light pass through. A negative film that is totally transparent represents an object that did not send any light through the optical system; a negative that is totally opaque had been brightly illuminated. The opposite holds for a positive film: the darker the positive film, the less light it represents.

In Chapter 2 we already talked about the eye, but we did not address its sensitivity to brightness, energy and light. A notable characteristic of the eye is that it is very sensitive to ratios. If we present to the eye two different brightnesses, say densities of 0.11 and 0.10, the eye perceives this much like the pair 0.55 and 0.50, both pairs differing by 10%. The sensitivity of the eye to differences ΔI of the intensity of light I is expressed by the Weber ratio ΔI/I.

What, then, is the interval, expressed as a ratio r, when presenting an image with n discrete gray values? Let us define n intensity steps:

I_n = r^n · I_0

If intensity I_n is the maximum and intensity I_0 is the minimum, we have to compute the value of r that breaks the interval from I_0 to I_n into n steps. Slide 5.5 illustrates the issue:

r = ⁿ√(I_n / I_0)

If n = 3, we have 4 different levels of intensity, namely 1/8, 1/4, 1/2 and 1. The eye needs a Weber ratio ΔI/I of about 0.01, i.e. r ≈ 1.01; smaller differences between two intensities are not recognizable. Conceptually this corresponds to a capability of resolving about 100 different gray values.


A monitor presents an intensity I that is a function of N, the number of electrons creating the intensity on the monitor; Slide 5.6 presents the relationship. Film has a density that relates linearly to the logarithm of the energy of light falling onto the film. The dynamic range is the ratio of the highest and lowest intensity that a medium can represent: for a monitor that value might be 200, for film it might be 1000, for paper it might be 100. Note that the dynamic range is often stated as the density range d, the power of base 10 that the medium can support, i.e. a ratio of 10^d. For film the ratio of brightest and darkest intensity is about 1000, and therefore film typically has a density range d = 3, whereas paper lies at d < 2.

Continuous-tone photography cannot be printed directly. Instead one needs to create so-called half-tone images by means of a raster pattern. These images make use of the spatial integration that human eyes perform. A half tone is a representation of a gray tone: the image is resolved into discrete points, each point is associated with an area on paper, and at each point one places a small dot proportional in size to the density of the object. If the object is bright the dots are small, and if the object is dark the dots are large. This is also called screening of a gray-tone image; note that the screen is typically arranged at an angle of 45°. Slide 5.7 shows a so-called half-tone image. Slide 5.8 makes the transition to the digital world: gray tones can be obtained in a digital environment by substituting for each pixel a matrix of subpixels. If we have 2 × 2 subpixels we can represent five gray values, as shown in Slide 5.9; similarly, a 3 × 3 pattern permits us to represent 10 different gray values. We call the matrix into which we subdivide pixels a dither matrix: a D2 dither matrix means that 2 × 2 pixels are used to represent one gray value of a digital image.

The basic principle is demonstrated in Algorithm 10. An example <strong>for</strong> the creation of a 3 × 3 dither<br />

matrix would be:<br />

D =<br />

⎡<br />

⎣ 6 8 4<br />

1 0 3<br />

5 2 7<br />

An image gray value is checked against each element of the dither matrix <strong>and</strong> only those pixels<br />

are set, where the gray value is larger than the value in the dither matrix.<br />

For a gray value of 5 the given matrix D would produce the following pattern<br />

⎡<br />

P = ⎣<br />

0 0 1<br />

1 1 1<br />

0 1 0<br />

⎤<br />

⎦<br />

⎤<br />

⎦<br />

A dither matrix of n × n defines n² + 1 different patterns. It should be created wisely in order not to define patterns that produce artefacts. For instance, the following matrix (shown here for a gray value of 3) would create horizontal lines if applied to larger areas:

D = \begin{pmatrix} 5 & 3 & 6 \\ 1 & 0 & 2 \\ 8 & 4 & 7 \end{pmatrix} \xrightarrow{\;v = 3\;} P = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
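A minimal sketch of this dithering principle in Python (the function and variable names are illustrative and not taken from the lecture; gray values are assumed to lie in the range 0–9, i.e. n² + 1 levels for n = 3):

D = [[6, 8, 4],
     [1, 0, 3],
     [5, 2, 7]]

def halftone(image, dm=D):
    """Expand each pixel of `image` into an n x n block of 0/1 subpixels.
    A subpixel is set (1 = black dot) wherever the gray value of the
    source pixel exceeds the corresponding dither matrix entry."""
    n = len(dm)
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols * n) for _ in range(rows * n)]
    for y in range(rows):
        for x in range(cols):
            v = image[y][x]
            for i in range(n):
                for j in range(n):
                    out[y * n + i][x * n + j] = 1 if v > dm[i][j] else 0
    return out

# A single pixel with gray value 5 reproduces the pattern P shown above.
print(halftone([[5]]))   # [[0, 0, 1], [1, 1, 1], [0, 1, 0]]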

Exam questions:

• What is meant by the "dynamic range" of a medium for the reproduction of pictorial information, and how is it related to the quality of the representation? Rank some common media in ascending order of their dynamic range.



Algorithm 10 Halftone image (by means of a dither matrix)
1: dm = createDitherMatrix(n, n) {create an n × n dither matrix}
2: for all pixels (x, y) of the image do
3:   v = getGrayValueOfPixel(x, y)
4:   for all elements (i, j) of dm do {check the value against the matrix}
5:     if v > dm(i, j) then
6:       setPixel(OutputImage, x · n + i, y · n + j, black) {apply the pattern}
7:     else
8:       setPixel(OutputImage, x · n + i, y · n + j, white)
9:     end if
10:   end for
11: end for

Figure 5.1: Histogram of the image in Figure B.29 (frequency versus gray value)

• Consider a printing process that represents a gray dot by means of a pixel raster, as shown in Figure B.5. How many gray values can be represented with this raster? Which gray value is shown in Figure B.5?

• Sketch the histogram of the digital gray value image of Figure B.29 and comment on your sketch!

Answer: The histogram is bimodal, with the peak in the white range somewhat flatter than the one in the black range, because the image has more structure in the bright areas than in the dark areas (see Figure 5.1).

5.2 Color images<br />

Of course computer graphics <strong>and</strong> digital image processing are significantly defined by color. Color<br />

has been a mysterious phenomenon through the history of mankind <strong>and</strong> there are numerous models<br />

that explain color <strong>and</strong> how color works.<br />

Slide 5.12 does this with a triangle: the three corners of the triangle represent white, black and pure color, so that the edges of the triangle represent values of gray, tints between white and pure color, and shades between pure color and black. The concept of tones fills the area of the triangle. A color is judged against existing color tables. A very widely used system is the one by Munsell. It is organized along 3 ordering schemes: hue (color), value (lightness) and saturation. These 3 entities can be the coordinate axes of a 3D space. We will visit the 3-dimensional idea later in subtopic 5.5.



Color in image processing presents us with many interesting phenomena. The example in Slide

5.16 is a technical image, a so-called false color image. In this case film is being used that is not<br />

sensitive to blue, but is instead sensitive to green, red <strong>and</strong> infrared. In this particular film, the<br />

infrared light falling onto the emulsion will activate the red layer in the film. The red light will<br />

activate the green layer, the green light will activate the blue layer. As a result, an image will<br />

show infrared as red. Slide 5.16 is a vegetated area. We recognize that vegetation is reflecting a<br />

considerable amount of infrared light, much more so than red or green light. Healthy vegetation<br />

will look red, sick vegetation will reflect less infrared light <strong>and</strong> will there<strong>for</strong>e look whitish.<br />

Color images not only serve to represent the natural colors of our environment, or the electromagnetic<br />

radiation as we receive it with our eyes or by means of sensors, but color may also be used<br />

to visualize things that are totally invisible to humans.<br />

Slide 5.18 is an example of terrain elevation rendered in color, covering the entire world.

Similarly, Slide 5.19 illustrates the rings of planet Saturn <strong>and</strong> uses color to highlight certain<br />

segments of those rings to draw the human observer’s attention. The colors can be used to mark<br />

or make more clearly visible to a human interpreter a physical phenomenon or particular data<br />

that one wants the human to pay attention to. This is called pseudo-color.<br />

Exam questions:

• What is meant by a false color image and by a pseudo color image? Give a typical application for each.

5.3 Tri-Stimulus Theory, Color Definitions, CIE-Model<br />

The eye has color-sensitive cones around the fovea, the area of highest color sensitivity in the eye. It turns out that these cones are not equally sensitive to red, green and blue. Slide 5.22 shows that we have much less sensitivity to blue light than we have to green and red. The eye's cones can see the electromagnetic spectrum from 0.4 to 0.7 µm wavelength (or 400 to 700 nanometers). We find that the eye is most sensitive in the yellow-green area; luminance sensitivity is best in that color range. Slide 5.23 illustrates the concept of the tri-stimulus idea. The tri-stimulus theory is attractive since it explains that all colors can be made from only 3 basic colors. If one were to create all spectral colors from red, green and blue, the cones in the eye would have to respond at the levels shown in Slide 5.13. The problem is that one would have to allow for negative values in red, which is not feasible. So those colors cannot be created; they appear falsified by too much red.

The physics of color is explained in Slide 5.25.<br />

White light from the sun falls onto an optical prism, breaking up the white light into the rainbow colors from ultraviolet via blue, green, yellow and orange to red and on to infrared. These are the spectral colors first scientifically explained by Sir Isaac Newton in 1666. We all recall from elementary physics that the electromagnetic spectrum is ordered by wavelength or frequency and goes from cosmic rays via gamma rays and X-rays to ultraviolet, then on to visible light, and from there to near infrared, far infrared, microwaves, television and radio frequencies. Wavelengths of visible light range between 0.35 µm and 0.7 µm. Ultraviolet has shorter wavelengths in the range of 0.3 µm, and infrared goes from 0.7 µm to perhaps several hundred µm.

We would like to create color independent of natural light. We have two major ways of doing this.<br />

One is based on light, the other on pigments. We can take primary colors of light <strong>and</strong> mix them<br />

up.<br />

These primary colors would be green, blue <strong>and</strong> red, spectrally clean colors. As we mix equal<br />

portions of those three, we produce white. If we mix pairs of them, we get yellow, cyan and magenta.



In contrast to additive mixing of light there exist subtractive primaries of pigments. If we want to<br />

print something we have colors to mix. Primary colors in that case are magenta, yellow <strong>and</strong> cyan.<br />

As we mix equal parts we get black. If we mix pairs of them, we get red, green <strong>and</strong> blue. We call<br />

yellow, magenta <strong>and</strong> cyan primary colors, green, red <strong>and</strong> blue secondary colors of pigment. To<br />

differentiate between subtractive <strong>and</strong> additive primaries, we talk about pigments <strong>and</strong> light. An<br />

important difference between additive <strong>and</strong> subtractive colors is the manner in which they are being<br />

generated. A pigment absorbs a primary color of light <strong>and</strong> reflects the other two. Naturally then,<br />

if blue <strong>and</strong> green get reflected but red is absorbed, that pigment appears cyan, <strong>and</strong> represents<br />

the primary pigment “cyan”. The primary colors of light are perceived by the eye’s cones on the<br />

retina as red, green <strong>and</strong> blue, <strong>and</strong> combinations are perceived as secondary colors.<br />

The Commission Internationale de l'Éclairage (CIE) has been responsible for an entire world of standards and definitions. As early as 1931, the CIE fixed the spectral wavelengths for red at 700 nm, green at 546.1 nm and blue at 435.8 nm.

So far we have not yet been concerned about the dimensions of the color issue. But Munsell<br />

defined concepts such as hue 1 , intensity (value or lightness), <strong>and</strong> saturation or chroma 2 . We<br />

can build from such concepts a three dimensional space <strong>and</strong> define chromaticity, thus color, as a<br />

2-dimensional subspace.<br />

The necessity of coping with negative color values when building spectral colors from RGB has led the Commission Internationale de l'Éclairage (CIE) to define 3 primary colors X, Y and Z. The CIE defined their values to form the spectral colors as shown in Slide 5.27.

The Y -curve was chosen to be identical to the luminous efficiency function of the eye.<br />

The auxiliary values X, Y and Z are denoted as tri-stimulus values, defining tri-chromatic coefficients x, y, z as follows:

x = X / (X + Y + Z)
y = Y / (X + Y + Z)
z = Z / (X + Y + Z)

and x + y + z = 1.

A 3-dimensional space is defined by X, Y, Z <strong>and</strong> by x, y, z. X, Y, Z are the amounts of red,<br />

green, <strong>and</strong> blue to obtain a specific color; whereas x, y, z are normalized tri-chromatic coefficients.<br />

One way of specifying color with the help of the tri-chromatic coefficients is by means of a CIE<br />

chromaticity diagram.<br />

A two-dimensional space is defined by the plane x + y + z = 1 with an x- and a y-axis, whereby the values along the x-axis represent red and those along the y-axis represent green. The values vary between 0 and 1. The z value (blue) results from z = 1 − x − y.

There are several observations to be made about the CIE chromaticity diagram:<br />

1. A point is marked as “green”, <strong>and</strong> is composed of 62% green, 25% red <strong>and</strong> from z = 1−x−y,<br />

13% blue.<br />

2. Pure spectral colors from a prism or rainbow are found along the edge of the diagram, with<br />

their wavelength in nm.<br />

1 in German: Farbton<br />

2 in German: Sättigung



3. Any point inside the tongue-shaped area represents a color that cannot only be composed<br />

from x, y <strong>and</strong> z, but also from the spectral colors along the edge of the tongue.<br />

4. There is a point marked that has 33% of x, 33% of y and 33% of z; this is the CIE value for white light.

5. Any point along the boundary of the chromaticity chart represents a saturated color.<br />

6. As a point is defined away from the boundary of the diagram we have a desaturated color<br />

by adding more white light. Saturation at the point of equal energy is 0.<br />

7. A straight line connecting any 2 colors defines all the colors that can be mixed additively from the end points.

8. From the white point to the edge of the diagram, one obtains all the shades of a particular<br />

spectral color.<br />

9. Any three colors I, J, K define all other colors that can be mixed from them, namely those inside the triangle formed by I, J, K.

Definition 13 Conversion from CIE to RGB

To transform device-specifically between different monitor RGB spaces we can use transformations from a particular monitor RGB space to the CIE XYZ space. The general transformation can be written as:

X = X_r · R_m + X_g · G_m + X_b · B_m
Y = Y_r · R_m + Y_g · G_m + Y_b · B_m
Z = Z_r · R_m + Z_g · G_m + Z_b · B_m

Under the assumption that equal RGB voltages (1, 1, 1) should lead to the colour white, and specifying chromaticity coordinates for a monitor consisting of long-persistence phosphors like this:

          x      y
  red    0.620  0.330
  green  0.210  0.685
  blue   0.150  0.063

we have for example:

X = 0.584 · R_m + 0.188 · G_m + 0.179 · B_m
Y = 0.311 · R_m + 0.614 · G_m + 0.075 · B_m
Z = 0.047 · R_m + 0.103 · G_m + 0.939 · B_m

The inverse transformation is:

R_m = 2.043 · X − 0.568 · Y − 0.344 · Z
G_m = −1.036 · X + 1.939 · Y + 0.043 · Z
B_m = 0.011 · X − 0.184 · Y + 1.078 · Z
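As a small numerical cross-check of Definition 13, the following Python sketch (numpy is assumed to be available; the matrix holds the example numbers given above) converts between monitor RGB and CIE XYZ and confirms that the listed inverse really is the matrix inverse:

import numpy as np

# Rows map (R_m, G_m, B_m) to (X, Y, Z); values are the example numbers above.
M_RGB_TO_XYZ = np.array([
    [0.584, 0.188, 0.179],
    [0.311, 0.614, 0.075],
    [0.047, 0.103, 0.939],
])

def rgb_to_xyz(rgb):
    """Convert a monitor RGB triple (0..1) to CIE XYZ."""
    return M_RGB_TO_XYZ @ np.asarray(rgb, dtype=float)

def xyz_to_rgb(xyz):
    """Convert CIE XYZ back to the same monitor's RGB space."""
    return np.linalg.inv(M_RGB_TO_XYZ) @ np.asarray(xyz, dtype=float)

# Equal RGB voltages (1, 1, 1) map to the monitor's white point.
print(rgb_to_xyz([1.0, 1.0, 1.0]))           # approx. [0.951, 1.000, 1.089]
print(np.linalg.inv(M_RGB_TO_XYZ).round(3))  # matches the inverse listed above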

Exam questions:

• Consider the CIE color space. Make a sketch of this color space with a description of the axes, and mark two points A and B in this space. Which color properties are associated with points lying on the line segment between A and B, and which with the intersections of the line through A and B with the boundary of the CIE color space?



• Can all colors perceivable by the human eye be displayed on an RGB monitor? Justify your answer with a sketch!

5.4 Color Representation on Monitors <strong>and</strong> Films<br />

The CIE chromaticity diagram describes more colors than the subset that is displayable on film, on a monitor, or on a printer.

The subset of colors that is displayable on a medium can be represented as a triangle spanned by its primary colors in an additive system. A monitor uses the RGB model. In order for the same color to appear on a printer that was perceived on a monitor, and that might come from scanning color film, the proper mix of that color within the respective triangles can be assessed via the CIE chromaticity diagram.

Exam questions:

• Compare the method of color generation on a cathode-ray tube monitor with that of offset printing. Which color models are used in each case?

5.5 The 3-Dimensional Models<br />

The tri-stimulus values x, y, z define a 3D space as shown in Slide 5.33 with the plane x+y +z = 1<br />

marked. If a color monitor builds its colors from 3 primaries RGB, then it will be able to display<br />

a subset of the CIE-colors.<br />

The xyz-space is shown in Slide 5.35 in 3 views.<br />

We extend our model to a three dimensional coordinate system with the red, green <strong>and</strong> blue color<br />

axes, the origin at black, a diagonal extending away from the origin under 45 degrees with each<br />

axis giving us gray values until we hit the white point. The red-blue plane defines the magenta<br />

color, the red-green plane defines yellow <strong>and</strong> the green-blue plane defines cyan. That resulting<br />

color model is shown in Slide 5.36 <strong>and</strong> is illustrated in Slide 5.37.<br />

The RGB values range between 0 <strong>and</strong> 1. The RGB model is the basis of remote sensing <strong>and</strong><br />

displaying color images on various media such as monitors.<br />

How does one modify the histogram of an RGB image? Clearly, changing the intensity of each component image separately will change the resulting color. This needs to be avoided. We will discuss other color models that will help here.

Exam questions:

• What is meant by a three-dimensional color space (or color model)? Name at least three examples!

5.6 CMY-Model<br />

Exam questions:

• Consider the color value C_RGB = (0.8, 0.5, 0.1)^T in the RGB color model.



Definition 14 CMY color model<br />

CMY st<strong>and</strong>s <strong>for</strong>:<br />

C . . . Cyan<br />

M . . . Magenta<br />

Y . . . Yellow<br />

The three-dimensional geometric representation of the CMY model can be done in the same way as for the RGB model, i.e. as a cube.

In contrast to the RGB-Model the CMY-Model uses the principle of subtractive colors.<br />

Subtractive colors are seen when pigments in an object absorb certain wavelengths of white light<br />

while reflecting the rest.<br />

We see examples of this all around us. Any colored object, whether natural or man-made, absorbs<br />

some wavelengths of light <strong>and</strong> reflects or transmits others; the wavelengths left in the reflected/transmitted<br />

light make up the color we see.<br />

Some examples:<br />

• White light falling onto a cyan pigment will be reflected as a mix of blue <strong>and</strong> green since<br />

red will get absorbed.<br />

• White light falling onto a magenta pigment will be reflected as a mix of red <strong>and</strong> blue since<br />

green will get absorbed.<br />

• White light falling onto a yellow pigment will be reflected as a mix of red <strong>and</strong> green since<br />

blue will get absorbed.<br />

There<strong>for</strong>e the conversion of RGB to CMY is supported by the physics of light <strong>and</strong> pigments. This<br />

leads to the following conversion-<strong>for</strong>mulas:<br />

C = 1 − R<br />

M = 1 − G<br />

Y = 1 − B<br />

R = 1 − C<br />

G = 1 − M<br />

B = 1 − Y<br />

The CMY-Model is not used on monitors but in printing.
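A minimal sketch of these conversion formulas in Python (color components assumed to be normalized to the range 0..1):

def rgb_to_cmy(r, g, b):
    """C = 1 - R, M = 1 - G, Y = 1 - B."""
    return (1.0 - r, 1.0 - g, 1.0 - b)

def cmy_to_rgb(c, m, y):
    """Inverse conversion: R = 1 - C, G = 1 - M, B = 1 - Y."""
    return (1.0 - c, 1.0 - m, 1.0 - y)

# The color value used in the exam question below:
print(rgb_to_cmy(0.8, 0.5, 0.1))   # approximately (0.2, 0.5, 0.9)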



Continuing the exam question on C_RGB = (0.8, 0.5, 0.1)^T from Section 5.6:

1. Which spectral color corresponds most closely to the hue defined by C_RGB?

2. Find the corresponding representation of C_RGB in the CMY and in the CMYK color model!

Answer:

C_CMY = (1, 1, 1)^T − C_RGB = (0.2, 0.5, 0.9)^T
K = min(C, M, Y) = 0.2
C_CMYK = (0, 0.3, 0.7, 0.2)^T

The given hue corresponds approximately to orange.

5.7 Using CMYK<br />

Definition 15 CMYK color model<br />

CMYK is a scheme <strong>for</strong> combining primary pigments. The C st<strong>and</strong>s <strong>for</strong> cyan (aqua), M st<strong>and</strong>s <strong>for</strong><br />

magenta (pink), Y is yellow, <strong>and</strong> K st<strong>and</strong>s <strong>for</strong> black. The CMYK pigment model works like an<br />

”upside-down” version of the RGB (red, green, <strong>and</strong> blue) color model. The RGB scheme is used<br />

mainly <strong>for</strong> computer displays, while the CMYK model is used <strong>for</strong> printed color illustrations (hard<br />

copy).<br />

K is being defined as the minimum of C ′ , M ′ , <strong>and</strong> Y ′ so that C is really redefined as C ′ − K, M<br />

as M ′ − K, <strong>and</strong> Y as Y ′ − K.<br />

Conversion from RGB to CMYK:<br />

C ′ = 1 − R<br />

M ′ = 1 − G<br />

Y ′ = 1 − B<br />

K = min(C ′ , M ′ , Y ′ )<br />

C = C ′ − K<br />

M = M ′ − K<br />

Y = Y ′ − K<br />

Defining K (black) from CMY is called undercolor removal. Images become darker than they would be with CMY alone, and there is less need for the expensive printing colors C, M and Y, which also need time to dry on paper.
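A small sketch of the RGB-to-CMYK conversion with undercolor removal in Python (values normalized to 0..1):

def rgb_to_cmyk(r, g, b):
    """Convert RGB to CMYK: K is the minimum of the preliminary C', M', Y',
    and the final C, M, Y are reduced by K (undercolor removal)."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b   # preliminary C', M', Y'
    k = min(c, m, y)
    return (c - k, m - k, y - k, k)

def cmyk_to_rgb(c, m, y, k):
    """Inverse conversion back to RGB."""
    return (1.0 - (c + k), 1.0 - (m + k), 1.0 - (y + k))

# The four-color value from the exam question below:
# 70% cyan, 0% magenta, 50% yellow, 30% black.
print(cmyk_to_rgb(0.7, 0.0, 0.5, 0.3))   # approximately (0.0, 0.7, 0.2)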

Exam questions:

• According to which formula is a CMYK color representation converted into an RGB representation?

• Give the conversion rule for an RGB color value into the CMY model and into the CMYK model and explain the meaning of the individual color components! What is the CMYK model used for?

• Compare the method of color generation on a cathode-ray tube monitor with that of offset printing. Which color models are used in each case?

• In four-color printing, a color value is given by 70% cyan, 0% magenta, 50% yellow and 30% black. Convert this color value into the RGB color model and describe the hue in words!



Answer: We have

C_CMYK = (0.7, 0.0, 0.5, 0.3)^T
C_CMY = (1, 0.3, 0.8)^T
C_RGB = (0, 0.7, 0.2)^T

The color corresponds to a slightly bluish green.

5.8 HSI-Model<br />

The hue-saturation-intensity color model derives from a transformation of the RGB color space that is rather complicated. The HSI model is useful when analyzing images where color and intensity are each important by themselves. One may also improve an image in its HSI version rather than in its natural RGB representation.

Slide 5.44 introduces the transition from RGB to HSI. A color located at P in the RGB triangle has its hue H described by the angle with respect to the red axis. Saturation S is the distance from the white point, thus from the point of equal RGB at the center of the triangle.

Intensity is not contained within the triangle of Slide 5.44, but is measured perpendicular to the triangle plane, as Slide 5.45 explains. The HSI model thus has a pyramid-like shape. It is visualized in Slide 5.46.

Conversion of RGB to HSI has been explained in concept, but it is based on a rather elaborate algorithm. The easiest element is the intensity I, which simply is I = (R + G + B)/3. We do not detail H and S here, nor the inverse conversion from HSI to RGB; Algorithms 11 and 12 list both conversions.

5.9 YIQ-Model<br />

Exam questions:

• On the YIQ color model:

1. What is the meaning of the Y component in the YIQ color model?
2. Where is the YIQ color model used?

• A color value C_RGB = (R, G, B)^T in the RGB color model is converted into the corresponding value C_YIQ = (Y, I, Q)^T in the YIQ color model according to the following rule:

C_YIQ = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{pmatrix} · C_RGB

Which biological fact is expressed by the first row of this matrix? (Hint: consider where the YIQ color model is used and what the meaning of the Y component is in that context.)

5.10 HSV and HLS Models

Variations on the HSI model are available.



Definition 16 YIQ color model

This model is used in U.S. TV broadcasting. The RGB to YIQ transformation is based on a well-known matrix M:

M = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{pmatrix}

The Y component is all one needs for black & white TV. Y gets the highest bandwidth, I and Q get less. Transmission of I and Q is separate from Y; I and Q are encoded in a complex signal.

RGB to YIQ conversion:

Y = 0.299 · R + 0.587 · G + 0.114 · B
I = 0.596 · R − 0.275 · G − 0.321 · B
Q = 0.212 · R − 0.523 · G + 0.311 · B

YIQ to RGB conversion:

R = 1 · Y + 0.956 · I + 0.621 · Q
G = 1 · Y − 0.272 · I − 0.647 · Q
B = 1 · Y − 1.105 · I + 1.702 · Q

Again, simple image processing such as histogram changes can take place on Y only. Color does not get affected, since it is encoded in I and Q.
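A short sketch of the YIQ conversion in Python (numpy assumed), using the matrix M of Definition 16:

import numpy as np

M = np.array([
    [0.299,  0.587,  0.114],
    [0.596, -0.275, -0.321],
    [0.212, -0.523,  0.311],
])

def rgb_to_yiq(rgb):
    """Convert an RGB triple (0..1) to YIQ via the matrix M."""
    return M @ np.asarray(rgb, dtype=float)

def yiq_to_rgb(yiq):
    """Convert a YIQ triple back to RGB."""
    return np.linalg.inv(M) @ np.asarray(yiq, dtype=float)

# A pure gray (equal RGB) carries no chrominance: I and Q are (close to) zero.
print(rgb_to_yiq([0.5, 0.5, 0.5]))   # approx. [0.5, 0.0, 0.0]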

The HSV model (hue-saturation-value) is also called the HSB model, with B standing for brightness. This responds to the intuition of an artist, who thinks in terms of tint, shade and tone. We introduce a cylindrical coordinate system, and the model defines a hexcone.

Compare the coordinates with Slide 5.44. The hue is again measured as an angle around the vertical axis, in this case in steps of 120 degrees going from one primary color to the next (red at 0 degrees, green at 120 degrees, blue at 240 degrees; the intermediate angles are yellow, cyan and magenta). The saturation S is a ratio going from 0 at the central axis of the hexcone to 1 at its side. The value V varies between 0 for black and 1 for white. Note that the top of the hexcone can be obtained by looking at the RGB cube along the diagonal axis from white to black. This is illustrated in Slide 5.45. This also provides the basic idea for converting an RGB input into the HSV color model.

The HLS (hue-lightness-saturation) color model is defined by a double hexcone shown in Slide 5.49. The HLS model is essentially obtained as a deformation of the HSV model by pulling up the center of the base of the hexcone (the V = 1 plane). Therefore a transformation of RGB into the HLS color model is similar to the RGB to HSV transformation.

The HSV color space is visualized in Slide 5.52. Similarly, Slide 5.53 illustrates an entire range of color models in the form of cones and hexcones.

Exam questions:

• Consider the color value C_RGB = (0.8, 0.4, 0.2)^T in the RGB color model. Estimate graphically the position of the color value C_HSV in Figure B.32 (i.e. the equivalent of C_RGB in the HSV model). Also sketch the position of a color value C'_HSV that has the same hue and the same brightness as C_HSV but only half the color saturation!



Algorithm 11 Conversion from RGB to HSI
1: Input: real R, G, B, the RGB color coordinates to be converted.
2: Output: real H, S, I, the corresponding HSI color coordinates.
3: float Z, n, delta
4: Z = ((R − G) + (R − B)) · 0.5
5: n = sqrt((R − G) · (R − G) + (R − B) · (G − B))
6: if n ≠ 0 then
7:   delta = acos(Z / n)
8: else
9:   delta = 0.0
10: end if
11: {hue, saturation and intensity following the standard HSI formulas of [GW92]}
12: if B ≤ G then
13:   H = delta
14: else
15:   H = 2π − delta
16: end if
17: S = 1 − 3 · min(R, G, B) / (R + G + B)
18: I = (R + G + B) / 3
13: if B



Algorithm 12 Conversion from HSI to RGB<br />

1: Input, real H, S, I, the HSI color coordinates to be converted.<br />

2: Output, real R, G, B, the corresponding RGB color coordinates.<br />

3: float H, S, I<br />

4: float rt3, R, G, B, hue<br />

5:<br />

6: if S = 0 then<br />

7: R=I, G=I, B=I<br />

8: else<br />

9: rt3=1/sqrt(3.0)<br />

10: end if<br />

11:<br />

12: if 0.0



Algorithm 13 Conversion from RGB to HSV
1: Input: real R, G, B, the RGB color coordinates to be converted.
2: Output: real H, S, V, the corresponding HSV color coordinates.
3: real rc, gc, bc, rgbmax, rgbmin
4: rgbmax = max(R, G, B)
5: rgbmin = min(R, G, B)
6: V = rgbmax {compute the value}
7: {compute the saturation}
8: if rgbmax ≠ 0.0 then
9:   S = (rgbmax − rgbmin) / rgbmax
10: else
11:   S = 0.0
12: end if
13: {compute the hue}
14: if S = 0.0 then
15:   H = 0.0
16: else
17:   rc = (rgbmax − R) / (rgbmax − rgbmin)
18:   gc = (rgbmax − G) / (rgbmax − rgbmin)
19:   bc = (rgbmax − B) / (rgbmax − rgbmin)
20:   if R = rgbmax then
21:     H = bc − gc
22:   else if G = rgbmax then
23:     H = 2.0 + rc − bc
24:   else
25:     H = 4.0 + gc − rc
26:   end if
27:   H = H · 60.0
28:   H = rmodp(H, 360.0) {make sure H lies between 0 and 360}
29: end if
34: end if



Algorithm 14 Conversion from HSV to RGB
1: Input: real H, S, V, the HSV color coordinates to be converted.
2: Output: real R, G, B, the corresponding RGB color coordinates.
3: real f, hue, p, q, t; integer i
4: if S = 0.0 then
5:   R = V, G = V, B = V
6: else
7:   hue = rmodp(H, 360.0) {make sure hue lies between 0 and 360}
8:   hue = hue / 60.0
9:   i = int(hue)
10:   f = hue − real(i)
11:   p = V · (1.0 − S)
12:   q = V · (1.0 − S · f)
13:   t = V · (1.0 − S + S · f)
14:   if i = 0 then
15:     R = V, G = t, B = p
16:   else if i = 1 then
17:     R = q, G = V, B = p
18:   else if i = 2 then
19:     R = p, G = V, B = t
20:   else if i = 3 then
21:     R = p, G = q, B = V
22:   else if i = 4 then
23:     R = t, G = p, B = V
24:   else if i = 5 then
25:     R = V, G = p, B = q
26:   end if
27: end if

40: end if



Algorithm 15 Conversion from RGB to HLS
1: Input: real R, G, B, the RGB color coordinates to be converted.
2: Output: real H, L, S, the corresponding HLS color coordinates.
3: real rgbmax, rgbmin
4: {compute lightness}
5: rgbmax = max(R, G, B)
6: rgbmin = min(R, G, B)
7: L = (rgbmax + rgbmin) / 2.0
8: {compute saturation; the following lines complete the listing with the standard HLS formulas}
9: if rgbmax = rgbmin then
10:   S = 0.0
11: else
12:   if L ≤ 0.5 then
13:     S = (rgbmax − rgbmin) / (rgbmax + rgbmin)
14:   else
15:     S = (rgbmax − rgbmin) / (2.0 − rgbmax − rgbmin)
16:   end if
17: end if
18: {compute the hue exactly as in Algorithm 13}
14: if L



Algorithm 16 Conversion from HLS to RGB<br />

1: Input, real H, L, S, the HLS color coordinates to be converted.<br />

2: Output, real R, G, B, the corresponding RGB color coordinates.<br />

3: real B, G, H, hlsvalue, L, m1, m2, R, S<br />

4:<br />

5: if L



Figure 5.2: a plane of the HSV color model (the hexagon spanned by red, yellow, green, cyan, blue and magenta, with white at the center)

Answer: We have (see Figure 5.2):

C_HSV = (20°, 75%, 0.8)
C'_HSV = (20°, 37.5%, 0.8)
C'_RGB = (0.8, 0.6, 0.5)

Halving the saturation in the HSV model means halving the distance from the center. The components of the corresponding point in the RGB model lie closer together, but their order is preserved.

• Which color lies "in the middle" when one interpolates linearly between the colors yellow and blue in the RGB color space? Which color space would be better suited for such an interpolation, and which color would lie between yellow and blue in that color space?

5.11 Image Processing with RGB versus HSI Color Models<br />

An RGB color test pattern is shown in Slide 5.51. This test pattern is used to calibrate printers, monitors, scanners and image color throughout a color-based production system. This particular test pattern is digital and offers 8 bits each of red, green and blue. The pattern is symmetric from top to bottom: it consists of one black band on top; bands two, three and four are the primary colors; for the RGB model, bands 5, 6 and 7 are the secondary colors; band 8 should be white; band 9 then is a continuous variation from blue to red; a further band is a gray wedge.

The band of rainbow colors shown in Slide 5.52 is obtained by continuously varying, from left to right, the intensity of blue from 1 to 0 and of red from 0 to full intensity, while green goes from 0 to full and back to 0 across the band. Using the process we have conceptually hinted at for the HSI model, this RGB image is converted into an HSI image. The easy part is the computation of the intensity I; the complex part is the computation of hue and saturation. In Slide 5.46 we are looking at the same pattern in terms of hue: we see that we have lost all sense of color and essentially have a bright image on the left and a dark image on the right in the color band. Looking at the saturation, the variation in the various colors has also disappeared; what remains is the variation of saturation going from left to center to right within the color band. Most of the information is in the intensity band, although some differences in colors have disappeared there.



The advantage of the HSI model is that we can optimize an image by just optimizing the intensity<br />

segment of the HSI presentation. It is not uncommon that one goes from the RGB into the<br />

HSI color model, modifies the intensity b<strong>and</strong> only <strong>and</strong> then does the trans<strong>for</strong>mation back into<br />

RGB. This typically will apply <strong>for</strong> histogram modifications of color images. As stated earlier<br />

this optimization will preserve the color <strong>and</strong> saturation <strong>and</strong> it will only change the contrast as<br />

we perceive it through the intensity of the image. Doing the optimization on each color b<strong>and</strong><br />

separately will give us unpredictable color results.<br />

Slide 5.53 illustrates the approach by means of an underexposed RGB original of a cockatoo. An HSI transformation followed by histogram equalization of just the intensity band produces the result shown next: a much improved and satisfactory image.

A similar ideology is used when one creates color products from multiple input sources: an example<br />

might be a high resolution black <strong>and</strong> white satellite image at one meter pixel size that is being<br />

combined with a lower resolution color image in RGB at 4 meter resolution. A process to combine<br />

those two image sources takes the RGB low resolution image <strong>and</strong> converts it into an HSI-model.<br />

The I component is then removed <strong>and</strong> <strong>for</strong> it one inserts the higher resolution black <strong>and</strong> white<br />

satellite image. The result is trans<strong>for</strong>med back into RGB space. The entire operation requires of<br />

course that all images have the same pixel size <strong>and</strong> are a perfect geometric match.<br />
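The following Python sketch outlines this kind of intensity substitution. The conversion pair rgb_to_hsi / hsi_to_rgb is assumed to exist along the lines of Algorithms 11 and 12 and is passed in as an argument; both images are assumed to be numpy arrays of identical size that are already geometrically co-registered.

def fuse_pan_with_color(rgb_lowres_upsampled, pan_highres, rgb_to_hsi, hsi_to_rgb):
    """Replace the intensity band of a low-resolution color image with a
    high-resolution panchromatic image, as described in the text.

    rgb_lowres_upsampled : (H, W, 3) color image resampled to the pan pixel size
    pan_highres          : (H, W) panchromatic intensity image, values in 0..1
    rgb_to_hsi, hsi_to_rgb : conversion functions in the spirit of Algorithms 11/12,
                             working band-wise on arrays
    """
    h, s, _ = rgb_to_hsi(rgb_lowres_upsampled)   # discard the low-resolution intensity
    return hsi_to_rgb(h, s, pan_highres)          # insert the high-resolution intensity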

5.12 Setting Colors<br />

We have now found that a great number of different color models exist that allow us to define<br />

colors in various ways. Slide 5.60 is a pictorial summary of the various color models. The models<br />

shown are those that are common in the image processing <strong>and</strong> technical arena. The most popular<br />

color model in the very large printing <strong>and</strong> graphic arts industry is not shown here, <strong>and</strong> that is<br />

the CMYK model. Setting a color on a monitor or printer requires that the color be selected on<br />

a model that the output device uses.<br />

Let us assume that we choose the red, green, blue model <strong>for</strong> presentation of an image on a color<br />

monitor. It is customary to change the red, green <strong>and</strong> blue channels in order to obtain a desired<br />

output color. Inversely, an output color could be selected <strong>and</strong> the RGB components from which<br />

that output color is created are being set automatically.<br />

If we were to choose the HSV color model, we would create a sample color by selecting an angle for the hue, shifting the saturation on a slider between 0 and 1, and setting the value, also between 0 and 1, obtaining the resulting color in the process. Inversely, a chosen color could be converted into its HSV components.

Finally, the HSI <strong>and</strong> RGB models can be looked at simultaneously: as we change the HSI values,<br />

the system instantaneously computes the RGB output <strong>and</strong> vice versa. In the process, the<br />

corresponding colors are being shown as illustrated in Slide 5.63.<br />

Optical illusions are possible in comparing colors: Slide 5.64 shows the same color appearing<br />

differently when embedded in various backgrounds.<br />

5.13 Encoding in Color<br />

This is the topic of pseudo-color in image processing where we assign color to gray values in order<br />

to highlight certain phenomena <strong>and</strong> make them more easily visible to a human observer. Slide 5.67<br />

illustrates a medical X-ray image, initially in a monochrome representation. The gray values can<br />

be “sliced” into 8 different gray value regions which then can be encoded in color. The concept of<br />

this segmentation into gray value regions is denoted as intensity slicing, sometimes density slicing<br />

<strong>and</strong> is illustrated in Slide 5.66. The medical image may be represented by Slide 5.66 where the



gray values are encoded as f(x, y). A plane is defined that intersects the gray values at a certain level l_i; one can now assign all pixels with a value greater than l_i to one color. All pixels below the slicing plane can be assigned to another color, and by moving the slicing plane we can see very clearly on a monitor which pixels are higher and which are lower than the slicing plane. This becomes a much more easily interpretable situation than one in which we see the original gray values only.

Another way of assigning color to a black and white image is illustrated in Slide 5.68. The idea is to take an input value f(x, y) and apply to it three different transformations, one into a red image, one into a green image and one into a blue image, so that the three resulting images are assigned to the red, green and blue guns of a monitor. A variety of transformations is available to obtain a colorful output from a black and white image. Of course, the scheme of Slide 5.68 is nothing but a more general version of the specialized slicing plane applied in the previous Slide 5.66.
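A minimal sketch of this idea in Python (numpy assumed; the three transfer functions below are purely illustrative and are not the ones shown on the slides):

import numpy as np

def pseudo_color(gray):
    """Map a gray value image (values 0..1) to RGB by applying three
    different transfer functions, one per color gun."""
    r = np.clip(2.0 * gray, 0.0, 1.0)          # red rises quickly
    g = np.clip(2.0 * gray - 0.5, 0.0, 1.0)    # green follows later
    b = 1.0 - gray                             # blue decreases with the gray value
    return np.stack([r, g, b], axis=-1)

def intensity_slice(gray, level, low_color=(0.0, 0.0, 1.0), high_color=(1.0, 0.0, 0.0)):
    """Two-color intensity slicing: pixels above `level` get one color,
    pixels at or below get another."""
    mask = gray > level
    out = np.empty(gray.shape + (3,))
    out[mask] = high_color
    out[~mask] = low_color
    return out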

Slide 5.69 illustrates the general transformation from gray level to color with the example of an X-ray image obtained from a luggage checking system at an airport. We can see in the example how various color transformations enhance luggage with and without explosives, such that even a casual observer would notice the explosive in the luggage very quickly. We skip the discussion of the details of this very complex color transformation and refer to [GW92, chapter 4.6].

Exam questions:

• What is meant by a false color image and by a pseudo color image? Give a typical application for each.

5.14 Negative Photography<br />

The negative black and white photograph of Slide 5.71 is usually converted to a positive by inverting the gray values, as in Slide 5.72. This is demonstrated in Algorithm ??: we take a transformation that simply inverts the values 0 to 255 into 255 to 0. This trivial approach will not work with color photography. As shown in Slide 5.73, color negatives typically are masked with a protective layer that has a brown-reddish color. If one were to take an RGB scan of that color negative and convert it into a positive by inverting the red, green and blue components directly, one would obtain a fairly unattractive result, as shown in Slide 5.74. One first has to eliminate the protective layer: one goes to the edge of the photograph, finds an area that is not part of the image, and determines the RGB components that represent the protective layer; then one subtracts this R component from all pixel R values, and similarly for the G and B components. As a result we obtain a clean negative, as shown in Slide 5.75. If we now invert that negative we obtain a good color positive, as shown in Slide 5.76. One calls this type of negative a masked negative (compare Algorithm 18). There have been in the past developments of color negative film that is not masked. However, that film is for special purposes only and is not usually available.

Algorithm 18 Masked negative of a color image
1: locate a pixel p whose color is known in all planes {e.g. the unexposed film border}
2: for all planes plane do
3:   diff = grayvalue(p, plane) − known_grayvalue(p, plane) {estimate the "masking layer"}
4:   for all pixels of the picture do
5:     grayvalue(pixel, plane) = grayvalue(pixel, plane) − diff {correct the color}
6:     Invert(pixel) {invert the corrected negative pixel to get the positive}
7:   end for
8: end for
8: end <strong>for</strong>



Exam questions:

• Figure B.62 shows a scanned color film negative. Which steps are necessary to obtain a correct positive image from it by means of digital image processing? Take into account that the optical density of the film is greater than zero even in unexposed areas. Give the mathematical relationship between the pixel values of the negative and of the positive image!

5.15 Printing in Color<br />

As we observe advertisement spaces with their posters, we see colorful photographs <strong>and</strong> drawings<br />

which, when we inspect them from a short distance, are really the sum of four separate screened<br />

images.<br />

We have said earlier that <strong>for</strong> printing the continuous tone images are being converted into half<br />

tones <strong>and</strong> we also specified in a digital environment that each pixel is further decomposed by a<br />

dithering matrix into subpixels.<br />

When printing a color original, one typically uses the four-color approach and bases it on the cyan, magenta, yellow and black pigments, which are the primary colors from which the color images are produced. Each of the four color separates is screened, and the screen has an angle with respect to the horizontal or vertical. In order to avoid a Moiré effect caused by interference of the different screens with one another, the screens are slightly rotated with respect to one another. This type of printing is used in the traditional offset printing industry.

If printing is done directly from a computer onto plotter paper, then the dithering approach is used instead. If we look at a poster that is printed directly with a digital output device and not via an offset press, we can see how the dithering matrix is responsible for each of the dots on the poster. Again each dot is encoded by one of the four basic pigment colors: cyan, magenta, yellow or black.

Exam questions:

• Describe the generation of color in classical offset printing! Which color model is used, and how is the appearance of the Moiré effect prevented?

Answer: Four separate images (one each for the components cyan, magenta, yellow and black) are printed on top of each other (CMYK color model). Each layer is a halftone image, and the layers are slightly rotated against one another in order to prevent the Moiré effect.

5.16 Ratio Processing of Color Images <strong>and</strong> Hyperspectral<br />

Images<br />

We start out from a color image <strong>and</strong> <strong>for</strong> simplicity we make the assumption that we only have two<br />

color b<strong>and</strong>s, R, G, so that we can explain the basic idea of ratio imaging. Suppose a satellite is<br />

imaging the terrain in those two colors. As the sun shines onto the terrain, we will have a stronger<br />

illumination on terrain slopes facing the sun than on slopes that face away from the sun. Yet,<br />

the trees may have the exact same color on both sides of the mountain. When we look now at<br />

the image of the terrain, we will see differences between the slopes facing the sun <strong>and</strong> the slopes<br />

facing away from the sun.



In Slide 5.81 let’s take three particular pixels, one from the front slope, one from the back, <strong>and</strong><br />

perhaps a third pixel from a flat terrain, all showing the same type of object, namely a tree. We<br />

now enter these three pixels into a feature space that is defined by the green <strong>and</strong> red color axes.<br />

Not surprisingly, the three locations <strong>for</strong> the pixels that we have chosen are on a straight line from<br />

the origin. Clearly, the color of all three pixels is the same, but the intensity is different. We are<br />

back again with the ideology of the HSI-model.<br />

We can now create two images from the one color input image. Both of those images are black and white. In one case, we place at each pixel its ratio R/G, which corresponds to the angle that the pixel's vector forms with the abscissa. In the other image we place at each pixel the distance of the pixel from the origin of the feature space. As a result, we obtain one black and white image in Slide 5.82 that is clean of color and shows us essentially the variations in intensity as a function of slope. The other image, in Slide 5.83, shows us the image clean of variations of intensity, as if the terrain were all flat, so that only the variations of color remain. Conceptually, one image is the I component of an HSI transformation and the other one is the H component. Such ratio images have in the past been used to take satellite images and make an estimate of the slope of the terrain, assuming that the terrain cover is fairly uniform. That clearly is the case on glaciers, in the Arctic or Antarctic, or in heavily wooded areas.
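A minimal sketch of ratio imaging for a two-band image in Python (numpy assumed; `red` and `green` are co-registered 2D arrays with positive values):

import numpy as np

def ratio_images(red, green, eps=1e-6):
    """Split a two-band (R, G) image into a color-like and an intensity-like image.

    The ratio R/G (equivalently, the angle of the pixel's vector in the
    feature space) is largely insensitive to illumination, while the vector
    length reflects the illumination and thus the terrain slope."""
    ratio = red / (green + eps)              # hue-like band, illumination-free
    magnitude = np.sqrt(red**2 + green**2)   # intensity-like band
    return ratio, magnitude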


Slide 5.1 through Slide 5.83 (thumbnail overview of the chapter's slides)

Exam questions:

• What is a "ratio image"?

• For what purpose would a user produce a so-called "ratio image"? Please use a sketch in your answer to explain what a ratio image is.




Chapter 6<br />

Image Quality<br />

6.1 Introduction<br />

As image quality we generally denote an overall impression of the crispness, the color, the detail and the composition of an image. Slide 6.2 is an example of an exciting image with a lot of detail, crispness and color. Slide 6.3 adds the excitement of motion and a sentiment of activity and cold. Generally in engineering we do not deal with these concepts, which are more artistic and aesthetic. We deal with exact definitions.

6.2 Definitions<br />

In images we define quality by various components. Slide 6.5 illustrates radiometric concepts of<br />

quality that relate to density <strong>and</strong> dynamic range. Density 0 means that the light can go through<br />

the image unhindered, density 4 means that the image blocks the light. Intensity is the concept<br />

associated with the object. Greater intensity means that more light is coming from the object.<br />

The dynamic range of an image is the greatest density value divided by the least density value<br />

in the image, the darkest value divided by the brightest value. The dynamic range is typically<br />

encoded logarithmically.<br />

Exam questions:

• What is meant by the "dynamic range" of a medium for the reproduction of pictorial information, and how is it related to the quality of the representation? Rank some common media in ascending order of their dynamic range.

6.3 Gray Value <strong>and</strong> Gray Value Resolutions<br />

We have already described in earlier presentations the idea of resolving gray values. Chapter 3 introduced the concept of a gray wedge and how a gray wedge gets scanned to assess the quality of a scanning process. Similarly, we can assess the quality of an image by describing how many different gray values the image can contain. Slide ?? illustrates the resolution of a gray value image.

Note again that in this case we talk about the gray values in an image whereas in the previous<br />

chapter we talked about the quality of the conversion of a given continuous tone image into a<br />

digital rendition in a computer in the process of scanning.<br />


Resolving great radiometric detail means that we can recognize objects in the shadow, while we<br />

also can read writing on a bright roof. Resolution of the gray values in the low density bright<br />

areas does not compromise a resolution in the high density dark areas. Slide ?? is a well resolved<br />

image.<br />

Exam questions:

• What is meant by the gray value resolution of a digital raster image?

Answer: the number of different gray values that can be represented in the image.

6.4 Geometric Resolution<br />

Again, just as in the process of scanning an image, we can judge the image itself independently of its digital or analog format. I refer to an earlier illustration which describes, by means of the US Air Force (USAF) resolution target, how the quality of an image can be characterized by how well it shows small objects on the ground or in the scene. We recall that the USAF target, when photographed, presents to the camera groups of line patterns and, within each group, elements. In the particular case of Slide 6.10, group 6, element 1 is the finest element still resolved. We know from an accompanying table that that particular element in group 6 represents a resolution of 64 line pairs per mm. We can see in the lower portion of the slide that element 6 in group 4 represents 28 line pairs per mm.

We have now in Slide 6.11 a set of numbers typical of the geometric resolution of a digital image. One resolution measure we typically deal with is dots per inch, for example when something is printed. A high resolution is 3000 dots per inch, a low resolution is 100 dots per inch. Note

that at 3000 dots per inch, each point is about 8 micrometers, recall that 1000 dots per inch is 25<br />

micrometers per pixel. Which leads us to the second measure of geometric resolution: the size of<br />

a pixel. When we go to a computer screen, we have a third measure <strong>and</strong> we say the screen can<br />

resolve 1024 by 1024 pixels, irrespective of the size of the screen.<br />

Recall the observations about the eye and the fovea. We said that we had about 150 000 cone elements per mm² on the fovea. So when we focus our attention on the computer monitor, those

1000 by 1000 pixels would represent the resolution of about 3 by 3 mm on the retina. We may<br />

really not have any use <strong>for</strong> a screen with more resolution, bec<strong>aus</strong>e we wouldn’t be able to digest<br />

the in<strong>for</strong>mation on the screen in one glance, bec<strong>aus</strong>e it would overwhelm our retina.<br />

A next measure of resolution is the number of line pairs per mm, mentioned earlier. 25 line pairs per mm is a good average resolution for photography on a paper print, and 50 line pairs per mm is a very good resolution on film. The best resolutions can be obtained with spy photography, which uses very slow film and needs long exposure times, but is capable of resolving great detail. In that case we get in excess of 75 line pairs per mm.

It might be of interest to define the geometric resolution of the unaided eye: that is 3 to 8 pixels per mm at a distance of 25 cm. Again, when a person sits in front of a monitor and starts seeing the images as a continuous pattern, not recognizing individual pixels, then at 3 pixels per mm the screen could have a dimension of 300 by 300 mm. For an eagle-eyed person at 8 pixels per mm, the surface that can still be resolved in this way would be about a 12 by 12 cm square.

It is of interest to relate these resolutions to one another; this is shown in Slide 6.15. Film may have n line pairs per mm. This represents 2.8 × n pixels per mm (see below). If we had film with 25 line pairs per mm, then we would have to represent this image at 14 micrometers per pixel under this relationship. On a monitor with a side length of 250 mm and 1024 pixels, one pixel has a dimension of 0.25 mm.



We can again confirm that if each pixel on a monitor occupies 250 micrometers (equal to 0.25 mm), we have 4 pixels per mm; people with normal vision typically perceive this as a continuous-tone image. The actual range is 125 to 300 micrometers per pixel.

The Kell factor, proposed during World War II in the context of television, suggests that resolving a single black-and-white line pair with 2 pixels is insufficient, because statistically we cannot be certain that those pixels fall directly on each dark line and on each bright line; they may fall halfway in between, and if they do, the line pair will not be resolved. Kell therefore proposed that the proper number of pixels per mm needed to resolve each line pair under all circumstances is 2√2 times the number of line pairs per mm.
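These conversions are simple enough to verify numerically. The following Python snippet is my own illustration, not part of the lecture material; it reproduces the numbers quoted above, assuming the Kell factor 2√2 ≈ 2.8 pixels per line pair.

import math

def dpi_to_micrometers(dpi):
    """Pixel size in micrometers for a given dots-per-inch value."""
    return 25400.0 / dpi                      # 1 inch = 25.4 mm = 25 400 micrometers

def pixel_size_for_line_pairs(lp_per_mm):
    """Maximum pixel size (micrometers) that still resolves lp_per_mm line pairs,
    assuming 2*sqrt(2) ~ 2.8 pixels per line pair (Kell factor)."""
    pixels_per_mm = 2.0 * math.sqrt(2.0) * lp_per_mm
    return 1000.0 / pixels_per_mm

print(dpi_to_micrometers(3000))               # ~8.5, i.e. "about 8 micrometers"
print(dpi_to_micrometers(1000))               # ~25 micrometers per pixel
print(pixel_size_for_line_pairs(25))          # ~14 micrometers, as stated above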

Exam questions:

• A very high-resolution infrared film is advertised with a geometric resolution of 70 line pairs per millimeter. At what maximum pixel size would this film have to be scanned in order to avoid any loss of information compared with the film original?

• Which measure is used to describe the geometric resolution of an image, and with which procedure is this resolution tested and quantified? Please include a sketch.

6.5 Geometric Accuracy<br />

An image always represents the object with a certain geometric accuracy. We have already taken a look at the basic idea when we talked in an earlier chapter about the conversion of a given analog picture into digital form. The geometric accuracy of an image is described by the sensor model, a concept mentioned in the chapter on sensors. There are deviations between the geometric locations of object points as a perfect camera would render them and the locations produced by our real camera. Those discrepancies can be described in a calibration procedure. Calibrating imaging systems is a big issue and has earned many diploma engineers and doctoral students their degrees in vision. The basic idea is illustrated in Slide 6.17.

Exam questions:

• What is meant by the geometric accuracy of a digital raster image?

6.6 Histograms as a Result of Point Processing or Pixel<br />

Processing<br />

The basic element of analyzing the quality of any image is a look at its histogram. Slide 6.19 illustrates a rather dark color input image for which we want to build the histogram: we find many pixels in the darker range and fewer in the brighter range. We can now change this image by redistributing the histogram in a process called histogram equalization. We see, however, in Slide 6.20 that we have a histogram for each of the color component images, while we only show a composite of the colors denoted as luminosity. The summary of this manipulation is shown in Slide 6.22.

A very common improvement of an image’s quality is a change of the assignment of gray values to the pixels of the image. This change is based on the histogram.



Algorithm 19 Histogram equalization

1: For an N × M image of G gray levels (often 256), create an array H of length G initialized with 0 values.

2: Form the image histogram: scan every pixel and increment the relevant member of H; if pixel p has intensity g_p, perform H[g_p] = H[g_p] + 1.

3: Form the cumulative image histogram H_c.

4: Set

H_c[0] = H[0]
H_c[p] = H_c[p − 1] + H[p]   for p = 1, 2, ..., G − 1
T[p] = round( (G − 1) / (N · M) · H_c[p] )

5: Rescan the image and write an output image with gray levels g_q, setting g_q = T[g_p] for every input pixel of gray level g_p.
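A compact implementation of Algorithm 19 might look as follows. This is a sketch in Python/NumPy for illustration; the function name and the use of NumPy are my own choices, not part of the lecture material.

import numpy as np

def histogram_equalization(image, G=256):
    """Histogram equalization of a gray value image following Algorithm 19.
    image: 2D array of integer gray levels in [0, G-1]."""
    N, M = image.shape
    # Step 2: image histogram H
    H = np.bincount(image.ravel(), minlength=G)
    # Steps 3-4: cumulative histogram H_c and gray level mapping T
    Hc = np.cumsum(H)
    T = np.round((G - 1) / (N * M) * Hc).astype(image.dtype)
    # Step 5: rescan the image through the look-up table T
    return T[image]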

Definition 17 Histogram stretching

Stretching or spreading of a histogram means mapping the gray value of each pixel of an image, or of part of an image, through a piecewise continuous function T(r).

Normally the gradation curve T(r) is monotonically increasing and spreads a small range of gray values of the input image over the entire range of available values, so that the resulting image looks as if it had a lot more contrast.



Let us assume that the gray values of an 8-bit input image are occupied very unevenly: gray value 67 may have 10 000 pixels, gray values 68 and 69 may have none, but gray value 70 may again have 7 000 pixels. We can change this image by reallocating the input gray values to new output values depending on their frequency as seen in the histogram.

We aim <strong>for</strong> a histogram that is as uni<strong>for</strong>m as possible. Slide 6.23 shows a detail of the previous<br />

slide in one case with the input histogram <strong>and</strong> in a second case with the equalized histogram: we<br />

have attempted to distribute the gray values belonging to the input pixels such that the histogram<br />

is as uniform as possible. Slide 6.24 shows how we can change the gray values of an input image B into an output image C. Geometrically we describe the operation by a 2D diagram with the abscissa for the input gray values and the ordinate for the output gray values. The relationship between input and output is shown by the curve in the slide, which represents a look-up table.

Slide 6.25 illustrates again how an input image can be changed <strong>and</strong> how a certain area of the<br />

input image can be highlighted in the output image. We simply set all input pixels below a<br />

certain threshold A <strong>and</strong> above a certain threshold B to zero, <strong>and</strong> then set the intermediate range<br />

<strong>and</strong> spread that range to a specific value in the output image. Another method of highlighting is<br />

to take an input image <strong>and</strong> convert it one-on-one to an output image with the exception of a gray<br />

value range from a lower gray value A to an upper gray value B which is set into one output gray<br />

value, thereby accentuating this part of the input image.<br />

Another analysis is shown in Slide 6.26, where we represent the 8 bits of a gray value image as 8 separate images; in each bit plane we see whether that bit is set in the byte of each pixel. Bit plane 7 is the most significant, bit plane 0 the least significant. We obtain information about the contents of an image as shown in Slide 6.27, where we see the 8 levels of an image and note that bit plane 7 looks like a thresholded image while bit plane 0 carries basically no information. There is very little information in the lower three bits of that image; in reality we may therefore not be dealing with an 8-bit image but effectively with a 5-bit image.

Histograms let us see where the pixels are aggregated. Pixels clustered at low digital numbers indicate a dark image; pixels clustered at high digital numbers indicate a bright image. A narrow histogram with a single peak indicates a low-contrast image, because it does not contain many different gray values. However, if all gray values of an image are occupied by pixels, and if those are equally distributed, we obtain a high-contrast, high-quality image.

How do we change the histogram of an image <strong>and</strong> spread it or equalize it? We think of the image<br />

gray values in the input on the abscissa of a 2D-diagramm <strong>and</strong> translate them to output gray<br />

values on the ordinate. We use a curve that relates the input to the output pixels. The curve is<br />

denoted as gradation curve or t(r). Let’s take an example of an image with very low contrast as<br />

signified by a histogram that has only values in the range round 64 <strong>and</strong> 10 gray values to the left<br />

<strong>and</strong> to the right.. We now spread this histogram by a curve t(r) that takes the input values where<br />

we have many <strong>and</strong> spreads them over many values in the output image. As a result we now have<br />

pixels values spread over the entire range of available values, so that the image looks as if it had<br />

a lot more contrast. We may not really be able to change the basic shape of the histogram, but<br />

we can certainly stretch it as shown in slide ??. Equlisation is illustrated in slide ??.<br />

We may want to define a desired histogram <strong>and</strong> try to approach this histogram given an input<br />

image which may have a totally different histogram. How does this work? Slide 6.36 explains.<br />

Let’s take a thermal image of an indoor scene. We show the histogram of this input image <strong>and</strong><br />

<strong>for</strong> compansion we also illustrate the result of equalization. We would like, however, to have a<br />

histogram as shown in the center of the histogram display of the slide. We change the input<br />

histogram to approach the designed histogram as best as we possbile obtaining the third image.<br />

The resulting image permits one to see chairs in the room.<br />

Slide 6.36 summarizes that enhancement is the improvement of an image by processing each pixel separately from the other pixels. This may not only concern contrast but could address noise as well. A noisy input image can become worse if we improve the histogram, since we may amplify the noise. If we perform a type of histogram equalization that changes locally, we might obtain an improvement of the image structure and increased ease of interpretability.



An example is shown in slide 6.38. We have a very noisy image <strong>and</strong> then embedded in the image<br />

are 5 targets of interest. With a global histogram process we may not be able to resolve the detail<br />

within those 5 targets. We might enhance the noise that already exists in the image <strong>and</strong> still not<br />

see what is inside the targets. However, when we go through the image <strong>and</strong> we look at individual<br />

segments via a small window <strong>and</strong> we improve the image locally, moving the window from place<br />

to place with moving new parameters at each location, we might obtain the result as shown in<br />

the next component of the Slid. We find that detail within each target consists of a point <strong>and</strong> a<br />

square around that point.<br />

Algorithm 20 Local image improvement

g(x, y) ... result image
f(x, y) ... input image

g(x, y) = A(x, y) · {f(x, y) − m(x, y)} + m(x, y)

where

A(x, y) = k · M / σ(x, y)

with

k ... constant
M ... global mean gray value
m(x, y) ... local mean gray value in the window
σ(x, y) ... standard deviation of the gray values in the window

We have taken an input image f(x, y) <strong>and</strong> created a resulting image g(x, y), by a <strong>for</strong>mula shown<br />

in slide 6.39. There is a coefficient A(x, y) involved <strong>and</strong> a mean of m(x, y). So in a window we<br />

computed mean gray value m(x, y) as an average gray value, we subtract it from each gray value<br />

in the image f, we multiply the difference by a multiplication factor A(x, y) <strong>and</strong> then add back<br />

the mean m(x, y).<br />

What is this A(x, y)? It is itself a function of (x, y): in each window we compute the mean m(x, y) and the standard deviation σ(x, y) of the gray values. We also compute a global average M, separate from the average of each small window, and we have some constant k. Improvements of images according to this formula, and similar approaches, are heavily used in medical imaging and many other areas where images are presented to the eye for interactive analysis. We are processing images here before we analyze them; therefore, we call this preprocessing. A particular example of preprocessing is shown in the medical image of Slide 6.40, illustrating how a bland image with no apparent detail reveals its detail after some local processing.
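A direct transcription of Algorithm 20 is sketched below. The window size, the constant k, and the use of SciPy's uniform_filter for the local statistics are assumptions of this illustration, not prescriptions from the lecture.

import numpy as np
from scipy.ndimage import uniform_filter

def local_enhancement(f, window=15, k=0.8):
    """g(x,y) = A(x,y) * (f(x,y) - m(x,y)) + m(x,y), with A = k*M/sigma(x,y);
    m and sigma are computed in a sliding window, M is the global mean."""
    f = f.astype(np.float64)
    m = uniform_filter(f, size=window)                 # local mean m(x,y)
    m2 = uniform_filter(f * f, size=window)            # local mean of f^2
    sigma = np.sqrt(np.maximum(m2 - m * m, 1e-6))      # local standard deviation
    M = f.mean()                                       # global mean gray value
    A = k * M / sigma                                  # amplification factor A(x,y)
    g = A * (f - m) + m
    return np.clip(g, 0, 255).astype(np.uint8)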

Another idea is the creation of difference images, <strong>for</strong> example an X-ray image of a brain taken<br />

be<strong>for</strong>e <strong>and</strong> after some injection is given as a contrast agent. We then have an image of the brain<br />

be<strong>for</strong>e <strong>and</strong> after the contrast agent has entered into the blood stream. The two images can then<br />

be subtracted <strong>and</strong> will highlight the vessels that contain the contrast material.<br />

How else can we improve images? We can take several noisy images <strong>and</strong> average them. For<br />

example we can take a microscopic image of some cells, <strong>and</strong> a single image may be very noisy, but<br />

by repeating the image <strong>and</strong> computing the average of the gray values of each pixels we eliminate<br />

the noise <strong>and</strong> obtain a better signal. Slide 6.44 shows the effect of averaging 128 images.<br />

Exam questions:

• Given the gray value image in Figure B.59, determine the histogram of this image. With the help of the histogram, a threshold is to be found that is suitable for segmenting the image into background (small value, dark) and foreground (large value, bright). Give the threshold as well as the result of the segmentation in the form of a binary image (with 0 for the background and 1 for the foreground).



Figure 6.1: Histogram of a gray wedge (a constant count of 50 for every gray value from 0 to 255)

• Figure B.33 shows a gray wedge in which all gray values from 0 to 255 occur in ascending order; the width is 50 pixels. Draw the histogram of this image and pay attention to the correct numerical values. The black border in Figure B.33 only serves to clarify the outline and is not part of the image itself.

Answer: see Figure 6.1

• Figure B.74(a) shows the castle in Budmerice (Slovakia), where a student seminar¹ and the Spring Conference on Computer Graphics take place every year. Figure B.74(b) was generated from it by an automatic process, whereby some details (e.g. the clouds in the sky) were clearly enhanced. Name an operation that could have been applied here and comment on how it works.

• Sketch the histogram of a

1. dark,
2. bright,
3. low-contrast,
4. high-contrast

monochrome digital raster image.

Answer: see Figure 6.2; note that the area under the curve is always the same.

¹ Interested students in the computer graphics specialization track have the opportunity to attend this seminar free of charge and to present their seminar/project or diploma thesis there.



(a) dark
(b) bright
(c) low contrast
(d) high contrast

Figure 6.2: Histograms


(Slides 6.1 through 6.45 of this chapter are reproduced here as thumbnails.)


Chapter 7<br />

Filtering<br />

7.1 Images in the Spatial Domain<br />

We revisit the definition of an image space with its cartesian coordinates x <strong>and</strong> y to denote the<br />

columns <strong>and</strong> rows of pixels. We define a pixel at location (x, y) <strong>and</strong> denote its gray value with<br />

f(x, y). Filtering changes the gray value f of an input image into an output gray value g(x, y) in<br />

accordance with Slide 7.3. The trans<strong>for</strong>mation<br />

g(x, y) = T [f(x, y)]<br />

is represented by an operator T which acts on the pixel at location (x, y) <strong>and</strong> on its neighbourhood.<br />

The neighbourhood is defined by a mask which may also be denoted as template, window or filter<br />

mask. We can there<strong>for</strong>e state in general terms that: a filter is an operation that produces from<br />

an input image <strong>and</strong> its pixels f(x, y) an output image with pixels g(x, y) by a filter operator T .<br />

This operator uses in the trans<strong>for</strong>mation the input pixel <strong>and</strong> its neighbourhood to produce a value<br />

in the output pixel. We will see later that filtering is a concept encompassing many different<br />

types of operations to which the basic definition applies common. It may be of interest to note<br />

that some of the operations we have previously discussed can be classified as filter operations,<br />

namely the trans<strong>for</strong>mation of the image where an operation addresses a neighbourhood of size<br />

1 × 1. Those trans<strong>for</strong>mations produce from an input an output pixel via the transfer function<br />

T that one calls “point operations” or trans<strong>for</strong>mations of individual pixels. We have the special<br />

case of contrast enhancement in Slide 7.4, <strong>and</strong> of “thresholding”. Similarely, these operations on<br />

single pixels included the inversion of a negative to a positive as shown in Slide 7.5. The same<br />

type of operation is shown in Slide 7.6. The astronomic imaging sensors at times produce a very<br />

high density range that challenges the capabilities of film <strong>and</strong> certainly of monitors. On an 8-bit<br />

image we may not really appreciate the detail that a star may provide through a high resolution<br />

telescope. To do better justice to a high density range image a single pixel operation is applied<br />

that non-linearly trans<strong>for</strong>ms the input gray values into the output gray values. Again, in the<br />

narrow sense of our definition of filtering, this is a “filter operation”. However, we have previously<br />

discussed the same trans<strong>for</strong>mation under the name of contrast stretching. In this particular case,<br />

the contrast stretch is logarithmic.<br />

Exam questions:

• The operations “threshold” and “median”, applied to digital raster images, were discussed in the lecture. What is the relationship between these two operations in the context of filtering?




7.2 Low-Pass Filtering<br />

Let us define a mask of 3 by 3 pixels in Slide 7.8. We enter into that mask values w1, w2, ..., w9, which we call “weights”. We now place the 3 by 3 pixel mask on top of the input image, whose gray values are denoted zi. Let us assume that we center the 3 by 3 mask over the pixel z5, so that w5 lies on top of z5. We can now compute a new gray value g5 as the sum of the products wi · zi, in accordance with the slide. This describes an operation on an input image without specifying the values in the filter mask. We still need to assign such values to the 3 by 3 mask: a low-pass filter is filled with the set of values shown in the slide; in this example we assign the value 1/9 to each weight, so that the sum of all values is 1. Similarly, a larger mask of 5 × 5 values may be filled with 1/25, and a 7 × 7 filter mask with 1/49. These three examples are typical low-pass filters.

Slide 7.10 illustrates the effect of low-pass filter masks filled with weights of 1/k, with k being the number of pixels in the filter mask. Slide 7.10 shows the image of a light bulb and how the blur produced by the low-pass filter increases as the filter mask grows from 25 up to 625 values, i.e. from a window with a side length of 5 pixels up to one with a side length of 25 pixels.

We will next consider the analogy between “filtering” and “sampling”. Slide 7.11 shows an image and the gray value profile along a horizontal line (a row of pixels). The continuous gray value trace needs to be “sampled” into discrete pixels. Slide 7.12 shows the basic concept of the transition from the continuous gray value trace to a set of pixels. If we reconstruct the original trace from the discrete pixels, we obtain a new version of the continuous gray value trace. It turns out that this reconstruction is nothing else but a filtered version of the original. Sampling and signal reconstruction are thus an analogy to filtering, and sampling theory is related to filter theory.

Slide 7.13 illustrates one particular <strong>and</strong> important low-pass-filter: the sinc-filter. A sinc function<br />

is<br />

sinc(f) = sin(πf)<br />

πf<br />

<strong>and</strong> represents the Fourier-trans<strong>for</strong>m of a rectangular “pulse” in the Fourier-space (see below).<br />

Slide 7.13 illustrates how a filtered value of the input function is obtained from the large filter mask<br />

representing the sinc-function. By shifting the sinc-function along the abszissa <strong>and</strong> computing a<br />

filter value at each location, we obtain a smoothed version of the input signal. This is analogous<br />

to sampling the input signal <strong>and</strong> reconstructing it from the samples.<br />

Next we consider the median filter. This is a popular <strong>and</strong> frequently used operator. It inspects<br />

each gray value under the filter window <strong>and</strong> picks that gray value under that window which has<br />

half of the pixels with larger <strong>and</strong> the other half with smaller gray values. Essentially the gray<br />

values under the filter window are being sorted <strong>and</strong> the median value is chosen. Where would<br />

this be superior to an arithmetic mean? Clearly, the median filter does suppress high frequency<br />

in<strong>for</strong>mation or rapid changes in the image. Thus it suppresses salt <strong>and</strong> pepper noise. Salt <strong>and</strong><br />

pepper noise results from irregularities where individual pixels are corrupted. They might be<br />

either totally black or totally white.<br />

By applying a median filter one will throw out these individual pixels <strong>and</strong> replace them by one midrange<br />

pixel from the neighbourhood. The effect can sometimes be amazing. Slide 7.16 illustrates<br />

with a highly corrupted image of a female person, <strong>and</strong> a corruption of the image with about 20%<br />

of the pixels. Computing the arithmetic mean will produce a smoother image but will not do away<br />

with the effect of noise. Clusters of corrupted pixels will result in persistent corruptions of the<br />

image. However, the median filter will work a miracle. An image, almost as good as the input<br />

image, without many corruptions, will result.<br />

A median filter also has a limitation: If we have fine details in an image, say individual narrow<br />

linear features (an example would be telegraph wires in an aerial photo) then those pixels marking<br />

such a narrow object will typically get suppressed <strong>and</strong> replaced by the median value in their<br />

environment. As a result the fine linear detail would no longer show in the image.
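A median filter can be written in a few lines; the sketch below is an illustration that assumes SciPy is available and uses a 3 × 3 window.

import numpy as np
from scipy.ndimage import median_filter

def remove_salt_and_pepper(image, window=3):
    """Replace every pixel by the median of the gray values under the window.
    Isolated black/white outliers disappear; note that thin lines of width
    one pixel are suppressed as well (the limitation described above)."""
    return median_filter(image, size=window)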


Figure 7.1: Application of a median filter (worked example for the exam question below; the numeric pixel grids are not reproduced here)

Exam questions:

• Given Figure B.57 with the indicated line-shaped white disturbances, which method of correction do you propose in order to remove these disturbances? Please describe the method and explain why it will remove the disturbances.

• What is a median filter, what are its properties, and in which situations is it used?

• Apply a 3 × 3 median filter to the pixels inside the bold-framed region of the gray value image shown in Figure B.14. You can enter the result directly in Figure B.14.

Answer: see Figure 7.1

• Sketch the shape of the filter kernel of a Gaussian low-pass filter. What must be considered when choosing the filter parameters and the size of the filter kernel?

• Enter filter coefficients into the empty filter masks in Figure B.30 such that

1. Figure B.30(a) becomes a low-pass filter that leaves the DC component of the image signal unchanged,
2. Figure B.30(b) becomes a high-pass filter that completely suppresses the DC component of the image signal.

Answer: see Figure 7.3

(a) low-pass
(b) high-pass

Figure 7.2: Low-pass and high-pass filters



7.3 The Frequency Domain<br />

We have so far looked at images represented as gray values in an (x, y) Cartesian coordinate system. We call this the spatial-domain representation. There is another representation of images using sine and cosine functions, called the spectral representation. The transformation of the spatial-domain image f(x, y) into the spectral-domain representation F(u, v) is done via a Fourier transform:

F{f(x, y)} = F(u, v) = ∫∫ f(x, y) e^(−2jπ(ux+vy)) dx dy

The spectral representation uses the independent variables u and v, which are the frequencies in the two coordinate directions. The spectral representation can be converted back into a spatial representation by the inverse transform:

f(x, y) = ∫∫ F(u, v) e^(+2jπ(ux+vy)) du dv

In the discrete world of pixels, the double integral ∫∫ is replaced by a double summation ∑∑.

A filter operation can be seen as a convolution (Faltung), in accordance with Slide 7.18. The convolution is defined below and illustrated graphically in the slides. For simplicity, the two functions f(x) and g(x) are one-dimensional. They are convolved using an operation denoted by the symbol ∗:

f(x) ∗ g(x) = ∫ from t = −∞ to +∞ of f(t) g(x − t) dt

We define the function f(t) as a simple rectangle on the interval 0 ≤ t ≤ 1. The second function g(t) is also defined as a box on the interval 0 ≤ t ≤ 1. We mirror the function g to obtain g(−t), shift it to g(x − t), and form the product f(t) · g(x − t), shown in Slide 7.24 as the shaded area; this is illustrated at x = x1 and at x = x2. The convolution is the integral of this shaded area as we move g(x − t) into the various positions along the x axis. Where there is no overlap between the two functions, the product f · g is empty. As a result, the integral produces values that increase monotonically from 0 to c and then decrease from c back to 0 as the coordinate x goes from 0 through 1 to 2. This produces a “smoothed” version of the input function f.

It is now of interest to appreciate that a convolution in the spatial domain is a multiplication in the spectral domain, as was previously explained in Slide 7.18. We can thus execute a filter operation by transforming the input image f and the filter function h into the spectral domain, resulting in F and H. We multiply the two spectral representations and obtain G as the spectral representation of the output image. After an inverse Fourier transform of G we have the viewable output image g.

This would be the appropriate point in this course to interrupt the discussion of filtering and insert a “tour d’horizon” of the Fourier transform. We will not do this in this class, and reserve that discussion for a later course as part of the specialization track in “image processing”. The Fourier transform of an image is, however, only one of several transforms used in image processing; there are others, such as the Hadamard transform, the cosine transform, the Walsh transform and similar ones. Of interest now is the question of whether to filter in the spatial domain, as a convolution, or in the spectral domain, as a multiplication. At this point we only state that with large filter masks, at sizes greater than about 15 × 15 pixels, it may be more efficient to use the spectral representation. We do have the cost of three Fourier transforms (f → F, h → H, and the inverse transform G → g), but the actual convolution is replaced by the simple multiplication F · H.
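The equivalence can be demonstrated in a few lines of NumPy. This sketch deliberately ignores zero-padding and boundary effects (the product of the transforms corresponds to a circular convolution), so it is an illustration rather than a production implementation.

import numpy as np

def filter_in_frequency_domain(f, h):
    """Filter image f with spatial mask h by multiplying their Fourier transforms.
    Three transforms are needed: f -> F, h -> H and the inverse transform G -> g."""
    rows, cols = f.shape
    F = np.fft.fft2(f, s=(rows, cols))
    H = np.fft.fft2(h, s=(rows, cols))       # mask zero-padded to the image size
    G = F * H                                # multiplication instead of convolution
    g = np.real(np.fft.ifft2(G))
    return g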

Slide 7.25 now introduces certain filter windows <strong>and</strong> their representation both in the spatial <strong>and</strong><br />

spectral domains. presents a one-dimensional filter functions. We are, there<strong>for</strong>e, looking at a row


7.4. HIGH PASS-FILTER - SHARPENING FILTERS 137<br />

of pixels in the spatial domain or a row of pixels in the spectral domain through the center of the<br />

2D function. The 2D functions themselves are rotationally symmetric.<br />

A typical low-pass filter in the spatial domain will have a G<strong>aus</strong>sian shape. Its representation in<br />

the spatial domain is similar to its representation in the spectral domain. In the spectral domain it<br />

is evident that the filter rapidly approaches a zero-value, there<strong>for</strong>e suppressing higher frequencies.<br />

In the spectral domain a high-pass filter has a large value as frequencies increase <strong>and</strong> is zero at<br />

low frequencies. Such a high pass-filter looks like the so called mexican hat, if presented in the<br />

spatial domain. A b<strong>and</strong> pass-filter in two dimensions is a ring like a “donut shape”, <strong>and</strong> in the<br />

one dimensional case it is a G<strong>aus</strong>sian curve that is displaced with the respect to the origin. In<br />

the spatial domain the b<strong>and</strong>-pass filter-shape is similar to a “mexican hat”. However, the values<br />

in the high pass-filter are negative outside the central area in the spectral domain, whereas is in<br />

the b<strong>and</strong> pass-filter the shape goes first negative, then positive again .<br />

Exam questions:

• Describe, with the help of a sketch, the “appearance” of the following filter types in the frequency domain:

1. low-pass filter
2. high-pass filter
3. band-pass filter

7.4 High Pass-Filter - Sharpening Filters<br />

We now are ready to visit the effect of a high-pass filter. In the spatial domain, the shape of the<br />

high pass-filter was presented in Slide 7.26. In actual numerical values such a filter is shown in<br />

Slide 7.28. The filter window is normalized such that the sum of all values equals zero. Note that<br />

we have a high positive value in the center <strong>and</strong> negative values at the edge of the window. The<br />

pixel at the center of the window in the image will be emphasized <strong>and</strong> the effect of neighbouring<br />

pixels reduced. There<strong>for</strong>e small details will be accentuated. Background will be suppressed. It is<br />

as if we only had the high-frequency detail left <strong>and</strong> the low frequency variations disappear. The<br />

reason is obvious: in areas where the pixel values do not change very much, the output gray values become 0, because there are no differences among the gray values. The input pixels are replaced by the value 0 because we subtract from each gray value the average value of the surrounding pixels.

This high-pass filter can be used to emphasize (highlight) the geometric detail. But if we do<br />

not want to suppress the background, as we have seen in the pure high pass-filter, we need to<br />

re-introduce it. This leads to a particular type of filter that is popular in the graphic arts: the<br />

unsharp masking or USM. The high-pass filtered image really is the difference between the original image and a low-pass version of the image, so that only the high-frequency content survives. In USM we would like to have this high-pass version of the image augmented with the original image. We obtain this by taking the high-pass filtered version of the image and adding to it the original image multiplied by a factor A − 1, where A ≥ 1. If A = 1 we have a standard high-pass filter. As we increase A we add more and more of the original image back. The effect

is shown in Slide 7.30. In that slide we have a 3 by 3 filter window <strong>and</strong> the factor A is shown<br />

variably as being 1.1, 1.15, <strong>and</strong> 1.2. As A increases, the original image gets more <strong>and</strong> more added<br />

back in, to a point where we get overwhelmed by the amount of very noisy detail.<br />

Exam questions:



• Given a filter mask according to Figure ??. What kind of filter is this?

• Given an image according to Figure ??. What are the resulting pixels in the result image at the three marked locations after applying the filter mask from Figure ???

• One of the most popular filters is called “Unsharp Masking” (USM). How does it work? Please give a simple explanation in terms of formulas.

• Figure B.61 shows a digital raster image that, due to a superimposed disturbance, is brighter in the middle than at the border. Describe a procedure that removes this disturbance.

• The photo shown in Figure B.66 has low contrast and therefore appears somewhat “flat”.

1. Describe a procedure that improves the contrast of the image.
2. What other possibilities are there to improve the image quality as perceived by a human?

Is the information content of the image also increased by these methods? Justify your answer.

• Enter filter coefficients into the empty filter masks in Figure B.30 such that

1. Figure B.30(a) becomes a low-pass filter that leaves the DC component of the image signal unchanged,
2. Figure B.30(b) becomes a high-pass filter that completely suppresses the DC component of the image signal.

Answer: see Figure 7.3

(a) low-pass
(b) high-pass

Figure 7.3: Low-pass and high-pass filters

7.5 The Derivative Filter<br />

A very basic image processing function is the creation of a so called edge-image. Recall that we<br />

had one definition of “edge” that related to the binary image early on in this class (Chapter 1).<br />

That definition of an edge will now be revisited <strong>and</strong> we will learn about a second definition of an<br />

edge.<br />

Let us first define what a gradient image is. We apply a gradient operator to the image function<br />

f(x, y). The gradient of f(x, y) is shown in Slide 7.32, denoted as ∇ (Nabla). A gradient is thus a



multidimensional entity; in a two dimensional image we obtain a two dimensional gradient vector<br />

with a length <strong>and</strong> a direction. We now have to associate with each location x, y in the image these<br />

two entities. The length of the gradient vector is of course the Pythagorean sum of its elements,<br />

namely of the derivatives of the gray-value function with respect to x <strong>and</strong> y. We typically use<br />

the magnitude of the gradient vector <strong>and</strong> ignore it’s direction. However, this is not true in every<br />

instance.<br />

We are not dealing with continuous tone images but discrete renditions in the <strong>for</strong>m of pixels <strong>and</strong><br />

discrete matrices of numbers. We can approximate the computation of a gradient function by<br />

means of a three by three matrix as explains. The 3 × 3 matrix has nine values z 1 , z 2 , . . . , z 9 . We<br />

approximate the derivative my means of a first difference, namely z 5 − z 8 , z 5 − z 6 , <strong>and</strong> so <strong>for</strong>th.<br />

The magnitude of the gradient function is being approximated by the expression shown in Slide<br />

7.34. We define a way of computing the gradient in a discrete, sampled digital image avoiding<br />

squares <strong>and</strong> square roots. We can even further simplify the approximation as shown in Slide 7.34<br />

namely as the sum of the absolute values of the differences between pixel gray values. We can<br />

also use gradient approximations by means of cross-differences, thus not by horizontal <strong>and</strong> vertical<br />

differences along rows <strong>and</strong> columns of pixels in the window.<br />

Some of these approximations are associated with their inventors. The gradient operator

∇f ≈ |z5 − z9| + |z6 − z8|

is named after Roberts. Prewitt’s approximation is a little more complicated:

∇f ≈ |(z7 + z8 + z9) − (z1 + z2 + z3)| + |(z3 + z6 + z9) − (z1 + z4 + z7)|

Slide 7.36 shows how the computation is implemented. Two filter functions are applied sequentially to the input image, and the two resulting output images are added up. The Roberts operator works with two windows of dimension 2 × 2; Prewitt uses two windows of dimension 3 × 3, and a third gradient approximation, by Sobel, also uses two 3 × 3 windows.

Lets take a look at an example: Slide 7.37 shows a military fighter plane <strong>and</strong> the gradient image<br />

derived from it using the Prewitt-operator. These gradient images can then be post-processed<br />

by e.g. removing the background details, simply by reassigning gray values above a certain level<br />

to zero or one, or assign gradients of a certain value to a particular colour such as white or black.<br />

This will produce from an original image the contours of its objects, as seen Slide 7.37.<br />

We call the resulting image, after a gradient operator has been applied, an edge-image. However,<br />

in reality we don’t have any edges yet. We still have a gray-tone image that visually appears like<br />

as image of edges <strong>and</strong> contours. To convert this truly to an edge image we need to treshold the<br />

gradient image so that only the highest valued pixels get a value one <strong>and</strong> all lower value pixels are<br />

set to 0 (black) <strong>and</strong> are called “background”.<br />

This means that we have produced a binary image where the contours <strong>and</strong> edgy objects are marked<br />

as binary elements. We now need to remove the noise, <strong>for</strong> example in the <strong>for</strong>m of single pixels<br />

using by a morphological filter. We also have to link up the individual edge pixels along the<br />

contours so that we obtain contour lines. Linking up these edges is an operation that has to do<br />

with “neighbourhoods” (Chapter 1) we also need to obtain skeletons <strong>and</strong> connected sequences of<br />

pixels as discussed previously (Chapter 3).<br />
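These gradient approximations translate directly into small filter masks. The sketch below is illustrative only (the threshold value is an arbitrary assumption); it computes a Sobel gradient magnitude image and thresholds it into a binary edge image, as described above.

import numpy as np
from scipy.ndimage import convolve

def sobel_edge_image(f, threshold=100):
    """Approximate |grad f| with the two 3x3 Sobel masks, then threshold
    the gradient image into a binary edge image."""
    sx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float64)      # responds to horizontal changes
    sy = sx.T                                          # responds to vertical changes
    f = f.astype(np.float64)
    gx = convolve(f, sx, mode='nearest')
    gy = convolve(f, sy, mode='nearest')
    magnitude = np.abs(gx) + np.abs(gy)                # |gx| + |gy| approximation
    return (magnitude > threshold).astype(np.uint8)    # 1 = edge, 0 = background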

Exam questions:

• Define the Sobel operator and apply it to the pixels inside the bold-framed region of the gray value image shown in Figure B.13. You can enter the result directly in Figure B.13.



Figure 7.4: Roberts operator (worked example for the exam question below; the numeric pixel grids are not reproduced here)

• The gradient image of the digital raster image in Figure B.21 is to be found. Specify a suitable operator and apply it to the pixels inside the bold-framed rectangle. You can enter the result directly in Figure B.21. In addition, show the calculation for one of the pixels.

• Apply the Roberts operator for edge detection to the bold-framed region in Figure B.34. You can enter the result directly in Figure B.34.

Answer: see Figure 7.4

7.6 Filtering in the Spectral Domain / Frequency Domain<br />

We define a filter function H(u, v) in the spectral domain as a rectangular function, a so-called box function. Multiplying the Fourier transform F(u, v) of an image by H(u, v) produces the spectral representation of the final image, G(u, v), as the product of H and F. The transfer function H shown in Slide 7.39 has the value 1 from the origin out to a distance D0, and we assume that H is rotationally symmetric. In the frequency domain the value D0 is denoted the cut-off frequency: any frequency beyond D0 is not permitted through the filter.

Let us take a look at how this works. In Slide 7.40 we have the image of the head of a bee; of course, in the spectral domain we would not be able to judge what the image shows. We can create a spectral representation by applying a Fourier transform to the image, and we can then define circles in the spectral representation, centered at the origin of the spectral domain, with radii that contain 90%, 93% or more of the image frequencies, also denoted as the “energy”. If we now apply a filter function H as shown before, which only lets through the frequencies within 90% of the energy, and then transform the resulting function G from the spectral back into the spatial domain to obtain an image g, we obtain a blurred version of the original image. As we let more frequencies pass, the blur becomes less and less. What we have obtained is a series of low-pass filtered images of the head of the bee, and we have also indicated how much of the image content we have filtered out and how much we have let through the low-pass filter.

If we transform the function H from the spectral domain into the spatial domain, we obtain Slide 7.41. If we apply this filter function to an image that contains nothing but 2 white points, we obtain an image g that appears corrupted, presenting us with ghost images. We should therefore be careful with that type of box filter (in the spectral domain): the ghost images of high-contrast content in the input image will be disturbing. It is advisable not to use such a box filter, which is sometimes also called an ideal filter, but to use an approximation instead. We introduce the Butterworth filter, which is represented in the spectral domain by the curve shown in the slide:

H(u, v) = 1 / ( 1 + (D(u, v)/D0)^(2n) )



In two dimensions this is a volcano-like shape. Applying this type of filter to the bee produces a series of low-pass filtered images without ghost images. A direct example of the difference between applying the box filter and the Butterworth filter is shown in the slides.
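The Butterworth transfer function H(u, v) can be built directly on a centered frequency grid and multiplied with the shifted spectrum of the image. The following sketch is my own illustration of that recipe; the cutoff D0 and the order n are free parameters.

import numpy as np

def butterworth_lowpass(shape, D0, n=2):
    """Transfer function H(u,v) = 1 / (1 + (D(u,v)/D0)^(2n)) on a centered grid."""
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    D = np.sqrt(U**2 + V**2)                   # distance from the frequency origin
    return 1.0 / (1.0 + (D / D0)**(2 * n))

def apply_frequency_filter(f, H):
    """Filter image f with a centered transfer function H (e.g. the one above)."""
    F = np.fft.fftshift(np.fft.fft2(f))        # spectrum with the origin centered
    g = np.fft.ifft2(np.fft.ifftshift(F * H))
    return np.real(g)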

Of course this entire discussion of spectral <strong>and</strong> spatial domains <strong>and</strong> of convolutions <strong>and</strong> filters<br />

requires space, time <strong>and</strong> ef<strong>for</strong>t <strong>and</strong> is related to a discussion of the Fourier trans<strong>for</strong>m, <strong>and</strong><br />

the effect of these trans<strong>for</strong>ms <strong>and</strong> of the suppression of certain frequencies on the appearance of<br />

functions. Typically throughout an engineering program the signals are mostly one-dimensional,<br />

whereas in image processing the typical signals are two-dimensional. A quick view of the Fourier transforms of certain functions illustrates some of what one needs to be aware of. Slide 7.47

presents one function F (u) in the spectral domain in the <strong>for</strong>m of a rectangular pulse. Its trans<strong>for</strong>m<br />

into the spatial domain gives us the sinc-function, as previously discussed as f(x) = sin(πx)/(πx).<br />

Now if we cut off the extremities of f(x) <strong>and</strong> then trans<strong>for</strong>m that function back into the spectral<br />

space we obtain a so-called ringing of the function. Giving up there<strong>for</strong>e certain frequencies in the<br />

spectral domain can lead to a certain noisiness of the signal in the other domain.<br />

Exam questions:

• Give the transfer function H(u, v) in the frequency domain of an ideal low-pass filter with cutoff frequency D0, and sketch the transfer function.

7.7 Improving Noisy Images<br />

There are many uses of filters. We have already found the use of filters to enhance edges, <strong>and</strong><br />

pointed out that filters trans<strong>for</strong>m individual pixels. We may use filters also to remove problems<br />

in images. Let us assume that we have compressed an image from 8 bits to 4 bits <strong>and</strong> there<strong>for</strong>e<br />

have reduced a number of available grey values to 16. We have an example in Slide 7.49 where<br />

the low number of gray values creates artefacts in the image in the <strong>for</strong>m of gray value contours.<br />

By applying a low-pass filter we can suppress the unpleasant appearance of false density contours.<br />

Another example also in Slide 7.49 is an image with some corruption by noise. A low pass filter<br />

will produce a new image that is smoother <strong>and</strong> there<strong>for</strong>e more pleasant to look at. Finally, we<br />

want to revisit the relationship between “filter” <strong>and</strong> “sampling”.. Slide 7.51 illustrates again the<br />

monkey-face: smoothing an image by a low-pass filter maybe equivalent to sampling the image,<br />

then reconstructing it from the samples.<br />

Exam questions:

• There is an analogy between the application of a filter and the reconstruction of a discretized image function. Explain this statement.

7.8 The Ideal <strong>and</strong> the Butterworth High-Pass Filter<br />

For visual inspection we may want to use high-pass filters, because our eye likes crisp, sharp edges and a high level of energy in an image. Slide 7.53 introduces such a high-pass filter in the spectral domain. The ideal high-pass filter lets all high frequencies pass and suppresses all low frequencies. This “ideal” filter has the same problems as we have seen in the low-pass case. We may therefore prefer the Butterworth high-pass filter, which is not a box but a monotonic function. In the 2-dimensional domain the ideal and the Butterworth high-pass filters appear like a brick with a hole in it. An application of the high-pass filter is to enhance the contrast and bring out the fine detail of the object, as shown in the slides. The high-pass filter improves the



appearance of the image and suppresses the background. If we add the original image back in to a high-pass filtered version, we arrive again at the type of “emphasis filter” that we have seen earlier under the name “unsharp masking” (USM). The resulting image can then be processed into an equalized histogram for optimum visual inspection.

Again, high-pass filters can be studied in both the spatial and the spectral domain. The sinc function in the spectral domain corresponds in the spatial domain to a box function, a pulse; the sinc² function corresponds to a triangular function in the spatial domain; and a Gaussian function remains a Gaussian function in both the spectral and the spatial domain.

Exam questions:

• Sketch the transfer functions of an ideal and of a Butterworth high-pass filter and compare the advantages and disadvantages of both filter types.

7.9 Anti-Aliasing<br />

7.9.1 What is Aliasing ?<br />

Recall the rasterization or scan-conversion of straight lines <strong>and</strong> curves, <strong>and</strong> the resulting aliasing.<br />

Suppose we work with a trigonometric function of some sort. This function is being sampled<br />

at certain widely spaced intervals. Reconstruction of the function from samples will produce<br />

a particular function that’s not really there. What is shown in Slide 7.58 is a high frequency<br />

function, whereas the samples describe a low frequency sinus-curve. We denote the falsification of<br />

the original function into one of a different frequency with “aliasing”. This type of aliasing is a<br />

widely reviewed subject of sampling theory <strong>and</strong> signal processing <strong>and</strong> is not particular to image<br />

processing or graphics.<br />

Aliasing is a result of our need to sample continous functions, both in the creation of images <strong>and</strong><br />

in the creation of visualizations of objects in computer graphics.<br />

7.9.2 Aliasing by Cutting-off High Frequencies<br />

The slides explain the issue further with an excursion into sampling theory. We have an input image f(x) in the spatial domain that needs to be sampled. As we go into the spectral domain we cannot use all frequencies: we cut them off at w and lose all frequencies outside the interval −w ≤ u ≤ w of F(u). Let us now define a sampling function s(x) in the spatial domain, consisting of a series of Dirac functions at an interval ∆x. The multiplication of f(x) with s(x) produces the sampled function in the spatial domain. As we go into the spectral domain we obtain a corresponding set of discrete frequencies S(u) at 1/∆x, 2/∆x, and so on.

If we now convolve (in the spectral domain) the sampling function S(u) with the original function F(u), we get the spectral view of the sampled function f(x) · s(x): we see the original function F(u) repeated at the locations −1/∆x, +1/∆x, . . . Transforming this back into the spatial domain produces samples from which the original function f(x) can only be incompletely reconstructed.

What is the effect of changing ∆x? If we make it smaller we get a more accurate sampling of the input function, in accordance with the slide: we see in the spectral domain that the repetitions of the original function F(u) in F(u) ∗ S(u) are spaced apart at wider intervals 1/∆x as ∆x gets smaller. Slide 7.62 illustrates that we can then isolate the spectrum of our function f(x) by multiplying F(u) ∗ S(u) by a box filter G(u), recovering F(u), so that we can fully reconstruct f(x) from the samples. If w is the highest frequency contained in our function f(x) or F(u), then we have no



loss if the sampling interval ∆x is smaller than 1/(2w):

∆x ≤ 1/(2w)     (Whittaker–Shannon sampling theorem)

Conversely, for a given sampling interval ∆x we define the cut-off frequency w = 1/(2∆x) and denote it the Nyquist frequency: it is the highest frequency that is fully represented by samples spaced ∆x apart.
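As a small numerical illustration of this condition (my own example): a signal whose highest frequency is w = 50 cycles per mm must be sampled at ∆x ≤ 1/(2 · 50) mm = 10 micrometers.

def max_sampling_interval(w):
    """Largest sampling interval (in the unit implied by 1/w) that still
    satisfies the Whittaker-Shannon condition delta_x <= 1/(2w)."""
    return 1.0 / (2.0 * w)

print(max_sampling_interval(50.0))   # 0.01 mm = 10 micrometers for w = 50 cycles/mm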

7.9.3 Overcoming Aliasing with an Unweighted Area Approach

Of course the implementation is again as smart as possible to avoid multiplications <strong>and</strong> divisions,<br />

<strong>and</strong> replaces them by simpler operations. The approach in Slide 7.63. Aliasing occurs if ∆x<br />

violates the Whittaker-Shannon theorem. Anti-Aliasing by means of a low-pass filter occurs<br />

in the rasterization or scan conversion of geometric elements in computer graphics. We have<br />

discussed this effect in the context of scan conversion by means of the Bresenham-approach.<br />

Slide 7.63 explains another view of the issue, using the scan-conversion of a straight line. We can<br />

assign grey values to those pixels that are being touched by the area representing the “thin-line”.<br />

This would produce a different approach from Bresenham bec<strong>aus</strong>e we are not starting out from<br />

a binary decision that certain pixels are in, all others are out: We instead select pixels that are<br />

“touched” by the straight line, <strong>and</strong> assign a brightness proportional to the area that the overlap<br />

takes up.<br />

7.9.4 Overcoming Aliasing with a Weighted Area Approach<br />

Algorithm 21 Weighted Antialiasing<br />

1: set currentX to x-value of start of line<br />

2: while currentX smaller than x-value of end of line do<br />

3: apply Bresenham’s Line Algorithm to get appropriate currentY -value<br />

4: consider three cones (each with diameter of 2 pixels <strong>and</strong> volume normalized to 1)<br />

erected over the grid positions (currentX, currentY + 1), (currentX, currentY ) <strong>and</strong><br />

(currentX, currentY - 1 )<br />

5: <strong>for</strong> all cones do<br />

6: determine the intersection of the cone’s base with the line<br />

7: calculate the volume above the intersection<br />

8: multiply the obtained volume with the desired gray value<br />

9: take the result <strong>and</strong> set it as the pixel’s gray value<br />

10: end <strong>for</strong><br />

11: increase currentX<br />

12: end while<br />

In weighted area sampling we also decrease a pixel’s brightness as it has less overlap with the area of the “thin line”, but not all overlap areas are treated equally: we introduce a “distance” of the overlap area from the center of the pixel. With this basic idea in mind we can revisit unweighted area sampling, which treats all overlap areas equally and thereby implements a “box filter”: each overlap area is multiplied by the same value, represented as the height of the box, normalized to 1.

A weighted area sampling approach is shown in the slides. The “base” of the filter (its support) is circular and larger than a pixel, typically with a diameter of twice the pixel’s side length. The height of the filter is chosen such that its volume is 1.

The slides also illustrate the effect that a moving small triangle would have on pixels as it travels across an image; the triangle is smaller than a pixel.
The triangle is smaller than a pixel.



Getting Antialiased Lines by Means of the Gupta-Sproull Approach

Algorithm 22 Gupta-Sproull Antialiasing
1: dx := x2 − x1
2: dy := y2 − y1
3: d := 2 ∗ dy − dx
4: incrE := 2 ∗ dy
5: incrNE := 2 ∗ (dy − dx)
6: two_v_dx := 0
7: invDenom := 1/(2 ∗ Sqrt(dx ∗ dx + dy ∗ dy))
8: two_dx_invDenom := 2 ∗ dx ∗ invDenom
9: x := x1
10: y := y1
11: IntensifyPixel(x, y, 0)
12: IntensifyPixel(x, y + 1, two_dx_invDenom)
13: IntensifyPixel(x, y − 1, two_dx_invDenom)
14: while x < x2 do
15:   if d < 0 then
16:     two_v_dx := d + dx
17:     d := d + incrE
18:     x := x + 1
19:   else
20:     two_v_dx := d − dx
21:     d := d + incrNE
22:     x := x + 1
23:     y := y + 1
24:   end if
25:   IntensifyPixel(x, y, two_v_dx ∗ invDenom)
26:   IntensifyPixel(x, y + 1, two_dx_invDenom − two_v_dx ∗ invDenom)
27:   IntensifyPixel(x, y − 1, two_dx_invDenom + two_v_dx ∗ invDenom)
28: end while
29: {IntensifyPixel(x, y, distance) itself is implemented as:}
30: intensity := Filter(Round(Abs(distance))); WritePixel(x, y, intensity)

Using the weighted area method, we can pre-compute a table for lines at different distances from a pixel's center. A line will typically intersect the cones centered on three pixels, as shown in Slide 7.69, but it may intersect as few as 2 and at most 5 such cones. The look-up table is filled with values of the filter, defined as a function F(D, t) of two variables: t, the line's thickness, and D, the distance from a pixel center. Gupta and Sproull, two early pioneers of computer graphics, introduced this table look-up for a 4-bit display device; only 16 values of D are needed since a 4-bit display has only 16 different gray values. The Bresenham method (the midpoint line algorithm) now needs to be modified so that it not only decides on the E or NE pixel but also assigns a grey value. Moreover, we set a grey value not only for the single pixel at E or NE, but also for its two neighbours above and below.

Slide 7.70 illustrates how the distance D is computed using simple trigonometry:

D = v · dx / √(dx² + dy²)

And we need two additional distances, D_above and D_below:

D_above = (1 − v) · dx / √(dx² + dy²)    (7.1)

D_below = (1 + v) · dx / √(dx² + dy²)    (7.2)
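
As an illustrative sketch (not the original Gupta-Sproull table), the following Python fragment evaluates D, D_above and D_below for one step and converts them to intensities with a simple cone-shaped filter of radius 1; the filter shape is an assumption made here for demonstration only, standing in for F(D, t).

import math

def cone_filter(distance, radius=1.0):
    # hypothetical stand-in for F(D, t): linear fall-off over a support of radius 1
    return max(0.0, 1.0 - abs(distance) / radius)

def pixel_intensities(dx, dy, v):
    """Distances of the chosen pixel and its two vertical neighbours to a line with slope dy/dx."""
    inv_denom = 1.0 / math.sqrt(dx * dx + dy * dy)
    d       = v * dx * inv_denom            # distance of the pixel centre to the line
    d_above = (1.0 - v) * dx * inv_denom    # equation (7.1)
    d_below = (1.0 + v) * dx * inv_denom    # equation (7.2)
    return [cone_filter(d), cone_filter(d_above), cone_filter(d_below)]

print(pixel_intensities(dx=8.0, dy=3.0, v=0.25))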

Prüfungsfragen:

• Explain under which circumstances "aliasing" occurs and what can be done against it.

• Figure B.72 shows a perspectively distorted checkerboard-like pattern. Explain how the artefacts at the upper edge of the image come about, and describe one way to prevent their occurrence.

[Thumbnails of Slides 7.1 through 7.70]

Chapter 8

Texture

8.1 Description

Texture is an important subject both in the analysis of natural images of our environment and in the computer generation of images if we want to achieve photo-realism. Slide 8.3 illustrates three different sets of textures: the first may be of pebbles on the ground, the second of a quarry for mining stone, and the third is a texture of fabric. We can describe texture (a) pictorially, by means of a photograph of the surface, or (b) by a set of mathematical methods, which may be statistical, structural or spectral. Finally, we will present a procedural approach to modeling and using texture.

Prüfungsfragen:

• Name three kinds of texture description and give an example for each.

8.2 A Statistical Description of Texture

Recall the image function z = f(x, y) with the image gray values z. We can compute so-called moments of the image gray values as shown in Slide 8.7. The moments are denoted as µn(z). The mean m of the gray values is the first such measure; the second moment µ2(z) represents the variance of the gray values with respect to the mean (see the definitions below). Moments involve the probability p(z) of a gray value z. In a discrete context this probability is represented by the histogram of the gray values: if a gray value is very unlikely to occur, its column in the histogram will be very low or empty.

A measure of texture can be a function of these moments. A very simple one is the value R. If there is no variation in gray value, then the variance σ², the standard deviation σ and the second moment µ2(z) are all 0 or close to 0; in that case the value of R is 0 as well. R therefore represents a measure of the smoothness of the image. We can associate a separate value R with each pixel i by computing it for a window around that pixel.

There are other statistical measures of texture, for example those associated with the "edginess" of an area. In this case we would produce an edge value associated with each pixel, for example representing the number, direction and strength of the edges in a small window surrounding the pixel. Nominally we obtain a different texture parameter at each pixel. However, we are looking to describe an extended image by regions of similar texture. Therefore we classify the texture parameters into a few groups. We may create an equidensity image as discussed previously.

µn(z) = Σ [(zi − m)^n · p(zi)]

z ... the gray-value image, zi the gray value of the i-th pixel in the image
m ... mean value of z (average intensity)
µn ... n-th moment of z about the mean
R ... a measure of relative smoothness
σ² ... variance

R = 1 − 1 / (1 + σ²(z))

If we do that, we might be able to describe a quarry with only two texture parameters, as shown in Slide 8.8. While the quarry itself has been delineated manually by a human operator, a texture parameter is computed within this delineated polygon, and the equidensity method applied to that texture parameter defines two different textures within the quarry.

A very frequently used texture measure is the so-called co-occurrence matrix, which seeks to describe the occurrence of similar patterns in an image. We do not discuss it in this context other than to mention the name.
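
A small numerical sketch of these statistical measures (my own illustration): compute the histogram-based moments µn(z) and the smoothness R = 1 − 1/(1 + σ²) for a window of gray values. The normalization of σ² by (L−1)², so that R stays in [0,1], is an assumption made for this example.

import numpy as np

def texture_statistics(window, levels=256):
    """Histogram-based moments and smoothness R for one image window (2-D array of gray values)."""
    z = np.arange(levels)
    hist, _ = np.histogram(window, bins=levels, range=(0, levels))
    p = hist / hist.sum()                          # p(z_i), the normalized histogram
    m = np.sum(z * p)                              # mean gray value
    mu = lambda n: np.sum((z - m) ** n * p)        # n-th moment about the mean
    var = mu(2)
    var_norm = var / (levels - 1) ** 2             # assumed normalization so that R lies in [0,1]
    R = 1.0 - 1.0 / (1.0 + var_norm)
    return m, var, R

window = np.random.randint(0, 256, size=(16, 16))
print(texture_statistics(window))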

Prüfungsfragen:

• Which statistical properties can be used to describe texture? Explain the meaning of these properties in the context of texture images.

8.3 Structural Methods of Describing Texture

In order to understand the concept of a structural texture description we refer to Slide 8.11. We define a rule that replaces a small window in an image by a pattern; for example, we replace a window by a pattern "aS", where "a" may represent a circle. If we now apply the same operation multiple times, we obtain an arrangement of repetitive patterns located adjacent to one another in a row-and-column pattern "a a a S". We might denote the neighbourhood relationships between adjacent areas by different symbols; Slide 8.12 uses, for instance, "b" for the position below the current location and "c" for the position to its left. We can then describe a certain pattern by a certain sequence of "a", "b" and "c" operations. The texture primitive, shown here as a circle, could be any other kind of pattern. We build up our texture by repeating the pattern. Note again that at this point we are concerned with describing texture as we find it in a natural image; we are not, at this time, generating a texture for an object that we want to visualise.

Prüfungsfragen:

• Explain the structural method of texture description.

8.4 Spectral Representation of Texture

We have previously discussed the technical means of representing an image in a computer. In the spatial domain we use the rows and columns of pixels; in the spectral domain we use the frequencies. We therefore describe next the description of texture using the spectral representation of an image. Slide 8.14 illustrates a typical texture pattern. Its spectral representation shows that there are distinct patterns repeated in the image; these manifest themselves as dominant frequencies.

We describe the spectrum by a two-dimensional spectral function s(r, j), where r is the radius of a certain spectral location from the origin and j is the angle from the x-axis in a counter-clockwise direction. Any location in the spectral representation of the image therefore has the coordinates (r, j). Slide 8.15 explains this further. We can simplify the spectral representation of the image into two one-dimensional functions: one plots the spectrum as a function of the angle j, the other as a function of the radius r for a given value of j. Slide 8.16 illustrates two different texture patterns and the manifestation of those patterns in the j-curve. A texture parameter can now be extracted from the spectral representation, for example by counting the number of peaks or measuring the average distance between the peaks in the spectral domain.

We can also set up a texture vector with several values that consider the number of peaks as a function of the radius r. The aim is to associate with a pixel or a window in the image a simple number or vector that is indicative of the type of texture one finds there. We therefore have a case of classification, where we could take a set of known textures and create from them a feature space in two or more dimensions (see Chapter 14). If we now have an unknown texture, we might try to describe it in terms of the known textures by using the feature space and looking for the nearest known texture, given the texture numbers of the unknown one. In this manner we can replace an input image by a texture image which indicates at each location the kind of texture that exists there. By classifying areas of similar texture as one area we replace a large number of pixels by a small number of textures and a description of the contour of each area of uniform texture.
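
As a sketch of such a spectral texture descriptor (an illustration under my own assumptions, not the exact procedure on the slides), the following Python fragment computes the Fourier magnitude of an image window, converts its coordinates to (r, j), and sums it over rings of radius r and over wedges of angle j; peaks in the two resulting curves indicate repetitive texture.

import numpy as np

def spectral_texture_curves(window, n_r=32, n_angle=36):
    """Ring and wedge sums of the Fourier magnitude of a square gray-value window."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(window)))
    h, w = spec.shape
    y, x = np.indices(spec.shape)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(y - cy, x - cx)
    j = np.arctan2(y - cy, x - cx) % np.pi          # angle, folded to [0, pi)
    r_bins = np.linspace(0, r.max() + 1e-9, n_r + 1)
    j_bins = np.linspace(0, np.pi, n_angle + 1)
    s_of_r = [spec[(r >= r_bins[i]) & (r < r_bins[i + 1])].sum() for i in range(n_r)]
    s_of_j = [spec[(j >= j_bins[i]) & (j < j_bins[i + 1])].sum() for i in range(n_angle)]
    return np.array(s_of_r), np.array(s_of_j)       # candidate entries of a texture vector

window = np.random.rand(64, 64)
s_r, s_j = spectral_texture_curves(window)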

Prüfungsfragen:

• What properties does a (regularly repeating) texture exhibit in the spectral domain? What statements can be made about a texture on the basis of its spectrum?

• The digital raster image of Figure B.71 is to be segmented, with the two buildings forming the foreground and the sky the background. Since the histograms of foreground and background overlap strongly, a simple gray-value segmentation cannot succeed here. Which other image properties can be used to nevertheless distinguish foreground and background in Figure B.71?

8.5 Texture Applied to Visualisation

To achieve photorealism in the visualisation of two- or three-dimensional objects we employ descriptions of texture rather than texture itself. We may apply artificial texture, also denoted as synthetic texture, and place it on the geometric polygons describing the surface shape of an object. The texture itself may consist of texture elements, also denoted as texels. Slide 8.19 shows a wire-frame rendering of a simple indoor scene and illustrates how unrealistic this type of representation appears. Slide 8.20 is the result of placing photographic texture on top of the objects: we obtain a photorealistic representation. The basic concepts in Slide 8.21 are illustrated by various examples: a two-dimensional flag, a three-dimensional indoor scene, a two-dimensional representation of symbols, photo-texture of wood, and the texture of some hand-written material.

How is this photographic texture applied to a geometrically complex object? This is illustrated in Slide 8.22, see also Algorithm 23 below. We deal with three different coordinate systems. First we have the representation on a monitor or display medium, and a window on this display contains a segment of a three-dimensional object which is represented in a world coordinate system. The surface of this object needs to be photo-textured, and it receives that photo-texture from a texture map with its own, third coordinate system. Essentially we are projecting the texture map onto the curved surface of an object and then render the curved surface on the display medium, using a transformation that results from a synthetic camera with a pose consisting of position and angular orientation.

Algorithm 23 Texture mapping
1: surround the object with a virtual cylinder
2: for all pixels of the texture do
3:   make a coordinate transformation from Cartesian to cylindrical coordinates {to wrap the texture onto the cylinder's surface}
4: end for
5: for all points of the object do
6:   project the point perpendicularly from the midpoint of the cylinder to the cylinder's surface
7:   where the projection cuts the edge of the object, assign the object point the color of the corresponding cylinder point
8: end for
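
A minimal sketch of the cylindrical mapping idea behind Algorithm 23 (my own simplification): convert an object point given in Cartesian world coordinates into cylinder coordinates and from there into (u, v) texture coordinates. The choice of a z-aligned axis and the wrapping of the angle are assumptions of this example.

import math

def cylindrical_uv(x, y, z, z_min, z_max):
    """Map a point on (or near) a z-aligned cylinder to texture coordinates u, v in [0,1]."""
    theta = math.atan2(y, x)                 # angle around the cylinder axis
    u = (theta + math.pi) / (2.0 * math.pi)  # wrap the angle to [0,1]
    v = (z - z_min) / (z_max - z_min)        # height along the axis, normalized
    return u, v

def sample_texture(texture, u, v):
    """Nearest-neighbour lookup in a texture given as a 2-D list of colors."""
    rows, cols = len(texture), len(texture[0])
    i = min(int(v * rows), rows - 1)
    j = min(int(u * cols), cols - 1)
    return texture[i][j]

texture = [[(r, c, 0) for c in range(4)] for r in range(4)]   # toy 4x4 texture
print(sample_texture(texture, *cylindrical_uv(1.0, 0.5, 0.3, 0.0, 1.0)))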

Prüfungsfragen:

• Explain how, in visualisation, the quality of a computer-generated image can be improved by the use of textures. Name some surface properties (in particular geometric ones) that are not suitable for representation by means of a texture.

• Exercise B.1 asked for geometric surface properties that are not suitable for visualisation by means of texture. Assume a texture were nevertheless used improperly to represent such properties. Which artefacts are typical in such cases?

8.6 Bump Mapping

In order to give a realistic appearance to a surface which is not smooth but bumpy, there exists a concept called bump mapping. It applies a two-dimensional texture to a three-dimensional object and makes the two-dimensional texture appear as if it were three-dimensional. Slide 8.24 and Slide ?? explain the concept with a donut and a strawberry. Note that the texture really is two-dimensional: the third dimension is suggested by a 2D picture of shadow and detail that is not actually available in the third dimension. This becomes visible at the contours of the object, where the bumps of the texture are not reflected in the geometry of the object. In this case we do not apply the photographic texture used in the previous section, but rather a computed texture.
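
A toy sketch of the underlying idea (a common height-map formulation assumed here, not necessarily the exact method on the slides): perturb the surface normal with the gradient of a 2-D bump (height) map and shade with a simple Lambert term, so that a flat surface appears bumpy without any change of its geometry.

import numpy as np

def bump_shaded(height_map, light_dir, strength=1.0):
    """Lambertian shading of a flat surface whose normals are perturbed by a height map."""
    # the gradient of the bump map perturbs the (0,0,1) normal of the flat surface
    dh_dy, dh_dx = np.gradient(height_map.astype(float))
    normals = np.dstack([-strength * dh_dx, -strength * dh_dy,
                         np.ones_like(height_map, dtype=float)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    L = np.asarray(light_dir, dtype=float)
    L /= np.linalg.norm(L)
    return np.clip(normals @ L, 0.0, 1.0)        # brightness per pixel

bumps = np.random.rand(64, 64)
image_left_light  = bump_shaded(bumps, light_dir=(-1.0, 0.0, 1.0))
image_right_light = bump_shaded(bumps, light_dir=( 1.0, 0.0, 1.0))
# The shading changes with the light direction, but the silhouette of the object would not.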

Prüfungsfragen:

• What does the term "bump mapping" describe?

• Figure B.77 shows a torus with a structured surface, with the light source once to the left (Figure B.77(a)) and once to the right (Figure B.77(b)) of the object. For clarification, Figures B.77(c) and B.77(d) show enlarged details. Which technique was used to visualise the surface structure, and what are the typical properties by which the method can be recognized here?


8.7 3D Texture

Another concept of texture is three-dimensional. In this case we do not texture a surface but an entire three-dimensional body. An example is shown in Slide 8.27, where the surface appearance results from intersecting the three-dimensional texture body with the surface geometry.

Prüfungsfragen:

• What is a "3D texture"?

8.8 A Review of Texture Concepts by Example

Slide 8.29 uses an example from an animated movie to illustrate the complexities of applying photorealistic textures to three-dimensional objects. We begin with the basic shapes of the geometric entity and apply some basic colours. We superimpose on these colours an environment map, which is in turn modified by a bump map and the appropriate illumination effect. The intermediate result is shown in Slide 8.31, where dirt specks are added for additional realism. Slide 8.32 adds further details: a near-photographic texture, more colour, the effect of water droplets, and mirror-like (specular) reflections. We should not be surprised that the creation of such animated scenes consumes considerable computing power and therefore takes time to complete. The final result is shown in Slide 8.33.

8.9 Modeling Texture: Procedural Approach

As previously discussed, we process natural images to find a model of texture and we use those models to create images. Slide 8.35 details the method of analysing existing texture. We have a real scene of an environment, and we understand from the image model that the intensity in the image is a function of the material property f_r and the illumination E_i. The material property is unknown and needs to be determined from the raster image; the illumination is known. We estimate the model parameters for the material property and use them to approximate objects. For the photo texture we then use this model together with a virtual scene and its material and illumination properties to compute an intensity per pixel and thereby obtain a synthetic image. The issue now is to find a method of modeling an unknown texture by simple curves.

Slide 8.36 explains how a reference surface, a light source, a camera and a texture to be analysed can be set up into a sensor system. The resulting image is illustrated in Slide 8.37, with the known reference surface and the unknown texture.

We need the reference texture so that we can calibrate the differences in illumination. As seen in the previous slide, we have an image of texture and an effect of illumination; in particular we may have mirror-like, i.e. specular, reflection. We do not discuss models of reflection at this time but just show a given model in Slide 8.38 for illustration purposes. We have for each pixel a known gray value f, and we know the angle Θ_i under which a pixel is being illuminated and how the reflection occurs. We will discuss the parameters of the illumination model in Chapter ??. We need to compute the parameters of the reflections that are marked.

In Slide 8.39 we study a particular column of pixels that represents a gray value curve of an unknown photo texture. The question is: what is "texture" here? Slide 8.40 explains. We plot the actual brightness along the row of pixels and model the change of brightness as a function of the illumination by an average that we can calibrate with our reference pattern. The deviation from the average is then the actual texture, in the form of an irregular signal. We now need to describe that signal statistically by means of a few simple numbers. How to do this is a topic of "statistical signal analysis", for example in a spectral representation of the signal as previously discussed in Section 8.4.

Let us review the basic idea in a different way. We have an image of a surface and we can take a little window from it for analysis. We can create a textured surface by projecting that window multiple times onto the surface, and in the process we may obtain some type of "tiling effect". The procedural texture discussed before models the surface texture mathematically and avoids the seams between the individual tiles. We can create any kind of shape in our synthetic surface, as shown in Slide 8.43, and Slide 8.44 illustrates that those shapes can be fairly complex even in three dimensions.

Prüfungsfragen:

• What is meant by "procedural textures", how are they generated, and what advantages does their use bring?

[Thumbnails of Slides 8.1 through 8.44]

Chapter 9

Transformations

9.1 About Geometric Transformations

In this chapter we will discuss the transformation of objects in a fixed coordinate system, the change of coordinate systems with a fixed object, and the deformation of objects, so that from an input object a geometrically changed output object results. We will further discuss projections of the 3D world into a 2D display plane, and finally, under the heading of "transformations", the change in the representation of an object when we approximate it by simple functions, which we denote as approximation and interpolation.

Geometric transformations apply when objects move in a fixed world coordinate system, but they also apply when we need to look at objects and have to create images, or use images of objects to reconstruct them. In that case we need to understand the projection of the object into an image or display medium. A very important application of geometric transformations is in robotics. This may be unrelated to the processing of digital visual information, but it employs the same sets of formulae and the same ideas of "transformation". A simple robot may have associated with it numerous coordinate systems which are attached to its rigid elements, and each coordinate system is related to every other coordinate system by a coordinate transformation. Slide 9.3 and Slide 9.4 explain how a world coordinate system is home to the robot's body, which in turn is the reference for the robot arm. The arm holds the hand, the hand holds the fingers, and the fingers seek to relate to an object or box which itself is represented in the world coordinate system. Slide 9.4 illustrates these six coordinate systems in a simplified two-dimensional presentation.

Our interest is in geometric transformations concerning the use of imagery. Slide 9.5 illustrates an early video image of the surface of planet Mercury, from NASA's Mariner mission in the mid 1960's. We need to relate each image to images taken from other orbits, and we need to place each image into a coordinate reference frame that is defined by the meridians, the equator and the poles of the planet. Slide 9.6 represents a geometric rectification of the previous image; the transformation is into a Mercator or stereographic projection. We can see the geometric correction of the image if we note that the craters, which were of elliptical shape in the original image, now approximate circles, as they would appear from an overhead view straight down. Such a view is also denoted as an orthographic projection.

9.2 Problem of a Geometric Transformation

A geometric transformation typically applies to the 2-dimensional space of the plane, to the 3-dimensional space of the natural human environment and, more generally, to n-dimensional space. In the processing of digital visual information, most of our geometric transformations address the 3-dimensional space of our environment and the 2-dimensional space of a display medium. Slide 9.8 illustrates the transformation of objects in a rigid coordinate system (x, y) in 2-dimensional space. We have in this example two objects, 1 and 2, before the transformation, and 1', 2' after the transformation. The general model of a rigid body transformation in 2D space is shown in Slide 9.9: the equation takes the input (x, y) coordinates and produces from them the output (x′, y′) coordinates using transformation parameters a0, a1, a2 and b0, b1, b2. Slide 9.10 illustrates the usefulness of this formulation if we are given objects before and after the transformation and need to determine ("estimate") the unknown parameters of the transformation. Given are therefore x1, y1, x2, y2, x′1, y′1, x′2 and y′2, and we seek to compute a0, a1, a2, b0, b1, b2.

We may also know the transformation parameters and need to compute for each given input coordinate pair (x, y) its associated output coordinate pair (x′, y′), as illustrated in Slide 9.11. This concludes the introduction of the basic ideas of transformations using the example of 2-dimensional space.

9.3 Analysis of a Geometric Transformation

We will use the example of a 2-dimensional object that is transformed in 2D space under a so-called conformal transformation, which does not change the angles of the object. Slide 9.13, Slide 9.14, Slide 9.15 and Slide 9.16 illustrate the elements from which a geometric transformation in 2D space is assembled. A very basic element of a transformation is always the translation: we add to each pair (x, y) of an object the translational components tx and ty to produce the output coordinates (x′, y′).

Definition 18 Conformal transformation

x′ = s · cos(α) · x − s · sin(α) · y + tx
y′ = s · sin(α) · x + s · cos(α) · y + ty

A second important transformational element is scaling. An object gets reduced or enlarged by a scale factor s (see Definition 18); more generally we might use two different scale factors, denoted sx in the x coordinate direction and sy in the y coordinate direction. As a result we may obtain a squished, thus deformed, object. We call a deformation by means of two different scale factors an affine deformation and will discuss it later.

Finally we have rotations, which rotate an object by an angle α. The transformation equation representing the rotation is shown in Slide 9.15. For a rotation we need a point around which we rotate the object; normally this is the origin of the coordinate system. The general expression for a rotation by an angle α produces the output (x′, y′) coordinates from the input (x, y) coordinates by multiplying those coordinates with cos α and sin α, in accordance with Slide 9.16. This can also be presented in matrix notation, resulting in the expression p′ = R · p, and we call R the rotation matrix.

What makes a transformation in 2D space specifically a conformal transformation? We already stated that it does not change any angles. Obviously this requires that our body not change its shape: it may be enlarged or reduced, it may be translated and it may be rotated, but right angles before the transformation will remain right angles after the transformation. Slide 9.17 explains that we combine the three elements of the 2D transformation, namely scaling by a factor s, rotating by an angle α and translating by the translation elements tx and ty. We call this a four-parameter transformation since we have four independent elements: s, α, tx, ty. In matrix notation this transformation is

x′ = s · R x + t,

and s · R can be replaced by the transformation matrix M.
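
A small sketch of this four-parameter transformation (my own illustration): apply x′ = s·Rx + t to a set of points, and recover s, α, tx, ty from two point correspondences by writing the equations of Definition 18 with the substitutions a = s·cos α, b = s·sin α.

import numpy as np

def conformal_apply(points, s, alpha, tx, ty):
    """Apply x' = s*R*x + t to an (N,2) array of points."""
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return s * points @ R.T + np.array([tx, ty])

def conformal_estimate(p, q):
    """Recover (s, alpha, tx, ty) from two input points p and their images q."""
    # unknowns u = (a, b, tx, ty) with a = s*cos(alpha), b = s*sin(alpha)
    K, rhs = [], []
    for (x, y), (X, Y) in zip(p, q):
        K.append([x, -y, 1, 0]); rhs.append(X)
        K.append([y,  x, 0, 1]); rhs.append(Y)
    a, b, tx, ty = np.linalg.solve(np.array(K, float), np.array(rhs, float))
    return np.hypot(a, b), np.arctan2(b, a), tx, ty

p = np.array([[1.0, 0.0], [0.0, 1.0]])
q = conformal_apply(p, s=2.0, alpha=np.pi / 6, tx=3.0, ty=-1.0)
print(conformal_estimate(p, q))   # recovers (2.0, pi/6, 3.0, -1.0)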

We have described the transformation by means of Cartesian coordinates (x, y); one could also use polar coordinates (r, φ), where a point with coordinates (x, y) receives the coordinates (r, φ). A rotation then becomes a very simple operation, changing the angle φ by the rotation angle ω. The relationships between (x, y) and (r, φ) are fairly obvious:

x = r cos φ,   y = r sin φ.

A rotated point p′ will have the coordinates r cos(φ + ω) and r sin(φ + ω).

When performing a transformation we may have a fixed coordinate system and rotate the object, or we may have a fixed object and rotate the coordinate system. Slide 9.20 explains how a point p with coordinates (x, y) obtains coordinates (X, Y) as a result of rotating the coordinate system by an angle α. Note that the angle α is the angle subtended between the input and output axes. We can therefore interpret the rotation matrix not only as being filled with the elements cos α and sin α; we can also interpret it as being filled with the angles subtended between the coordinate axes before and after the rotation. We have the angles xX, xY, yX, yY, and they all enter the rotation matrix as cos(xX), cos(xY), etc.

We have thus found in Slide 9.20 a second definition for the contents of the rotation matrix: the first was the interpretation of R in terms of cos α and sin α of the rotation angle α; the second is that the elements of the rotation matrix are the cosines of the angles subtended by the input and output coordinate axes.

Prüfungsfragen:

• Figure B.12 shows an object A that is transformed into an object B by a linear transformation M. Give (for homogeneous coordinates) the 3 × 3 matrix M that describes this transformation (two different solutions)!

Answer: Two different solutions exist because the object is symmetric and can be mirrored about the y-axis without being changed (matrices below are written row by row, with rows separated by semicolons):

M1 = [2 0 4; 0 0.5 3; 0 0 1]

M2 = [−2 0 12; 0 0.5 3; 0 0 1]

• Compute the transformation matrix M that performs a rotation by 45° counter-clockwise about the point R = (3, 2)^T and at the same time a scaling by the factor √2 (as illustrated in Figure B.27). Give M for homogeneous coordinates in two dimensions (i.e. a 3 × 3 matrix), such that a point p is transformed into the point p′ according to p′ = Mp.

Hint: You save a lot of calculation and writing if you apply the associative law for matrix multiplication appropriately.

Answer:

M = T(3, 2) · S(√2) · R(45°) · T(−3, −2)
  = [1 0 3; 0 1 2; 0 0 1] · [√2 0 0; 0 √2 0; 0 0 1] · [cos 45° −sin 45° 0; sin 45° cos 45° 0; 0 0 1] · [1 0 −3; 0 1 −2; 0 0 1]
  = [1 0 3; 0 1 2; 0 0 1] · [1 −1 0; 1 1 0; 0 0 1] · [1 0 −3; 0 1 −2; 0 0 1]
  = [1 0 3; 0 1 2; 0 0 1] · [1 −1 −1; 1 1 −5; 0 0 1]
  = [1 −1 2; 1 1 −3; 0 0 1]

• In the practical part of the exam, Exercise B.2 asks for a transformation matrix (in two dimensions) composed of a scaling and a rotation about an arbitrary center of rotation. How many degrees of freedom does such a transformation have? Justify your answer!

Answer: The center of rotation (rx, ry), the rotation angle (ϕ) and the scaling factor (s) give four degrees of freedom.

• Given a two-dimensional object whose centroid lies at the coordinate origin. A translation T and a scaling S are now to be applied "simultaneously", where

T = [1 0 tx; 0 1 ty; 0 0 1],   S = [s 0 0; 0 s 0; 0 0 1].

After the transformation the object should appear enlarged according to S, and the centroid should have been shifted according to T. We look for a matrix M that transforms a point p of the object into a point p′ = M · p of the transformed object according to this prescription. Which is the correct solution:

1. M = T · S
2. M = S · T

Justify your answer and give M!

Answer: Answer 1 is correct, since the scaling leaves the centroid unchanged exactly when it lies at the coordinate origin. The subsequent translation then shifts the object (and with it the centroid) to the desired position. Hence

M = T · S = [1 0 tx; 0 1 ty; 0 0 1] · [s 0 0; 0 s 0; 0 0 1] = [s 0 tx; 0 s ty; 0 0 1]

• Given the 3 × 3 transformation matrix

M = [3 4 2; −4 3 1; 0 0 1]

as well as three points a = (2, 0)^T, b = (0, 1)^T, c = (0, 0)^T in two-dimensional space. The matrix M describes, in homogeneous coordinates, a conformal transformation in which a point p is transformed into a point p′ according to p′ = Mp. The points a, b and c form a right triangle, i.e. the segments ac and bc are perpendicular to each other.

1. Compute a′, b′ and c′ by applying the transformation described by M to the points a, b and c!
2. Since M describes a conformal transformation, the points a′, b′ and c′ must also form a right triangle. Show that this is indeed the case here! (Hint: it suffices to show that the segments a′c′ and b′c′ are perpendicular to each other.)

Answer:

1. a′ = (8, −7)^T, b′ = (6, 4)^T, c′ = (2, 1)^T

2. a′ − c′ = (6, −8)^T, b′ − c′ = (4, 3)^T, and (a′ − c′) · (b′ − c′) = 6 · 4 + (−8) · 3 = 0

9.4 Discussing the Rotation Matrix in Two Dimensions

A rotation matrix R is filled with four elements if it concerns rotations in two dimensions.

Definition 19 Rotation in 2D

x′ = x · cos θ − y · sin θ
y′ = x · sin θ + y · cos θ

Written in matrix form (matrix rows separated by semicolons):

R = [cos θ  −sin θ;  sin θ  cos θ],   (x′; y′) = R · (x; y)

As shown in Figure 9.1, the four elements can be combined into two unit vectors, i and j. The rotation matrix R consists of i and j, which are the unit vectors in the direction of the rotated coordinate system. We can show that the rotation matrix has some interesting properties, namely that the scalar product of each unit vector with itself is 1 and that the scalar product of the two different unit vectors is zero (see also Slide 9.22).


Definition 20 2D rotation matrix

A point of an object is rotated about the origin by multiplying it with a so-called rotation matrix. When dealing with rotations in two dimensions, the rotation matrix R consists of four elements. These elements can be combined into two unit vectors i and j:

i = (cos θ; sin θ),   j = (−sin θ; cos θ)

R = [cos θ  −sin θ;  sin θ  cos θ] = (i, j)

(x′; y′) = R · (x; y)

Starting from a given coordinate system with axes X and Y, the vectors i and j correspond to the unit vectors in the direction of the rotated coordinate system (see Figure 9.1).

Figure 9.1: rotated coordinate system

We have now found a third definition of the rotation matrix elements, namely the unit vectors along the axes of the rotated coordinate system as expressed in the input coordinate system. Slide 9.23 summarizes the three interpretations of the elements of a rotation matrix.

Let us take a look at the inverse of a rotation matrix. If we premultiply a rotation matrix by its inverse we obviously get the unit matrix. But we also learn very quickly that premultiplying the rotation matrix by its transpose likewise produces the unit matrix, which proves, in accordance with Slide 9.24, that the inverse of a rotation matrix is nothing else but the transposed rotation matrix.

We now take a look at forward and backward rotation. Suppose we have rotated a coordinate system denoted by x into a new coordinate system X. If we premultiply the new coordinates by the transposed rotation matrix, we obtain the inverse relationship and see, in accordance with Slide 9.25, that we recover the original input coordinates. The transpose of a rotation matrix therefore serves to rotate the rotated coordinate system back into its input state.

Let us now take a look at multiple sequential rotations. We first rotate input coordinates x into output coordinates x1, and then we rotate the output coordinates x1 further into coordinates x2.


We see very quickly that x2 is obtained from the product of the two rotation matrices R1 and R2. It is also quickly evident that multiplying two rotation matrices produces nothing else but a third rotation matrix.

Definition 21 Sequenced rotations

x1 = R1 x
x2 = R2 x1
x2 = R2 R1 x = R x
R = R2 R1

It is important, however, to realize that matrix multiplications are in general not commutative: R2 · R1 is not necessarily identical to R1 · R2!
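
A quick numerical check of these properties (an illustration of my own, using numpy): the transpose of a rotation matrix is its inverse, and chaining two 2-D rotations yields the rotation by the sum of the angles.

import numpy as np

def rot(theta):
    """2-D rotation matrix for the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R1, R2 = rot(0.3), rot(1.1)

print(np.allclose(R1.T @ R1, np.eye(2)))        # R^T is the inverse of R
print(np.allclose(R2 @ R1, rot(1.1 + 0.3)))     # sequenced 2-D rotations add their angles
print(np.allclose(R2 @ R1, R1 @ R2))            # hence 2-D rotations about the origin commute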

Prüfungsfragen:

• The lecture pointed out that matrix multiplication is in general not commutative, i.e. for two transformation matrices M1 and M2 we have M1 · M2 ≠ M2 · M1. If, however, we consider two 2 × 2 rotation matrices R1 and R2 in the two-dimensional case, then indeed R1 · R2 = R2 · R1. Give a geometric or mathematical justification for this fact!

Hint: Note that the center of rotation lies at the coordinate origin!

Answer: When rotating about a fixed rotation axis, the rotation angles add up; the order of the rotations therefore does not matter.

9.5 The Affine Transformation in 2 Dimensions

Slide 9.28 shows an example of an affine transformation created with the help of the letter "F". We see a shearing effect as a characteristic feature of an affine transformation. Similarly, Slide 9.29 illustrates how a unit square is deformed, for example by squishing it only along the x-axis but not along the y-axis, or by shearing the square in one direction or the other. All of these are effects of an affine transformation.

Slide 9.30 provides us with the equation of a general affine transformation in 2 dimensions. We see that this is a six-parameter transformation, defined by the transformation parameters a, b, c, d, tx, ty. We may again ask the question of estimating the unknown transformation parameters if we are given a number of points both before and after the transformation. How many points do we need at a minimum to solve for the six unknown transformation parameters? We need three points, because each point provides us with two equations, so that three points provide the six equations needed to solve for the six unknown parameters. But be aware: those three points must not be collinear!
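
A small sketch of this estimation (my own illustration): each point pair (x, y) → (x′, y′) contributes the two equations x′ = a·x + b·y + tx and y′ = c·x + d·y + ty, so three non-collinear pairs give a 6×6 system.

import numpy as np

def affine_from_3_points(src, dst):
    """Solve the 6 affine parameters from three non-collinear point correspondences."""
    K, rhs = [], []
    for (x, y), (X, Y) in zip(src, dst):
        K.append([x, y, 1, 0, 0, 0]); rhs.append(X)   # X = a*x + b*y + tx
        K.append([0, 0, 0, x, y, 1]); rhs.append(Y)   # Y = c*x + d*y + ty
    a, b, tx, c, d, ty = np.linalg.solve(np.array(K, float), np.array(rhs, float))
    return np.array([[a, b, tx], [c, d, ty]])

src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3.5), (1.5, 5)]     # arbitrary example targets
print(affine_from_3_points(src, dst))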

Let us now analyze the elements of an affine transformation and take a look at Slide 9.32 and Slide 9.33, recalling what we saw in Slide 9.13 through Slide 9.15. First, we see a scaling of the input coordinates, in this case denoted px and py, independently by scaling factors sx and sy, to obtain the output coordinates qx and qy. We can denote the scaling operation by means of a 2 × 2 scaling matrix Msc, as shown in Definition ??.

Secondly, we have a shearing deformation which adds to each x coordinate an increment proportional to y, and adds to y an augmentation proportional to the x coordinate, using a proportionality factor g. That shearing transformation can be described by a shearing matrix Msh (see Definition ??). Thirdly, we can introduce a translation, adding to each x and y coordinate the translational elements tx and ty (see Definition ??).

Finally, we can rotate the entire object, identically to the rotation we saw earlier, using a rotation angle α and producing a rotation matrix MR (see Section 9.4). An affine transformation is then the sum total of these transformations, thus the product of the three transformations Msc for scale, Msh for shearing and MR for rotation, with the translation added on as discussed previously.

Slide 9.34 further explains how the transformation of the input coordinate vector p into an output coordinate vector q is identical to the earlier two equations, converting the input coordinate pair (x, y) into an output coordinate pair (x′, y′) via a six-parameter affine transformation.

Definition 22 Affine transformation with 2D homogeneous coordinates

(matrices are written row by row, with rows separated by semicolons; (x; y; w) denotes a column vector)

(x′; y′; w′) = [sx 0 0; 0 sy 0; 0 0 1] · (x; y; w) = Msc · (x; y; w)

(x′; y′; w′) = [1 hx 0; hy 1 0; 0 0 1] · (x; y; w) = Msh · (x; y; w)

(x′; y′; w′) = [1 0 tx; 0 1 ty; 0 0 1] · (x; y; w) = Mtr · (x; y; w)

(x′; y′; w′) = [r11 r12 tx; r21 r22 ty; 0 0 s] · (x; y; w)

Definition 22 shows an example of how to construct an affine transformation that rotates, translates and scales an object in one step. The transformation is done in 2D using homogeneous coordinates (see Section 9.9). The parameters r_ij specify the rotation, t_i the translational elements, and s is a scaling factor (which in this case scales equally in both directions x and y).
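
A brief numerical sketch of such a one-step matrix (my own illustration): build Msc, Msh and Mtr as 3×3 homogeneous matrices with numpy, multiply them into a single matrix, and apply it to a point written as (x, y, 1).

import numpy as np

def M_sc(sx, sy):  return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1.0]])
def M_sh(hx, hy):  return np.array([[1, hx, 0], [hy, 1, 0], [0, 0, 1.0]])
def M_tr(tx, ty):  return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])

# one combined matrix: first shear, then scale, then translate (the order is a choice)
M = M_tr(3, 2) @ M_sc(2, 2) @ M_sh(0.5, 0)

p = np.array([1.0, 1.0, 1.0])       # the point (1,1) in homogeneous coordinates
x_prime, y_prime, w_prime = M @ p
print(x_prime / w_prime, y_prime / w_prime)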

Prüfungsfragen:

• Two "point clouds" are given as in Figure ??. First write down the appropriate transformation of the one point group onto the second point group, using the required set of formulas (without using the given coordinates), such that the three marked points in the left image (those marked as filled circles) coincide after the transformation with the three points in the right image (also marked as filled circles).

• For the computation of the unknown transformation parameters sought in Question ??, set up the coefficient matrix, using the coordinates from Figure ?? rounded to integers only.


9.6 A General 2-Dimensional Transformation

We begin the consideration of a more general 2-dimensional transformation with a look at the bilinear transformation (see Definition 23), which takes the input coordinates (x, y) and converts them into an output coordinate pair (X, Y) via a bilinear expression containing a term with the product x·y of the input coordinates. This transformation is called bilinear because if we freeze either the coordinate x or the coordinate y, we obtain a linear expression for the transformation. Such a transformation has 8 parameters, as we can see from Slide 9.36. Each input point (x, y) produces 2 equations, as shown in that slide. We need four points to compute the transformation parameters a, b, c, e, f, g and the translational parameters d and h. By means of a bilinear transformation we can map any group of four input points onto any group of four output points and thereby achieve a perfect fit.

Definition 23 Bilinear transformation

x′ = a · x + b · y + c · xy + d
y′ = e · x + f · y + g · xy + h
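
A minimal sketch of this exact four-point fit (my own illustration): each correspondence contributes one equation in (a, b, c, d) for x′ and one in (e, f, g, h) for y′, so four points yield two 4×4 systems.

import numpy as np

def bilinear_from_4_points(src, dst):
    """Solve the 8 bilinear parameters from exactly four point correspondences."""
    K = np.array([[x, y, x * y, 1.0] for x, y in src])
    X = np.array([p[0] for p in dst])
    Y = np.array([p[1] for p in dst])
    a, b, c, d = np.linalg.solve(K, X)
    e, f, g, h = np.linalg.solve(K, Y)
    return (a, b, c, d), (e, f, g, h)

def bilinear_apply(params, x, y):
    (a, b, c, d), (e, f, g, h) = params
    return a * x + b * y + c * x * y + d, e * x + f * y + g * x * y + h

src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(2, 1), (5, 1.5), (2.5, 4), (6, 5)]     # arbitrary example targets
params = bilinear_from_4_points(src, dst)
print(bilinear_apply(params, 1, 1))            # reproduces (6.0, 5.0)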

A more general transformation would be capable of taking a group of input points, as shown in Slide 9.37 with an arrangement of 16 points, into a desired output geometry as shown in Slide 9.38. We suggest that the randomly deformed arrangement of that slide be converted into a rigidly rectangular pattern: how can we achieve this?

Obviously, we need to define a transformation with 16 × 2 = 32 parameters for all 32 coordinate values. Slide 9.39 illustrates the basic concept. We set up a polynomial transformation to take the input coordinate pair (x, y) and translate it into an output coordinate pair (X, Y) by means of two 16-parameter polynomials. These polynomial coefficients a0, a1, a2, . . . and b0, b1, b2, . . . may initially be unknown, but if we have 16 input points with their input locations (x, y) and we know their output locations (X, Y), then we can set up an equation system to solve for the unknown transformation parameters a0, a1, . . . , a15 and b0, b1, . . . , b15.

Slide 9.40 illustrates the type of computation we have to perform. Suppose we are given in the input coordinate system 1 the input coordinates (xi, yi) of n points, and in the output coordinate system 2 the output coordinates (Xi, Yi) of the same n points. We can now set up the equation system that translates the input coordinates into the output coordinates. What we ultimately obtain is an equation system

x = K · u

In this equation, x contains the known output coordinates and u is the vector of unknown transformation parameters; u may have 4 entries for the conformal transformation, 6 for the affine, 8 for the bilinear or, as discussed before, 32 for a polynomial transformation that must fit 16 input points to 16 output locations. What is in the matrix K? It is the coefficient matrix of the equations and is filled with the input coordinates, as given by the polynomial or other transformation equations. How large is the coefficient matrix K? For an affine transformation with the minimum of three points, K has 6 by 6 elements, and in the polynomial case discussed here K has 32 by 32 elements.

What happens if we are given more points in system 1, with their transformed coordinates in system 2, than we need to solve for the unknowns? Suppose we had ten input and ten output points to compute the unknown coefficients of a conformal transformation, where we would only need 2 points, producing 4 equations, to solve for the 4 unknowns. We then have an over-determined equation system and our matrix K is rectangular. We cannot invert a rectangular matrix. So what do we do?


There is a theory in statistics and estimation theory called the Least Squares Method. Slide 9.41 explains: we can solve an over-determined equation system, which has a rectangular rather than a square coefficient matrix, by premultiplying the left and the right side of the equation system by the transpose of the coefficient matrix, K^T. We obtain in this manner a square matrix K^T · K on the right-hand side, and we call this the normal equation matrix. It is square in the shorter of the two dimensions of the rectangular matrix K and it can be inverted. The unknown coefficient vector u thus results from the inverse of the product K^T · K as shown in Slide 9.41.

This is but a very brief glimpse at the matter of "least squares". In reality, this is a concept that can fill many hundreds of textbook pages, but the basic idea is that we estimate the unknown parameters u using observations that are often erroneous, and to be robust against such errors we provide more points (x_i, y_i) in the input system and (X_i, Y_i) in the output system than are needed as a minimum. Because of these errors the equations will not be entirely consistent, and we will have to compute transformation parameters that provide a best approximation of the transformation.

"Least squares" solutions have optimality properties if the errors in the coordinates are statistically normally distributed.
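As a minimal numerical sketch of this recipe (not taken from the lecture; the control-point coordinates below are invented, and a 6-parameter affine transformation stands in for the general case), the normal equations u = (K^T K)^(-1) K^T x can be solved in Python as follows:

import numpy as np

# Hypothetical control points: input (x, y) and corresponding output (X, Y).
xy = np.array([[0.0, 0.0], [1.0, 0.1], [0.9, 1.0], [0.1, 0.9], [0.5, 0.4]])
XY = np.array([[10.0, 20.0], [12.1, 20.3], [11.8, 22.2], [10.2, 21.9], [11.0, 20.9]])

# Affine model: X = a0 + a1*x + a2*y,  Y = b0 + b1*x + b2*y  (6 unknowns).
n = xy.shape[0]
K = np.zeros((2 * n, 6))
K[0::2, 0:3] = np.c_[np.ones(n), xy]   # rows holding the X equations
K[1::2, 3:6] = np.c_[np.ones(n), xy]   # rows holding the Y equations
x_obs = XY.reshape(-1)                 # known output coordinates

# Least squares via the normal equations: u = (K^T K)^(-1) K^T x
u = np.linalg.solve(K.T @ K, K.T @ x_obs)
print(u)                               # a0, a1, a2, b0, b1, b2

With five point pairs the system is overdetermined (10 equations, 6 unknowns); with the minimum number of pairs the same code reproduces the exact solution of the square system.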

Exam questions:

• In 2D space, a bilinear transformation is sought and the unknown transformation parameters are to be computed. For this purpose, N points are known with their coordinates before and after the transformation, where N > 4. Which solution approach applies here?

Answer: the method of least squares:

X = K · u
K^T · X = K^T · K · u
u = (K^T · K)^(-1) · K^T · X

• In the lecture, two procedures for determining the eight parameters of a bilinear transformation in two dimensions were explained:

1. exact determination of the parameter vector u when exactly four input/output point pairs are given
2. approximate determination of the parameter vector u when more than four input/output point pairs are given ("least squares method")

The method of least squares can, however, also be applied when exactly four input/output point pairs are given. Show that in this case one obtains the same result as with the first procedure. What is the geometric meaning of this observation?

Hint: consider why the method of least squares carries this name.

Answer:

u = (K^T K)^(-1) K^T X = K^(-1) ((K^T)^(-1) K^T) X = K^(-1) X

These manipulations are possible because K is a square matrix in this case. Since the equation system is not overdetermined, an exact solution exists (error ε = 0). This solution is also found by the method of least squares, which minimizes the error (ε ≥ 0).

• Describe a bilinear transformation by means of its defining equation!



9.7 Image Rectification and Resampling

We change the geometry of an input image as illustrated in Slide 9.43, Slide 9.44 and Slide 9.45, which show a mesh or a grid superimposed over the input image. We similarly show a differently shaped mesh in the output image. The task is to match the input image onto the output geometry so that the meshes fit one another. We have to establish a geometric relationship between the image in the input and the output using a transformation equation from the input to the output.

If we now do a geometric transformation of an image, we have essentially two tasks to perform. First we need to describe the geometric transformation between the input image and the output image by assigning to every input image location the corresponding location in the output image. This is a geometric operation with coordinates. Second, we need to produce an output gray level for the resulting image based on the input gray levels. We call this second process resampling, as shown in Slide 9.46, and it uses operations on gray values.

Again, what we do conceptually is to take an input image pixel at location (x, y) and to compute by a spatial transform the location in the output image at which this input pixel would fall; this location has the coordinates (x′, y′) in accordance with Slide 9.46. However, that location may not perfectly coincide with the center of a pixel in the output image. Now we have a second problem, and that is to compute the gray value at the center of the output pixel by looking at the area of the input image that corresponds to that output location. One method is to assign the gray value we find at that location in the input image to the specific location in the output image. If we use this method, we have used the so-called nearest-neighbor method.

An application of this matter of resampling and rectification of images is illustrated in Slide 9.47. We have a distorted input image which shows an otherwise perfectly regular grid with some distortions. In the output image that same grid is reconstructed with reasonably perfect vertical and horizontal grid lines. The transition is obtained by means of a geometric rectification, and this rectification includes as an important element the function of resampling. Slide 9.113 is again the image of planet Mercury before the rectification and Slide 9.49 after the rectification, performing a process as illustrated earlier. Let us stop right at this point and delay a further discussion of resampling to a separate later chapter (Chapter 15). Resampling and image rectification were mentioned at this point only to establish the relationship of this task to the idea of 2-dimensional transformations from an input image to an output image.
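A minimal sketch of these two steps in Python (the geometric step is assumed here to be given as an inverse 3 × 3 affine mapping from output to input coordinates; the radiometric step uses the nearest-neighbor rule described above):

import numpy as np

def resample_nearest(src, T_out_to_in, out_shape):
    """Rectify 'src': map each output pixel back into the input image
    (geometric step) and copy the nearest input gray value (radiometric step)."""
    out = np.zeros(out_shape, dtype=src.dtype)
    for r in range(out_shape[0]):
        for c in range(out_shape[1]):
            # geometric step: output location -> input location
            x, y, _ = T_out_to_in @ np.array([c, r, 1.0])
            xi, yi = int(round(x)), int(round(y))
            # radiometric step: nearest-neighbor gray value, if inside the input
            if 0 <= yi < src.shape[0] and 0 <= xi < src.shape[1]:
                out[r, c] = src[yi, xi]
    return out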

Exam questions:

• If a real scene is captured by a camera with non-ideal optics, a distorted image results. Explain the two stages of resampling that are required to rectify such a distorted image!

Answer:

1. geometric resampling: finding corresponding positions in the two images
2. radiometric resampling: finding a suitable gray value for the output image

9.8 Clipping

As part of the process of transforming an object from a world coordinate system into a display coordinate system on a monitor or on a hardcopy output device, we are faced with an interesting problem: we need to take objects represented by vectors and figure out which part of each vector is visible on the display device. This task is called clipping. An algorithm that achieves clipping very efficiently is named after Cohen and Sutherland. Slide 9.51 illustrates the problem: a number of objects are given in world coordinates, and a display window will only show part of those objects. On the monitor the objects will be clipped.

Slide 9.52 outlines the algorithm. The task is to receive on the input side a vector defined by the end points p1 and p2 and to compute the auxiliary points C, D where this vector intersects the display window, which is defined by a rectangle.

9.8.1 Half Space Codes

In order to solve the clipping problem, Cohen and Sutherland defined so-called half-space codes (Slide 9.54); these relate to the half spaces defined by the straight lines delineating the display window. The half-space codes designate the spaces to the right, to the left, above and below the boundaries of the display window, say with subscripts c_r, c_l, c_t, and c_b. For example, if a point is to the right of the right vertical boundary, the point's half-space code c_r is set to "true"; if it is to the left, the code is set to "false". Similarly, a location above the window gets a half-space code c_t of "true" and below gets "false".

We now need to define a procedure called "Encode" (Slide 9.55) which takes an input point p and produces for it the associated four half-space codes, assigning each half-space code to a Boolean variable c. We obtain two values for each of the two coordinates p_x and p_y of point p: a value of true or false depending on where p_x falls with respect to the vertical boundaries of the display window, and on where p_y falls with respect to the horizontal boundaries.
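A small sketch of such an Encode step in Python (the window boundaries w_l, w_r, w_t, w_b and a y axis pointing upward are assumptions made only for this illustration):

def encode(p, wl, wr, wt, wb):
    """Return the four half-space codes (c_l, c_r, c_t, c_b) of point p = (px, py)."""
    px, py = p
    cl = px < wl          # point lies to the left of the window
    cr = px > wr          # point lies to the right of the window
    ct = py > wt          # point lies above the window
    cb = py < wb          # point lies below the window
    return cl, cr, ct, cb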

9.8.2 Trivial acceptance and rejection

Slide 9.56 shows the first part of the procedure Clip as it is presented in [FvDFH90, Section 3.12.3]. The procedure "Encode" is called for the beginning and end points of a straight line, denoted as P1 and P2, and the resulting half-space codes are denoted as C1 and C2. We now have to take a few decisions about the straight line depending on where P1 and P2 fall. We compute two auxiliary Boolean variables, In_1 and In_2. We can easily show that the straight line is entirely within the display window if In_1 and In_2 are both "true". This is called trivial acceptance, shown in Slide 9.57 for points A, B. Trivial rejection is also shown in Slide 9.57 for a straight line connecting points C and D.

9.8.3 Is the Line Vertical?

We need to proceed with the clipping algorithm in Slide 9.58 if we have neither a trivial acceptance nor a trivial rejection. We differentiate among cases where at least one point is outside the display window. The first possibility is that the line is vertical; that case is considered first.

9.8.4 Computing the slope

If the line is not vertical we compute its slope. This is illustrated in Slide 9.59.

9.8.5 Computing the Intersection A in the Window Boundary

With this slope we compute the intersection of the straight line with the relevant boundary lines of the display window at w_l, w_r, w_t and w_b. We work our way through a few decisions to make sure that we do find the intersections of our straight line with the boundaries of the display window.



9.8.6 The Result of the Cohen-Sutherland Algorithm

The algorithm produces a value stating either that the straight line is entirely outside of the window, or it returns the end points of the clipped straight line. These are the end points from the input if the entire line segment is within the window; they are the intersection points of the input line with the window boundaries if the line intersects them.
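The following Python sketch condenses these steps into one routine (trivial acceptance, trivial rejection, and intersection with one window boundary per iteration). It is a compact illustration of the idea rather than the exact procedure of [FvDFH90]:

def clip_line(p1, p2, wl, wr, wb, wt):
    """Cohen-Sutherland style clipping of segment p1-p2 against the window
    [wl, wr] x [wb, wt]. Returns the clipped segment or None if rejected."""
    def encode(px, py):
        return (px < wl, px > wr, py < wb, py > wt)   # (c_l, c_r, c_b, c_t)

    (x1, y1), (x2, y2) = p1, p2
    while True:
        c1, c2 = encode(x1, y1), encode(x2, y2)
        if not any(c1) and not any(c2):               # trivial acceptance
            return (x1, y1), (x2, y2)
        if any(a and b for a, b in zip(c1, c2)):      # trivial rejection
            return None
        # pick an end point that lies outside and move it onto a window boundary
        c, first_is_out = (c1, True) if any(c1) else (c2, False)
        if c[0]:   x, y = wl, y1 + (y2 - y1) * (wl - x1) / (x2 - x1)
        elif c[1]: x, y = wr, y1 + (y2 - y1) * (wr - x1) / (x2 - x1)
        elif c[2]: y, x = wb, x1 + (x2 - x1) * (wb - y1) / (y2 - y1)
        else:      y, x = wt, x1 + (x2 - x1) * (wt - y1) / (y2 - y1)
        if first_is_out:
            x1, y1 = x, y
        else:
            x2, y2 = x, y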

Exam questions:

• Which "half-space codes" are used in clipping, and what role do they play?

• Explain the individual steps of the Cohen-Sutherland clipping algorithm using the example in Figure B.18. The intermediate results with the half-space codes are to be shown. Determine the part of the segment AB that lies inside the rectangle R. The numerical values needed (including those of the intersection points) can be read directly from Figure B.18.

• Apply the Cohen-Sutherland clipping algorithm (in two dimensions) to the points p′1 and p′2 found in Exercise B.2 in order to determine the part of the segment between p′1 and p′2 that lies inside the square Q = {(0, 0)^T, (0, 1)^T, (1, 1)^T, (1, 0)^T}! You may enter the result directly into Figure B.19 and solve the intersection computations graphically.

Answer:

        c_l     c_r     c_t     c_b
p′1     true    false   false   false
p′2     false   true    false   true

9.9 Homogeneous Coordinates

A lot of use is made of homogeneous coordinates in the world of computer graphics. The attraction of homogeneous coordinates is that in a 2- or 3-dimensional transformation of an input coordinate system or object described by x into an output coordinate system or changed object described by x′, we do not have to split our operation into a multiplication by the rotation matrix and scale factor and, separately, an addition of the translation vector t. Instead we simply employ a single matrix multiplication, with a homogeneous coordinate vector X for a point and output coordinates X′ for the same point after the transformation.

Slide 9.62 explains the basic idea of homogeneous coordinates. Instead of working in 2 dimensions in a 2-dimensional Cartesian coordinate system (x, y), we augment the coordinate system by a third coordinate w, and any point in 2D space with location (x, y) receives a third coordinate and therefore is at location (x, y, w). If we define w = 1 we have defined a horizontal plane for the location of a point. Again, Slide 9.63 states that Cartesian coordinates in 2 dimensions represent a point p as (x, y), and homogeneous coordinates in 2 dimensions have that same point represented by the three-element vector (x, y, 1). Let us try to explain how we use homogeneous coordinates, staying with 2 dimensions only. In Slide 9.64 we have another view of a translation in Cartesian coordinates. Slide 9.65 describes scaling; in this particular case an affine scaling occurs with separate scale factors in the two coordinate directions (Slide 9.66 illustrates a rotation). Slide 9.67 illustrates the translation by means of a translation vector and scaling by means of a scaling matrix. Slide 9.68 introduces the relationship between a Cartesian and a homogeneous coordinate system. Slide 9.69 uses homogeneous coordinates for a translation by means of a multiplication of the input coordinates into an output coordinate system. The same operation is used for scaling in Slide 9.70 and for rotation in Slide ??; Slide ?? summarizes.

As Slide 9.73 reiterates, translation and scaling are now both described by matrix multiplication, and of course rotation and scaling have previously also been matrix multiplications in the Cartesian coordinate system. If we now combine these three transformations of translation, scaling and rotation, we obtain a single transformation matrix M which describes all three transformations without the separation into multiplications and additions that is necessary in the Cartesian case.

The simplicity of doing everything in matrix form is the appeal that leads computer graphics software to rely heavily on homogeneous coordinates. In image analysis, homogeneous coordinates are not as prevalent. One reason may be that in image processing we often have to estimate transformation parameters from over-determined equation systems using the method of least squares. That approach is typically easier to apply with Cartesian geometry than with the homogeneous system.
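As an illustrative sketch (the numbers are chosen arbitrarily), the three 2D operations become 3 × 3 matrices acting on homogeneous points (x, y, 1), and they can be combined into a single matrix M:

import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

# one combined matrix: first rotate, then scale, then translate
M = translation(4, 1) @ scaling(2, 2) @ rotation(np.pi / 2)
p = np.array([1.0, 0.0, 1.0])         # the point (1, 0) in homogeneous form
print(M @ p)                          # [4. 3. 1.], i.e. the Cartesian point (4, 3)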

Exam questions:

• Explain the significance of homogeneous coordinates for computer graphics! Which properties do homogeneous coordinates have?

• For homogeneous coordinates, give a 3 × 3 matrix M with as many degrees of freedom as possible that is suited to transform the points p of a rigid body (e.g. a block of wood) according to q = Mp (a so-called "rigid body transformation")!

Hint: simple geometric relationships are contained in the question in "encoded" form. If they were formulated explicitly instead, the answer would really be material of "Group I".

• Given the transformation matrix

M =
     0   2   0   0
     0   0   2   0
     1   0   0  -5
    -2   0   0   8

and the two points p1 = (3, -1, 1)^T and p2 = (2, 4, -1)^T in object coordinates. Transform the two points p1 and p2 with the help of the matrix M into the points p′1 and p′2 in (normalized) screen coordinates (pay attention to the conversions between three-dimensional and homogeneous coordinates)!

Answer:

M · (3, -1, 1, 1)^T = (-2, 2, -2, 2)^T  ⇒  p′1 = (-1, 1, -1)^T
M · (2, 4, -1, 1)^T = (8, -2, -3, 4)^T  ⇒  p′2 = (2, -0.5, -0.75)^T

9.10 A Three-Dimensional Conformal Transformation

In three dimensions things become considerably more complex and more difficult to describe. Slide 9.75 shows that a 3-dimensional conformal transformation rotates objects or coordinate axes, scales them and translates them, just as we had in 2 dimensions. However, the rotation matrix now needs to cope with three coordinate axes.

Definition 24 Rotation in 3D
The three-dimensional rotation transforms an input point P with coordinates (x, y, z) into an output coordinate system (X, Y, Z) by means of a rotation matrix R.
The elements of this rotation matrix can be interpreted as follows:
- as the cosines of the angles subtended by the coordinate axes xX, yX, zX, ..., zZ;
- as the assembly of the three unit vectors directed along the axes of the rotated coordinate system, but described in terms of the input system.

R =
    cos(xX)  cos(yX)  cos(zX)
    cos(xY)  cos(yY)  cos(zY)
    cos(xZ)  cos(yZ)  cos(zZ)

or

R =
    r11  r12  r13
    r21  r22  r23
    r31  r32  r33

P′ = R · P

A 3D rotation can be considered as a composition of three individual 2-D rotations around the coordinate axes x, y, z. It is easy to see that rotating around one axis will also affect the two other axes. Therefore the sequence of the rotations is very important. Changing the sequence of the rotations may result in a different output image.

In analogy to the 2-dimensional case we now know that the rotation matrix takes an input point P with coordinates (x, y, z) into an output coordinate system (X, Y, Z) by means of a rotation matrix R. The elements of this rotation matrix are again, first, the cosines of the angles subtended by the coordinate axes xX, yX, zX, ..., zZ, and second, the assembly of three unit vectors directed along the axes of the rotated coordinate system but described in terms of the input coordinate system (Slide 9.76); the matrix is obtained as the multiplication of three 2-D rotation matrices as shown in Slide 9.77. The composition of the rotation matrix from three individual 2-D rotations around the three coordinate axes x, y and z is the most commonly used approach. Each rotation around an axis needs to consider that that particular axis may already have been rotated by a previous rotation. Note that as we rotate around a particular axis first, that will move the other two coordinate axes. We then rotate around the rotated second axis, affecting the third one again, and then we rotate around the third axis. The sequence of rotations is of importance and will change the ultimate outcome if we change the sequence.

Slide 9.79 illustrates how we might define a three-dimensional rotation and translation by means of three points P1, P2, P3 which represent two straight line segments P1P2 and P1P3. We begin by translating P1 into the origin of the coordinate system. We proceed by rotating P2 into the z axis and complete the rotation by rotating P3 into the yz plane. We thereby obtain the final position. If we track this operation we see that we have applied several rotations. We have first rotated P1P2 into the xz plane. Then we have rotated the result around the y-axis into the z-axis. Finally we have rotated P1P3 around the z-axis into the yz plane. Slide 9.80 and Slide 9.81 explain in detail the sequence of the three rotations by three angles, which are denoted in this case first as angle Θ, second as angle φ, and third as angle α.

Generally, a three-dimensional conformal transformation will be described by a scaling l, a rotation matrix R and a translation vector t. Note that l is a scalar value, the rotation matrix is a 3 by 3 matrix containing three angles, and the translation vector t has three elements with translations along the directions x, y, and z. This type of transformation contains seven parameters for the three dimensions, as opposed to four parameters in the 2D case: the rotation matrix has 3 angles, the scale factor is a fourth value and the translation vector has three values, resulting in a total of seven parameters to define this transformation.
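A small numerical sketch of composing a 3D rotation from three 2-D rotations (the axis order and the angles are chosen arbitrarily), showing that the sequence matters:

import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

a, b, g = 0.3, 0.5, 0.7                 # three arbitrary rotation angles
R1 = rot_z(g) @ rot_y(b) @ rot_x(a)     # rotate about x, then y, then z
R2 = rot_x(a) @ rot_y(b) @ rot_z(g)     # same angles, different sequence
print(np.allclose(R1, R2))              # False: the sequence changes the result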

Exam questions:

• What is meant by a "conformal transformation"?

9.11 Three-Dimensional Affine Transformations

Definition 25 Affine transformation with 3D homogeneous coordinates

case 'translation':
    tr matrix =
         1   0   0   t_x
         0   1   0   t_y
         0   0   1   t_z
         0   0   0   1

case 'rotation x':
    rotation x =
         1       0        0      0
         0    cos φ   -sin φ     0
         0    sin φ    cos φ     0
         0       0        0      1

case 'rotation y':
    rotation y =
         cos φ    0    sin φ    0
           0      1      0      0
        -sin φ    0    cos φ    0
           0      0      0      1

case 'rotation z':
    rotation z =
         cos φ   -sin φ    0    0
         sin φ    cos φ    0    0
           0        0      1    0
           0        0      0    1

case 'scale':
    scale matrix =
         s_x    0     0    0
          0    s_y    0    0
          0     0    s_z   0
          0     0     0    1

By using homogeneous coordinates, all of these transformations are 4 × 4 matrices, so the transformations can easily be combined by multiplying the matrices. This results in a speedup, because every point is multiplied with only one matrix and not with all transformation matrices.
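A short sketch of why this pays off (the matrices and points are arbitrary examples): the product of the individual 4 × 4 matrices is computed once and then applied to every point.

import numpy as np

T = np.array([[1, 0, 0, 2],
              [0, 1, 0, -1],
              [0, 0, 1, 3],
              [0, 0, 0, 1]], dtype=float)          # translation
S = np.diag([2.0, 2.0, 0.5, 1.0])                  # scaling
phi = np.pi / 4
Rz = np.array([[np.cos(phi), -np.sin(phi), 0, 0],
               [np.sin(phi),  np.cos(phi), 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])                      # rotation about z

M = T @ S @ Rz                                     # combined once ...
points = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 1],
                   [1, 1, 1, 1]], dtype=float).T   # homogeneous column vectors
print(M @ points)                                  # ... applied to all points at once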

The three-dimensional transformation may change the object shape. A simple change results from shearing or squishing, and produces an affine transformation. Generally, the affine transformation does not have a single scale factor; we may have up to three different scale factors along the x, y, and z axes, as illustrated in Slide 9.83. Another interpretation of this effect is to state that a coordinate X is obtained from the input coordinates (x, y, z) by means of shearing elements h_yx and h_zy which are really part of the scaling matrix M_sc. Ultimately, a cube will be deformed into a fairly irregular shape, as shown in Slide 9.84 with the example of a building shape. A three-dimensional affine transformation now has 12 parameters, so that the transformations of the x, y, and z coordinates are independent of one another. The transformation will still maintain straight lines as straight lines; however, right angles will not remain right angles.

9.12 Projections

From a higher-dimensional space, projections produce images in a lower-dimensional space. We have projection lines or projectors that connect input to output points, we have projection centers, and we have a projection surface onto which the higher-dimensional space is projected. In the real world we basically project 3-dimensional spaces onto 2-dimensional projection planes. The most common projections are the perspective projections as used by the human eye and by optical cameras.

We differentiate among a multitude of projections. The perspective projections model what happens in a camera or the human eye. However, engineers have long used parallel projections. These are historically used also in the arts and in cartography and have projection rays (also called projectors or projection lines) that are parallel. If they are perpendicular to the projection plane we talk about an orthographic projection. If they are not perpendicular but oblique to the projection plane, we talk about an oblique projection (see Slide 9.86).

A special case of the orthographic projection results in the commonly used presentation of three-dimensional space as a top view, front view and side view (Slide 9.87). The heavy use in architecture and civil engineering of top views, front views and side views of a 3-D space is easy to justify: from these 3 views we can reconstruct the 3 dimensions of that space. Another special case is the axonometric projection, where the projection plane is not in one of the three coordinate planes of a three-dimensional space. Yet another special case is the isometric projection, which occurs if the projection plane is chosen such that all three coordinate axes are changed equally much in the projection (the projection rays are directed along the vector with elements (1, 1, 1)). We highlight two particular oblique projections, the cavalier and the cabinet projection. The cavalier projection produces no scale reduction along the coordinate axes because it projects under exactly 45°. In the cabinet projection we project under an angle α = 63.4°, since tan α = 2; this projection shrinks an object in one direction by a factor of 1/2.
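A compact way of writing these two oblique projections (a common textbook formulation quoted here for reference; z is taken as the depth axis and β is the angle of the projected z axis in the image plane):

x′ = x + L · z · cos β
y′ = y + L · z · sin β

with tan α = 1/L, so that L = 1 for the cavalier projection (α = 45°) and L = 1/2 for the cabinet projection (α ≈ 63.4°).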

9.13 Vanishing Points in Perspective Projections

In order to construct a perspective projection we can take advantage of parallel lines. In the natural world they meet, of course, only at infinity. In the projection they meet at a so-called vanishing point¹. This is a concept of descriptive geometry, a branch of mathematics. Slide 9.91 is the example of a perspective projection as produced by a synthetic camera. Note how parallel lines converge at a point which typically is outside the display area. The vanishing point is the image of the object point at infinity.

Because there exists an infinity of directions for bundles of parallel lines in 3D space, there exists an infinity of vanishing points. However, special vanishing points are associated with bundles of lines that are parallel to the coordinate axes. Such vanishing points are called principal.

¹ in German: Fluchtpunkt



We may have only one axis producing a finite vanishing point, since the other two axes are themselves parallel to the projection plane and their vanishing points are at infinity. Such a perspective projection is therefore called a one-point perspective, in which a cube aligned with the coordinate axes will have only one vanishing point. Analogously, Slide 9.93 and Slide 9.94 present a 2-point and a general 3-point perspective.

9.14 A Classification of Projections

Slide 9.96 presents the customary hierarchy of projections as they are commonly presented in books about architecture, art and engineering. In all cases, these projections are onto a plane and are thus planar projections. The differentiation between perspective and parallel projections is somewhat artificial if one considers that with a perspective center at infinity, one obtains the parallel projection. Nevertheless, the projections are grouped into parallel and perspective projections; the perspective projections are then subdivided into one-point, two-point and three-point perspective projections, and the parallel projections are classified into orthographic and oblique ones, the oblique ones having the cavalier and the cabinet projection as special cases. The orthographic projections have the axonometric projections on one hand and the multi-view orthographic on the other hand, and within the axonometric projections we have the one special case we discussed, the isometric projection.

We do not discuss the world of more complex projections, for example to convert the surface of a sphere into a plane: this is the classical problem of cartography with its need to present a picture of the Earth on a flat sheet of paper.

Exam questions:

• In the lecture, a "tree" for the hierarchy of the various projections into the plane (planar projections) was presented. Please sketch this tree with all the projections it contains.

9.15 The Central Projection

This is the most important projection of all the ones we discuss in this class. The simple reason for this is that it is the geometric model of a classical camera. Slide 9.98 explains the geometry of a camera and defines three coordinate systems. The first is the world coordinate system with X, Y and Z. In this world coordinate system we have a projection center O at location (X0, Y0, Z0). The projection center is the geometric model of a lens. All projection lines are straight lines going from the object space, where there is an object point P at the location (x, y, z), through the projection center O and intersecting the image plane.

We know at this point that the central projection is similar to the perspective projection. There is a small difference, though. We define the projection center with respect to an image plane and insist on some additional parameters that describe the central projection that we do not typically use in the perspective projection.

Note that we have a second coordinate system that lies in the image plane and is denoted in Slide 9.98 by ξ and η. This is a rectangular 2-dimensional Cartesian coordinate system with an origin at point M. The point P in object space is projected onto the image location P′ = (x, y). Third, we have the sensor coordinate system, in which the location of the perspective center O is defined. The sensor coordinate system has its origin at the perspective center O, is a three-dimensional coordinate system, its x and y axes are nominally parallel to the image coordinate system (ξ, η), and its z-axis is perpendicular to the image plane.



We also have an additional point H defined in the central projection, which is the intersection of the image plane with the line that is perpendicular to it and passes through the projection center. Note that this does not necessarily have to be identical to point M. M simply is the origin of the image coordinate system and is typically the point of symmetry with respect to some fiducial marks as shown in Slide 9.98.

In order to describe a central projection we need to know the image coordinate system with its origin M, we need to know the sensor coordinate system with its origin O, and we need to understand the relationship between the sensor coordinate system and the world coordinate system (X, Y, Z).
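A minimal numerical sketch of this projection (a simple pinhole model; the rotation matrix R, the projection center O and the camera constant c are assumed to be known, and the sign convention is chosen only for this illustration):

import numpy as np

def central_projection(P, O, R, c):
    """Project world point P through projection center O onto the image plane.
    R rotates world axes into the sensor axes, c is the camera constant."""
    p_cam = R @ (P - O)                  # the point in the sensor coordinate system
    xi  = -c * p_cam[0] / p_cam[2]       # image coordinates (xi, eta)
    eta = -c * p_cam[1] / p_cam[2]
    return np.array([xi, eta])

# example: camera at (0, 0, 10), axes aligned with the world, looking along the Z axis
print(central_projection(np.array([2.0, 1.0, 0.0]),
                         np.array([0.0, 0.0, 10.0]), np.eye(3), c=1.0))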

Let us take another look at the same situation in Slide 9.99, where the coordinate systems are again illustrated. We have the projection center O as in the previous slide and we have two projection rays, going from object point P1 to image point P′1 and from object point P2 to image point P′2. We also have an optical axis, or the direction of the camera axis, which passes through point O and is perpendicular to the image plane. In Slide 9.99 two image planes are suggested. One is between the perspective center and the object area, and that suggests the creation of a positive image. In a camera, however, the projection center is typically between the object and the image plane, and that leads geometrically to a negative image. Slide 9.99 also defines again the idea of an image coordinate system. In this case it is suggested that the image is rectangular and that the image coordinates are defined by some artificial marks that are placed in the image plane. The marks are connected and define the origin M. We also have again the point H, which is the intersection of the image plane with the line perpendicular to it and passing through the projection center. We also have some arbitrary location for a point P′ that is projected into the image. We will from here on out ignore that M and H may be two different locations. Typically the distance between M and H is small, and it is considered an error of a camera if M and H do not coincide.

Normal cameras that we use as amateurs do not have those fiducial marks, and therefore they are called non-metric cameras, because they do not define an image coordinate system. Users of non-metric cameras who want to measure need to help themselves with some auxiliary definition of an image coordinate system, and they must make sure that the image coordinate system is the same from picture to picture if multiple pictures show the same object. Professional cameras that are used for making measurements and for reconstructing 3D objects typically have those fiducial marks as fixed features of the camera. In digital cameras the rows and columns of a CCD array provide an inherent coordinate system because of the numbering of the pixels.

Slide 9.100 revisits the issue of a 3-dimensional rotation. We have mentioned before that there are three coordinate systems in the camera: two of those are 3-dimensional and the third one is 2-dimensional. The sensor coordinate system with its origin at the projection center O and the world coordinate system (X, Y, Z) need to be related via a 3-dimensional transformation. Slide 9.100 suggests that we have 3 angles that define the relationship between the 2 coordinate systems. Each of those angles represents a 2-dimensional rotation around one of the 3 world coordinate axes. Those are angles in 3D space.

Recall that we have several definitions of rotation matrices and that we can define a rotation matrix by various geometric entities. These can be rotations around axes that rotate themselves, or they can be angles in 3-D space subtended by the original axes and the rotated axes. Slide 9.100 describes the first case. Θ is the angle of rotation around the axis Z, but in the process we will rotate axes X and Y. φ is rotating around the axis X and will obviously take with it the axes Z and Y. And A then is a rotation around the rotated axis Z. Conceptually, everything we said earlier about 3-dimensional transformations, rotations and so forth applies here as well. Our earlier discussion of the 3D conformal transformation applies to the central projection, and the central projection really is mathematically modeled by the 3-dimensional conformal transformation, which elevates that particular projection to a particularly important role.
that particular projection to a particularly important role.



9.16 The Synthetic Camera

There are various places in our class in which we suggest the use of a synthetic camera. We have applications in computer graphics in order to create a picture on a display medium or on a monitor, for augmented or virtual reality. We have it in image processing and photogrammetry to reconstruct the world from images, and we have terminology that has developed separately as follows. What is called a projection plane or image plane in image processing is in computer graphics called a View Plane. What in image processing is a projection center is in computer graphics a View Reference Point VRP. And what in image processing is the optical axis or the camera axis is in computer graphics the View Plane Normal VPN. Slide 9.102 and Slide 9.103 explain this further. We again have a lens center and an image plane and an optical axis Z that is perpendicular to the image plane, which itself is defined by the coordinate axes X and Y. An arbitrary point in object space (X, Y, Z) is projected through the lens center onto the image plane. Note that in a synthetic camera we do not worry much about fine points such as an image coordinate system defined by fiducial marks, or the difference between the points M and H (M being the origin of the image coordinate system and H being the intersection point of the line normal to the image plane and passing through the lens center).

In robotics we typically use cameras that might use rotations around very particular axes. Slide 9.103 defines the world coordinate system with (X, Y, Z) and defines a point or axis of rotation in the world coordinate system at the end of vector w0 at location (X0, Y0, Z0). That location of an axis of rotation then defines the angle under which the camera itself is looking at the world. The camera has coordinate axes (x, y, z) and an optical axis in the direction of coordinate axis z. The image coordinates are 2-dimensional with an origin at the center of the image, and that point itself is defined by an auxiliary vector r with respect to the point of rotation. So we see that we have various definitions of angles and coordinate systems, and we always need to understand these coordinate systems and convert them into one another.

Slide 9.104 explains this further: we have a camera looking at the world; again we have an image coordinate system (x, y) and a sensor system (x, y, z) that are defined in the world coordinate system (X, Y, Z). As we want to define where a camera is in the world coordinate system and in which direction its optical axis is pointing, we have to build up a transformation just as we did previously with the 3-dimensional conformal transformation.

Let us assume that we start out with a perfect alignment of our camera in the world coordinate system, so that the sensor coordinate axes x, y, z and the world coordinate axes X, Y, Z coincide. We now move the camera into an arbitrary position, which represents the translation in 3-D space defined, if you recall, by the translation vector t. Then we orient the camera by rotating it essentially around 3 axes into an arbitrary attitude. The first rotation may be, as suggested in Slide 9.105, around the z axis, which represents the angle A in Slide ??. In this slide it is suggested that the angle is 135°. Next we roll the camera around the x axis, again by an angle of 135°, and instead of having the camera looking up into the sky we now have it look down at the object. Obviously we can apply a third rotation around the rotated axis y to give our camera attitude complete freedom. We now have a rotation matrix that is defined by those angles of rotation that we just described, and we have a translation vector as described earlier. Implied in all of this is also a scale factor. We have not yet discussed the perspective center and the image plane. Obviously, as their distance grows, we go from a wide-angle through a normal-angle to a tele-lens, and that will affect the scale. So the scale of the image is affected by the distance of the camera from the object and also by the distance of the projection center from the image plane.

Note that we need 7 elements to describe the transformation that we have seen in Slide 9.105. We need 3 elements of translation, we have 3 angles of rotation, and we have one scale factor that is defined by the distance of the projection center from the image plane. These are the exact same 7 transformation parameters that we had earlier in the 3-dimensional conformal transformation.

Exam questions:

• Given the 4 × 4 matrix

M =
     8   0   8  -24
     0   8   8    8
     0   0   0   24
     0   0   1    1

and the four points

p1 = (3, 0, 1)^T
p2 = (2, 0, 7)^T
p3 = (4, 0, 5)^T
p4 = (1, 0, 3)^T

in three-dimensional space. The matrix M combines all transformations that are required to carry a point p given in world coordinates into the corresponding point p′ = M · p in device coordinates (see also Figure B.36; the screen plane, and therefore the y axis, is normal to the drawing plane). Applying the transformation matrix M maps the points p1 and p2 onto the points

p′1 = (4, 8, 12)^T
p′2 = (6, 8, 3)^T

in device coordinates. Compute p′3 and p′4 in the same way!

Answer:

We have

p̃′1 = (8, 16, 24, 2)^T ⇒ p′1 = (4, 8, 12)^T
p̃′2 = (48, 64, 24, 8)^T ⇒ p′2 = (6, 8, 3)^T
p̃′3 = (48, 48, 24, 6)^T ⇒ p′3 = (8, 8, 4)^T
p̃′4 = (8, 32, 24, 4)^T ⇒ p′4 = (2, 8, 6)^T
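These results can be checked with a few lines of Python; the division by the fourth (homogeneous) component is the essential step:

import numpy as np

M = np.array([[8, 0, 8, -24],
              [0, 8, 8,   8],
              [0, 0, 0,  24],
              [0, 0, 1,   1]], dtype=float)

for p in [(3, 0, 1), (2, 0, 7), (4, 0, 5), (1, 0, 3)]:
    ph = M @ np.array([*p, 1.0])      # homogeneous device coordinates
    print(ph[:3] / ph[3])             # [4 8 12], [6 8 3], [8 8 4], [2 8 6]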

9.17 Stereopsis

This is a good time to introduce the idea of stereopsis, although we will have a separate chapter on it later in this class. The synthetic camera produces an image that we can look at with one eye, and if we produce a second image and show it to the other eye, we will be able to "trick" the eyes into a 3-dimensional perception of the object that was imaged (Slide 9.107). We model binocular vision by two images: we compute, or present to the eyes, two existing natural images of the object, separately one image to one eye and the other image to the other eye. Those images can be taken by one camera placed in two locations, or they can be synthetic images computed with a synthetic camera. Slide 9.108 explains further that our left eye is seeing point P_left, the right eye is seeing point P_right, and in the brain those two observations are merged into a 3-dimensional location P.

Slide 9.109 illustrates that a few rules need to be considered when creating images for stereoscopic viewing. The image planes and the optical axes for the two images should be parallel. Therefore, one should not create two images with converging optical axes; this would be inconsistent with natural human viewing. Only people who squint² will have converging optical axes. Normal stereoscopic viewing would create a headache if the images were taken with converging optical axes.

² in German: schielen

We call the distance between the two lens centers for the two stereoscopic images the stereo base B. Slide 9.110 shows the same situation in a top view. We have the distance from the lens center to the image plane, which is typically denoted as the camera constant or focal length, an object point W which is projected into the image locations (X1, Y1) and (X2, Y2), and the two optical axes Z, parallel to one another and perpendicular to XY.

Note that we also call the ratio of B to the distance to W the base/height ratio, this being a measure of quality for the stereo view. If we compute a synthetic image from a 3-dimensional object for the left and the right eye, we might get a result as shown in Slide 9.111, which indeed can be viewed stereoscopically under a stereoscope.
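Although a full treatment is postponed to the later chapter, the quantities just introduced already fix the standard relation of the so-called normal case of stereo, quoted here only as a reference (it is not derived in this section): the depth Z of point W follows from Z = c · B / p, where c is the camera constant, B the stereo base and p = X1 − X2 the x-parallax. A larger base/height ratio therefore produces a larger parallax for a given depth, which is why it serves as a measure of quality for the stereo view.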

To make matters a little more complicated yet, it turns out that a human can view stereoscopically two images that do not necessarily have to be made by a camera under a central perspective projection. As long as the two images are similar enough in radiometry and the geometric differences are not excessive, the human observer will be able to merge the two images into a 3-dimensional impression. This fact has been used in the past to represent measurements in 3 dimensions, for example temperature: we could encode temperature as a geometric difference in two otherwise identical images, and we would see a 2-dimensional scene with temperature shown as height. This and similar applications have in the past been implemented by various researchers.
height. This <strong>and</strong> similar applications have in the past been implemented by various researchers.<br />

9.18 Interpolation versus Transformation

One may want to transfer an object such as the distorted photo in Slide 9.113 into an output geometry. This can be accomplished by a simplified transformation based, for example, on 4 points. This will reveal errors (distortions) at other known points (see Slide 9.114 and Slide 9.115). These errors can be used to interpolate a continuous error function d_x(x, y), d_y(x, y) which must be applied to each (x, y) location:

x′ = x + d_x(x, y)
y′ = y + d_y(x, y)

We have replaced a complicated transformation by a much simpler transformation plus an interpolation.
Question: What is the definition of interpolation?<br />

9.19 Transforming a Representation

9.19.1 Presenting a Curve by Samples and an Interpolation Scheme

We may want to represent an object in various ways. We may have a continuous representation of an object, or we might sample that object and represent the intervals between samples by some kind of interpolation or approximation technique. So we have conceptually something similar to a transformation, because we have two different ways of representing an object. Slide 9.117 introduces the basic idea of an object that is described by a set of points p1, p2, ..., pn. If we are in a 2-dimensional space we may want to represent that object not by n points but by a mathematical curve. In 3-dimensional space it may be a surface that represents the set of points: we transform from one representation into another.

A second item is that an object may not be given by points, but by a set of curves x = f_x(t), y = f_y(t), and z = f_z(t). We would like to replace this representation by another mathematical representation which may be more useful for certain tasks.

Again, while we are going to look at this basically in 2 dimensions or for curves, a generalization to 3 dimensions and to surfaces always applies.
into 3 dimensions <strong>and</strong> to surfaces always applies.



9.19.2 Parametric Representations of Curves

We introduce the parametric representation of a curve. We suggest in Slide 9.120 that the 2-dimensional curve Q in an (x, y) Cartesian coordinate system can be represented by two functions, Q(t) = (x(t), y(t)). We call this a parametric representation. The parameter t can typically be the length of the curve: as we proceed along the curve, the coordinate x and the coordinate y change as functions of the curve length t. More typically, t may be the "time" for a point to move along the curve. The advantage of a parametric representation is described in Slide 9.120: the tangent is replaced by a tangent vector Q′(t) = (dx(t)/dt, dy(t)/dt). That vector has a direction and a length.

9.19.3 Introducing Piecewise Curves

We may also choose not to represent the functions x(t) or y(t) with a high-order polynomial; instead we might break up the curve into individual parts, each part being a polynomial of third order (a cubic polynomial). We connect those polynomials at joints by forcing continuity at the joints.

If a curve is represented in 3D space by the equations x(t), y(t), and z(t) as shown in Slide 9.121, we can request that at the joints those polynomial pieces be continuous in the function but also continuous in the first derivative or tangent. We may even want to make them continuous in the curvature or second derivative (the length of the tangent). Note, however, that this kind of geometric continuity is weaker than continuity in "speed" and acceleration, if t is interpreted as time. One represents such a curve by a function Q(t) which is really a vector function (x(t), y(t), z(t)).

9.19.4 Rearranging Entities of the Vector Function Q

In accordance with the equation of Slide 9.121, Q(t) can be represented as a multiplication of a (row) vector T and a coefficient matrix C, where T contains the powers of the independent parameter t as the coefficients of the unknowns a_x, b_x, c_x and d_x. The matrix C can now be decomposed into M · G. As a result we can write Q(t) = T · M · G, where G is called the geometry vector and M is called the basis matrix. We can introduce a new entity, the product B = T · M; its entries are cubic polynomials, the so-called blending functions.
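A minimal numeric sketch of this rearrangement (the coefficients below are invented): the row vector T = (t^3, t^2, t, 1) times the coefficient matrix C evaluates the cubic in all three coordinates at once; factoring C = M · G merely re-expresses the same coefficients through a basis matrix and a geometry vector.

import numpy as np

# coefficient matrix C: one column per coordinate, rows hold (a, b, c, d)
C = np.array([[ 1.0, -2.0, 0.5],     # a_x, a_y, a_z
              [ 0.0,  3.0, 1.0],     # b_x, b_y, b_z
              [-1.0,  0.0, 2.0],     # c_x, c_y, c_z
              [ 2.0,  1.0, 0.0]])    # d_x, d_y, d_z

def Q(t):
    T = np.array([t**3, t**2, t, 1.0])   # row vector of powers of t
    return T @ C                         # = (x(t), y(t), z(t))

print(Q(0.0), Q(1.0))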

Exam questions:

• What are the "geometry vector", the "basis matrix" and the "blending functions" of a parametric curve representation?

Answer:

One decomposes C into C = M · G, so that

x(t) = a_x t^3 + b_x t^2 + c_x t + d_x
y(t) = a_y t^3 + b_y t^2 + c_y t + d_y
z(t) = a_z t^3 + b_z t^2 + c_z t + d_z

Q(t) = (x(t), y(t), z(t))^T = T · C
T = (t^3, t^2, t, 1)
Q(t) = T · C = T · M · G

with G as the geometry vector and M as the basis matrix. Furthermore,

B = T · M

are cubic polynomials, the blending functions.

9.19.5 Showing Examples: Three Methods of Defining Curves

Slide 9.122 introduces three definitions of curves that are frequently used in engineering. Let us take a look at an example in Slide 9.123. In that slide we have a continuous curve represented by two segments S and C. They are connected at a joint. Depending on the tangent vector at the joint we may have different curves. Illustrated in Slide 9.123 are three examples, C0, C1, and C2. C0 is obtained if we simply enforce at the joint that the function be continuous, but we do not worry about the tangent vectors having the same direction. C1 results if we require that the function also have the same derivative at the joint. C2 further requires that the length of the tangent vector also be identical at the joint. So we have three different types of continuity at the joint: function, velocity, acceleration. This type of continuity is narrower than mere geometric continuity with function, slope and curvature.

In computer graphics one describes the type of continuity by the direction and the length of the tangent vector. Slide 9.124 again illustrates how a point P2 is the joint between curve segments Q1 and Q2, two curves passing through P1, P2, Pi and P3. Defining two different lengths for the tangent (representing velocity) leads to two different curve segments Q2, Q3.

Slide 9.125 describes a curve with two segments joining at point P. We indicate equal time intervals, showing a "velocity" that reduces as we approach point P. At point P we change direction and accelerate. In this case, of course, the function is continuous, but as shown in that example the tangent is not continuous. We have a discontinuity in the first derivative at point P.

9.19.6 Hermite’s Approach<br />

There is a concept in the representation of curves by means of cubic parametric equations called the<br />

Hermite’s Curves. We start out with the beginning <strong>and</strong> end point of a curve <strong>and</strong> the beginning<br />

<strong>and</strong> end tangent vector of that curve, <strong>and</strong> with those elements we can define a geometry vector G<br />

as discussed earlier in Slide 9.121. Slide 9.127 explains several cases where we have a beginning<br />

<strong>and</strong> end point of a curve defined, associated with a tangent vector <strong>and</strong> as a result we can now<br />

describe a curve. Two points <strong>and</strong> two tangent vectors define four elements of a curve. In 2D space<br />

this is a third order or cubic curve with coefficients a, b, c <strong>and</strong> d. Slide 9.128’s curves are basically<br />

defined by 2 points <strong>and</strong> 2 tangent vectors. Since the end point of one curve is identical to the<br />

beginning point of the next, we obtain a continuous curve. The tangent vectors are parallel but point in opposite directions. Geometrically we are continuous in the shape, but the velocities oppose one another. This lends itself to describing curves by placing points interactively on a

monitor with a tangent vector. This is being done in constructing complex shapes, say in the car<br />

industry where car bodies need to be designed. A particular approach to accomplishing this has<br />

been proposed by Bezier.<br />

9.20 Bezier’s Approach<br />

Pierre Bezier worked for a French car manufacturer and invented an approach to designing 3-dimensional shapes, but we will discuss this in 2 dimensions only. He wanted to represent a smooth

curve by means of 2 auxiliary points which are not on the curve. Note that so far we have had<br />

curves go through our points, <strong>and</strong> Bezier wanted a different approach. So he defined 2 auxiliary<br />

points <strong>for</strong> a curve <strong>and</strong> the directions of the tangent vectors. Slide 9.130 defines the beginning <strong>and</strong><br />

end points, P 1 <strong>and</strong> P 4 <strong>and</strong> the tangent at P 1 using an auxiliary point P 2 <strong>and</strong> the tangent at P 4 by<br />

using an auxiliary point P 3 . By moving P 2 <strong>and</strong> P 3 one can obtain various shapes as one pleases,<br />

passing through P 1 <strong>and</strong> P 4 .



Definition 26 Bezier-curves in 2D<br />

Sind definierte Punkte P 0 bis P n gegeben, die durch eine Kurve angenähert werden sollen, dann<br />

ist die dazugehörige Bézierkurve:<br />

P(t) = \sum_{i=0}^{n} B_i^n(t) P_i ,   0 ≤ t ≤ 1   (1)

Die Basisfunktionen, Bernsteinpolynome genannt, ergeben sich aus:

B_i^n(t) = \binom{n}{i} t^i (1 − t)^{n−i}   mit   \binom{n}{i} = n! / (i! (n − i)!)   (2)

Sie können auch rekursiv berechnet werden. Bézierkurven haben die Eigenschaften, dass sie:<br />

• Polynome (in t) vom Grad n sind, wenn n+1 Punkte gegeben sind,<br />

• innerhalb der konvexen Hülle der definierenden Punkte liegen,<br />

• im ersten Punkt P 0 beginnen und im letzten Punkt P n enden und<br />

• alle Punkte P 0 bis P n Einfluss auf den Verlauf der Kurve haben.<br />

Slide 9.131 illustrates the mathematics behind it. Obviously, we have a tangent at P 1 denoted as<br />

R 1 , which is according to Bezier 3 · (P 2 − P 1 ). The analogous applies to tangent R 4 . If we define<br />

tangents in that way, we then obtain a third order parametric curve Q(t) as shown in Slide 9.131.<br />

Slide 9.132 recalls what we have discussed be<strong>for</strong>e, how these cubic polynomials <strong>for</strong> a parametric<br />

representation of a curve or surface can be decomposed into a geometric vector <strong>and</strong> a basis matrix<br />

<strong>and</strong> how we define a blending function. Slide 9.133 illustrates geometrically some of those blending<br />

functions <strong>for</strong> Bezier. Those particular ones are called Bernstein-curves.<br />
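A small Python sketch of the Bernstein-polynomial form from Definition 26 (the cubic control points chosen here are purely illustrative; math.comb supplies the binomial coefficient):

from math import comb

def bernstein(n, i, t):
    """B_i^n(t) = C(n, i) * t^i * (1 - t)^(n - i)"""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier_point(points, t):
    """P(t) = sum_i B_i^n(t) * P_i for control points P_0 .. P_n, 0 <= t <= 1."""
    n = len(points) - 1
    x = sum(bernstein(n, i, t) * p[0] for i, p in enumerate(points))
    y = sum(bernstein(n, i, t) * p[1] for i, p in enumerate(points))
    return (x, y)

# Cubic example: the curve starts at P_0, ends at P_3 and stays inside the convex hull.
ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
print([bezier_point(ctrl, t / 4.0) for t in range(5)])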

Now let’s proceed in Slide 9.134 to the construction of a complicated curve that consists of 2<br />

polynomial parts. We there<strong>for</strong>e need the beginning <strong>and</strong> end point <strong>for</strong> the first part, P 1 <strong>and</strong> P 4 ,<br />

<strong>and</strong> the beginning <strong>and</strong> end point <strong>for</strong> the second part which is P 4 <strong>and</strong> P 7 . We then need to have<br />

auxiliary points P 2 , P 3 , P 5 <strong>and</strong> P 6 to define the tangent vectors at P 1 , P 4 , P 7 . P 3 defines the<br />

tangent at P 4 <strong>for</strong> the first curve segment <strong>and</strong> P 5 defines the tangent at point P 4 <strong>for</strong> the second<br />

segment. We are operating here with piece-wise functions. If P 3 , P 4 , and P 5 are collinear, then the curve is geometrically continuous. Study Slide 9.134 for details.

Prüfungsfragen:<br />

• Was ist die Grundidee bei der Konstruktion von 2-dimensionalen „Bezier-Kurven“?

• Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von<br />

Kurven, und erläutern Sie anh<strong>and</strong> einer Skizze ein Approximationsverfahren Ihrer Wahl!<br />

9.21 Subdividing Curves <strong>and</strong> Using Spline Functions<br />

We can generalize the ideas of Bezier <strong>and</strong> other people <strong>and</strong> basically define spline functions 3<br />

as functions that are defined by a set of data points P 1 , P 2 , . . . , P n to describe an object <strong>and</strong><br />

we approximate the object by piecewise polynomial functions that are valid on certain intervals.<br />

In the general case of splines the curve does not necessarily have to go through P 1 , P 2 , . . . , P n .<br />

3 in German: Biegefunktionen



Algorithm 24 Casteljau<br />

1: {Input: array p[0:n] of n+1 points <strong>and</strong> real number u}<br />

2: {Output: point on curve, p(u)}<br />

3: {Working: point array q[0:n]}<br />

4: <strong>for</strong> i := 0 to n do<br />

5: q[i] := p[i] {save input}<br />

6: end <strong>for</strong><br />

7: <strong>for</strong> k := 1 to n do<br />

8: <strong>for</strong> i := 0 to n - k do<br />

9: q[i] := (1 - u)q[i] + uq[i + 1]<br />

10: end <strong>for</strong><br />

11: end <strong>for</strong><br />

12: return q[0]<br />
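A minimal runnable version of Algorithm 24 in Python (the control points used in the example call are illustrative):

def casteljau(points, u):
    """De-Casteljau evaluation of a Bezier curve at parameter u, following Algorithm 24.
    points: list of (x, y) control points p[0..n]; returns the curve point p(u)."""
    q = list(points)                                  # save input
    n = len(q) - 1
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            q[i] = ((1 - u) * q[i][0] + u * q[i + 1][0],
                    (1 - u) * q[i][1] + u * q[i + 1][1])
    return q[0]

print(casteljau([(0, 0), (1, 2), (3, 2), (4, 0)], 1 / 3))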

We need to define the locations of the joints, and the type of continuity we want. Note that we abandon here the use of a parametric representation.

Let us examine the idea that our points describing an object may be in error, <strong>for</strong> example those<br />

points may be reconstructions from photographs taken of an object using a stereo reconstruction<br />

process. Bec<strong>aus</strong>e the points may be in error <strong>and</strong> there<strong>for</strong>e be noisy, we do not want the curve<br />

or surface to go through the points. We want an approximation of the shape. In that case we<br />

need to have more points than we have unknown parameters of our function. In the Least Squares<br />

approach discussed earlier, we would get a smooth spline going nearly through the points. Slide<br />

9.136 illustrates the idea of a broken-up curve <strong>and</strong> defines a definition area <strong>for</strong> each curve between<br />

joints P 2 , P 3 <strong>and</strong> P 3 , P 4 . We en<strong>for</strong>ce continuity of the curves at the joints, <strong>for</strong> example by saying<br />

that the tangent has to be identical. A spline that goes exactly through the data points is different<br />

from the spline that approximates the data points only. Note that the data points are called control<br />

points 4 .<br />

Of course the general idea of a spline function can be combined with Bezier curves as suggested in Slide 9.137. For added flexibility we want to replace a single Bezier curve by two Bezier curves which are defined on a first and a second part of the original Bezier curve. We solve this problem by finding auxiliary points and tangents such that the continuity conditions apply, by proportionally segmenting distances as shown.

Slide 9.139 illustrates the process. The technique is named after the French engineer de Casteljau.

The single curve defined by P 1 , P 4 (<strong>and</strong> auxiliary points P 2 , P 3 ) is broken into two smaller curves<br />

defined by L 1 , . . . , L 4 <strong>and</strong> another curve defined by R 1 , . . . , R 4 .<br />

Spline functions of a special kind exist if we en<strong>for</strong>ce that the tangents at the joint are parallel to<br />

the line going through adjacent neighboring joints. Slide 9.140 explains. The technique is named<br />

after Catmull-Rom.<br />

Prüfungsfragen:<br />

• In Abbildung B.26 sehen Sie vier Punkte P 1 , P 2 , P 3 und P 4 , die als Kontrollpunkte für<br />

eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des<br />

Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert t = 1/3, also x(1/3), und

erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26<br />

eintragen, eine skizzenhafte Darstellung ist <strong>aus</strong>reichend.<br />

Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der<br />

Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird.<br />

Antwort: Die Strecken sind rekursiv im Verhältnis 1/3 : 2/3 zu teilen (siehe Abbildung 9.2).

4 in German: Pass-Punkte


Figure 9.2: Konstruktion einer Bezier-Kurve nach Casteljau

9.22 Generalization to 3 Dimensions<br />

Slide 9.142 suggests a general idea of taking the 2-dimensional discussions we just had <strong>and</strong> transporting<br />

them into 3 dimensions. Bezier, splines <strong>and</strong> so <strong>for</strong>th, all exist in 3-D as well. That in<br />

effect is where the applications are. Instead of having coordinates (x, y) or parameters t we now<br />

have coordinates (x, y, z) or parameters t 1 , t 2 . Instead of having points define a curve we now have<br />

a 3-dimensional arrangement of auxiliary points that serve to approximate a smooth 3D-surface.<br />

9.23 Graz <strong>and</strong> Geometric Algorithms<br />

On a passing note, a disproportionate number of people who have been educated at the TU Graz have become well-known and respected scientists in the field of geometric algorithms. Obviously, Graz has been a hotbed of geometric algorithms. Look out for classes on “Geometric Algorithms”. Note that the geometric algorithms we have discussed are very closely related to mathematics and really are associated with theoretical computer science and less so with computer graphics and image processing. The discussion of curves and surfaces is also a topic of descriptive geometry. In that context one speaks of “free-form curves and surfaces”. Look out for classes on that subject as well!


(Slides 9.1 through 9.143 are reproduced here as thumbnails.)


Chapter 10<br />

Data Structures<br />

10.1 Two-Dimensional Chain-Coding<br />

Algorithm 25 Chain coding<br />

1: resample boundary by selecting larger grid spacing<br />

2: starting from top left search the image rightwards until a pixel P[0] belonging to the region is<br />

found<br />

3: initialize orientation d with 1 to select northeast as the direction of the previous move<br />

4: initialize isLooping with true<br />

5: initialize i with 1<br />

6: while isLooping do<br />

7: search the neighbourhood of the current pixel <strong>for</strong> another unvisited pixel P[i] in a clockwise<br />

direction beginning from (d + 7) mod 8, increasing d at every search step<br />

8: if no unvisited pixel found then<br />

9: set isLooping false<br />

10: else<br />

11: print d<br />

12: end if<br />

13: increase i<br />

14: end while<br />

We start from a raster image of a linear object. We are looking <strong>for</strong> a compact <strong>and</strong> economical<br />

representation by means of vectors. Slide 10.3 illustrates the 2-dimensional raster of a contour<br />

image, which is to be encoded by means of a chain-code. We have to make a decision about the<br />

level of generalization or elimination of detail. Slide 10.4 describes the 4 <strong>and</strong> 8 neighborhood <strong>for</strong><br />

each pixel <strong>and</strong> indicates by a sequence of numbers how each neighbor is labeled as 1, 2, 3, 4, . . . , 8.<br />

Using this approach, we can replace the actual object by a series of pixels <strong>and</strong> in the process<br />

obtain a different resolution. We have resampled the contour of the object. Slide 10.6 shows how<br />

a 4-neighborhood <strong>and</strong> an 8-neighborhood will serve to describe the object by a series of vectors,<br />

beginning at an initial point. The encoding itself is represented by a string of integer numbers.<br />

Obviously we obtain a very compact representation of that contour.<br />

Next we can think of a number of normalizations of that coding scheme. We may dem<strong>and</strong> that<br />

the sum of all codes be minimized. Instead of recording the codes themselves to indicate in which<br />

direction each vector points, we can look at code differences only, which would have the advantage<br />

that they are invariant under rotations.<br />

Obviously the object will look different if we change the direction of the grid at which we resample<br />


the contour. An extensive theory of chain codes has been introduced by H. Freeman <strong>and</strong> one of<br />

the best-known coding schemes is there<strong>for</strong>e also called the Freeman-Chain-Code.<br />
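A small Python sketch of this encoding for an already ordered list of boundary pixels; note that the direction labels below run 0–7 counter-clockwise starting at “east”, which is an assumption, since Slide 10.4 numbers the neighbors 1 to 8:

DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(boundary):
    """Return the chain code of consecutive 8-neighboring boundary pixels (x, y)."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]

def difference_code(code):
    """First differences mod 8 -- invariant under rotations by multiples of 45 degrees."""
    return [(b - a) % 8 for a, b in zip(code, code[1:] + code[:1])]

boundary = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2), (0, 1), (0, 0)]   # illustrative closed contour
c = chain_code(boundary)
print(c, difference_code(c))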

Prüfungsfragen:<br />

• Gegeben sei eine Punktfolge entsprechend Abbildung ?? und ein Pixelraster, wie dies in<br />

Abbildung ?? dargestellt ist. Geben Sie bitte sowohl grafisch als auch numerisch die kompakte<br />

Kettenkodierung dieser Punktfolge im Pixelraster an, welche mit Hilfe eines 8-Codes<br />

erhalten wird.<br />

10.2 Two-Dimensional Polygonal Representations<br />

Algorithm 26 Splitting<br />

1: Splitting methods work by first drawing a line from one point on the boundary to another.<br />

2: Then, we compute the perpendicular distance from each point along the segment to the line.<br />

3: If this exceeds some threshold, we break the line at the point of greatest error.<br />

4: We then repeat the process recursively <strong>for</strong> each of the two new lines until we don’t need to<br />

break any more.<br />

5:<br />

6: For a closed contour, we can find the two points that lie farthest apart <strong>and</strong> fit two lines<br />

between them, one <strong>for</strong> one side <strong>and</strong> one <strong>for</strong> the other. Then, we can apply the recursive<br />

splitting procedure to each side.<br />

Let us assume that we do have an object with an irregular contour as shown in Slide 10.9 on the<br />

left side. We describe that object by a series of pixels <strong>and</strong> the transition from the actual detailed<br />

contour to the simplification of a representation by pixels must follow some rules. One of those is a minimum-perimeter rule, which takes the idea of a rubber band that is fit along the contour pixels, as shown on the right-hand side of Slide 10.9.

Many times the issue is the simplification of a shape in order to save space, while maintaining the essence of the object. Slide 10.10 explains how one may replace a polygonal representation of an

object by a simplified minimum quadrangle. One will look <strong>for</strong> the longest distance that can be<br />

defined from points along the contour of the object. This produces a line segment ab. We then<br />

further subdivide that shape by looking <strong>for</strong> the longest line that is perpendicular to the axis that<br />

we just found. This produces a quadrangle. We can now continue on <strong>and</strong> further refine this shape<br />

by a simplifying polygon defining a maximum deviation between the actual object contour <strong>and</strong> its<br />

simplification. If the threshold value is set at 0.25 then we obtain the result shown in Slide 10.10.<br />

The process is also denoted as splitting (algorithm 26).<br />
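A compact Python sketch of the splitting procedure of Algorithm 26 for an open curve (the sample contour and the threshold of 0.25 are illustrative):

import math

def point_line_distance(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.hypot(bx - ax, by - ay)

def split(points, threshold):
    """Break the curve at the point of greatest deviation until all deviations are small."""
    a, b = points[0], points[-1]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]
    if not dists or max(dists) <= threshold:
        return [a, b]
    k = dists.index(max(dists)) + 1                   # index of the worst point
    left = split(points[:k + 1], threshold)
    right = split(points[k:], threshold)
    return left[:-1] + right                          # drop the duplicated break point

contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 2.0), (4, 2.1), (5, 0)]
print(split(contour, 0.25))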

Prüfungsfragen:<br />

• Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale<br />

Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie<br />

einen Schritt des Algorithmus im Detail anh<strong>and</strong> Ihrer Zeichnung! Wählen Sie den Schwellwert<br />

so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann<br />

vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung<br />

B.35 einzeichnen.



Definition 27 2D morphing <strong>for</strong> lines<br />

Problems with other kinds of representation can be taken care of by a parametric representation: a single parameter t can represent the complete straight line once the starting and ending points are given. In parametric representation

x = X(t), y = Y (t)<br />

For starting point (x1, y1) <strong>and</strong> ending point (x2, y2)<br />

(x, y) = (x1, y1) if t = 0<br />

(x, y) = (x2, y2) if t = 1<br />

Thus any point (x, y) on the straight line joining two points (x1, y1) <strong>and</strong> (x2, y2) is given by<br />

x = x1 + t(x2 − x1)<br />

y = y1 + t(y2 − y1)<br />

10.3 A Special Data Structure <strong>for</strong> 2-D Morphing<br />

Suppose the task is defined as in Slide 10.13 where an input figure, in this particular case a cartoon<br />

of President Bush, needs to be trans<strong>for</strong>med into an output figure, namely the cartoon of President<br />

Clinton. The approach establishes a relationship between the object contour points of the input<br />

<strong>and</strong> output cartoons. Each point on the input cartoon will correspond to one or no point on the<br />

output cartoon. In order to morph the input into the output one needs now to take these vectors<br />

which link these points. We introduce a parametric representation x = f x (t), y = f y (t). We<br />

gradually increase the value of the parameter t from 0 to 1. At a value of the parameter t = 0 one<br />

has the Bush cartoon, at the parameter t = 1, one has the Clinton cartoon. The transition can be<br />

illustrated in as many steps as one likes. The basic concept is shown in Slide ?? <strong>and</strong> Slide 10.14<br />

<strong>and</strong> the result is shown in Slide 10.15.<br />
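A minimal Python sketch of this parametric in-between computation, using the equations of Definition 27 (the two point lists are purely illustrative stand-ins for the corresponding contour points of the two cartoons):

def morph(source, target, t):
    """Linear 2-D morphing of corresponding points: t = 0 gives the source, t = 1 the target."""
    return [(x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            for (x1, y1), (x2, y2) in zip(source, target)]

source = [(0, 0), (2, 1), (3, 3)]
target = [(0, 1), (2, 2), (4, 3)]
for t in (0.0, 0.5, 1.0):
    print(t, morph(source, target, t))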

Prüfungsfragen:<br />

• In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in<br />

eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder<br />

als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen<br />

werden benötigt? Erläutern Sie Ihre Antwort anh<strong>and</strong> einer beliebigen Strecke<br />

<strong>aus</strong> Abbildung B.3!<br />

10.4 Basic Concepts of Data Structures<br />

For a successful data structure we would like to have a direct access to data independent of how big<br />

a data base is. We would like to have simple arrays, our data should be stored sequentially <strong>and</strong> we<br />

might use pointer lists, thus pointers, chains, trees, <strong>and</strong> rings. This all is applicable in geometric<br />

data represented by coordinates. Slide 10.17 illustrates how we can build a directed graph of some<br />

geometric entities that are built from points in 3-dimensional space with coordinates x, y, z at the<br />

base. From those points, we produce lists of edges which combine two points into an edge. From<br />

the edges, one builds regions or areas which combine edges into contours of areas.<br />

Slide 10.18 shows that we request ease of dynamic changes in the data, so we can insert or delete points and objects or areas. We would also like to be able to change a visualization dynamically: if we delete an object we should not be required to completely recompute everything. We would



like to have support <strong>for</strong> a hierarchical approach so that we can look at an overview as well as at<br />

detail. And we would like to be able to group objects into hyper-objects <strong>and</strong> we need to have a<br />

r<strong>and</strong>om access to arbitrary objects independent of the number of objects in the data base. Let us<br />

now examine a few data structures.<br />

Prüfungsfragen:<br />

• Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch<br />

seine (polygonale) Oberfläche genutzt werden kann!<br />

10.5 Quadtree<br />

Algorithm 27 Quadtree<br />

1: {define datastructure quadtree}<br />

2: quadtree=(SW,SE,NW,NE:Pointer of quadtree,value)<br />

{SW south-western son, SE south-eastern son}<br />

{NW north-western son, NE north-eastern son}<br />

{value holds e.g. brightness}<br />

3: init quadtree = (NULL,NULL,NULL,NULL,0)<br />

4: while the entire image has not been segmented do<br />

5: segment actually processed area into 4 squares<br />

6: if there is no element of the object left in a subdivided square then<br />

7: link a leaf to the quadtree according to the actually processed square {leaf =<br />

quadtree(NULL,NULL,NULL,NULL,value)}<br />

8: else<br />

9: link new node to (SW or SE or NW or NE) of <strong>for</strong>mer quadtree according to the actually<br />

processed square<br />

{node = quadtree (SW,SE,NW,NE,0)}<br />

10: if node holds four leafs containing the same value then<br />

11: replace node with leaf containing value

12: end if<br />

13: end if<br />

14: end while<br />

A quadtree is a tree data structure <strong>for</strong> 2-dimensional graphical data, where we subdivide the root,<br />

the 2-dimensional space, into squares of equal size, so we subdivide an entire area into 4 squares,<br />

we subdivide those 4 squares further into 4 squares <strong>and</strong> so <strong>for</strong>th. We number each quadrant as<br />

shown in Slide 10.20. Now if we have an object in an image or in a plane we describe the object<br />

by a quadtree by breaking up the area sequentially, until such time that there is no element of the<br />

object left in a subdivided square. In this case we call this a leaf of the tree structure, an empty<br />

leaf. So each node is a quadrant, and each quadrant has four pointers to its sons. The sons will be further subdivided until each quadrant is either entirely filled with the object or entirely empty.

A slight difference to the quadtree is the Bin-tree. In it, each node has only two sons <strong>and</strong> not four<br />

like in the quadtree. Slide 10.21 explains.<br />

If there is a mechanical part available as shown in Slide 10.22 then a pixel representation may be<br />

shown on the left <strong>and</strong> the quadtree representation at right. The quadtree is more efficient. There<br />

is an entire literature on geometric operations in quadtrees such as geometric trans<strong>for</strong>mations,<br />

scale changes, editing, visualization, Boolean operations <strong>and</strong> so <strong>for</strong>th. Slide 10.23 represents the<br />

mechanical part of Slide 10.24 in a quadtree representation.



A quadtree has “levels of subdivision”, obviously, and its root is at the highest level, with a single node. The next level is shown in Slide 10.24 and has one empty and three full nodes which are further subdivided into a third level with some empty and some full leaves and some nodes that are further subdivided into a fourth level. The leaves are numbered sequentially from north-west to south-east. Slide 10.25 again illustrates how a raster image with pixels of equal area is converted into a quadtree representation. It is more efficient since there are fewer leaves in a quadtree than there are pixels in an image, except when the image is totally chaotic.

One may store all leaves, whether they are empty or full, or one stores only the full leaves, thereby saving storage space. Typically this may save 60 percent, as in the example of Slide 10.26.
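A small Python sketch in the spirit of Algorithm 27; instead of merging four equal leaves afterwards it checks the uniformity of each square up front, which yields the same tree (the 4 x 4 test image is illustrative and the side length must be a power of two):

def build_quadtree(image, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform square, otherwise a node [NW, NE, SW, SE]."""
    if size is None:
        size = len(image)
    values = {image[y + j][x + i] for j in range(size) for i in range(size)}
    if len(values) == 1:
        return values.pop()                       # leaf: empty (0) or full (1)
    h = size // 2
    return [build_quadtree(image, x,     y,     h),   # NW
            build_quadtree(image, x + h, y,     h),   # NE
            build_quadtree(image, x,     y + h, h),   # SW
            build_quadtree(image, x + h, y + h, h)]   # SE

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1]]
print(build_quadtree(img))        # [0, 1, [0, 1, 1, 1], 1]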

Prüfungsfragen:<br />

• Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung<br />

dieses Bildes. Ich bitte Sie, einen sogenannten „traditionellen“ Quadtree der Abbildung B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen.

• Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen<br />

Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält<br />

sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen?<br />

10.6 Data Structures <strong>for</strong> Images<br />

So far we have looked at data structures <strong>for</strong> binary data, showing objects by means of their<br />

contours, or as binary objects in a raster image. In this chapter, we are looking at data structures<br />

<strong>for</strong> color <strong>and</strong> black <strong>and</strong> white gray value images. A fairly complete list of such data structures can<br />

be seen in PhotoShop (Slide 10.28 <strong>and</strong> Slide 10.29). Let us review a few structures as shown in<br />

Slide 10.30.<br />

We can store an image pixel by pixel, where all information that belongs to a pixel is stored sequentially; or row by row, where we repeat, say, red, green and blue for each image row; or band-sequentially, which means we store a complete image for the red, one for the green and one for the blue channel. Those forms are called BSSF or BIFF (Band Sequential File Format or similar). The next category is the TIFF format, a tagged image file format; another option is to store images in tiles, in little 32 by 32 or 128 by 128 windows.

The idea of hexagonal pixels has been proposed. An important idea is that of pyramids, where a<br />

single image is reproduced at different resolutions, <strong>and</strong> finally representations of images by fractals<br />

or wavelets <strong>and</strong> so <strong>for</strong>th exist. Slide 10.31 illustrates the idea of an image pyramid. The purpose<br />

of pyramids is to start an image analysis process on a much reduced version of an image, e.g. to<br />

segment it into its major parts <strong>and</strong> then guide a process which refines the preliminary segmentation<br />

from resolution level to resolution level. This increases the robustness of an approach <strong>and</strong> also<br />

reduces computing times. At issue is how one takes a full resolution image <strong>and</strong> creates from it<br />

reduced versions. This may be by simple averaging or by some higher level processes <strong>and</strong> filters<br />

that create low resolutions from neighborhoods of higher resolution pixels.<br />
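A minimal Python sketch of building such a pyramid by simple 2 x 2 averaging, as mentioned above (the 4 x 4 gray-value image is illustrative and its side length is assumed to be a power of two):

def pyramid(image):
    """Repeatedly halve the side length by 2 x 2 averaging until one pixel is left."""
    levels = [image]
    while len(levels[-1]) > 1:
        src = levels[-1]
        n = len(src) // 2
        levels.append([[(src[2*j][2*i] + src[2*j][2*i+1] +
                         src[2*j+1][2*i] + src[2*j+1][2*i+1]) / 4.0
                        for i in range(n)] for j in range(n)])
    return levels

for level in pyramid([[10, 20, 30, 40],
                      [20, 30, 40, 50],
                      [90, 80, 70, 60],
                      [80, 70, 60, 50]]):
    print(level)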

Slide 10.32 suggests that data structures <strong>for</strong> images are important in the context of image compression<br />

<strong>and</strong> we will address that subject under the title “Compression” towards the end of this<br />

class.<br />

Prüfungsfragen:<br />

• In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das<br />

erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht



nur mehr <strong>aus</strong> einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo<br />

wird sie eingesetzt (nennen Sie mindestens ein Beispiel)?<br />

• In Aufgabe B.1 wurde nach einer Bildrepräsentation gefragt, bei der ein Bild wiederholt<br />

gespeichert wird, wobei die Seitenlänge jedes Bildes genau halb so groß ist wie die Seitenlänge<br />

des vorhergehenden Bildes. Leiten Sie eine möglichst gute obere Schranke für den gesamten<br />

Speicherbedarf einer solchen Repräsentation her, wobei<br />

– das erste (größte) Bild <strong>aus</strong> N × N Pixeln besteht,<br />

– alle Bilder als Grauwertbilder mit 8 Bit pro Pixel betrachtet werden,<br />

– eine mögliche Komprimierung nicht berücksichtigt werden soll!<br />

Hinweis: Benutzen Sie die Gleichung \sum_{i=0}^{\infty} q^i = 1/(1 − q) für q ∈ R, 0 < q < 1.

Antwort:

S(N) < N^2 · \sum_{i=0}^{\infty} (1/4)^i = N^2 · 1/(1 − 1/4) = N^2 · 4/3 = (4/3) N^2

10.7 Three-Dimensional Data

The requirements <strong>for</strong> a successful data structure are listed in Slide 10.34. Little needs to be added<br />

to the contents of that slide.<br />

Prüfungsfragen:<br />

• Nennen Sie allgemeine An<strong>for</strong>derungen an eine Datenstruktur zur Repräsentation dreidimensionaler<br />

Objekte!<br />

10.8 The Wire-Frame Structure<br />

Definition 28 Wireframe structure<br />

The simplest three-dimensional data structure is the wire-frame. A wireframe model captures the<br />

shape of a 3D object in two lists, a vertex list <strong>and</strong> an edge list. The vertex list specifies geometric<br />

information: where each corner is located. The edge list provides connectivity information,

specifying (in arbitrary order) the two vertices that <strong>for</strong>m the endpoints of each edge.<br />

The vertex-lists are used to build edges, the edges build edge-lists which then build faces<br />

or facets <strong>and</strong> facets may build objects. In a wire-frame, there are no real facets, we simply go<br />

from edges to objects directly.<br />

The simplest three-dimensional data structure is the wire-frame. At the lowest level we have a list<br />

of three-dimensional coordinates. The point-lists are used to build edges, the edges build edge-lists<br />

which then build faces or facets <strong>and</strong> facets may build objects. In a wire-frame, there are no real<br />

facets, we simply go from edges to objects directly. Slide 10.36 shows the example of a cube with



the object, the edge-lists <strong>and</strong> the point-lists. The edges or lines <strong>and</strong> the points or vertices are<br />

again listed in Slide 10.37 <strong>for</strong> a cube. In Slide 10.38 the cube is augmented by an extra-plane <strong>and</strong><br />

represented by two extra vertices <strong>and</strong> three extra lines.<br />
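A minimal wire-frame sketch in Python for the cube example: a vertex list with 3-D coordinates and an edge list that references vertices by index (the unit-cube coordinates are illustrative, not those of the slide):

vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
            (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0),      # bottom face
         (4, 5), (5, 6), (6, 7), (7, 4),      # top face
         (0, 4), (1, 5), (2, 6), (3, 7)]      # vertical edges

def edge_length(e):
    (x0, y0, z0), (x1, y1, z1) = vertices[e[0]], vertices[e[1]]
    return ((x1 - x0)**2 + (y1 - y0)**2 + (z1 - z0)**2) ** 0.5

print(len(vertices), "vertices,", len(edges), "edges,",
      "total edge length", sum(edge_length(e) for e in edges))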

Prüfungsfragen:<br />

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken!<br />

10.9 Operations on 3-D Bodies<br />

Assume that we have 2 cubes, A <strong>and</strong> B, <strong>and</strong> we need to intersect them. A number of Boolean<br />

operations can be defined as an intersection or a union of 2 bodies, subtracting B from A or A<br />

from B leading to different results.<br />

10.10 Sweep-Representations<br />

A sweep-representation creates a 3-D object by means of a 2-D shape. An object will be created<br />

by moving the 2-D representation through 3-D space denoting the movement as sweep. We may<br />

have a translatory or a rotational sweep as shown in Slide ?? <strong>and</strong> Slide 10.43. A translatory sweep<br />

can be obtained by a cutting tool. A rotational sweep obviously will be obtained by a rotational<br />

tool. We have in Slide 10.43 the cutting tool, the model of a part <strong>and</strong> the image of an actual part<br />

as produced in a machine.<br />

Prüfungsfragen:<br />

• Was versteht man unter einer „Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese Art der Objektrepräsentation?

• In Abbildung B.70 ist ein Zylinder mit einer koaxialen Bohrung gezeigt. Geben Sie zwei verschiedene<br />

Möglichkeiten an, dieses Objekt mit Hilfe einer Sweep-Repräsentation zu beschreiben!<br />

10.11 Boundary-Representations<br />

A very popular representation of objects is by means of their boundaries. Generally, these representations<br />

are denoted as B-reps. They are built from faces with vertices <strong>and</strong> edges. Slide 10.45<br />

illustrates an object <strong>and</strong> asks the question of how many objects are we facing here, how many<br />

faces, how many edges <strong>and</strong> so <strong>for</strong>th? A B-rep system makes certain assumptions about the topology<br />

of an object. In Slide 10.46 we show a prism that is <strong>for</strong>med from 5 faces, 6 vertices, 9 edges.<br />

A basic assumption is that differential small pieces on the surface of the object can be represented<br />

by a plane as shown in the left <strong>and</strong> central elements of Slide 10.46. On the right-h<strong>and</strong> side of Slide<br />

10.46 is a body that does not satisfy the dem<strong>and</strong>s on a 2-manifold topology <strong>and</strong> that is the type<br />

of body we may have difficulties with in a B-rep system.<br />

A boundary representation takes advantage of Euler’s Formula. It relates the number of vertices,<br />

faces <strong>and</strong> edges to one another as shown in Slide 10.47. A simple polyhedron is a body that can be<br />

de<strong>for</strong>med into a sphere <strong>and</strong> there<strong>for</strong>e has no holes. In this case, Euler’s Formula applies. Slide<br />

10.48 shows three examples that confirm the validity of Euler’s Formula. Slide 10.49 illustrates<br />

a body with holes. In that case, Euler’s <strong>for</strong>mula needs to be modified.
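For reference, Euler's formula for a simple polyhedron relates these counts as follows; the prism mentioned above, with 6 vertices, 9 edges and 5 faces, satisfies it:

V − E + F = 2,   prism of Slide 10.46: 6 − 9 + 5 = 2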



Prüfungsfragen:<br />

• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die<br />

Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die<br />

Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“!

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken!<br />

10.12 A B-Rep Data Structure<br />

Definition 29 Boundary representation<br />

A B-Rep structure describes the boundary of an object with the help of 3-dimensional Polygon<br />

Surfaces. The B-Rep model consists of three different object types: vertices, edges <strong>and</strong> surfaces.<br />

The B-Rep structure is often organized in:

• V: A set of vertices (points in 3D space)

• E: A set of edges. Each edge is defined by 2 vertices referenced from V

• S: A set of surfaces. Each surface is defined by a sequence of edges from E (at least 3 edges define a surface)

The direction of the normal vector of a surface is usually given by the order of its edges (clockwise or counterclockwise). Due to the referencing, the B-Rep permits a redundancy-free management of the geometric information.

A B-rep structure is not unlike a wire-frame representation, but it represents an object with pointers to polygons and lists of polygons with pointers to edges, and one differentiates between the space outside and inside the object by taking advantage of the sequence of edges. Slide 10.52 illustrates a body that is represented by 2 faces in 3-D. We show the point-list, the list of edges and the list of faces.

Slide 10.53 illustrates a B-rep representation of a cube with the list of faces, the list of edges,<br />

the point-lists, <strong>and</strong> the respective pointers. Slide 10.54 explains the idea of inside <strong>and</strong> outside<br />

directions <strong>for</strong> each face. The direction of the edges defines the direction of the normal vector onto<br />

a face. As shown in Slide 10.54, A would be inside of B in one case, <strong>and</strong> outside of B in the other<br />

depending on the direction of the normal onto face B.<br />
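A minimal Python sketch of such a B-rep with the orientation convention just described; for brevity the face here is stored as an ordered loop of vertex indices rather than as a sequence of edge references, and the unit square in the z = 0 plane is purely illustrative:

V = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]          # vertices
E = [(0, 1), (1, 2), (2, 3), (3, 0)]                      # edges as vertex index pairs
F = [[0, 1, 2, 3]]                                        # one face as an ordered vertex loop

def face_normal(face):
    """Normal from the first three vertices; the vertex order fixes its direction."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = V[face[0]], V[face[1]], V[face[2]]
    ux, uy, uz = x1 - x0, y1 - y0, z1 - z0
    vx, vy, vz = x2 - x0, y2 - y0, z2 - z0
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)

print(face_normal(F[0]))          # counter-clockwise order: (0, 0, 1)
print(face_normal(F[0][::-1]))    # reversed order flips the normal: (0, 0, -1)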

10.13 Spatial Partitioning<br />

An entirely different approach to 3-dimensional data structures is the idea of spatial partitioning.

In Slide 10.56 we choose the primitives to be prisms <strong>and</strong> cubes. They build the basic cells <strong>for</strong> a<br />

decomposition. From those basic elements we can now build up various shapes as shown in that<br />

slide. A special case occurs if the primitive is a cube of given size as shown in Slide 10.57. Slide<br />

10.58 introduces the idea of the oct-tree, which is the 3-dimensional analogue of the quadtree. Slide

10.59 explains how the 3 dimensional space as a root is decomposed into 8 sons, which then are<br />

further decomposed until there is no further decomposition necessary bec<strong>aus</strong>e each son is either<br />

empty or full. The example of Slide 10.59 has 2 levels <strong>and</strong> there<strong>for</strong>e the object can be created<br />

from 2 types of cubes. Slide 10.60 illustrates the resulting representation in a computer that takes



the root, subdivides it into 8 sons, calls them either white or black <strong>and</strong> if it needs to be further<br />

subdivided then substitutes <strong>for</strong> the element another expression with 8 sons <strong>and</strong> so <strong>for</strong>th.<br />

Slide 10.61 illustrates an oct-tree representation of a coffee cup. We can see how the surface,<br />

bec<strong>aus</strong>e of its curvature, requires many small cubes to be represented whereas on the inside of<br />

the cup the size of the elements increases. The data structure is very popular in medical imaging<br />

bec<strong>aus</strong>e there exist various sensor systems that produce voxels, <strong>and</strong> those voxels can be generalized<br />

into oct-trees, similar to pixels that can be generalized into quadtrees in 2 dimensions.<br />

Prüfungsfragen:<br />

• Erklären Sie den Begriff „spatial partitioning“ und nennen Sie drei räumliche Datenstrukturen aus dieser Gruppe!

10.14 Binary Space Partitioning BSP<br />

Definition 30 Cell-structure<br />

An example for a 3-dimensional data structure is the idea of spatial partitioning. Therefore some primitives like prisms or cubes are chosen. These primitives build the "cells" for a decomposition of an object. Every geometrical object can be built with these cells. A special case occurs if the primitive is an object of a given size.

A very common data structure to find the decomposition is the oct-tree. The root (the 3-dimensional space) of the oct-tree is subdivided into 8 cubes of equal size, and these resulting cubes are subdivided themselves again until no further decomposition is necessary. A son in the tree is marked as black or white (represented or not), or it is marked as gray, in which case a further decomposition is needed.

This type of data structure is very popular in medical imaging. The different sensor systems, like "Computer Aided Tomography", produce voxels. These voxels can be generalized into oct-trees.

A more specific space partitioning approach is the Binary Space Partitioning or BSP. We subdivide<br />

space by means of planes that can be arbitrarily arranged. The Binary Space Partition is a tree<br />

in which the nodes are represented by the planes. Each node has two sons, which are the spaces that result on the two sides of a plane: the inner and the outer half-space.

Slide 10.63 illustrates the basic idea in 2 dimensions where the plane degenerates into straight<br />

lines. The figure on the left side of Slide 10.63 needs to be represented by a BSP structure. The<br />

root is the straight line a, subdividing a 2-D space into half spaces, defining an outside <strong>and</strong> an<br />

inside by means of a vector shown on the left side of the slide. There are two sons <strong>and</strong> we take<br />

the line b <strong>and</strong> the line j as the two sons. We further subdivide the half-spaces. We go on until<br />

the entire figure is represented in this manner.<br />

A similar illustration representing the same idea is shown in Slide 10.64. At the root is line 1,<br />

the outside half-space is empty, the inside half-space contains line 2, again with an outside space<br />

empty <strong>and</strong> the inside space containing line 3 <strong>and</strong> we repeat the structure.<br />

If we start out with line 3 at the root, we obtain a different description of the same object. We<br />

have in the outside half-space line 4 <strong>and</strong> on the inside half-space line 2, <strong>and</strong> now the half-space<br />

defined by line 4 within the half-space defined by line 3 contains only the line segment 1b <strong>and</strong> the<br />

other half-space as seen from line 3 which is then further subdivided into a half-space by line 2<br />

contains line-segment 1a. The straight line 1 in this case is appearing twice, once in the <strong>for</strong>m of<br />

1a <strong>and</strong> another time in the <strong>for</strong>m of 1b.



Algorithm 28 Creation of a BSP tree<br />

1: polygon root; {current Root-Polygon}<br />

2: polygon *backList, *frontList; {polygons in current Halfspaces}<br />

3: polygon p, backPart, frontPart; {temporary variables}<br />

4: if (polyList == NULL) then<br />

5: return NULL; {no more polygons in this halfspace}<br />

6: else<br />

7: root = selectAndRemovePolygon(&polyList); {prefer polygons defining planes that don’t<br />

intersect with other polygons}<br />

8: backList = NULL;<br />

9: frontList = NULL;<br />

10: <strong>for</strong> (each remaining polygon in polyList) do<br />

11: if (polygon p in front of root) then<br />

12: addToList(p, &frontList);<br />

13: else<br />

14: if (polygon p in back of root) then<br />

15: addToList(p, &backList);<br />

16: else {polygon p must be split}<br />

17: splitPoly(p, root, &frontPart, &backPart);<br />

18: addToList(frontPart, &frontList);<br />

19: addToList(backPart, &backList);<br />

20: end if<br />

21: end if<br />

22: end <strong>for</strong><br />

23: return new BSPTREE(root, makeTree(frontList), makeTree(backList));<br />

24: end if<br />

Prüfungsfragen:<br />

• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein!

10.15 Constructive Solid Geometry, CSG<br />

This data structure takes 3D-primitives as input <strong>and</strong> produces Boolean operations, translations,<br />

scaling <strong>and</strong> rotational operators to construct 3-dimensional objects from the primitives. Slide<br />

10.66 <strong>and</strong> Slide 10.67 explain. A complex object as shown in Slide 10.67 may be composed of a<br />

cylinder with an indentation <strong>and</strong> a rectangular body of which a corner is cut off. The cylinder itself<br />

is obtained by subtracting a smaller cylinder from a larger cylinder. The cut-off is obtained by<br />

subtracting from a fully rectangular shape another rectangular shape. So we have 2 subtractions<br />

<strong>and</strong> one union to produce our object. In Slide 10.67 we have again 2 primitives, a block <strong>and</strong> a<br />

cylinder, we can scale them, so we start out with two types of blocks <strong>and</strong> two types of cylinders.<br />

By an operation of intersection union <strong>and</strong> difference we obtain a complicated object from those<br />

primitives.<br />

Slide 10.68 explains how Constructive Solid Geometry can produce a result in two different ways.<br />

We can take two blocks <strong>and</strong> subtract them from one another or we can take two blocks <strong>and</strong> <strong>for</strong>m<br />

the union of them to obtain a particular shape. We cannot say generally that those two operations<br />

are equivalent, bec<strong>aus</strong>e if we change the shapes of the two blocks, the same two operations may<br />

not result in the same object shown in Slide 10.68.
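A minimal CSG sketch in Python: the primitives are membership tests and the inner nodes of the CSG tree are the Boolean operations; the cylinder with a coaxial bore described above becomes a difference of two cylinders (all radii and heights are illustrative):

def cylinder(radius, height):
    return lambda x, y, z: x * x + y * y <= radius * radius and 0 <= z <= height

def union(a, b):        return lambda x, y, z: a(x, y, z) or b(x, y, z)
def intersection(a, b): return lambda x, y, z: a(x, y, z) and b(x, y, z)
def difference(a, b):   return lambda x, y, z: a(x, y, z) and not b(x, y, z)

ring = difference(cylinder(2.0, 1.0), cylinder(1.0, 1.0))   # cylinder with a coaxial bore
print(ring(1.5, 0.0, 0.5))   # True: inside the outer cylinder, outside the bore
print(ring(0.5, 0.0, 0.5))   # False: inside the bore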



Prüfungsfragen:<br />

• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva<br />

bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der<br />

Konstruktion des Objektes (ohne Lampe).<br />

10.16 Mixing Vectors <strong>and</strong> Raster Data<br />

When we have photo-realistic representations of 3-D objects, we may need to mix data structures,<br />

e.g. vector data or three-dimensional data structures representing 3-D objects <strong>and</strong> raster data<br />

coming from images. The example of city models has illustrated this issue. A particular hierarchical structure for the geometric data is introduced here; it is called the LoD/R-Tree data structure, for Level of Detail and Rectangular Tree structure. The idea is that objects are approximated by boxes

in 3D generalized from rectangles in 2 dimensions. These blocks can overlap <strong>and</strong> so we have the<br />

entire city being at the root of a tree, represented by one block. Each district now is a son of that<br />

root <strong>and</strong> is represented by blocks. Within each district we may have city blocks, within the city<br />

blocks we may have buildings, <strong>and</strong> one particular building may there<strong>for</strong>e be the leaf of this data<br />

structure.<br />

We also have the problem of a level of detail <strong>for</strong> the photographic texture. We create an image<br />

pyramid by image processing <strong>and</strong> then store the pyramids <strong>and</strong> create links to the geometric<br />

elements in terms of level of detail, so that if we want an overview of an object we only get very few pixels to process.

If we take a vantage point to look at the city, we have in the <strong>for</strong>eground a high resolution <strong>for</strong> the<br />

texture <strong>and</strong> in the background low resolution. So we precompute per vantage point a hierarchy<br />

of resolutions that may fall within the so-called View-Frustum. As we change our vantage point<br />

by rotating our eyes, we have to call up from a data base a related element. If we move, thus<br />

change our position, we have to call up from the data base different elements at high resolution<br />

<strong>and</strong> elements at low resolution.<br />

Slide 10.72 illustrates how the vector data structure describes nothing but the geometry whereas<br />

the raster data describes the character of the object in Slide 10.73.<br />

We may also use a raster data structure <strong>for</strong> geometric detail as shown in Slide 10.74. In that<br />

case we have an (x, y) pattern of pixels <strong>and</strong> we associate with each pixel not the gray value but<br />

an elevation representing there<strong>for</strong>e a geometry in the <strong>for</strong>m of a raster which we otherwise have<br />

typically used <strong>for</strong> images only.<br />

10.17 Summary<br />

We summarize the various ideas <strong>for</strong> data structures of spatial objects, be they in 2D or in 3D.<br />

Slide 10.76 addresses 3D.<br />

Prüfungsfragen:<br />

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken!


(Slides 10.1 through 10.76 are reproduced here as thumbnails.)


Chapter 11<br />

3-D Objects <strong>and</strong> Surfaces<br />

11.1 Geometric <strong>and</strong> Radiometric 3-D Effects<br />

We are reviewing various effects we can use to model <strong>and</strong> perceive the 3-dimensional properties of<br />

objects. This could be radiometric or geometric effects of reconstructing <strong>and</strong> representing objects.<br />

When we look at a photograph of a l<strong>and</strong>scape as in Slide 11.3, we notice various depth cues. Slide<br />

11.4 summarizes these and other depth cues. A total of eight different cues are described. For example, colors tend to become bluer as objects are farther away. Obviously, objects that are nearby will cover and hide objects that are farther away. Familiar objects, such as buildings, will appear smaller as the distance grows. Our own motion will make nearby things move faster. We have spatial viewing by stereoscopy. We have brightness that reduces as the distance grows. Focus for one distance will have to change at other distances. The texture of a nearby object will become simple shading on a far-away object.

Slide 11.5 shows that one oftentimes differentiates between so-called two-dimensional, two-and-a-half- and three-dimensional objects. When we deal with two-and-a-half-dimensional objects, we deal with one surface of that object, essentially a function z(x, y) that is single-valued. In contrast, a three-dimensional object may have multiple values of z for a given x and y. Slide 11.5 is a typical example of a two-and-a-half-dimensional object, Slide 11.7 of a 3-D object.

Prüfungsfragen:<br />

• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2½D- oder 3D-Modellen. Definieren Sie die Objektbeschreibung durch 2½D- bzw. 3D-Modelle mittels Gleichungen und erläutern Sie in Worten den wesentlichen Unterschied!

11.2 Measuring the Surface of An Object (Shape from X)<br />

”<strong>Computer</strong> Vision“ is an expression that is particularly used when dealing with 3-D objects.<br />

Methods that determine the surface of an object are numerous. One generally denotes methods that will create a model of one side of an object (a two-and-a-half-dimensional model) as shape-from-X. One typically will include the techniques which use images as the source of information.

In Slide 11.9 we may have sources of shape in<strong>for</strong>mation that are not images. Slide 11.10 highlights<br />

the one technique that is mostly used <strong>for</strong> small objects that can be placed inside a measuring<br />

device. This may or may not use images to support the shape reconstruction. A laser may scan a<br />

profile across the object, measuring the echo-time, <strong>and</strong> creating the profile sequentially across the<br />

211


212 CHAPTER 11. 3-D OBJECTS AND SURFACES<br />

object thereby building up the shape of the object. The object may rotate under a laser scanner,<br />

or the laser scanner may rotate around the object. In that case we obtain a complete three-D<br />

model of the object. Such devices are commercially available. For larger objects airborne laser<br />

scanners exist such as shown in Slide 11.11 <strong>and</strong> previously discussed in the Chapter 2. A typical<br />

product of an airborne laser scanner is shown in Slide 11.12.<br />

The next technique is so-called shape-from-shading. In this technique, an illuminated object's gray tones are used to estimate a slope of the surface at each pixel. Integration of the slopes into a continuous surface leads to a model of the surface's shape. This technique is inherently unstable and under-constrained: there is not a unique slope associated with a pixel's brightness, since the same gray value may be obtained from various illumination directions and therefore slopes. An additional complication is that we must know the reflectance properties of the surface. We do have such knowledge in an industrial environment, where parts of known surface properties are kept in a box and a robot needs to recognize their shape. In natural terrain, shading alone is an insufficient source of information to model the surface shape. Slide 11.14 suggests an example where a picture of a sculpture of Mozart is used to recreate the surface shape. With perfectly known surface properties and a known light source, we can cope with the variables and constrain the problem sufficiently to find a solution.
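To make the under-constrained nature of the problem explicit, a minimal sketch under the assumption of a Lambertian surface with constant albedo ρ and a single distant light source l (a unit vector) is the image irradiance equation

I(x, y) = ρ · ( n(x, y) · l ),

where n(x, y) is the unit surface normal at the pixel. One measured brightness constrains only the angle between n and l, i.e. a whole cone of possible normals (two unknowns per pixel, one equation), which is why additional assumptions such as surface smoothness, known boundaries or known reflectance are needed.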

An analogy to shape-from-shading is photometric stereo, where multiple images of a single surface are taken under several known illumination directions, while the viewing geometry of the individual images is identical; only the illumination changes. This can be used in microscopy, as shown in the example of Slide 11.16.
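As an illustration of the basic idea (a minimal sketch, not the exact procedure of the system shown in the slides), the Lambertian model from above can be inverted per pixel when at least three images with known, linearly independent light directions are available: stacking the equations I_k = ρ (n · l_k) gives a small linear system whose least-squares solution yields albedo and surface normal.

import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover albedo and unit normal of one pixel from k >= 3 images.

    intensities : array of shape (k,)   brightness of this pixel in each image
    light_dirs  : array of shape (k, 3) unit light directions (assumed known)
    """
    L = np.asarray(light_dirs, dtype=float)
    I = np.asarray(intensities, dtype=float)
    # Solve L @ g = I in the least-squares sense, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    normal = g / albedo if albedo > 0 else g
    return albedo, normal

# Example: a facet tilted towards +x, observed under three light directions.
lights = np.array([[0.0, 0.0, 1.0],
                   [0.7, 0.0, 0.714],
                   [0.0, 0.7, 0.714]])
true_n = np.array([0.3, 0.0, 0.954])          # roughly unit length
measured = 0.8 * lights @ true_n              # synthetic Lambertian intensities
print(photometric_stereo(measured, lights))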

Shape-from-focus is usable in microscopes, but also in a natural environment with small objects. A shape-from-focus imaging system finds the portion of an object that is in focus, thereby producing a contour of the object. By changing the focal distance we obtain a moving contour and can reconstruct the object. Slide 11.18 illustrates a system that can do such a shape reconstruction in real time using the changing focus. Slide 11.19 illustrates two real-time reconstructions by shape-from-focus. Slide 11.20 has additional examples.

The method of structured light projects a pattern onto an object and makes one or more images of the surface with the pattern. Depending on the type of pattern, we can reconstruct the shape from a single image, or we can use the pattern as a surface texture to make it easy for an algorithm to find corresponding image points in the stereo method we will discuss in a moment. Slide 11.22 through Slide 11.25 illustrate the use of structured light. In the case of Slide 11.22 and Slide 11.23 a stereo pair is created and matching is made very simple. Slide 11.24 illustrates the shape that is being reconstructed. Slide 11.25 suggests that by using a smart pattern we can reconstruct the shape from the gray code that is being projected.

Slide 11.27 illustrates a fairly new technique for mapping terrain using interferometric radar. A single radar pulse is transmitted from an antenna in an aircraft or satellite, is reflected off the surface of the Earth, and is received by the transmitting antenna and by an auxiliary second antenna placed in the vicinity of the first one, say on the two wings of an airplane. The difference in arrival time of the echoes at the two antennas is indicative of the angle under which the pulse has traveled to the terrain and back. The method is inherently accurate to within the wavelength of the radiation used. This technique is available even for satellites, with two antennas on the Space Shuttle (NASA mission SRTM, for Shuttle Radar Topography Mission, 1999), and is also applicable to systems with a single antenna on a satellite, where the satellite repeats an orbit very closely, to within a few hundred meters of the original orbit, and in the process produces a signal as if the two antennas had been carried along simultaneously.

The most popular and most widely used shape-from-X technique is the stereo method. Slide 11.29 suggests a non-traditional arrangement, where two cameras take one image each of a scene, the cameras being separated by the stereo base b. Two objects P_k and P_t are at different depths as seen from the stereo base, and from the two images we can determine a parallactic angle γ which allows us to determine the depth difference between the two points. Obviously, a scene as shown in Slide 11.29 will produce a 2-D representation on a single image in which the depth between P_t and P_k is lost. However, given two images, we can determine the angle γ (and thus the distance to point P_k), and we can also determine the angle difference dγ (and thus obtain the position of point P_t at a depth different from P_k's). Slide 11.30 illustrates two images of a building. The two images are illuminated in the same manner by the sunlight; the difference between the two images is strictly geometric. We have lines in the left image and corresponding lines in the right image that are called "epipolar lines". Those are intersections of a special plane in 3-D space with each of the two images. These planes are formed by the two projection centers and a point on the object. If we have a point on an epipolar line of the left image, we know that its corresponding matching point must be on the corresponding epipolar line in the right image. Epipolar lines help in reducing the search for matching points in automated stereo. Slide 11.31 is a stereo representation from an electron microscope. The structures are very small; pixels may have a size of a few nanometers in object space. We do not have a central-perspective camera model as the basis for this type of stereo. However, the electron-microscopic mode of imaging can be modeled, and we can reconstruct the surface by a method similar to classical camera stereo.
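For the standard arrangement of two identical cameras with parallel optical axes, the depth recovery reduces to a simple triangulation; the following sketch (with assumed toy values, not data from the slides) shows the relation depth = focal length × base / disparity and how a small disparity error propagates into depth.

def depth_from_disparity(f_mm, base_m, disparity_mm):
    """Depth of a point for the normal case of stereo (parallel optical axes).

    f_mm         : focal length of both cameras in millimeters
    base_m       : stereo base (camera separation) in meters
    disparity_mm : x-parallax measured between the two images in millimeters
    """
    return f_mm * base_m / disparity_mm   # result in meters

# Assumed example values: 150 mm focal length, 600 m air base, 25 mm parallax.
z = depth_from_disparity(150.0, 600.0, 25.0)              # 3600 m flying height
dz = abs(z - depth_from_disparity(150.0, 600.0, 25.01))   # effect of a 10 µm error
print(z, dz)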

Slide 11.32 addresses a last shape-from-X technique, tomography. Slide ?? and Slide ?? illustrate, from medical imaging, a so-called computer-aided tomography (CAT) scan of a human skull. Individual images represent slices through the object. By stacking up a number of those images we obtain a replica of the entire original space. Automated methods exist that collect all the voxels that belong to a particular object and in the process determine the surface of that object. The result is shown in Slide 11.34.

Exam questions (Prüfungsfragen):

• Please compile a list of all methods known to you that are referred to as "shape from X".

• What is so-called "photometric stereo" used for, and what is the basic idea behind this method?

• The lecture discussed depth cues that allow the human visual system to reconstruct the third dimension of a viewed scene, which is lost in the projection onto the retina. In digital image processing this task is solved by various "shape from X" methods. Which depth cues correspond directly to a "shape from X" method, and for which methods of natural or artificial depth estimation can no such correspondence be established?

11.3 Surface Modeling

There is an entire field of study devoted to optimally modeling a surface from the data primitives one may have obtained from stereo or other shape-from-X techniques. We deal with point clouds, connect the points into triangles, build polygonal faces from the triangles, and then replace the faces by continuous functions such as bicubic or quadric patches. Slide 11.36 illustrates a successfully constructed network of triangles, using as input a set of points created from stereo. Slide 11.37 illustrates the triangles formed from all the photogrammetrically obtained points of the statue of Emperor Charles in the National Library in Vienna; also shown is a rendering of that surface using photographic texture. This calls to mind that the problems of creating a surface from measured points, triangulating points, etc., have been discussed previously in Chapters 9 and 10.
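A minimal sketch of the first step of this pipeline, triangulating a 2½-D point cloud over its (x, y) footprint (assuming SciPy is available; this is an illustration, not the software used for the model in the slides):

import numpy as np
from scipy.spatial import Delaunay

# Synthetic 2.5-D point cloud: scattered (x, y) positions with one z per point.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 10.0, size=(200, 2))
z = np.sin(xy[:, 0]) + 0.1 * xy[:, 1]

tri = Delaunay(xy)            # triangulate in the (x, y) plane
triangles = tri.simplices     # (n_triangles, 3) indices into the point list

# Each triangle, together with the z-values of its corners, is one planar facet
# of the surface model; these facets could later be replaced by smooth patches.
print(triangles.shape)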



11.4 Representing 3-D Objects

In representing 3-D objects we have to cope with two important subjects:

• hidden edges and hidden surfaces

• the interaction of light and material

In dealing with hidden edges and surfaces, we essentially differentiate between two classes of procedures. The first is the image-space method, where we go through all the pixels of an image and find, for each pixel, the associated object point that is closest to the image plane. This method is rather susceptible to aliasing effects. The object-space method searches among all object elements and checks what can be seen from the vantage point of the viewer. These techniques are less prone to suffer from aliasing.

The issue of hidden lines or surfaces is illustrated in Slide 11.40 with a single-valued function y = f(x, z). We might represent this surface by drawing profiles from the left edge to the right edge of the 2-D domain. The resulting image in Slide 11.40 is not easily interpreted. Slide 11.42 illustrates the effect of removing hidden lines. Hidden lines are removed by going from profile to profile through the data set and plotting them into a 2-D form as shown in Slide 11.43. Each profile is compared with what has already been drawn, and by a method of clipping we can find which surface elements are hidden by previous profiles. This can be done in one dimension as shown in Slide 11.43 and then in a second dimension (Slide 11.44). When we look at Slide 11.44 we might see slight differences between the two methods of hidden-line removal in case (c) and case (d).

Many tricks are applied to speed up the computation of hidden lines and surfaces. One employs neighborhoods or auxiliary geometric transformations, accelerations using bounding boxes around objects, the elimination of surfaces that face away from the view position (back-face culling), a subdivision of the view frustum, and the use of hierarchies. Slide 11.46 illustrates the usefulness of enclosing rectangles or bounding boxes. Four objects exist in a 3-D space and it is necessary to decide which ones cover up which others. Slide 11.47 illustrates that the bounding-box approach, while helpful in many cases, may also mislead one into suspecting overlaps where there are none.
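The attraction of bounding boxes is that the conservative overlap test is only a few comparisons; a minimal sketch for axis-aligned boxes (an illustration, with the caveat from Slide 11.47 that a positive answer may still be a false alarm):

def aabb_overlap(box_a, box_b):
    """Conservative overlap test for two axis-aligned bounding boxes.

    Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    Returns False only if the enclosed objects certainly do not intersect;
    True means they *might* intersect and an exact test is still needed.
    """
    (amin, amax), (bmin, bmax) = box_a, box_b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))

# Boxes that overlap as volumes, and boxes that clearly do not:
print(aabb_overlap(((0, 0, 0), (2, 2, 2)), ((1, 1, 1), (3, 3, 3))))  # True
print(aabb_overlap(((0, 0, 0), (1, 1, 1)), ((2, 2, 2), (3, 3, 3))))  # False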

Exam questions (Prüfungsfragen):

• When an image is generated by recursive ray tracing, the primary ray for a particular pixel hits an object A and, as shown in Figure B.11, is split into several rays which subsequently (if the recursion depth is not limited) hit the objects B, C, D and E. The numbers in the circles are the local intensities of each object (with respect to the ray hitting it); the numbers next to the connections give the weights of the partial rays. Determine the intensity assigned to the pixel under consideration if

1. the recursion depth is not limited,

2. the ray is split only exactly once,

3. the recursion is terminated as soon as the weight of a partial ray falls below 15%.

For the last two cases, please mark in two sketches those parts of the tree that are traversed for the computation of the total intensity.

Answer:

1. Recursion depth not limited:

I = 2.7 + 0.1 · 2 + 0.5 · (3 + 0.4 · 2 + 0.1 · 4)
  = 2.7 + 0.2 + 0.5 · (3 + 0.8 + 0.4)
  = 2.9 + 0.5 · 4.2
  = 2.9 + 2.1
  = 5

2. Ray split only once (recursion depth limited):

I = 2.7 + 0.1 · 2 + 0.5 · 3
  = 2.7 + 0.2 + 1.5
  = 4.4

3. Termination once the weight falls below 15%:

I = 2.7 + 0.5 · (3 + 0.4 · 2)
  = 2.7 + 0.5 · 3.8
  = 2.7 + 1.9
  = 4.6

11.5 The z-Buffer

Algorithm 29 z-buffer

1: set zBuffer to infinity for all pixels
2: for all polygons plg that have to be drawn do
3:   for all scanlines scl of that polygon plg do
4:     for all pixels pxl of that scanline scl do
5:       if the z-value of pixel pxl is nearer than zBuffer[pxl] then
6:         set zBuffer[pxl] to the z-value of pixel pxl
7:         draw pixel pxl
8:       end if
9:     end for
10:   end for
11: end for

The most popular approach to hidden-line and hidden-surface removal is the well-known z-buffer method (Algorithm 29). It was introduced in 1974 and uses a transformation of an object's surface facets into the image plane, keeping track at each pixel of the distance between the camera and the corresponding element on an object facet. Each pixel keeps the gray value that comes from the object point closest to the image plane.

Another procedure is illustrated in Slide 11.50 with an octree. The view reference point V as shown in that slide leads to a labeling of the octree space and shows that element 7 will be seen the most.

Exam questions (Prüfungsfragen):

• The four points from exercise B.2 form two line segments

A = p_1 p_2 ,   B = p_3 p_4 ,

whose projections into device coordinates in the screen plane fall onto the same scanline. Determine graphically, by applying the z-buffer algorithm, which object (A, B, or neither) is visible at pixel positions 0 to 10 of this scanline. Hint: draw p_1 p_2 and p_3 p_4 into the xz-plane of the device coordinate system.

Answer: see Figure 11.1.

Figure 11.1: graphical evaluation of the z-buffer algorithm (only the result is reproduced here); visible object at pixel positions 0 to 10 of the scanline:

− − B B A A B B B − −

11.6 Ray-tracing

Another very popular method to find hidden surfaces (but one that is also used in other contexts) is the so-called ray-tracing method. Slide 11.52 illustrates the basic idea: we have a projection center, an image window and the object space. We cast a ray from the projection center through a pixel into the object space and check where it hits the objects. To accelerate the ray tracing we subdivide the space, and instead of intersecting the ray with each actual object we first search through the bounding boxes surrounding the objects. In this way we can dismiss many objects because they are not along the path of the ray that is cast for a particular pixel. Pseudocode is given in Algorithm 30.
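A minimal sketch of the core geometric step, intersecting one ray with one sphere (illustrative only; the chapter's Algorithm 30 organizes many such tests with an octree):

import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the smallest positive ray parameter t of the hit, or None.

    The ray is origin + t * direction with a unit-length direction vector.
    """
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c                # a == 1 for a unit direction
    if disc < 0.0:
        return None                       # ray misses the sphere
    t1 = (-b - math.sqrt(disc)) / 2.0
    t2 = (-b + math.sqrt(disc)) / 2.0
    for t in (t1, t2):
        if t > 1e-9:                      # nearest hit in front of the origin
            return t
    return None

# Primary ray along +z from the origin against a sphere 5 units away:
print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))   # 4.0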

Exam questions (Prüfungsfragen):

• Describe the ray-tracing method for determining visible surfaces. Which optimizations can help to reduce the computational effort?

Answer: From the projection center, a ray is cast through each pixel of the image plane into the scene and intersected with all objects. Among all objects hit, the one whose intersection point with the ray lies closest to the projection center determines the color value of the pixel.

– The number of required intersection computations can be greatly reduced by using hierarchical bounding volumes.

– The object that is hit (in recursive ray tracing only for the first intersection) can also be determined with the help of the z-buffer algorithm.

Algorithm 30 Ray tracing for octrees

Ray-tracing algorithm
For each row of the image
  For each pixel of the row
    Determine the ray from the eye through the pixel;
    Pixel color = Raytrace(ray);

Raytrace(ray)
  For all objects of the scene
    If the ray intersects the object and the intersection is the closest so far
      record the intersection;
  If there is no intersection then result := background color else
    result := Raytrace(reflected ray) + Raytrace(refracted ray);
    For all light sources
      For all objects of the scene
        If the ray towards the light source intersects the object
          abort the loop, proceed to the next light source
      If no intersection was found
        result += local illumination

Octree implementation

Construction
  Place a bounding box q around the scene
  For all objects o
    Insert(o, q)

Insert(object o, box q)
  For all eight sub-boxes t of q
    If o fits completely inside t
      Create t if necessary
      Insert(o, t)
      return
  Assign object o to box q

Intersection
  Intersect(box q, ray s)
    If q is empty, return NULL
    If IntersectionTest(q, s)
      For all eight sub-boxes t of q
        res += Intersect(t, s)
      For all objects assigned to q
        res += IntersectionTest(o, s)
    return the closest intersection in res



11.7 Other Methods of Providing Depth Perception

Numerous methods exist to help create the impression of depth in the rendering of a 3-D model. These include coding by brightness or coding by color. Slide 11.55 illustrates depth encoding by means of the brightness of lines: the closer an object is to the viewer, the brighter it is drawn. In Slide 11.56 color is added to help obtain a depth perception. Of course the depth perception improves dramatically if we remove hidden edges, as shown in Slide 11.57; we then take advantage of our knowledge that nearby objects cover up objects that are farther away. Slide 11.60 indicates that the transition to illumination methods for rendering 3-D objects is also relevant for depth perception.

Slide 11.58 introduces the idea of halos to represent 3-D objects, and Slide 11.59 is an example. At first we see a wire-frame model of a human head; then we see the same model after removing the hidden lines, but also interrupting some of the lines where they intersect other lines. This little interruption is denoted as a halo.




Chapter 12

Interaction of Light and Objects

Radiation and the natural environment have a complex interaction. If we assume, as in Slide 12.2, that the sun illuminates the Earth, we have atmospheric scattering as the radiation approaches the surface. We have atmospheric absorption that reduces the power of the light coming from the sun. Then we have reflection at the top surface, which can be picked up by a sensor and used in image formation. The light may pass through an object and be absorbed, while at the same time the object might emit radiation, for example in the infrared wavelengths. Finally the radiation hits the ground and might again be absorbed, reflected or emitted. As the light returns from the Earth's surface to the sensor we again have atmospheric absorption and emission. In remote sensing many of those factors are used to describe and analyze objects based on sensed images. In computer graphics we use a much simplified approach.

12.1 Illumination Models

Definition 31 Ambient light

In the ambient illumination model the light intensity I after reflection from an object's surface is given by the equation

I = I_a · k_a

I_a is the intensity of the ambient light, assumed to be constant for all objects. k_a is a constant between 0 and 1, called the ambient-reflection coefficient. k_a is a material property and must be defined for every object.

Ambient light alone creates unnatural images, because every point on an object's surface is assigned the same intensity. Shading is not possible with this kind of light. Ambient light is used mainly as an additional term in more complex illumination models, to illuminate parts of an object that are visible to the viewer but invisible to the light source. The resulting image then becomes more realistic.

The simplest case is illumination by ambient light (Definition 31). The existing light is multiplied with the properties of an object to produce the intensity of an object point in an image. Slide 12.4 illustrates this with the previously used indoor scene.

Slide 12.5 goes one step further and introduces the diffuse Lambert reflection. There is a light source which illuminates the surface under an angle Θ from the surface normal. The reflected intensity I is the product of the incident light intensity, the surface property k_d, and the cosine of the angle under which the light falls onto the surface.

Definition 32 Lambert model

The Lambert model describes the reflection of the light of a point light source on a matte surface such as chalk or fabric.

Light falling on a matte surface is reflected diffusely. This means that the light is reflected uniformly in all directions. Because of this uniform reflection, the amount of light seen from any angle in front of the surface is the same. Since the point of view does not influence the amount of reflected light seen, the position of the light source has to. This relationship is described by the Lambertian law:

Lambertian law: Assume that a surface facet is directly illuminated, so that the normal vector of the surface is parallel to the vector from the light source to the surface facet. If you now tilt the surface facet by an angle θ, the amount of light falling on the facet is reduced by the factor cos θ.

A tilted surface is illuminated by less light than a surface normal to the light direction, so it reflects less light. This is called diffuse Lambertian reflection:

I = I_p · k_d · cos θ

where I is the amount of reflected light, I_p is the intensity of the point light source, k_d is the material's diffuse reflection coefficient, and θ is the angle between the surface normal and the light vector.

Slide 12.6 illustrates the effect of various values of the parameter k_d. The factor cos Θ can also be expressed as the inner product of two unit vectors, namely the vector towards the light and the surface normal. With this diffuse Lambert reflection our original image becomes Slide 12.7.

The next level of complexity is to add the two brightnesses together, the ambient and the Lambert illumination. A further sophistication is introduced if we add an atmospheric attenuation of the light as a function of the distance to the object, shown in Slide 12.9. So far we have not talked about mirror reflection. For this we need to introduce a new vector. We have the light source direction L, the surface normal N, the mirror-reflection vector R and the direction to the camera or viewer V. The mirror-reflection component of the system is illustrated in Slide 12.10 with a term W · cos^n α, where α is the angle between the viewing direction and the direction of mirror reflection, and W is a value the user can choose to indicate how mirror-like the surface is. Phong introduced this model of mirror reflection in 1975 and explained the effect of the power n in cos^n α: the larger the power, the more focused and smaller the area of mirror reflection will be. Not only the power n defines the type of mirror reflection, but also the parameter W, as shown in Slide 12.12, where the same amount of mirror reflection produces different appearances for varying values of W. W describes the blending of the mirror reflection into the background, whereas the value n indicates how small or large the area is that is affected by the mirror reflection. Slide 12.13 introduces the idea of a light source that is not a point. In that case we introduce a point light source and a reflector, which reflects light onto the scene; the reflector represents the extended light source.
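Putting the pieces described so far together (ambient term, Lambert term, and the Phong-style mirror term with the parameters W and n), a minimal single-wavelength sketch of the combined model could look as follows; the vector names follow the text (L, N, R, V), and any distance attenuation is left out for brevity:

import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def illumination(Ia, ka, Ip, kd, W, n, L, N, V):
    """Ambient + diffuse Lambert + Phong-style specular term (one wavelength)."""
    L, N, V = unit(L), unit(N), unit(V)
    cos_theta = max(0.0, float(np.dot(N, L)))     # Lambert: angle light/normal
    R = 2.0 * np.dot(N, L) * N - L                # mirror direction of L about N
    cos_alpha = max(0.0, float(np.dot(R, V)))     # angle viewer/mirror direction
    return Ia * ka + Ip * (kd * cos_theta + W * cos_alpha ** n)

# Assumed toy parameters: light overhead, viewer slightly off the mirror direction.
print(illumination(Ia=0.2, ka=0.5, Ip=1.0, kd=0.6, W=0.3, n=20,
                   L=(0, 0, 1), N=(0, 0.2, 1), V=(0, -0.2, 1)))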

Exam questions (Prüfungsfragen):

• What is a simple way of realizing specular reflection in the rendering of three-dimensional objects? Please give a sketch, a formula, and the name of a method named after its inventor.

• Figure B.15 shows an object whose surface properties are described by the Phong illumination model. Table B.2 contains all relevant parameters of the scene. Determine, for the marked object point p, the intensity I of this point as perceived by the observer.

Hint: For simplicity, the computation is done in two dimensions only and for a single wavelength only. To evaluate the power of a number close to 1, note that the approximation (1 − x)^k ≈ 1 − kx can be used for small x.

12.2 Reflections from Polygon Facets

Gouraud introduced the idea of interpolated shading. Each pixel on a surface receives a brightness in the image that is interpolated from the three surrounding corners of the triangular facet. The computation is made along a scanline as shown in Slide 12.15, with auxiliary brightness values I_a and I_b. Note that the brightnesses are computed with a sophisticated illumination model at the corner positions I_1, I_2 and I_3 of the triangle, and then a simple interpolation scheme is used to obtain the brightness at I_p. Gouraud does not consider specular reflection, while Phong does.

Gouraud just interpolates brightnesses (Algorithm 31); Phong interpolates surface normals from the corners of a triangle (Algorithm 32). Slide 12.16 explains this. Slide 12.17 illustrates the appearance of a Gouraud illumination model: note how smoothly the illumination changes along the surface, whereas the geometry of the object is not smoothly interpolated. Slide 12.18 adds specular reflection to Gouraud. Phong shading, as shown in Slide 12.19, creates a smoother appearance of the surface because of its interpolation of the surface normal; of course it includes specular reflection. In order to have smoothness not only in the surface illumination but also in the surface geometry, the facets of the object must be replaced by curved surfaces. Slide 12.20 illustrates the idea: the model's appearance is improved, also due to the specular reflection of the Phong model.
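The heart of Gouraud's method is nothing more than linear interpolation of the corner intensities, first along the triangle edges and then along each scanline; a compact sketch of that inner step (a simplified illustration of the same idea that Algorithm 31 spells out in full):

def gouraud_scanline(x_a, I_a, x_b, I_b):
    """Linearly interpolate intensity for the pixel centers between x_a and x_b."""
    if x_b < x_a:
        x_a, x_b, I_a, I_b = x_b, x_a, I_b, I_a
    grad = (I_b - I_a) / (x_b - x_a)          # intensity change per pixel
    first, last = int(x_a) + 1, int(x_b)      # pixel centers inside the span
    return [(x, I_a + (x - x_a) * grad) for x in range(first, last + 1)]

# Edge intensities I_a = 100 at x_a = 2.3 and I_b = 40 at x_b = 8.7:
for x, intensity in gouraud_scanline(2.3, 100.0, 8.7, 40.0):
    print(x, round(intensity, 1))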

Slide 12.21 finally introduces additional light sources. Slide 12.22 summarizes the various types of reflection. We have the basic reflection law, stating that the angle of incidence equals the angle of reflection, both angles being measured with respect to the surface normal. A mirror or specular reflection is very directional: the incoming ray is reflected into a single outgoing direction. The opposite of specular reflection is diffuse reflection; if it is near perfect, it radiates into all directions almost equally. The Lambert reflector is a perfect diffuse reflector, as shown on the right-hand side.

Exam questions (Prüfungsfragen):

• Given the raster representation of an object in Figure B.58, where the object is represented only by its three corner points A, B and C. The brightness of the corner points is I_A = 100, I_B = 50 and I_C = 0. Compute the illumination values according to the Gouraud method for at least five of the pixels lying entirely inside the triangle.

• Describe two methods for interpolating the color values inside a triangle that belongs to an illuminated polygonal scene.

12.3 Shadows

Typically, shadows are computed in two steps or phases. The computations for shadows are related to the computation of hidden surfaces, because areas in shadow are areas that are not seen from the illuminating sun or light source. Slide 12.24 explains the two types of transformation. We first have to transform the 3-D object into a fictitious viewing situation with the viewpoint at the light source; that produces the visible surfaces in that view. A transformation back into model coordinates produces the shadow edges.


Algorithm 31 Gouraud shading

Procedure ScanLine(x_a, I_a, x_b, I_b, y)

1: grad = (I_b − I_a)/(x_b − x_a) {compute the intensity step per pixel}
2: if x_b > x_a then
3:   x_c = (int)x_a + 1 {set x_c and x_d to pixel centers}
4:   x_d = (int)x_b
5: else
6:   x_c = (int)x_b + 1
7:   x_d = (int)x_a
8: end if
9: I = I_a + (x_c − x_a) ∗ grad {compute the starting value for the first pixel}
10: while x_c ≤ x_d do
11:   apply I to pixel (x_c, y)
12:   x_c = x_c + 1 {advance one step}
13:   I = I + grad
14: end while

Function Triangle(x_1, y_1, I_1, x_2, y_2, I_2, x_3, y_3, I_3)

1: sort the points by ascending y-coordinate
2: ∆x_a = (x_2 − x_1)/(y_2 − y_1) {compute step sizes for the left edge}
3: ∆I_a = (I_2 − I_1)/(y_2 − y_1)
4: ∆x_b = (x_3 − x_1)/(y_3 − y_1) {compute step sizes for the right edge}
5: ∆I_b = (I_3 − I_1)/(y_3 − y_1)
6: y = (int)y_1 + 1 {compute the start scanline}
7: y_end = (int)(y_2 + 0.5) {compute the end scanline for the upper sub-triangle}
8: x_a = x_1 + (y − y_1) ∗ ∆x_a {compute starting values}
9: x_b = x_1 + (y − y_1) ∗ ∆x_b
10: I_a = I_1 + (y − y_1) ∗ ∆I_a
11: I_b = I_1 + (y − y_1) ∗ ∆I_b
12: while y < y_end do
13:   compute one scanline with ScanLine(x_a, I_a, x_b, I_b, y)
14:   x_a = x_a + ∆x_a {advance one step}
15:   x_b = x_b + ∆x_b
16:   I_a = I_a + ∆I_a
17:   I_b = I_b + ∆I_b
18:   y = y + 1
19: end while {upper sub-triangle done}
20: ∆x_a = (x_3 − x_2)/(y_3 − y_2) {compute step sizes for the new edge}
21: ∆I_a = (I_3 − I_2)/(y_3 − y_2)
22: y_end = (int)(y_3 + 0.5) {compute the end scanline for the lower sub-triangle}
23: x_a = x_2 + (y − y_2) ∗ ∆x_a {compute the starting value}
24: while y < y_end do
25:   compute one scanline with ScanLine(x_a, I_a, x_b, I_b, y)
26:   x_a = x_a + ∆x_a {advance one step}
27:   x_b = x_b + ∆x_b
28:   I_a = I_a + ∆I_a
29:   I_b = I_b + ∆I_b
30:   y = y + 1
31: end while {lower sub-triangle done}


Algorithm 32 Phong shading

1: for all polygons do
2:   compute the surface normals in the corners of the polygon
3:   project the corners of the polygon into the image plane
4:   for all scanlines that are overlapped by the polygon do
5:     compute the linearly interpolated surface normals on the left and right edge of the polygon
6:     for all pixels of the polygon on the scanline do
7:       compute the linearly interpolated surface normal
8:       normalize the surface normal
9:       evaluate the illumination model and set the color of the pixel to the computed value
10:     end for
11:   end for
12: end for

Algorithm 33 Shadow map

1: make the light-source coordinate system the center of projection
2: render the object using the z-buffer
3: assign the z-buffer to shadowzbuffer
4: make the camera coordinate system the center of projection
5: render the object using the z-buffer
6: for all visible pixels do
7:   map the coordinate from camera space into light space
8:   project the transformed coordinate to 2-D (x', y')
9:   if the transformed z-coordinate > shadowzbuffer[x', y'] then
10:     shadow the pixel {a surface is nearer to the light source than this point}
11:   end if
12: end for

Algorithm 34 Implementation of the Atherton-Weiler-Greenberg algorithm

1: make the light point the center of projection
2: determine the visible parts of the polygons
3: split partially lit polygons into visible and invisible parts
4: transform back to the modelling database
5: merge the original database with the lit polygons {results in objects split into lit and unlit polygons}
6: make (any) eye point the center of projection
7: for all polygons do {render the scene}
8:   if the polygon is in shadow then
9:     set the shading model to the ambient model
10:   else
11:     set the shading model to the default model
12:   end if
13:   draw the polygon
14: end for


We now have to merge the 3-D view and the auxiliary lines from the shadow boundaries into a combined polygon database. The method of computing hidden surfaces from the viewer's perspective is repeated in Slide 12.25. Slide 12.26 illustrates the use of the z-buffer method for the computation of shadow boundaries (Algorithm 33). L is the direction of the light, V is the position of the viewer. We first do a z-buffer pass from the light source, and then we do a z-buffer pass from the viewer's perspective. The view without shadows and the view with them give a dramatically different impression of the realism of the scene with its two objects.

Exam questions (Prüfungsfragen):

• Explain the process of shadow computation with the two-phase method using the z-buffer. Describe two variants as well as their advantages and disadvantages.

12.4 Physically Inspired Illumination Models

There is a complex world of illumination computations concerned with the bidirectional reflectance distribution function (BRDF). In addition we can use ray tracing for illumination, and a very particular method called radiosity. We will spend a few thoughts on each of these three subjects.

A BRDF, as in Slide 12.28, describes the reflection properties of a surface as a function of the illumination and viewing directions. A 3-D shape indicates how the incoming light from a light source is reflected from a particular surface. Many of the mathematical models used to describe those complex shapes bear their inventors' names.

12.5 Regressive Ray-Tracing

As discussed before, we have to cast rays from the light source onto the object and find the points that are in shadow or illuminated. Similarly, rays cast from the observer's position give us the hidden lines from the viewer's reference point. Slide 12.30 illustrates again the geometry of ray tracing used to obtain complex patterns in an image, coming from the object itself and from light cast by other objects onto its surface. Reflections in transparent objects may be obtained from the interface of the object with the air at the back, away from the viewer.

12.6 Radiosity

A very interesting illumination concept that has been studied extensively during the last ten years is called radiosity. It is a method that derives from modeling the distribution of heat between bodies in mechanical engineering (see Algorithm 35).

We subdivide the surfaces of our 3-D scene into small facets. We have a light source illuminating all the facets, but the facets also illuminate one another and thus become a form of secondary illumination source. Each surface facet has associated with it a differential surface area dA. We can set up an equation that relates the incoming light of each facet to all other facets. A very large system of equations comes about. It can, however, be efficiently reduced in the number of unknowns and therefore be solved efficiently.
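Written out (a standard formulation, sketched here for orientation rather than quoted from the slides), the radiosity B_i of facet i consists of its own emission plus the reflected fraction of what arrives from all other facets:

B_i = E_i + ρ_i · Σ_j F_ij · B_j

where E_i is the emission of facet i, ρ_i its reflectivity, and F_ij the form factor, i.e. the fraction of the energy leaving facet i that arrives directly at facet j (the reciprocity relation A_i · F_ij = A_j · F_ji is what allows the equation to be written per unit area in this compact form). Writing this for all n facets gives the large linear system mentioned above.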

Let us have a look at a few examples of this technique. In Slide 12.38 we see radiosity used in the representation of a classroom, Slide 12.39 is an artificial set of cubes, and Slide 12.39 also illustrates one table at two levels of resolution. In the first case the facets used for radiosity are fairly large, in the second the facets are made much smaller. We see how the realism of this illumination model increases.
illumination model increases.


Algorithm 35 Radiosity

1: load scene
2: divide surfaces into patches
3: for all patches do {initialize patches}
4:   if the patch is a light then
5:     patch.emission := amount of light
6:     patch.available emission := amount of light
7:   else
8:     patch.emission := 0
9:     patch.available emission := 0
10:   end if
11: end for
12:
13: repeat {render scene}
14:   for all patches i, starting at the patch with the highest available emission do
15:     place a hemicube on top of patch i {needed to calculate form factors}
16:     for all patches j do
17:       calculate the form factor between patch i and patch j {needed to calculate the amount of light}
18:     end for
19:     for all patches j do
20:       ∆R := amount of light from patch i to patch j {using the form factor and the properties of the patches}
21:       j.available emission := j.available emission + ∆R
22:       j.emission := j.emission + ∆R
23:     end for
24:     i.available emission := 0 {all available light has been distributed to the other patches}
25:   end for
26: until good enough


Similarly, we have radiosity used in modeling a computer room in Slide 12.39. We have internal illumination, and in one case, on the lower right of Slide 12.40, we have illumination from the outside of the room. In the slide we see a radiosity-based computation of an indoor scene, again at two levels of detail in the mesh sizes for the radiosity computation.




Chapter 13

Stereopsis

13.1 Binocular Vision

The 3-dimensional impression of our environment as received by our two eyes is called binocular vision. Slide 13.10 explains that the human perceives two separate images via the two eyes, merges the two images in the brain and reconstructs a depth model of the perceived scene. The two images obtained by the two eyes differ slightly because of the two different vantage points. The stereo base for natural binocular vision is typically six and a half centimeters, i.e. the distance between the eyes.

Recall that natural depth perception is supported by many depth cues other than binocular vision. We talked about depth cues from color, from size, from motion, from objects covering up one another, etc. (see Chapter 11).

Slide 13.4 explains geometrically the binocular stereo effect. Two points, P and Q, will be imaged on top of one another on the retina of one eye, but will be imaged side by side, subtending a small angle dγ, in the other eye. We call γ the parallactic angle or parallax, and dγ a parallactic difference. It is the measure of disparity which is sensed and used in the brain for shape reconstruction. The angle γ itself gives us the absolute distance to a point P and is usually computed from the stereo base b_A. Note that our eyes are sensitive to within a parallactic angle of about 15 seconds of arc (15"), and may be limited to perceiving a parallactic angle no larger than 7 minutes of arc (7').

Slide 13.5, Slide 13.6 and Slide 13.7 illustrate two cases of stereo images taken from space and one from microscopy. Note the difference between binocular viewing and stereo viewing, as discussed in a moment.

What is of interest in a natural binocular viewing environment is the sensitivity of the eyes to depth. Slide 13.8 explains that the difference in depth between two points, d, that we can just perceive is governed by our sensitivity to the parallactic angle, dγ. Since this is typically no smaller than about 15 seconds of arc, we have the depth differentiation ability shown in Slide 13.8. At a distance of 25 cm we may be able to perceive depth differences as small as a few micrometers. At a meter it may be a tenth of a millimeter, but at ten meters distance it may already be about a meter. At a distance of about 900 meters, we may not see any depth at all from our binocular vision.
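The quantitative relationship behind these numbers (a sketch using the small-angle approximation; the exact figures depend on which threshold for dγ one assumes) is: with eye base b and distance D, the parallactic angle is γ ≈ b / D, so a depth difference dD changes it by

dγ ≈ b · dD / D² ,   i.e.   dD ≈ D² · dγ / b .

The just-noticeable depth difference therefore grows with the square of the distance, and binocular depth perception ends where γ itself drops to the threshold: D_max ≈ b / dγ ≈ 0.065 m / (7.3 · 10⁻⁵ rad) ≈ 900 m, which matches the figure quoted above.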

Exam questions (Prüfungsfragen):

• Given a distance y_A = 3 meters from the eye of a sharp-eyed observer with a typical eye base to an object point A. How much farther from the eye may a second object point B be located so that the observer can just no longer perceive the depth difference between the two object points A and B? Please state the corresponding formula, insert the numerical values, and evaluate the formula.

• At the currently running Styrian provincial exhibition "comm.gr2000az" in Schloss Eggenberg in Graz, a robot is installed that is supposed to catch a ball thrown to it by visitors. In order to close the robot's gripper at the right time and in the right place, the position of the ball during its flight must be determined as accurately as possible. For this purpose two cameras are installed that observe the playing field; a simplified sketch of the arrangement is shown in Figure B.63.

Determine the accuracy in the x, y and z directions with which the ball position marked in Figure B.63 can be determined in space. For simplicity, assume the following camera parameters:

– focal length: 10 millimeters

– geometric resolution of the sensor chip: 100 pixels/millimeter

You may dispense with methods for determining the ball position with sub-pixel accuracy. For the computation of the uncertainty in the x and y directions you may neglect one of the two cameras; for the z direction you may use the considerations on the uncertainty of binocular depth perception.

13.2 Stereoscopic Vision

We can trick our two eyes into thinking they see the natural environment when in fact they look at two images presented separately to the left and right eye. Since those images will not be at an infinite distance, but perhaps at 25 cm, we are forced to focus our eyes at 25 cm, yet our brain must behave as if we were looking at a much larger distance where the eyes' optical axes are parallel. Many people have difficulties focusing at 25 cm and simultaneously obtaining a stereoscopic impression.

To help, one uses an auxiliary tool called a mirror stereoscope. Two images are placed on a table, and an assembly of two mirrors and a lens presents each image separately to each eye, whereby the eye is permitted to focus at infinity and not at 25 cm.

Slide 13.12 lists alternative modes of stereo viewing. We mentioned the mirror stereoscope with separate optical axes. A second approach uses anaglyphs, implemented in the form of glasses, where one eye receives only the red, the other one only the green component of an image. A third approach is polarization, where the images presented to the eyes are polarized differently for the left and right eye. A further approach presents the images through shutter glasses, by projection or on a monitor. All four approaches have been implemented on computer monitors.

One can realize a mirror stereoscope on a monitor by putting two optical systems in front of it and having the left half present one image, the right half the other. Anaglyphs are the classical way of looking at stereo on a monitor, presenting two images in the green and red channels and wearing glasses to perceive a stereo impression. The most popular way of presenting soft-copy images on a monitor is by polarization: one wears simple glasses to look at two polarized images on a monitor, or active glasses that are controlled from the monitor, which presents 120 images per second, 60 to one eye and 60 to the other, with the polarization ensuring that the proper image reaches the proper eye. This is called image flickering using polarization.

Slide 13.13 explains how stereoscopic viewing by means of two images increases the human ability to perceive depth far beyond the ability available from binocular vision. The reason is simple.


Definition 33 Total plastic

Let

p = n · v

be the total plastic, whereby

n ... image magnification
v ... eye-base magnification

The natural eye base dA, typically 6.5 cm, can be magnified to the synthetic stereo base dK from which the images are taken. That implies

v = dK/dA

Binocular vision is limited by the six-and-a-half centimeter distance between the two eyes, whereas stereoscopic vision can employ images taken from a much larger stereo base. Take the example of aerial photography, where two images may be taken from an airplane with the perspective centers 600 meters apart. We obtain an increase in our stereo perception that is called the total plastic (see Definition 33). The ratio between the stereo base from which the images are taken and the eye base gives us the factor v. In addition, we look at the images under a magnification n, and the total plastic increases our stereo ability by n · v, thus by a factor of tens of thousands. As a result, even though the object may be, say, a thousand meters away, we may still have a depth acuity of three cm.
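As a worked illustration (with an assumed viewing magnification of n = 4, which is not stated in the text): v = dK/dA = 600 m / 0.065 m ≈ 9200, so p = n · v ≈ 37 000. Applying the depth-difference relation dD ≈ D² · dγ / b from Section 13.1 and improving it by the factor p gives, at D = 1000 m with dγ ≈ 7.3 · 10⁻⁵ rad,

dD ≈ (D² · dγ / dA) / p ≈ (1000² · 7.3 · 10⁻⁵ / 0.065) / 37 000 ≈ 1120 m / 37 000 ≈ 0.03 m ,

i.e. roughly the three centimeters quoted above.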

Exam questions (Prüfungsfragen):

• Using a numerical example of your choice, quantify the "secret" that makes it possible to achieve a much better depth perception in the stereo viewing of overlapping photographic images than is possible with natural binocular vision.

• Name different technical methods of stereoscopically conveying a "real" (three-dimensional) spatial impression of a scene rendered by a computer.

13.3 Stereo Imaging

We need to create two natural images, either with one camera at two positions, taking the images sequentially, or with a camera pair, taking the images simultaneously. Simultaneous imaging is preferred when the object moves. Slide 13.15 illustrates the two camera positions looking at a two-dimensional scene and explains again the concept of a stereo base b, of the convergence angle γ giving us the distance to the object point P_K, and of the parallactic difference angle dγ which is a measure of the depth between the two points P_K and P_T. Slide 13.16 repeats the same idea for the case of aerial photography, where an airplane takes one image at position O_1 and a second image at position O_2. The distance between O_1 and O_2 is the aerial stereo base b, the distance to the ground is the flying height H, and the ratio b/H, called the base-to-height ratio, is a measure of the stereo acuity of an image pair. Slide 13.17 shows again the case of two images taken from an overhead position. Note that the two images look identical to the casual observer. What makes the stereo process work are the minute geometric differences between the two images, which occur in the direction of flight; there are no geometric differences in the direction perpendicular to the flight direction.

Going back to Slide 13.16, we may appreciate the necessity of recreating in the computer the relative position and orientation of the two images in space.


238 CHAPTER 13. STEREOPSIS<br />

unintended motions that will lead the user to not get an accurate measure of the positions O 1 <strong>and</strong><br />

O 2 <strong>and</strong> of the direction of imaging <strong>for</strong> the two camera positions. A stereo-process will there<strong>for</strong>e<br />

typically require that sets of points are extracted from overlapping images representing the same<br />

object on the ground. These are called homologue points. In Slide 13.16 is suggested that a<br />

rectangular pattern of six points has been observed in image 1 <strong>and</strong> the same six points have been<br />

observed in image 2. What now needs to happen mathematically is that two bundles of rays<br />

are created from the image coordinates <strong>and</strong> the knowledge of the perspective centers O 1 <strong>and</strong> O 2<br />

in the camera system. And then the two bundles of rays need to be arranged such, that the<br />

corresponding rays (homologue rays) intersect in the three-dimensional space of the object world.<br />

We call the reconstruction of a bundle of rays from image coordinates the inner orientation. We<br />

call the process by which we arrange the two images such that all corresponding rays intersect in<br />

object space, the relative orientation. And we call the process by which we take the final geometric<br />

arrangement <strong>and</strong> we make it fit into the world coordinate system by a three-dimensional con<strong>for</strong>mal<br />

trans<strong>for</strong>mation the absolute trans<strong>for</strong>mation.<br />
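For the simplified normal case sketched above (parallel viewing directions, stereo base b, focal length f), the x-parallax p of a point relates its distance Z to the camera by Z = f·b/p. A minimal numeric sketch, with invented numbers and a function name of our own:

def distance_from_parallax(b, f, p):
    """Normal-case stereo: object distance Z from stereo base b,
    focal length f and measured x-parallax p (consistent units)."""
    return f * b / p

# example with invented values: b = 600 m, f = 0.15 m, p = 0.09 m  ->  Z = 1000 m
Z = distance_from_parallax(b=600.0, f=0.15, p=0.09)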

Prüfungsfragen:

• How are the two images of the same scene acquired in stereo imaging? Describe typical applications of both methods!

13.4 Stereo-Visualization

The counterpart to using natural images for viewing the natural environment stereoscopically is stereo-visualization: we create artificial images, present them to the eyes, and obtain a three-dimensional impression of an artificial world. We visit Slide 13.19 to explain that we need to create two images, one for the left and one for the right eye, of a geometric scene, represented in the slide by a cube and its point P. Slide 13.20 shows that we compute two images of each world point W, assuming that we have two cameras side by side, at a stereo base b and with their optical axes parallel. Recall that in computer graphics the optical axes are called view plane normals and the lens center is the view point VP. Slide 13.21 is the ground view of the geometric arrangement. We have used Slide 13.22 previously to illustrate the result obtained by creating two images of a three-dimensional scene, in this particular case a wire-frame representation for the left and the right eye. If we present those two images about six and a half centimeters apart on a flat table, look vertically down and imagine that we are looking at infinity (so that the eye axes are parallel), we will be able to merge the two images into a three-dimensional model of that object. However, we will notice that the image is not in focus, because our eyes tend to focus at infinity when we force the eye axes to be parallel.

Computer-generated stereo images are the essence of virtual and augmented environments. Slide 13.23 illustrates how a person looks at artificial images and receives a three-dimensional impression, using motion detectors that feed the head's position and orientation into the computer, so that as the head moves, a new image is projected to the eyes and the motion of the head is consistent with the experience of the natural environment. In contrast, Slide 13.24 illustrates augmented reality, where the monitors are semi-transparent: the human observer does not only see the artificial, virtual impression of computed images, but has superimposed on them the natural environment, which is visible binocularly. Augmented reality thus uses both binocular vision of the real scene and computed stereo vision.
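The parallel-camera arrangement of Slides 13.20 and 13.21 can be sketched as follows; the point, base and focal length are invented values, and the function name is ours:

def stereo_project(point, b, f):
    """Project a world point (x, y, z), z > 0 in front of the cameras, into
    left/right image coordinates for two parallel cameras a stereo base b
    apart; f is the distance to the image plane (focal length)."""
    x, y, z = point
    xl = f * (x + b / 2.0) / z      # left camera sits at (-b/2, 0, 0)
    xr = f * (x - b / 2.0) / z      # right camera sits at (+b/2, 0, 0)
    y_img = f * y / z               # no vertical parallax for parallel axes
    return (xl, y_img), (xr, y_img)

# the screen parallax xl - xr = f * b / z shrinks with increasing depth z
left, right = stereo_project((0.2, 0.1, 2.0), b=0.065, f=0.05)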

13.5 Non-Optical Stereo

Eyes are very forgiving, and the images we observe stereoscopically need not necessarily be taken by a camera and therefore need not be centrally perspective. Slide 13.26 explains how the NASA Space Shuttle has created radar images in sequential orbits. Those images overlap with one another and show the same terrain. Slide 13.27 illustrates a mountain range in Arizona imaged by radar. Note that the two images look more different than our previous optical images did: shadows are longer in one image than in the other. Yet a stereo impression can be obtained in the same way as we have obtained it with optical imagery. The quality of the stereo measurement will be lower, because of the added complexity that the two images are less similar in gray tones.

The basic idea of this type of stereo is repeated in Slide 13.28. We have two antennas illuminating the ground and receiving echoes as in a traditional radar image, and the overlap area can be presented to the eyes as if it consisted of two optical images. The basic idea is also explained in Slide 13.29: in each radar image, point P is projected into position P' or P'', and we get a parallactic distance d_p. The camera positions that would produce from a point P the same positions P' and P'' and the same parallax distance d_p are the camera positions 1 and 2 shown in Slide 13.29.

Prüfungsfragen:

• Give an example and a concrete application of a non-optical sensor in stereo imaging!

13.6 Interactive Stereo-Measurements

If we want to make measurements using the stereo impression from two images, we need to add something to our visual impression: a measuring mark. Slide 13.31 shows the two stereo images and our eyes viewing the same point M in the two images, where it appears as M_1 and M_2. If we add a measuring mark as shown in the slide, we will perceive the measuring mark (M) to float above or below the ground. If we now move the measuring mark in the two images such that it superimposes the points M_1 and M_2, the measuring mark will coincide with the object point M. We can then measure the elevation difference between two points by tracking the motion that we have to apply to the measuring mark in image space. Slide 13.32 explains the object point M, the measuring mark (M) and their positions in image space at M_1 and M_2.

Slide 13.33 is an attempt at illustrating the position of the measuring mark above the ground, on the ground and below the ground. In this particular case, the stereo perception is anaglyphic.

13.7 Automated Stereo-Measurements

See Algorithm ??. The measuring mark for stereo measurements needs to be placed on a pair of homologue points. Knowing the location of the stereo measuring mark permits us to measure the coordinates of the 3D point in the world coordinate system. A systematic description of the terrain shape, or more generally of the shape of 3D objects, requires many surface measurements, which would have to be made by hand. This can be automated if the location of homologue points can be found without manual interference. Slide 13.35 and Slide 13.36 explain.

Two images exist, forming a stereo pair, and a window is taken out of each image to indicate a homologue area. The task, as shown in Slide 13.37, is to automatically find the corresponding locations in such windows. For this purpose we define a master and a slave image. We take a window of the master image and move it over the slave image, and at each location we compute a value describing the similarity between the two image windows. At the maximum value of similarity we have found a point of correspondence. The result is a point 1' in image (') and a point 1'' in image (''). These two points define two perspective rays that run from the perspective centers through the image planes into the world coordinate system and intersect at a surface point 1. We need to verify that the surface point 1 makes sense: we will not accept the point if it is totally inconsistent with its neighborhood (we call this a gross error), and we will accept the point if it is consistent with its neighborhood.

Slide 13.38 explains the process of matching with the master and slave image windows. Note that the master window of size M × N is moved across a larger search area of size K × J, so we obtain many measures of similarity. Slide 13.39 defines the individual pixels within the sub-image and is the basis for one particular measure of similarity shown in Slide 13.40: the normalized correlation, defined by a value R_N^2(m, n) at location (m, n). The values in this formula are the gray values W in the master and S in the slave image. A double summation occurs because of the two-dimensional nature of the windows of size M × N. Slide 13.41 illustrates two additional image correlation measures. The normalized correlation produces a value R that assumes numbers between 0 and 1: full similarity is expressed by the value 1, total dissimilarity results in the value 0. A non-normalized correlation will not have a range between 0 and 1, but will assume much larger values. However, whether the correlation is normalized or not, one will likely find the same extrema and therefore the same match points.

A quite different measure of similarity is the sum of absolute differences of gray values. We essentially sum up the absolute differences in gray value between the master and slave windows at a particular location (m, n). The computation is much faster than the computation of a correlation, since we avoid squaring values; moreover, as soon as the running sum becomes larger than a previously found minimum, we can stop the double summation, since we have already found a smaller sum of absolute differences and therefore a more likely place of maximum similarity. Slide 13.42 explains how the many computations of correlation values result in a window of such correlation values, in which we need to find the extremum, the highest correlation, as marked by a star in Slide 13.42.

Problems occur if we have multiple extrema and we do not know which one to choose.
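The master-and-slave window matching described above can be sketched as follows. The normalized correlation follows the R_N^2(m, n) expression described above, and the sum-of-absolute-differences variant includes the early termination mentioned in the text. Function and variable names are ours; the windows are assumed to be float arrays, and numpy is assumed:

import numpy as np

def normalized_correlation(W, S_sub):
    """R_N^2 between the master window W and an equally sized slave sub-window."""
    c_ws = np.sum(W * S_sub) ** 2
    c_ww = np.sum(W ** 2)
    c_ss = np.sum(S_sub ** 2)
    return c_ws / (c_ww * c_ss)

def best_match_correlation(W, S):
    """Move the master window W over the slave image S and return the offset
    (m, n) with the highest normalized correlation."""
    M, N = W.shape
    best, best_pos = -1.0, (0, 0)
    for m in range(S.shape[0] - M + 1):
        for n in range(S.shape[1] - N + 1):
            r = normalized_correlation(W, S[m:m + M, n:n + N])
            if r > best:
                best, best_pos = r, (m, n)
    return best_pos, best

def best_match_sad(W, S):
    """Sum-of-absolute-differences variant with early termination: stop the
    double summation as soon as the running sum exceeds the best value so far."""
    M, N = W.shape
    best, best_pos = np.inf, (0, 0)
    for m in range(S.shape[0] - M + 1):
        for n in range(S.shape[1] - N + 1):
            s = 0.0
            for j in range(M):
                s += np.sum(np.abs(W[j] - S[m + j, n:n + N]))
                if s >= best:
                    break            # this offset cannot beat the best match
            else:
                best, best_pos = s, (m, n)
    return best_pos, best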

Slide 13.43 suggests that various techniques exist to accelerate the matching process. Slide 13.44 indicates how an image pyramid allows us to do a preliminary match with reduced versions of the two images and then limit the size of the search windows dramatically, thereby increasing the speed of finding successful matches. We call this a hierarchical matching approach. Another trick is shown in Slide 13.45, where an input image is converted into a gradient image or an image of interesting features. Instead of matching two gray value images, we match two edge images. A whole theory exists on how to optimize the search for edges in images in preparation for a stereo matching approach. Slide 13.46 explains that a high-pass filter that suppresses noise and computes edges is preferable. Such a filter is the so-called LoG filter, the Laplacian-of-Gaussian transformation of an image, which yields two lines for each edge since we are looking for zero crossings¹. That subject is an extension of the topic of filtering.

¹ in German: Nulldurchgänge

Prüfungsfragen:

• Using the normalized correlation R_N^2(m, n), determine the sub-image within the bold-framed region of Abbildung B.25 that best matches the mask M given there. State your computed values and mark the region you found in Abbildung B.25!

Antwort:

\[ R_N^2(m,n) = \frac{\left[\sum_{j=1}^{M}\sum_{k=1}^{N} W(j,k)\,S_{m,n}(j,k)\right]^2}{\sum_{j=1}^{M}\sum_{k=1}^{N}\left[W(j,k)\right]^2 \cdot \sum_{j=1}^{M}\sum_{k=1}^{N}\left[S_{m,n}(j,k)\right]^2} \]


With the abbreviations
\[ c_{WS} = \left[\sum_{j=1}^{M}\sum_{k=1}^{N} W(j,k)\,S_{m,n}(j,k)\right]^2, \quad c_{WW} = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[W(j,k)\right]^2, \quad c_{SS} = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[S_{m,n}(j,k)\right]^2, \]
so that R_N^2(m, n) = c_WS / (c_WW · c_SS), one obtains for the four candidate positions:

Position        c_WS   c_WW   c_SS   R_N^2(m, n)
top left          25      6      5       0.833
top right         25      6      6       0.694
bottom left       16      6      6       0.444
bottom right       4      6      6       0.111

The best match is at the top left.

• What is the basic principle of methods that reconstruct, from a stereo image pair, the surface of a body visible in both images?


[Slides 13.1 through 13.47]


Chapter 14

Classification

14.1 Introduction

Concepts of classification are used not only in image analysis and computer vision but also in many other fields where one has to make decisions.

First we define the problem and look at some examples. We then review a heuristic approach called the minimum distance classifier. Finally we go through Bayes' theorem as the basis of statistical classification and round out this chapter with a sketch of a simple implementation based on Bayes' theorem.

Classification is a topic based to a considerable extent on the field of statistics, dealing with probabilities, errors and estimation. We will stay away from deeper statistics here and only take a short look.

What is the definition of classification? We have object classes C_i, i = 1, ..., n, and we search for the class C_i to which a set of observations belongs. The first question is which observations to make; the second is the classification itself, namely the decision to which class the observations belong.

14.2 Object Properties

Let us review object features. Objects have color, texture, height, whatever one can imagine. If we classify the types of land use in Austria, as suggested in Slide 14.5, a set of terrain surface properties will be needed, perhaps from satellite images and public records. Slide 14.6 enumerates the 7 properties of electromagnetic radiation one can sense remotely, say by camera, thermal imaging, radiometry, radar and interferometric sensors. As a sensor collects image data about a scene from a distance, up to 7 characteristics are accessible.

The properties of the sensed signal may also be used to "invert" it into a physical parameter of the object. Examples are the object point's moisture or roughness, possibly its geometric shape. Slide ?? illustrates a camera image of a small segment of skin with a growth, called a lesion, that could be cancer. One can extract from the physically observed color image some geometric properties of the lesion such as length, width, roughness of the edge, etc.

Slide ?? is a fingerprint, Slide 14.9 a set of derived numbers describing the fingerprint. Each number is associated with a pixel, giving a feature vector per pixel, or with a larger object such as the lesion or the fingerprint. The feature vector x is the input to a classification.

Prüfungsfragen:

• Which physical characteristics of the radiation emitted or reflected by a body are suitable for determining its surface properties (e.g. for the purpose of classification)?

14.3 Features, Patterns, and a Feature Space

Algorithm 36 Feature space
1: FeatureSpace = CreateHyperCube(n-Dimensional) {create an n-dimensional hypercube}
2: for all Pixels in Image do
3:   FeatureSpace[Pixel[Plane-1], Pixel[Plane-2], ..., Pixel[Plane-n]] += 1 {increment the corresponding point in the feature space by 1}
4: end for {this algorithm creates a feature space represented by an n-dimensional hypercube}

If we have to do a color classification, then our features will be "color". In a color image we represent color via the red-green-blue (RGB) planes. Recall the eight-bit gray value image representing the R channel, next the G channel and last the B channel, representing red, green and blue.

Slide x suggests color classifications, but has 4 images or channels, for instance infrared (IR) in addition to RGB, or temperature, or whatever we can find as an object feature.

We now build up a feature space. In the case of RGB we would have three dimensions. Slide 14.12 presents just two dimensions for simplicity, for example R and G. If we add more features (B, IR, temperature, ...) we end up with hyperspaces which are hard to visualize.
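Algorithm 36 can be sketched with numpy as an n-dimensional histogram over the image planes; the array names and the use of numpy are our own choices:

import numpy as np

def feature_space(planes, bins=256):
    """Build the n-dimensional feature space of Algorithm 36 from n image
    planes (e.g. R, G, B), each a 2-D array of 8-bit values; the cell at
    (r, g, ...) counts how many pixels carry that feature combination."""
    samples = np.stack([p.ravel() for p in planes], axis=1)   # one row per pixel
    hist, _ = np.histogramdd(samples, bins=[bins] * len(planes),
                             range=[(0, bins)] * len(planes))
    return hist

# two features only, as in Slide 14.12: the R and G planes of a random test image
rng = np.random.default_rng(0)
r, g = rng.integers(0, 256, size=(2, 64, 64))
fs = feature_space([r, g])        # a 256 x 256 array of counts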

14.4 Principle of Decisions

Slide 14.14 illustrates what we would like to get from the classifier's decisions: each object, in this case each pixel, is to be assigned to a class, here denoted by O_1, O_2, O_3, ...

The simplest method of classification is the so-called minimum distance classifier. Slide 14.19 presents a 2-dimensional feature space. Each entry into this 2D space is a vector x = (x_1, x_2)^T or (g_1, g_2)^T, with the observations x_1, x_2 or g_1, g_2 representing, for example, the amount of red (R) or green (G) as an 8-bit digital number DN from an image.

These observations describe one pixel each, and we may find that the value for R is 50 and for G is 90. This determines a unique entry in the feature space. As we make observations of known objects we define a so-called learning phase, in which we find feature pairs defining a specific class. R = 50, G = 90 might be one type of object.

We now calculate the mean value of a distribution, which is nothing else than the expected value of a set of observations; the arithmetic mean is obtained by summing up all the values and dividing by their number. We connect the means of two classes by a straight line and define a line halfway between the means, perpendicular to the connecting line. This boundary between the two classes is called the discriminating function.

If we now make an observation of a new, unknown object (pixel), we simply determine the distances to the various class means. In Slide 14.16 the new object belongs to class O_3. This is the minimum distance classifier.
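A minimal sketch of this decision rule, using class means and Euclidean distance; the class names and the numbers are invented for illustration:

import numpy as np

def minimum_distance_classify(x, class_means):
    """Assign the feature vector x to the class whose mean is nearest."""
    distances = {label: np.linalg.norm(np.asarray(x, float) - np.asarray(m, float))
                 for label, m in class_means.items()}
    return min(distances, key=distances.get)

# the learning phase reduced to per-class means of (R, G) observations
means = {"water": (50, 90), "forest": (40, 120), "urban": (110, 100)}
label = minimum_distance_classify((48, 95), means)     # -> "water"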

What could be a problem with the minimum distance classifier? Suppose that in the learning phase one makes an error and for the class O_3 we record an "odd" observation. This will affect the expected value (the mean) for the entire data set. A further problem is that we have not considered the "uncertainty" of the observations in defining the various classes. This "uncertainty" would be represented by the "variance" of our observations: if the observations are clustered together closely, their variance is small; if they are spread out widely, their variance is larger. Variance is not considered in a minimum distance classifier. Figure x illustrates that each pixel gets classified and assigned to a class. There are no rejections, i.e. cases where the classifier is unable to make a decision and rejects a pixel/object/feature vector as belonging to none of the classes.

Algorithm 37 Classification without rejection
TYPE pattern =
  feature: ARRAY [1 .. NbOfFeatures] of Integer;
  classIdentifier: Integer;

Classify-by-MinimumDistance (input: pattern)
  {this method sets the classIdentifier of input to the class represented by the nearest sample pattern}
  for i := 1 to NbOfSamples do
    Distance := 0                       {initial value}
    {sum all differences between input and SamplePattern[i]:}
    for j := 1 to NbOfFeatures do
      Difference := input.feature[j] - SamplePattern[i].feature[j]
      Distance := Distance + |Difference|
    end for
    if i = 1 then
      minDistance := Distance           {initial value}
    end if
    {setting the class:}
    if Distance ≤ minDistance then
      minDistance := Distance
      input.classIdentifier := SamplePattern[i].classIdentifier
    end if
  end for

Classify-by-DiscriminationFunction (input: pattern)
  {this method sets the classIdentifier of input to the class with the maximum function result}
  for i := 1 to NbOfClasses do
    Sum := 0                            {initial value}
    {sum all function results of the input features, using the function set of class i:}
    for j := 1 to NbOfFeatures do
      functionResult := DiscriminationFunction[i](input.feature[j])   {DiscriminationFunction[i] represents the actual function set}
      Sum := Sum + functionResult
    end for
    if i = 1 then
      maxSum := Sum                     {initial value}
    end if
    {setting the class:}
    if Sum ≥ maxSum then
      maxSum := Sum
      input.classIdentifier := i
    end if
  end for

Prüfungsfragen:

• Given are training pixels with the gray values listed in Tabelle ??, and also a new pixel x_neu = (13, 7).

1. Draw a two-dimensional feature space and mark the positions of the training pixels in it.
2. Describe a simple computational procedure (algorithm) for deciding to which "object class" this new pixel most probably belongs.
3. Carry out the numerical computation of this decision and thereby justify numerically the assignment of the new pixel to one of the object classes represented by the training pixels.

14.5 Bayes Theorem

Algorithm 38 Classification with rejection
1: P_min := 0.6 {chosen threshold below which a pixel is rejected}
2: while there is a pixel to classify do
3:   pick and remove pixel from the to-do list
4:   P_max := −1 {reset for each pixel}
5:   x := f(pixel) {n-dimensional feature vector representing the information about the pixel}
6:   for all existing classes C_i do
7:     with x calculate the a posteriori probability P(C_i | x) for the pixel
8:     if P(C_i | x) > P_max then
9:       P_max := P(C_i | x)
10:      k := i {store the currently most probable class k for the pixel}
11:    end if
12:  end for
13:  if P_max > P_min then
14:    add pixel to the corresponding class k {classification}
15:  else
16:    leave pixel unclassified {rejection}
17:  end if
18: end while

Bayes' theorem looks complicated, but it is not. We define the probability that an observation x belongs to a class C_i. We call it an a posteriori probability because it is the probability of a result, after the classification. This resulting probability is computed from three other probabilities. The first is the result of the learning phase: the probability that, given a class C_i, we make the observation x. Second, we have the a priori knowledge of the expert, providing a probability that a class C_i occurs at all. The third probability is the so-called joint probability of observation x and class C_i.

By itself this formula does not yet give us a software implementation, but the expression in Slide 14.18 serves to explain the relationships. A sketch of a possible implementation follows. First we make a very common assumption, also shown in Slide 14.18, called the closed world assumption over all the classes: there is no unknown class, and an observation will belong to one of the n classes. In statistics this expresses itself as the sum of all a posteriori probabilities being 1. Take colors as an example: there is no pixel in the image for which we do not know a color. Bayes' theorem simplifies under this assumption, since the joint probability becomes a constant factor 1/a.

The problem with all classifiers is the need to model expert knowledge and then to train the system; the hard part is to find a correct computational model. One simple implementation is to calculate, in the learning phase, not only the means of our observations, as we did before, but also their variance or standard deviation. We need to learn our pixels, our colors, our color triplets; we need to assign certain triplets to certain colors, and this gives us our means and our variances as in Slide 14.23. Note that the slide shows 2, not 3, dimensions. The mean value and the variance define a Gaussian function representing the so-called distribution of the observations.

In Slide 14.23 the ellipse may, for instance, define the 1-sigma border: "sigma" or σ is the standard deviation, σ² is the variance. The probability is represented by a curve, or in 2D by a surface, called a Gaussian curve or surface. The probability that an observation of class O_3 falls within the 1σ border is roughly two thirds (about 68% in one dimension; within a two-dimensional 1σ ellipse the value is lower). If the border is drawn at 3σ (three times the standard deviation), the probability exceeds 99%.

By calculating the variance and the sigma border for each class C_i or O_i we produce n Gaussian functions. In Slide ?? we have two dimensions, red and green. We make an observation which we want to classify. We do not calculate the minimum distance, but check into which ellipse the vector of the new observation falls.

To summarize, we have performed two steps: we calculate the mean and variance of each class in the learning phase, and then we "intersect" the unknown observation with the result of the learning phase. A simple Bayes classifier requires no more than determining the Gaussian function discussed above.

The Gaussian function in a single dimension used for classification is
\[ d_j(x) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left[-\frac{(x - m_j)^2}{2\sigma_j^2}\right], \]
with x being the feature value, σ_j the standard deviation, m_j the mean, and j the index associated with a specific class. In more than one dimension, m and x are replaced by vectors and σ by a matrix.

This algorithm is summarized in Slide 14.23: m is the mean of each class and C is the variance. In a multi-dimensional context, C is a matrix of numbers, the so-called covariance matrix. It is computed using the coordinates of the mean m. The expression E{·} in Slide 14.23 denotes the expected value and can be estimated by
\[ C = \frac{1}{N} \sum_{k=1}^{N} x_k x_k^T - m\,m^T \]
or, equivalently, element by element,
\[ c_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{k,i} - m_i)(x_{k,j} - m_j), \qquad i, j = 1 \ldots M, \]
where M is the dimension of the feature space and N is the number of feature vectors (pixels) per class available for the learning phase.

As shown in Slide 14.23, each class of objects thus gets defined by an ellipse.
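The two steps just summarized, learning per-class means and covariances and then evaluating the Gaussian densities for a new observation, can be sketched as follows. Equal a priori probabilities, non-singular covariance matrices and the rejection threshold are our own simplifying assumptions:

import numpy as np

def train(class_samples):
    """Learning phase: estimate mean vector m and covariance matrix C per class;
    each value in class_samples is an (N, M) array of N feature vectors."""
    model = {}
    for label, X in class_samples.items():
        m = X.mean(axis=0)
        C = (X - m).T @ (X - m) / len(X)       # C = 1/N * sum (x_k - m)(x_k - m)^T
        model[label] = (m, C)
    return model

def gaussian_density(x, m, C):
    """Multivariate Gaussian density for mean m and (non-singular) covariance C."""
    M = len(m)
    d = x - m
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** M * np.linalg.det(C))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(C) @ d)

def classify(x, model, p_min=0.0):
    """Assign x to the class with the highest density; reject below p_min."""
    scores = {label: gaussian_density(np.asarray(x, float), m, C)
              for label, (m, C) in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > p_min else None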

Prüfungsfragen:

• In image classification one often tries to approximate the unknown probability density function of N known feature vectors in m-dimensional space by a Gaussian normal distribution. For this, the m × m covariance matrix C of the N vectors is needed. Abbildung B.28 shows three feature vectors p_1, p_2 and p_3 in two dimensions (i.e. N = 3 and m = 2). Compute the corresponding covariance matrix C!

Antwort: First compute the mean vector m:
\[ m = \frac{1}{3}\left[\begin{pmatrix}1\\-1\end{pmatrix} + \begin{pmatrix}3\\3\end{pmatrix} + \begin{pmatrix}2\\4\end{pmatrix}\right] = \begin{pmatrix}2\\2\end{pmatrix}, \]
then the products (p_i − m)(p_i − m)^T:
\[ (p_1 - m)(p_1 - m)^T = \begin{pmatrix}-1\\-3\end{pmatrix}\begin{pmatrix}-1 & -3\end{pmatrix} = \begin{pmatrix}1 & 3\\ 3 & 9\end{pmatrix}, \]
\[ (p_2 - m)(p_2 - m)^T = \begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1 & 1\end{pmatrix} = \begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, \]
\[ (p_3 - m)(p_3 - m)^T = \begin{pmatrix}0\\2\end{pmatrix}\begin{pmatrix}0 & 2\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 4\end{pmatrix}. \]
The covariance matrix is
\[ C = \frac{1}{3}\sum_{i=1}^{3}(p_i - m)(p_i - m)^T = \frac{1}{3}\begin{pmatrix}2 & 4\\ 4 & 14\end{pmatrix}. \]

• Let p(x), x ∈ R², be the probability density function of the Gaussian normal distribution whose parameters were estimated from the three feature vectors p_1, p_2 and p_3 of exercise B.2. Further, two points x_1 = (0, 3)^T and x_2 = (3, 6)^T are given in the feature space. Which of the following two statements is correct (justify your answer):

1. p(x_1) < p(x_2)
2. p(x_1) > p(x_2)

Hint: draw the two points x_1 and x_2 into Abbildung B.28 and consider in which direction the eigenvectors of the covariance matrix C from exercise B.2 point.

Antwort: p(x_1) < p(x_2), since x_2 lies in the direction of the eigenvector of C with the largest eigenvalue (measured from the class center m), and therefore the probability density at x_2 is larger than at x_1.

14.6 Supervised Classification

The approach where training/learning data exist is called supervised classification. Unsupervised classification is a method where pixels (or objects) get entered into the feature space without knowing what they are. In that case a search is started to detect clusters in the data. The search comes up with aggregations of pixels/objects and simply defines that each aggregate is a class.
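The cluster search just mentioned can be sketched, for example, with a small k-means procedure; the text does not prescribe a particular clustering algorithm, so this is only one possible choice:

import numpy as np

def kmeans(X, k, iterations=20, seed=0):
    """Tiny k-means sketch: X is an (N, M) array of feature vectors, k the
    number of clusters to look for; returns per-vector labels and the centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers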

In contrast to this approach, common classification starts out from known training pixels or objects. A real-life case is shown in Slide 14.22. A clustering algorithm may find 3 clusters here. In fact, Slide ?? is the actual segmentation of these training pixels into 6 object classes (compare with Slide 14.23).

The computation in the learning or training phase, which leads to Slide ??, is the basis for receiving new pixels. If a new pixel falls within the agreed-upon range of a class, it is assigned to that class. Otherwise it is not assigned to any class: it gets rejected.

14.7 Real Life Example

Slide 14.26 to Slide 14.31 illustrate a classification of the territory of Austria on behalf of a cell-phone project, where the surface cover was needed for wave propagation and signal strength assessment. The classification was unsupervised, thus without training pixels, simply looking for groups of similar pixels (clusters). A rather "noisy" result is obtained in Slide 14.28. Slide 14.29 presents the forest pixels, where many adjacent pixels get assigned to different classes. This is the result of not considering "neighborhoods". One can fix this by means of a filter that aggregates adjacent pixels into one class if this does not totally contradict the feature space. The surface cover and land use result for the city of Vienna is shown in Slide 14.31.

14.8 Outlook

In the specialization class on "Image Processing and Pattern Recognition" we will discuss more details of this important and central topic, including:

• Multi-variable probabilities
• Neural network classification
• Dependencies between features
• Non-statistical classification (shape, chain codes)
• The transition to Artificial Intelligence (AI)


[Slides 14.1 through 14.32]


Chapter 15

Resampling

We have previously discussed the idea of resampling under the heading of Transformations (Chapter 9), where it was a side topic, essentially an application. We will focus on the topic here, using many of the illustrations from previous chapters.

Prüfungsfragen:

• What is meant by (geometric) "resampling", and which options exist for computing the intensities of the pixels in the output image? Describe the different methods with a sketch and, where appropriate, a formula!

15.1 The Problem in Examples of Resampling

Slide 15.3 recalls an input image that is distorted and illustrates, in connection with Slide 15.4, the rectification of the image, a geometric transformation from the input geometry to an output geometry. The basic idea is illustrated in Slide 15.5. On the left we have the input image geometry, representing a distorted image. On the right we have the output geometry, representing a corrected or rectified image. The suggestion is that we take a grid mesh of lines to cut up the input image and stretch each quadrilateral of the input image to fit into a perfect square of the output image. This casual illustration presents reasonably fairly what happens in geometric transformation and resampling in digital image processing.

Resampling is also applicable in a context where we have individual images taken at different times from different vantage points and we need to merge them into a continuous large image. We call this process mosaicking. The images might overlap, and the overlap is used to achieve a match between the images by finding homologue points. Those are the basis for a geometric transformation and resampling process to achieve the mosaic.

Finally, resampling is also an issue in computer graphics when dealing with texture. We may have an input image showing a particular pattern, and as we geometrically transform or change the scale of that pattern, we have to resample the texture. The illustration shows so-called MIP-maps, a pyramid of pre-filtered versions of a detail-rich texture image at successively reduced resolutions.

15.2 A Two-Step Process

Geometric transformation and resampling are typically performed in a two-step process. The first step is the establishment of a geometric relationship between the input and the output images, essentially a coordinate processing issue. We typically have a regular pattern of pixels in the input image, and conceptually we need to find the geometric location in the output image representing the center of each pixel of the input image. Vice versa, we may have a regular image matrix on the output side (the ground), and for each center of an output pixel we need to find the location in the input image from where to pick a gray value. Slide 15.9 explains; Slide 15.10 and the following slide augment that explanation. We have an input image that is geometrically distorted. The object might be a stick figure, as suggested in Slide 15.10. The output or target image is a transformed stick figure. We have regular pixels in the target or output image that need to be assigned gray values as a function of the input image. Slide 15.12 explains the idea of the two-step process: step 1 is a manipulation of coordinates, mapping the input (x, y) into output (x̂, ŷ) coordinates; step 2 is a search for a gray value for each output pixel, starting from the output location of that pixel and looking in the input image for the gray value.

15.2.1 Manipulation of Coordinates

We have correspondence points between the image space and the target or output space. These correspondence points serve to establish a geometric transformation that converts the input (x, y) coordinates of an arbitrary image location into output (i, j) coordinates in the target space. This transformation has unknown transformation parameters which have to be computed in a separate process called spatial transformation. We will discuss in a moment how this is done efficiently.

15.2.2 Gray Value Processing

Once this spatial transformation is known, we go through the output image, and for each pixel center (i, j) we find the corresponding input coordinate location (x, y); we grab the gray value there and place it at the pixel location of the output or target image.

Algorithm 39 Calculation with a node file
1: while there is another quadrangle quadin in the input node file do
2:   if there is a corresponding quadrangle quadout in the output node file then
3:     read the four mesh points of the quadrangle quadin
4:     read the four mesh points of the quadrangle quadout
5:     calculate the (eight) parameters params of the (bilinear) transformation
6:     save the parameters params
7:   else
8:     error {no corresponding quadrangle quadout for quadrangle quadin}
9:   end if
10: end while
11: for all pixels pout of the output image do
12:   get the quadrangle quadout in which pixel pout lies
13:   get the parameters params corresponding to the quadrangle quadout
14:   calculate the input image position pin of pout with the parameters params
15:   calculate the gray value grey of pixel pout according to the position of pin
16:   assign the gray value grey to pout
17: end for


15.3 Geometric Processing Step

See Algorithm 39. We go back to the idea that we cut up the input image into irregular meshes, where each corner of the mesh pattern corresponds to a corner of a regular mesh pattern in the output image. We call these mesh points nodes, and we obtain a node file for the input image that corresponds to a node file for the output image. Slide 15.15 suggests that the geometric transformation relating the irregular meshes of the input image to the rectangular meshes of the output image could be a polynomial transformation as previously discussed. More commonly, we use a simple transformation that takes the four input points into the four output points, as suggested in Slide 15.16: a bilinear transformation with 8 coefficients. The relationships between the mesh points of the input and output image are obtained as a function of control points¹.

¹ in German: Pass-Punkte

Slide 15.16 and Slide 15.17 show control points at the locations marked by little stars. It is those stars that define the parameters of a complex transformation function. The transformation function is applied to the individual mesh points in the input and output images: for each mesh location in the output image, we compute the corresponding input mesh point. Slide 15.18 summarizes the result of these transactions. Recall that we were given control points, which we use to compute the transformation function. With the transformation function we establish the image coordinates that belong to mesh points in the input image representing regularly spaced mesh points in the output image. With this process we have established the geometric relationship between input and output image, using the ideas of transformations and resulting in a node file for the input and the output image.
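A sketch of the bilinear mesh transformation with its 8 coefficients, set up from four corresponding mesh corners; the function names are ours, and an exactly determined linear solve stands in for whatever solver the original software used:

import numpy as np

def bilinear_coefficients(mesh_out, mesh_in):
    """Coefficients (a0..a3, b0..b3) of the bilinear mapping
        x = a0 + a1*i + a2*j + a3*i*j,   y = b0 + b1*i + b2*j + b3*i*j
    from four corresponding corners: mesh_out holds the four (i, j) output
    mesh points, mesh_in the four matching (x, y) input image points."""
    A = np.array([[1.0, i, j, i * j] for i, j in mesh_out])
    xs = np.array([x for x, _ in mesh_in], dtype=float)
    ys = np.array([y for _, y in mesh_in], dtype=float)
    return np.linalg.solve(A, xs), np.linalg.solve(A, ys)   # 4 points, 4 unknowns each

def map_to_input(a, b, i, j):
    """Map the center (i, j) of an output pixel back into the input image."""
    basis = np.array([1.0, i, j, i * j])
    return basis @ a, basis @ b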

Algorithm 40 Nearest neighbor
1: read the float coordinates x_0 and y_0 of the input point
2: x_1 := round(x_0) {result is an integer}
3: y_1 := round(y_0) {result is an integer}
4: return the gray value at the point (x_1, y_1)

15.4 Radiometric Computation Step

After the geometric relationships have been resolved, we go to an arbitrary output pixel and, using its position within a square mesh, compute the corresponding location in the input image via the bilinear relationship within the mesh, as suggested in Slide 15.20. That location will be an arbitrary point (x, y) that is not at the center of any pixel.

We can now select among various techniques to find a gray value for that location to be put into the output pixel. Slide 15.20 suggests 3 different techniques. If we take the gray value of the pixel onto which the location (x, y) falls, we call this the nearest neighbor (see Algorithm 40). If we take the four pixels that are nearest to the location (x, y), we can compute a bilinear interpolation (see Algorithm ??). If we use the 16 closest pixels (a 4 × 4 neighborhood), we can use a bicubic interpolation. We thus differentiate between nearest neighbor, bilinear and bicubic resampling, according to the technique used for the gray value assignment. Slide 15.21 specifically illustrates the bilinear interpolation: which gray value do we assign to the output pixel shown in Slide 15.21? We take the 4 gray values nearest to the location (x, y), call them g_1, g_2, g_3, g_4, and by a simple interpolation, using auxiliary values a and b, we obtain a gray value bilinearly interpolated from the four gray values g_1, g_2, g_3, g_4.
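A minimal sketch of the bilinear gray value assignment with the auxiliary offsets a and b mentioned above; the test image and the sample location are invented:

import numpy as np

def bilinear_gray_value(image, x, y):
    """Interpolate a gray value at the non-integer location (x, y) from the
    four surrounding pixels g1..g4 of a 2-D image (row index y, column x)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    a, b = x - x0, y - y0                              # fractional offsets in [0, 1)
    g1, g2 = image[y0, x0],     image[y0, x0 + 1]      # upper pixel pair
    g3, g4 = image[y0 + 1, x0], image[y0 + 1, x0 + 1]  # lower pixel pair
    top    = (1 - a) * g1 + a * g2
    bottom = (1 - a) * g3 + a * g4
    return (1 - b) * top + b * bottom

img = np.arange(25, dtype=float).reshape(5, 5)         # gray value = 5*row + column
print(bilinear_gray_value(img, 1.5, 2.25))             # 12.75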

Prüfungsfragen:

• Given is an input image with the gray values shown in Abbildung B.8. The input image has 5 rows and 7 columns. After a geometric transformation of the image, certain pixels in the output image must be assigned a gray value, where the corresponding point in the input image has the row and column coordinates given in Tabelle B.1. Compute (or determine graphically) the gray value of each of these output pixels if a bilinear gray value assignment is used.

15.5 Special Case: Rotating an Image by Pixel Shifts

Slide 15.23 shows an aerial oblique image of an urban scene. We want to rotate that image by 45°. We can achieve this by simply shifting rows and columns of pixels (see Algorithm ??). In a first step, we shift each column of the image downward, with the shift growing as we go from right to left. In a second step, we take the rows of the resulting image and shift them horizontally. As a result, we obtain a rotated version of the original image.
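The two shift passes can be sketched as below. This only approximates a true rotation (a distortion-free version uses three shear passes or a full resampling step), and the shift factors, borrowed from the shear decomposition of a rotation, are our own choice:

import numpy as np

def shift_rotate(image, angle_deg=45.0):
    """Rotate-by-shifting sketch: first every column is shifted down, then every
    row is shifted sideways, each by an amount growing linearly with its index."""
    t = np.tan(np.radians(angle_deg) / 2.0)
    s = np.sin(np.radians(angle_deg))
    h, w = image.shape
    pass1 = np.zeros((h + w, w), dtype=image.dtype)
    for x in range(w):                              # pass 1: shift columns
        dy = int(round(x * t))
        pass1[dy:dy + h, x] = image[:, x]
    h2 = pass1.shape[0]
    pass2 = np.zeros((h2, w + h2), dtype=image.dtype)
    for y in range(h2):                             # pass 2: shift rows
        dx = int(round(y * s))
        pass2[y, dx:dx + w] = pass1[y, :]
    return pass2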


[Slides 15.1 through 15.27]


Chapter 16

About Simulation in Virtual and Augmented Reality

16.1 Various Realisms

Recall that we have earlier defined various types of reality. We talked about virtual reality, which presents objects to the viewer that are modeled in a computer. Different from that is photographic reality, which we experience through an actual photograph of the natural environment. It differs in turn from the experience we have when we go into the real world and experience physical reality. You may recall that we also talked about emotions and therefore about psychological reality, different from the physical one. Simulation is an attempt at creating a virtual environment that provides essential aspects of the physical or psychological reality to a human being without the presence of the full physical reality.

16.2 Why simulation?

To save money when training pilots, bus drivers, ship captains, soldiers, etc. Simulation servers may also be used for disaster-preparedness training. Simulation is big business.

How realistic does a simulation have to be? Sufficiently realistic to serve the training purpose. Therefore we do not need photorealism in simulation under all circumstances; we just need enough visual support to challenge the human in a training situation.

16.3 Geometry, Texture, Illumination

Simulation needs information about the geometry of a situation, the illumination and the surface properties. These are the three factors illustrated in Slide 16.8, Slide 16.9 and Slide 16.10. The geometry alone will not suffice if we need to recognize a particular scene: we will have difficulties with depth cues as a function of size. We have a much reduced quality of data if we ignore texture. Texture provides a greatly enhanced sense of realism and helps us to better estimate depth. In a disaster-preparedness scenario, the knowledge of windows and doors may be crucial, and it may only be available through texture, not through geometry.

Illumination is the third factor; it creates shadows and light, again helping to better understand the context of a scene, to estimate distances and to judge intervisibility.


16.4 Augmented Reality

We combine the real world and a computer-generated representation of a modeled world that need not exist in reality. A challenge is the calibration of such a system. We see the real world, and what is superimposed on it is shown on the two monitors; this must match the real environment geometrically and in scale. Therefore we need to define a world coordinate system and communicate it to the computer.

We also need sufficient speed, so that if we turn our head, the two stereo images computed for visual consumption are recomputed instantly as a function of the changed viewing angle. We also need to be accurate in assessing any rotation or change of position.

Magnetic positioning is often too slow and too inaccurate to serve the purpose well. For that reason, an optical auxiliary system may be included in an augmented reality environment, so that the world is observed through a camera and any change in attitude or position of the viewer is tracked more accurately than magnetic positioning could achieve. However, a camera-based optical tracking system may be too slow to act in real time at a rate of about thirty position computations per second. Therefore the magnetic positioning may provide an approximate solution that is then refined by the optical tracking.

Slide 16.13 illustrates an application with a game played by two people seeing the same chess board. An outside observer looking at the two players will see nothing; it is the two chess players who see one another and the game board.

Prüfungsfragen:

• Describe the difference between "Virtual Reality" and "Augmented Reality". Which hardware is required in each case?

16.5 Virtual Environments

If we exclude the real world from being experienced, then we talk about a virtual environment or, more customarily, virtual reality. We immerse ourselves in the world of data. However, we still have our own position and viewing direction. As we move or turn our head, we would like the virtual environment to respond so that we look at a new situation. Therefore, much as in augmented reality, we need to recompute the stereo impression of the data world very rapidly. However, virtual reality is simpler than augmented reality, because we do not have the accuracy requirement of superimposing the virtual onto the real, as we have in augmented reality.

In a virtual reality environment we would like to interact with the computer using our hands, and as a result we need data garments that allow us to provide inputs to the computer, for example by motions of our hands and fingers.

Prüfungsfragen:

• Explain the working principle of two tracking methods frequently used in augmented reality and discuss their advantages and disadvantages!

Antwort:

Tracking    Advantages       Disadvantages
magnetic    robust, fast     short range, inaccurate
optical     accurate         demands on the environment, elaborate


[Slides 16.1 through 16.16]


Chapter 17

Motion

17.1 Image Sequence Analysis

A fixed sensor may observe a moving object, as suggested in Slide 17.3, where a series of images is taken of moving ice in the Arctic Ocean. There is not only a motion of the ice, there is also a change of the ice over time. Slide 17.4 presents a product obtained from an image sequence analysis: a vector diagram of ice floes in the Arctic Ocean. The source of the results was a satellite radar system of NASA, called Seasat, that flew in 1978. Such data are now also available from recent systems such as Canada's Radarsat, currently orbiting the globe.

17.2 Motion Blur

Slide 17.6 illustrates a blurred image that is the result of an exposure taken while the object moved. If the motion is known, then its effect can be removed and we can restore an image as if no motion had happened.

The inverse occurs in Slide 17.7, where the object was stable but the camera moved during the exposure. The same applies: if we can model the motion of the camera, we obtain a successful reconstruction of the object by removing the motion blur of the camera. Slide 17.7 suggests that simple filtering will not remove that blur; we need to model the effect of the motion. The process itself is nevertheless called an anti-blur filter.

Prüfungsfragen:

• What is meant by "motion blur", and under which precondition can this effect be removed from an image again?

Antwort: The image is smeared by motion of the imaged object relative to the camera during the finite exposure time of the shutter. Removing this effect requires that this motion is known exactly.
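If the motion is known and expressed as a blur kernel, the removal can be sketched as a regularized inverse filter in the frequency domain (a simple stand-in for a full Wiener filter); the kernel and the constant eps are illustrative assumptions:

import numpy as np

def remove_motion_blur(blurred, kernel, eps=1e-3):
    """Deblur an image given the known motion blur kernel: divide in the
    frequency domain, damping frequencies where the kernel response is near zero."""
    H = np.fft.fft2(kernel, s=blurred.shape)        # kernel spectrum, zero-padded
    B = np.fft.fft2(blurred)
    F = B * np.conj(H) / (np.abs(H) ** 2 + eps)     # regularized inverse
    return np.real(np.fft.ifft2(F))

# horizontal motion over 9 pixels as the (assumed known) blur kernel
kernel = np.full((1, 9), 1.0 / 9.0)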

17.3 Detecting Change

Change may occur because of motion. Slide 17.9 explains the situation in which a group of people is imaged while a person is moving out of the field of view of the camera. An algorithm can be constructed that detects the change between each image and its predecessors and in the process allows one to map just the changes. The inverse idea is to find what is constant and to eliminate the changes from a sequence of images. An example is computing the texture of a building facade that is partially covered by trees.
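Both ideas, mapping only the changes and recovering what stays constant, can be sketched per pixel; the threshold and the median over an image stack are illustrative choices, not prescribed by the text:

import numpy as np

def change_mask(img_a, img_b, threshold=20):
    """Mark pixels whose gray value changed by more than `threshold`
    between two registered images of the same scene."""
    return np.abs(img_a.astype(int) - img_b.astype(int)) > threshold

def static_background(stack):
    """Inverse idea: suppress moving objects by taking the per-pixel median
    over a stack of registered images, e.g. to recover a facade behind trees."""
    return np.median(np.asarray(stack, dtype=float), axis=0)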

17.4 Optical Flow

A rapid sequence of images may be obtained of a changing situation; an example is the observation of traffic. Optical flow is the analysis of such a sequence of images and the assessment of the motion that is evident from the image stream. A typical representation of optical flow is by vectors representing the moving objects. Slide 17.12 explains.


[Slides 17.1 through 17.17]


Chapter 18<br />

Man-Machine-Interfacing<br />

Our University offers a separate class on Man-Machine Interaction or Human-<strong>Computer</strong>-Interfaces<br />

(HCI) as part of the multi-media program <strong>and</strong> as part of the computer graphics program. This<br />

topic relates to elements of <strong>Computer</strong>-<strong>Graphics</strong> <strong>and</strong> Image Analysis since visual in<strong>for</strong>mation in<br />

created <strong>and</strong> manipulated.<br />

18.1 Visualization of Abstract In<strong>for</strong>mation<br />

Use of color <strong>and</strong> shape are a widely applicable tool in converging in<strong>for</strong>mation. We have seen<br />

examples in the Chapter on Color, encoding terrain elevation or temperature in color, or marking<br />

contours of objects in color.<br />

A very central element in the man-machine interaction is the use of the human visual sense to<br />

present non-visual in<strong>for</strong>mation <strong>for</strong> communication <strong>and</strong> interaction. An example is shown in Slide<br />

18.3 where a diagram is presented that has on one axis the calendar time <strong>and</strong> on the other axis<br />

a measure of popularity of movies. The interface serves to find movies on a computer monitor<br />

by popularity <strong>and</strong> by age. Simultaneously, we can switch between various types of movies, like<br />

drama, mystery, comedy <strong>and</strong> so <strong>for</strong>th.<br />

Slide 18.4 is a so-called table-lens. This is a particular type of Excel sheet which shows the entire<br />

complexity of the sheet in the background <strong>and</strong> provides a magnifying class that can be moved over<br />

the spread sheet.<br />

Another idea is shown in Slide 18.5 with the so-called cone-tree, representing a file structure. It<br />

is a tree, which at its root has an entire directory, this is broken up into folders or subdirectories<br />

which are then further broken up until each leaf is reached representing an individual file. A<br />

similar idea is shown in Slide 18.6 called in<strong>for</strong>mation slices. We have a very large inventory of<br />

files, organized in subdirectories <strong>and</strong> directories. We can take subgroups of these subdirectories<br />

<strong>and</strong> magnify them, until we can recognize each individual file.<br />

18.2 Immersive Man-Machine Interactions<br />

The subject of man-machine interaction also is involved in an immersion of the human in the world<br />

of data, as we previously discussed in virtual reality which is sometimes denoted as immersive<br />

visualization. Of particular interest is the input to the computer by means other than a keyboard<br />

<strong>and</strong> mouse. This of course is increasingly by speech, but also by motions of the h<strong>and</strong>s <strong>and</strong><br />

fingers, or by the recognition of facial expressions. This represents a hot subject in man-machine<br />

interaction <strong>and</strong> ties in with computer graphics <strong>and</strong> image analysis.<br />

269


270 CHAPTER 18. MAN-MACHINE-INTERFACING<br />

Slide 18.1 Slide 18.2 Slide 18.3 Slide 18.4<br />

Slide 18.5 Slide 18.6 Slide 18.7 Slide 18.8<br />

Slide 18.9 Slide 18.10


Chapter 19<br />

Pipelines<br />

19.1 The Concept of an Image Analysis System<br />

Various ideas exist in the literature about a system <strong>for</strong> image analysis. The idea of a pipeline<br />

comes about if we consider that we have many components <strong>and</strong> algorithms in a repository of an<br />

image possessing library. In order to set up an entire image analysis process, we plug the individual<br />

processing steps together, much like a plumber will put a plumbing system in a building together<br />

from st<strong>and</strong>ard components. In computer graphics <strong>and</strong> image processing we call this plumbing also<br />

creation of a pipeline.<br />

As shown in Slide 19.3 an image analysis system always begins with image acquision <strong>and</strong> sensing.<br />

We build up a system by going through preprocessing <strong>and</strong> segmentation to representation,<br />

recognition <strong>and</strong> final use of the results of the image analysis system. All of this is built around<br />

knowledge.<br />

A somewhat different view combines the role of image analysis with the role of computer graphics<br />

<strong>and</strong> separates the role into half worlds, one of reality <strong>and</strong> one of computer models. In the simplest<br />

case, we have the world, within it a scene from which we obtain an image which goes into the<br />

computer. The image will be replaced by an image description which then leads to a scene<br />

description, which ultimately ends up with a description of the world.<br />

We can close the loop from the description of the world, go back to the world, make the transition<br />

from computer to reality by computer graphics.<br />

The idea of active vision is going from the world to a description of the world, closing the loop<br />

from an incomplete description of the world to a new second loop through the selection of a scene,<br />

selection of images <strong>and</strong> so <strong>for</strong>th as shown in Slide 19.12. If as in analogy to the previous model, we<br />

assign a central control element with expert knowledge, we have a similar idea as shown be<strong>for</strong>e.<br />

Prüfungsfragen:<br />

• Skizzieren Sie die ”<br />

Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen<br />

Szene mittels z-buffering und Gouraud-shading!<br />

19.2 Systems of Image Generation<br />

Prüfungsfragen:<br />

• Was wird in der Bildanalyse mit dem Begriff ”<br />

Active Vision“ bezeichnet?<br />

271


272 CHAPTER 19. PIPELINES<br />

19.3 Revisiting Image Analysis versus <strong>Computer</strong> <strong>Graphics</strong><br />

Slide 19.18 suggests that the transition from an image to a model of a scene is the subject of<br />

image underst<strong>and</strong>ing or image processing. The inverse, the transition from a scene model to an<br />

image is the subject of computer graphics. We do have a great overlap between image analysis<br />

<strong>and</strong> computer graphics when it concerns the real world. Image analysis will always address the<br />

real world, whereas computer graphics may deal with a virtual world that does not exist in reality.<br />

In cases where one goes from a model of a non-existing world to an image, we are not dealing with<br />

the inverse of image analysis.<br />

Prüfungsfragen:<br />

• Welche ist die wesentliche Abgrenzung zwischen <strong>Computer</strong>grafik und Bildanalyse, welches<br />

ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung<br />

erwünscht.<br />

Algorithm 41 z-buffer pipeline<br />

1: <strong>for</strong> y = 0 to YMAX do<br />

2: <strong>for</strong> x = 0 to XMAX do<br />

3: WritePixel(x, y, backgroundcolor)<br />

4: Z[x, y] := 0<br />

5: end <strong>for</strong><br />

6: end <strong>for</strong><br />

7: <strong>for</strong> all Polygons polygon do<br />

8: <strong>for</strong> all pixel in the projection of the polygon do<br />

9: pz :=GetZValue(polygon,x,y)<br />

10: if pz ≥ Z[x, y] then<br />

11: Z[x, y] := pz {new point is in front}<br />

12: WritePixel(x, y, Color of polygon at (x,y) )<br />

13: end if<br />

14: end <strong>for</strong><br />

15: end <strong>for</strong><br />

Algorithm 42 Phong pipeline<br />

1: set value ai {ai is the ambient intensity.}<br />

2: set value il {il is the intensity of the light source.}<br />

3: diff:=diffuse() {calculates the amount of light which directly fall in.}<br />

4: reflect:=reflection() {calculates the amount of light which reflect.}<br />

5: result:= ai + il * (diff + reflect) {<strong>for</strong>mula developed by Phong.}


19.3. REVISITING IMAGE ANALYSIS VERSUS COMPUTER GRAPHICS 273<br />

Slide 19.1 Slide 19.2 Slide 19.3 Slide 19.4<br />

Slide 19.5 Slide 19.6 Slide 19.7 Slide 19.8<br />

Slide 19.9 Slide 19.10 Slide 19.11 Slide 19.12<br />

Slide 19.13 Slide 19.14 Slide 19.15 Slide 19.16<br />

Slide 19.17 Slide 19.18 Slide 19.19 Slide 19.20


274 CHAPTER 19. PIPELINES


Chapter 20<br />

Image Representation<br />

The main goal of this chapter is to briefly describe some of the most common graphic file <strong>for</strong>mats<br />

<strong>for</strong> image files, as well as how to determine which file <strong>for</strong>mat to use <strong>for</strong> certain applications.<br />

When an image is saved to a specific file <strong>for</strong>mat, one tells the application how to write the image’s<br />

in<strong>for</strong>mation to disk. The specific file <strong>for</strong>mat which is chosen depends on the graphics software<br />

application one is using (e.g., Illustrator, Freeh<strong>and</strong>, Photoshop) <strong>and</strong> how <strong>and</strong> where the image will<br />

be used (e.g., the Web or a print publication).<br />

There are three different categories of file <strong>for</strong>mats: bitmap, vector <strong>and</strong> metafiles. When an image<br />

is stored as a bitmap file, its in<strong>for</strong>mation is stored as a pattern of pixels, or tiny, colored or black<br />

<strong>and</strong> white dots. When an image is stored as a vector file, its in<strong>for</strong>mation is stored as mathematical<br />

data. The metafile <strong>for</strong>mat can store an image’s in<strong>for</strong>mation as pixels (i.e. bitmap), mathematical<br />

data (i.e., vector), or both.<br />

20.1 Definition of Terms<br />

20.1.1 Transparency<br />

Transparency is the degree of visibility of a pixel against a fixed background. A totally transparent<br />

pixel is invisible. Normal images are opaque, in the sense that no provision is made to allow the<br />

manipulation <strong>and</strong> display of multiple overlaid images. To allow image overlay, some mechanism<br />

must exist <strong>for</strong> the specification of transparency on a per-image, per-strip, per-tile, or per-pixel<br />

bases. In practice, transparency is usually controlled through the addition of in<strong>for</strong>mation to each<br />

element of the pixel data.<br />

The simplest way to allow image overlay is the addition of an overlay bit to each pixel value.<br />

Setting the overlay bit in an area of an image allows the rendering application or output device<br />

to selectively ignore those pixel values with the bit sample.<br />

Another simple way is to reserve one unique color as transparency color, e.g. the background color<br />

of a homogenous background. As all images are usually rectangular - regardless of the contours of<br />

whatever have been drawn within the image - this property of background transparency is useful<br />

<strong>for</strong> concealing image-backgrounds <strong>and</strong> making it appear that they are non rectangular. This<br />

feature is widely used e.g., <strong>for</strong> logos on Web pages.<br />

A more elaborate mechanism <strong>for</strong> specifying image overlays allows variations in transparency between<br />

bottom <strong>and</strong> overlaid images. Instead of having a single bit of overlay in<strong>for</strong>mation, each pixel<br />

value has more (usually eight bits). The eight transparency bits are sometimes called the alpha<br />

channel. The degree of pixel transparency <strong>for</strong> an 8-bit alpha channel ranges from 0 (the pixel is<br />

completely invisible or transparent) to 255 (the pixel is completely visible or opaque).<br />

275


276 CHAPTER 20. IMAGE REPRESENTATION<br />

20.1.2 Compression<br />

This is a new concept not previously discussed in this class, except in the context of encoding<br />

contours of objects. The amount of image data produced from all kinds of sensor, like digital<br />

cameras, remote sensing satellites medical imaging devices, video cameras, increases steadily with<br />

increasing number of sensors, resolution <strong>and</strong> color capabilities. Especially <strong>for</strong> transmission <strong>and</strong><br />

storage of this large amount of image data compression is a big issue.<br />

We separate data compression into two classes, lossless <strong>and</strong> lossy compression. Lossless compression<br />

preserves all in<strong>for</strong>mation present in the original data, the in<strong>for</strong>mation is only stored in an<br />

optimized way. Examples <strong>for</strong> lossless compression are run-length-encoding, where subsequent pixels<br />

of the same color are replaced by one color in<strong>for</strong>mation <strong>and</strong> the number of following identical<br />

pixels, Huffman coding uses codewords of different size instead of the usual strictly 8 or 24 bits,<br />

shorter codewords are assignd to symbols which occur more often, this usually reduces the total<br />

number of bits used to code an image. Compression rates between 2:1 <strong>and</strong> maximum 5:1 can be<br />

achieved using lossless compression.<br />

Lossy compression on the other h<strong>and</strong> removes invisible or only slightly visible in<strong>for</strong>mation from<br />

the image, e.g. only a reduced set of colors is used or high spatial frequencies in the image are<br />

removed. The amount of compression which can be achieved by lossy compression is superior to<br />

lossless compression schemes, at compression rates of 10:1 with no visible difference is feasible, the<br />

quality <strong>for</strong> photographs is usually sufficient after a 20:1 compression. However, the in<strong>for</strong>mation<br />

content is changed by such an operation, there<strong>for</strong>e lossy compressed images are not suitable <strong>for</strong><br />

further image processing stages. We will see exampels of JPEG compressed images further on in<br />

this lecture.<br />

Algorithms 43 <strong>and</strong> ?? illustrate the principles.<br />

Algorithm 43 Pipeline <strong>for</strong> lossless compression<br />

load image;<br />

// find redundancy <strong>and</strong> eliminate redundancy<br />

<strong>for</strong> i = 0 to number of image columns do<br />

<strong>for</strong> j = 0 to number of image rows do<br />

// find out how often each pixel value appears<br />

// (needed <strong>for</strong> the variable-length coding)<br />

<strong>for</strong> pixel value = 0 to 2 b do<br />

histogram[pixel value]++;<br />

end <strong>for</strong><br />

huffman (histogram, image);<br />

// instead of Huffman other procedures can be used that<br />

// produce variable-length code but Huffman leads to<br />

// best compression results<br />

end <strong>for</strong><br />

end <strong>for</strong><br />

save image;<br />

20.1.3 Progressive Coding<br />

Progressive image transmision is based on the fact that transmitting all image data may not be<br />

necessary under some circumstances. Imagine a situation in which an operator is searching an<br />

image database looking <strong>for</strong> a particular image. If the transmission is based on a raster scanning<br />

order, all the data must be transmitted to view the whole image, but often it is not necessary to<br />

have the highest possible image quality to find the image <strong>for</strong> which the operator is looking. Images


20.1. DEFINITION OF TERMS 277<br />

Algorithm 44 Pipeline <strong>for</strong> lossy compression<br />

load image;<br />

// find irrelevancy like high frequencies <strong>and</strong><br />

// eliminate them<br />

split image in nxn subimages;<br />

// a common value <strong>for</strong> n is 8 or 16<br />

trans<strong>for</strong>m in frequency domain;<br />

cut off high frequencies;<br />

// find redundancy <strong>and</strong> eliminate redundancy<br />

<strong>for</strong> i = 0 to number of image columns do<br />

<strong>for</strong> j = 0 to number of image rows do<br />

// find out how often each pixel value appears<br />

// (needed <strong>for</strong> the variable-length coding)<br />

<strong>for</strong> pixel value = 0 to 2 b do<br />

histogram[pixel value]++;<br />

end <strong>for</strong><br />

huffman (histogram, image);<br />

// instead of Huffman other procedures can be used that<br />

// produce variable-length code but Huffman leads to<br />

// best compression results<br />

end <strong>for</strong><br />

end <strong>for</strong><br />

save image;<br />

do not have to be displayed with the hightest available resolution, <strong>and</strong> lower resolution may be<br />

sufficient to reject an image <strong>and</strong> to begin displaying another one. This approach is also commonly<br />

used to decrease the waiting time needed <strong>for</strong> the image to start appearing after transmission <strong>and</strong><br />

is used by WWW image transmission.<br />

In progressive transmissions, the images are represented in a pyramid structure, the higher pyramid<br />

levels (lower resolution) being transmitted first. The number of pixels representing a lowerresolution<br />

image is substantially smaller <strong>and</strong> thus the user can decide from lower resolution images<br />

whether further image refinement is needed.<br />

20.1.4 Animation<br />

A sequence of two or more images displayed in a rapid sequence so as to provide the illusion of<br />

continuous motion. Animations are typically played back at a rate of 12 to 15 frames per second.<br />

20.1.5 Digital Watermarking<br />

A digital watermark is a digital signal or pattern inserted into a digital image. Since this signal<br />

or pattern is present in each unaltered copy of the original image, the digital watermark may<br />

also serve as a digital signature <strong>for</strong> the copies. A given watermark may be unique to each copy<br />

(e.g., to identify the intended recipient), or be common to multiple copies (e.g., to identify the<br />

document source). In either case, the watermarking of the document involves the trans<strong>for</strong>mation<br />

of the original into another <strong>for</strong>m.<br />

Unlike encryption, digital watermarking leaves the original image or (or file) basically intact <strong>and</strong><br />

recognizable. In addition, digital watermarks, as signatures, may not be validated without special<br />

software. Further, decrypted documents are free of any residual effects of encryption, whereas


278 CHAPTER 20. IMAGE REPRESENTATION<br />

digital watermarks are designed to be persistent in viewing, printing, or subsequent re-transmission<br />

or dissemination.<br />

Two types of digital watermarks may be distinguished, depending upon whether the watermark<br />

appears visible or invisible to the casual viewer. Visible watermarks Slide ?? are used in much<br />

the same way as their bond paper ancestors. One might view digitally watermarked documents<br />

<strong>and</strong> images as digitally ”stamped”.<br />

Invisible watermarks Slide ??, on the other h<strong>and</strong>, are potentially useful as a means of identifying<br />

the source, author, creator, owner, distributor or authorized consumer of a document or image.<br />

For this purpose, the objective is to permanently <strong>and</strong> unalterably mark the image so that the credit<br />

or assignment is beyond dispute. In the event of illicit usage, the watermark would facilitate the<br />

claim of ownership, or the receipt of copyright revenues.<br />

20.2 Common Image File Formats<br />

Following are descriptions of some commonly used file <strong>for</strong>mats:<br />

20.2.1 BMP: Microsoft Windows Bitmap<br />

The bitmap file <strong>for</strong>mat is used <strong>for</strong> bitmap graphics on the Windows plat<strong>for</strong>m only. Unlike other<br />

file <strong>for</strong>mats, which store image data from top to bottom <strong>and</strong> pixels in red/green/blue order, the<br />

BMP <strong>for</strong>mat stores image data from bottom to top <strong>and</strong> pixels in blue/green/red order. This<br />

means that if memory is tight, BMP graphics will sometimes appear drawn from bottom to top.<br />

Compression of BMP files is not supported, so they are usually very large.<br />

20.2.2 GIF: <strong>Graphics</strong> Interchange Format<br />

The <strong>Graphics</strong> Interchange Format was originally developed by CompuServe in 1987. It is one of<br />

the most popular file <strong>for</strong>mats <strong>for</strong> Web graphics <strong>for</strong> exchanging graphics files between computers.<br />

It is most commonly used <strong>for</strong> bitmap images composed of line drawings or blocks of a few distinct<br />

colors. The GIF <strong>for</strong>mat supports 8 bits of color in<strong>for</strong>mation or less. There<strong>for</strong>e it is not suitiable<br />

<strong>for</strong> photographs. In addition, the GIF89a file <strong>for</strong>mat supports transparency, allowing you to<br />

make a color in your image transparent. (Please note: CompuServe Gif(87) does not support<br />

transparency). This feature makes GIF a particularly popular <strong>for</strong>mat <strong>for</strong> Web images.<br />

When to use GIF Use the GIF file <strong>for</strong>mat <strong>for</strong> images with only a few distinct colors, such<br />

as illustrations, cartoons, <strong>and</strong> images with blocks of color, such as icons, buttons, <strong>and</strong> horizontal<br />

rules.<br />

GIF, like JPEG, is a “lossy” file <strong>for</strong>mat! It reduces an image’s file size by removing bits of<br />

color in<strong>for</strong>mation during the conversion process. The GIF <strong>for</strong>mat supports 256 colors or less.<br />

When creating images <strong>for</strong> the Web, be aware that only 216 colors are shared between Macintosh<br />

<strong>and</strong> Windows monitors. These colors, called the “Web palette,” should be used when creating<br />

GIFs <strong>for</strong> the Web bec<strong>aus</strong>e colors that are not in this palette display differently on Macintosh <strong>and</strong><br />

Windows monitors. The restriction to only 256 colors is the reason why GIF is not siutable <strong>for</strong><br />

color photographs.


20.2. COMMON IMAGE FILE FORMATS 279<br />

20.2.3 PICT: Picture File Format<br />

The Picture file <strong>for</strong>mat is <strong>for</strong> use primarily on the Macintosh plat<strong>for</strong>m; it is the default <strong>for</strong>mat<br />

<strong>for</strong> Macintosh image files. The PICT <strong>for</strong>mat is most commonly used <strong>for</strong> bitmap images, but can<br />

be used <strong>for</strong> vector images was well. Avoid using PICT images <strong>for</strong> print publishing. The PICT<br />

<strong>for</strong>mat is “lossless,” meaning it does not remove in<strong>for</strong>mation from the original image during the<br />

file <strong>for</strong>mat conversion process. Bec<strong>aus</strong>e the PICT <strong>for</strong>mat supports only limited compression on<br />

Macintoshes with QuickTime installed, PICT files are usually large. When saving an image as a<br />

PICT, add the extension “.pct” to the end of its file name. Use the PICT <strong>for</strong>mat <strong>for</strong> images used<br />

in video editing, animations, desktop computer presentations, <strong>and</strong> multimedia authoring.<br />

20.2.4 PNG: Portable Network <strong>Graphics</strong><br />

The Portable Network <strong>Graphics</strong> <strong>for</strong>mat was developed to be the successor to the GIF file <strong>for</strong>mat.<br />

PNG is not yet widely supported by most Web browsers; Netscape versions 4.04 <strong>and</strong> later <strong>and</strong><br />

Internet Explorer version 4.0b1 <strong>and</strong> later currently support this file <strong>for</strong>mat. However, PNG is<br />

expected to become a mainstream <strong>for</strong>mat <strong>for</strong> Web images <strong>and</strong> could replace GIF entirely. It is<br />

plat<strong>for</strong>m independent <strong>and</strong> should be used <strong>for</strong> single images only (not animations). Compared<br />

with GIF, PNG offers greater color support, better compression, gamma correction <strong>for</strong> brightness<br />

control across plat<strong>for</strong>ms, better support <strong>for</strong> transparency (alpha channel), <strong>and</strong> a better method<br />

<strong>for</strong> displaying progressive images.<br />

20.2.5 RAS: Sun Raster File<br />

The Sun Raster image file <strong>for</strong>mat is the native bitmap <strong>for</strong>mat of the SUN Microsystems UNIX<br />

plat<strong>for</strong>ms using the SunOS operating system. This <strong>for</strong>mat is capable of storing black-<strong>and</strong>-white,<br />

gray-scale, <strong>and</strong> color bitmapped data of any pixel depth. The use of color maps <strong>and</strong> a simple<br />

Run-Length data compression are supported. Typically, most images found on a SunOS system<br />

are Sun Raster images, <strong>and</strong> this <strong>for</strong>mat is supported by most UNIX imaging applications.<br />

20.2.6 EPS: Encapsulated PostScript<br />

The Encapsulated PostScript file <strong>for</strong>mat is a metafile <strong>for</strong>mat; it can be used <strong>for</strong> vector images or<br />

bitmap images. The EPS file <strong>for</strong>mat can be used on a variety of plat<strong>for</strong>ms, including Macintosh<br />

<strong>and</strong> Windows. When you place an EPS image into a document, you can scale it up or down<br />

without in<strong>for</strong>mation loss. This <strong>for</strong>mat contains PostScript in<strong>for</strong>mation <strong>and</strong> should be used when<br />

printing to a PostScript output device. The PostScript language , which was developed by Adobe,<br />

is the industry st<strong>and</strong>ard <strong>for</strong> desktop publishing software <strong>and</strong> hardware. EPS files can be graphics<br />

or images of whole pages that include text, font, graphic, <strong>and</strong> page layout in<strong>for</strong>mation.<br />

20.2.7 TIFF: Tag Interchange File Format<br />

The Tag Interchange File Format is a tag-based <strong>for</strong>mat that was developed <strong>and</strong> maintained by<br />

Aldus (now Adobe). TIFF, which is used <strong>for</strong> bitmap images, is compatible with a wide range of<br />

software applications <strong>and</strong> can be used across plat<strong>for</strong>ms such as Macintosh, Windows, <strong>and</strong> UNIX.<br />

The TIFF <strong>for</strong>mat is complex, so TIFF files are generally larger than GIF or JPEG files. TIFF<br />

supports lossless LZW (Lempel-Ziv-Welch) compression ; however, compressed TIFFs take longer<br />

to open. When saving a file to the TIFF <strong>for</strong>mat, add the file extension “.tif” to the end of its file<br />

name.


280 CHAPTER 20. IMAGE REPRESENTATION<br />

20.2.8 JPEG: Joint Photographic Expert Group<br />

Like GIF, the Joint Photographic Experts Group <strong>for</strong>mat is one of the most popular <strong>for</strong>mats <strong>for</strong><br />

Web graphcis. It supports 24 bits of color in<strong>for</strong>mation, <strong>and</strong> is most commonly used <strong>for</strong> photographs<br />

<strong>and</strong> similar continous-tone bitmap images. The JPEG file <strong>for</strong>mat stores all of the color in<strong>for</strong>mation<br />

in an RGB image, then reduces the file size by compressing it, or saving only the color in<strong>for</strong>mation<br />

that is essential to the image. Most imaging applications <strong>and</strong> plug-ins let you determine the<br />

amount of compression used when saving a graphic in JPEG <strong>for</strong>mat. Unlike GIF, JPEG does not<br />

support transparency.<br />

When to use JPEG? JPEG uses a “lossy” compression technique, which changes the original<br />

image by removing in<strong>for</strong>mation during the conversion process. In theory, JPEG was designed<br />

especially <strong>for</strong> photographs so that changes made to the orginal image during conversion to JPEG<br />

would not be visible to the human eye. Most imaging applications let you control the amount of<br />

lossy compression per<strong>for</strong>med on an image, so you can tade off image quality <strong>for</strong> smaller file size<br />

<strong>and</strong> vice versa. Be aware that the chances of degrading our image when converting it to JPEG<br />

increase proportionally with the amount of compression you use.<br />

JPEG is superior to GIF <strong>for</strong> storing full-color or grayscale images of “realistic” scenes, or images<br />

with continouos variation in color. For example, use JPEG <strong>for</strong> scanned photographs <strong>and</strong> naturalistic<br />

artwork with hightlights, shaded areas, <strong>and</strong> shadows. The more complex <strong>and</strong> subtly rendered<br />

the image is, the more likeley it is that the image should be converted to JPEG.<br />

Do not use JPEG <strong>for</strong> illustrations, cartoons, lettering, or any images that have very sharp edges<br />

(e.g., a row of black pixels adjacent to a row of white pixels). Sharp edges in images tend to<br />

blur in JPEG unless you use only a small amount of compression when converting the image.<br />

The JPEG data compression is being illustrated with an original image shown in Slide ??. We<br />

have an input parameter into a JPEG compression scheme that indicates how many coefficients<br />

one is carrying along. This is expressed by a percentage. Slide ?? shows 75% of the coefficients,<br />

leading to a 15:1 compression of that particular image. We go on to 50% of the coefficients in<br />

Slide ?? <strong>and</strong> 20% in Slide ??. We can appreciate the effect of the compression on the image<br />

by comparing a enlarged segment of the original image with a similarly enlarged segment of the<br />

de-compressed JPEG-image. Note how the decompression reveals that we have contaminated the<br />

image, bec<strong>aus</strong>e objects radiate out under the effect of the <strong>for</strong>ward trans<strong>for</strong>m that cannot fully be<br />

undone by an inverse trans<strong>for</strong>m using a reduced set of coefficients. The effect of the compression<br />

<strong>and</strong> the resulting contamination of the image is larger as we use fewer <strong>and</strong> fewer coefficients of<br />

the trans<strong>for</strong>m as shown in Slide ?? <strong>and</strong> Slide ??. The effect of the compression can be shown<br />

by computing a difference image of just the intensity component (black <strong>and</strong> white component) as<br />

shown in Slide ??, Slide ??, <strong>and</strong> Slide ??.<br />

The basic principle of JPEG compression is illustrated in Algorithm 45.<br />

Prüfungsfragen:<br />

• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern?<br />

20.3 Video File Formats: MPEG<br />

Slide ?? illustrates the basic idea of the MPEG-1 st<strong>and</strong>ard <strong>for</strong> the compression of movies. MPEG<br />

st<strong>and</strong>s <strong>for</strong> Motion Picture Expert Group. Note that the MPEG approach takes key frames <strong>and</strong><br />

compresses them individually as shown as image frames I in Slide ??. Slides P get interpolated between<br />

frames I. Frames are then further interpolated using the frames P . Fairly large compression<br />

rates can be achieved of 200:1. This leads to the ability of showing movies on laptop computers at


20.4. NEW IMAGE FILE FORMATS: SCALABLE VECTOR GRAPHIC - SVG 281<br />

Algorithm 45 JPEG image compression<br />

1: divide the picture into blocks of 8x8 pixels<br />

2: <strong>for</strong> all blocks do<br />

3: trans<strong>for</strong>m the block by DCT-II methode<br />

4: <strong>for</strong> all values in the block do<br />

5: quantize the value dependent from the position in the block {high frequencies are less<br />

important}<br />

6: end <strong>for</strong><br />

7: reorder the values in a zic-zac way {DC value of block is replaced by difference to DC value<br />

of previous block}<br />

8: per<strong>for</strong>m a run-length encoding of the quantized values<br />

9: compress the resulting bytes with Huffmann coding<br />

10: end <strong>for</strong><br />

this time. Slide ?? explains that the requirements <strong>for</strong> the st<strong>and</strong>ard, as they are defined, includes<br />

the need to have the ability to play backwards <strong>and</strong> <strong>for</strong>wards, to compress time, to support fast<br />

motions <strong>and</strong> rapid changes of scenes, <strong>and</strong> to r<strong>and</strong>omly access any part of the movie.<br />

The basic principle of MPEG compression is illustrated in Algorithm 46.<br />

Prüfungsfragen:<br />

• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche<br />

Kompressionsraten können erzielt werden?<br />

20.4 New Image File Formats: Scalable Vector Graphic -<br />

SVG<br />

A Vector graphic differs from a raster graphic in that its content is described by mathematical<br />

statements. The statements instruct a computer’s drawing engine what to display on screen i.e.<br />

pixel in<strong>for</strong>mation <strong>for</strong> a bitmap is not stored in the file <strong>and</strong> loaded into the display device as it is<br />

in the case of JPEG <strong>and</strong> GIF. Instead shapes <strong>and</strong> lines, their position <strong>and</strong> direction, colours <strong>and</strong><br />

gradients are drawn. Vector graphics files contain instructions <strong>for</strong> the rasterisation of graphics<br />

as the statements arrive at the viewer’s browser - ’on the fly’. Vector graphics are resolution<br />

independent. That is, they can be enlarged as much as required with no loss of quality as there<br />

is no raster type image to enlarge <strong>and</strong> pixelate. A vector graphic will always display at the best<br />

quality that the output device is set to. When printing out a vector graphic from a Web page it<br />

will print at the printer’s optimum resolution i.e. without ’jaggies’.<br />

Until recently only proprietary <strong>for</strong>mats such as Macromedia Flash or Apple’s QuickTime have<br />

allowed Web designers to create <strong>and</strong> animate vector graphics <strong>for</strong> the Web. That is going to<br />

change with the implementation of SVG (Scalable Vector <strong>Graphics</strong>).<br />

SVG is the st<strong>and</strong>ard, based on XML (Extensible Mark-up Language), which is currently undergoing<br />

development by the W3C consortium.<br />

An SVG file is itself comprised of text, that is the drawing engine instructions within it are<br />

written in ordinary text <strong>and</strong> not the binary symbols 1 <strong>and</strong> 0. The file can there<strong>for</strong>e be edited in an<br />

application no more complicated than a plain text editor, unlike raster graphics which have to be<br />

opened in image editing applications where pixel values are changed with the use of the program’s<br />

tools. If the appearance of a vector graphic is required to change in the Web browser, then the<br />

text file is edited via:


282 CHAPTER 20. IMAGE REPRESENTATION<br />

Algorithm 46 MPEG compression pipeline<br />

1: Open MPEG stream {Encoder, not specified as part of MPEG st<strong>and</strong>ard. Subject to various<br />

implementation dependant enhancements.}<br />

2: Close MPEG stream<br />

3: Open MPEG stream {Decoder}<br />

4: <strong>for</strong> all PictureGroups in MPEG stream do<br />

5: <strong>for</strong> all Pictures in PictureGroup do<br />

6: <strong>for</strong> all Slices in Picture do<br />

7: <strong>for</strong> all MacroBlock in Slice do<br />

8: <strong>for</strong> all Blocks in MacroBlock do {all I,P,B pictures}<br />

9: Variable Length Decoder {Huffman with fixed DC Tables}<br />

10: Inverse Quantizer<br />

11: Inverse ZigZag<br />

12: Inverse Diskrete Cosine Trans<strong>for</strong>mation {IDCT}<br />

13: end <strong>for</strong><br />

14: end <strong>for</strong><br />

15: end <strong>for</strong><br />

16: if Picture != I then {interpolated pictures P <strong>and</strong> B}<br />

17: average +1/2 interpolation<br />

18: new-Picture = IDCT-Picture + interpolated-Picture<br />

19: else<br />

20: new-Picture is ready<br />

21: end if<br />

22: Dither new-Picture <strong>for</strong> display<br />

23: display new-Picture<br />

24: end <strong>for</strong><br />

25: end <strong>for</strong><br />

26: Close MPEG stream


20.4. NEW IMAGE FILE FORMATS: SCALABLE VECTOR GRAPHIC - SVG 283<br />

• Editing the graphic in an SVG compliant drawing application (e.g. Adobe Illustrator 9)<br />

• Editing the text of which the file is comprised in a text editor<br />

• The actions of the viewer in the Web browser - clicking the mouse which triggers a script<br />

which changes the text in the vector file<br />

As the files are comprised of text the images themselves can be dynamic. For instance CGI <strong>and</strong><br />

PERL can generate images <strong>and</strong> animation based on user choices made in the browser. SVG<br />

graphics can be used to dynamically (in real time) render database in<strong>for</strong>mation, change their<br />

appearance, <strong>and</strong> respond to user input <strong>and</strong> subsequent database queries.<br />

As the SVG st<strong>and</strong>ard is based on XML it is fully compatible with existing Web st<strong>and</strong>ards such as<br />

HTML (HyperText Mark Up Language), CSS (Cascading Style Sheets), DOM (Document Object<br />

Model), JavaScript <strong>and</strong> CGI (Common Gateway Interface) etc.<br />

The SVG <strong>for</strong>mat supports 24-bit colour, ICC color profiles <strong>for</strong> colour management, pan, zoom,<br />

gradients <strong>and</strong> masking <strong>and</strong> other features. Type rendered as SVG will look smoother <strong>and</strong> attributes<br />

such as kerning (spacing between characters), paths (paths along which type is run) <strong>and</strong> ligatures<br />

(where characters are joined together) are as controllable as in DTP <strong>and</strong> drawing applications.<br />

Positioning of SVG graphics in the Web browser window will be achieved with the use of CCS<br />

(Cascading Style Sheets) which are part of the HTML 4 st<strong>and</strong>ard.


284 CHAPTER 20. IMAGE REPRESENTATION


Appendix A<br />

Algorithmen und Definitionen<br />

Algorithmus 1: Affines Matching (siehe Abschnitt 0.6)<br />

Definition 2: Modellieren einer Panoramakamera (siehe Abschnitt 0.15)<br />

Definition 3: Berechnung der Datenmenge eines Bildes (siehe Abschnitt 1.2)<br />

Algorithmus 4: Bildvergrößerung (Raster vs. Vektor) (siehe Abschnitt 1.5)<br />

Definition 5: Berechnung der Nachbarschaftspixel (siehe Abschnitt 1.6)<br />

Definition 6: Berechnung des Zusammenhanges (siehe Abschnitt 1.6)<br />

Definition 7: Berechnung der Distanz zwischen zwei Pixeln (siehe Abschnitt 1.6)<br />

Algorithmus 8: Berechnung logischer Maskenoperationen (siehe Abschnitt 1.7)<br />

Algorithmus 9: Berechnung schneller Maskenoperationen (siehe Abschnitt 1.7)<br />

Definition 10: Modellierung einer perspektiven Kamera (siehe Abschnitt 2.2)<br />

Algorithmus 11: DDA einer Geraden (siehe Abschnitt 3.1)<br />

Algorithmus 12: Bresenham einer Geraden (siehe Abschnitt 3.1)<br />

Algorithmus 13: Füllen eines Polygons (siehe Abschnitt 3.2)<br />

Algorithmus 14: Zeichnen dicker Linien (siehe Abschnitt 3.3)<br />

Definition 15: Skelettberechnung via MAT (siehe Abschnitt 3.4)<br />

Definition 16: Translation (siehe Abschnitt 4.1)<br />

Definition 17: Reflektion (siehe Abschnitt 4.1)<br />

Definition 18: Komplement (siehe Abschnitt 4.1)<br />

Definition 19: Differenz (siehe Abschnitt 4.1)<br />

Algorithmus 20: Dilation (siehe Abschnitt 4.2)<br />

Definition 21: Erosion (siehe Abschnitt 4.2)<br />

Definition 22: Öffnen (siehe Abschnitt 4.3)<br />

Definition 23: Schließen (siehe Abschnitt 4.3)<br />

Definition 24: Filtern (siehe Abschnitt 4.4)<br />

Definition 25: Hit oder Miss (siehe Abschnitt 4.5)<br />

285


286 APPENDIX A. ALGORITHMEN UND DEFINITIONEN<br />

Definition 26: Umriss (siehe Abschnitt 4.6)<br />

Definition 27: Regionenfüllung (siehe Abschnitt 4.6)<br />

Algorithmus 28: Herstellung von Halbtonbildern (siehe Abschnitt 5.1)<br />

Definition 29: Farbtrans<strong>for</strong>mation in CIE (siehe Abschnitt 5.3)<br />

Definition 30: Farbtrans<strong>for</strong>mation in CMY (siehe Abschnitt 5.6)<br />

Definition 31: Farbtrans<strong>for</strong>mation in CMYK (siehe Abschnitt 5.7)<br />

Algorithmus 32: HSV-HSI-HLS-RGB (siehe Abschnitt 5.8)<br />

Definition 33: YIK-RGB (siehe Abschnitt 5.9)<br />

Algorithmus 34: Umw<strong>and</strong>lung von Negativ- in Positivbild (siehe Abschnitt 5.14)<br />

Algorithmus 35: Bearbeitung eines Masked Negative (siehe Abschnitt 5.14)<br />

Algorithmus 36: Berechnung eines Ratiobildes (siehe Abschnitt 5.16)<br />

Definition 37: Umrechnung lp/mm in Pixelgröße (siehe Abschnitt 6.4)<br />

Algorithmus 38: Berechnung eines Histogrammes (siehe Abschnitt 6.6)<br />

Algorithmus 39: Äquidistanzberechnung (siehe Abschnitt 6.6)<br />

Definition 40: Spreizen des Histogrammes (siehe Abschnitt 6.6)<br />

Algorithmus 41: Örtliche Histogrammäqualisierung (siehe Abschnitt 6.6)<br />

Algorithmus 42: Differenzbild (siehe Abschnitt 6.6)<br />

Algorithmus 43: Schwellwertbildung (siehe Abschnitt 7)<br />

Definition 44: Kontrastspreitzung (siehe Abschnitt 7)<br />

Definition 45: Tiefpassfilter mit 3 × 3 Fenster (siehe Abschnitt 7.2)<br />

Algorithmus 46: Medianfilter (siehe Abschnitt 7.2)<br />

Algorithmus 47: Faltungsberechnung (siehe Abschnitt 7.3)<br />

Definition 48: USM Filter (siehe Abschnitt 7.4)<br />

Definition 49: Allgemeines 3 × 3 Gradientenfilter (siehe Abschnitt 7.5)<br />

Definition 50: Roberts-Filter (siehe Abschnitt 7.5)<br />

Definition 51: Prewitt-Filter (siehe Abschnitt 7.5)<br />

Definition 52: Sobel-Filter (siehe Abschnitt 7.5)<br />

Algorithmus 53: Berechnung eines gefilterten Bildes im Spektralbereich (siehe Abschnitt 7.6)<br />

Algorithmus 54: Ungewichtetes Antialiasing (siehe Abschnitt 7.9)<br />

Algorithmus 55: Gewichtetes Antialiasing (siehe Abschnitt 7.9)<br />

Algorithmus 56: Gupte-Sproull-Antialiasing (siehe Abschnitt 7.9)<br />

Definition 57: Statistische Texturberechnung (siehe Abschnitt 8.2)<br />

Definition 58: Berechnung eines spektralen Texturmasses (siehe Abschnitt 8.4)<br />

Algorithmus 59: Aufbringen einer Textur (siehe Abschnitt 8.5)<br />

Definition 60: Berechnung einer linearen Trans<strong>for</strong>mation in 2D (siehe Abschnitt 9.2)<br />

Definition 61: Kon<strong>for</strong>me Trans<strong>for</strong>mation (siehe Abschnitt 9.3)<br />

Definition 62: Modellierung einer Drehung in 2D (siehe Abschnitt 9.4)


287<br />

Definition 63: Aufbau einer 2D Drehmatrix bei gegebenen Koordinatenachsen (siehe Abschnitt<br />

9.4)<br />

Definition 64: Rückdrehung in 2D (siehe Abschnitt 9.4)<br />

Definition 65: Aufein<strong>and</strong>erfolgende Drehungen (siehe Abschnitt 9.4)<br />

Definition 66: Affine Trans<strong>for</strong>mation in 2D in homogenen Koordinaten (siehe Abschnitt 9.5)<br />

Definition 67: Affine Trans<strong>for</strong>mation in 2D in kartesischen Koordinaten (siehe Abschnitt 9.5)<br />

Definition 68: Allgemeine Trans<strong>for</strong>mation in 2D (siehe Abschnitt 9.6)<br />

Algorithmus 69: Berechnung unbekannter Trans<strong>for</strong>mationsparameter (siehe Abschnitt 9.6)<br />

Algorithmus 70: Cohen Sutherl<strong>and</strong> (siehe Abschnitt 9.8)<br />

Definition 71: Aufbau einer homogenen Trans<strong>for</strong>mationsmatrix in 2D (siehe Abschnitt 9.9)<br />

Definition 72: 3D Drehung (siehe Abschnitt 9.10)<br />

Definition 73: 3D affine Trans<strong>for</strong>mation in homogenen Koordinaten (siehe Abschnitt 9.11)<br />

Definition 74: Bezier-Kurven in 2D (siehe Abschnitt 9.20)<br />

Algorithmus 75: Casteljau (siehe Abschnitt 9.21)<br />

Algorithmus 76: Berechnung einer Kettenkodierung (siehe Abschnitt 10.1)<br />

Algorithmus 77: Splitting (siehe Abschnitt 10.2)<br />

Definition 78: Parameterdarstellung einer Geraden für 2D Morphing (siehe Abschnitt 10.3)<br />

Algorithmus 79: Aufbau eines Quadtrees (siehe Abschnitt 10.5)<br />

Definition 80: Aufbau einer Wireframestruktur (siehe Abschnitt 10.8)<br />

Definition 81: Aufbau einer B-Rep-Struktur (siehe Abschnitt 10.12)<br />

Definition 82: Aufbau einer ”<br />

Cell“-Struktur (siehe Abschnitt 10.14)<br />

Algorithmus 83: Aufbau einer BSP-Struktur (siehe Abschnitt 10.14)<br />

Algorithmus 84: z-Buffering für eine Octree-Struktur (siehe Abschnitt 11.5)<br />

Algorithmus 85: Raytracing für eine Octree-Struktur (siehe Abschnitt 11.6)<br />

Definition 86: Ambient Beleuchtung (siehe Abschnitt 12.1)<br />

Definition 87: Lambert Modell (siehe Abschnitt 12.1)<br />

Algorithmus 88: Gouraud (siehe Abschnitt 12.2)<br />

Algorithmus 89: Phong (siehe Abschnitt 12.2)<br />

Algorithmus 90: Objektgenaue Schattenberechnung (siehe Abschnitt 12.3)<br />

Algorithmus 91: Bildgenaue Schattenberechnung (siehe Abschnitt 12.3)<br />

Algorithmus 92: Radiosity (siehe Abschnitt 12.6)<br />

Definition 93: Berechnung der Binokularen Tiefenschärfe (siehe Abschnitt 13.1)<br />

Definition 94: Berechnung der totalen Plastik (siehe Abschnitt 13.2)<br />

Algorithmus 95: Berechnung eines Stereomatches (siehe Abschnitt 13.7)<br />

Definition 96: LoG Filter als Vorbereitung auf Stereomatches (siehe Abschnitt 13.7)<br />

Algorithmus 97: Aufbau eines Merkmalsraums (siehe Abschnitt 14.3)<br />

Algorithmus 98: Pixelzuteilung zu einer Klasse ohne Rückweisung (siehe Abschnitt 14.4)


288 APPENDIX A. ALGORITHMEN UND DEFINITIONEN<br />

Algorithmus 99: Pixelzuteilung zu einer Klasse mit Rückweisung (siehe Abschnitt 14.4)<br />

Algorithmus 100: Zuteilung eines Merkmalsraumes mittels Trainingspixeln (siehe Abschnitt 14.6)<br />

Algorithmus 101: Berechnung einer Knotendatei (siehe Abschnitt 15.3)<br />

Algorithmus 102: Berechnung eines nächsten Nachbars (siehe Abschnitt 15.4)<br />

Algorithmus 103: Berechnung eines bilinear interpolierten Grauwerts (siehe Abschnitt 15.4)<br />

Algorithmus 104: Bilddrehung (siehe Abschnitt 15.5)<br />

Algorithmus 105: z-Buffer Pipeline (siehe Abschnitt 19.2)<br />

Algorithmus 106: Phong-Pipeline (siehe Abschnitt 19.2)<br />

Algorithmus 107: Kompressionspipeline (siehe Abschnitt 20.1.2)<br />

Algorithmus 108: JPEG Pipeline (siehe Abschnitt 20.2.8)<br />

Algorithmus 109: MPEG Pipeline (siehe Abschnitt 20.3)


Appendix B<br />

Fragenübersicht<br />

B.1 Gruppe 1<br />

• Es besteht in der Bildverarbeitung die Idee eines sogenannten Bildmodelles“. Was ist<br />

”<br />

darunter zu verstehen, und welche Formel dient der Darstellung des Bildmodells? [#0001]<br />

(Frage I/8 14. April 2000)<br />

• Bei der Betrachtung von Pixeln bestehen Nachbarschaften“ von Pixeln. Zählen Sie alle<br />

”<br />

Arten von Nachbarschaften auf, die in der Vorlesung beh<strong>and</strong>elt wurden, und beschreiben Sie<br />

diese Nachbarschaften mittels je einer Skizze. [#0003]<br />

(Frage I/9 14. April 2000, Frage I/1 9. November 2001)<br />

• Beschreiben Sie in Worten die wesentliche Verbesserungsidee im Bresenham-Algorithmus<br />

gegenüber dem DDA-Algorithmus. [#0006]<br />

(Frage I/5 11. Mai 2001, Frage 7 20. November 2001)<br />

• Erläutern Sie die morphologische ”<br />

Erosion“ unter Verwendung einer Skizze und eines Formel<strong>aus</strong>druckes.<br />

[#0007]<br />

(Frage I/2 14. April 2000)<br />

• Gegeben sei der CIE Farbraum. Erstellen Sie eine Skizze dieses Farbraumes mit einer<br />

Beschreibung der Achsen und markieren Sie in diesem Raum zwei Punkte A, B. Welche Farbeigenschaften<br />

sind Punkten, welche auf der Strecke zwischen A und B liegen, zuzuordnen,<br />

und welche den Schnittpunkten der Geraden durch A, B mit dem R<strong>and</strong> des CIE-Farbraumes?<br />

[#0012]<br />

(Frage I/3 14. April 2000)<br />

• Zu welchem Zweck würde man als Anwender ein sogenanntes ”<br />

Ratio-Bild“ herstellen? Verwenden<br />

Sie bitte in der Antwort die Hilfe einer Skizze zur Erläuterung eines Ratiobildes.<br />

[#0015]<br />

(Frage I/4 14. April 2000)<br />

• Welches Maß dient der Beschreibung der geometrischen Auflösung eines Bildes, und mit<br />

welchem Verfahren wird diese Auflösung geprüft und quantifiziert? Ich bitte Sie um eine<br />

Skizze. [#0017]<br />

(Frage I/10 14. April 2000)<br />

289


290 APPENDIX B. FRAGENÜBERSICHT<br />

• Eines der populärsten Filter heißt ”<br />

Unsharp Masking“ (USM). Wie funktioniert es? Ich bitte<br />

um eine einfache <strong>for</strong>melmäßige Erläuterung. [#0021]<br />

(Frage I/11 14. April 2000)<br />

• In der Vorlesung wurde ein ”<br />

Baum“ für die Hierarchie diverser Projektionen in die Ebene<br />

dargestellt (Planar Projections). Skizzieren Sie bitte diesen Baum mit allen darin vorkommenden<br />

Projektionen. [#0026]<br />

(Frage I/12 14. April 2000)<br />

• Wozu dient das sogenannte ”<br />

photometrische Stereo“? Und was ist die Grundidee, die diesem<br />

Verfahren dient? [#0033]<br />

(Frage I/5 14. April 2000, Frage I/1 28. September 2001)<br />

• Was ist eine einfache Realisierung der ”<br />

Spiegelreflektion“ (engl.: specular reflection) bei<br />

der Darstellung dreidimensionaler Objekte? Ich bitte um eine Skizze, eine Formel und den<br />

Namen eines Verfahrens nach seinem Erfinder. [#0034]<br />

(Frage I/6 14. April 2000, Frage I/6 28. September 2001, Frage I/6 1. Februar 2002)<br />

• Welche ist die wesentliche Abgrenzung zwischen <strong>Computer</strong>grafik und Bildanalyse, welches<br />

ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung<br />

erwünscht. [#0041]<br />

(Frage I/1 14. April 2000)<br />

• Was bedeuten die Begriffe ”<br />

geometrische“ bzw. ”<br />

radiometrische“ Auflösung eines Bildes?<br />

Versuchen Sie, Ihre Antwort durch eine Skizze zu verdeutlichen. [#0047]<br />

(Frage I/1 14. Dezember 2001)<br />

• Was versteht man unter ”<br />

Rasterkonversion“, und welche Probleme können dabei auftreten?<br />

[#0058]<br />

(Frage I/1 26. Mai 2000, Frage I/8 15. März 2002)<br />

• Erläutern Sie das morphologische ” Öffnen“ unter Verwendung einer Skizze und eines Formel<strong>aus</strong>druckes.<br />

[#0059]<br />

(Frage I/2 26. Mai 2000, Frage I/4 10. November 2000)<br />

• Erklären Sie das Problem, das bei der Verwendung von ”<br />

einem Pixel breiten“ Linien auftritt,<br />

wenn eine korrekte Intensitätswiedergabe ge<strong>for</strong>dert ist. Welche Lösungsmöglichkeiten gibt<br />

es für dieses Problem? Bitte verdeutlichen Sie Ihre Antwort anh<strong>and</strong> einer Skizze! (Hinweis:<br />

betrachten Sie Linien unterschiedlicher Orientierung!) [#0060]<br />

(Frage I/3 26. Mai 2000)<br />

• Was versteht man unter dem ”<br />

dynamischen Bereich“ eines Mediums zur Wiedergabe bildhafter<br />

In<strong>for</strong>mationen, und im welchem Zusammenhang steht er mit der Qualität der Darstellung?<br />

Reihen Sie einige gebräuchliche Medien nach aufsteigender Größe ihres dynamischen<br />

Bereiches! [#0061]<br />

(Frage I/5 30. Juni 2000, Frage 1 20. November 2001, Frage I/5 15. März 2002)<br />

• Können von einem RGB-Monitor alle vom menschlichen Auge wahrnehmbaren Farben dargestellt<br />

werden? Begründen Sie Ihre Antwort anh<strong>and</strong> einer Skizze! [#0062]<br />

(Frage I/4 26. Mai 2000, Frage I/5 10. November 2000, Frage I/2 9. November 2001, Frage<br />

4 20. November 2001)


B.1. GRUPPE 1 291<br />

• Was ist ein Medianfilter, was sind seine Eigenschaften, und in welchen Situationen wird er<br />

eingesetzt? [#0063]<br />

(Frage I/5 26. Mai 2000, Frage I/7 10. November 2000, Frage I/11 30. März 2001, Frage I/5<br />

28. September 2001, Frage 3 20. November 2001)<br />

• Erklären Sie die Bedeutung von homogenen Koordinaten für die <strong>Computer</strong>grafik! Welche<br />

Eigenschaften weisen homogene Koordinaten auf? [#0066]<br />

(Frage I/6 26. Mai 2000, Frage 1 15. Jänner 2002)<br />

• Was versteht man unter (geometrischem) ”<br />

Resampling“, und welche Möglichkeiten gibt es,<br />

die Intensitäten der Pixel im Ausgabebild zu berechnen? Beschreiben sie verschiedene Verfahren<br />

anh<strong>and</strong> einer Skizze und ggf. eines Formel<strong>aus</strong>drucks! [#0067]<br />

(Frage I/7 26. Mai 2000, Frage I/6 10. November 2000, Frage I/3 28. September 2001, Frage<br />

I/9 9. November 2001, Frage 6 20. November 2001, Frage 6 15. Jänner 2002)<br />

• Beschreiben Sie mindestens zwei Verfahren, bei denen allein durch Modulation der Oberflächenparameter<br />

(ohne Definition zusätzlicher geometrischer Details) eine realistischere Darstellung<br />

eines vom <strong>Computer</strong> gezeichneten Objekts möglich ist! [#0068]<br />

(Frage I/8 26. Mai 2000)<br />

• Ein dreidimensionaler Körper kann mit Hilfe von Zellen einheitlicher Größe (Würfeln), die in<br />

einem gleichmäßigen Gitter angeordnet sind, dargestellt werden. Beschreiben Sie Vor- und<br />

Nachteile dieser Repräsentations<strong>for</strong>m! Begründen Sie Ihre Antwort ggf. mit einer Skizze!<br />

[#0070]<br />

(Frage I/9 26. Mai 2000)<br />

• Erklären Sie (ohne Verwendung von Formeln) das Prinzip des ”<br />

Radiosity“-Verfahrens zur<br />

Herstellung realistischer Bilder mit dem <strong>Computer</strong>. Welche Art der Lichtinteraktion kann<br />

mit diesem Modell beschrieben werden, und welche kann nicht beschrieben werden? [#0073]<br />

(Frage I/10 26. Mai 2000)<br />

• In der Einführungsvorlesung wurde der Begriff ”<br />

Affine Matching“ verwendet. Wozu dient<br />

das Verfahren, welches dieser Begriff bezeichnet? [#0079]<br />

(Frage I/7 14. April 2000)<br />

• Skizzieren Sie die ”<br />

Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen<br />

Szene mittels z-buffering und Gouraud-shading! [#0082]<br />

(Frage I/10 30. Juni 2000, Frage I/9 10. November 2000)<br />

• Beschreiben Sie den Unterschied zwischen ”<br />

Virtual Reality“ und ”<br />

Augmented Reality“.<br />

Welche Hardware wird in beiden Fällen benötigt? [#0083]<br />

(Frage I/9 30. Juni 2000, Frage I/8 28. September 2001, Frage I/8 14. Dezember 2001, Frage<br />

I/2 15. März 2002)<br />

• Wie werden in der Stereo-Bildgebung zwei Bilder der selben Szene aufgenommen? Beschreiben<br />

Sie typische Anwendungsfälle beider Methoden! [#0084]<br />

(Frage I/8 30. Juni 2000, Frage I/8 10. November 2000)<br />

• Erklären Sie den Vorgang der Schattenberechnung nach dem 2-Phasen-Verfahren mittels<br />

z-Buffer! Beschreiben Sie zwei Varianten sowie deren Vor- und Nachteile. [#0086]<br />

(Frage I/7 30. Juni 2000)


292 APPENDIX B. FRAGENÜBERSICHT<br />

• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2 1 2D- oder 3D-<br />

Modellen. Definieren Sie die Objektbeschreibung durch 2 1 2D- bzw. 3D-Modelle mittels Gleichungen<br />

und erläutern Sie in Worten den wesentlichen Unterschied! [#0087]<br />

(Frage I/6 30. Juni 2000, Frage I/6 9. November 2001, Frage I/6 14. Dezember 2001, Frage<br />

5 15. Jänner 2002)<br />

• Welche Eigenschaften weist eine (sich regelmäßig wiederholende) Textur im Spektralraum<br />

auf? Welche Aussagen können über eine Textur anh<strong>and</strong> ihres Spektrums gemacht werden?<br />

[#0093]<br />

(Frage I/4 30. Juni 2000)<br />

• Erklären Sie, unter welchen Umständen ”<br />

Aliasing“ auftritt und was man dagegen unternehmen<br />

kann! [#0094]<br />

(Frage I/3 30. Juni 2000)<br />

• Geben Sie die Umrechnungsvorschrift für einen RGB-Farbwert in das CMY-Modell und in<br />

das CMYK-Modell an und erklären Sie die Bedeutung der einzelnen Farbanteile! Wofür wird<br />

das CMYK-Modell verwendet? [#0095]<br />

(Frage I/2 30. Juni 2000, Frage 2 20. November 2001)<br />
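Zur Veranschaulichung der Umrechnung eine kleine Skizze in Python; gezeigt wird eine gängige, vereinfachte Variante (Subtraktion des Schwarzanteils K ohne Normierung auf 1−K), die nur eine von mehreren üblichen Definitionen ist:

```python
def rgb_zu_cmy(r, g, b):
    # Subtraktive Grundfarben als Komplement der additiven (alle Werte in [0, 1])
    return 1.0 - r, 1.0 - g, 1.0 - b

def rgb_zu_cmyk(r, g, b):
    c, m, y = rgb_zu_cmy(r, g, b)
    k = min(c, m, y)                  # Schwarzanteil = gemeinsamer Grauanteil
    if k == 1.0:                      # reines Schwarz: C = M = Y = 0
        return 0.0, 0.0, 0.0, 1.0
    # vereinfachte Variante: C' = C - K (ohne Normierung auf 1 - K)
    return c - k, m - k, y - k, k

print(rgb_zu_cmyk(0.8, 0.5, 0.1))     # ≈ (0.0, 0.3, 0.7, 0.2)
```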

• Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärme- oder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras? [#0097]

(Frage I/1 30. Juni 2000)<br />

• Definieren Sie den Begriff „Kante“. [#0105]

(Frage I/1 13. Oktober 2000)<br />

• Erklären Sie anh<strong>and</strong> einer Skizze den zeitlichen Ablauf des Bildaufb<strong>aus</strong> auf einem Elektronenstrahlschirm!<br />

[#0109]<br />

(Frage I/2 13. Oktober 2000, Frage I/2 1. Februar 2002, Frage I/10 15. März 2002)<br />

• Erklären Sie, wie man mit Hilfe der <strong>Computer</strong>tomografie ein dreidimensionales Volumenmodell<br />

vom Inneren des menschlichen Körpers gewinnt. [#0110]<br />

(Frage I/3 13. Oktober 2000)<br />

• Nennen Sie verschiedene Techniken, um „dicke“ Linien (z.B. Geradenstücke oder Kreisbögen) zu zeichnen. [#0111]

(Frage I/4 13. Oktober 2000, Frage I/1 10. November 2000, Frage I/10 9. November 2001)<br />

• Zum YIQ-Farbmodell:<br />

1. Welche Bedeutung hat die Y -Komponente im YIQ-Farbmodell?<br />

2. Wo wird das YIQ-Farbmodell eingesetzt?<br />

(Frage I/5 13. Oktober 2000)<br />

[#0112]<br />

• Skizzieren Sie die Form des Filterkerns eines G<strong>aus</strong>sschen Tiefpassfilters. Worauf muss man<br />

bei der Wahl der Filterparameter bzw. der Größe des Filterkerns achten? [#0115]<br />

(Frage I/6 13. Oktober 2000, Frage I/3 10. November 2000)<br />

• Nennen Sie drei Arten der Texturbeschreibung und führen Sie zu jeder ein Beispiel an.<br />

[#0116]<br />

(Frage I/7 13. Oktober 2000, Frage I/10 10. November 2000)



• Was versteht man unter einer „Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese Art der Objektrepräsentation? [#0117]

(Frage I/8 13. Oktober 2000, Frage I/2 10. November 2000, Frage 4 15. Jänner 2002)<br />

• Welche physikalischen Merkmale der von einem Körper <strong>aus</strong>ges<strong>and</strong>ten oder reflektierten<br />

Strahlung eignen sich zur Ermittlung der Oberflächeneigenschaften (z.B. zwecks Klassifikation)?<br />

[#0118]<br />

(Frage I/9 13. Oktober 2000, Frage I/5 14. Dezember 2001)<br />

• Beschreiben Sie zwei Verfahren zur Interpolation der Farbwerte innerhalb eines Dreiecks,<br />

das zu einer beleuchteten polygonalen Szene gehört. [#0119]<br />

(Frage I/10 13. Oktober 2000)<br />

• Was versteht man in der Sensorik unter Einzel- bzw. Mehrfachbildern? Nennen Sie einige<br />

Beispiele für Mehrfachbilder! [#0121]<br />

(Frage I/1 15. Dezember 2000, Frage I/5 9. November 2001, Frage I/3 14. Dezember 2001)<br />

• Skizzieren Sie drei verschiedene Verfahren zum Scannen von zweidimensionalen Vorlagen<br />

(z.B. Fotografien)! [#0122]<br />

(Frage I/2 15. Dezember 2000)<br />

• Beschreiben Sie das Prinzip der Bilderfassung mittels Radar! Welche Vor- und Nachteile<br />

bietet dieses Verfahren? [#0123]<br />

(Frage I/3 15. Dezember 2000)<br />

• Erklären Sie das Funktionsprinzip zweier in der Augmented Reality häufig verwendeter<br />

Trackingverfahren und erläutern Sie deren Vor- und Nachteile! [#0124]<br />

(Frage I/4 15. Dezember 2000, Frage I/4 1. Februar 2002)<br />

• Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von<br />

Kurven, und erläutern Sie anh<strong>and</strong> einer Skizze ein Approximationsverfahren Ihrer Wahl!<br />

[#0125]<br />

(Frage I/5 15. Dezember 2000, Frage 2 15. Jänner 2002)<br />

• Geben Sie die Transferfunktion H(u, v) im Frequenzbereich eines idealen Tiefpassfilters mit der „cutoff“-Frequenz D0 an! Skizzieren Sie die Transferfunktion! [#0127]

(Frage I/6 15. Dezember 2000, Frage I/7 14. Dezember 2001)<br />

• Erklären Sie, wie in der Visualisierung die Qualität eines vom <strong>Computer</strong> erzeugten Bildes<br />

durch den Einsatz von Texturen verbessert werden kann. Nennen Sie einige Oberflächeneigenschaften<br />

(insbesondere geometrische), die sich nicht zur Repräsentation mit Hilfe einer<br />

Textur eignen. [#0128]<br />

(Frage I/7 15. Dezember 2000)<br />

• Erklären Sie, warum bei der Entzerrung von digitalen Rasterbildern meist „Resampling“ erforderlich ist. Nennen Sie zwei Verfahren zur Grauwertzuweisung für das Ausgabebild! [#0130]

(Frage I/8 15. Dezember 2000)<br />

• Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch<br />

seine (polygonale) Oberfläche genutzt werden kann! [#0131]<br />

(Frage I/9 15. Dezember 2000, Frage I/2 28. September 2001)



Figure B.1: wiederholte Speicherung eines Bildes in verschiedenen Größen (128×128, 256×256, 512×512, ...)

• Erklären Sie den Begriff ” Überwachen beim Klassifizieren“. Wann kann man dieses Verfahren<br />

einsetzen? [#0133]<br />

(Frage I/10 15. Dezember 2000)<br />

• Im praktischen Teil der Prüfung wird bei Aufgabe B.2 nach einer Trans<strong>for</strong>mationsmatrix (in<br />

zwei Dimensionen) gefragt, die sich <strong>aus</strong> einer Skalierung und einer Rotation um ein beliebiges<br />

Rotationszentrum zusammensetzt. Wie viele Freiheitsgrade hat eine solche Trans<strong>for</strong>mation?<br />

Begründen Sie Ihre Antwort! [#0167]<br />

(Frage I/1 2. Februar 2001)<br />

• Mit Hilfe von Radarwellen kann man von Flugzeugen und Satelliten <strong>aus</strong> digitale Bilder<br />

erzeugen, <strong>aus</strong> welchen ein topografisches Modell des Geländes (ein Höhenmodell) <strong>aus</strong> einer<br />

einzigen Bildaufnahme erstellt werden kann. Beschreiben Sie jene physikalischen Effekte der<br />

elektromagnetischen Strahlung, die für diese Zwecke genutzt werden! [#0169]<br />

(Frage I/2 2. Februar 2001)<br />

• In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das<br />

erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht<br />

nur mehr <strong>aus</strong> einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo<br />

wird sie eingesetzt (nennen Sie mindestens ein Beispiel)? [#0170]<br />

(Frage I/6 2. Februar 2001, Frage I/1 1. Februar 2002)<br />

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken! [#0175]<br />

(Frage I/3 2. Februar 2001)<br />

• In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in<br />

eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder<br />

als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen<br />

werden benötigt? Erläutern Sie Ihre Antwort anh<strong>and</strong> einer beliebigen Strecke<br />

<strong>aus</strong> Abbildung B.3! [#0177]<br />

(Frage I/5 2. Februar 2001)<br />

• Was ist eine ”<br />

3D Textur“? [#0178]<br />

(Frage I/9 2. Februar 2001, Frage I/4 28. September 2001)



• Welche Rolle spielen die sogenannten ”<br />

Passpunkte“ (engl. Control Points) bei der Interpolation<br />

und bei der Approximation von Kurven? Erläutern Sie Ihre Antwort anh<strong>and</strong> einer<br />

Skizze! [#0179]<br />

(Frage I/7 2. Februar 2001)<br />

• Beschreiben Sie eine bilineare Trans<strong>for</strong>mation anh<strong>and</strong> ihrer Definitionsgleichung! [#0180]<br />

(Frage I/11 2. Februar 2001)<br />

• Zählen Sie Fälle auf, wo in der Bildanalyse die Fourier-Trans<strong>for</strong>mation verwendet wird!<br />

[#0184]<br />

(Frage I/8 2. Februar 2001)<br />

• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern? [#0185]<br />

(Frage I/10 2. Februar 2001, Frage I/9 19. Oktober 2001)<br />

• Geben Sie zu jedem der Darstellungsverfahren <strong>aus</strong> Abbildung B.2 an, welche In<strong>for</strong>mationen<br />

über das Objekt gespeichert werden müssen! [#0187]<br />

(Frage I/4 2. Februar 2001)<br />

• Erläutern Sie den Begriff ”<br />

Sensor-Modell“! [#0193]<br />

(Frage I/1 30. März 2001, Frage I/7 19. Oktober 2001)<br />

• Wie wird die geometrische Auflösung eines Filmscanners angegeben, und mit welchem Verfahren<br />

kann man sie ermitteln? [#0194]<br />

(Frage I/2 30. März 2001)<br />

• Was versteht man unter ”<br />

passiver Radiometrie“? [#0195]<br />

(Frage I/3 30. März 2001, Frage I/9 1. Februar 2002)<br />

• Gegeben sei ein Polygon durch die Liste seiner Eckpunkte. Wie kann das Polygon ausgefüllt (also mitsamt seinem Inneren) auf einem Rasterbildschirm dargestellt werden? Welche Probleme treten auf, wenn das Polygon sehr „spitze“ Ecken hat (d.h. Innenwinkel nahe bei Null)? [#0196]

(Frage I/4 30. März 2001, Frage I/2 14. Dezember 2001)<br />

• Wie ist der „Hit-or-Miss“-Operator A ⊛ B definiert? Erläutern Sie seine Funktionsweise zur Erkennung von Strukturen in Binärbildern! [#0199]

(Frage I/5 30. März 2001)<br />
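Als Ergänzung eine minimale Skizze in Python (reines NumPy), die eine übliche Definition des Hit-or-Miss-Operators mit Fenster W nachbildet; Randbehandlung, Ankerpunkt im Zentrum und das Beispielbild A sind dabei Annahmen:

```python
import numpy as np

def erosion(A, S):
    """Binäre Erosion A ⊖ S (0/1-Bilder, Ankerpunkt im Zentrum von S)."""
    h, w = S.shape
    P = np.pad(A, ((h // 2, h // 2), (w // 2, w // 2)), constant_values=0)
    E = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # Treffer, wenn S vollständig in den Vordergrund "hineinpasst"
            E[i, j] = np.all(P[i:i + h, j:j + w] >= S)
    return E

def hit_or_miss(A, X, W):
    """A ⊛ X = (A ⊖ X) ∩ (A^c ⊖ (W − X)); W ist ein Fenster, das X enthält."""
    return erosion(A, X) & erosion(1 - A, W - X)

A = np.array([[0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0]])
X = np.zeros((3, 3), int); X[1, 1] = 1   # gesuchte Struktur: isoliertes Pixel
W = np.ones((3, 3), int)                 # 3x3-Fenster um X
print(hit_or_miss(A, X, W))              # markiert nur das isolierte Pixel (1, 1)
```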

• Was versteht man unter einem Falschfarbenbild (false color image) bzw. einem Pseudofarbbild<br />

(pseudo color image)? Nennen Sie je einen typischen Anwendungsfall! [#0200]<br />

(Frage I/6 30. März 2001)<br />

• Vergleichen Sie die Methode der Farberzeugung bei einem Elektronenstrahlbildschirm mit<br />

der beim Offset-Druck. Welche Farbmodelle kommen dabei zum Einsatz? [#0202]<br />

(Frage I/7 30. März 2001, Frage I/10 19. Oktober 2001, Frage I/4 14. Dezember 2001)<br />

• Was versteht man unter ”<br />

prozeduralen Texturen“, wie werden sie erzeugt und welche Vorteile<br />

bringt ihr Einsatz? [#0206]<br />

(Frage I/8 30. März 2001)<br />

• Erklären Sie den Begriff ”<br />

spatial partitioning“ und nennen Sie drei räumliche Datenstrukturen<br />

<strong>aus</strong> dieser Gruppe! [#0208]<br />

(Frage I/9 30. März 2001)



• Erklären Sie die Begriffe „feature“ (Merkmal), „feature space“ (Merkmalsraum) und „cluster“ im Zusammenhang mit Klassifikationsproblemen und verdeutlichen Sie Ihre Antwort anhand einer Skizze! [#0209]

(Frage I/10 30. März 2001, Frage I/9 28. September 2001, Frage I/7 1. Februar 2002)<br />

• Im Folgenden sehen Sie drei 3 × 3-Transformationsmatrizen, wobei jede der Matrizen einen bestimmten Transformationstyp für homogene Koordinaten in 2D beschreibt:
A = ⎛ a11   0   0 ⎞          B = ⎛  b11  b12  0 ⎞          C = ⎛ 1  0  c13 ⎞
    ⎜  0   a22  0 ⎟              ⎜ −b12  b11  0 ⎟              ⎜ 0  1  c23 ⎟
    ⎝  0    0   1 ⎠              ⎝   0    0   1 ⎠              ⎝ 0  0   1  ⎠
mit a11, a22 beliebig (A), b11^2 + b12^2 = 1 (B) und c13, c23 beliebig (C).
Um welche Transformationen handelt es sich bei A, B und C? [#0213]

(Frage I/1 11. Mai 2001)<br />
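Zur Einordnung eine kleine Skizze in Python, die die drei in der vorstehenden Frage auftretenden Matrixtypen (Skalierung, Rotation, Translation in homogenen 2D-Koordinaten) auf einen Beispielpunkt anwendet; der Punkt (2, 1) ist frei gewählt:

```python
import numpy as np

def skalierung(sx, sy):   # Struktur wie Matrix A (Diagonalelemente a11, a22)
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1.0]])

def rotation(phi):        # Struktur wie Matrix B (orthonormaler 2x2-Block)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def translation(tx, ty):  # Struktur wie Matrix C (letzte Spalte c13, c23)
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])

p = np.array([2.0, 1.0, 1.0])        # Punkt (2, 1) in homogenen Koordinaten
print(skalierung(2, 3) @ p)          # [4. 3. 1.]
print(rotation(np.pi / 2) @ p)       # [-1.  2.  1.] (bis auf Rundung)
print(translation(5, -1) @ p)        # [7. 0. 1.]
```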

• In der <strong>Computer</strong>grafik ist die Abbildung eines dreidimensionalen Objekts auf die zweidimensionale<br />

Bildfläche ein mehrstufiger Prozess (Abbildung B.4), an dem verschiedene Trans<strong>for</strong>mationen<br />

und Koordinatensysteme beteiligt sind. Benennen Sie die Koordinatensysteme A,<br />

B und C in Abbildung B.4! [#0215]<br />

(Frage I/1 26. Juni 2001)<br />

• Gegeben sei ein verrauschtes monochromes digitales Rasterbild. Gesucht sei ein Filter, das zur Bereinigung eines solchen Bildes geeignet ist, wobei folgende Anforderungen gestellt werden:
– Kanten müssen erhalten bleiben und dürfen nicht „verwischt“ werden.
– Im Ausgabebild dürfen nur solche Grauwerte enthalten sein, die auch im Eingabebild vorkommen.
Schlagen Sie einen Filtertyp vor, der dafür geeignet ist, und begründen Sie Ihre Antwort! [#0216]

(Frage I/2 11. Mai 2001, Frage I/5 19. Oktober 2001)<br />
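Die geforderten Eigenschaften (kantenerhaltend, nur bereits vorhandene Grauwerte) deuten auf ein Rangordnungsfilter hin; dazu eine minimale Skizze eines 3×3-Medianfilters in Python. Die Randbehandlung durch Randwiederholung und das Testbild sind Annahmen:

```python
import numpy as np

def medianfilter_3x3(bild):
    """3x3-Medianfilter; Randpixel per Randwiederholung (Annahme)."""
    P = np.pad(bild, 1, mode="edge")
    out = np.empty_like(bild)
    for i in range(bild.shape[0]):
        for j in range(bild.shape[1]):
            # Der Median von 9 Werten ist selbst einer der 9 Nachbarschaftswerte
            out[i, j] = np.median(P[i:i + 3, j:j + 3])
    return out

test = np.array([[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(medianfilter_3x3(test))   # der Ausreißer 9 verschwindet
```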

• In der <strong>Computer</strong>grafik kennt man die Begriffe ”<br />

Phong-shading“ und ”<br />

Phong-illumination“.<br />

Erklären Sie diese beiden Begriffe! [#0219]<br />

(Frage I/3 11. Mai 2001)<br />

• Bei der Erstellung realistischer Szenen werden in der Computergrafik u.a. die zwei Konzepte „shading“ und „shadow“ verwendet, um die Helligkeit der darzustellenden Bildpunkte zu ermitteln. Was ist der Unterschied zwischen diesen beiden Begriffen? [#0220]

(Frage I/3 26. Juni 2001, Frage I/2 19. Oktober 2001, Frage I/4 15. März 2002)<br />

• Nennen Sie Anwendungen von Schallwellen in der digitalen Bildgebung! [#0225]<br />

(Frage I/4 11. Mai 2001)<br />

• Nennen Sie allgemeine An<strong>for</strong>derungen an eine Datenstruktur zur Repräsentation dreidimensionaler<br />

Objekte! [#0230]<br />

(Frage I/7 11. Mai 2001)



• Beschreiben Sie das „ray-tracing“-Verfahren zur Ermittlung sichtbarer Flächen! Welche Optimierungen können helfen, den Rechenaufwand zu verringern? [#0231]

(Frage I/9 11. Mai 2001, Frage I/8 19. Oktober 2001, Frage 8 15. Jänner 2002)<br />

• Beschreiben Sie Anwendungen von ”<br />

Resampling“ und erläutern Sie den Prozess, seine Varianten<br />

und mögliche Fehlerquellen! [#0232]<br />

(Frage I/10 11. Mai 2001)<br />

• Nennen Sie verschiedene technische Verfahren der stereoskopischen Vermittlung eines ”<br />

echten“<br />

(dreidimensionalen) Raumeindrucks einer vom <strong>Computer</strong> dargestellten Szene! [#0233]<br />

(Frage I/11 11. Mai 2001)<br />

• Erklären Sie den Unterschied zwischen „supervised classification“ und „unsupervised classification“! Welche Rollen spielen diese Verfahren bei der automatischen Klassifikation der Bodennutzung anhand von Luftbildern? [#0234]

(Frage I/8 11. Mai 2001)<br />

• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche<br />

Kompressionsraten können erzielt werden? [#0235]<br />

(Frage I/6 11. Mai 2001, Frage I/9 14. Dezember 2001, Frage I/1 15. März 2002)<br />

• Was versteht man unter ”<br />

motion blur“, und unter welcher Vor<strong>aus</strong>setzung kann dieser Effekt<br />

<strong>aus</strong> einem Bild wieder entfernt werden? [#0238]<br />

(Frage I/2 26. Juni 2001, Frage I/10 14. Dezember 2001)<br />

• Welchem Zweck dient ein sogenannter ”<br />

Objektscanner“? Nennen Sie drei verschiedene Verfahren,<br />

nach denen ein Objektscanner berührungslos arbeiten kann! [#0239]<br />

(Frage I/4 26. Juni 2001)<br />

• Erklären Sie anh<strong>and</strong> eines Beispiels den Vorgang des morphologischen Filterns! [#0240]<br />

(Frage I/6 26. Juni 2001)<br />

• Was versteht man unter der geometrischen Genauigkeit (geometric accuracy) eines digitalen<br />

Rasterbildes? [#0243]<br />

(Frage I/5 1. Februar 2002)<br />

• Beschreiben Sie anhand einer Skizze das „Aussehen“ folgender Filtertypen im Frequenzbereich:
1. Tiefpassfilter
2. Hochpassfilter
3. Bandpassfilter
[#0245]
(Frage I/8 26. Juni 2001)

• Welche statistischen Eigenschaften können zur Beschreibung von Textur herangezogen werden?<br />

Erläutern Sie die Bedeutung dieser Eigenschaften im Zusammenhang mit Texturbildern!<br />

[#0246]<br />

(Frage I/5 26. Juni 2001)



• Wird eine reale Szene durch eine Kamera mit nichtidealer Optik aufgenommen, entsteht ein<br />

verzerrtes Bild. Erläutern Sie die zwei Stufen des Resampling, die er<strong>for</strong>derlich sind, um ein<br />

solches verzerrtes Bild zu rektifizieren! [#0249]<br />

(Frage I/10 26. Juni 2001)<br />

• In der <strong>Computer</strong>grafik gibt es zwei grundlegend verschiedene Verfahren, um ein möglichst<br />

(photo-)realistisches Bild einer dreidimensionalen Szene zu erstellen. Verfahren A kommt<br />

zum Einsatz, wenn Spiegelreflexion, Lichtbrechung und Punktlichtquellen simuliert werden<br />

sollen. Verfahren B ist besser geeignet, um diffuse Reflexion, gegenseitige Lichtabstrahlung<br />

und Flächenlichtquellen darzustellen und die Szene interaktiv zu durchw<strong>and</strong>ern. Benennen<br />

Sie diese beiden Verfahren und erläutern Sie kurz deren jeweilige Grundidee! [#0253]<br />

(Frage I/7 26. Juni 2001)<br />

• Was versteht man unter einem „LoD/R-Tree“? [#0254]
(Frage I/9 26. Juni 2001)
• Was versteht man unter „immersiver Visualisierung“? [#0256]

(Frage I/11 26. Juni 2001)<br />

• Beschreiben Sie die Farberzeugung beim klassischen Offsetdruck! Welches Farbmodell wird<br />

verwendet, und wie wird das Auftreten des Moiree-Effekts verhindert? [#0265]<br />

(Frage I/10 28. September 2001)<br />

• Nennen Sie ein Beispiel und eine konkrete Anwendung eines nicht-optischen Sensors in der<br />

Stereo-Bildgebung! [#0266]<br />

(Frage I/7 28. September 2001)<br />

• Was versteht man unter „data garments“ (Datenkleidung)? Nennen Sie mindestens zwei Geräte dieser Kategorie! [#0273]

(Frage I/4 19. Oktober 2001)<br />

• Skizzieren Sie die Übertragungsfunktion eines idealen und eines Butterworth-Hochpassfilters<br />

und vergleichen Sie die Vor- und Nachteile beider Filtertypen! [#0274]<br />

(Frage I/1 19. Oktober 2001)<br />

• Was versteht man unter einer ”<br />

kon<strong>for</strong>men Trans<strong>for</strong>mation“? [#0275]<br />

(Frage I/6 19. Oktober 2001)<br />

• Nach welchem Grundprinzip arbeiten Verfahren, die <strong>aus</strong> einem Stereobildpaar die Oberfläche<br />

eines in beiden Bildern sichtbaren Körpers rekonstruieren können? [#0276]<br />

(Frage I/3 19. Oktober 2001)<br />

• Beschreiben Sie mindestens zwei Verfahren oder Geräte, die in der Medizin zur Gewinnung<br />

digitaler Rasterbilder verwendet werden! [#0278]<br />

(Frage I/3 9. November 2001, Frage I/7 15. März 2002)<br />

• Was ist ”<br />

Morphologie“? [#0279]<br />

(Frage I/7 9. November 2001)<br />

• Was versteht man unter einem dreidimensionalen Farbraum (bzw. Farbmodell)? Nennen Sie<br />

mindestens drei Beispiele davon! [#0280]<br />

(Frage I/4 9. November 2001)



• Erläutern Sie die strukturelle Methode der Texturbeschreibung! [#0281]<br />

(Frage I/8 9. November 2001)<br />

• Nennen Sie ein Verfahren zur Verbesserung verrauschter Bilder, und erläutern Sie dessen Auswirkungen auf die Qualität des Bildes! Bei welcher Art von Rauschen kann das von Ihnen genannte Verfahren eingesetzt werden? [#0296]

(Frage I/3 1. Februar 2002)<br />

• Erläutern Sie die Octree-Datenstruktur und nennen Sie mindestens zwei verschiedene Anwendungen<br />

davon! [#0298]<br />

(Frage I/10 1. Februar 2002)<br />

• Erklären Sie den z-buffer-Algorithmus zur Ermittlung sichtbarer Flächen! [#0299]<br />

(Frage I/8 1. Februar 2002)<br />

• Beschreiben Sie die Arbeitsweise des Marr-Hildreth-Operators 1 ! [#0311]<br />

(Frage I/9 15. März 2002)<br />

• Nennen Sie vier dreidimensionale Farbmodelle, benennen Sie die einzelnen Komponenten<br />

und skizzieren Sie die Geometrie des Farbmodells! [#0313]<br />

(Frage I/6 15. März 2002)<br />

• Versuchen Sie eine Definition des Histogramms eines digitalen Grauwertbildes! [#0314]<br />

(Frage I/3 15. März 2002)<br />

1 Dieser Operator wurde in der Vorlesung zur Vorbearbeitung von Stereobildern besprochen und erstmals im Wintersemester 2001/02 namentlich genannt.



Figure B.2: dreidimensionales Objekt mit verschiedenen Darstellungstechniken gezeigt ((a) Verfahren 1, (b) Verfahren 2, (c) Verfahren 3, (d) Verfahren 4)



Figure B.3: Überführung einer Vektorgrafik in eine <strong>and</strong>ere<br />

Figure B.4: Prozesskette der Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche (Modellierungs-Transformation und Projektion zwischen den Koordinatensystemen A, B und C)
Bildfläche



Figure B.5: Pixelraster<br />

B.2 Gruppe 2<br />

Figure B.6: binäres Rasterbild<br />

• Gegeben sei ein Druckverfahren, welches einen Graupunkt mittels eines Pixelrasters darstellt,<br />

wie dies in Abbildung B.5 dargestellt wird. Wieviele Grauwerte können mit diesem Raster<br />

dargestellt werden? Welcher Grauwert wird in Abbildung B.5 dargestellt? [#0011]<br />

(Frage II/13 14. Dezember 2001)<br />

• Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung dieses Bildes. Ich bitte Sie, einen sogenannten „traditionellen“ Quadtree der Abbildung B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen. [#0029]

(Frage II/14 14. April 2000)<br />
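Zur Illustration eine kleine Skizze in Python, die einen traditionellen Quadtree rekursiv aufbaut; das Beispielbild B ist frei erfunden und nicht Abbildung B.6:

```python
import numpy as np

def quadtree(bild):
    """Traditioneller Quadtree: Blatt = 0 oder 1, innerer Knoten = Liste der
    vier Quadranten (NW, NO, SW, SO). Annahme: Bildgröße ist eine Zweierpotenz."""
    if bild.min() == bild.max():          # homogener Block -> Blatt
        return int(bild[0, 0])
    h = bild.shape[0] // 2
    return [quadtree(bild[:h, :h]), quadtree(bild[:h, h:]),
            quadtree(bild[h:, :h]), quadtree(bild[h:, h:])]

B = np.array([[1, 1, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
print(quadtree(B))   # [1, [0, 0, 0, 1], 0, 1]
```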

• Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen<br />

Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält<br />

sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen?<br />

[#0030]<br />

(Frage II/15 14. April 2000)<br />

• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva<br />

bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der<br />

Konstruktion des Objektes (ohne Lampe). [#0031]<br />

(Frage II/17 14. April 2000)



Figure B.7: Tisch<br />

• Quantifizieren Sie bitte an einem rechnerischen Beispiel Ihrer Wahl das „Geheimnis“, welches es gestattet, in der Stereobetrachtung mittels überlappender photographischer Bilder eine wesentlich bessere Tiefenwahrnehmung zu erzielen, als dies bei natürlichem binokularem Sehen möglich ist. [#0037]

(Frage II/13 14. April 2000)<br />

• Gegeben sei ein Inputbild mit den darin mitgeteilten Grauwerten (Abbildung B.8). Das<br />

Inputbild umfasst 5 Zeilen und 7 Spalten. Durch eine geometrische Trans<strong>for</strong>mation des<br />

Bildes gilt es nun, einigen bestimmten Pixeln im Ergebnisbild nach der Trans<strong>for</strong>mation<br />

einen Grauwert zuzuweisen, wobei der Entsprechungspunkt im Inputbild die in Tabelle B.1<br />

angegebenen Zeilen- und Spaltenkoordinaten aufweist. Berechnen Sie (oder ermitteln Sie mit<br />

grafischen Mitteln) den Grauwert zu jedem der Ergebnispixel, wenn eine bilineare Grauwertzuweisung<br />

erfolgt. [#0039]<br />

1 2 3 4 5 6 7<br />

2 3 4 5 6 7 8<br />

3 4 5 6 7 8 9<br />

4 5 6 7 8 9 10<br />

5 6 7 8 9 10 11<br />

Figure B.8: Inputbild<br />

Zeile Spalte<br />

2.5 1.5<br />

2.5 2.5<br />

4.75 5.25<br />

Table B.1: Entsprechungspunkte im Inputbild<br />

(Frage II/16 14. April 2000, Frage II/17 30. März 2001, Frage II/14 14. Dezember 2001)<br />
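Dazu eine kleine Rechen-Skizze in Python; die Grauwerte folgen dem in Abbildung B.8 gezeigten Muster g(Zeile, Spalte) = Zeile + Spalte − 1, die Abfragepunkte stammen aus Tabelle B.1:

```python
import numpy as np

# Grauwerte aus Abbildung B.8 (5 Zeilen, 7 Spalten), 1-basierte Koordinaten
B8 = np.array([[z + s - 1 for s in range(1, 8)] for z in range(1, 6)], float)

def bilinear(bild, zeile, spalte):
    """Bilineare Grauwertzuweisung an einer nicht ganzzahligen Position."""
    z0, s0 = int(np.floor(zeile)), int(np.floor(spalte))
    dz, ds = zeile - z0, spalte - s0
    g = bild[z0 - 1:z0 + 1, s0 - 1:s0 + 1]      # 2x2-Nachbarschaft (1-basiert)
    return ((1 - dz) * (1 - ds) * g[0, 0] + (1 - dz) * ds * g[0, 1]
            + dz * (1 - ds) * g[1, 0] + dz * ds * g[1, 1])

for z, s in [(2.5, 1.5), (2.5, 2.5), (4.75, 5.25)]:
    print(z, s, bilinear(B8, z, s))   # 3.0, 4.0, 9.0
```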

• Zeichnen Sie in Abbildung B.9 jene Pixel ein, die vom Bresenham-Algorithmus erzeugt<br />

werden, wenn die beiden markierten Pixel durch eine (angenäherte) Gerade verbunden werden.<br />

Geben Sie außerdem die Rechenschritte an, die zu den von Ihnen gewählten Pixeln<br />

führen. [#0057]



Figure B.9: Die Verbindung zweier Pixel soll angenähert werden (Pixelraster mit zwei markierten Endpunkten)

Figure B.10: Objekt bestehend <strong>aus</strong> zwei Flächen<br />

(Frage II/11 26. Mai 2000)<br />
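Als Ergänzung eine minimale Skizze des ganzzahligen Bresenham-Algorithmus in Python; die Endpunkte (2, 3) und (10, 7) sind frei gewählt und nicht notwendigerweise die in Abbildung B.9 markierten Pixel:

```python
def bresenham(x0, y0, x1, y1):
    """Ganzzahlige Linienapproximation nach Bresenham (alle Oktanten)."""
    pixel = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                     # Fehlerterm
    while True:
        pixel.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:                  # Schritt in x-Richtung
            err += dy
            x0 += sx
        if e2 <= dx:                  # Schritt in y-Richtung
            err += dx
            y0 += sy
    return pixel

print(bresenham(2, 3, 10, 7))   # (2,3), (3,4), (4,4), (5,5), ...
```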

• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“! [#0069]

(Frage II/12 26. Mai 2000, Frage II/12 10. November 2000, Frage II/15 11. Mai 2001, Frage<br />

II/11 14. Dezember 2001)<br />

• Bei der Erstellung eines Bildes mittels „recursive raytracing“ trifft der Primärstrahl für ein bestimmtes Pixel auf ein Objekt A und wird gemäß Abbildung B.11 in mehrere Strahlen aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die Objekte B, C, D und E treffen. Die Zahlen in den Kreisen sind die lokalen Intensitäten jedes einzelnen Objekts (bzgl. des sie treffenden Strahles), die Zahlen neben den Verbindungen geben die Gewichtung der Teilstrahlen an. Bestimmen Sie die dem betrachteten Pixel zugeordnete Intensität, wenn
1. die Rekursionstiefe nicht beschränkt ist,
2. der Strahl nur genau einmal aufgeteilt wird,
3. die Rekursion abgebrochen wird, sobald die Gewichtung des Teilstrahls unter 15% fällt!
Kennzeichnen Sie bitte für die letzten beiden Fälle in zwei Skizzen diejenigen Teile des Baumes, die zur Berechnung der Gesamtintensität durchlaufen werden! [#0072]

(Frage II/15 26. Mai 2000)



Figure B.11: Aufteilung des Primärstrahls bei „recursive raytracing“ (lokale Intensitäten: A = 2,7, B = 2, C = 3, D = 2, E = 4; Gewichte: A→B 0,1, A→C 0,5, C→D 0,4, C→E 0,1)

Figure B.12: Lineare Transformation M eines Objekts A in ein Objekt B

• In Abbildung B.12 ist ein Objekt A gezeigt, das durch eine lineare Trans<strong>for</strong>mation M in das<br />

Objekt B übergeführt wird. Geben Sie (für homogene Koordinaten) die 3 × 3-Matrix M an,<br />

die diese Trans<strong>for</strong>mation beschreibt (zwei verschiedene Lösungen)! [#0074]<br />

(Frage II/13 26. Mai 2000, Frage II/13 10. November 2000)<br />

• Definieren Sie den Sobel-Operator und wenden Sie ihn auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.13 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.13 eintragen. [#0075]

(Frage II/14 26. Mai 2000)<br />

• Wenden Sie ein 3 × 3-Median-Filter auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.14 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.14 eintragen. [#0080]

(Frage II/11 30. Juni 2000, Frage II/14 10. November 2000)<br />

• In Abbildung B.15 ist ein Objekt gezeigt, dessen Oberflächeneigenschaften nach dem Beleuchtungsmodell von Phong beschrieben werden. Tabelle B.2 enthält alle relevanten Parameter der Szene. Bestimmen Sie für den eingezeichneten Objektpunkt p die vom Beobachter wahrgenommene Intensität I dieses Punktes!
Hinweis: Der Einfachheit halber wird nur in zwei Dimensionen und nur für eine Wellenlänge gerechnet. Zur Ermittlung der Potenz einer Zahl nahe 1 beachten Sie bitte, dass die Näherung (1 − x)^k ≈ 1 − kx für kleine x verwendbar ist. [#0085]

(Frage II/12 30. Juni 2000, Frage II/15 15. Dezember 2000)
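Dazu eine kleine Rechen-Skizze in Python unter der Annahme der üblichen Phong-Formel I = I_a·k_a + I_p·(k_d·(N·L) + k_s·(R·V)^n) mit R = 2(N·L)N − L; die Zahlenwerte stammen aus Tabelle B.2:

```python
import numpy as np

# Zahlenwerte aus Tabelle B.2 (2D, eine Wellenlänge)
kd, ks, n = 0.2, 0.5, 3
L = np.array([-0.6, 0.8])       # Richtung zur Lichtquelle
V = np.array([0.8, 0.6])        # Richtung zum Beobachter
N = np.array([0.0, 1.0])        # Oberflächennormale
Ip = 2.0                        # Intensität der Lichtquelle (Ia = 0)

R = 2 * (N @ L) * N - L                      # Reflexionsrichtung
I = Ip * (kd * (N @ L) + ks * (R @ V) ** n)  # ambienter Term entfällt, da Ia = 0
print(R, round(I, 3))                        # R = [0.6 0.8], I ≈ 1.205 (Näherung aus dem Hinweis: ≈ 1.2)
```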



Figure B.13: Anwendung des Sobel-Operators auf ein Grauwertbild
Figure B.14: Anwendung eines Median-Filters auf ein Grauwertbild

• Ermitteln Sie zu dem Grauwertbild aus Abbildung B.16 eine Bildpyramide, wobei jedem Pixel einer Ebene der Mittelwert der entsprechenden vier Pixel aus der übergeordneten (höher aufgelösten) Ebene zugewiesen wird! [#0088]

(Frage II/13 30. Juni 2000, Frage II/11 10. November 2000, Frage II/12 15. Dezember 2000,<br />

Frage II/14 28. September 2001)<br />

• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein! [#0089]

(Frage II/14 30. Juni 2000, Frage II/15 10. November 2000, Frage II/15 1. Februar 2002)<br />

• Erklären Sie die einzelnen Schritte des Clipping-Algorithmus nach Cohen-Sutherland anhand des Beispiels in Abbildung B.18. Die Zwischenergebnisse mit den half-space Codes sind darzustellen. Es ist jener Teil der Strecke AB zu bestimmen, der innerhalb des Rechtecks R liegt. Die dazu benötigten Zahlenwerte (auch die der Schnittpunkte) können Sie direkt aus Abbildung B.18 ablesen. [#0092]

(Frage II/15 30. Juni 2000)<br />
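Als Ergänzung eine minimale Skizze der Halbraum-Codes (Outcodes) in Python; der iterative Schnitt mit den Rechteckkanten ist nur angedeutet:

```python
LINKS, RECHTS, UNTEN, OBEN = 1, 2, 4, 8

def outcode(x, y, xmin, ymin, xmax, ymax):
    """4-Bit-Halbraum-Code eines Punktes bezüglich des Clip-Rechtecks."""
    code = 0
    if x < xmin: code |= LINKS
    elif x > xmax: code |= RECHTS
    if y < ymin: code |= UNTEN
    elif y > ymax: code |= OBEN
    return code

def trivialfall(c1, c2):
    """Trivial akzeptieren (beide Codes 0) bzw. trivial verwerfen (UND != 0)."""
    if c1 == 0 and c2 == 0:
        return "akzeptieren"
    if c1 & c2:
        return "verwerfen"
    return "Schnitt mit einer Rechteckkante berechnen und wiederholen"
```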

• Gegeben seien die Transformationsmatrix
M = ⎛  0  2  0   0 ⎞
    ⎜  0  0  2   0 ⎟
    ⎜  1  0  0  −5 ⎟
    ⎝ −2  0  0   8 ⎠
und zwei Punkte p1 = (3, −1, 1)^T, p2 = (2, 4, −1)^T in Objektkoordinaten. Führen Sie die beiden Punkte p1 und p2 mit Hilfe der Matrix M in die Punkte p′1 bzw. p′2 in (normalisierten) Bildschirmkoordinaten über (beachten Sie dabei die Umwandlungen zwischen dreidimensionalen und homogenen Koordinaten)! [#0099]



Figure B.15: Beleuchtetes Objekt mit spiegelnder Oberfläche nach dem Phong-Modell (Lichtquelle, Beobachter, Vektoren N, L, V, Punkt p)

Parameter | Formelzeichen | Wert
diffuser Reflexionskoeffizient | k_d | 0.2
Spiegelreflexionskoeffizient | W(θ) = k_s | 0.5
Spiegelreflexionsexponent | n | 3
Richtung zur Lichtquelle | L | (−0.6, 0.8)^T
Richtung zum Beobachter | V | (0.8, 0.6)^T
Oberflächennormalvektor | N | (0, 1)^T
Intensität des ambienten Umgebungslichtes | I_a | 0
Intensität der Lichtquelle | I_p | 2
Table B.2: Parameter für das Phongsche Beleuchtungsmodell in Abbildung B.15

(Frage II/11 13. Oktober 2000)<br />

• Wenden Sie den Clipping-Algorithmus von Cohen-Sutherland (in zwei Dimensionen) auf die in Beispiel B.2 gefundenen Punkte p′1 und p′2 an, um den innerhalb des Quadrats Q = {(0, 0)^T, (0, 1)^T, (1, 1)^T, (1, 0)^T} liegenden Teil der Verbindungsstrecke zwischen p′1 und p′2 zu finden! Sie können das Ergebnis direkt in Abbildung B.19 eintragen und Schnittberechnungen grafisch lösen. [#0100]

(Frage II/12 13. Oktober 2000)<br />

• Das Quadrat Q in normalisierten Bildschirmkoordinaten aus Beispiel B.2 wird in ein Rechteck R mit den Abmessungen 10 × 8 in Bildschirmkoordinaten transformiert. Zeichnen Sie die Verbindung der zwei Punkte p′1 und p′2 in Abbildung B.20 ein und bestimmen Sie grafisch jene Pixel, die der Bresenham-Algorithmus wählen würde, um die Verbindung diskret zu approximieren! [#0102]

(Frage II/13 13. Oktober 2000)<br />

• Zu dem digitalen Rasterbild in Abbildung B.21 soll das Gradientenbild gefunden werden.<br />

Geben Sie einen dazu geeigneten Operator an und wenden Sie ihn auf die Pixel innerhalb des<br />

fett umr<strong>and</strong>eten Rechtecks an. Sie können das Ergebnis direkt in Abbildung B.21 eintragen.<br />

Führen Sie außerdem für eines der Pixel den Rechengang vor. [#0103]<br />

(Frage II/14 13. Oktober 2000)



Figure B.16: Grauwertbild als höchstauflösende Ebene einer Bildpyramide (4 × 4 Grauwerte)
Figure B.17: Polygon für BSP-Darstellung (Eckpunkte 1–4)

• Nehmen Sie an, der Gradientenoperator in Aufgabe B.2 hätte das Ergebnis in Abbildung B.22 ermittelt. Zeichnen Sie das Histogramm dieses Gradientenbildes und finden Sie einen geeigneten Schwellwert, um „Kantenpixel“ zu identifizieren. Markieren Sie in Abbildung B.22 rechts alle jene Pixel (Kantenpixel), die mit diesem Schwellwert gefunden werden. [#0104]

(Frage II/15 13. Oktober 2000)<br />

• Beschreiben Sie mit Hilfe morphologischer Operationen ein Verfahren zur Bestimmung des Randes einer Region. Wenden Sie dieses Verfahren auf die in Abbildung B.23 eingezeichnete Region an und geben Sie das von Ihnen verwendete 3 × 3-Formelement an. In Abbildung B.23 ist Platz für das Endergebnis sowie für Zwischenergebnisse. [#0106]

(Frage II/16 13. Oktober 2000)<br />

• In Abbildung B.24 sind zwei Binärbilder A und B gezeigt, wobei schwarze Pixel logisch „1“ und weiße Pixel logisch „0“ entsprechen. Führen Sie die Boolschen Operationen
1. A and B,
2. A xor B,
3. A minus B
aus und tragen Sie die Ergebnisse in Abbildung B.24 ein! [#0132]

(Frage II/11 15. Dezember 2000, Frage II/15 15. März 2002)<br />

• Gegeben sei ein Farbwert C_RGB = (0.8, 0.5, 0.1)^T im RGB-Farbmodell.
1. Welche Spektralfarbe entspricht am ehesten dem durch C_RGB definierten Farbton?
2. Finden Sie die entsprechende Repräsentation von C_RGB im CMY- und im CMYK-Farbmodell!
[#0134]
(Frage II/13 15. Dezember 2000, Frage II/14 19. Oktober 2001)



Figure B.18: Anwendung des Clipping-Algorithmus von Cohen-Sutherland (Strecke AB, Clip-Rechteck R)
Figure B.19: Clipping nach Cohen-Sutherland (Quadrat Q)

• Bestimmen Sie mit Hilfe der normalisierten Korrelation R_N^2(m, n) jenen Bildausschnitt innerhalb des fett umrandeten Bereichs in Abbildung B.25, der mit der ebenfalls angegebenen Maske M am besten übereinstimmt. Geben Sie Ihre Rechenergebnisse an und markieren Sie den gefundenen Bereich in Abbildung B.25! [#0135]

(Frage II/14 15. Dezember 2000)<br />

• In Abbildung B.26 sehen Sie vier Punkte P1, P2, P3 und P4, die als Kontrollpunkte für eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert t = 1/3, also x(1/3), und erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26 eintragen, eine skizzenhafte Darstellung ist ausreichend.
Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird. [#0164]

(Frage II/13 2. Februar 2001, Frage II/12 9. November 2001, Frage II/15 14. Dezember 2001,<br />

Frage II/14 15. März 2002)<br />
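Dazu eine kleine Skizze des Casteljau-Verfahrens in Python (wiederholte lineare Interpolation); die Kontrollpunkte im Beispiel sind frei gewählt und nicht die aus Abbildung B.26:

```python
def casteljau(punkte, t):
    """Wiederholte lineare Interpolation benachbarter Kontrollpunkte."""
    P = [tuple(p) for p in punkte]
    while len(P) > 1:
        P = [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
             for a, b in zip(P, P[1:])]
    return P[0]

# hypothetische Kontrollpunkte einer kubischen Bezier-Kurve
print(casteljau([(0, 0), (1, 3), (3, 3), (4, 0)], 1 / 3))   # ≈ (1.26, 2.0)
```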

• Berechnen Sie jene Transformationsmatrix M, die eine Rotation um 45° im Gegenuhrzeigersinn um den Punkt R = (3, 2)^T und zugleich eine Skalierung mit dem Faktor √2 bewirkt



R<br />

Figure B.20: Verbindung zweier Punkte nach Bresenham<br />

0 0 0 1 2 2 3<br />

0 1 2 3 3 3 3<br />

1 2 3 7 7 6 3<br />

1 2 7 8 9 8 4<br />

2 2 8 8 8 9 5<br />

Figure B.21: Anwendung eines Gradientenoperators<br />

(wie in Abbildung B.27 veranschaulicht). Geben Sie M für homogene Koordinaten in zwei Dimensionen an (also eine 3 × 3-Matrix), sodass ein Punkt p gemäß p′ = Mp in den Punkt p′ übergeführt wird.
Hinweis: Sie ersparen sich viel Rechen- und Schreibarbeit, wenn Sie das Assoziativgesetz für die Matrixmultiplikation geeignet anwenden. [#0166]

(Frage II/15 2. Februar 2001)<br />
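Dazu eine kleine Skizze in Python, die M als Verkettung T(R) · S(√2) · Rot(45°) · T(−R) aufbaut (Rotationszentrum in den Ursprung verschieben, drehen und skalieren, zurückschieben):

```python
import numpy as np

def T(tx, ty): return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])
def S(f):      return np.array([[f, 0, 0], [0, f, 0], [0, 0, 1.0]])
def Rot(p):
    c, s = np.cos(p), np.sin(p)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

# ins Rotationszentrum verschieben, drehen + skalieren, zurückschieben
M = T(3, 2) @ S(np.sqrt(2)) @ Rot(np.pi / 4) @ T(-3, -2)
print(np.round(M, 6))
# [[ 1. -1.  2.]
#  [ 1.  1. -3.]
#  [ 0.  0.  1.]]
```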

• In der Bildklassifikation wird oft versucht, die unbekannte Wahrscheinlichkeitsdichtefunktion der N bekannten Merkmalsvektoren im m-dimensionalen Raum durch eine Gaußsche Normalverteilung zu approximieren. Hierfür wird die m × m-Kovarianzmatrix C der N Vektoren benötigt. Abbildung B.28 zeigt drei Merkmalsvektoren p1, p2 und p3 in zwei Dimensionen (also N = 3 und m = 2). Berechnen Sie die dazugehörige Kovarianzmatrix C! [#0173]

(Frage II/17 2. Februar 2001)<br />
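Als Ergänzung eine minimale Skizze der Kovarianzberechnung in Python; die Beispielvektoren sind frei gewählt (nicht die aus Abbildung B.28), und die Normierung mit N statt N−1 ist eine Annahme:

```python
import numpy as np

def kovarianzmatrix(P):
    """m x m-Kovarianzmatrix von N Merkmalsvektoren (zeilenweise in P)."""
    mittel = P.mean(axis=0)
    D = P - mittel                   # zentrierte Merkmalsvektoren
    return D.T @ D / P.shape[0]      # Normierung mit N (in der Statistik oft N-1)

# hypothetische Merkmalsvektoren (N = 3, m = 2)
P = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
print(kovarianzmatrix(P))
```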

• Skizzieren Sie das Histogramm des digitalen Grauwertbildes <strong>aus</strong> Abbildung B.29, und kommentieren<br />

Sie Ihre Skizze! [#0176]<br />

(Frage II/12 2. Februar 2001, Frage II/13 19. Oktober 2001, Frage II/12 14. Dezember 2001)



0 1 2 2 0 0 0<br />

2 3 5 7 6 4 2<br />

1 4 8 7 7 7 4<br />

0 6 8 3 2 6 3<br />

0 8 8 1 0 5 0<br />

Figure B.22: Auffinden der Kantenpixel<br />

Figure B.23: R<strong>and</strong> einer Region<br />

• Tragen Sie in die leeren Filtermasken in Abbildung B.30 jene Filterkoeffizienten ein, sodass
1. in Abbildung B.30(a) ein Tiefpassfilter entsteht, das den Gleichanteil des Bildsignals unverändert lässt,
2. in Abbildung B.30(b) ein Hochpassfilter entsteht, das den Gleichanteil des Bildsignals vollständig unterdrückt!
[#0182]
(Frage II/14 2. Februar 2001, Frage II/15 19. Oktober 2001)

• Wenden Sie auf das Binärbild in Abbildung B.31 links die morphologische Operation „Öffnen“ mit dem angegebenen Formelement an! Welcher für das morphologische Öffnen typische Effekt tritt auch in diesem Beispiel auf?
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.31 eintragen. [#0186]

(Frage II/16 2. Februar 2001)<br />

• Gegeben sei ein Farbwert C_RGB = (0.8, 0.4, 0.2)^T im RGB-Farbmodell. Schätzen Sie grafisch die Lage des Farbwertes C_HSV in Abbildung B.32 (also die Entsprechung von C_RGB im HSV-Modell). Skizzieren Sie ebenso die Lage eines Farbwertes C′_HSV, der den gleichen Farbton und die gleiche Helligkeit aufweist wie C_HSV, jedoch nur die halbe Farbsättigung! [#0201]

(Frage II/13 11. Mai 2001, Frage II/13 1. Februar 2002)



Figure B.24: Boolsche Operationen auf Binärbildern (A, B, and, xor, minus)

0 1 1 1 2 2 2<br />

0 0 1 0 1 1 2<br />

1 1 1 0 0 1 2<br />

1 2 2 1 2 1 1<br />

2 2 1 0 1 0 0<br />

0 1 0 0 1 1 0<br />

0 1<br />

1 2<br />

M<br />

Figure B.25: Ermittlung der normalisierten Korrelation<br />

• Abbildung B.33 zeigt einen Graukeil, in dem alle Grauwerte von 0 bis 255 in aufsteigender<br />

Reihenfolge vorkommen, die Breite beträgt 50 Pixel. Zeichnen Sie das Histogramm dieses<br />

Bildes und achten Sie dabei auf die korrekten Zahlenwerte! Der schwarze R<strong>and</strong> in Abbildung<br />

B.33 dient nur zur Verdeutlichung des Umrisses und gehört nicht zum Bild selbst. [#0203]<br />

(Frage II/12 30. März 2001)<br />

• Wenden Sie auf den fett umrandeten Bereich in Abbildung B.34 den Roberts-Operator zur Kantendetektion an! Sie können das Ergebnis direkt in Abbildung B.34 eintragen. [#0204]

(Frage II/14 30. März 2001)<br />

• Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale<br />

Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie<br />

einen Schritt des Algorithmus im Detail anh<strong>and</strong> Ihrer Zeichnung! Wählen Sie den Schwellwert<br />

so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann<br />

vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung<br />

B.35 einzeichnen. [#0207]<br />

(Frage II/13 30. März 2001)<br />

• Gegeben seien eine 4 × 4-Matrix
M = ⎛ 8  0  8  −24 ⎞
    ⎜ 0  8  8    8 ⎟
    ⎜ 0  0  0   24 ⎟
    ⎝ 0  0  1    1 ⎠
sowie vier Punkte p1 = (3, 0, 1)^T, p2 = (2, 0, 7)^T, p3 = (4, 0, 5)^T, p4 = (1, 0, 3)^T im dreidimensionalen Raum. Die Matrix M fasst alle Transformationen zusammen, die zur Überführung eines Punktes p in Weltkoordinaten in den entsprechenden Punkt p′ = M · p



Figure B.26: Konstruktion eines Kurvenpunktes auf einer Bezier-Kurve nach Casteljau
Figure B.27: allgemeine Rotation mit Skalierung (Rotationszentrum R)

in Gerätekoordinaten erforderlich sind (siehe auch Abbildung B.36; die Bildschirmebene und daher die y-Achse stehen normal auf die Zeichenebene). Durch Anwendung der Transformationsmatrix M werden die Punkte p1 und p2 auf die Punkte p′1 = (4, 8, 12)^T und p′2 = (6, 8, 3)^T in Gerätekoordinaten abgebildet. Berechnen Sie in gleicher Weise p′3 und p′4! [#0210]

(Frage II/15 30. März 2001)<br />

• Die vier Punkte aus Aufgabe B.2 bilden zwei Strecken A = p1p2 und B = p3p4, deren Projektionen in Gerätekoordinaten in der Bildschirmebene in die gleiche Scanline fallen. Bestimmen Sie grafisch durch Anwendung des z-Buffer-Algorithmus, welches Objekt (A, B oder keines von beiden) an den Pixelpositionen 0 bis 10 dieser Scanline sichtbar ist!
Hinweis: Zeichnen Sie p1p2 und p3p4 in die xz-Ebene des Gerätekoordinatensystems ein! [#0211]

(Frage II/16 30. März 2001)<br />

• In Abbildung B.37 ist ein Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f(x) = x im angegebenen Koordinatensystem, zur Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil
1. ein lineares Tiefpassfilter F1,



Figure B.28: drei Merkmalsvektoren im zweidimensionalen Raum
Figure B.29: digitales Grauwertbild (Histogramm gesucht)

2. ein lineares Hochpassfilter F2
mit 3 × 3-Filterkernen Ihrer Wahl an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.37 oder als Funktionen f1(x) und f2(x) an! Zeichnen Sie außerdem die von Ihnen verwendeten Filterkerne. Randpixel müssen nicht gesondert berücksichtigt werden. [#0214]

(Frage II/12 11. Mai 2001, Frage II/11 9. November 2001)<br />

• In Abbildung B.38(a) ist ein digitales Grauwertbild gezeigt, in dem mittels normalisierter<br />

Kreuzkorrelation das Strukturelement <strong>aus</strong> Abbildung B.38(b) gesucht werden soll. Markieren<br />

Sie in Abbildung B.38(a) die Position, an der der Wert der normalisierten Kreuzkorrelation<br />

maximal ist! Die Aufgabe ist grafisch zu lösen, es sind keine Berechnungen er<strong>for</strong>derlich.<br />

[#0223]<br />

(Frage II/14 11. Mai 2001)<br />

• Wenden Sie die „medial axis“-Transformation von Blum auf das Objekt in Abbildung B.39 links an! Sie können das Ergebnis direkt in Abbildung B.39 rechts eintragen. [#0226]

(Frage II/16 11. Mai 2001)



(a) Tiefpass<br />

(b) Hochpass<br />

Figure B.30: leere Filtermasken<br />

Formelement<br />

Figure B.31: morphologisches Öffnen<br />

• Gegeben seien eine 3 × 3-Transformationsmatrix
M = ⎛  3  4  2 ⎞
    ⎜ −4  3  1 ⎟
    ⎝  0  0  1 ⎠
sowie drei Punkte a = (2, 0)^T, b = (0, 1)^T, c = (0, 0)^T im zweidimensionalen Raum. Die Matrix M beschreibt in homogenen Koordinaten eine konforme Transformation, wobei ein Punkt p gemäß p′ = Mp in einen Punkt p′ übergeführt wird. Die Punkte a, b und c bilden ein rechtwinkeliges Dreieck, d.h. die Strecken ac und bc stehen normal aufeinander.
1. Berechnen Sie a′, b′ und c′ durch Anwendung der durch M beschriebenen Transformation auf die Punkte a, b und c!
2. Da M eine konforme Transformation beschreibt, müssen auch die Punkte a′, b′ und c′ ein rechtwinkeliges Dreieck bilden. Zeigen Sie, dass dies hier tatsächlich der Fall ist! (Hinweis: es genügt zu zeigen, dass die Strecken a′c′ und b′c′ normal aufeinander stehen.)
[#0229]



Figure B.32: eine Ebene im HSV-Farbmodell (grün, gelb, cyan, weiß, rot, blau, magenta)
Figure B.33: Graukeil (Breite 50 Pixel, Grauwerte 0 bis 255)

(Frage II/17 11. Mai 2001)<br />

• Geben Sie je eine 3 × 3-Filtermaske zur Detektion<br />

1. horizontaler<br />

2. vertikaler<br />

Kanten in einem digitalen Rasterbild an! [#0247]<br />

(Frage II/12 26. Juni 2001, Frage II/11 15. März 2002)<br />

• In Abbildung B.40 ist ein Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f(x) = x im angegebenen Koordinatensystem, zur Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil die in Aufgabe B.2 gefragten Filterkerne an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.40 oder als Funktionen f1(x) und f2(x) an! Randpixel müssen nicht gesondert berücksichtigt werden. [#0248]

(Frage II/14 26. Juni 2001)<br />

• Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.41 links an. Verwenden Sie das angegebene Strukturelement X (Zentrumspixel ist markiert) und definieren Sie ein geeignetes Fenster W! Sie können das Ergebnis direkt in Abbildung B.41 rechts eintragen. [#0255]

(Frage II/16 26. Juni 2001, Frage II/14 1. Februar 2002)<br />

• Gegeben seien eine Kugel mit Mittelpunkt m_S, ein Punkt p_S auf der Kugeloberfläche und eine Lichtquelle an der Position p_L mit der Intensität I_L. Die Intensität soll physikalisch korrekt mit dem Quadrat der Entfernung abnehmen. Die Oberfläche der Kugel ist durch das



9 9 8 8 6 7 6 6<br />

7 8 9 8 7 2 3 1<br />

6 8 7 8 3 2 0 1<br />

8 7 8 2 3 1 1 2<br />

7 6 7 1 0 2 3 1<br />

7 6 8 2 2 1 2 0<br />

Figure B.34: Roberts-Operator<br />

Lambert’sche Beleuchtungsmodell beschrieben, der diffuse Reflexionskoeffizient ist k_d. Die Szene wird von einer synthetischen Kamera an der Position p_C betrachtet. Berechnen Sie die dem Punkt p_S zugeordnete Intensität I_S unter Verwendung der Angaben aus Tabelle B.3!
Hinweis: der Punkt p_S ist von der Kameraposition p_C aus sichtbar, diese Bedingung muss nicht überprüft werden. [#0257]

Parameter | Formelzeichen | Wert
Kugelmittelpunkt | m_S | (−2, 1, −4)^T
Oberflächenpunkt | p_S | (−4, 5, −8)^T
Position der Lichtquelle | p_L | (2, 7, −11)^T
Intensität der Lichtquelle | I_L | 343
diffuser Reflexionskoeffizient | k_d | 1
Position der Kamera | p_C | (−e^2, 13.7603, −4π)^T
Table B.3: Geometrie und Beleuchtungsparameter der Szene

(Frage II/13 26. Juni 2001)<br />
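Dazu eine kleine Rechen-Skizze in Python unter der Annahme der üblichen Lambert-Formel I_S = I_L · k_d · (N·L) / d²; die Zahlenwerte stammen aus Tabelle B.3 (die Kameraposition geht beim reinen Lambert-Modell nicht ein):

```python
import numpy as np

# Zahlenwerte aus Tabelle B.3
mS = np.array([-2.0, 1.0, -4.0])      # Kugelmittelpunkt
pS = np.array([-4.0, 5.0, -8.0])      # Oberflächenpunkt
pL = np.array([2.0, 7.0, -11.0])      # Position der Lichtquelle
IL, kd = 343.0, 1.0

N = (pS - mS) / np.linalg.norm(pS - mS)   # Kugelnormale = Radiusrichtung
zu_L = pL - pS
d = np.linalg.norm(zu_L)                  # Abstand zur Lichtquelle (= 7)
L = zu_L / d
IS = IL * kd * max(N @ L, 0.0) / d**2     # Lambert mit 1/d^2-Abfall
print(IS)                                 # 4/3 ≈ 1.333
```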

• In Abbildung B.42(a) ist eine diskret approximierte Linie eingezeichnet. Erzeugen Sie daraus auf zwei verschiedene Arten eine „drei Pixel dicke“ Linie und beschreiben Sie die von Ihnen verwendeten Algorithmen! Sie können die Ergebnisse direkt in die Abbildungen B.42(b) und B.42(c) einzeichnen. [#0258]

• Geben Sie eine 4 × 4-Matrix für homogene Koordinaten in drei Dimensionen an, die eine<br />

perspektivische Projektion mit dem Projektionszentrum p 0 = (2, 3, −1) T beschreibt!<br />

Hinweis: das Projektionszentrum wird in homogenen Koordinaten auf den Punkt (0, 0, 0, 0) T<br />

abgebildet. [#0260]<br />

(Frage II/17 26. Juni 2001)<br />

• Gegeben seien eine Kugel S (durch Mittelpunkt M und Radius r), ein Punkt p S auf der<br />

Kugeloberfläche und ein Dreieck T (durch die drei Eckpunkte p 1 , p 2 und p 3 ). Berechnen<br />

Sie unter Verwendung der Angaben <strong>aus</strong> Tabelle B.4<br />

1. den Oberflächennormalvektor n S der Kugel im Punkt p S ,<br />

2. den Oberflächennormalvektor n T des Dreiecks!<br />

Eine Normierung der Normalvektoren auf Einheitslänge ist nicht er<strong>for</strong>derlich. [#0262]<br />

(Frage II/13 28. September 2001)



Figure B.35: zweidimensionale Polygonrepräsentation<br />

• Zeichnen Sie in Abbildung B.43 die zweidimensionale Figur ein, die durch den dort angeführten Kettencode definiert ist. Beginnen Sie bei dem mit „ד markierten Pixel. Um welche Art von Kettencode handelt es sich hier (bzgl. der verwendeten Nachbarschaftsbeziehungen)? [#0268]

(Frage II/15 28. September 2001)<br />

• Ein Laserdrucker hat eine Auflösung von 600dpi. Wie viele Linienpaare pro Millimeter sind<br />

mit diesem Gerät einw<strong>and</strong>frei darstellbar (es genügen die Formel und eine grobe Abschätzung)?<br />

[#0269]<br />

(Frage II/12 28. September 2001, Frage II/15 9. November 2001, Frage 8 20. November 2001)<br />
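Dazu die grobe Abschätzung als kurze Rechen-Skizze in Python (Annahme: ein Linienpaar benötigt mindestens zwei Druckpunkte):

```python
dpi = 600
punkte_pro_mm = dpi / 25.4               # 1 Zoll = 25.4 mm  ->  ca. 23.6
linienpaare_pro_mm = punkte_pro_mm / 2   # ein Linienpaar braucht 2 Punkte
print(round(punkte_pro_mm, 1), round(linienpaare_pro_mm, 1))   # 23.6  11.8
```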

• Gegeben seien ein Punkt p_O = (3, −2, −1)^T in Objektkoordinaten sowie die Matrizen
M = ⎛ 4  0  0  −3 ⎞          P = ⎛ 1  0  0  0 ⎞
    ⎜ 0  2  0   4 ⎟              ⎜ 0  1  0  0 ⎟
    ⎜ 0  0  3   6 ⎟              ⎜ 0  0  0  1 ⎟
    ⎝ 0  0  0   1 ⎠              ⎝ 0  0  1  0 ⎠
wobei M die Modellierungs- und P die Projektionsmatrix beschreiben. Berechnen Sie
1. den Punkt p_W = M · p_O in Weltkoordinaten,
2. den Punkt p_S = P · p_W in Bildschirmkoordinaten,
3. die Matrix M′ = P · M!



Figure B.36: Objekt und Kamera im Weltkoordinatensystem (xz-Ebene)
Figure B.37: Graukeil (Grauwerte 0 bis 255, Ausschnitt vergrößert)

Hinweis zu 3: die Multiplikation mit P entspricht hier lediglich einer Zeilenvertauschung. [#0272]

(Frage II/12 19. Oktober 2001)<br />

• In Abbildung B.44 sind vier Punkte A, B, C und D eingezeichnet. Transformieren Sie diese Punkte nach der Vorschrift
x′ = 2x − 3y + xy + 4
y′ = 4x + y − 2xy + 2
und zeichnen Sie Ihr Ergebnis (A′, B′, C′ und D′) direkt in Abbildung B.44 rechts ein! Um welche Art von Transformation handelt es sich hier? [#0277]

(Frage II/11 19. Oktober 2001)<br />

• Abbildung B.45 zeigt ein digitales Rasterbild, das als Textur verwendet wird. Durch die große<br />

Entfernung von der virtuellen Kamera erscheint die Fläche im Verhältnis 1:3 verkleinert,<br />

wobei <strong>aus</strong> Gründen der Effizienz der einfache Sub-Sampling-Algorithmus für die Verkleinerung<br />

verwendet wird. Zeichnen Sie in Abbildung B.45 rechts das Bild ein, wie es am Ausgabegerät



(a)<br />

(b)<br />

Figure B.38: Anwendung der normalisierten Kreuzkorrelation<br />

Figure B.39: Anwendung der medial axis Trans<strong>for</strong>mation<br />

erscheint, und markieren Sie links die verwendeten Pixel. Welchen Effekt können Sie hier<br />

beobachten, und warum tritt er auf? [#0284]<br />

(Frage II/13 9. November 2001)<br />

• Abbildung B.46 zeigt drei digitale Grauwertbilder und deren Histogramme. Geben Sie für<br />

jedes der Bilder B.46(a), B.46(c) und B.46(e) an, welches das dazugehörige Histogramm ist<br />

(B.46(b), B.46(d) oder B.46(f)), und begründen Sie Ihre jeweilige Antwort! [#0285]<br />

(Frage II/14 9. November 2001)<br />

• Zeichnen Sie in Abbildung B.47 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der Veranschaulichung des Halbtonverfahrens übliche Konvention, dass „on“-Pixel durch einen dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0289]

(Frage II/11 1. Februar 2002)<br />

• Zeichnen Sie in Abbildung B.48 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der Veranschaulichung des Halbtonverfahrens übliche Konvention, dass „on“-Pixel durch einen dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0294]

(Frage II/13 1. Februar 2002)<br />

• Wenden Sie auf das Binärbild in Abbildung B.49 links die morphologische Operation „Schließen“ mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf?



Figure B.40: Graukeil (Grauwerte 0 bis 255, Ausschnitt vergrößert)
Figure B.41: Anwendung des Hit-or-Miss-Operators auf ein Binärbild (Strukturelement X)

Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.49 eintragen. [#0297]

(Frage II/12 1. Februar 2002)<br />

• Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.50 links an. Verwenden Sie das angegebene Strukturelement X (Zentrumspixel ist markiert) und definieren Sie ein geeignetes Fenster W! Sie können das Ergebnis direkt in Abbildung B.50 rechts eintragen. [#0301]

(Frage II/12 1. Februar 2002)<br />

• Wenden Sie auf das Binärbild in Abbildung B.51 links die morphologische Operation „Schließen“ mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf?
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.51 eintragen. [#0303]

(Frage II/14 1. Februar 2002)<br />

• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten



Figure B.42: Erstellen dicker Linien ((a) „dünne“ Linie, (b) „dicke“ Linie, Variante 1, (c) „dicke“ Linie, Variante 2)

Parameter Formelzeichen Wert<br />

Kugelmittelpunkt M S (−2, 1, −4) T<br />

Kugelradius r 6<br />

Punkt auf Kugeloberfläche p S (−4, 5, −8) T<br />

Dreieckseckpunkt p 1 (2, 1, 3) T<br />

Dreieckseckpunkt p 2 (3, 5, 3) T<br />

Dreieckseckpunkt p 3 (5, 2, 3) T<br />

Table B.4: Geometrie der Objekte<br />

für das Polygon aus Abbildung B.52 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein! [#0305]

(Frage II/15 1. Februar 2002)<br />

• Gegeben seien das Farbbildnegativ in Abbildung B.53 sowie die durch die Kreise markierten Farbwerte A, B und C laut folgender Tabelle:
Farbe | Farbwert (RGB)
A | (0.6, 0.4, 0.3)^T
B | (0.3, 0.2, 0.1)^T
C | (0.5, 0.3, 0.1)^T
Berechnen Sie die Farbwerte A′, B′ und C′, die das entsprechende Positivbild an den gleichen markierten Stellen wie in Abbildung B.53 aufweist! [#0312]

(Frage II/12 15. März 2002)<br />
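
A quick sketch of the presumably intended computation, inverting each normalized channel (film-base density is ignored in this sketch).

    # With normalised RGB values an ideal negative is inverted channel-wise,
    # C_pos = 1 - C_neg.
    negatives = {"A": (0.6, 0.4, 0.3),
                 "B": (0.3, 0.2, 0.1),
                 "C": (0.5, 0.3, 0.1)}

    for name, rgb in negatives.items():
        positive = tuple(round(1.0 - c, 2) for c in rgb)
        print(name, "->", positive)
    # A -> (0.4, 0.6, 0.7), B -> (0.7, 0.8, 0.9), C -> (0.5, 0.7, 0.9)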

• In Figure B.54 a supervised classification is to be carried out on the basis of given training data and then applied to new data that are also given. The feature space is one-dimensional, i.e. only a single scalar feature has to be considered. The values of this feature are entered in Figure B.54(a) for a 3 x 3 pixel digital gray-value image; Figure B.54(b) shows the corresponding assignments to the classes A and B. The classification is to be performed under the assumption of normally distributed data (Gaussian probability density). Determine the class assignment of the pixels in Figure B.54(c) (enter your result in Figure B.54(d)) and also state your intermediate results.
Hint: the standard deviation σ is the same for both classes and does not have to be computed. [#0315]
(Question II/13, 15 March 2002)

Figure B.43: Definition of a two-dimensional object by the chain-code sequence "221000110077666434544345"
Figure B.44: Transformation of four points (two 10 x 10 coordinate grids; the left grid marks the points A, B, C, and D, the right grid is left empty for the result)
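
A hedged sketch of the 1-D Gaussian classification with equal standard deviations, where the maximum-likelihood decision reduces to "nearest class mean"; the training values below are made up, not taken from Figure B.54.

    import numpy as np

    train = {"A": [2, 3, 1, 4],          # hypothetical class-A samples
             "B": [9, 11, 10, 12]}       # hypothetical class-B samples
    means = {c: np.mean(v) for c, v in train.items()}

    def classify(x: float) -> str:
        # equal sigma for both classes -> pick the class with the nearer mean
        return min(means, key=lambda c: abs(x - means[c]))

    for x in [3, 6, 10]:
        print(x, "->", classify(x))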


Figure B.45: Sub-sampling
Figure B.46: Three digital gray-value images and their histograms ((a) Vancouver, (b) histogram 1, (c) Kluane, (d) histogram 2, (e) Steiermark, (f) histogram 3)


Figure B.47: Halftoning (gray values labelled 0, 1, 2, ..., 9 in ascending order)
Figure B.48: Halftoning (gray values labelled 9, 8, ..., 1, 0 in descending order)
Figure B.49: Morphological closing (binary image and structuring element)
Figure B.50: Application of the hit-or-miss operator to a binary image (structuring element X)
Figure B.51: Morphological closing (binary image and structuring element)
Figure B.52: Polygon for the BSP representation (edges labelled 1 to 4)
Figure B.53: Color-image negative (marked positions A, B, and C)
Figure B.54: Supervised classification ((a) training data, (b) classification, (c) new data, (d) result)


Figure B.55: Rectangle with disturbing objects
Figure B.56: Pixel arrangement

B.3 Group 3

• Figure B.55 shows a rectangular object together with several smaller disturbing objects. Explain a morphological filtering procedure that eliminates the disturbing objects. Use formula notation and illustrate the course of the procedure with graphical sketches. Also show the resulting image. [#0008]
(Question III/20, 14 April 2000)

• Consider the pixel arrangement shown in Figure B.56. Describe, graphically, by a formula, or in words, an algorithm for determining the centroid of this pixel arrangement. [#0010]
(Question III/18, 14 April 2000)

• Consider Figure B.57 with the indicated line-shaped white disturbances. Which correction method do you propose to remove these disturbances? Present the method and justify why it will remove the disturbances. [#0018]
(Question III/19, 14 April 2000)

Figure B.57: Image with disturbances
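
A minimal sketch of one possible centroid computation for a binary pixel arrangement (the arrangement itself is made up).

    import numpy as np

    # The centre of gravity of a binary pixel arrangement is the mean of
    # the foreground pixel coordinates.
    mask = np.array([[0, 1, 1, 0],
                     [0, 1, 1, 1],
                     [0, 0, 1, 0]], dtype=bool)

    rows, cols = np.nonzero(mask)
    centroid = (rows.mean(), cols.mean())
    print("centroid (row, col):", centroid)    # -> (0.833..., 1.833...)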

• Consider the raster representation of an object in Figure B.58, where the object is given only by its three corner points A, B, and C. The brightness of the corner points is I_A = 100, I_B = 50, and I_C = 0. Compute the illumination values according to the Gouraud method in at least five of the pixels that lie entirely within the triangle. [#0035]
(Question III/21, 14 April 2000)

• Consider the gray-value image in Figure B.59. Determine the histogram of this image. Using the histogram, a threshold is to be found that is suitable for segmenting the image into background (small value, dark) and foreground (large value, bright). State the threshold as well as the result of the segmentation in the form of a binary image (with 0 for the background and 1 for the foreground). [#0064]
(Question III/16, 26 May 2000; Question 9, 20 November 2001)

• The transformation matrix M from Figure B.60 is composed of a translation T and a scaling S, i.e. M = T · S (a point p is transformed into the point q according to q = M · p). Determine T, S, and M^-1 (the inverse of M). [#0065]
(Question III/17, 26 May 2000; Question III/16, 10 November 2000; Question III/19, 28 September 2001)

• The lecture discussed depth cues that allow the human visual system to reconstruct the third dimension of a viewed scene, which is lost in the projection onto the retina. In digital image processing this task is solved by various "shape from X" methods. Which depth cues are directly related to a corresponding "shape from X" method, and for which methods of natural or artificial depth estimation can no such relation be established? [#0071]
(Question III/18, 26 May 2000)


Figure B.58: Raster representation of an object (corner points A, B, and C)

Figure B.59: Gray-value image

    1 3 6 2
    5 6 6 1
    6 7 5 2
    6 4 1 0

• Figure B.61 shows a digital raster image that, because of a superimposed disturbance, is brighter in the middle than at the edges. Describe a procedure that removes this disturbance. [#0076]
(Question III/19, 26 May 2000)

• Figure B.62 shows a scanned color-film negative. Which steps are necessary to obtain a correct positive image from it by means of digital image processing? Take into account that the optical density of the film is greater than zero even in unexposed areas. Give the mathematical relationship between the pixel values of the negative and of the positive image. [#0077]
(Question III/20, 26 May 2000)

• At the currently running Styrian provincial exhibition "comm.gr2000az" in Schloss Eggenberg in Graz, a robot is installed that is supposed to catch a ball thrown to it by visitors. In order to close the robot's gripper at the right time and at the right place, the position of the ball must be determined as accurately as possible during its flight. For this purpose two cameras are installed that observe the playing field; a simplified sketch of the set-up is shown in Figure B.63.
Determine the accuracy in the x-, y-, and z-direction with which the ball position marked in Figure B.63 can be determined in space. For simplicity, assume the following camera parameters:
  – focal length: 10 millimeters
  – geometric resolution of the sensor chip: 100 pixels/millimeter
You may dispense with methods for determining the ball position with sub-pixel accuracy. For the uncertainty in the x- and y-direction you may neglect one of the two cameras; for the z-direction you may use the considerations on the uncertainty of binocular depth perception. [#0078]
(Question III/17, 30 June 2000)

Figure B.60: Transformation matrix

        ( 1    0    0    4 )
    M = ( 0   2.5   0   -3 )
        ( 0    0    2    0 )
        ( 0    0    0    1 )

Figure B.61: Digital raster image whose intensity falls off toward the edges
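
A hedged sketch of the accuracy estimate: focal length and sensor resolution come from the question, while the object distance Z and the stereo baseline B are assumptions standing in for values read off Figure B.63.

    # One pixel on the sensor corresponds to roughly Z * pixel / f laterally,
    # and the binocular depth uncertainty is roughly Z^2 * pixel / (f * B).
    f = 0.010                 # focal length: 10 mm
    pixel = 1.0 / 100_000     # 100 pixels/mm  ->  pixel size 10 micrometres
    Z = 4.0                   # assumed camera-to-ball distance in metres
    B = 2.0                   # assumed stereo baseline in metres

    delta_xy = Z * pixel / f
    delta_z = Z ** 2 * pixel / (f * B)
    print(f"lateral uncertainty ~ {delta_xy * 1000:.1f} mm per pixel")
    print(f"depth   uncertainty ~ {delta_z * 1000:.1f} mm per pixel")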

• A coordinate system K_1 is transformed into another coordinate system K_2 by a rotation, so that a point with coordinates p in K_1 is transformed into the point q = Mp in K_2. Table B.5 gives four corresponding points between the two coordinate systems. Determine the 3 x 3 matrix M. (Footnote 2: Homogeneous coordinates offer no advantage here, since there is no translation.)
Hint: note that (since this is a rotation) ||a|| = ||b|| = ||c|| = 1 and furthermore a · b = a · c = b · c = 0, where "·" denotes the dot product. [#0081]

    Point in K_1              Point in K_2
    (0, 0, 0)^T               (0, 0, 0)^T
    a = (a_1, a_2, a_3)^T     (1, 0, 0)^T
    b = (b_1, b_2, b_3)^T     (0, 1, 0)^T
    c = (c_1, c_2, c_3)^T     (0, 0, 1)^T

  Table B.5: Corresponding points between the two coordinate systems K_1 and K_2
(Question III/16, 30 June 2000)

• For homogeneous coordinates, give a 3 x 3 matrix M with as many degrees of freedom as possible that is suitable for transforming the points p of a rigid body (e.g. a block of wood) according to q = Mp (a so-called "rigid body transformation").
Hint: simple geometric relationships are contained in the question in "encoded" form. Had they been stated explicitly, the answer would really be Group I material. [#0090]
(Question III/18, 30 June 2000)

Figure B.62: Color-film negative
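
A small numeric check of the construction suggested by the hint: with orthonormal a, b, c, the matrix that maps a, b, c to the unit vectors simply has a, b, c as its rows (and M^-1 = M^T); the concrete vectors are an arbitrary orthonormal example.

    import numpy as np

    a = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
    b = np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)
    c = np.array([0.0, 0.0, 1.0])

    M = np.vstack([a, b, c])              # rows are a, b, c
    print(np.allclose(M @ a, [1, 0, 0]),
          np.allclose(M @ b, [0, 1, 0]),
          np.allclose(M @ c, [0, 0, 1]))  # -> True True True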

• A regular disturbance (coherent noise) is superimposed on the digital raster image in Figure B.64. Describe a procedure that removes this disturbance. [#0091]
(Question III/19, 30 June 2000)

• The morphological operation "erosion" is to be applied to the binary image shown at the top left of Figure B.65. Show how the duality between erosion and dilation can be used to reduce an erosion to a dilation. (In other words: instead of the erosion, other morphological operations are to be used which, executed one after another in a suitable order, deliver the same result as an erosion.) Enter your result (and your intermediate results) in Figure B.65 and name the operations marked with the numbers 1, 2, and 3. The structuring element to be used is also shown in Figure B.65.
Hint: note that the binary image shown is only a small section of the domain Z². [#0096]
(Question III/20, 30 June 2000; Question III/17, 10 November 2000; Question III/17, 14 December 2001)

• The duality of erosion and dilation with respect to complementation and reflection can be formulated by the equation
    (A ⊖ B)^c = A^c ⊕ B̂
Why is the reflection (B̂) important in this equation? [#0107]
(Question III/18, 13 October 2000)

• The photo shown in Figure B.66 has little contrast and therefore looks somewhat "flat".
1. Give a procedure that improves the contrast of the image.
2. Which other possibilities are there to improve the quality of the image as perceived by a human?
Is the information content of the image also increased by these methods? Justify your answer. [#0108]
(Question III/20, 13 October 2000; Question III/19, 10 November 2000)
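
A sketch of the duality these two questions are about, with explicit Minkowski definitions of erosion and dilation; the test image and the deliberately asymmetric structuring element are made up.

    import numpy as np

    # (A erosion B)^c = A^c dilation B_hat: an erosion can be replaced by
    # complement -> dilation with the REFLECTED structuring element ->
    # complement.
    def erode(X, offsets):            # {p : p + s in X for all s in offsets}
        P = np.pad(X, 2)
        return np.array([[all(P[i + 2 + di, j + 2 + dj] for di, dj in offsets)
                          for j in range(X.shape[1])] for i in range(X.shape[0])])

    def dilate(X, offsets):           # {p : p - s in X for some s in offsets}
        P = np.pad(X, 2)
        return np.array([[any(P[i + 2 - di, j + 2 - dj] for di, dj in offsets)
                          for j in range(X.shape[1])] for i in range(X.shape[0])])

    A = np.zeros((7, 7), dtype=bool)
    A[2:6, 1:6] = True
    B     = [(0, 0), (-1, 0), (0, -1), (-1, -1)]      # structuring element
    B_hat = [(-di, -dj) for di, dj in B]              # its reflection

    print(np.array_equal(erode(A, B), ~dilate(~A, B_hat)))   # -> True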


Figure B.63: Simplified set-up of the ball-catching robot at the provincial exhibition comm.gr2000az (the sketch labels the robot, cameras 1 and 2, the ball's trajectory and current position, the coordinate axes x, y, z, and distances of 4 m and 2 m)

• How do the following manifest themselves to the human eye:
1. too low a geometric resolution,
2. too low a gray-value resolution
of a digital raster image? [#0113]
(Question III/19, 13 October 2000; Question III/20, 10 November 2000; Question III/20, 28 September 2001)

• What statements can be made about the sums of the mask components of a ("reasonable") low-pass or high-pass filter? Justify your answer. [#0114]
(Question III/17, 13 October 2000; Question III/18, 10 November 2000)

• A color value C_RGB = (R, G, B)^T in the RGB color model is converted into the corresponding value C_YIQ = (Y, I, Q)^T in the YIQ color model according to the following rule:

              ( 0.299   0.587   0.114 )
    C_YIQ  =  ( 0.596  -0.275  -0.321 ) · C_RGB
              ( 0.212  -0.528   0.311 )

Which biological fact is expressed by the first row of this matrix? (Hint: consider where the YIQ color model is used and what meaning the Y component has in that context.) [#0120]
(Question III/19, 14 December 2001)

Figure B.64: Image with superimposed coherent noise
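
A quick numeric illustration of the first matrix row: Y is a weighted sum of R, G, B whose weights mirror the eye's different sensitivity to the three primaries (largest for green).

    import numpy as np

    RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                           [0.596, -0.275, -0.321],
                           [0.212, -0.528,  0.311]])

    for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)),
                      ("blue", (0, 0, 1)), ("white", (1, 1, 1))]:
        y, i, q = RGB_TO_YIQ @ np.array(rgb, dtype=float)
        print(f"{name:5s}  Y = {y:.3f}")
    # green appears brightest (Y = 0.587), blue darkest (Y = 0.114),
    # and white sums to Y = 1.000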

• To strengthen the effect of the morphological opening (A ◦ B), the underlying operations (erosion and dilation) can be executed repeatedly (footnote 3: apart from enlarging the mask element B, that is). Which of the following two procedures leads to the desired result?
1. First the erosion is executed n times and then the dilation n times, i.e.
       ((A ⊖ B) ⊖ ... ⊖ B) ⊕ B ⊕ ... ⊕ B      (n times ⊖, then n times ⊕)
2. The erosion is executed and then the dilation, and this process is repeated n times, i.e.
       (((A ⊖ B) ⊕ B) ... ⊖ B) ⊕ B            (⊖ and ⊕ alternating, n times each)
Justify your answer and explain why the other procedure fails. [#0126]
(Question III/16, 15 December 2000; Question III/20, 11 May 2001; Question III/16, 14 December 2001; Question III/17, 15 March 2002)

• Exercise B.1 asked for geometric surface properties that are not suitable for visualization by means of a texture. Assume a texture were used improperly to represent such properties. Which artifacts are typical for such cases? [#0129]
(Question III/17, 15 December 2000)

• Figure B.67 shows the sketch, familiar from the lecture, of the effect of morphological opening on an object (Figure B.67(a) is transformed into Figure B.67(b) by opening with the structuring element shown). How do the rounded corners in Figure B.67(b) come about, and how could their occurrence be prevented? [#0149]
(Question III/18, 15 December 2000; Question III/23, 30 March 2001)

Figure B.65: Alternative computation of the morphological erosion (the intermediate steps are labelled 1, 2, and 3; the structuring element is also shown)

• Figure B.68 shows a line segment g between the points A and B as well as two further points C and D. Compute the distance (shortest Euclidean distance) between g and the points C and D, respectively. [#0150]
(Question III/19, 15 December 2000)

• Figure B.69 shows a digital raster image together with the results of applying three different filter operations. Find the operations that, applied to Figure B.69(a), led to Figures B.69(b), B.69(c), and B.69(d), and describe the properties of the result images by which you recognized the filters. [#0151]
(Question III/20, 15 December 2000; Question III/19, 19 October 2001)

• There is an analogy between the application of a filter and the reconstruction of a discretized image function. Explain this claim. [#0158]
(Question 4, 16 January 2001; Question III/18, 14 December 2001)

• The lecture explained two methods for determining the eight parameters of a bilinear transformation in two dimensions:
1. exact determination of the parameter vector u when exactly four input/output point pairs are given,
2. approximate determination of the parameter vector u when more than four input/output point pairs are given ("least squares method").
However, the least squares method can also be applied when exactly four input/output point pairs are given. Show that in this case one obtains the same result as with the first method. What is the geometric meaning of this observation?
Hint: consider why the least squares method bears that name. [#0163]
(Question III/23, 2 February 2001)

Figure B.66: Photo with low contrast
Figure B.67: Morphological opening ((a) original object, (b) result; the structuring element is also shown)
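
A small sketch of the point-to-segment distance computation (clamping the projection parameter to [0, 1]); the coordinates are made up rather than read from Figure B.68.

    import numpy as np

    def point_segment_distance(P, A, B):
        P, A, B = (np.asarray(v, dtype=float) for v in (P, A, B))
        AB = B - A
        t = np.clip(np.dot(P - A, AB) / np.dot(AB, AB), 0.0, 1.0)
        return np.linalg.norm(P - (A + t * AB))

    A, B = (4, 2), (8, 8)                 # hypothetical endpoints of g
    for name, P in [("C", (9, 3)), ("D", (6, 2))]:
        print(name, "->", round(point_segment_distance(P, A, B), 3))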

• Figure B.70 shows a cylinder with a coaxial bore. Give two different ways of describing this object with the help of a sweep representation. [#0165]
(Question III/19, 2 February 2001; Question III/18, 19 October 2001)

• Exercise B.1 asked about an image representation in which an image is stored repeatedly, with the side length of each image being exactly half the side length of the preceding image. Derive an upper bound, as tight as possible, for the total memory requirement of such a representation, where
  – the first (largest) image consists of N x N pixels,
  – all images are considered gray-value images with 8 bits per pixel,
  – possible compression is not to be taken into account.
Hint: use the equation ∑_{i=0}^{∞} q^i = 1/(1 - q) for q ∈ R, 0 < q < 1. [#0171]
(Question III/18, 2 February 2001; Question III/20, 1 February 2002)
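
A quick numeric check of the bound suggested by the hint: with one byte per pixel the pyramid needs N² · (1 + 1/4 + 1/16 + ...) bytes, which the geometric series bounds by (4/3) · N².

    N = 1024
    total = sum((N // 2 ** i) ** 2 for i in range(11))   # levels down to 1x1
    bound = N ** 2 * 4 / 3
    print(total, "<=", bound, "->", total <= bound)      # True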


Figure B.68: Distance computation (coordinate grid showing the segment g between the points A and B and the points C and D)

• Given a plane ε and an arbitrary (connected) polyhedron P in three-dimensional space, how can one easily determine whether the plane intersects the polyhedron (i.e. P ∩ ε ≠ {})? [#0172]
(Question III/21, 2 February 2001)

• Let p(x), x ∈ R², be the probability density function of a Gaussian normal distribution whose parameters were estimated from the three feature vectors p_1, p_2, and p_3 of Exercise B.2. Furthermore, let two points x_1 = (0, 3)^T and x_2 = (3, 6)^T in the feature space be given. Which of the following two statements is correct (justify your answer):
1. p(x_1) < p(x_2)
2. p(x_1) > p(x_2)
Hint: draw the two points x_1 and x_2 in Figure B.28 and consider in which direction the eigenvectors of the covariance matrix C from Exercise B.2 point. [#0174]
(Question III/22, 2 February 2001)

• The digital raster image in Figure B.71 is to be segmented, with the two buildings forming the foreground and the sky the background. Since the histograms of foreground and background overlap strongly, a simple gray-value segmentation cannot succeed here. Which other image properties can be used to distinguish foreground and background in Figure B.71 nevertheless? [#0181]
(Question III/20, 2 February 2001; Question III/20, 14 December 2001)

• The lecture pointed out that matrix multiplication is in general not commutative, i.e. for two transformation matrices M_1 and M_2 we have M_1 · M_2 ≠ M_2 · M_1. If, however, one considers two 2 x 2 rotation matrices R_1 and R_2 in the two-dimensional case, then R_1 · R_2 = R_2 · R_1 does hold. Give a geometric or mathematical justification for this fact.
Hint: note that the center of rotation lies at the origin of the coordinate system. [#0192]
(Question III/18, 30 March 2001; Question III/16, 9 November 2001)

• Suppose you had to apply the morphological operations "erosion" and "dilation" to a binary image but only have an ordinary image-processing package at your disposal that does not support these operations directly. Show how the erosion and the dilation can be expressed by a convolution followed by thresholding.
Hint: the convolution operation sought is most comparable to a low-pass filter. [#0197]
(Question III/19, 30 March 2001)

Figure B.69: Different filter operations ((a) original image, (b) filter 1, (c) filter 2, (d) filter 3)
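
A sketch of the convolution-plus-threshold construction for a symmetric 3 x 3 structuring element: convolving the 0/1 image with the 0/1 mask counts the foreground pixels under the mask, so erosion and dilation fall out of two different thresholds (the test image is made up).

    import numpy as np
    from scipy.ndimage import convolve, binary_erosion, binary_dilation

    img = np.zeros((7, 7), dtype=np.uint8)
    img[2:6, 1:6] = 1
    selem = np.ones((3, 3), dtype=np.uint8)

    counts = convolve(img, selem, mode="constant", cval=0)
    erosion = counts == selem.sum()      # all mask pixels are foreground
    dilation = counts >= 1               # at least one mask pixel is foreground

    # sanity check against the built-in morphological operators
    print(np.array_equal(erosion, binary_erosion(img, selem)),
          np.array_equal(dilation, binary_dilation(img, selem)))   # True True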

• Given a two-dimensional object whose centroid lies at the origin of the coordinate system. A translation T and a scaling S are now to be applied "simultaneously", where

        ( 1  0  t_x )          ( s  0  0 )
    T = ( 0  1  t_y ) ,    S = ( 0  s  0 ) .
        ( 0  0   1  )          ( 0  0  1 )

After the transformation the object should appear enlarged according to S, and the centroid should have been shifted according to T. We seek a matrix M that transforms a point p of the object into a point p' = M · p of the transformed object according to the rule above. Which is the correct solution:
1. M = T · S
2. M = S · T
Justify your answer and state M. [#0198]
(Question III/22, 30 March 2001)

Figure B.70: Cylinder with a coaxial bore

• Figure B.72 shows a perspectively distorted checkerboard-like pattern. Explain how the artifacts at the upper edge of the image come about, and describe one way of preventing their occurrence. [#0205]
(Question III/21, 30 March 2001; Question III/20, 19 October 2001)

• Why is the sum of the mask elements of a pure high-pass filter always equal to zero and that of a pure low-pass filter always equal to one? [#0212]
(Question III/20, 30 March 2001)

• Exercise B.1 asked about the terms "Phong shading" and "Phong illumination". Describe a situation in which both concepts are usefully employed together. [#0218]
(Question III/19, 11 May 2001)

• If a median filter is applied to each of the three color channels of a digital (RGB) color image, the result is visually similar to a median-filtered gray-value image. Which property of the median filter is lost, however, in such an application to color images? Justify your answer. [#0221]
(Question III/18, 26 June 2001)

• As in Question B.2, apply a 3 x 3 median filter F_3 to the gray wedge in Figure B.37 and justify your answer. [#0222]
(Question III/18, 11 May 2001)

• 1. Comment on the effect of the high noise level of Figure B.38(a) (from Exercise B.2) on the normalized cross-correlation.
2. What result would one obtain when applying the normalized cross-correlation with the same structuring element (Figure B.38(b)) to the rotated image in Figure B.73? Justify your answer. [#0224]
(Question III/21, 11 May 2001)

Figure B.71: Segmentation of a gray-value image
Figure B.72: Artifacts in a checkerboard-like pattern
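
A hedged sketch of what gets lost in the channel-wise median: it can create a color that occurs nowhere in the input, because the three channel medians may stem from different pixels; the tiny image below is constructed to force exactly that.

    import numpy as np
    from scipy.ndimage import median_filter

    R = np.array([[  0,   0,   0],
                  [  0, 100, 200],
                  [200, 200, 200]], dtype=np.uint8)
    G = np.array([[ 50,  50,  50],
                  [ 50,   0,  50],
                  [ 50,  50,  50]], dtype=np.uint8)
    B = np.zeros((3, 3), dtype=np.uint8)
    img = np.dstack([R, G, B])

    filtered = np.dstack([median_filter(ch, size=3) for ch in (R, G, B)])
    center = tuple(int(v) for v in filtered[1, 1])
    print("filtered center pixel:", center)           # (100, 50, 0)
    print("present in the input image:",
          any((img[i, j] == center).all() for i in range(3) for j in range(3)))
    # -> False: per pixel, the filtered color no longer has to be one of the
    #    colors that actually occur in the image, unlike in the gray case.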

• Which color lies "in the middle" if one interpolates linearly between the colors yellow and blue in the RGB color space? Which color space would be better suited for such an interpolation, and which color would lie between yellow and blue in that color space? [#0227]
(Question III/23, 11 May 2001)

• Figure B.74(a) shows the castle in Budmerice (Slovakia), in which a student seminar (footnote 4: Interested students from the computer graphics specialization have the opportunity to take part in this seminar free of charge and to present their seminar/project or diploma thesis there.) and the Spring Conference on Computer Graphics take place every year. Figure B.74(b) was generated from it by an automatic process, whereby some details (e.g. the clouds in the sky) were clearly enhanced. Name an operation that could have been applied here and comment on how it works. [#0228]
(Question III/22, 11 May 2001)

Figure B.73: Application of the normalized cross-correlation to a rotated image
Figure B.74: Automatic contrast enhancement ((a) original image, (b) improved version)

• Question B.1 established that the mapping of a three-dimensional object onto the two-dimensional image plane can be described by a chain of transformations. Explain mathematically how this process can be optimized by using the associative law of matrix multiplication. [#0237]
(Question III/19, 26 June 2001)

• The lecture discussed the operations "threshold" and "median", to be applied to digital raster images. What is the relationship between these two operations in the context of filtering? [#0244]
(Question III/20, 26 June 2001)

Figure B.75: Blurred edge in a digital gray-value image

    1 1 1 1 3 7 7 7 7
    1 1 1 1 3 7 7 7 7
    1 1 1 1 3 7 7 7 7
    1 1 1 1 3 7 7 7 7
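
A small numeric illustration of the associativity argument: collapsing the chain into one matrix once and applying it to all points gives the same result as applying the matrices one after another (the matrices and points are random examples).

    import numpy as np

    rng = np.random.default_rng(0)
    M1, M2, M3 = (rng.standard_normal((4, 4)) for _ in range(3))
    points = rng.standard_normal((4, 1000))        # homogeneous 3D points

    step_by_step = M3 @ (M2 @ (M1 @ points))       # three passes over the data
    combined = (M3 @ M2 @ M1) @ points             # precompose once, apply once
    print(np.allclose(step_by_step, combined))     # True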

• In order to assign the correct brightness to a point p on the surface of a three-dimensional object, all realistic illumination models need the surface normal vector n at this point p. If the object is now subjected to a geometric transformation so that the point p is transformed into the point p' = Mp (footnote 5: This particular example is easier to solve in Cartesian coordinates than in homogeneous coordinates; we therefore consider only 3 x 3 matrices, without a translation component.), the normal vector also changes, namely according to n' = (M^-1)^T n. Give a mathematical justification for this claim.
Hint: the tangent planes defined by p and n before and after the transformation are given in matrix notation by the equations n^T x = n^T p and n'^T x' = n'^T p', respectively. [#0250]
(Question III/21, 26 June 2001)

• Figure B.75 shows an enlarged section of a digital gray-value image representing a blurred edge. Describe what this edge looks like if
1. a linear low-pass filter,
2. a median filter
with mask size 3 x 3 is applied to the image several times in succession. Justify your answer. [#0251]
(Question III/22, 26 June 2001; Question III/19, 9 November 2001; Question III/16, 1 February 2002)

• In four-color printing, let a color value be given by 70% cyan, 20% magenta, 50% yellow, and 30% black. Convert the color value to the RGB color model and describe the hue in words. [#0252]
(Question III/23, 26 June 2001)

• In four-color printing, let a color value be given by 70% cyan, 0% magenta, 50% yellow, and 30% black. Convert the color value to the RGB color model and describe the hue in words. [#0261]
(Question III/20, 9 November 2001)

• Sketch the histogram of a
1. dark,
2. bright,
3. low-contrast,
4. high-contrast
monochrome digital raster image. [#0263]
(Question III/16, 28 September 2001)
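
A hedged sketch for the two four-color-print questions: adding the black component K back onto C, M, Y (clamped at 1) and complementing is one common convention, which may or may not be exactly the conversion used in the lecture script.

    def cmyk_to_rgb(c, m, y, k):
        # undo black generation by adding K back, then complement to RGB
        return tuple(round(1.0 - min(1.0, v + k), 2) for v in (c, m, y))

    print(cmyk_to_rgb(0.7, 0.2, 0.5, 0.3))   # -> (0.0, 0.5, 0.2): a dark green
    print(cmyk_to_rgb(0.7, 0.0, 0.5, 0.3))   # -> (0.0, 0.7, 0.2): a green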

• Many algorithms in computer graphics require a distinction between the "front side" and the "back side" of a triangle (e.g. BSP tree, back-face culling, etc.). How can the surface normal vector of a triangle be used to formulate this distinction mathematically (i.e. with which method can one determine, for a given point p, on which side of a likewise given triangle T it lies)? State, in addition, whether under this definition the vector n_T from Exercise 2 points into the half-space facing the front side or the back side of the triangle. Justify your answer. [#0264]
(Question III/17, 28 September 2001)

• Explain how a monochrome digital raster image representing a black-and-white film negative can be converted into the corresponding positive image by manipulating its histogram. [#0267]
(Question III/18, 28 September 2001)
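
A minimal sketch of the usual side test: the sign of n · (p - v0), with n the triangle normal from the cross product of two edges, tells on which side of the triangle's plane a point lies (triangle and points are made up).

    import numpy as np

    v0, v1, v2 = (np.array([0., 0., 0.]), np.array([1., 0., 0.]),
                  np.array([0., 1., 0.]))
    n = np.cross(v1 - v0, v2 - v0)         # here (0, 0, 1)

    def side(p):
        d = np.dot(n, np.asarray(p, dtype=float) - v0)
        return "front" if d > 0 else "back" if d < 0 else "in the plane"

    for p in [(0.2, 0.2, 1.0), (0.2, 0.2, -1.0), (5.0, 5.0, 0.0)]:
        print(p, "->", side(p))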

• Figure B.76 shows the histograms of two different digital gray-value images A and B. Assume the operation "histogram equalization" were now applied to both images, producing the new images A' and B'.
1. Sketch the histograms of A' and B'.
2. Comment on the effect of the histogram equalization on images A and B with respect to brightness and contrast.
Justify your answers! [#0270]
(Question III/17, 19 October 2001)

Figure B.76: Histograms of two different images ((a) histogram of image A, (b) histogram of image B)

• In the perspective transformation, distant objects are imaged smaller, yet straight lines remain straight lines in the projection. Give a mathematical justification of this property using the projection matrix

        ( 1  0  0  0 )
    M = ( 0  1  0  0 ) ,
        ( 0  0  0  1 )
        ( 0  0  1  0 )

which transforms a point p into the point p' according to p' = Mp.
Hint: the x- and z-coordinates of a straight line are related by the equation x = kz + d (special cases may be neglected). Show that after the transformation x' = k'z' + d' holds, and proceed analogously for y. [#0271]
(Question III/16, 19 October 2001)
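
A numeric check of the property to be proven: points on a 3-D line x = kz + d, y = mz + e remain collinear after applying M and the homogeneous division (k, d, m, e are arbitrary).

    import numpy as np

    M = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    k, d, m, e = 2.0, 1.0, -0.5, 3.0

    zs = np.array([1.0, 2.0, 4.0, 8.0])
    pts = np.stack([k * zs + d, m * zs + e, zs, np.ones_like(zs)])  # homogeneous
    proj = M @ pts
    proj = proj[:3] / proj[3]          # x' = x/z, y' = y/z, z' = 1/z

    # x' is again an affine function of z' (and likewise y'):
    print(np.allclose(proj[0], d * proj[2] + k))   # True
    print(np.allclose(proj[1], e * proj[2] + m))   # True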

• Figure B.77 shows a torus with a structured surface, with the light source once to the left (Figure B.77(a)) and once to the right (Figure B.77(b)) of the object. For clarity, enlarged details are shown in Figures B.77(c) and B.77(d). Which technique was used to visualize the surface structure, and what are the typical properties by which the method can be recognized here? [#0282]
(Question III/18, 9 November 2001)

Figure B.77: Torus with surface structure ((a) illumination from the left, (b) illumination from the right, (c) detail of Figure B.77(a), (d) detail of Figure B.77(b))


• The morphological dilation A ⊕ B can be written as
    A ⊕ B = ⋃_{x ∈ A} B_x ,
i.e. as the set union of the mask element B shifted to every pixel x ∈ A. Using this definition, show the commutativity of the dilation, i.e. A ⊕ B = B ⊕ A.
Hint: write A ⊕ B = A ⊕ (B ⊕ E), where E is the 1 x 1 pixel "unit mask element" that leaves the object unchanged under dilation. [#0283]
(Question III/17, 9 November 2001)

• Which of the following transformations can be represented in homogeneous coordinates by a matrix multiplication (x' = M · x)? Justify your answer!
  – translation
  – perspective projection
  – rotation
  – bilinear transformation
  – shear
  – scaling
  – bicubic transformation
[#0290]
(Question III/18, 15 March 2002)
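
A tiny numeric check of the commutativity, using the set definition directly on two made-up point sets.

    # A dilation B = union over x in A of B shifted to x, i.e. all sums x + b.
    def dilate(A, B):
        return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

    A = {(0, 0), (1, 0), (1, 1)}
    B = {(0, 0), (0, 2), (5, 5)}
    print(dilate(A, B) == dilate(B, A))   # True: the Minkowski sum is symmetric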

• Given the matrix

        ( 2  -2   3 )
    M = ( 2   2  -4 ) ,
        ( 0   0   1 )

with whose help a point p in two-dimensional space, in homogeneous coordinates, is transformed into a point p̃' = M · p̃. In Cartesian coordinates this operation can alternatively be written as
    p' = s · R(ϕ) · p + t ,
where s is the scaling factor, R(ϕ) the rotation matrix (rotation angle ϕ), and t the translation vector. Determine s, ϕ, and t. [#0292]
(Question III/19, 1 February 2002)

• The eye of the Canadian bighorn sheep in Figure B.78(a) is shown enlarged in Figures B.78(b) to B.78(d) (footnote 6: The detail was contrast-stretched to make the results clearer.). Nearest-neighbor, bilinear, and bicubic interpolation were used for the enlargement. Assign these interpolation methods to the three images B.78(b) to B.78(d) and justify your answer. [#0293]
(Question III/16, 1 February 2002)
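
A worked check of the decomposition using the matrix values from the question: the upper-left 2 x 2 block equals s · R(ϕ), so s and ϕ follow from its first column and t is the last column.

    import math

    M = [[2, -2,  3],
         [2,  2, -4],
         [0,  0,  1]]

    s = math.hypot(M[0][0], M[1][0])                    # length of first column
    phi = math.degrees(math.atan2(M[1][0], M[0][0]))    # its direction
    t = (M[0][2], M[1][2])
    print(f"s = {s:.4f} (= 2*sqrt(2)), phi = {phi:.1f} degrees, t = {t}")
    # -> s = 2.8284, phi = 45.0 degrees, t = (3, -4)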


• Suppose you are the manager of the company Rasen&Mäher and, for an advertising campaign, have to obtain offers from print shops for a single-color green poster. Print shop 1 offers the poster in the color C^(1)_CMYK, print shop 2 makes an offer for a poster of the color C^(2)_CMYK, where
    C^(1)_CMYK = (0.6, 0.1, 0.7, 0.0)^T ,
    C^(2)_CMYK = (0.2, 0.0, 0.3, 0.3)^T .
Which print shop would you award the contract to if
1. the lowest possible production costs,
2. the most intense hue possible
is the selection criterion? Justify your answer! [#0295]
(Question III/19, 1 February 2002)

• Suppose you are the manager of the company Rasen&Mäher and, for an advertising campaign, have to obtain offers from print shops for a single-color green poster. Print shop 1 offers the poster in the color C^(1)_CMYK, print shop 2 makes an offer for a poster of the color C^(2)_CMYK, where
    C^(1)_CMYK = (0.5, 0.0, 0.6, 0.1)^T ,
    C^(2)_CMYK = (0.5, 0.3, 0.6, 0.0)^T .
Which print shop would you award the contract to if
1. the lowest possible production costs,
2. the most intense hue possible
is the selection criterion? Justify your answer! [#0300]
(Question III/17, 1 February 2002)

• The eye of the Canadian bighorn sheep in Figure B.79(a) is shown enlarged in Figures B.79(b) to B.79(d) (footnote 7: The detail was contrast-stretched to make the results clearer.). Nearest-neighbor, bilinear, and bicubic interpolation were used for the enlargement. Assign these interpolation methods to the three images B.79(b) to B.79(d) and justify your answer. [#0304]
(Question III/18, 1 February 2002)

• The BSP tree shown in Figure B.80 describes a two-dimensional polygon. The separating planes (or lines, since we are considering the two-dimensional case) in each node are given by equations of the form ax + by = c, where the outside is characterized by the inequality ax + by > c and the inside by ax + by < c. Furthermore (as shown in Figure B.80), the "outside" paths lead to the left and the "inside" paths to the right.
In a suitably labelled coordinate system, draw the polygon described by this BSP tree and indicate which edge belongs to which equation. [#0307]
(Question III/20, 15 March 2002)


• Explain the terms "cutoff frequency" and ideal vs. non-ideal filter in the context of digital raster images. How are these concepts related to the appearance of a filter's output image? [#0309]
(Question III/19, 15 March 2002)

• In Figure B.81 the well-known Stanford bunny is rendered with three different illumination models. Which illumination models are used in Figures B.81(a), B.81(b), and B.81(c)? By which properties of the images did you recognize the illumination models in question? [#0310]
(Question III/16, 15 March 2002)


Figure B.78: Enlargement of an image detail using different interpolation methods ((a) original image, (b) method 1, (c) method 2, (d) method 3)
Figure B.79: Enlargement of an image detail using different interpolation methods ((a) original image, (b) method 1, (c) method 2, (d) method 3)
Figure B.80: BSP tree (the inner nodes carry plane equations of the form ax + by = c; "out" branches lead to the left, "in" branches to the right)
Figure B.81: Rendering of a 3D model using different illumination models ((a) model 1, (b) model 2, (c) model 3)


Index<br />

xy-stitching, 50<br />

z-Buffer, 209<br />

Weber-Ratio, 87<br />

8-Code, 190<br />

absolute transformation, 232

Abtastung, 34, 117<br />

Active Vision, 265<br />

active vision, 265<br />

Affine matching, 285<br />

anaglyphs, 230<br />

anterior chamber, 45<br />

Anti-Blur filter, 259<br />

Approximation, 179, 287, 289<br />

approximation, 155<br />

Auflösung, geometrische, 34, 117, 283, 284<br />

Auflösung, radiometrische, 34, 284<br />

Augenabst<strong>and</strong>, 229<br />

Augmented Reality, 256, 285<br />

augmented reality, 56<br />

Augmented Reality, 57, 256, 287

back-face culling, 208<br />

basis matrix, 177<br />

Basisfunktion, 177<br />

Bezier-Kurve, 181, 307<br />

Bezier-Kurven, 179<br />

bi-directional reflectivity function, 222<br />

Bilderkennung, 35<br />

Bildmodell, 34, 283<br />

bilineare Transformation, 164, 329

binokulares Sehen, 231, 297<br />

blending functions, 177<br />

Blending Funktionen, 177<br />

blind spot, 45<br />

Boundary-Representation, 196, 298<br />

bounding boxes, 208<br />

box function, 134<br />

Bresenham-Algorithmus, 65, 283, 297, 301<br />

BSP-Tree, 198, 300, 315<br />

Bump-Mapping, 148, 285<br />

bump-mapping, 148<br />

cabinet, 171<br />

calibration, 256<br />

Casteljau-Algorithmus, 181, 307<br />

cavalier, 171<br />

chain code, 312, 317<br />

chain-code, 189<br />

chromaticity, 91<br />

classification<br />

supervised, 291<br />

unsupervised, 291<br />

Clipping, 167<br />

clipping, 165<br />

CMY-Farbmodell, 95, 302<br />

CMYK-Farbmodell, 93, 95, 289, 302<br />

Cohen-Sutherland, 167, 300, 301

color model, 107, 292<br />

Computer Graphics/Visualization, 57
computer-aided tomographic, 207
Computergrafik und Bildanalyse, 266, 284
Computertomografie, 53, 286

cone-tree, 263<br />

cones, 45<br />

control points, 180, 289<br />

convolution, 130<br />

cornea, 45<br />

CSG, 199, 296<br />

cut-off-frequency, 134

cutoff-Frequenz, 135, 287<br />

data garmets, 56, 292<br />

data-garments, 56<br />

DDA-Algorithmus, 65, 283<br />

density, 88<br />

density slicing, 105<br />

depth cues, 207, 323<br />

descriptive geometry, 171<br />

direct capture system, 52<br />

Diskretisierung, 34<br />

distance<br />

between pixels, 38<br />

dither matrix, 88<br />

dots, 107<br />

dynamic range, 88<br />

dynamischer Bereich, 88, 115, 284<br />

edge-image, 132<br />

Elektronenstrahlschirm, 35<br />

Entzerrung, 287<br />

Erosion, 75, 326<br />

exterior orientation, 48, 56<br />

Füllen von Polygonen, 65, 289<br />

Farbfilmnegativ, 107, 324<br />

Farbmodell, CIE, 92, 93, 283, 284<br />

Farbmodell, CMY, 95, 286<br />

Farbmodell, CMYK, 95, 286<br />

Farbmodell, RGB, 93, 95, 284<br />

feature, 290<br />

feature space, 290<br />

feature vector, 46<br />

Fenster, 40<br />

fiducial marks, 173<br />

Filter, 40<br />

filter<br />

high pass<br />

Butterworth, 136, 292<br />

ideal, 136, 292<br />

filter mask, 127<br />

Fourier-Transformation, 289, 326

fovea, 45<br />

Freiheitsgrad, 168, 325<br />

gamma, 49<br />

G<strong>aus</strong>s-filter, 129, 286<br />

Geometrievektor, 177<br />

geometry vector, 177<br />

Gouraud-shading, 219, 287, 323<br />

gradation curve, 119<br />

Gradientenbild, 134, 301<br />

Grafik-Pipeline, 265, 285<br />

Grauwertzuweisung, 287<br />

gross fog, 49<br />

Halbraumcodes, 167<br />

half tone, 314<br />

half-space codes, 166<br />

halo, 212<br />

head-mounted displays, 56<br />

hierarchical matching, 234<br />

histogram, 337<br />

equalization, 50, 105, 117, 119<br />

spreading, 119<br />

Histogramm, 120, 323<br />

Hit-or-miss Operator, 80, 289<br />

Hochpassfilter, 132, 324<br />

homogene Koordinaten, 157, 168, 285, 299, 300, 325

homologue points, 232<br />

HSV-Farbmodell, 97, 305<br />

hue, 96<br />

Human-<strong>Computer</strong>-Interfaces, 263<br />

hyper-spectral, 46<br />

ideal filter, 134<br />

illuminate, 51<br />

image<br />

black & white, 87<br />

color, 87<br />

false color, 90<br />

half tone, 88<br />

image flickering, 230<br />

Image Processing/Computer Vision, 57

image quality, 115<br />

immersive visualization, 263<br />

in<strong>for</strong>mation slices, 263<br />

inner orientation, 232<br />

intensity slicing, 105<br />

Interpolation, 179, 287, 289<br />

interpolation, 155, 176<br />

Interpolation, bilineare, 252, 297<br />

Kante, 38, 286<br />

Kantendetektion, 134, 306<br />

Kell-factor, 117<br />

Kettenkodierung, 190<br />

Klassifikation, 240, 244, 287, 304<br />

Klassifizierung, 242<br />

Koeffizientenmatrix, 162<br />

Koordinatentrans<strong>for</strong>mation, 325<br />

Korrelation, normalisiert, 234, 303<br />

leaf, 192<br />

Least Squares, 180<br />

Least Squares Method, 164<br />

Least squares method, 164, 329<br />

Level of Detail, 199<br />

light, 90<br />

line pairs per millimeter, 50<br />

Linie, 38<br />

Linienpaar, 117<br />

listening mode, 53<br />

logische Verknüpfung, 40<br />

luminance, 90<br />

luminosity, 117<br />

Man-Machine Interaction, 263<br />

Maske, 40<br />

masked negative, 106<br />

median filter, 128<br />

Median-Filter, 129, 299<br />

Medianfilter, 129, 285, 323<br />

Mehrfachbilder, 46, 287<br />

Merkmalsraum, 242<br />

mexican hat, 131<br />

MIP-maps, 249



mirror stereoscope, 230<br />

Moiree effect, 107, 292<br />

moments, 145<br />

morphological<br />

closing, 314, 315<br />

erosion, 75, 283<br />

filtering, 79, 291, 322<br />

opening, 77, 78, 284, 305, 328<br />

morphology, 75, 82, 326<br />

mosaicing, 249<br />

motion blur, 259, 291<br />

Motion Picture Expert Group, 274<br />

multi illumination, 56<br />

multi-images, 46<br />

multi-position, 46<br />

multi-sensor, 46<br />

multi-spectral, 46<br />

multi-temporal, 46<br />

multiple path, 50<br />

Multispektralbild, 32<br />

Multispektrales Abtastsystem, 52<br />

Nachbarschaft, 38, 283<br />

nearest neighbor, 165, 251<br />

negative color, 91<br />

nicht-perspektive Kamera, 51, 53, 286<br />

node file, 251<br />

nodes, 251<br />

normal equation matrix, 164<br />

offset print, 107, 292<br />

one-point perspective, 172<br />

Operationen, algebraische, 40<br />

operator<br />

Marr-Hildreth, 293<br />

optische Dichte, 107, 324<br />

parallactic angle, 229<br />

parallax, 229<br />

parallel difference, 229<br />

Parametervektor, 164, 329<br />

parametrische Kurvendarstellung, 177<br />

paraphernalia, 48<br />

passive Radiometrie, 54, 289<br />

Passpunkte, 289<br />

Phong-Modell, 218, 299<br />

Phong-shading, 219, 287<br />

photo detector, 49<br />

photo-multiplier, 49<br />

photography<br />

negative, 337<br />

Photometric Stereo, 206<br />

pigments, 90<br />

pipeline, 265<br />

polarization, 230<br />

pose, 48, 56<br />

preprocessing, 120<br />

projection, oblique, 171<br />

projection, orthographic, 171<br />

Projektionen, planar, 172, 284<br />

prozedurale Texturen, 150, 289<br />

pseudo-color, 90, 105<br />

push-broom technology, 49<br />

Quadtree, 193, 296<br />

Radar, 54, 287<br />

Radiosity, 285<br />

radiosity, 222<br />

Rasterdarstellung, 35<br />

Rasterkonversion, 36, 284<br />

ratio imaging, 107<br />

Ratio-Bild, 113, 283<br />

R<strong>aus</strong>chen, kohärentes, 326<br />

ray tracing, 210, 291<br />

ray-tracing, 210<br />

Raytracing, recursive, 208, 298<br />

Rectangular Tree, 199<br />

Region, 38<br />

relative orientation, 232<br />

remote sensing, 52<br />

Resampling, 287<br />

resampling, 165, 291<br />

Resampling, geometrisches, 249, 285<br />

resolution, 45<br />

RGB-Farbmodell, 93, 95, 97, 289, 302, 305<br />

rigid body transformation, 168, 325

ringing, 135<br />

Roberts-Operator, 134, 306<br />

rods, 45<br />

Rotation, 157, 303<br />

Sampling, 34<br />

Scannen, 50, 287<br />

scanning electron-microscopes, 55<br />

Schwellwert, 120, 323<br />

Schwellwertbild, 32<br />

Schwerpunkt, 73, 322<br />

screening, 88<br />

Segmentierung, 120, 323<br />

sensor<br />

non-optical, 233, 292<br />

sensor model, 46<br />

Sensor-Modell, 48, 289<br />

Shape-from-Focus, 206<br />

Shape-from-Shading, 206<br />

Shape-from-X, 207, 323<br />

sinc-filter, 128



Skalierung, 157, 303<br />

Sobel-Operator, 133, 299<br />

sound, navigation and range, 54

spatial partitioning, 197, 289<br />

spatial-domain representation, 130<br />

spectral representation, 130<br />

Spektralraum, 147, 286<br />

Spiegelreflexion, 218, 284<br />

splitting, 190, 306<br />

spy photography, 116<br />

starrer Körper, 168, 325<br />

step <strong>and</strong> stare, 49<br />

Stereo, 230, 232, 285, 324<br />

Stereo, photometrisches, 207, 284<br />

stereo-method, 206<br />

stereopsis, 175, 233, 292<br />

Structured Light, 206<br />

structured light, 55<br />

Strukturelement, 82<br />

support, 137<br />

sweep, 195<br />

Sweeps, 195, 287<br />

View Plane Normal, 174<br />

view point, 232<br />

view point normals, 232<br />

View Reference Point, 174<br />

View-Frustum, 199<br />

Virtual Reality, 256, 285<br />

vitreous humor, 45<br />

volume element, 53<br />

voxel, 53<br />

Voxel-Darstellung, 285<br />

Wahrscheinlichkeitsdichtefunktion, 244, 304<br />

window, 127<br />

wire-frame, 194<br />

XOR, 40<br />

YIQ-Farbmodell, 96, 286, 327<br />

Zusammenhang, 38<br />

table-lens, 263<br />

template, 127<br />

texels, 147<br />

Textur, 147, 286<br />

Texture-Mapping, 285<br />

Tiefenunterschied, 229<br />

Tiefenwahrnehmungshilfen, 207, 323<br />

Tiefpassfilter, 135, 287<br />

total plastic, 231<br />

track, 55<br />

Tracking, 57, 256, 287<br />

transform
  medial axis, 68, 308
Transformation, 157, 162, 252, 297, 299
transformations
  conform, 170, 292
Transformationsmatrix, 157, 168, 299, 300, 303, 323

tri-chromatic coefficients, 91<br />

tri-stimulus values, 91<br />

trivial acceptance, 166<br />

Trivial rejection, 166<br />

undercolor removal, 95<br />

Unsharp Masking, 132, 284<br />

unsharp masking, 131<br />

US Air Force Resolution Target, 50<br />

vanishing point, 171<br />

Vektordarstellung, 35<br />

View Plane, 174


List of Algorithms<br />

1 Affine matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14<br />

2 Threshold image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37<br />

3 Simple raster image scaling by pixel replication . . . . . . . . . . . . . . . . . . . . 42<br />

4 Image resizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42<br />

5 Logical mask operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45<br />

6 Fast mask operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45<br />

7 Digital differential analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69<br />

8 Thick lines using a rectangular pen . . . . . . . . . . . . . . . . . . . . . . . . . . . 72<br />

9 Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80<br />

10 Halftone-Image (by means of a dither matrix) . . . . . . . . . . . . . . . . . . . . . 95<br />

11 Conversion from RGB to HSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104<br />

12 Conversion from HSI to RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105<br />

13 Conversion from RGB to HSV . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

14 Conversion from HSV to RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107<br />

15 Conversion from RGB to HLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108<br />

16 Conversion from HLS to RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109<br />

17 hlsvalue(N1,N2,HLSVALUE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109<br />

18 Masked negative of a color image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112<br />

19 Histogram equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124<br />

20 Local image improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126<br />

21 Weighted Antialiasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143<br />

22 Gupta-Sproull-Antialiasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144<br />

23 Texture mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154<br />

24 Casteljau . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186<br />

25 Chain coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195<br />

26 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196<br />

27 Quadtree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198<br />

28 Creation of a BSP tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204<br />

355


356 LIST OF ALGORITHMS<br />

29 z-buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215<br />

30 Raytracing <strong>for</strong> Octrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217<br />

31 Gouraud shading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226<br />

32 Phong - shading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227<br />

33 Shadow map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227<br />

34 Implementation of the Atherton-Weiler-Greenberg Algorithm . . . . . . . . . . 227

35 Radiosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229<br />

36 Feature space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246<br />

37 Classification without rejection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247<br />

38 Classification with rejection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248<br />

39 Calculation with a node file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256<br />

40 Nearest neighbor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257<br />

41 z-buffer pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272<br />

42 Phong pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272<br />

43 Pipeline <strong>for</strong> lossless compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276<br />

44 Pipeline <strong>for</strong> lossy compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277<br />

45 JPEG image compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281<br />

46 MPEG compression pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282


List of Definitions

1 Amount of data in an image . . . . . . . . . . . . 39
2 Connectivity . . . . . . . . . . . . 43
3 Distance . . . . . . . . . . . . 43
4 Perspective camera . . . . . . . . . . . . 53
5 Skeleton . . . . . . . . . . . . 73
6 Difference . . . . . . . . . . . . 80
7 Erosion . . . . . . . . . . . . 81
8 Opening . . . . . . . . . . . . 83
9 Closing . . . . . . . . . . . . 83
10 Morphological filter . . . . . . . . . . . . 84
11 Hit-or-Miss operator . . . . . . . . . . . . 86
12 Contour . . . . . . . . . . . . 87
13 Conversion from CIE to RGB . . . . . . . . . . . . 98
14 CMY color model . . . . . . . . . . . . 100
15 CMYK color model . . . . . . . . . . . . 101
16 YIQ color model . . . . . . . . . . . . 103
17 Histogram stretching . . . . . . . . . . . . 124
18 Conformal transformation . . . . . . . . . . . . 162
19 Rotation in 2D . . . . . . . . . . . . 165
20 2D rotation matrix . . . . . . . . . . . . 166
21 Sequenced rotations . . . . . . . . . . . . 167
22 Affine transformation with 2D homogeneous coordinates . . . . . . . . . . . . 168
23 Bilinear transformation . . . . . . . . . . . . 169
24 Rotation in 3D . . . . . . . . . . . . 175
25 Affine transformation with 3D homogeneous coordinates . . . . . . . . . . . . 176
26 Bezier curves in 2D . . . . . . . . . . . . 185
27 2D morphing for lines . . . . . . . . . . . . 197
28 Wireframe structure . . . . . . . . . . . . 200


29 Boundary representation . . . . . . . . . . . . 202
30 Cell structure . . . . . . . . . . . . 203
31 Ambient light . . . . . . . . . . . . 223
32 Lambert model . . . . . . . . . . . . 224
33 Total plastic . . . . . . . . . . . . 237


List of Figures

4.1 Morphological erosion as the sequence complement → dilation → complement . . . . . . . . . . . . 82
4.2 Morphological opening . . . . . . . . . . . . 84
5.1 Histogram of Figure B.29 . . . . . . . . . . . . 95
5.2 A plane in the HSV color model . . . . . . . . . . . . 110
6.1 Histogram of a gray wedge . . . . . . . . . . . . 127
6.2 Histograms . . . . . . . . . . . . 128
7.1 Application of a median filter . . . . . . . . . . . . 135
7.2 Low-pass and high-pass filters . . . . . . . . . . . . 135
7.3 Low-pass and high-pass filters . . . . . . . . . . . . 138
7.4 Roberts operator . . . . . . . . . . . . 140
9.1 Rotated coordinate system . . . . . . . . . . . . 166
9.2 Construction of a Bezier curve according to Casteljau . . . . . . . . . . . . 187
11.1 Graphical evaluation of the z-buffer algorithm . . . . . . . . . . . . 216
B.1 Repeated storage of an image at different sizes . . . . . . . . . . . . 294
B.2 Three-dimensional object shown with different rendering techniques . . . . . . . . . . . . 300
B.3 Conversion of one vector graphic into another . . . . . . . . . . . . 301
B.4 Process chain for mapping a three-dimensional object onto the two-dimensional image plane . . . . . . . . . . . . 301
B.5 Pixel raster . . . . . . . . . . . . 302
B.6 Binary raster image . . . . . . . . . . . . 302
B.7 Table . . . . . . . . . . . . 303
B.8 Input image . . . . . . . . . . . . 303
B.9 The connection between two pixels is to be approximated . . . . . . . . . . . . 304
B.10 Object consisting of two surfaces . . . . . . . . . . . . 304
B.11 Splitting of the primary ray in "recursive raytracing" . . . . . . . . . . . . 305


B.12 Linear transformation M of an object A into an object B . . . . . . . . . . . . 305
B.13 Application of the Sobel operator to a gray-value image . . . . . . . . . . . . 306
B.14 Application of a median filter to a gray-value image . . . . . . . . . . . . 306
B.15 Illuminated object with a specular surface according to the Phong model . . . . . . . . . . . . 307
B.16 Gray-value image as the highest-resolution level of an image pyramid . . . . . . . . . . . . 308
B.17 Polygon for BSP representation . . . . . . . . . . . . 308
B.18 Application of the Cohen-Sutherland clipping algorithm . . . . . . . . . . . . 309
B.19 Clipping according to Cohen-Sutherland . . . . . . . . . . . . 309
B.20 Connecting two points according to Bresenham . . . . . . . . . . . . 310
B.21 Application of a gradient operator . . . . . . . . . . . . 310
B.22 Finding the edge pixels . . . . . . . . . . . . 311
B.23 Boundary of a region . . . . . . . . . . . . 311
B.24 Boolean operations on binary images . . . . . . . . . . . . 312
B.25 Determination of the normalized correlation . . . . . . . . . . . . 312
B.26 Construction of a curve point on a Bezier curve according to Casteljau . . . . . . . . . . . . 313
B.27 General rotation with scaling . . . . . . . . . . . . 313
B.28 Three feature vectors in two-dimensional space . . . . . . . . . . . . 314
B.29 Digital gray-value image (histogram to be determined) . . . . . . . . . . . . 314
B.30 Empty filter masks . . . . . . . . . . . . 315
B.31 Morphological opening . . . . . . . . . . . . 315
B.32 A plane in the HSV color model . . . . . . . . . . . . 316
B.33 Gray wedge . . . . . . . . . . . . 316
B.34 Roberts operator . . . . . . . . . . . . 317
B.35 Two-dimensional polygon representation . . . . . . . . . . . . 318
B.36 Object and camera in the world coordinate system . . . . . . . . . . . . 319
B.37 Gray wedge . . . . . . . . . . . . 319
B.38 Application of normalized cross-correlation . . . . . . . . . . . . 320
B.39 Application of the medial axis transformation . . . . . . . . . . . . 320
B.40 Gray wedge . . . . . . . . . . . . 321
B.41 Application of the hit-or-miss operator to a binary image . . . . . . . . . . . . 321
B.42 Drawing thick lines . . . . . . . . . . . . 322
B.43 Definition of a two-dimensional object by the chain-code sequence "221000110077666434544345" . . . . . . . . . . . . 323
B.44 Transformation of four points . . . . . . . . . . . . 323
B.45 Sub-sampling . . . . . . . . . . . . 324
B.46 Three digital gray-value images and their histograms . . . . . . . . . . . . 325
B.47 Halftoning . . . . . . . . . . . . 326
B.48 Halftoning . . . . . . . . . . . . 326



B.49 Morphological closing . . . . . . . . . . . . 326
B.50 Application of the hit-or-miss operator to a binary image . . . . . . . . . . . . 326
B.51 Halftoning . . . . . . . . . . . . 327
B.52 Polygon for BSP representation . . . . . . . . . . . . 327
B.53 Color image negative . . . . . . . . . . . . 327
B.54 Supervised classification . . . . . . . . . . . . 327
B.55 Rectangle with interfering objects . . . . . . . . . . . . 328
B.56 Pixel arrangement . . . . . . . . . . . . 328
B.57 Image with disturbances . . . . . . . . . . . . 329
B.58 Raster representation of an object . . . . . . . . . . . . 330
B.59 Gray-value image . . . . . . . . . . . . 330
B.60 Transformation matrix . . . . . . . . . . . . 331
B.61 Digital raster image with intensity falling off towards the edge . . . . . . . . . . . . 331
B.62 Color film negative . . . . . . . . . . . . 332
B.63 Simplified setup of the ball-catching robot at the state exhibition comm.gr2000az . . . . . . . . . . . . 333
B.64 Image with superimposed coherent noise . . . . . . . . . . . . 334
B.65 Alternative computation of morphological erosion . . . . . . . . . . . . 335
B.66 Photograph with low contrast . . . . . . . . . . . . 336
B.67 Morphological opening . . . . . . . . . . . . 336
B.68 Distance calculation . . . . . . . . . . . . 337
B.69 Various filter operations . . . . . . . . . . . . 338
B.70 Cylinder with a coaxial bore . . . . . . . . . . . . 339
B.71 Segmentation of a gray-value image . . . . . . . . . . . . 340
B.72 Artifacts in a checkerboard-like pattern . . . . . . . . . . . . 340
B.73 Application of normalized cross-correlation to a rotated image . . . . . . . . . . . . 341
B.74 Automatic contrast enhancement . . . . . . . . . . . . 341
B.75 Blurred edge in a digital gray-value image . . . . . . . . . . . . 342
B.76 Histograms of two different images . . . . . . . . . . . . 343
B.77 Torus with a surface texture . . . . . . . . . . . . 344
B.78 Enlargement of an image section using different interpolation methods . . . . . . . . . . . . 348
B.79 Enlargement of an image section using different interpolation methods . . . . . . . . . . . . 349
B.80 BSP tree . . . . . . . . . . . . 350
B.81 Rendering of a 3D model using different illumination models . . . . . . . . . . . . 350




