
Course Content of "Bildanalyse und Computergrafik" (Image Analysis and Computer Graphics)

Franz Leberl

28 January 2002


Contents

0 Introduction
0.1 Using Cyber-Cities as an Introduction
0.2 Introducing the Lecturer
0.3 From images to geometric models
0.4 Early Experiences in Vienna
0.5 Geometric Detail
0.6 Automation
0.7 Modeling Denver
0.8 The Inside of Buildings
0.9 As-Built Documentation: Modeling the Inside of Things in Industry
0.10 Modeling Rapidly
0.11 Vegetation
0.12 Coping with Large Datasets
0.13 Non-optical sensing
0.14 The Role of the Internet
0.15 Two Systems for Smart Imaging
0.16 International Center of Excellence for City Modeling
0.17 Applications
0.18 Telecom Applications of City Models

1 Characterization of Images
1.1 The Digital Image
1.2 The Image as a Raster Data Set
1.3 System Concepts
1.4 Displaying Images on a Monitor
1.5 Images as Raster Data
1.6 Operations on Binary Raster Images
1.7 Algebraic Operations on Images

2 Sensing
2.1 The Most Important Sensors: The Eye and the Camera
2.2 What is a Sensor Model?
2.3 Image Scanning
2.4 The Quality of Scanning
2.5 Non-Perspective Cameras
2.6 Heat Images or Thermal Images
2.7 Multispectral Images
2.8 Sensors to Image the Inside of Humans
2.9 Panoramic Imaging
2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images
2.11 Making Images with Sound
2.12 Passive Radiometry
2.13 Microscopes and Endoscopes Imaging
2.14 Objects-Scanners
2.15 Photometry
2.16 Data Garments
2.17 Sensors for Augmented Reality
2.18 Outlook

3 Raster-Vector-Raster Convergence
3.1 Drawing a straight line
3.2 Filling of Polygons
3.3 Thick lines
3.4 The Transition from Thick Lines to Skeletons

4 Morphology
4.1 What is Morphology
4.2 Dilation and Erosion
4.3 Opening and Closing
4.4 Morphological Filter
4.5 Shape Recognition by a Hit or Miss Operator
4.6 Some Additional Morphological Algorithms

5 Color
5.1 Gray Value Images
5.2 Color images
5.3 Tri-Stimulus Theory, Color Definitions, CIE-Model
5.4 Color Representation on Monitors and Films
5.5 The 3-Dimensional Models
5.6 CMY-Model
5.7 Using CMYK
5.8 HSI-Model
5.9 YIQ-Model
5.10 HSV and HLS Models
5.11 Image Processing with RGB versus HSI Color Models
5.12 Setting Colors
5.13 Encoding in Color
5.14 Negative Photography
5.15 Printing in Color
5.16 Ratio Processing of Color Images and Hyperspectral Images

6 Image Quality
6.1 Introduction
6.2 Definitions
6.3 Gray Value and Gray Value Resolutions
6.4 Geometric Resolution
6.5 Geometric Accuracy
6.6 Histograms as a Result of Point Processing or Pixel Processing

7 Filtering
7.1 Images in the Spatial Domain
7.2 Low-Pass Filtering
7.3 The Frequency Domain
7.4 High-Pass Filter - Sharpening Filters
7.5 The Derivative Filter
7.6 Filtering in the Spectral Domain / Frequency Domain
7.7 Improving Noisy Images
7.8 The Ideal and the Butterworth High-Pass Filter
7.9 Anti-Aliasing
7.9.1 What is Aliasing?
7.9.2 Aliasing by Cutting-off High Frequencies
7.9.3 Overcoming Aliasing with an Unweighted Area Approach
7.9.4 Overcoming Aliasing with a Weighted Area Approach

8 Texture
8.1 Description
8.2 A Statistical Description of Texture
8.3 Structural Methods of Describing Texture
8.4 Spectral Representation of Texture
8.5 Texture Applied to Visualisation
8.6 Bump Mapping
8.7 3D Texture
8.8 A Review of Texture Concepts by Example
8.9 Modeling Texture: Procedural Approach

9 Transformations
9.1 About Geometric Transformations
9.2 Problem of a Geometric Transformation
9.3 Analysis of a Geometric Transformation
9.4 Discussing the Rotation Matrix in two Dimensions
9.5 The Affine Transformation in 2 Dimensions
9.6 A General 2-Dimensional Transformation
9.7 Image Rectification and Resampling
9.8 Clipping
9.8.1 Half Space Codes
9.8.2 Trivial acceptance and rejection
9.8.3 Is the Line Vertical?
9.8.4 Computing the slope
9.8.5 Computing the Intersection A in the Window Boundary
9.8.6 The Result of the Cohen-Sutherland Algorithm
9.9 Homogeneous Coordinates
9.10 A Three-Dimensional Conformal Transformation
9.11 Three-Dimensional Affine Transformations
9.12 Projections
9.13 Vanishing Points in Perspective Projections
9.14 A Classification of Projections
9.15 The Central Projection
9.16 The Synthetic Camera
9.17 Stereopsis
9.18 Interpolation versus Transformation
9.19 Transforming a Representation
9.19.1 Presenting a Curve by Samples and an Interpolation Scheme
9.19.2 Parametric Representations of Curves
9.19.3 Introducing Piecewise Curves
9.19.4 Rearranging Entities of the Vector Function Q
9.19.5 Showing Examples: Three Methods of Defining Curves
9.19.6 Hermite's Approach
9.20 Bezier's Approach
9.21 Subdividing Curves and Using Spline Functions
9.22 Generalization to 3 Dimensions
9.23 Graz and Geometric Algorithms

10 Data Structures
10.1 Two-Dimensional Chain-Coding
10.2 Two-Dimensional Polygonal Representations
10.3 A Special Data Structure for 2-D Morphing
10.4 Basic Concepts of Data Structures
10.5 Quadtree
10.6 Data Structures for Images
10.7 Three-Dimensional Data
10.8 The Wire-Frame Structure
10.9 Operations on 3-D Bodies
10.10 Sweep-Representations
10.11 Boundary-Representations
10.12 A B-Rep Data Structure
10.13 Spatial Partitioning
10.14 Binary Space Partitioning BSP
10.15 Constructive Solid Geometry, CSG
10.16 Mixing Vectors and Raster Data
10.17 Summary

11 3-D Objects and Surfaces
11.1 Geometric and Radiometric 3-D Effects
11.2 Measuring the Surface of An Object (Shape from X)
11.3 Surface Modeling
11.4 Representing 3-D Objects
11.5 The z-Buffer
11.6 Ray-tracing
11.7 Other Methods of Providing Depth Perception

12 Interaction of Light and Objects
12.1 Illumination Models
12.2 Reflections from Polygon Facets
12.3 Shadows
12.4 Physically Inspired Illumination Models
12.5 Regressive Ray-Tracing
12.6 Radiosity

13 Stereopsis
13.1 Binocular Vision
13.2 Stereoscopic Vision
13.3 Stereo Imaging
13.4 Stereo-Visualization
13.5 Non-Optical Stereo
13.6 Interactive Stereo-Measurements
13.7 Automated Stereo-Measurements

14 Classification
14.1 Introduction
14.2 Object Properties
14.3 Features, Patterns, and a Feature Space
14.4 Principle of Decisions
14.5 Bayes Theorem
14.6 Supervised Classification
14.7 Real Life Example
14.8 Outlook

15 Resampling
15.1 The Problem in Examples of Resampling
15.2 A Two-Step Process
15.2.1 Manipulation of Coordinates
15.2.2 Gray Value Processing
15.3 Geometric Processing Step
15.4 Radiometric Computation Step
15.5 Special Case: Rotating an Image by Pixel Shifts

16 About Simulation in Virtual and Augmented Reality
16.1 Various Realisms
16.2 Why simulation?
16.3 Geometry, Texture, Illumination
16.4 Augmented Reality
16.5 Virtual Environments

17 Motion
17.1 Image Sequence Analysis
17.2 Motion Blur
17.3 Detecting Change
17.4 Optical Flow

18 Man-Machine-Interfacing
18.1 Visualization of Abstract Information
18.2 Immersive Man-Machine Interactions

19 Pipelines
19.1 The Concept of an Image Analysis System
19.2 Systems of Image Generation
19.3 Revisiting Image Analysis versus Computer Graphics

20 Image Representation
20.1 Definition of Terms
20.1.1 Transparency
20.1.2 Compression
20.1.3 Progressive Coding
20.1.4 Animation
20.1.5 Digital Watermarking
20.2 Common Image File Formats
20.2.1 BMP: Microsoft Windows Bitmap
20.2.2 GIF: Graphics Interchange Format
20.2.3 PICT: Picture File Format
20.2.4 PNG: Portable Network Graphics
20.2.5 RAS: Sun Raster File
20.2.6 EPS: Encapsulated PostScript
20.2.7 TIFF: Tag Interchange File Format
20.2.8 JPEG: Joint Photographic Expert Group
20.3 Video File Formats: MPEG
20.4 New Image File Formats: Scalable Vector Graphics - SVG

A Algorithms and Definitions

B Question Overview
B.1 Group 1
B.2 Group 2
B.3 Group 3


Chapter 0<br />

Introduction<br />

0.1 Using Cyber-Cities as an Introduction

We introduce the subject of "digital processing of visual information", also denoted as "digital image processing" and "computer graphics". We introduce the subject by means of one particular application, namely 3D computer modelling of our cities. This is part of the wider topic of the so-called "virtual habitat". "Modelling cities", what do we mean by that? The example in Slide 0.7 shows a traditional representation of a city, in this particular example the "Eisernes Tor" in Graz. In two dimensions we see the streetcar tracks, the Mariensäule, buildings and vegetation. This is the status quo of current urban 2-D computer graphics.

The new approach is to represent this in three dimensions, as shown in Slide 0.8. The two-dimensional map of the city is augmented to include the third dimension, thus the elevations, and in order to render, represent or visualise the city we add photographic texture to create as realistic a model of the city as possible. Once we have that we can stroll through the city, we can inspect the buildings, we can read the signs and derive from them what is inside the buildings.

The creation of the model for this city is a subject of "image processing". The rendering of the model is the subject of "computer graphics". These two belong together and constitute a field denoted as "digital processing of visual information".

The most sophisticated recent modelling of a city was achieved for a section of Philadelphia. This employed a software package called "Microstation" and was done by hand with great detail. In this case the detail includes vegetation, the virtual trees, water fountains and people. I am attempting here to illustrate the concepts of "computer graphics" and "image processing" by talking about Cyber-Cities, namely how to create them from sensor data and how to visualise them. And this is the subject of this introduction.

0.2 Introducing the Lecturer

Before we go into the material, permit me to introduce myself. I have roots both in Graz and in Boulder (Colorado, USA). Since 1992 my affiliation has been with the Technische Universität Graz, where I am a Professor of Computer Vision and Graphics. But since 1985 I have also been affiliated with a company in the United States called Vexcel Corporation. In both places, the Vexcel Corporation and the University, cyber-cities play a role in the daily work. Vexcel Corporation in the US operates in four technical fields:

1. It builds systems to process radar images

2. It deals with satellite receiving stations, to receive large quantities of images that are transmitted from satellites

3. It deals with close range photogrammetry for "as-built" documentation and

4. It deals with images from the air

Slide 0.19 is an example showing a remote sensing satellite ground receiving station installed in Hiroshima (Japan), carried on a truck to be moveable. Slide 0.20 shows a product of the Corporation, namely a software package to process certain radar images interferometrically. Towards the end of this class we will talk briefly about this interferometry. What you see in Slide 0.20 are interferometric "fringes" obtained from images, using the phase differences between the two images. The fringes indicate the elevation of the terrain, in this particular case Mt. Fuji in Japan.

Another software package models the terrain and renders realistic-looking images by superimposing the satellite images over the shape of the terrain with its mountains and valleys. Slide 0.22 shows another software package to convert aerial photography to so-called "ortho-photos", a concept we will explain later in this class. Then we have an application, a software package called Foto-G, which supports the modelling of existing plants, performing a task called "as-built documentation". You take images of a facility or plant, extract from the image geometry the location and dimensions of pipes and valves, and obtain in a "reverse engineering mode" so-called CAD (computer-aided design) drawings of the facility.

0.3 From images to geometric models

We proceed to a series of sub-topics to discuss the ideas of city modeling. I would like to convey an idea of what the essence of "digital processing of visual information" is. What we see in Slide 0.25 is, on the left, part of an aerial photograph of a new housing development, and on the right we see information extracted from the image on the left using a process called "stereoscopy", representing the small area that is marked in red on the right side. We are observing here a transition from images of an object to a model of that object.

Images such as the one in Slide 0.26 show so-called "human scale objects" like buildings, fences, trees, roads. But images may show our entire planet. There have been various projects in Graz to address the extraction of information from images, and there is a bundle of problems available as topics for a Diplomarbeit or a Dissertation, for example to address the optimum geometric scale and geometric resolution needed for a specific task at hand. If I want to model a building, what is the required optimum image resolution? We review in Slide 0.29 the downtown of Denver at 30 cm per pixel. Slide 0.30 is the same downtown at 1.20 m per pixel. Finally in Slide 0.31 we have 4 meters per pixel. Can we map the buildings, and what accuracy can we get in mapping them?

0.4 Early Experiences in Vienna

Our Institute at the Technical University in Graz got involved in city modelling in 1994, when we were invited by the Magistrat of Vienna to model a city block consisting of 29 buildings inside the block and another 25 buildings surrounding the block. The block is defined by 4 streets in the 7th district in Vienna. The work was performed by 2 students in two diploma theses, and the initial results were of course a LEGO-type representation of each building. The building itself can not be recognised, as seen in the example of a generic building. It can be recognised only if we apply the photographic texture. We can take this either from a photograph taken from street level or from aerial photography taken from an airplane. The entire city block was modelled, but some photographic texture was missing. In particular the photographic texture was missing in the courtyards, and so they are shown black or grey here. When this occurs, the representation is without photographic texture, and is instead in the form of a flat-shaded representation.

Slide 0.37 looks at the roof scape and we see that perhaps we should model the chimneys as shown here. However, the skylights were not modeled. What can we do with these data? We can walk or fly through the cities. We can assess changes, for example by removing a building and replacing it with a new one. We call this "virtual reality", but scientists often prefer the expression "virtual environment", since "virtual" and "reality" represent a contradiction in terms. This differs of course from photographic reality, which is more detailed and more realistic by showing great geometric detail, showing wires, dirt on the road, cars, the effect of weather. There is yet another type of reality, namely "physical reality", when we are out there in a city and we feel the wetness in our shoes, we feel the cold in the air, we hear the noise of birds, the screeching of cars. So we see various levels of reality: physical, photographic and virtual reality.

0.5 Geometric Detail

What geometric detail do we need when we model a city? Let us take the example of a roof. Slide 0.44 is a roof shape extracted for the Vienna example. We have not applied photographic texture to the roof, but instead some generic computer texture. We will talk later about texture, and I will try to explain different types of texture for use in rendering for computer graphics. If we apply this kind of generic texture, we lose all information about the specific characteristics of this roof. What we would like to have is the roof shown with chimneys. Maybe we need skylights as well for the fire guard, in order to direct people to an exit through the roof in the case of a catastrophe. There is a topic here for a Diplomarbeit or Dissertation theme to study the amount of geometric detail needed in the presence of photographic texture: the trade-off between photographic texture and geometric detail.

To illustrate this further, let us take a look at the same roof with its skylights and chimneys and now use photographic texture to illustrate what this roof looks like. If we take photographic texture, and if we have some chimneys, and if we render this roof from another perspective than that from which the photograph was taken, the chimneys will look very unnatural. So we need to do some work and create the geometric model of the chimneys. If we employ that model and now superimpose the photographic texture over it, we see that we have sunshine casting shadows, and certain areas of the roof are covered by pixels from the shadows left by the chimneys. If the sunshine is from another side, say in the morning, but the picture was taken in the afternoon, we have wrong shadows. So we need to fix this by eliminating the shadows in the texture. We introduce the shadow in a proper rendering by a computation. We also need to fill in those pixels that are covered by the perspective distortion of the chimneys, and use generic pixels of the roof to fill in the areas where no picture exists. Slide 0.50 is the final result: we have removed the shadow, we have filled in the pixels. We now have the best representation of that roof with its chimneys, and we can render it correctly in the morning and in the afternoon, with rain or with sunshine.

0.6 Automation<br />

All of this modeling of cities is expensive, bec<strong>aus</strong>e it is based on manual work. In order to<br />

reduce the cost of creating such models one needs to automate their creation. Automation is a<br />

large topic <strong>and</strong> is available <strong>for</strong> many Diplomarbeiten <strong>and</strong> many Dissertations. Let me illustrate<br />

automation <strong>for</strong> about our city-models in Graz. There already exist 2-dimensional descriptions<br />

so the task of automating here is to achieve the transition from two to three dimensions. Slide<br />

0.52 is a two-dimensional so-called geographic in<strong>for</strong>mation system (GIS) of a certain area around<br />

the Schlossberg in Graz. Lets take a look at this particular building in Slide 0.53. We have a<br />

total of five aerial photographs, 3 of them are shown of that particular building in Slide 0.54.


14 CHAPTER 0. INTRODUCTION<br />

The five photographs can be converted into so called edge images, a classical component of image<br />

processing. There are topics hidden here <strong>for</strong> more Diplomarbeiten <strong>and</strong> Dissertationen.<br />

We also convert an input GIS data into an output edge image. This edge image from the GISvectors<br />

can now be the basis <strong>for</strong> a match between these five edge images <strong>and</strong> the two dimensional<br />

GIS image. They will not fit, bec<strong>aus</strong>e those edges of the roof as shown here are elevated <strong>and</strong><br />

there<strong>for</strong>e perspectively distorted as the other polygon is the representation of the footprint of the<br />

building.<br />

Algorithm 1 Affine matching

1: Read in and organize one or more digital photos with their camera information
2: Compute an edge image for each of the photos
3: Read in and organize the polygons of each building footprint
4: Project the polygon into each photo's edge image
5: Vector-raster convert the polygon in each edge image, creating a polygon image
6: Compute a distance transform for each polygon image
7: repeat
8:   Compute the distance between each edge image and its polygon image using the distance transform
9:   Change the geometry of the polygon image
10: until distance no longer gets reduced

There is a process called "affine matching" which allows one to match the edge images computed from the aerial photos with the representation that originally was a vector data structure. Affine matching is a Graz innovation: its purpose is to match two different data structures, namely raster and vector, which in addition are geometrically different; the footprint of the house is in an orthographic projection, while the roofline of the house is in a central perspective projection. Affine matching overcomes these differences and finds the best possible matches between the data structures. The result in Slide 0.58 shows how the footprint was used to match the roofline of the building using this affine matching technique. The algorithm itself is described rather simply (see Algorithm 1). Now, the same idea of matching vectors with images is shown in the illustration of Slide 0.59, where we see in yellow the primary position of a geometric shape, typically the footprint, and in red the roofline. We need to match the roofline with the footprint. Slide 0.60 is another example of these matches, and Slide 0.61 is the graphic representation of the roofline.
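To make the distance-transform step of Algorithm 1 concrete, the following is a minimal sketch in Python using numpy and scipy; the function name, the array conventions and the use of scipy's Euclidean distance transform are assumptions for illustration, not part of the original algorithm.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def matching_cost(edge_image, polygon_image):
        # edge_image: binary array, nonzero where the photo's edge detector fired.
        # polygon_image: binary array, nonzero on the projected, raster-converted footprint polygon.
        # Distance transform: every pixel receives its distance to the nearest polygon pixel.
        distances = distance_transform_edt(polygon_image == 0)
        # Sum the distances at the photo's edge pixels; the smaller the sum,
        # the better the projected polygon fits the observed edges.
        return distances[edge_image > 0].sum()

The repeat loop of Algorithm 1 would then perturb the affine projection of the footprint, re-rasterize it, and keep the new geometry whenever this cost decreases.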

0.7 Modeling Denver

We talk about a method to model all the buildings of a city like Denver (Colorado, USA). There is an aerial photographic coverage of the entire city. Slide 0.63 is the downtown area of Denver. From overlapping aerial photographs we can automatically create a digital elevation model (DEM) by a process called stereo matching. A DEM is a representation of the z-elevation for each (x, y) at a regular grid mesh of points. So we have a set of regularly spaced (x, y) locations where we know the z-value of the terrain. We invite everybody to look into a Diplomarbeit or Dissertation topic of taking this kind of digital elevation model and creating from it what is called the "Bald Earth". One needs to create a filter which will take the elevation model and erase all the trees and all the buildings, so that the only thing that is left is the Bald Earth. What is being "erased" are towers, trees, buildings. That process needs an intelligent low-pass filter. We will talk about low-pass filters later in this class. Slide 0.67 is the result, a so-called Bald Earth DEM (das DEM der kahlen Erde).

The difference between the two DEMs, namely the Bald Earth DEM and the full DEM, is of course the elevation of the vertical objects that exist on top of the Bald Earth. These are the buildings, the cars, the vegetation. This is another topic one could study. Now we need to look at the difference DEM and automatically extract the footprints of buildings. We can do that by some morphological operations, where we close the gaps, straighten the edges of buildings, and then compute the contours of the buildings. Finally we obtain the buildings and place them on top of the Bald Earth.
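As a rough illustration of this chain (difference DEM, morphological clean-up, footprint extraction), here is a small Python sketch with numpy and scipy; the height threshold and the structuring-element size are invented for illustration and are not values from the Denver project.

    import numpy as np
    from scipy import ndimage

    def building_footprints(full_dem, bald_dem, min_height=3.0):
        # Difference DEM: everything that sticks up above the Bald Earth.
        diff = full_dem - bald_dem
        mask = diff > min_height                                    # assumed threshold in metres
        # Morphological closing fills small gaps and straightens ragged edges.
        mask = ndimage.binary_closing(mask, structure=np.ones((5, 5), dtype=bool))
        # Label connected regions; each label is one candidate building footprint.
        labels, count = ndimage.label(mask)
        return labels, count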

When we have done that, we can superimpose the photographic texture over the geometric shapes of the building "boxes" (the box models). We get a photorealistic model of all of Denver, all generated automatically from aerial photographs. There exist multiple views of the same area of Denver.

0.8 The Inside of Buildings

City models are not only a subject of the outside of buildings, but also of their inside. Slide 0.74 is the Nationalbibliothek in Vienna, in which there is a Representation Hall (Prunksaal). If one takes the architect's drawings of that building, one can create a wire-mesh representation as illustrated in Slide 0.75, consisting of arcs and nodes. We can render this without removal of the hidden surfaces and hidden lines to obtain this example.

We can go inside this structure, take pictures and use photographic texture to photo-realistically render the inside of the Prunksaal in a manner that a visitor to the Prunksaal will never see. We can not fly into the Prunksaal like a bird. We can also see the Prunksaal in the light that computer rendering permits us to create. We can even go back a hundred years and show the Prunksaal as it was a hundred years ago, before certain areas were converted into additional shelf space for books. There is a Diploma and Dissertation topic hidden in developments to produce images effectively and efficiently inside a building. An example is shown in Slide 0.80 and Slide 0.81 of the ceiling, imaging it efficiently in all its detail and colorful glory.

Yet another subject is how to model objects inside a room, like this statue of emperor Charles VI. He is shown in Slide 0.82 as a triangulated mesh created from a point cloud. We will talk a little bit about triangulated meshes later. Slide 0.82 is based on 20,000 points that are triangulated in a non-trivial process. Slide 0.83 is a photo-realistic rendering of the triangulated point cloud, with each triangle being superimposed by the photographic texture that was created from photographs. A good scientific topic for Diplomarbeiten or Dissertationen is the transition from point clouds to surfaces. A non-trivial problem exists when we look at the hand of the emperor. We need to make sure to connect points in the triangles that should topologically be connected. And we do not want the emperor to have hands like the feet of a duck.

0.9 As-Built Documentation: Modeling the Inside of Things in Industry

There exist not only cultural monuments, but also industrial plants. This goes back to the idea of "inverse" or "reverse engineering" to create drawings of a facility or a building, for example of a refinery. The refinery may have been built 30 or 40 years ago and the drawings are no longer available, since there was no CAD at that time. We take pictures of the inside of a building, using perhaps thousands of pictures. We re-establish relationships between the pictures. We need to know from where they were taken. One picture overlaps with another picture. Which pictures show the same objects and which do not? That is determined by developing the graph in Slide 0.89. Each node of the graph is a "postage stamp" of a picture, and the arcs between these nodes describe the relationships. If there is no arc, then there is no relationship. Any image can be called up on a monitor. Also pairs of images can be set up. We can point to a point in one image, and a process will look for the corresponding point in the other overlapping image or images. The three-dimensional location of the point we have pointed at in only one image will be shown in the three-dimensional rendering of the object. So again, "from images to objects" means in this case "reverse engineering" or "as-built documentation". Again there are plenty of opportunities for research and study in the area of automation of all these processes.

A classical topic is the use of two pictures of some industrial structure to find correspondences of the same object in both images without any knowledge about the camera or object. By eye we can point to the same feature in two images, but this is not trivial to do by machine if we have no geometric relationships established between the two images that would limit the search areas. One idea is to find many candidate features in both images and then determine by some logic which of those features might be identical. So we find one group of features in one image, and another group in the other image. Then we decide which points or objects belong together. The result is shown as highlighted circles.
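A minimal sketch of this candidate-and-decide idea in Python, using the OpenCV library (ORB features and brute-force descriptor matching stand in for the feature logic described above; OpenCV is not mentioned in the lecture and is only an assumed tool here):

    import cv2

    def candidate_correspondences(image_path_a, image_path_b, keep=50):
        img_a = cv2.imread(image_path_a, cv2.IMREAD_GRAYSCALE)
        img_b = cv2.imread(image_path_b, cv2.IMREAD_GRAYSCALE)
        # Find candidate features independently in both images.
        orb = cv2.ORB_create()
        kp_a, des_a = orb.detectAndCompute(img_a, None)
        kp_b, des_b = orb.detectAndCompute(img_b, None)
        # The "logic" deciding which candidates belong together: nearest
        # descriptors with a cross-check, sorted by match quality.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:keep]
        return [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches]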

A similar situation is illustrated in Slide 0.95, however with test targets to calibrate a camera system for as-built documentation. We automatically extract all the test objects (circles) from the images. We can see a three-dimensional pattern of these calibration targets in Slide 0.96 and Slide 0.97.

Now the same approach can also be applied to the outside of buildings, as shown in Slide 0.98 with three photographs of a railroad station. The three images are input to an automatic algorithm to find edges; the edges get pruned and reduced so that we are only left with significant edges that represent windows, doors, awnings and the roofline of the building. This of course can also be converted into three dimensions. There is yet another research topic here, namely "automated mapping of geometric details of facades". Slide 0.100 and Slide 0.101 are the three-dimensional renderings of those edges that are found automatically in 3-D.

0.10 Modeling Rapidly

We not only want to create these data at a low cost, we also want to get them rapidly. Slide 0.103 is an example: a village has been imaged from a helicopter with a handheld camera; looking out to the horizon, we see an oblique, panoramic image. "Give us a model of that village by tomorrow" may be the task, particularly when it concerns catastrophes, disasters, military or anti-terror operations and so forth. The topic which is hidden here is that these photos were not taken with a well-controlled camera but accidentally and hastily from a helicopter and with an average amateur camera. The research topic here is the "use of uncalibrated cameras". A wire-mesh representation of the geometry can be created by a stereo process. We can then place the buildings on top of the surface, much like in the Denver example discussed earlier, and we can render it in a so-called flat-shaded representation. We can now look at it and navigate in the data set, but this is not visually as easy to interpret as it would be if we had photography superimposed, which is the case in Slide 0.109 and Slide 0.110. Now we can rehearse an action needed because of a catastrophe or because of a terrorist attack in one of those buildings. We can fly around, move around and so forth.

0.11 Vegetation

"Vegetation" is a big and important topic in this field. Vegetation is difficult to map, difficult to render and difficult to remove. Vegetation, as in the Graz example, may obscure facades. If we take pictures to map the buildings and to get the photographic texture, then these trees, pedestrians and cars are a nuisance. What can we do? We need to eliminate the vegetation, and this is an interesting research topic. The vegetation is eliminated with a lot of manual work. How can we automate that? There are ways and ideas to automate this kind of separation of objects that are at a different depth from the viewer using multiple images.

Using vegetation for rendering, like in the picture of the Schloßberg in Slide 0.115, is not trivial either. How do we model vegetation in this virtual habitat? The Schloßberg example is based on vegetation that is photographically collected and then pasted onto flat surfaces that are mounted on tree trunks. This is acceptable for a still image like Slide 0.117, but if we have some motion, then vegetation produces a very irritating effect, because the trees move as we walk by. Another way, of course, is to really have a three-dimensional rendering of a tree, but such trees typically are either very expensive or they look somewhat artificial, like the tree in the example of Slide 0.118. Vegetation rendering is thus also an important research topic.

0.12 Coping with Large Datasets

We have a need to cope with large data sets in the administration, rendering and visualization of city data. The example of modeling Vienna with its 220,000 buildings in real time illustrates the magnitude of the challenge. Even if one compresses the 220,000 individual buildings into 20,000 "blocks", thus on average combining 10 buildings into a single building block, one still has to cope with a time-consuming rendering effort that cannot be achieved in real time. A recent doctoral thesis by M. Kofler (1998) reported on algorithms to accelerate the rendering on an unaided computer by a factor of 100, simply by using an intelligent data structure.

If the geometric data are augmented by photographic texture, then the quantity of data gets even more voluminous. Just assume that one has 220,000 individual buildings consisting of 10 facades each, each facade representing roughly 10 m × 10 m, with photographic texture at a resolution of 5 cm × 5 cm per pixel. You are invited to compute the quantity of data that results from this consideration.
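A rough estimate, assuming one byte of gray value per texture pixel (the byte count is an assumption, not stated in the text): each facade covers 10 m / 0.05 m = 200 pixels on a side, hence 200 × 200 = 40,000 pixels; with 220,000 buildings × 10 facades this amounts to 2,200,000 × 40,000 = 8.8 × 10^10 pixels, roughly 88 gigabytes of texture, and about three times that amount for RGB color.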

Kofler's thesis proposed a clever data structure called the "LOD/R-tree". "LOD" stands for level of detail, and R-tree stands for rectangular tree. The author took the entire city of Vienna and defined a rectangle for each building. These are permitted to overlap. In addition, separate rectangles represent groups of buildings, and even the districts are represented by one rectangle each. Actually, the structure was generalized to 3D, thus we are not dealing with rectangles but with cubes.

Now as this is being augmented by photographic texture, one needs to select the appropriate data structure to be superimposed over the geometry. As one uses the data one defines the so-called "frustum" as the instantaneous cone of view. At the front of the viewing cone one has high resolution, whereas in the back one employs low resolution. The idea is to store the photographic texture and the geometry at various levels of detail and then call up those levels of detail that are relevant at a certain distance from the viewer. This area of research is still rapidly evolving, and "fast visualization" is therefore another subject of on-going research for Diplomarbeiten and Dissertationen. The actual fly-over of Vienna using the 20,000 building blocks in real time is now feasible on a regular personal computer, producing about 10 to 20 frames per second as opposed to 10 seconds per frame prior to the LOD/R-tree data structure. Slide 0.129 and Slide 0.130 are two views computed with the LOD/R-tree. The same LOD/R-tree data structure can also be used to fly over regular DEMs - recall that these are regular grids in (x, y) to which a z-value is attached at each grid intersection to represent terrain elevations. These meshes are then associated with photographic texture as shown in three sequential views. We generally call this "photorealistic rendering of outdoor environments".
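The frustum and level-of-detail idea can be sketched in a few lines of Python; the node layout, the distance thresholds and the function names below are invented for illustration and are not taken from Kofler's thesis.

    from dataclasses import dataclass

    @dataclass
    class LODNode:
        bounding_box: tuple      # (xmin, ymin, zmin, xmax, ymax, zmax)
        meshes: dict             # level of detail ("high"/"medium"/"low") -> geometry and texture handle
        children: list           # child LODNode objects of the tree

    def select_level(distance):
        # Assumed thresholds: near objects get fine detail, distant objects coarse detail.
        if distance < 100.0:
            return "high"
        if distance < 1000.0:
            return "medium"
        return "low"

    def collect_visible(node, frustum_contains, distance_to_viewer):
        # Walk the tree: skip whole subtrees whose bounding box lies outside the
        # viewing frustum, and pick a level of detail for the nodes that remain.
        if not frustum_contains(node.bounding_box):
            return []
        drawn = [node.meshes[select_level(distance_to_viewer(node.bounding_box))]]
        for child in node.children:
            drawn += collect_visible(child, frustum_contains, distance_to_viewer)
        return drawn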

Another view of a Digital Elevation Model (DEM), superimposed with a higher resolution aerial photograph, is shown in Slide 0.135 and Slide 0.136.



0.13 Non-optical sensing

Non-photographic, and therefore non-optical, sensors can also be used for city modeling. Recall that we model cities from sensor data, then render the cities using the models as input, and potentially augment those by photographic texture. Which non-optical sensors can we typically consider? A first example is radar imagery. We can use imagery taken with microwaves at wavelengths between 1 mm and 25 cm or so. That radiation penetrates fog, rain and clouds and is thus capable of "all-weather" operation. The terrain is illuminated actively, like with a flash light, supporting "day & night" operation. An antenna transmits microwave radiation, this gets reflected on the ground, and echoes come back to the antenna, which is now switched to receive. We will discuss radar imaging in a later section of this class. Let us take a look at two images. One image of Slide 0.138 has the illumination from the top, the other has the illumination from the bottom. Each image point or pixel covers 30 cm × 30 cm on the ground, representing a geometric resolution of 30 cm. Note that the illumination causes shadows to exist and how the shadows fall differently in the two images.

The radar images can be associated with a direct observation of the digital elevation of the terrain. Slide 0.139 is such an example, associated with the previous two images, of the area of the Sandia National Laboratories in Albuquerque (New Mexico, USA). About 6,000 people work at Sandia. The individual buildings are visible in this dataset, which is in itself rather noisy. But it becomes a very powerful dataset when it is combined with the actual images. We have found here a non-stereo way of directly mapping the shape of the Earth in three dimensions.

Another example with 30 cm × 30 cm pixels is a small village, the so-called MOUT site (Military Operations in Urban Terrain). Four looks from the four cardinal directions show shadows and other image phenomena that are difficult to understand and are the subject of later courses. We will not discuss those phenomena much further in this course. Note simply that we have four images of one and the same village, and those phenomena look very different in the four images. Just study those images in detail and consider how the shadows fall and how the roofs are being imaged, and note in particular one object, namely the church as marked. This church can be reconstructed using eleven measurements. There are about 47 measurements one can take from those four images, so that we have a set of redundant observations of these dimensions to describe the church. The model of the church is shown in Slide 0.141 and is compared to an actual photograph of the same church in Slide 0.142. This demonstrates that one can model a building not only from optical photography, but from various types of sensor data. We have seen radar images in combination with interferometry. There is ample opportunity to study "building reconstruction from radar images" in the form of Diploma and Doctoral theses.

Another sensor is the laser scanner. Slide 0.144 is an example of a laser scanner result from downtown Denver. How does a laser scanner operate? An airplane carries a laser device. It shoots a laser ray to the ground. The ray gets reflected, and the time it takes for the roundtrip is measured. Over an elevation the roundtrip time is shorter than over a depression. The direction into which the laser "pencil" looks changes rapidly from left to right to create a "scanline". Scanlines are added up by the forward motion of the plane. The scanlines accrue into an elevation map of the ground.
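The underlying relation is simple: with c the speed of light and Δt the measured roundtrip time of the pulse, the range R and the terrain height follow as below (a simplified nadir-looking sketch that ignores the scan angle and atmospheric effects):

```latex
R = \frac{c\,\Delta t}{2}, \qquad
z_{\text{terrain}} \approx z_{\text{aircraft}} - R
```

A roundtrip time of about 6.7 µs, for example, corresponds to a range of roughly 1 km.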

The position of the airplane itself is determined using a Global Positioning System (GPS) receiver carried on the airplane. That position might have a systematic error. But by employing a second, simultaneously observed GPS position on the ground one really observes the relative motion between the airplane GPS and the stationary GPS platform on the ground. This leads to a position error in the cm-range for the airplane and to a very small error, also in the cm-range, for the distance between the airplane and the ground. Laser measurements are a very hot topic in city modeling, and there are advantages as well as disadvantages vis-à-vis building models from images. To study this issue could be the subject of Diploma and Doctoral theses.

Note that as the airplane flies along, only a narrow strip of the ground gets mapped. In order to cover a large area one has to combine individual strips. Slide 0.147 illustrates how the strips need to be merged and how any discrepancies between the strips, particularly in their overlaps, need to be removed by some computational measure. In addition, one needs to know points on the ground with their true coordinates in order to remove any uncertainties that may remain from the airplane observations. So finally we have a matched, merged, cleaned-up data set, and we can now do the same thing that we did with the DEM from aerial photography, namely merge the elevation data obtained from the laser scanner with simultaneously collected video imagery taken from the same airplane: we obtain a laser-scan-plus-phototexture product.

0.14 The Role of the Internet

It is of increasing interest to look at a model of a city from remote locations. An example is so-called "armchair tourism", vacation planning and the like. Slide 0.152 is an example of work done for a regional Styrian tourism group. They contracted to have a mountain-biking trail advertised on the Internet using a VRML model of the terrain. Slide 0.153 shows a map near Bad Mitterndorf in Styria and a vertical view of a mountain-biking trail. Slide 0.154 is a perspective view of that mountain-bike trail superimposed onto a digital elevation model that is augmented with photographic texture obtained from a satellite. This is actually available today via the Internet. The challenge is to compress the data without significant loss of information and to offer that information via the Internet at attractive real-time rates. Again, Diploma and Doctoral thesis topics could address the Internet and how it can help to transport more information faster, in more detail, and of course in all three dimensions.

Another example of the same idea is an advertisement for the Grazer Congress on the Internet. The inside of the Grazer Congress was to be viewable for far-away potential organizers of conferences. They obtain a VRML view of the various interior spaces. Because of the need to compress those spaces, the data are geometrically very simple, but they carry the actual photographic texture that is available through photographs taken inside the Grazer Congress.

The Internet is a source of a great variety of image information. An interesting variation of the city models relates to the so-called "orthophoto", namely photographs taken from the air or from space that are geometrically corrected to take on the geometry of a map. The example of Slide 0.158 shows downtown Washington, D.C. with the U.S. Capitol (where the parliament resides). This particular web site is called "City Scenes".

0.15 Two Systems for Smart Imaging

We have already talked about imaging by regular cameras, by radar and other non-optical sensing, and by laser. Let's go a step further: specific smart sensing developed for city mapping. As part of a doctoral thesis in Graz, a system was developed to be carried on the roof of a car, with a number of cameras that allow one to reconstruct the facades of buildings in the city. Images are produced by driving with this system along those buildings. At the core of the system is a so-called linear detector array consisting of 6,000 CCD elements in color. These elements are combined with two or three optical systems, so that 3,000 elements are exposed through one lens and another 3,000 elements through another lens. By properly arranging the lenses and the CCDs one obtains a system whereby one lens collects a straight line of the facade looking forward and the other lens collects a straight line either looking backwards or looking perpendicularly at the building.

In Slide 0.163 we see the car with the camera rig driving by a few buildings in the Kopernikusgasse in Graz. Slide 0.164 shows two images, with various details from those images in Slide 0.165, in particular images collected of the Krones-Hauptschule. Simultaneously with the linear detector array collecting images line by line as the car moves forward (this is also called "push-broom imaging"), one can take images with a square-array camera. So we have the lower-resolution square-array camera with maybe 700 × 500 pixels, augmented by the linear detector array images with 3,000 pixels in one line and an unlimited number of lines as the car drives by. The opportunity exists here as well to perform work for Diploma or Doctoral theses to develop the advantages and disadvantages of square-array versus line-array cameras.

A look at an image from the linear array shows its poor geometry, because as the car drives there are lots of motions going on. In the particular doctoral thesis, the candidate developed software and algorithms to fix the geometric deformations in the images. Use is made of the fact that many of the features are rectilinear, for example edges of windows and details on the wall. This can help to automatically produce good images. If two images are produced, one can produce a stereo rendering of the cityscape. The human observer can obtain a 3-dimensional impression using stereo glasses, as we will discuss later.

That linear detector array, carried in a car as a rigid arrangement without any moving camera parts, was also used by the same author to create a panoramic camera. What is a panoramic camera? It is a camera that sweeps (rotates) across the area of interest with an open shutter, producing a very wide angle of view, in this case 360 degrees in the horizontal dimension and maybe 90 degrees in the vertical direction. We can use two such images for stereoscopy by taking photos from two different positions. The example shown in Slide 0.172 has two images taken of an office space that combine into a stereo pair, which can be used to recreate a complete digital 3-D model of the office space. These are the two raw images in which the "panoramic sweep" across 360° is presented as a flat image.

What is the geometry of such a panoramic camera? It is rather complex. We have a projection center O that is located on a rotation axis, which in turn defines a z-coordinate axis. The rotation axis passes through the center of an imaging lens. The CCD elements are arranged vertically at location z_CCD. An object point P_Obj is imaged onto the imaging surface at location z_CCD. The distance between O and the vertical line through the CCD is called the "focal distance" f_CCD. An image is created by rotating the entire arrangement around the z-axis and collecting vertical rows of pixels of the object space, and as we rotate we assemble many rows into a continuous image. One interesting topic about this type of imaging would be to find out what the most efficient and smartest ways would be to image indoor spaces (more potential topics for Diploma and Doctoral research). To conclude, Slide 0.175 is an image of an office space with a door, umbrella and a bookshelf that was created from the panoramic view in Slide 0.172 by geometrically "fixing" it to make it look like a photo from a conventional camera. The Congress Center in Graz has also been imaged with a panoramic sweep in Slide 0.176; a separate sweep was made to see how the ceiling looks when swept with a panoramic camera.
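As a rough sketch of this imaging geometry (a simplified cylindrical model chosen for illustration, not necessarily the exact calibration of the thesis camera): let an object point have coordinates (X, Y, Z) relative to the projection center O, with the z-axis along the rotation axis. Its image column u follows from the azimuth of the point, and its vertical image coordinate v from a perspective projection onto the rotating column of CCD elements:

```latex
\varphi = \operatorname{atan2}(Y, X), \qquad
u = \frac{\varphi}{2\pi}\,W, \qquad
v = f_{CCD}\,\frac{Z}{\sqrt{X^{2} + Y^{2}}}
```

where W is the number of columns collected in the full 360° sweep.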

0.16 International Center of Excellence for City Modeling

Who in the world is interested in research on city models? What are the "centers of excellence"? In any endeavour that is new and "hot" you always want to know who is doing what and where. In Europe there were several conferences on this subject in recent years. One of these was in Graz, one in Ascona in Switzerland, and one in Bonn. Ascona was organized by the ETH Zürich, Bonn by the University of Bonn, and the Graz meeting by our Institute.

The ETH Zürich is home to considerable work in this area, so much so that some university people even started a company, Cybercity AG. The work in Zürich addresses details of residential homes and led to the organisation of the two workshops in Ascona, for which books have been published by Birkhäuser-Verlag. One can see in the examples of Slides 0.182 through 0.186 that they find edges and use those to segment the roof into its parts. They use multiple images of the same building to verify that the segmentation is correct and to improve it if errors are found. The typical example from which they work is aerial photography at large scales (large scales are around 1:1,500; small scales around 1:20,000). Large models have been made, for example of Zürich as shown in Slide 0.186.



The most significant amount of work in this area of city modeling has probably been performed at the University of Bonn. The image in Slide 0.188 is an example of an area in Munich. The method used in Bonn is fairly complex and encompasses an entire range of procedures that would typically be found in many chapters of books on image processing or pattern recognition. One calls the diagram shown in Slide 0.189 an "image processing pipeline".

The data processed in Bonn are the same as those used in Zürich. There exists an international data set for research, so that various institutions have the ability to practice their skill and compare the results. We will later go through the individual work steps that are listed in the pipeline. One result from Bonn using the international images shows edges and, from the edges, finds matching points and corners in separate images of the same object. This indicates the top of a roof. The illustration in Slide 0.190 explains the principle of the work done in Bonn. Another Bonn approach is to first create corners and then topologically connect the corners so that roof segments come into existence. These roof segments are then merged into the largest possible areas that might represent roofs, as shown in this example.

Another approach is to start the modelling of a building not from the image itself, nor from its edges and corners, but to create point clouds by stereo measurements. This represents a dense digital elevation model, as we explained earlier in the Denver example. Digital elevations are illustrated here by encoding the elevation as brightness values, with dark being low and white being high. One can now try to fit planes to the elements of the digital elevation model. Slide 0.193 is an intermediate result, where it looks as if some roofs have been found. The digital elevation model here invites one to compute planes to define the roofs and the sides of buildings.

In North America the work on city modeling is typically sponsored by the Defense Advanced Research Projects Agency (DARPA). Their motivation is the military application, for example fighting urban wars, having robots move through cities, or facing terrorists. DARPA programs typically address university research labs. The most visible ones were the University of Massachusetts in Amherst, the University of Southern Colorado, Carnegie Mellon University and the Stanford Research Institute (SRI), a well-known research lab that originated at Stanford University and is separately organised as a foundation.

In the US there are other avenues towards modeling of cities which are not defense oriented. One is architecture. In Los Angeles there is the architecture department of the University of California at Los Angeles. They are building a model of the entire city of Los Angeles using students and manual work.

0.17 Applications

Let me come to a conclusion of city modeling. Why do people create such models? Developing an answer presents another opportunity to do application studies for Diploma and Doctoral theses. Let me illustrate some of those applications of city models. They certainly include city planning, architectural design and (car) navigation; there is engineering reconstruction of buildings that have been damaged and need to be repaired; then infotainment (entertainment); there is simulation and training for fire fighters and for disaster preparedness. Applications can also be found in telecom or in the military. A military issue is the guidance of robot soldiers and the targeting and guiding of weapons. In telecom we may need to transmit data from roof to roof as one form of broadband wireless access. In infotainment we might soon have 3-dimensional phone books.

0.18 Telecom Applications of City Models

A particular computer graphics and image processing issue which should be of specific interest to Telematics people is "the use of building models for telecom and how these building models are made". Slide 0.202 shows a three-dimensional model of downtown Montreal. The purpose of this model is to plan the placement of antennas on the roofs of tall buildings. Those antennas would serve as hubs to illuminate other buildings and to receive data from other buildings, in a system that is called a Local Multipoint Distribution System (LMDS). This is a broadband wireless access technology that competes with fibre optics in the ground and with satellite communication. We will see how the technologies shake out, but LMDS is evolving everywhere; it is very scalable, since one can build up the system sequentially hub by hub, and one can increase the performance sequentially as more and more users in buildings sign up.

Slide 0.204 is a model of a large section of Vancouver, where the buildings are modeled in support of an LMDS project. In order to decide where to place a hub one can use software that automatically selects the best location for a hub. For example, if we place an antenna on a high building we can then determine which buildings are illuminated from that antenna and which are not.

We use examples from a Canadian project to map more than 60 cities. One delivers to the telecom company so-called "raster data", but also so-called "vector data", and also non-graphic data, namely addresses. We will talk later about raster and vector data structures, and we will discuss how they are converted into one another.

The geometric accuracy of the shape of these buildings should be in the range of ±1 meter in x, y and z in order to be useful for the optimum placement of antennas.

How many buildings are in a square kilometer? In downtown Montreal this was about 1,000 buildings per km². Because the data need to be delivered quickly (telecom companies need them "now"), one cannot always have perfect images to extract buildings from. So one must be able to mix pre-existing photography and new aerial sources and work from what is there. For this reason one's procedures need to be robust with respect to the type of photography. The question often is: from what altitude was that photography taken, and therefore what is the scale of the photographs?

Some telecom companies want all buildings (commercial and residential), while others only need the commercial buildings. Most of the companies want all addresses. Multiple addresses must even be provided in the case of an apartment building. There is always a need to be quick and inexpensive. Companies expect that a hundred km² can be modeled per week, which means a hundred thousand buildings per week. One cannot achieve this by hand; one has to do it by machine.

One challenge might be that one is faced with aerial photography that is flown at too large a scale. The high-rise building in Slide 0.207 looks different in that view than in the other stereoscopic view in Slide 0.208. For a high-rise building we may not even see a certain side of the building in one photograph, yet we see that side in the other. Our procedure must cope with these dissimilarities. Slide 0.209 shows a set of polygons extracted from an image, and one can already see that some polygons are not visible in that particular photograph. Clearly those data were extracted from another photograph, as shown in Slide 0.210. The same situation is illustrated again in the second example of Slide 0.211. Finally we have a raster representation of the buildings in Slide 0.212. So we have an (x, y)-grid on the ground, and for each (x, y)-grid cell we have a z-elevation. The images shown before were the source of the building in the center of this particular raster representation.

But we also want a vector representation of the building footprints and of the details of the roofs, as in the example of downtown Montreal. These vectors are needed because addresses can be associated with polygons describing a building, whereas one has a harder time associating addresses with a raster representation. However, the signal-propagation computation needs raster data, as shown here.

The entire area of central Montreal has 400,000 buildings, as shown in Slide 0.217. Zooming in on the green segment permits one to see city blocks. Zooming in further produces individual buildings. A very complex building is the cathedral, which on an aerial photograph looks like Slide 0.220.



Let us summarize: the data sets used for this telecom wave-propagation modeling in the LMDS application consist first of all of vector data of the buildings (Slide 0.222), but also of the vegetation, because the vegetation may block the intervisibility of antennas; we also show the combination of both. Of course the same building data are needed in a raster format, and finally a combination of raster and vector data to include the trees. And we must not forget the addresses. Again, there may be one address per building, or multiple addresses for each building. The addresses are linked to the geometric data by address locators that are placed inside the polygons. As a result the addresses are associated with the polygons and thus with the buildings.

What do such telecom data sets go for in terms of price? A building may cost between $1 and $25. A square kilometer may go for $100 to $600. However, if there are 1,000 buildings per km², then obviously an individual building may cost less than one dollar. A metropolis such as Montreal may cover 4,000 km², but the interest is focused on 800 km². On average, of course, there are fewer than 1,000 buildings per km²; one might find more typically 200 or so buildings per km² over larger metropolitan regions.

...


[Slides 0.1 to 0.259: thumbnail reproductions of the lecture slides]


Chapter 1

Characterization of Images

1.1 The Digital Image

Images can be generated from at least two sources. The first is the creation of an image from the measurements taken by a sensor; we would call this a "natural image". In contrast, an image may also be generated by a computer describing an object or a situation that may or may not exist in the real world. Such images are "computer generated" (CGI, computer-generated imagery).

All digital images have a coordinate system associated with them. Slide 1.5 is an original and typical image with two dimensions and has a rectangular (Cartesian) coordinate system with axes x and y. Therefore a location in the image can be defined by its coordinates x and y. Properties of the image can now be associated with that location. In that sense the image is an algebraic function f(x, y). When we deal with digital images we discretize this continuous function and replace the continuous image by rows and columns of image elements or pixels. A pixel is typically a square or rectangular entity. More realistically, of course, the sensor that produced the image may have an instantaneous field of view that is not rectangular or square; it is oftentimes a circle. We present an image digitally as an arrangement of square pixels, although the machinery which creates the digital image may not produce square pixels.

Digital images are fairly simple arrangements of numbers that are associated with gray values, as illustrated in Slide 1.7. It shows four different gray values between 0 and 30, with 0 being white and 30 being black. A very simple type of image is the so-called "binary image" or binary mask. That is an image whose pixels have gray values of either 0 as white or 1 as black. Such a binary image may be obtained by thresholding a gray value image. We may have a threshold

Algorithm 2 Threshold image
1: create a binary output image with the same dimensions as the input image
2: for all pixels p of the input image do
3:   retrieve gray value v of pixel p from the image
4:   find the pixel p′ of the output image corresponding to p
5:   if v ≥ v_t then {compare gray value v with threshold v_t}
6:     set p′ to white
7:   else
8:     set p′ to black
9:   end if
10: end for




that declares all pixel values between 15 and 25 to be black (or 1), while all other gray values are set to white (or 0).
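As a minimal sketch of the thresholding idea (using NumPy, with an assumed 8-bit grayscale array; the band limits 15 and 25 follow the example above):

```python
import numpy as np

def threshold(image: np.ndarray, v_t: int) -> np.ndarray:
    """Binary mask as in Algorithm 2: 1 (white) where the gray value >= v_t."""
    return (image >= v_t).astype(np.uint8)

def band_threshold(image: np.ndarray, low: int = 15, high: int = 25) -> np.ndarray:
    """Variant from the text: 1 (black) for gray values in [low, high], 0 otherwise."""
    return ((image >= low) & (image <= high)).astype(np.uint8)

# Tiny example on a 2 x 3 "image"
img = np.array([[3, 17, 42],
                [25, 14, 20]], dtype=np.uint8)
print(threshold(img, 20))    # [[0 0 1] [1 0 1]]
print(band_threshold(img))   # [[0 1 0] [1 0 1]]
```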

An immediate question to ask is why this technology has been developed to take continuous gray values and convert them into digital pixel arrays. Let's discuss a few advantages. A very significant one is "quantification". In a digital environment we are not reduced to judging an image with our opinions; one has actual measurements. This can be illustrated by the example of a gray area embedded either in a dark or in a white background. Subjectively, our eye will tell us that the gray area is brighter when embedded in a dark environment and darker when embedded in a brighter environment. But in reality the two gray values are identical. The eye can objectively differentiate only a limited number of gray values. In a chaotic image we may be able to separate only 16 to 64 gray values. Relatively, though, namely in situations where we have two areas adjacent to one another, our eyes become very sensitive to the differences. But we cannot compare a gray tone in one corner of an image to a gray tone in another corner of the same image and be certain which one is brighter or darker. That can easily be accomplished in a digital environment.

There is a whole host of other advantages that will not be discussed at the same level of detail. First, a very important one is the automation of the visual sense. We can give the computer eyes and can process the visual information by machine, thereby taking the work of interpreting various visual inputs away from the human. Examples are quality control in a factory environment or in inaccessible, dangerous areas. Second is "flexibility". We have options that we do not have in an analog environment or with the natural visual sense when configuring very flexible sensing systems for very specific tasks. Third, the ability to store, retrieve, transfer and publish visual information at very little cost is another advantage if the information is digital. We all have experience with multimedia information on the web, and we all know that duplication and transfer are available at almost no cost. Fourth is the advantage of enhancing the visual sense of the human with an array of sensors, for example underwater imaging, sound imaging, x-ray imaging, microwave imaging. We will address sensors in more detail.

Fifth, digital processing of sensor data is essentially independent of the specifics of the sensor. We may have algorithms and software that are applicable to a variety of sensors; that is an advantage of a digital environment. Sixth is cost: digital images are inexpensive. This was already mentioned in the context of storage, transfer and publication. Expensive-looking color images can be rendered on a computer monitor, and yet we have no direct costs for those images. This is quite a difference from going to a photo lab and getting quality paper prints or diapositives.

The seventh advantage of digital images needs an example to explain. There exist numerous satellites orbiting the Earth and carrying Earth-observing sensors. One such system is from the US NASA and is called "Landsat". Slide ?? is an example of a Landsat image of the Ennstal with its rows and columns. What makes this image interesting is the color presentation of what the sensor in orbit "sees". The presentation is made from 7 separate spectral channels, not from simple red/green/blue color photography. Very typical of the flexibility and versatility of digital sensors and digital image processing is this ability to extend the visual capabilities of humans and to operate with many more images than a human can "see" or cope with.

Prüfungsfragen (exam questions):

• What is meant by a "threshold image", and for what purpose is it used?
• What advantages do digital images have over analog images?
• What is meant by a multi-channel or multispectral image, and what is it used for?



1.2 The Image as a Raster Data Set

A digital image is an array of pixels. It was already mentioned that in principle images are continuous functions f(x, y). A very simple "image model" states that f(x, y) is the product of two separate functions. One function is the illumination I, and the other function describes the properties of the object that is being illuminated, namely the reflection R. The reflection function may vary between 0 and 1, whereas the illumination function may vary between 0 and ∞.
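Written as a formula, this simple image model is:

```latex
f(x, y) = I(x, y)\,\cdot\, R(x, y), \qquad
0 < I(x, y) < \infty, \qquad
0 \le R(x, y) \le 1
```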

We now need to discretize this continuous function in order to end up with a digital image. We might create 800 by 1000 pixels, a very typical arrangement of pixels for the digital sensing environment. So we sample our continuous function f(x, y) into an N × M matrix with N rows and M columns. Typically our image dimensions are powers of two, 2^n; so our number of rows may be 64, 128, 512, 1024, etc. We not only discretize or sample the image (x, y)-locations; we also have to take the gray value at each location and discretize it. We do that at 2^b levels, with b typically being small, producing 2, 4, 8, 12 or 16 bits per pixel.

Definition 1 Amount of data in an image
To calculate the amount of data of an image, the geometric and the radiometric resolution of the image must be given. Say we have an image with N columns and M rows (geometric resolution) and a radiometric resolution of R bits per pixel. The amount of data b of the image is then calculated using the formula
b = N · M · R (bits).

A very simple question is shown in Slide 1.20. Suppose we create an image of an object and need to recognize a certain detail of the object from the image, say a speck of dirt on a piece of wood of 60 cm by 60 cm, and that speck can be as small as 0.08 mm². What size must the image have so that we can be sure to recognize all the dirt spots?
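One plausible back-of-the-envelope answer, assuming a pixel must be no larger than the smallest dirt speck (a stricter sampling argument would halve the pixel size again):

```latex
\text{pixel size} \le \sqrt{0.08\ \text{mm}^2} \approx 0.28\ \text{mm}
\;\Rightarrow\;
N = M \ge \frac{600\ \text{mm}}{0.28\ \text{mm}} \approx 2143
```

So roughly a 2150 × 2150 pixel image, or about 4.6 million pixels; at R = 8 bits per pixel the amount of data would be b ≈ 37 Mbit ≈ 4.6 MB.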

The resolution of an image is a widely discussed issue. When we talk about the geometric resolution of an image, we typically associate with it the size of a pixel on the object and the number of pixels in the image. When we talk about radiometric resolution, we describe the number of bits we have per pixel. Let us take the example of geometric resolution. We have in Slide 1.22 and Slide 1.23 a sequence of images of a rose that begins with a resolution of 1000 by 1000 pixels. From there we go down ultimately to 64 by 64 or even 32 by 32 pixels. Clearly, at 32 by 32 pixels we cannot recognize the rose any more.

Let us take a look at the radiometric resolution. We have in Slide 1.24 a black-and-white image of that rose at 8 bits per pixel. We reduce the number of bits, and in the extreme case we have one bit only, resulting in a binary image (either black or white). In the end we may have a hard time interpreting what we are looking at unless we already know what to expect. As we will see later, image processing of 8-bit black & white images is very common. A radiometric resolution with more bits per black & white pixel is needed, for example, in radiology; in medicine it is not uncommon to use 16 bits per pixel. With 8 bits we obviously get 256 gray values; with 12 bits we have 4096 gray values.

The color representation is more complex; we will talk about that extensively. In that case we do not have one 8-bit number per color pixel, but typically three numbers, one each for red, green and blue, thus 24 bits in total per color pixel.



Prüfungsfragen (exam questions):

• In image processing there is the idea of a so-called "image model". What is meant by it, and which formula is used to represent the image model?
• Describe the process of discretization in the transition from an analog to a digital image.
• What is meant by sampling, and which problems can occur? You are invited to use formulas in your answer.
• What do the terms "geometric" and "radiometric" resolution of an image mean? Try to illustrate your answer with a sketch.

1.3 System Concepts

We talk about image analysis, image processing or pattern recognition, and about computer graphics. What are their basic ideas? Image processing goes from the image to a model of an object, and from there to an understanding of the object. In [GW92] an image analysis system is described in the introductory chapter. One always begins with (a) sensors, thus with the image acquisition step, the creation of an image by a camera, a radar system, or by sound. Once the image is acquired it is, so to speak, "in the can". We can now (b) improve the image; this is called "pre-processing". Improving means fixing errors in the image and making the image look good for the eye if a human needs to inspect it. Preprocessing produces a new, improved image.

We now want to decompose the image into its primitives. We would like to (c) segment it into areas or fields, edges, lines, regions. From the pre-processed image as it has been seen visually, this creates a new image in which the original pixels are substituted by image regions, contours, edges. We denote this as "segmentation". After segmentation we need to create a (d) representation and a description of the image contents. And finally we want to use the image contents and (e) interpret their meaning. What do the objects look like? This phase is called recognition and interpretation. All of this is based on (f) knowledge about the problem domain, about the sensor, about the object, and about the application of the information. Once the object information has been interpreted, we can use the information extracted from the image for action. We may make a decision, e.g. to move a robot, to dispose of a defective part, or to place an urban waste dump, and so forth.

The typical ideas at the basis of computer graphics are slightly different. We start out from the computer, in which we store data about objects, and create an image as a basis for actions. So we have a database and an application model. We have a program to take the data from the database and to feed them into a graphics system for display. The object of computer graphics is the visual impression of a human user. However, what may seem like two different worlds, image processing versus computer graphics, is really largely one and the same world. Image processing creates from images of the real world a model of that real world. Computer graphics takes a model of objects and creates from it an image of those objects. So in terms of the real world, computer graphics and image processing are entirely complementary: image processing goes from the real world to a model of the real world, and computer graphics takes the objects of the real world and creates an image of them.

Where those two areas do diverge is in the non-real world. There is no sensing and no image analysis of a non-real world. What is computer graphics of a non-real world? Just look at cartoons and the movies. So there is a point of view that says that image processing and computer graphics belong together. A slightly different point of view is to say that image processing and computer graphics overlap in areas addressing the real world, and that there are areas that are separate.



Prüfungsfragen (exam questions):

• Sketch the process of image recognition as a chain of processes leading from the scene to the scene description.

1.4 Displaying Images on a Monitor

The customary situation today is a refresh buffer in which we store numbers that represent the image. We use a display controller that manages this buffer based on data and software residing on a host computer. And we have a video controller that takes what is in the buffer and presents this information on a computer monitor. In the buffer we might have a binary image at 1 bit per pixel, or we may have a color image at 24 bits per pixel. These are the typical arrangements for refresh buffers. The refresh buffer is typically larger than the information shown on the computer monitor: the monitor may display 800 by 1000 pixels, while the refresh buffer might hold 2000 by 2000 pixels. An image is displayed on the monitor using a cathode-ray tube or an LCD arrangement. On a cathode-ray tube the image is painted line by line onto the phosphor surface, going from top to bottom.

Then the ray gets turned off. So it moves from left to right with the beam on, right to left with the beam off, top down with the beam on, and bottom to top with the beam off. An image like the one in Slide "Wiedergabe bildhafter Information" is a line drawing. How could this be represented on a monitor? In the early days this was done by a vector scan, so the cathode ray was used to actually paint vectors on the monitor. Very expensive vector display monitors were built, perhaps as late as the mid-80's. Television monitors became very inexpensive while vector monitors remained expensive, and so a transition took place from vector monitors to raster monitors; today everything is represented in this raster. We could have a raster display present only the contours of an object, but we can also fill the object in the raster data format.

Not all representations on a monitor deal with the 3-dimensional world. Many representations in image form can be of an artificial world or of technical data, thus of non-image information. This is typically denoted by the concept of "visualization". Slide "Polyline" is a visualization of data in one dimension. Associated with this very simple idea are concepts such as polylines (here representing a bow tie), with a table of points 0 to 6 representing the polyline. There are also concepts such as "markers", which are symbols that represent particular values in a two-dimensional array. This was once a significant element of the computer graphics literature but no longer represents a big issue today.

Prüfungsfragen (exam questions):

• Describe the components needed in a computer for the output and for the interactive manipulation of a digital raster image.
• Using a sketch, describe how a digital raster image is built up on the luminous surface of a cathode-ray tube.
• What is the difference between the vector and the raster representation of a digital image? Illustrate your answer with a simple example and describe the advantages and disadvantages of both approaches.
• Using a sketch, explain the timing of the image build-up on a cathode-ray tube.



Algorithm 3 Simple raster image scaling by pixel replication
1: widthratio ⇐ newimagewidth / oldimagewidth
2: heightratio ⇐ newimageheight / oldimageheight
3: for all y such that 0 ≤ y < newimageheight do
4:   for all x such that 0 ≤ x < newimagewidth do
5:     newimage[x, y] ⇐ oldimage[floor(x / widthratio), floor(y / heightratio)] {floor keeps the source index inside the old image}
6:   end for
7: end for

Algorithm 4 Image resizing
1: widthratio ⇐ newgraphicwidth / oldgraphicwidth
2: heightratio ⇐ newgraphicheight / oldgraphicheight
3: for all points p in the graphic do
4:   p.x ⇐ p.x × widthratio
5:   p.y ⇐ p.y × heightratio
6: end for
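A minimal NumPy sketch of the pixel-replication scaling of Algorithm 3 (array indexing is assumed to be [row, column], i.e. [y, x]):

```python
import numpy as np

def scale_by_replication(old: np.ndarray, new_h: int, new_w: int) -> np.ndarray:
    """Nearest-neighbour scaling: every new pixel copies the old pixel it falls onto."""
    old_h, old_w = old.shape[:2]
    # Map every new row/column index back to a source index (floor keeps it in range).
    src_y = np.arange(new_h) * old_h // new_h
    src_x = np.arange(new_w) * old_w // new_w
    return old[np.ix_(src_y, src_x)]

# Example: enlarge a 2 x 2 image to 4 x 4 -> each pixel becomes a 2 x 2 block
img = np.array([[0, 1],
                [2, 3]], dtype=np.uint8)
print(scale_by_replication(img, 4, 4))
```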

1.5 Images as Raster Data

We deal with a continuous world of objects, such as curves or areas, and we have to convert them into pixel arrays. Slide "Rasterkonvertiertes Objekt" shows the representation of a certain figure in a raster image. If we enlarge this, we obtain a larger figure with the exact same shape but a larger size of the object's elements: if we enlarge the image by a factor of two, what was one pixel before is now taking up four pixels. The same shape that we had before would look identical but smaller if we had smaller pixels. Suppose we make a transition to pixels that are only a quarter as large as before. If we now enlarge the image, starting from the smaller pixels, we get back the same shape we had before. However, if we reconvert from the vector to a raster format, then the original figure really will produce a different result at the higher resolution. So we need to understand what pixel size and geometric resolution do in the transition from a vector world to a raster world.

Prüfungsfragen (exam questions):

• What is meant by "raster conversion", and which problems can occur?

1.6 Operations on Binary Raster Images

There is an entire world of interesting mathematics dealing with binary images and operations on such binary images. These ideas have to do with neighborhoods, connectivity, edges, lines, and regions. This type of mathematics was developed in the 1970's. A very important contributor was Prof. Azriel Rosenfeld, who with Prof. Avi Kak wrote the original book on pattern recognition and image processing.

What is a neighborhood? Remember that a pixel at location (x, y) has a neighborhood of four pixels: those above, below, left and right of the pixel in the middle. We call this the N_4 neighborhood or the 4-neighbors. We can also have the diagonal neighbors N_D, namely the lower-left, lower-right, upper-right and upper-left neighbors. We add these N_D neighbors to the N_4 neighbors to obtain the N_8 neighbors. This is further illustrated as Prof. Rosenfeld did in 1970: Slide 1.56 presents the N_4-neighbors and the N_8-neighbors and associates them with the movements of the king in a chess game. We may also have the oblique neighbors N_v and the springer neighbors N_sp, which are analogous to the chess movements of the knight (Springer). Another diagonal neighborhood would derive from the game of checkers ("Dame").

We have neighborhoods of the first order, which are the neighbors of a pixel x. The neighbors of the neighbors are "neighbors of second order" with respect to the pixel at x. We could increase the order further by taking the neighbors of the neighbors of the neighbors.

Definition 2 Connectivity
Two pixels are connected if they are each other's neighbors and have the same connectivity property V.

4-connectivity:
1: if q is an N_4-neighbor of p then {Def. 5}
2:   pixels p and q are connected
3: else
4:   pixels p and q are not connected
5: end if

m-connectivity:
1: if (N_4(p) ∩ N_4(q)) = ∅ then {N_4(x): set of N_4-neighbors of x}
2:   if (q is an N_4-neighbor of p) or (q is an N_D-neighbor of p) then {Def. 5}
3:     pixels p and q are connected
4:   else
5:     pixels p and q are not connected
6:   end if
7: else
8:   pixels p and q are not connected
9: end if

Connectivity is defined for two pixels belonging together: they are "connected" if they are one another's neighbors. So we need a neighbor relationship to define connectivity. Depending on whether we use the 4-neighborhood, the 8-neighborhood, or a springer neighborhood, we can define various types of connectivity. We therefore say that two pixels p and q are connected if they are neighbors under a given neighborhood relationship.

This becomes quite interesting and useful once we start to do character recognition and need to figure out which pixels belong together and create certain shapes. We may have an example of three-by-three pixels of which four pixels are black and five pixels are white. Connections can now be established between those four black pixels under various connectivity rules. A connectivity with eight neighbors creates a more complex shape than a connectivity via the so-called m-neighbors, where the m-neighbors have been defined previously in Slide "Zusammenhaengende Pixel".
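To make the neighborhood definitions concrete, here is a small Python sketch (my own illustration, following the common textbook formulation of m-connectivity rather than reproducing the lecture's pseudocode verbatim) that lists the neighbor offsets and tests whether two foreground pixels of a binary image are m-connected:

```python
# Offsets of the basic neighborhoods of a pixel (row, column)
N4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
ND = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # the four diagonal neighbors
N8 = N4 + ND

def neighbors(p, offsets):
    """Set of pixel coordinates reached from p via the given offsets."""
    return {(p[0] + dr, p[1] + dc) for dr, dc in offsets}

def m_connected(image, p, q):
    """True if foreground pixels p and q are m-connected (image: dict (row, col) -> 0/1)."""
    if not (image[p] and image[q]):
        return False
    if q in neighbors(p, N4):
        return True
    common = neighbors(p, N4) & neighbors(q, N4)
    common_foreground = any(image.get(c, 0) for c in common)
    return q in neighbors(p, ND) and not common_foreground

# Tiny example: two diagonal foreground pixels whose shared 4-neighbors are background
img = {(0, 0): 1, (1, 1): 1, (0, 1): 0, (1, 0): 0}
print(m_connected(img, (0, 0), (1, 1)))   # True
```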

Definition 3 Distance
Given: points p(x, y) and q(s, t)
1: D_e(p, q) = √((x − s)² + (y − t)²) (Euclidean distance)
2: D_4(p, q) = |x − s| + |y − t| (city block distance)
3: D_8(p, q) = max(|x − s|, |y − t|) (chessboard distance)

The neighborhood and connectivity relationships can be used to establish distances between pixels, to define edges, lines and regions in images, to define contours of objects, to find a path between any two locations in an image, and perhaps to eliminate pixels as noise if they are not connected to any other pixels. A quick example of a distance addresses two pixels P and Q with a distance depending on the neighborhood relationships that we have defined. The Euclidean distance of course is simply obtained as the Pythagorean sum of the coordinate differences. But if we take a 4-neighborhood as the base for distance measurements, then we have a "city block distance": two blocks up, two blocks over. Or if we take the 8-neighborhood, then we have a "chessboard type of distance".
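A small Python sketch of the three distance measures (the helper functions are invented for this illustration):

import math

def euclidean(p, q):
    # De: straight-line distance
    (x, y), (s, t) = p, q
    return math.hypot(x - s, y - t)

def d4(p, q):
    # city block distance: number of 4-neighbor steps
    (x, y), (s, t) = p, q
    return abs(x - s) + abs(y - t)

def d8(p, q):
    # chessboard distance: number of 8-neighbor (king) steps
    (x, y), (s, t) = p, q
    return max(abs(x - s), abs(y - t))

print(euclidean((0, 0), (2, 2)), d4((0, 0), (2, 2)), d8((0, 0), (2, 2)))
# 2.828..., 4, 2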

Let's define an "edge". This is important because there is a mathematical definition that is a little different from what one would define an edge to be in a casual way. An edge e in an image is a property of a pair of pixels which are neighbors of one another. It is thus a property of a pair of pixels, and one needs to consider two pixels to define it. It is important that the two pixels are neighbors under a neighborhood relationship. Any pair of pixels that are neighbors of one another represents an edge. The edge has a "direction" and a "strength". Clearly the strength of the edge is what is important to us. The edge is defined on an image B, and an edge image is obtained by taking the edge value at each pixel. We can apply a threshold to the weight and the direction of the edge. All edges with a weight beyond a certain value become 1 and all edges less than that value become 0. In that case we have converted our image into a binary edge image.
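As a sketch of this idea, the following Python/NumPy fragment treats each pair of horizontally and vertically neighboring pixels as an edge, uses the gray-value difference as the edge strength, and thresholds it into a binary edge image (the threshold value and the use of a simple difference are assumptions for this illustration):

import numpy as np

def binary_edge_image(img, threshold=50):
    # edge strength between 4-neighbors: absolute gray-value difference
    horiz = np.abs(np.diff(img.astype(int), axis=1))  # edges between (i, j) and (i, j+1)
    vert  = np.abs(np.diff(img.astype(int), axis=0))  # edges between (i, j) and (i+1, j)
    # an output pixel is set to 1 if any edge incident to it exceeds the threshold
    out = np.zeros(img.shape, dtype=np.uint8)
    out[:, :-1] |= (horiz >= threshold).astype(np.uint8)
    out[:, 1:]  |= (horiz >= threshold).astype(np.uint8)
    out[:-1, :] |= (vert >= threshold).astype(np.uint8)
    out[1:, :]  |= (vert >= threshold).astype(np.uint8)
    return out

img = np.array([[10, 10, 200],
                [10, 10, 200],
                [10, 10, 200]], dtype=np.uint8)
print(binary_edge_image(img))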

What is a line? A line is a finite sequence of edges ei, i = 1, . . . , n. It is a sequence of edges where the edges need to be one another's neighbors under a neighborhood relationship; the edges must be connected. A line has a length, and the length is the number of edges that form the line.

What is a region in the image? A region is a connected set R of pixels from an image B. A region has a contour. A contour is a line composed of edges, and those edges are defined by the property of two neighboring pixels P and Q: P must be part of the region R, Q must not be. This all sounds pretty intuitive, but it gets pretty complicated once one starts doing operations.
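To make the notion of a region concrete, here is a hedged Python sketch that labels 4-connected regions of foreground pixels with a simple flood fill (the function name and the choice of 4-connectivity are assumptions for this example; real systems would typically use a library routine for connected-component labeling):

import numpy as np
from collections import deque

def label_regions_4(img):
    # assign a label to every 4-connected region of pixels with value 1
    labels = np.zeros(img.shape, dtype=int)
    next_label = 0
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            if img[r, c] == 1 and labels[r, c] == 0:
                next_label += 1
                labels[r, c] = next_label
                queue = deque([(r, c)])
                while queue:
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if 0 <= nr < rows and 0 <= nc < cols \
                           and img[nr, nc] == 1 and labels[nr, nc] == 0:
                            labels[nr, nc] = next_label
                            queue.append((nr, nc))
    return labels

img = np.array([[1, 1, 0],
                [0, 0, 0],
                [0, 1, 1]])
print(label_regions_4(img))   # two separate regions under 4-connectivity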

Exam questions:

• When we have to specify a "distance" between two pixels in a digital image, several distance measures are available. List the distance measures you know. You are invited to use formulas in your answer.

• When considering pixels, there are "neighborhoods" of pixels. List all kinds of neighborhoods treated in the lecture and describe each of these neighborhoods with a sketch.

• What possibilities are there to define pixels in a digital raster image as connected? Explain each definition with a sketch.

• For what purposes does one define neighborhood and connectivity relations between pixels in digital raster images?

• State the definitions of the terms "edge", "line" and "region" in a digital raster image.

1.7 Algebraic Operations on Images

We can add two images, subtract, multiply or divide them, we can compare images by some logical operations, and we can look at one image using a second image as a mask. Suppose we have a source image, an operator and a destination image. Depending on the operator we obtain a resulting image. We take a particular source and destination image and make our operator the function "replace", the function "or", the function "xor" or the function "and" to obtain different results. We may also have mask operations; in this case we take an image A as a mask.



Algorithm 5 Logical mask operations

This is an example for a mask operation. Two images are linked with the Boolean OR operator, pixel by pixel.

1: for i = 0 to N − 1 do
2: for j = 0 to M − 1 do
3: B[i, j] = A[i, j] OR B[i, j]
4: end for
5: end for


For this we may have an input frame buffer A and an output frame buffer B. We may be able to process everything that is in these two buffers in 1/30 of a second, so we can perform an operation on all N × M pixels within 1/30 of a second, as illustrated in Slide "Operationen".
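A brief Python/NumPy sketch of such algebraic and mask operations on two images (the array contents and the 0/255 convention are assumptions for this illustration):

import numpy as np

A = np.array([[0, 255], [255, 0]], dtype=np.uint8)   # source image
B = np.array([[255, 255], [0, 0]], dtype=np.uint8)   # destination image / mask

added   = np.clip(A.astype(int) + B.astype(int), 0, 255).astype(np.uint8)  # arithmetic
or_img  = A | B                  # Boolean OR, pixel by pixel (as in Algorithm 5)
xor_img = A ^ B                  # "xor" operation
and_img = A & B                  # "and" operation
masked  = np.where(B > 0, A, 0)  # look at A only where the mask B is set

print(or_img)
print(masked)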

Exam questions:

• Given the two binary images in Figure ??: what result is obtained by a logical combination of the two images with an "xor" operation? Please use a sketch.

• Explain, using a few examples, what is meant by algebraic operations with two images.

• Explain the terms "mask", "filter" and "window" in the context of algebraic operations with two images. Illustrate your answer with a sketch.




Chapter 2

Sensing

2.1 The Most Important Sensors: The Eye and the Camera

The eye is the primary sensor of a human. It is certainly important to understand how it operates in order to understand how a computer can mimic the eye, and how certain new ideas in computer vision and also in computer graphics have developed by taking advantage of the specific properties of the eye.

In Slide 2.5 we show an eye and define an optical axis of the eye's lens. This optical axis intersects the retina at a place called the fovea, which is the area of highest geometric and radiometric resolution. The lens can change its focal length using muscles that pull on the lens and change its shape. As a result a human can focus on objects that are nearby, for example at a 25 cm distance, which is typical for reading a newspaper or book, or focus at infinity, looking out into the world.

The light that is projected from the world through the lens onto the retina gets converted into signals that are then fed by nerves into the brain. The place where the nerve leaves the eye is called the blind spot; that is a location where no image can be sensed. The optical system of the eye consists, apart from the lens, of the so-called vitreous humor 1; in front of the lens is a protective layer called the cornea 2, and between the lens and the cornea is a space filled with liquid called the anterior chamber. Therefore the optical system of the eye consists essentially of four optically active bodies: 1. the cornea, 2. the anterior chamber, 3. the lens and 4. the vitreous humor.

The conversion of light into nerve signals is accomplished by means of rods and cones that are embedded in the retina. The rods 3 are black-and-white sensors. The eye has about 75 million of them, and they are distributed widely over the retina.

If there is very little light, the rods will still be able to receive photons and convert them into recognizable nerve signals. To see color, we need the cones 4. We have only about 6 million of those, and they are not as evenly distributed as the rods are. They are concentrated at the fovea, so that the fovea has about 150,000 cones per square millimeter. That number is important to remember for a discussion of resolution later on.

We take a look at the camera as an analog of the eye. A camera may produce black-and-white or color images, or even false-color images. The slide shows a typical color image taken from an airplane of a set of buildings (see these images also in the previous Chapter 0).

1 in German: Glaskörper
2 in German: Hornhaut
3 in German: Stäbchen
4 in German: Zapfen


This color photograph is built from three component images: first the red channel, second the green channel, followed by the blue channel. We can combine those red/green/blue channels into a true color image.

In terms of technical imaging, a camera is capable of producing a single image or an entire image sequence. When we have multiple images or image sequences, we typically denote them as multi-images.

A first case may be in the form of multi-spectral images: if we break up the entire range of electromagnetic radiation from ultraviolet to infrared into individual bands and produce a separate image for each band, we call the sum of those images multi-spectral. If we have many of those bands we might call the images hyper-spectral. Typical hyper-spectral cameras produce 256 separate images simultaneously, not just red/green/blue!

A second case is to have the camera sit somewhere and make images over and over, always in the same color but observing changes in the scene. We call that multi-temporal. A third case is to observe a scene or an object from various positions. A satellite may fly over Graz and take images once as the satellite barely arrives over Graz, and a moment later as the satellite already leaves Graz. We call these multi-position images.

And then finally, a fourth case might have images taken not only by one sensor but by multiple sensors, not just by a regular optical camera, but perhaps also by radar or other sensors as we will discuss later. That approach will produce multi-sensor images.

This multiplicity of images presents a very interesting challenge in image processing. Particularly when we need to merge images that are taken at separate times, from separate positions and with different sensors, and when we want to automatically extract information about an object from many images of that object, we have a good challenge. Multiple digital images of a particular object location result in multiple pixels per given location.

Those pixels can be stacked on top of one another and then represent "a vector", with the actual gray values in each individual image being the "elements" of that vector. We can now apply the tools of vector algebra to these multi-image pixels. Such a vector may be called a feature vector, with the features being the color values of the pixel to which the vector belongs.
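A short Python/NumPy sketch of this stacking idea (the band images and their size are invented for this illustration):

import numpy as np

# three co-registered band images of the same scene (e.g. red, green, blue)
red   = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
green = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
blue  = np.random.randint(0, 256, (4, 4), dtype=np.uint8)

# stack the bands: every pixel location now carries a feature vector
multi = np.stack([red, green, blue], axis=-1)     # shape (4, 4, 3)

feature_vector = multi[2, 1]   # the 3-element vector of pixel (row 2, col 1)
print(feature_vector)

# vector algebra on the feature vectors, e.g. Euclidean distance between two pixels
d = np.linalg.norm(multi[2, 1].astype(float) - multi[0, 0].astype(float))
print(d)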

Exam questions:

• What is meant in sensing by single and multiple images? Give some examples of multiple images!

2.2 What is a Sensor Model?

So far we have only talked about one particular sensor, the camera, as an analog of the eye. In image processing we describe each sensor by a so-called sensor model. What does a sensor model do? It replaces the physical image and the process of its creation by a geometric description of the image's creation. We stay with the camera: this is designed to reconstruct the geometric ray passing through the perspective center of the camera, from there through the image plane and out into the world.

Slide 2.11 illustrates that in a camera's sensor model we have a perspective center O, we have an image plane P, we have image coordinates ξ and η, and we have an image of the perspective center, H, at the location that is obtained by dropping a line perpendicular from the perspective center onto the image plane. We find that our image coordinate system ξ, η and its origin M do not necessarily have to coincide with location H.

So what, then, is a sensor model? It is a set of techniques and of mathematical equations that allow us to take an image point P′ as shown in Slide 2.11 and define a geometric ray going from location O (the perspective center) through P′ into the world.


Definition 4 Perspective camera

Definition 10 (modeling a perspective camera, see Section 2.2):

Goal: to establish a relationship between the perspective center and the world. Tool: the perspective transformation (projects 3D points onto a plane); it is a non-linear transformation.

Description of Slide 2.12:

One works with two coordinate systems: 1. the image coordinate system (x, y, z), 2. the world coordinate system (X, Y, Z). A ray from a point w in 3D object space hits the image plane (x, y) at the image point c. The center of this image plane is the coordinate origin, from which an additional z-axis runs normal to the plane; it is identical to the optical axis of the camera lens. Where the ray intersects this z-axis lies the so-called lens center, which has the coordinates (0, 0, L); for a focused camera, L is comparable to the focal length. Condition:

Z > L,

i.e., all points of interest lie beyond the lens.

The vector w0 gives the position of the rotation axes in 3D space, from the origin of the world coordinate system to the center of the camera mount (gimbal).

The vector r defines where the image origin is, taking into account the rotation axes (X0, Y0, Z0) which can rotate the camera up and down; it runs from the center of the mount to the center of the image plane,

r = (r1, r2, r3)T.

Perspective transformation: relationship between (x, y) and (X, Y, Z).

Tool: similar triangles.

x : L = (−X) : (Z − L) = X : (L − Z)
y : L = (−Y) : (Z − L) = Y : (L − Z)

'−X' and '−Y' mean that the image points appear inverted (geometry).

x = L · X / (L − Z)
y = L · Y / (L − Z)

Homogeneous coordinates of a point given in Cartesian coordinates:

wkar = (X, Y, Z)T
whom = (kX, kY, kZ, k)T = (whom1, whom2, whom3, whom4)T, k = const. ≠ 0

Conversion back to Cartesian coordinates:

wkar = (whom1/whom4, whom2/whom4, whom3/whom4)T

Perspective transformation matrix:

P = ( 1  0  0     0 )
    ( 0  1  0     0 )
    ( 0  0  1     0 )
    ( 0  0  −1/L  1 )
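The following Python/NumPy sketch applies this perspective transformation matrix to a world point in homogeneous coordinates; the numeric values for L and for the world point are invented for the example:

import numpy as np

L = 0.05                      # lens constant, comparable to the focal length (assumed 50 mm)
P = np.array([[1, 0, 0,      0],
              [0, 1, 0,      0],
              [0, 0, 1,      0],
              [0, 0, -1.0/L, 1]])

w_cart = np.array([2.0, 1.0, 10.0])          # world point (X, Y, Z), with Z > L
w_hom  = np.append(w_cart, 1.0)              # homogeneous coordinates, k = 1

c_hom  = P @ w_hom                           # apply the perspective transformation
c_cart = c_hom[:3] / c_hom[3]                # back to Cartesian coordinates

x, y = c_cart[0], c_cart[1]                  # image coordinates
print(x, y)                                  # equals L*X/(L-Z), L*Y/(L-Z)
print(L * w_cart[0] / (L - w_cart[2]), L * w_cart[1] / (L - w_cart[2]))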


What the sensor model does not tell us is where the camera is and how this camera is oriented in space. So we do not, from the sensor model, find the world point P in three-dimensional space (X, Y, Z). We only take a camera and an image with its image point P′, and from that we can project a ray back into the world; but where that ray intersects the object point in the world requires something that goes beyond the sensor model. We need to know where the camera is in a world system and how it is oriented in 3D space.

In computer vision and in computer graphics we do not always deal with cameras that are carried in aircraft, looking vertically down and therefore having a horizontal image plane. Commonly, we have cameras that are in a factory environment or a similar situation, and they look horizontally or obliquely at something that is nearby.

Slide 2.12 illustrates the relationships between a perspective center and the world. We have an image plane which is defined by the image coordinate axes x and y (was ξ and η before), and a ray from the object space denoted as W will hit the image plane at location C. The center of the image plane is defined by the coordinate origin. Perpendicular to the image plane (which was defined by x and y) is the z-axis, which may in this case be identical to the optical axis of the lens. In this particular case we would not have a difference between the image point of the perspective center (was H before) and the origin of the coordinate system (was M before).

Now, in this robotics case we have two more vectors that define this particular camera. We have a vector r that defines where the image origin is with respect to the rotation axis that would rotate the camera. And we have a vector w0 that gives us the position of that particular rotation axis in 3D space. We still need to define for that particular camera its rotation axis that will rotate the camera up and down and that is oriented in a horizontal plane. We will talk about angles and positions of cameras later in the context of transformations; let us therefore not pursue this subject here. All we need to say at this point is that a sensor model relates to the sensor itself, and in robotics one might understand the sensor model to include some or all of the exterior paraphernalia that position and orient the camera in 3D space (the pose). In photogrammetry, exactly those latter data are part of the so-called exterior orientation of the camera.

Exam questions:

• Explain the term "sensor model"!

2.3 Image Scanning

Images on film need to be stored in a computer, but before they can be stored they need to be scanned. On film an image is captured in an emulsion. The emulsion contains chemistry, and as light falls onto the emulsion the material gets changed under the effect of photons. Those changes are very volatile; they need to be preserved by developing the film. The emulsion is protected from the environment by supercoats. The emulsion itself is applied to a film base, so the word "film" really applies to just the material on which the emulsion is fixed. There is a substrate that holds the emulsion onto the film base, and the film base often has a backing layer on its back. That is the structure of a black-and-white film.

With color film we have more than one emulsion: we have three of those layers on top of one another. We are dealing mostly with digital images, so analog film, photo labs and chemical film development are not of great interest to us. But we need to understand a few basic facts about film and the appearance of objects on film.

Slide 2.15 illustrates that appearance. The ordinate of the diagram records the density that results from the reflections of the world onto the emulsion. The density is 0 when there is no light and the film remains totally transparent (negative film!). And as more and more light falls onto the film, the film will get more exposed and the density will get higher until



the negative is totally black. Now this negative film is exposed by the light that is emitted from the object through a lens onto the film. Typically, the relationship between the density recorded on film and the light emitted from an object is a logarithmic one. As the logarithm of the emitted light increases along the abscissa, the density will typically increase linearly, and that is the most basic relationship between the light falling onto a camera and the density recorded on film, except in the very bright and the very dark areas. When there is almost no light falling on the film, the film will still show what is called a gross fog; film typically will never be completely unexposed, and there will always appear to be an effect as if a little bit of light had fallen onto the film. When a lot of light comes in, we lose the linear relationship again and we come to the "shoulder" of the gradation curve: as additional light comes in, the density of the negative does not increase any more.

Note that the slope of the linear region is denoted here by tan(α) and is called the gamma of the film. This distinguishes more or less sensitive films; the sensitivity has to do with the slope of that linear region. If a lot of light is needed to change the density, we call this a slow or "low sensitivity" film. If a small change in light causes a large change in density, then we call this a "very sensitive film", and the linear region is steeper. The density range that we can record on film is often between about 0 and 2. However, in some technical applications, in the graphic arts and in the printing industry, densities may go up to 3.6, and in medicine X-ray film densities go as high as 4.0. Again, we will talk more about density later, so keep in mind those numbers; note that they are dimensionless numbers. We will interpret them later.
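A small sketch of this characteristic (gradation) curve in Python; the gross-fog level, the gamma and the shoulder density used here are invented numbers purely for illustration:

import numpy as np

def film_density(exposure, gamma=0.8, fog=0.2, d_max=2.0, log_e0=0.0):
    # linear relation between density and log10(exposure) in the mid region,
    # clipped to the gross fog at the low end and to the shoulder at the high end
    d = gamma * (np.log10(exposure) - log_e0) + fog
    return np.clip(d, fog, d_max)

exposures = np.array([0.01, 0.1, 1.0, 10.0, 100.0, 1000.0])
print(film_density(exposures))   # rises linearly with log exposure, then saturates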

We need to convert film to digital images. This is based on one of three basic technologies.

First, so-called drum scanners have the transparent film mounted on a drum; inside the drum is a fixed light source. The drum rotates, the light source illuminates the film, and the light that comes through the film is collected by a lens and put on a photo detector (photo-multiplier 5). The detector sends electric signals which get A/D converted and produce, at rapid intervals, a series of numbers per rotation of the drum. We thus get a row of pixels per drum rotation. That approach has been very popular but has recently become obsolete because this device has sensitive and rapid mechanical movements; it is difficult to keep these systems calibrated.

Second, a much simpler way of scanning uses not a single dot but a whole array of dots, namely a CCD (charge-coupled device). We put it in a scan head and collect light that is, for example, coming from below the table, shining through the film; it gets collected through the lens and projected onto a series of detectors. There may be 6,000, 8,000, 10,000 or even 14,000 detectors, and these detectors collect the information about one single line of film. The detector charges are read out, and an A/D converter produces one number for each detector element. Again, the entire row of detectors creates in one instant a row of pixels. How do we get a continuous image? Of course by moving the scan head and collecting the charges built up row by row into an image (push-broom technology).

Third, we can have a square array detector field. The square CCD is mounted in the scan head, and the scan head "grabs" a square. How do we get a complete image that is much larger than a single square?

By stepping the camera, stopping it, staring at the object, collecting 1000 by 1000 pixels, reading them out, storing them in the computer, moving the scan head, stopping it again, taking the next square and so on. That technology is called step and stare. An array CCD is used to cover a large document by individual tiles, which are then assembled into a seamless image.

We find the push-broom single-path linear CCD array scanner typically in desktop, household, H.P.-, Microtec-, Mostec-, UMAX-type products. Those create an image in a single swath and are limited by the length of the CCD array. If we want to create an image larger than the length of a CCD array allows, then we need to assemble image segments.

5 in German: Sekundärelektronenvervielfacher


To do that, we create a swath by one movement of the scan head, then step the scan head over and repeat the swath in the new location. This is called the multiple-path linear CCD scanner. Another name for this is xy-stitching: the scan head moves in x and y, individual segments are collected, and they are then "stitched" together.

Exam questions:

• Sketch three different methods for scanning two-dimensional originals (e.g. photographs)!

2.4 The Quality of Scanning

People are interested in how geometrically accurate scanning is. The assessment is typically based on scanning a grid and comparing the grid intersections in the digital image with the known grid intersection coordinates of the film document. A second issue is the geometric resolution. We check that by imaging a test pattern.

Slide 2.22 shows what is called a US Air Force Resolution Target; each of its patterns has a very distinct width of the black lines and of the intervals between those black lines. As those black lines get narrower and closer together, we challenge the imaging system more and more.

If we take a look at an area that is beyond the resolution of the camera, then we will see that we cannot resolve the individual bars anymore. The limiting case that we can just resolve is used to describe the resolution capability of the imaging system. That may describe the performance of a scanner, but it may just as well describe the resolution of a digital camera.

These resolution targets come with tables that describe what each element resolves. For example, we have groups (they are called Group 1, 2, 3, 4, 5, 6), and within each group we find six elements.

In the example shown in Slide 2.24 one sees how the resolution is designated in line pairs per millimeter. However, we have pixels, and the pixels have a side length. How do we relate the line pairs per millimeter to the pixel diameter? We will discuss this later.
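As a rough preview of that relation: one line pair needs at least two pixels (one for the black bar, one for the white gap), so a resolution of R line pairs per millimeter corresponds to a pixel size of at most 1/(2R) millimeters. A tiny Python sketch of this rule of thumb (the sample values are invented):

def max_pixel_size_mm(line_pairs_per_mm):
    # one line pair = one black + one white bar -> at least two pixels per line pair
    return 1.0 / (2.0 * line_pairs_per_mm)

for lp in (10, 25, 50):
    print(lp, "lp/mm ->", max_pixel_size_mm(lp) * 1000, "micrometer pixels")
# 10 lp/mm -> 50 µm, 25 lp/mm -> 20 µm, 50 lp/mm -> 10 µm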

The next subject for evaluating a digital image and developing a scanner is the gray value performance. We have a Kodak gray wedge that has been scanned. On the bright end the density is 0, on the dark end the density is 3.4. We have individual steps of 0.1, and we can judge whether those steps get resolved both in the bright as well as in the dark area. On a monitor we cannot really see all thirty-four individual steps in intervals of 0.1 D from 0 to 3.4. We can use Photoshop and apply a function called histogram equalization, whatever that means, to each segment of this gray wedge. As a result we see that all the elements have been resolved in this particular case.

Exam questions:

• How is the geometric resolution of a film scanner specified, and with what procedure can it be determined?

2.5 Non-Perspective Cameras

Cameras per se have been described as having a lens projecting light onto film, after which we scan the film. We might also have, instead of film, a digital square-array CCD in the film plane to get a direct digital image; in that case we do not go through a film scanner. We can also have a camera on a tripod with a linear array, moving the linear array while the light is falling on the image plane and collecting the pixels in a sequential motion, much like a scanner would. There are also stranger cameras yet, which do not have a square array in the film plane and avoid a regular perspective lens. These are non-perspective cameras.

First let us look at a linear CCD array: a slide shows a CCD array with 6000 elements that are arranged side by side, each element having a surface of about 12 µm × 12 µm. These are read out very rapidly so that a new line can be exposed as the array moves forward. For example, an interesting arrangement with two lenses is shown in Slide 2.28: the two lenses expose one single array in the back. Half of the array looks in one direction, half in the other direction. By moving the whole scan head we can now assemble two digital strip images. Such a project to build this camera was completed as part of a PhD thesis in Graz. The student built a rig on top of his car, mounted this camera on it, and drove through the city collecting images of building facades, as we have seen earlier (see Chapter 0).

Exam questions:

• What advantages and disadvantages do non-perspective (optical, i.e. line, thermal or panoramic) cameras have compared to conventional (perspective) cameras?

2.6 Heat Images or Thermal Images

Heat images collect electromagnetic radiation in the middle to far infrared, not in the near infrared, so it is not next to visible light in the electromagnetic spectrum. That type of sensing can be accomplished by a mirror that looks at essentially one small instantaneous field of view (IFOV), in the form of a circular area on the ground, collects the radiation from there, projects it onto a detector, and makes sure that in a rapid sequence one can collect a series of those circular areas on the ground.

What we have here is an instantaneous angle of view α. We have the center of a cone that relates the sensor to the ground, and the axis of the cone is at an angle to the vertical called "A". In the old days, say in the sixties and seventies, oftentimes the recording was not digital but on film. Slide 2.35 illustrates the old-fashioned approach. We have infrared light coming from the ground. It is reflected off a mirror and goes through an optical system that focuses the light on the IR detector, which converts the incoming photons into an electric signal; that signal is then used to modulate the intensity of a light source which is projected via another lens and a mirror onto a piece of curved film.

Slide 2.36 was collected in 1971 or 1972 in Holland. These thermal images were taken from an airplane over regularly patterned Dutch landscapes. What we see here is the geometric distortion of fields, a result of the airplane wobbling in the air as the individual image lines are collected row by row. Each image line is appended to the previous one by the forward motion of the airplane. A closer look shows that there are areas that are bright and others that are dark. If it is a positive, then the bright things are warm and the dark things are cold.

Exam questions:

• What advantages and disadvantages do non-perspective (optical, i.e. line, thermal or panoramic) cameras have compared to conventional (perspective) cameras?


2.7 Multispectral Images

We already saw the concept of multi-spectral images. In principle they are, or in the past have been, collected by a rotating mirror that reflects the light from the ground onto a refraction prism. The refraction prism splits the white light coming from the ground into its color components. For each color we have a detector; this could be three for red/green/blue, or 226 for hyper-spectral systems. Detectors convert the incoming light into an electric signal, and the signals either get A/D converted or are directly recorded. In the old days recording was onto a magnetic tape unit; today we record everything on a digital disc with a so-called direct capture system (DCS).

When one does these measurements with sensors, one really is into a lot of open-air physics. One needs to understand what light is: electromagnetic radiation. When energy comes from the sun, a lot of it is in the visible range, somewhat less in the ultraviolet, somewhat less in the infrared. The sun's energy is augmented by energy that the Earth itself radiates as an active body; however, that energy is in the longer wavelengths. The visible light goes, of course, from blue via green to red. The infrared goes from the near infrared to the middle and far infrared. As the wavelengths get longer we leave the infrared and go into the short waves, microwaves, long microwaves and radio waves.

When we observe in a sensor the radiation that comes in from the surface of the Earth, we don't get an even distribution of the energy as the sun has sent it to the Earth; we get the reflection of the surface, and those reflections depend on what is on the ground, but also on what the atmosphere does to the radiation. A lot of that radiation gets blocked by the atmosphere, in particular from the infrared on. There are a few windows at 10 micrometers and at 14 micrometers wavelength where the energy gets blocked less and we can obtain infrared radiation. In the visible and near infrared the atmosphere lets this radiation through unless, of course, the atmosphere contains a lot of water in the form of clouds, rain or snow: that will block the visible light just as well as it blocks a lot of the longer wavelengths. The blocking of the light in the atmosphere is also a measure of the quality of the atmosphere.

In imaging the Earth's surface, the atmosphere is a "nuisance": it reduces the ability to observe the Earth's surface. However, the extent to which we have blockage by the atmosphere tells us something about pollution, moisture etc. So something that can be a nuisance to one application can also be useful in another.

We are really talking here about the ideas that are at the base of a field called remote sensing. A typical image of the Earth's surface is shown in Slide 2.42.

A color photograph has no problem with the atmosphere: we have the energy from the sun illuminating the ground, we have the red/green/blue colors of a film image, it can be scanned and put into the computer, and the computer can use the colors to assess what is on the ground.

Exam questions:

• Sketch the functional principle of a "multispectral scanning system" (multispectral scanner). You are invited to use a graphical sketch in your answer.

2.8 Sensors to Image the Inside of Humans

Sensors cover a very wide field, and imaging is a subset of sensing (think also of acoustics, temperature, salinity and things like that). Very well known are so-called CAT scans (computed axial tomography). The technique was invented in the early 1970s, and in 1979 the inventors, Hounsfield in England and Cormack in the USA, received the Nobel prize. It was the fastest recognition of a breakthrough ever. It revolutionized medicine because it allowed medical people to look at the inside of humans at a resolution and accuracy that was previously unavailable, without having to open up that human.

Slide 2.44 illustrates the idea of the CAT scan: it represents the transmissivity of little cubes of tissue inside the human. While a pixel is represented in two dimensions, here each gray value represents how much radiation was transmitted through a volume element. Therefore those gray values do not associate well with a 2D pixel but with a 3D voxel, or volume element. A typical CAT image that may appear in 2D really reflects in x and y a 1 mm × 1 mm base, but in z it may reflect a 4 mm depth.

Exam questions:

• Explain how a three-dimensional volume model of the inside of the human body is obtained with the help of computed tomography.

2.9 Panoramic Imaging

We talked in Chapter 0 about the increasingly popular panoramic images. They used to be produced by spy satellites, spy airplanes, and spacecraft imaging other planets or the Earth. The reason why we are interested in these images is that we would like to have a high geometric resolution and a very wide swath, thus a wide field of view at high resolution. Those two things are in conflict: a wide-angle lens gives an overview image, whereas a telephoto lens gives a very detailed image, but only of a small part of the object. How can we have both the very high resolution of a telephoto lens and still the coverage of a wide-angle lens? That is obtained by moving the telephoto lens, by sweeping it to produce a panoramic image (compare the material from Chapter 0).

Exam questions:

• What advantages and disadvantages do non-perspective (optical, i.e. line, thermal or panoramic) cameras have compared to conventional (perspective) cameras?

2.10 Making Images Independent of Sunlight and in Any Weather: Radar Images

Slide 2.49 is an image of an area in Tirol's Ötztal, taken from a European Space Agency (ESA) satellite called ERS-1. There exists a second image, so that the two together permit us to see a three-dimensional model in stereo. We will talk about the topic of stereo later. How is a radar image produced? Let us assume we deal with an aircraft sensor.

Because we are making images with radiation that is far beyond the infrared, namely microwaves (wavelengths from one millimeter to two meters, but typically 3 to 5 cm), we cannot use glass lenses to focus that radiation. We need to use something else, namely antennas. A wave gets generated and travels through a waveguide to an antenna. The antenna transmits a small burst of energy, a pulse, which travels through the atmosphere to the ground. It illuminates an area on the ground with a footprint that is a function of the shape of the antenna. The ground reflects it back, the antenna goes into listening mode and "hears" the echo. The echo comes from the nearby objects first and from the far-away objects last. It gets amplified, A/D converted and sampled, and produces a row of pixels, in this case radar image pixels. The aircraft moves forward, and the same process repeats itself 3000 times per second. One


obtains a continuous image of the ground. Since we illuminate the ground by means of the sensor, we can image day and night. Since we use microwaves, we can image through clouds, snow and rain (all weather).
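A small Python sketch of the geometry behind one such echo: the delay of the echo gives the (slant) range to the reflecting object, because the pulse travels to the ground and back at the speed of light. The sample delays below are invented for the illustration:

C = 299_792_458.0          # speed of light in m/s

def slant_range(echo_delay_s):
    # the pulse travels out and back, hence the factor 1/2
    return C * echo_delay_s / 2.0

# echoes sampled at increasing delays map to increasingly distant ground objects,
# i.e. to successive pixels within one image row
for delay_us in (20.0, 25.0, 30.0):
    print(delay_us, "microseconds ->", round(slant_range(delay_us * 1e-6)), "m")
# 20 µs -> ~2998 m, 25 µs -> ~3747 m, 30 µs -> ~4497 m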

Exam questions:

• Describe the principle of image acquisition by radar! What advantages and disadvantages does this method offer?

• With the help of radar waves one can produce digital images from aircraft and satellites, from which a topographic model of the terrain (an elevation model) can be derived from a single image acquisition. Describe the physical effects of electromagnetic radiation that are exploited for this purpose!

2.11 Making Images with Sound

There is a very common technique to map the floor of the oceans; really only one technique is widely applicable right now: under-water SONAR. SONAR stands for Sound Navigation and Ranging. It is a complete analogy to radar, except that we do not use antennas and electromagnetic energy but vibrating membranes that emit sound pulses, and we need water for the sound to travel. The sound pulse travels through the water, hits the ground and gets reflected; the membrane goes into a listening mode for the echoes. These get processed and create one line of pixels. As the ship moves forward, line by line is accrued into a continuous image.

The medical ultrasound technology is similar to under-water imaging, but there are various different approaches. Some methods of sound imaging employ the Doppler effect. We will not discuss medical ultrasound in this class, but defer it to later classes in the "image processing track".

Exam questions:

• Name applications of sound waves in digital imaging!

2.12 Passive Radiometry

We mentioned earlier that the Earth is active: it transmits radio waves without being illuminated by the sun. This can be measured by passive radiometry. We have an antenna, not a lens; it "listens" to the ground. The antenna receives energy which comes from a small circular area on the ground. That radiation is collected by the antenna and processed, and creates an image point. By moving the antenna we can move that point on the ground and thereby have a scanning motion, producing an image scan that gets converted into a row of pixels. By moving the aircraft forward we accumulate rows of pixels for a continuous image. Passive radiometry is the basis of weather observations from space, where large areas are being observed, for example the arctic regions.

Exam questions:

• What is meant by "passive radiometry"?


2.13 Microscope and Endoscope Imaging

The most popular microscopes for digital imaging are so-called scanning electron microscopes (SEM) or X-ray microscopes. Endoscopes are optical devices using light to look "inside things". Most users are in medicine, to look into humans. There is a lens system and a light to illuminate the inside of the human. The lens collects the light and brings it back out; the image goes into the computer, and on the monitor the medical staff can see the inside of the human: the inside of the heart, the inside of arteries and so forth. The endoscopes often take the shape of thick "needles" that can be inserted into a human.

The same approach is used in mechanical engineering to inspect the inside of engines, for example to find out what happens while an explosion takes place inside a cylinder chamber of an engine.

Exam questions:

• Describe at least two methods or devices that are used in medicine to acquire digital raster images!

2.14 Object Scanners

The task is to model a 3D object: a head, a face, an engine, a chair. We would like to have a representation of that object in the computer. This could already be the result of a complete image processing system, of which the sensor is only a component, as is suggested in Slide 2.58. The sensor produces a 3D model from images of the entire object. This can be done in various ways. One way is to do it with a linear array camera that is moved over the object and obtains a strip image. This is set up properly in the scanner to produce a surface patch. Multiple patches must be assembled; this is done automatically by making various sweeps of the camera over the object as it gets rotated.

We can also have a person sit down on a rotating chair, and a device will optically (by means of an infrared laser) scan the head and produce a 3D replica of the head. Or the object is fixed and the IR laser is rotating.

The next technique would be to scan an object by projecting a light pattern onto its surface. That is called structured light (in German: Lichtschnitte). Finally we can scan an object by having something touch it with a touch-sensitive pointer, where the pointer is under a force that keeps its tip on the object as it moves; another approach is to have a pointer move along the surface and track the pointer with one of many tracking technologies (optical, magnetic, sound; see also Augmented Reality later on).

Exam questions:

• What purpose does a so-called "object scanner" serve? Name three different methods by which an object scanner can work without physical contact!

2.15 Photometry

We are now already at the borderline between sensors and image processing/image analysis. In photometry we do not only talk about sensors; however, photometry is a particular type of sensor arrangement. We image a 3D object with one camera taking multiple images, like in a time series, but each image is taken with a different illumination. So we may have four or ten lamps at different positions. We take one image with lamp 1, a second image with lamp 2, a third image with lamp 3, etc. We collect these multiple images, thereby producing a multi-illumination image dataset. The shape reconstruction is based on a model of the surface reflection properties: from those properties, the radiometry of the images yields the object shape.
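A hedged Python sketch of this idea in its simplest textbook form, often called photometric stereo: assuming a Lambertian surface and known lamp directions (both assumptions, not stated in the lecture), the pixel brightnesses under the different lamps determine the surface normal by a least-squares solve. The lamp directions and intensities below are invented:

import numpy as np

# unit direction vectors of three lamps (assumed known from the lab setup)
L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.714],
              [0.0, 0.7, 0.714]])

# observed brightness of one pixel under each lamp (invented measurements)
I = np.array([0.9, 0.4, 0.6])

# Lambertian model: I = L @ (rho * n); solve for the scaled normal in the
# least-squares sense, then normalize
g, *_ = np.linalg.lstsq(L, I, rcond=None)
rho = np.linalg.norm(g)        # albedo (reflectance) of the surface point
n = g / rho                    # unit surface normal at that point
print(rho, n)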

2.16 Data Garments

Developments attributed to computer graphics concern so-called data garments. We need to sense not only properties of the objects of interest, but also where an observer is, because we may want to present him or her with a view of an object in the computer from specific places and directions. The computer must know in these cases where we are. This is achieved with data gloves and head-mounted displays (HMDs). For tracking the display's pose, we may have magnetic tracking devices to track where our head is and in which direction we are looking. There is also optical tracking, which is more accurate and less sensitive to electric noise, and there may be acoustic tracking of the position and attitude of the head using ultrasound.

Exam questions:

• What is meant by "data garments"? Name at least two devices in this category!

2.17 Sensors for Augmented Reality

In order to understand what the sensor needs to provide for augmented reality, we first need to understand what augmented reality is. Let us take a simple view: augmented reality is the simultaneous visual perception by a human being of the real environment, of course by looking at it, and of virtual objects and visual data superimposed onto that real environment although they are not physically present in it.

How do we do this? We provide the human with transparent glasses which double as computer monitors. So we use one monitor for the left eye and another monitor for the right eye. The monitors show a computer-generated image, but they are transparent (or better, semi-transparent): we not only see what is on the monitor, we also see the real world. The technology is called head-mounted displays, or HMDs. Now, for an HMD to make any sense, the computer needs to know where the eyes are and in what direction they are looking. Therefore we need to combine the HMD with a way of detecting the exterior orientation, or pose.

That is usually accomplished by means of magnetic positioning. Magnetic positioning, however, is fairly inaccurate and heavily affected by magnetic fields that might exist in a facility with computers. Therefore we tend to augment magnetic positioning with optical positioning, as suggested in Slide 2.63. A camera looking at the world is mounted rigidly with the HMD. Grabbing an image, one derives from the image where the camera is and in which direction it is pointed, and one thereby also knows where the eyes are and in which direction they are looking. Now we have the basis for the computer to feed into the glasses the proper object in the proper position and attitude, so that the objects are where they should be. As we review augmented reality, we immediately see the option of viewing the real world via the cameras and feeding the eyes not with the direct view of reality, but indirectly with the camera's views. This reduces the calibration effort in optical tracking.

Exam questions:

• Explain the operating principle of two tracking methods frequently used in augmented reality, and discuss their advantages and disadvantages!

Answer:

Tracking   Advantages      Disadvantages
magnetic   robust, fast    short range, inaccurate
optical    accurate        demands on the environment, elaborate setup

2.18 Outlook

The topic of imaging sensors is wide, and naturally we have to skip a number of items. However, some of these topics will be visited in other classes for those interested in image processing or computer graphics, and they also appear in other courses of our school. Two examples may illustrate this. The first is interferometry, a sensing technology combined with a processing technology that allows one to make very accurate reconstructions of 3D shapes by taking two images and measuring the phase of the radiation that gave rise to each pixel. We will deal with this off and on throughout "image processing".

Second, there is the large area of medical imaging, with a dedicated course. This is a rapidly growing area where today there are ultrafast CAT scanners producing thousands of images of a patient in a very short time. It becomes a real challenge for the doctor to take advantage of these images and reconstruct the objects of which those images were taken. This very clearly needs a sophisticated level of image processing and computer graphics to help human analysts understand what is in the images and to reconstruct the relevant objects in 3D. A clear separation of the field into Image Processing/Computer Vision and Computer Graphics/Visualization is not really useful or feasible.




Chapter 3

Raster-Vector-Raster Convergence

Algorithm 7 Digital differential analyzer

1: dy = y2 − y1
2: dx = x2 − x1
3: m = dy/dx
4: y = y1
5: for x = x1 to x2 do
6: draw(x, round(y))
7: y = y + m {step y by the slope m}
8: end for

3.1 Drawing a straight line

We introduce the well-known Bresenham algorithm from 1965. The task is to draw a straight line on a computer monitor, i.e. to replace the vector representation of a straight line running from a beginning point to an end point by a raster representation in the form of pixels. As we zoom in on a straight line shown on a computer monitor, we notice that we are really looking at the irregular edge of an area that represents the straight line: the closer we look, the less straight that edge appears. Conceptually, we need to find those pixels in a raster representation that will represent the straight line, as shown in Slide 3.4. The simplest method of assigning pixels to the straight line is the so-called DDA algorithm (Digital Differential Analyzer). Conceptually we intersect the straight line with the columns that pass through the centers of the pixels. The intersection coordinates are (x_i, y_i), and at the next column of pixels they are (x_i + 1, y_i + m). The DDA algorithm (see Algorithm 7) finds the nearest pixel simply by rounding the y-coordinate. Slide ?? illustrates the operation of the DDA algorithm graphically; Slide 3.6 is a conventional procedure doing what was just described. The straight line's beginning point is (x_0, y_0) and its end point is (x_1, y_1); for simplicity's sake we assume that x is an integer value. We then define auxiliary values dx, dy, y and m as real numbers and loop column by column over the pixels, rounding to find those pixels that will represent the straight line.
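As a minimal, illustrative Python sketch of Algorithm 7, restricted to lines with a slope between 0 and 1 (the function name dda_line and the returned pixel list are our own choices, not part of the lecture):

def dda_line(x1, y1, x2, y2):
    """Rasterize a line from (x1, y1) to (x2, y2); integer endpoints, 0 <= slope <= 1 assumed."""
    dy = y2 - y1
    dx = x2 - x1
    m = dy / dx              # slope; dx != 0 is assumed here
    y = float(y1)
    pixels = []
    for x in range(x1, x2 + 1):
        pixels.append((x, round(y)))   # "draw" the pixel nearest to the line in this column
        y += m                         # step y by the slope m
    return pixels

print(dda_line(0, 0, 5, 3))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]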

The DDA algorithm is slow because it uses rounding operations. In 1965 Bresenham proposed an algorithm that was exceedingly fast and outperformed the DDA algorithm by far, and in 1967 Pitteway proposed the Midpoint Line Algorithm (see Algorithm ??). These algorithms avoid the rounding operations and operate with decision variables only. For a long time, the vector-to-raster conversion implemented by Bresenham and Pitteway was only applicable to straight lines; it was as late as 1985 that this idea of very fast vector-to-raster conversion was extended to circles and ellipses. In this class we will not go beyond straight lines.

The next six illustrations address the Bresenham algorithm. We begin by defining the sequence of pixels visited by the algorithm as East (E) and North-East (NE) of the previous pixel, and we introduce an auxiliary position M halfway between the NE and the E pixel. The actual intersection of the straight line with the column of pixels to be visited is denoted by Q. Essentially Bresenham says: given that we know the previous pixel, we must decide whether to assign to the straight line the pixel NE or the pixel E. The approach as described applies to straight lines with directions between 0 and 45 degrees; for directions between 45 and 90 degrees and so forth the same ideas apply with minimal modifications. Slide 3.10 and Slide 3.11 describe the procedure used for the Midpoint Line Algorithm with the beginning point (x_0, y_0) and end point (x_1, y_1); it returns the set of raster pixels that describe the straight line. Again we have dx and dy, increments for E and NE, an auxiliary variable b, and the variables x and y. The algorithm itself is largely self-explanatory, so we do not need much text to explain it; the reader is invited to work through it.

The next two slides, Slide 3.12 and Slide 3.13, explain the basic idea behind the Midpoint Line Algorithm. We introduce an auxiliary point M with the coordinates (x_p + 1, y_p + 1/2). The implicit equation of a straight line is

a·x_m + b·y_m + c = 0;

a point that is not on the straight line produces a value that is either greater or less than zero, and the sign tells us on which side of the line the point lies. We can also write the equation of the straight line in explicit form as

y = (dy/dx)·x + B,

with intercept B. Rearranging as shown in Slide 3.13 gives the implicit form with a = dy, b = -dx and c = B·dx. Evaluating it at the midpoint M yields a decision variable d that is greater than, equal to, or less than zero:

d = dy·(x_p + 1) - dx·(y_p + 1/2) + c.

If d is greater than zero, the pixel of interest is NE; otherwise the pixel of interest is E. If E is selected as the next pixel, we compute a new value d_new by inserting the coordinates of the next midpoint, (x_p + 2, y_p + 1/2), into the equation; this is nothing other than the old value of d plus dy. If instead we select NE, the next midpoint has the coordinates (x_p + 2, y_p + 3/2), and d_new is the old value of d plus dy - dx. For the very first midpoint the equation evaluates to a + b/2 = dy - dx/2, and since we do not want to divide anything by two we simply multiply everything by a factor of 2 and start with

2d = 2·dy - dx.

Bresenham's trick was thus to avoid multiplications and divisions and simply to decide whether a value is greater or less than zero, adding one increment in the first case and another in the second, working along the straight line from pixel to pixel. This made for a very fast, creative algorithm.
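Algorithm ?? is referenced but not reproduced in this text, so the following Python sketch illustrates the decision-variable idea described above for integer endpoints and a slope between 0 and 1 (names such as midpoint_line, incr_e and incr_ne are ours, not the lecture's):

def midpoint_line(x0, y0, x1, y1):
    """Midpoint line rasterization; integer endpoints, 0 <= slope <= 1 assumed."""
    dx = x1 - x0
    dy = y1 - y0
    d = 2 * dy - dx          # initial decision variable, scaled by 2 to stay integer
    incr_e = 2 * dy          # added to d when the E pixel is chosen
    incr_ne = 2 * (dy - dx)  # added to d when the NE pixel is chosen
    x, y = x0, y0
    pixels = [(x, y)]
    while x < x1:
        if d <= 0:           # choose E: stay in the same row
            d += incr_e
        else:                # choose NE: step up one row
            d += incr_ne
            y += 1
        x += 1
        pixels.append((x, y))
    return pixels

print(midpoint_line(0, 0, 8, 3))
# [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 3), (8, 3)]

Only additions and a sign test are performed inside the loop, which is exactly the speed advantage over the rounding of the DDA approach.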

There is a problem, however. A horizontal line consists of pixels that are spaced one pixel diameter apart; in Slide 3.15, line a would appear as a dark line. If we incline that line by 45 degrees, the pixels assigned to it are spaced one pixel diameter times the square root of 2 apart. Across the entire length of the straight line we therefore have fewer pixels, and the same line appears less dark. We will address this and related subjects later in Section 3.3.

Exam questions:

• Describe in words the essential improvement of the Bresenham algorithm over the DDA algorithm.

• In Figure B.9, mark the pixels that the Bresenham algorithm generates when the two marked pixels are connected by an (approximated) straight line. Also give the computation steps that lead to the pixels you selected.

• The square Q in normalized screen coordinates from Example B.2 is transformed into a rectangle R of size 10 × 8 in screen coordinates. Draw the connection of the two points p′1 and p′2 in Figure B.20 and determine graphically the pixels that the Bresenham algorithm would choose to approximate the connection discretely.

3.2 Filling of Polygons

Another issue when converting from the vector world to the raster world is dealing with areas whose boundaries are given as polygons. Such polygons may be convex or concave, they may intersect themselves, and they may have islands. It very quickly becomes a non-trivial problem to take a polygon from the vector world, create a raster representation from it, and fill the area inside the polygon. Slide 3.17 illustrates the issue: instead of finding pixels along the polygon, we now have the task of finding the pixels inside the polygon represented by a sequence of vectors. We define a scan line as a row of pixels going from left to right. The illustrations in Slide 3.17 show that the first pixel is assigned where the scan line intersects the first vector; every time we find along the scan line an intersection with a vector of the polygon, we switch from assigning pixels to not assigning pixels, and vice versa.
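As a minimal Python sketch of this scan-line idea, assuming a simple, non-self-intersecting polygon given by its vertices (the function name scanline_fill and the choice of sampling each row at its center y + 0.5 to sidestep vertex special cases are ours):

def scanline_fill(vertices):
    """Return the interior pixels of a simple polygon given as a list of (x, y) vertices."""
    ys = [y for _, y in vertices]
    filled = []
    for y in range(min(ys), max(ys) + 1):
        yc = y + 0.5                      # sample the scan line through the pixel row center
        xs = []
        n = len(vertices)
        for i in range(n):
            (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
            if (y0 <= yc) != (y1 <= yc):  # this edge crosses the scan line
                xs.append(x0 + (yc - y0) * (x1 - x0) / (y1 - y0))
        xs.sort()
        for left, right in zip(xs[0::2], xs[1::2]):   # fill between pairs of intersections
            filled.extend((x, y) for x in range(round(left), round(right)))
    return filled

# a small axis-parallel rectangle as a test polygon
print(scanline_fill([(0, 0), (4, 0), (4, 3), (0, 3)]))

The parity rule (toggling between "inside" and "outside" at each intersection) is exactly the switching behaviour described above.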

A second approach, shown in Slide 3.18, is to use the Bresenham algorithm to rasterize all the vectors defining the polygon and then to go along the scan lines, take the pairs of pixels produced by the Bresenham algorithm, and fill the space in between with additional pixels. As the example shows, that approach may produce pixels whose centers lie outside of the actual polygon. There is yet another variant, which takes the boundary pixels lying on the inside of the polygon; that is different from the previous application of the Bresenham algorithm.

Slide 3.21 illustrates for the first time a concept that we will address in a moment: if we have a very narrow polygon, such as a thin triangle, we may get a very irregular pattern of pixels, and looking at this kind of pattern we notice a severe case of aliasing. Aliasing is a topic of interest in computer graphics.

Exam questions:

• A polygon is given by the list of its vertices. How can the polygon be displayed filled (i.e. together with its interior) on a raster screen? Which problems occur if the polygon has very "pointed" corners (i.e. interior angles close to zero)?



3.3 Thick lines

A separate subject is the various ways to plot thick lines: not simply applying a Bresenham algorithm to a mathematically infinitely thin line, but drawing a fat line. One way is to apply the Bresenham algorithm and then replicate the pixels along the columns, saying that, having found a pixel according to Bresenham, we now make, say, five pixels out of it. If we do that, the thickness of the line becomes a function of the slope of the straight line. A second way of plotting a thick line is to take the Bresenham pixels and apply at each pixel location a rectangular pen, as in the example of Slide 3.23 with a pen size of 5 × 5, i.e. 25 pixels per position (see Algorithm 8).

Algorithm 8 Thick lines using a rectangular pen
1: procedure drawThickLine2(x1, y1, x2, y2, thickness, color);
2: var x, i: integer;
3:     p1x, p1y, p2x, p2y: integer;
4:     dx, dy, y, m: real;
5: begin
6:   dy := y2 - y1;
7:   dx := x2 - x1;
8:   m := dy/dx;
9:   y := y1;
10:  for x := x1 to x2 do
11:    p1x := x - (thickness div 2); {upper left point}
12:    p1y := Round(y) + (thickness div 2);
13:    p2x := x + (thickness div 2); {lower right point}
14:    p2y := Round(y) - (thickness div 2);
15:    drawFilledRectangle(p1x, p1y, p2x, p2y, color); {rectangle spanned by p1 and p2}
16:    y := y + m;
17:  end for;
18: end; {drawThickLine2}

{Note: drawFilledRectangle draws a rectangle given by the upper left and the lower right point. If you want to use a circular pen, simply replace the rectangle with drawFilledCircle(x, y, (thickness div 2), color). Syntax: drawFilledCircle(mx, my, radius, color)}

The difficulty with fat lines becomes evident when we draw circles. Assume in Slide 3.25 that we use pixel replication: we use Bresenham to assign pixels to the circle and then add one pixel above and one pixel below each of them. We quickly see that the thickness of the line describing the circle is fine at zero and ninety degrees, but narrower at 45 degrees, where the thickness t at 0 and 90 degrees reduces to t divided by the square root of two. This problem goes away if we use a moving pen of 3 × 3 pixels; in that case the variation disappears. Yet another approach is to apply a vector-to-raster conversion algorithm to two contours, obtained by changing the radius of the circle, and then to fill the area between the two contours with pixels; this too avoids the change in line thickness.

Exam questions:

• Name different techniques for drawing "thick" lines (e.g. line segments or circular arcs).


Definition 5 Skeleton
The skeleton of a region R contains all points p which have more than one nearest neighbour on the border-line of R. The points p are the centers of maximal discs which touch the border-line b in two or more points.

The detection of skeletons is useful for shape recognition and runs in O(n²) for concave polygons and O(n log n) for convex polygons.

3.4 The Transition from Thick Lines to Skeletons

The best-known algorithm for making the transition from a thick line or an area to a representation by the area's skeleton is by Blum, from the year 1967. We define a region R and its borderline B. The basic idea of the medial axis transform (see Definition 5) is to take a region, as shown in Slide 3.30, and replace it by those pixels (a string of individual pixels) which have more than one nearest neighbor along the boundary of the area. Looking at the area in example (a) of the slide, we quickly recognize that every point along the dashed lines has two nearest points on the border, either on the left and right borders or on the left and top borders, and so on. When we create such a pattern and there is a disturbance, as in image (b) of that slide, we immediately obtain a branch of the center line leading towards the disturbance. Example (c) shows the pattern this idea of finding pixels with two nearest neighbors along the borderline produces when the area itself is not rectangular but L-shaped. Slide 3.31 summarizes in words how we go from a region R to its boundary line b, and from the boundary line to the pixels p which have more than a single nearest neighbor on b; the pixels p form the so-called medial axis of region R.

Finding the medial axis in this way is expensive, because distances need to be computed between all the pixels within the region R and all the pixels on the boundary line B, and a lot of sorting goes on. For this reason, Blum considered a different approach: the transition from the region to the skeleton, or medial axis, is better achieved by means of a thinning algorithm. We therefore start from the edge of the region and delete contour pixels. What is a pixel on the contour? A pixel on the contour is part of the region R, has a value of 1 in a binary representation, and has at least one zero among its eight neighbors, i.e. at least one neighbor that does not belong to region R. Slide 3.32 explains the basic idea of a thinning algorithm. We have a pixel p1 and its eight neighbors p2 through p9. We associate with p1 the number N(p1) of non-zero neighbors, obtained by simply adding up the values of the eight neighbors. We compute a second auxiliary number S(p1), the number of transitions from zero to one in the ordered, circular sequence of the values p2, p3, ..., p9. The decision whether a pixel p1 gets deleted or not depends on the outcome of four tests (also shown in Slide 3.34).

Pixel p1 is deleted in the first pass if all of the following hold:
(a) 2 ≤ N(p1) ≤ 6,
(b) S(p1) = 1,
(c) p2 · p4 · p6 = 0,
(d) p4 · p6 · p8 = 0.
Pixel p1 is also deleted, in the second pass, if (a) and (b) hold together with
(c′) p2 · p4 · p8 = 0 and (d′) p2 · p6 · p8 = 0.


In the example on the slides we can see which pixels have been deleted after the initial iteration through all pixels; after five iterations the result shown in Slide 3.36 is obtained.
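As a sketch, one sub-iteration of such a thinning pass can be written in Python as follows. The deletion conditions used here are the standard Zhang-Suen-style rules that match N(p1) and S(p1) above; we assume they correspond to the conditions on the slides, so treat them as a reconstruction rather than a transcript:

def thinning_pass(img, first_subiteration):
    """One thinning sub-iteration on a binary image given as a list of lists of 0/1 values.
    The exact deletion conditions are assumed (standard Zhang-Suen rules), not quoted from the slides."""
    h, w = len(img), len(img[0])
    to_delete = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if img[y][x] == 0:
                continue
            # neighbors p2..p9, starting north of p1 and proceeding clockwise
            p = [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
                 img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]
            n = sum(p)                                    # N(p1): number of non-zero neighbors
            s = sum(p[i] == 0 and p[(i + 1) % 8] == 1     # S(p1): 0->1 transitions around p1
                    for i in range(8))
            p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
            if first_subiteration:
                cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
            else:
                cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
            if 2 <= n <= 6 and s == 1 and cond:
                to_delete.append((y, x))
    for y, x in to_delete:
        img[y][x] = 0
    return len(to_delete)   # number of contour pixels removed in this sub-iteration

Alternating the two sub-iterations until no pixels are removed yields the skeleton, as in the example above that converged after five iterations.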

We have now dealt with the issue of converting a given vector into a set of binary pixels and have called that vector-to-raster conversion; it is also called scan conversion, and it occurs whenever vector data are represented on a raster monitor. What we have not yet talked about is the inverse issue: given a raster pattern, we would like to obtain vectors from it. We have touched upon replacing a raster pattern by a medial axis or skeleton, but we have not yet really come out of that conversion with a set of vectors. Yet raster-to-vector conversion is an important element of object recognition. A particular example was hinted at in Slide 3.36, because it clearly represents an example from character recognition: the letter H in a binary raster image is described by many pixels. Recognizing it as an H might be based on a conversion to a skeleton, a replacement of the skeleton by a set of vectors, and then on submitting those vectors to a set of rules that tell us which letter we are dealing with.

Exam questions:

• Apply Blum's medial axis transformation to the object on the left of Figure B.39. You may enter the result directly in the right part of Figure B.39.




Chapter 4

Morphology

Exam questions:

• Given the pixel arrangement shown in Figure B.56, describe graphically, by a formula, or in words an algorithm for determining the centroid of this pixel arrangement.

4.1 What is Morphology

This is an interesting subject, not very difficult, yet not to be underestimated. We talk about the shape and the structure of objects in images. It is a topic of binary image processing: recall that binary images have pixels that are only either black or white. Objects typically are described by a group of black pixels, and the background consists of all white pixels. So one has a two-dimensional space of integer coordinates to which, in morphology, we apply set theory.

Let us take an object, which we call A, hinged at a location designated in Slide 4.5 by a little round symbol. Morphology says that A is a set of pixels in this two-dimensional space. A separate object B is also defined by a set of pixels. We now translate A by a distance x and obtain a new set called (A)_x. The translation is described by two numbers, x_1 and x_2, for the two dimensions of the translation.

We can write the expression in Slide 4.6 to define the result after the translation: (A)_x consists of all pixels c such that c = a + x, where a runs over all the pixels of the set A. Geometrically we can illustrate the translation very simply by the two distances x_1 and x_2 of Slide 4.7; instead of A we have (A)_x. A very simple concept for humans becomes a somewhat complex equation in the computer.

Morphology also talks about "reflection". We have a pixel set B and reflect it into a set B̂, which is the set of all pixels x such that x = -b, where b runs over the pixels of the set B. The interpretation of -b is needed: geometrically, B̂ is the mirror image of B, mirrored over the hinge point (the point of reflection).

The next concept we look at is "complementing" a set A into a set A^C. A^C is the set of all pixels x that do not belong to the set A. If an object composed of all the pixels inside a contour is called A, then A^C is the background.

Next we can take two objects A and B and build a difference A - B. The difference is the set of all pixels x such that x belongs to set A but not to set B. We can describe this with a new symbol and say it is the intersection of two sets, namely of set A and the complement B^C of B.


Definition 6 Difference
Given two objects A and B as sets of pixels (points of the 2D integer space), the difference of the two sets A and B is defined as

A - B = {x | x ∈ A, x ∉ B} = A ∩ B^C.

Slide 4.14 shows A, B and A - B, which is A reduced by the part of A covered by B.
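Because the text treats objects as sets of pixel coordinates, these basic operations can be sketched directly with Python sets (the helper names and the small example objects are ours, purely for illustration):

A = {(1, 1), (2, 1), (2, 2), (3, 2)}        # object A as a set of (column, row) pixels
B = {(0, 0), (1, 0)}                        # a second pixel set

def translate(S, x):
    """(S)_x: every pixel of S shifted by the vector x = (x1, x2)."""
    return {(a1 + x[0], a2 + x[1]) for (a1, a2) in S}

def reflect(S):
    """S-hat: mirror reflection of S about the origin (the hinge point)."""
    return {(-a1, -a2) for (a1, a2) in S}

def complement(S, width, height):
    """S^C within a finite width x height window standing in for the infinite plane."""
    return {(i, j) for i in range(width) for j in range(height)} - S

def difference(S, T):
    """S - T, equal to S intersected with the complement of T."""
    return S - T

print(translate(A, (2, 0)))
print(reflect(B))
print(difference(A, B))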

Exam questions:

• What is "morphology"?
Answer: the application of non-linear operators to the shape of an object.

4.2 Dilation and Erosion

"Dilation" means that we make something bigger (in German: Blähung). The symbol we use to describe the dilation of a set A using a "structure element" B is shown in Slide ??. A dilated by B is the set of all positions x at which the reflected and translated structure element B̂ overlaps A in at least one pixel, i.e. their intersection is not empty.

This sounds fairly difficult, but looked at geometrically it is very simple. Let A be a square with side length d, and let B be another square with a diameter of d/4. If we reflect B about a reflection point at the center of the square, the reflection is the same as the original structure element; the reflection thus has no effect here, and we simply shift B to a pixel x. As we go to each pixel x of the set A, we place the (reflected) structure element there, translating B̂ to the location x, and collect all pixels covered by the union of A and the translated B̂. What we do is add a little fringe around the area A, obtained by moving the structure element over and through all the pixels of A: B̂ extends beyond the edge of A, so we make A a little larger. If our structure element is not a square but a rectangle with dimension d in one direction and d/4 in the other, we obtain an enlargement of area A that is significant in one direction and less significant in the other.

Algorithm 9 Dilation
1: X = ∅ {result set}
2: for all x do
3:   Y = Translate(Reflect(B), x)
4:   for all y ∈ Y do
5:     if (y ∈ A) and (x ∉ X) then
6:       Insert(X, x)
7:     end if
8:   end for
9: end for
10: return X

Dilation has a sister operation called "erosion" (in German: Abmagerung). The erosion is the opposite of a dilation, and the symbol designating an erosion is a little circle with a minus sign in it, shown in Slide 4.18. The result of an erosion consists of all those positions x at which the structure element B, shifted to x, lies completely within the set A. What does this look like geometrically?
does this look like geometrically?


Definition 7 Erosion

X ⊖ B = {d ∈ E² : B_d ⊆ X}

B ... binary erosion matrix (the structure element)
B_d ... B translated by d
X ... binary image matrix

Starting from this equation we arrive at the following equivalent expression:

X ⊖ B = ⋂_{b ∈ B} X_{-b}

In Slide 4.19 we have subtracted from set A a fringe that has been deleted as if with an eraser of the size of B. Doing this with a non-square, rectangular structure element we obtain a result that, in the particular case of Slide 4.19, reduces set A to merely a linear element, because only one row of pixels satisfies the erosion condition with a structure element of dimensions d and d/4.

There is a duality between erosion and dilation, because we can express the erosion of a set A by a structure element B as a dilation: we take the complement A^C of A and dilate it with the reflection B̂ of B. This is demonstrated in Slide 4.21, where we start from the erosion definition of A by B: the complement of the eroded object is the set of all positions x at which B, placed over x, does not lie completely within A, i.e. at which it overlaps the complement of A. Working through our previous definitions, Slide 4.21 shows that we end up with the dilation of the complement A^C of set A with the reflection B̂ of structure element B.
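As a compact Python sketch of these two operations on pixel sets, following the definitions above (helper names are ours, and a finite window stands in for the infinite plane Z²), including a check of the duality (A ⊖ B)^C = A^C ⊕ B̂ inside that window:

# a finite window standing in for the infinite plane Z^2
WINDOW = {(x, y) for x in range(-10, 16) for y in range(-10, 16)}

def dilate(A, B):
    """A dilated by B: the union of B translated to every pixel of A."""
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    """A eroded by B: all positions d in the window at which B translated by d fits inside A."""
    return {(x, y) for (x, y) in WINDOW if all((x + bx, y + by) in A for (bx, by) in B)}

def reflect(B):
    return {(-x, -y) for (x, y) in B}

A = {(x, y) for x in range(0, 6) for y in range(0, 4)}   # a 6 x 4 rectangle
B = {(0, 0), (1, 0), (0, 1), (1, 1)}                     # a 2 x 2 structure element

eroded = erode(A, B)
print(sorted(eroded))                                    # the rectangle shrunk by the 2 x 2 element

# duality check inside the window: (A erode B)^C  ==  A^C dilate B-hat
lhs = WINDOW - eroded
rhs = dilate(WINDOW - A, reflect(B)) & WINDOW
print(lhs == rhs)                                        # True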

Exam questions:

• Explain morphological "erosion" using a sketch and a formula.

• The morphological operation "erosion" is to be applied to the binary image shown at the top left of Figure B.65. Show how the duality between erosion and dilation can be used to reduce an erosion to a dilation. (In other words: instead of the erosion, other morphological operations are to be used which, executed one after the other in a suitable order, yield the same result as an erosion.) Enter your result (and your intermediate results) in Figure B.65 and name the operations marked with the numbers 1, 2 and 3. The structuring element to be used is also shown in Figure B.65.
Hint: note that the binary image shown is only a small section of the domain Z².
Answer: The morphological erosion can be replaced by the following sequence of operations (see Figure 4.1): 1. complement, 2. dilation, 3. complement.

Figure 4.1: Morphological erosion as the sequence complement → dilation → complement

• The duality of erosion and dilation with respect to complementation and reflection can be formulated by the equation
(A ⊖ B)^C = A^C ⊕ B̂.
Why is the reflection (B̂) important in this equation?

• Suppose you had to apply the morphological operations "erosion" or "dilation" to a binary image, but you only have a conventional image processing package available that does not support these operations directly. Show how erosion and dilation can be emulated by a convolution followed by thresholding.
Hint: the convolution operation in question is most comparable to a low-pass filter.
Answer: Treat the desired kernel of the morphological operation as a filter mask (with "1" for every set "pixel" of the kernel, "0" otherwise) and convolve the binary image with this mask. The result image then contains values g(x, y), where
- g(x, y) ≥ 1 if at least one mask pixel coincided with a set pixel of the input image (dilation), and
- g(x, y) ≥ K if all mask pixels coincided with set pixels of the input image (erosion), where K is the number of set mask pixels.

4.3 Opening and Closing

We now have more complex operations that are sequences of the previously defined ones; we call them "opening" and "closing". Let us take the question of opening first. We may have two objects, one to the left and one to the right, connected by a thin bridge, perhaps because of a mistake in sensing and preprocessing of the data. We can separate those two objects by an operation called "opening". Opening a set A by means of a structure element B is denoted by the symbol shown in Slide 4.24, an open little circle. It consists of first eroding A with the structure element B and subsequently dilating the result with the same structure element B. So we first shrink, then we enlarge again; but in shrinking we get rid of certain things that are no longer there when we enlarge.

Definition 8 Opening
A ◦ B = (A ⊖ B) ⊕ B
◦ ... opening, ⊖ ... erosion, ⊕ ... dilation; B is a circular structure element

Slide 4.25 shows the circular structure element B and the original object A. As we erode object A we obtain a shrunk result, and object A is broken up into two eroded, smaller objects: the bridge between the two parts of the original set A is narrower than the structure element, so the structure element erases it like an eraser. Now we want to go back to the original size, so we dilate with the structure element B again, and what we obtain is the separation of the thinly connected objects.

Slide 4.27 and Slide 4.28 summarize the opening operation.

We proceed to the "closing" operation. Closing a set A with the help of a structure element B is denoted by a little filled circle. We first dilate A by B and then erode the result by B, i.e. we do the opposite of opening. The process removes little holes: it does not break objects apart but connects them, fills in gaps, and removes noise.

Definition 9 Closing
A • B = (A ⊕ B) ⊖ B
⊕ Dilation: close all holes and gaps smaller than the structure element B
⊖ Erosion: restore the original size, except for the structures that were closed
Closing a set A with structure element B means to first dilate A by B and afterwards erode the result by structure element B.

Slides 4.30 through 4.33 feature a complex shape that seems to break apart where it really should not. We take the original figure and dilate it (make it larger); as it grows, small gaps and details are filled in or removed, and the resulting object is less detailed than we had before.

Closing an object A using the structure element B can again be shown to be the dual of opening with respect to complementation and reflection: closing A with B and taking the complement of the result is the same as opening the complement of A with the mirror reflection of the structure element B.
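A small Python sketch of opening and closing as compositions of erosion and dilation on pixel sets (helper names and the toy objects are ours; a finite window again stands in for Z²). The example separates two blobs joined by a one-pixel bridge and fills a one-pixel hole:

WINDOW = {(x, y) for x in range(-5, 20) for y in range(-5, 20)}

def dilate(A, B):
    return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

def erode(A, B):
    return {(x, y) for (x, y) in WINDOW if all((x + bx, y + by) in A for (bx, by) in B)}

def opening(A, B):
    """A opened by B: erosion followed by dilation with the same structure element."""
    return dilate(erode(A, B), B)

def closing(A, B):
    """A closed by B: dilation followed by erosion with the same structure element."""
    return erode(dilate(A, B), B)

# two 3 x 3 blobs connected by a thin one-pixel bridge
blob1 = {(x, y) for x in range(0, 3) for y in range(0, 3)}
blob2 = {(x, y) for x in range(7, 10) for y in range(0, 3)}
bridge = {(x, 1) for x in range(3, 7)}
A = blob1 | blob2 | bridge
B = {(0, 0), (1, 0), (0, 1), (1, 1)}        # 2 x 2 structure element

opened = opening(A, B)
print(bridge & opened)                      # set(): the bridge is gone after opening
print(blob1 <= opened and blob2 <= opened)  # True: the two blobs themselves survive

ring = {(x, y) for x in range(0, 3) for y in range(0, 3)} - {(1, 1)}  # a blob with a one-pixel hole
print((1, 1) in closing(ring, B))           # True: closing fills the small hole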

Exam questions:

• Explain morphological "opening" using a sketch and a formula.

Figure 4.2: Morphological opening with the given structuring element

• To strengthen the effect of morphological opening (A ◦ B), one can (apart from enlarging the structuring element B) execute the underlying operations (erosion and dilation) repeatedly. Which of the following two procedures leads to the desired result?
1. The erosion is executed n times first and then the dilation n times, i.e. (((A ⊖ B) ⊖ ... ⊖ B) ⊕ B) ⊕ ... ⊕ B with n erosions followed by n dilations.
2. The erosion is executed and then the dilation, and this sequence is repeated n times, i.e. (((A ⊖ B) ⊕ B) ... ⊖ B) ⊕ B with n alternating ⊖/⊕ pairs.
Justify your answer and explain why the other procedure fails.
Answer: Procedure 1 is correct; with procedure 2 the object remains unchanged after the first ⊖/⊕ iteration.

• Apply the morphological operation "opening" with the given structuring element to the binary image on the left of Figure B.31. Which effect typical of morphological opening also occurs in this example? White pixels count as logical "0", gray pixels as logical "1". You may enter the result on the right in Figure B.31.
Answer: see Figure 4.2; the typical effect is the separation of regions that are connected by a narrow "bridge".

4.4 Morphological Filter

Definition 10 Morphological filter
A morphological filter consists of one or more morphological operations, such as dilation, erosion, opening, closing, and hit-or-miss, that are applied sequentially to an input image.

A very simple application is morphological filtering. Say we have an object such as an ice floe on the ocean, with several small things floating around it, and we would like to recognize and map the large ice floe. We would like to isolate this object, measure its surface and its contour, and see where it is. In an automated process we need to remove all the clutter around it, fill the holes, and get rid of the extraneous details on the open water.

Morphological filtering is illustrated in Slide 4.38 and Slide 4.39. We choose a structure element that is a little larger than the elements we would like to remove. We first let that structure element run over the image and perform an erosion: every object smaller than the structure element disappears, but the holes inside the object get bigger. We follow the erosion with a dilation; that combination is the opening operation. We have now removed all the small items outside the object, but the holes inside the object are still there. We then perform the opposite operation, the closing: we take the opening result and dilate it, which enlarges the object in such a way that it also closes up all the holes, and then we shrink it again with an erosion using the same structure element B. The sequence of opening followed by closing produces a clean object without extraneous detail: we have applied a morphological filter.

Exam questions:

• Figure B.55 shows a rectangular object together with several smaller disturbing objects. Explain a morphological filtering procedure that eliminates the disturbing objects. Use formulas and show the course of the procedure with graphical sketches. Also show the result image.

• Explain the process of morphological filtering using an example.

4.5 Shape Recognition by a Hit-or-Miss Operator

Morphology can recognize shapes in an image with the hit-or-miss operator. Assume we have three small objects X, Y and Z and we would like to find object X, as shown in Slide 4.41. The union of X, Y and Z is denoted as the auxiliary object A. Now we define a structure element W, and from it a second structure element as the difference of W and the shape X we are looking for; that gives an interesting structure element which in this case looks like the frame of a window. We also build the complement A^C of A, which is the background without the objects X, Y and Z.

If we erode A with X, any object smaller than X is wiped out, any object larger than X shows up as an area resulting from the erosion by X, and for X itself we obtain a single pixel in Slide 4.42. The automated process has thus produced pixels that are candidates for the object of interest, X, and we need to know which pixel to choose. We therefore go through the operation again, but use A^C as the object and W - X as the structure element. The erosion of A^C by W - X produces the background with an enlarged hole for the three objects X, Y and Z, plus two auxiliary objects, namely the single pixel where our X is located and a pattern consisting of several pixels for the small objects in Slide 4.43. We intersect the two erosion results, A eroded by X and A^C eroded by W - X. The intersection produces a single pixel at the location of our object X. This is the so-called hit-or-miss method of finding an instance where object X exists; all other objects, whether bigger or smaller, disappear.

The process and the formula are shown in Slide 4.46, which summarizes the hit-or-miss process illustrated above. The operator is written with a symbol containing a circle and a little asterisk: A is eroded by X, the complement of A is eroded by W - X, and the two results are intersected. We thus have two structure elements, X and W - X.


Definition 11 Hit-or-Miss Operator
A ⊗ W = (A ⊖ W1) ∩ (A^C ⊖ W2)
with the two structure elements W1 = X (the shape we are looking for) and W2 = W - X (its local background).


Slide 4.46 shows that the equation can be rewritten in various forms.
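A small Python sketch of the hit-or-miss idea on pixel sets, following the formula above (the shapes, the window W and all names are illustrative assumptions, not the slide example):

WINDOW = {(x, y) for x in range(-2, 15) for y in range(-2, 15)}   # finite stand-in for Z^2

def erode(A, B):
    return {(x, y) for (x, y) in WINDOW if all((x + bx, y + by) in A for (bx, by) in B)}

def hit_or_miss(A, X, W):
    """Find positions where the shape X occurs together with the local background W - X."""
    return erode(A, X) & erode(WINDOW - A, W - X)

# image A: one copy of the 2 x 2 shape X we are looking for, plus a larger 3 x 3 object elsewhere
X = {(0, 0), (1, 0), (0, 1), (1, 1)}
A = {(x + 2, y + 2) for (x, y) in X} | {(x, y) for x in range(8, 11) for y in range(8, 11)}
W = {(x, y) for x in range(-1, 3) for y in range(-1, 3)}          # a window one pixel larger than X

print(hit_or_miss(A, X, W))   # {(2, 2)}: exactly one hit, at the position of the 2 x 2 shape

The larger object is rejected because its surrounding frame test (the erosion of A^C by W - X) fails, exactly as described in the text.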

Exam questions:

• How is the "hit-or-miss" operator A ⊛ B defined? Explain how it works for recognizing structures in binary images.
Answer: It holds that
A ⊛ B = (A ⊖ B) ∩ [A^C ⊖ (W - B)],
where W is a structuring element larger than B. When A is eroded with B, all parts of A that are smaller than B disappear, while a part with exactly the shape of B remains as an isolated pixel. When A^C is eroded with W - B, all holes of A^C that are larger than B are widened, while parts with the shape of B again yield a single pixel. The set intersection therefore produces a set pixel exactly where a part of A is identical to B.

4.6 Some Additional Morphological Algorithms

Morphological algorithms in common use deal with finding the contour of an object, finding the skeleton of an object, filling regions, and cutting branches off skeletons. The whole world of morphological algorithms is clearly applicable to character recognition, particularly for handwriting. It is applied in those cases where the object of interest can be described in a binary image, where we need neither color nor gray values but simply object and non-object.


Given an object A as in Slide 4.48 and Slide 4.49, we are looking for the contour of A, denoted b(A). We use a structure element B to find the contour: the contour of region A is obtained by subtracting from A an eroded version of A. The erosion should be by just one pixel, so structure element B is a 3 × 3 window.

Definition 12 Contour
We present the formal definition of a contour; it is the digital counterpart of the boundary of an analog set. We are looking for the contour of A as b(A):

b(A) = A - (A ⊖ B)    (4.1)

We use a structure element B to find the contour. The contour of region A is obtained by subtracting from A an eroded version of A. The erosion should just be by one pixel; structure element B is a 3 × 3 window:

B = ⎡ a11 a12 a13 ⎤
    ⎢ a21 a22 a23 ⎥    (4.2)
    ⎣ a31 a32 a33 ⎦

The contour of a connected set of points R is defined as the points of R having at least one neighbor not in R. The contour is the outline or visible edge of a mass, form or object.

Slide 4.49 shows the erosion of region A and the difference with region A that yields the contour pixels.
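A short Python sketch of equation (4.1), b(A) = A - (A ⊖ B), on a pixel-set representation (names and the toy object are ours; B is taken as the full 3 × 3 neighborhood including the origin):

def erode(A, B):
    """Erosion on pixel sets: keep positions of A at which every offset of B stays inside A."""
    return {(x, y) for (x, y) in A if all((x + bx, y + by) in A for (bx, by) in B)}

def contour(A):
    """b(A) = A - (A erode B) with B the 3 x 3 structure element."""
    B = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}
    return A - erode(A, B)

A = {(x, y) for x in range(0, 5) for y in range(0, 4)}    # a filled 5 x 4 rectangle
boundary = contour(A)
print(sorted(boundary))    # the one-pixel-wide boundary ring of the rectangle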

Region filling is the opposite operation: we start from a binary representation of a contour and want to fill its interior. The particular contour considered here is continuous and non-interrupted under an 8-neighborhood relationship (recall: up, down, left, right plus all diagonal neighbors). We build the complement A^C of the contour A. The structure element B is again a 3 × 3 matrix, but using only the 4-neighbors. Region filling is an iterative process according to Slide 4.51: with a running index k that increases with the iterations, each step dilates the previous result by the structure element B and intersects it with the complement A^C of A,

X_k = (X_{k-1} ⊕ B) ∩ A^C,

and we repeat this step by step until no new pixels are added; the filled region is then the union of the final X_k with the contour A. The issue is the starting point X_0, an arbitrary pixel inside the contour from which we start the process.
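A minimal Python sketch of this iterative region filling, again on pixel sets within a finite window (the seed point, the window size and the names are our own choices):

def fill_region(contour, seed, width, height):
    """Fill the interior of a closed contour, starting from a seed pixel inside it."""
    window = {(x, y) for x in range(width) for y in range(height)}
    complement = window - contour                     # A^C within the finite window
    B = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}    # 4-neighborhood structure element
    X = {seed}
    while True:
        dilated = {(x + bx, y + by) for (x, y) in X for (bx, by) in B}
        X_next = dilated & complement                 # X_k = (X_{k-1} dilate B) intersect A^C
        if X_next == X:                               # stop when no new pixels are added
            return X_next | contour                   # filled region = interior plus contour
        X = X_next

# an 8-connected rectangular contour and a seed inside it
contour = {(x, 0) for x in range(5)} | {(x, 4) for x in range(5)} | \
          {(0, y) for y in range(5)} | {(4, y) for y in range(5)}
print(sorted(fill_region(contour, (2, 2), 6, 6)))     # the 5 x 5 square, completely filled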

A final illustration of the usefulness of morphology deals with the automated recognition of hand-written zip codes. Slide 4.52 and Slide 4.53 present a hand-written address that is being imaged. Through some pre-processing, the hand-writing has been converted into a binary image; the first step might be to threshold the gray-tone image. After having thresholded the address we need to find the area containing the zip code.

Let us address the task of extracting all connected components in the area that comprises the address field. From a segmentation into components, one finds rectangular boxes, each containing a connected object. One would assume that each digit is now separate from all the other digits. However, if two digits are connected, as in this example with a digit 3 and a digit 7, we would misread them as one single digit. We can help ourselves by considering the shape of the rectangular box, plus using knowledge about how many digits a zip code has: in the United States it is basically five, so one needs to have five digits, and one can look for joined characters by measuring the relative widths of the boxes that enclose the characters, since we must expect certain dimensions for the box surrounding a single digit. Opening and closing operations can then separate digits that should be separate, or merge broken elements that should describe a single digit. Actual character recognition (OCR, for "Optical Character Recognition") then takes each binary image window with one digit and seeks to find which value between 0 and 9 it could be. This can be based on a skeleton of each segment and a count of its structure in terms of nodes and arcs. We will address this topic later in this class.

As a short outlook beyond the morphology of binary images, let's just state that there is a variation of morphology applied to gray value images. Gray-tone images can be filtered with morphology, and an example is presented in Slide 4.55.

Exam questions:

• Given the pixel arrangement shown in Figure ??, describe graphically and by a formula the procedure for morphologically determining the outline of the object shown, using a structuring element of your own choosing.

• Using morphological operations, describe a procedure for determining the border of a region. Apply this procedure to the region drawn in Figure B.23 and state the 3 × 3 structuring element you used. Figure B.23 provides space for the final result as well as for intermediate results.




Chapter 5

Color

5.1 Gray Value Images

A precursor to color images is of course the black & white image, and some basic issues can be studied with black & white images before we proceed to color. A regular gray value image is shown in Slide 5.3. We need to characterize a gray value image by its densities, by the way it may challenge our eyes, by the manner in which it captures the physics of illumination and reflection, and by how it is presented to the human viewer. We have discussed such concepts as the density of film, the intensity of the light that is reflected from objects, and the quality of an image in terms of its histogram.

Intensity describes energy, light or brightness. When an intensity value is zero, we are talking about darkness, no light; if the intensity is bright, a large value describes it. The opposite is true for film: a film with a density of zero is completely transparent, whereas a film at a density of 4 is totally opaque and will not let any light pass through. A negative film that is totally transparent represents an object that did not send any light through the optical system; a negative that is totally opaque had been brightly illuminated. The opposite holds for a positive film: the darker the positive film, the less light it represents.

In Chapter 2 we already talked about the eye, but we did not address its sensitivity to brightness, energy and light. A notable characteristic of the eye is that it is very sensitive to ratios. If we present to the eye two different brightnesses, say densities of 0.11 and 0.10, the eye perceives this much like the pair 0.55 and 0.50, both pairs differing by 10%. The sensitivity of the eye to differences ΔI of the intensity of light I is expressed by the Weber ratio ΔI/I.

What, then, is the interval, expressed as a ratio r, when presenting an image with n discrete gray values? Let us define n intensity steps:

I_n = r^n · I_0

If intensity I_n is the maximum and intensity I_0 is the minimum, we have to compute the value of r that breaks the interval from I_0 to I_n into n steps. Slide 5.5 illustrates the issue:

r = ⁿ√(I_n / I_0)

If n = 3, we have 4 different levels of intensity, namely 1/8, 1/4, 1/2 and 1. The eye needs a Weber ratio ΔI/I of about 0.01, i.e. r ≈ 1.01; smaller differences between two intensities are not recognizable. Conceptually this corresponds to a capability of resolving about 100 different gray values.


A monitor presents an intensity I that is a function of N, the number of electrons creating the intensity on the monitor; Slide 5.6 presents the relationship. Film has a density that relates linearly to the logarithm of the energy of light falling onto the film. The dynamic range is the ratio of the highest and lowest intensity that a medium can represent: for a monitor that value might be 200, for film it might be 1000, for paper it might be 100. Note that the dynamic range is often stated as the density range d, the power of base 10 that the medium can support, i.e. a ratio of 10^d. For film the ratio of brightest and darkest intensity is about 1000, and therefore film typically has a density range d = 3, whereas paper lies at d < 2.

Continuous-tone photography cannot be printed directly. Instead one needs to create so-called half-tone images by means of a raster pattern. These images make use of the spatial integration that human eyes perform. A half tone is a representation of a gray tone: the image is resolved into discrete points, each point is associated with an area on paper, and at each point one places a small dot proportional in size to the density of the object. If the object is bright the dots are small, and if the object is dark the dots are large. This is also called screening of a gray-tone image; note that the screen is typically arranged at an angle of 45°. Slide 5.7 shows a so-called half-tone image. Slide 5.8 makes the transition to the digital world: gray tones can be obtained in a digital environment by substituting for each pixel a matrix of subpixels. If we have 2 × 2 subpixels we can represent five gray values, as shown in Slide 5.9; similarly, a 3 × 3 pattern permits us to represent 10 different gray values. We call the matrix into which we subdivide pixels a dither matrix: a D2 dither matrix means that 2 × 2 pixels are used to represent one gray value of a digital image.

The basic principle is demonstrated in Algorithm 10. An example <strong>for</strong> the creation of a 3 × 3 dither<br />

matrix would be:<br />

D =<br />

⎡<br />

⎣ 6 8 4<br />

1 0 3<br />

5 2 7<br />

An image gray value is checked against each element of the dither matrix <strong>and</strong> only those pixels<br />

are set, where the gray value is larger than the value in the dither matrix.<br />

For a gray value of 5 the given matrix D would produce the following pattern<br />

⎡<br />

P = ⎣<br />

0 0 1<br />

1 1 1<br />

0 1 0<br />

⎤<br />

⎦<br />

⎤<br />

⎦<br />

A dither matrix of n × n defines n² + 1 different patterns. It should be created wisely in order not to define patterns that produce artefacts. For instance, the following matrix (shown here for a gray value of 3) would create horizontal lines if applied to larger areas:

D = \begin{pmatrix} 5 & 3 & 6 \\ 1 & 0 & 2 \\ 8 & 4 & 7 \end{pmatrix} \xrightarrow{\;v = 3\;} P = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 1 & 1 \\ 0 & 0 & 0 \end{pmatrix}
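A minimal sketch of this dithering principle in Python (the function and variable names are illustrative and not taken from the lecture; gray values are assumed to lie in the range 0–9, i.e. n² + 1 levels for n = 3):

D = [[6, 8, 4],
     [1, 0, 3],
     [5, 2, 7]]

def halftone(image, dm=D):
    """Expand each pixel of `image` into an n x n block of 0/1 subpixels.
    A subpixel is set (1 = black dot) wherever the gray value of the
    source pixel exceeds the corresponding dither matrix entry."""
    n = len(dm)
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols * n) for _ in range(rows * n)]
    for y in range(rows):
        for x in range(cols):
            v = image[y][x]
            for i in range(n):
                for j in range(n):
                    out[y * n + i][x * n + j] = 1 if v > dm[i][j] else 0
    return out

# A single pixel with gray value 5 reproduces the pattern P shown above.
print(halftone([[5]]))   # [[0, 0, 1], [1, 1, 1], [0, 1, 0]]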

Exam questions:

• What is meant by the "dynamic range" of a medium for the reproduction of pictorial information, and how is it related to the quality of the representation? Rank some common media in ascending order of their dynamic range.



Algorithm 10 Halftone image (by means of a dither matrix)
1: dm = createDitherMatrix(n, n) {create an n × n dither matrix}
2: for all pixels (x, y) of the image do
3:   v = getGrayValueOfPixel(x, y)
4:   for all elements (i, j) of dm do {check the value against the matrix}
5:     if v > dm(i, j) then
6:       setPixel(OutputImage, x · n + i, y · n + j, black) {apply the pattern}
7:     else
8:       setPixel(OutputImage, x · n + i, y · n + j, white)
9:     end if
10:   end for
11: end for

Figure 5.1: Histogram of the image in Figure B.29 (frequency versus gray value)

• Consider a printing process that represents a gray dot by means of a pixel raster, as shown in Figure B.5. How many gray values can be represented with this raster? Which gray value is shown in Figure B.5?

• Sketch the histogram of the digital gray value image of Figure B.29 and comment on your sketch!

Answer: The histogram is bimodal, with the peak in the white range somewhat flatter than the one in the black range, because the image has more structure in the bright areas than in the dark areas (see Figure 5.1).

5.2 Color images<br />

Of course computer graphics <strong>and</strong> digital image processing are significantly defined by color. Color<br />

has been a mysterious phenomenon through the history of mankind <strong>and</strong> there are numerous models<br />

that explain color <strong>and</strong> how color works.<br />

Slide 5.12 does this with a triangle: the three corners of the triangle represent white, black and pure color, so that the edges of the triangle represent values of gray, tints between white and pure color, and shades between pure color and black. The concept of tones fills the area of the triangle. A color is judged against existing color tables. A very widely used system is the one by Munsell. It is organized along 3 ordering schemes: hue (color), value (lightness) and saturation. These 3 entities can be the coordinate axes of a 3D space. We will visit the 3-dimensional idea later in subtopic 5.5.



Color in image processing presents us with many interesting phenomena. The example in Slide

5.16 is a technical image, a so-called false color image. In this case film is being used that is not<br />

sensitive to blue, but is instead sensitive to green, red <strong>and</strong> infrared. In this particular film, the<br />

infrared light falling onto the emulsion will activate the red layer in the film. The red light will<br />

activate the green layer, the green light will activate the blue layer. As a result, an image will<br />

show infrared as red. Slide 5.16 is a vegetated area. We recognize that vegetation is reflecting a<br />

considerable amount of infrared light, much more so than red or green light. Healthy vegetation<br />

will look red, sick vegetation will reflect less infrared light <strong>and</strong> will there<strong>for</strong>e look whitish.<br />

Color images not only serve to represent the natural colors of our environment, or the electromagnetic<br />

radiation as we receive it with our eyes or by means of sensors, but color may also be used<br />

to visualize things that are totally invisible to humans.<br />

Slide 5.18 is an example of terrain elevation rendered in color, covering the entire world.

Similarly, Slide 5.19 illustrates the rings of planet Saturn <strong>and</strong> uses color to highlight certain<br />

segments of those rings to draw the human observer’s attention. The colors can be used to mark<br />

or make more clearly visible to a human interpreter a physical phenomenon or particular data<br />

that one wants the human to pay attention to. This is called pseudo-color.<br />

Exam questions:

• What is meant by a false color image and by a pseudo color image? Give a typical application for each.

5.3 Tri-Stimulus Theory, Color Definitions, CIE-Model<br />

The eye has color-sensitive cones around the fovea, the area of highest color sensitivity in the eye. It turns out that these cones are not equally sensitive to red, green and blue. Slide 5.22 shows that we have much less sensitivity to blue light than we have to green and red. The eye's cones can see the electromagnetic spectrum from 0.4 to 0.7 µm wavelength (or 400 to 700 nanometers). We find that the eye is most sensitive in the yellow-green area; luminance sensitivity is best in that color range. Slide 5.23 illustrates the concept of the tri-stimulus idea. The tri-stimulus theory is attractive since it explains that all colors can be made from only 3 basic colors. If one were to create all spectral colors from red, green and blue, the cones in the eye would have to respond at the levels shown in Slide 5.13. The problem is that one would have to allow for negative values in red, which is not feasible. So those colors cannot be created; they appear falsified by too much red.

The physics of color is explained in Slide 5.25.<br />

White light from the sun falls onto an optical prism, breaking up the white light into the rainbow colors from ultraviolet via blue, green, yellow and orange to red and on to infrared. These are the spectral colors first scientifically explained by Sir Isaac Newton in 1666. We all recall from elementary physics that the electromagnetic spectrum is ordered by wavelength or frequency and goes from cosmic rays via gamma rays and X-rays to ultraviolet, then on to visible light, and from there to near infrared, far infrared, microwaves, television and radio frequencies. Wavelengths of visible light range between 0.35 µm and 0.7 µm. Ultraviolet has shorter wavelengths in the range of 0.3 µm, and infrared goes from 0.7 µm to perhaps several hundred µm.

We would like to create color independent of natural light. We have two major ways of doing this.<br />

One is based on light, the other on pigments. We can take primary colors of light <strong>and</strong> mix them<br />

up.<br />

These primary colors would be green, blue <strong>and</strong> red, spectrally clean colors. As we mix equal<br />

portions of those three, we produce white. If we mix pairs of them, we get yellow, cyan and magenta.



In contrast to additive mixing of light there exist subtractive primaries of pigments. If we want to<br />

print something we have colors to mix. Primary colors in that case are magenta, yellow <strong>and</strong> cyan.<br />

As we mix equal parts we get black. If we mix pairs of them, we get red, green <strong>and</strong> blue. We call<br />

yellow, magenta <strong>and</strong> cyan primary colors, green, red <strong>and</strong> blue secondary colors of pigment. To<br />

differentiate between subtractive <strong>and</strong> additive primaries, we talk about pigments <strong>and</strong> light. An<br />

important difference between additive <strong>and</strong> subtractive colors is the manner in which they are being<br />

generated. A pigment absorbs a primary color of light <strong>and</strong> reflects the other two. Naturally then,<br />

if blue <strong>and</strong> green get reflected but red is absorbed, that pigment appears cyan, <strong>and</strong> represents<br />

the primary pigment “cyan”. The primary colors of light are perceived by the eye’s cones on the<br />

retina as red, green <strong>and</strong> blue, <strong>and</strong> combinations are perceived as secondary colors.<br />

The Commission Internationale de l'Éclairage (CIE) has been responsible for an entire world of standards and definitions. As early as 1931, the CIE fixed the spectral wavelengths for red at 700 nm, green at 546.1 nm and blue at 435.8 nm.

So far we have not yet been concerned about the dimensions of the color issue. But Munsell<br />

defined concepts such as hue 1 , intensity (value or lightness), <strong>and</strong> saturation or chroma 2 . We<br />

can build from such concepts a three dimensional space <strong>and</strong> define chromaticity, thus color, as a<br />

2-dimensional subspace.<br />

The necessity of coping with negative color values when building spectral colors from RGB has led the Commission Internationale de l'Éclairage (CIE) to define 3 primary colors X, Y and Z. The CIE defined their values to form the spectral colors as shown in Slide 5.27.

The Y -curve was chosen to be identical to the luminous efficiency function of the eye.<br />

The auxiliary values X, Y and Z are denoted as tri-stimulus values, defining tri-chromatic coefficients x, y, z as follows:

x = X / (X + Y + Z)
y = Y / (X + Y + Z)
z = Z / (X + Y + Z)

and x + y + z = 1.

A 3-dimensional space is defined by X, Y, Z <strong>and</strong> by x, y, z. X, Y, Z are the amounts of red,<br />

green, <strong>and</strong> blue to obtain a specific color; whereas x, y, z are normalized tri-chromatic coefficients.<br />

One way of specifying color with the help of the tri-chromatic coefficients is by means of a CIE<br />

chromaticity diagram.<br />

A two-dimensional space is defined by the plane x + y + z = 1 with an x- and a y-axis, whereby the values along the x-axis represent red and those along the y-axis represent green. The values vary between 0 and 1. The z value (blue) results from z = 1 − x − y.

There are several observations to be made about the CIE chromaticity diagram:<br />

1. A point is marked as “green”, <strong>and</strong> is composed of 62% green, 25% red <strong>and</strong> from z = 1−x−y,<br />

13% blue.<br />

2. Pure spectral colors from a prism or rainbow are found along the edge of the diagram, with<br />

their wavelength in nm.<br />

1 in German: Farbton<br />

2 in German: Sättigung



3. Any point inside the tongue-shaped area represents a color that cannot only be composed<br />

from x, y <strong>and</strong> z, but also from the spectral colors along the edge of the tongue.<br />

4. There is a point marked that has 33% of x, 33% of y and 33% of z; this is the CIE value for white light.

5. Any point along the boundary of the chromaticity chart represents a saturated color.<br />

6. As a point is defined away from the boundary of the diagram we have a desaturated color<br />

by adding more white light. Saturation at the point of equal energy is 0.<br />

7. A straight line connecting any 2 colors defines all the colors that can be mixed additively from the end points.

8. From the white point to the edge of the diagram, one obtains all the shades of a particular<br />

spectral color.<br />

9. Any three colors I, J, K define all other colors that can be mixed from them, namely those inside the triangle formed by I, J, K.

Definition 13 Conversion from CIE to RGB

To transform device-specifically between different monitor RGB spaces we can use transformations from a particular monitor RGB space to the CIE XYZ space. The general transformation can be written as:

X = X_r · R_m + X_g · G_m + X_b · B_m
Y = Y_r · R_m + Y_g · G_m + Y_b · B_m
Z = Z_r · R_m + Z_g · G_m + Z_b · B_m

Under the assumption that equal RGB voltages (1, 1, 1) should lead to the colour white, and specifying chromaticity coordinates for a monitor consisting of long-persistence phosphors like this:

          x      y
  red    0.620  0.330
  green  0.210  0.685
  blue   0.150  0.063

we have for example:

X = 0.584 · R_m + 0.188 · G_m + 0.179 · B_m
Y = 0.311 · R_m + 0.614 · G_m + 0.075 · B_m
Z = 0.047 · R_m + 0.103 · G_m + 0.939 · B_m

The inverse transformation is:

R_m = 2.043 · X − 0.568 · Y − 0.344 · Z
G_m = −1.036 · X + 1.939 · Y + 0.043 · Z
B_m = 0.011 · X − 0.184 · Y + 1.078 · Z
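As a small numerical cross-check of Definition 13, the following Python sketch (numpy is assumed to be available; the matrix holds the example numbers given above) converts between monitor RGB and CIE XYZ and confirms that the listed inverse really is the matrix inverse:

import numpy as np

# Rows map (R_m, G_m, B_m) to (X, Y, Z); values are the example numbers above.
M_RGB_TO_XYZ = np.array([
    [0.584, 0.188, 0.179],
    [0.311, 0.614, 0.075],
    [0.047, 0.103, 0.939],
])

def rgb_to_xyz(rgb):
    """Convert a monitor RGB triple (0..1) to CIE XYZ."""
    return M_RGB_TO_XYZ @ np.asarray(rgb, dtype=float)

def xyz_to_rgb(xyz):
    """Convert CIE XYZ back to the same monitor's RGB space."""
    return np.linalg.inv(M_RGB_TO_XYZ) @ np.asarray(xyz, dtype=float)

# Equal RGB voltages (1, 1, 1) map to the monitor's white point.
print(rgb_to_xyz([1.0, 1.0, 1.0]))           # approx. [0.951, 1.000, 1.089]
print(np.linalg.inv(M_RGB_TO_XYZ).round(3))  # matches the inverse listed above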

Exam questions:

• Consider the CIE color space. Make a sketch of this color space with a description of the axes, and mark two points A and B in this space. Which color properties are associated with points lying on the line segment between A and B, and which with the intersections of the line through A and B with the boundary of the CIE color space?



• Can all colors perceivable by the human eye be displayed on an RGB monitor? Justify your answer with a sketch!

5.4 Color Representation on Monitors <strong>and</strong> Films<br />

The CIE chromaticity diagram describes more colors than the subset that is displayable on film, on a monitor, or on a printer.

The subset of colors that is displayable on a medium can be represented as a triangle spanned by its primary colors in an additive system. A monitor uses the RGB model. In order for the same color to appear on a printer that was perceived on a monitor, and that might come from scanning color film, the proper mix of that color within the respective triangles can be assessed via the CIE chromaticity diagram.

Exam questions:

• Compare the method of color generation on a cathode-ray tube monitor with that of offset printing. Which color models are used in each case?

5.5 The 3-Dimensional Models<br />

The tri-stimulus values x, y, z define a 3D space as shown in Slide 5.33 with the plane x+y +z = 1<br />

marked. If a color monitor builds its colors from 3 primaries RGB, then it will be able to display<br />

a subset of the CIE-colors.<br />

The xyz-space is shown in Slide 5.35 in 3 views.<br />

We extend our model to a three dimensional coordinate system with the red, green <strong>and</strong> blue color<br />

axes, the origin at black, a diagonal extending away from the origin under 45 degrees with each<br />

axis giving us gray values until we hit the white point. The red-blue plane defines the magenta<br />

color, the red-green plane defines yellow <strong>and</strong> the green-blue plane defines cyan. That resulting<br />

color model is shown in Slide 5.36 <strong>and</strong> is illustrated in Slide 5.37.<br />

The RGB values range between 0 <strong>and</strong> 1. The RGB model is the basis of remote sensing <strong>and</strong><br />

displaying color images on various media such as monitors.<br />

How does one modify the histogram of an RGB image? Clearly, changing the intensity of each component image separately will change the resulting color. This needs to be avoided. We will discuss other color models that will help here.

Exam questions:

• What is meant by a three-dimensional color space (or color model)? Name at least three examples!

5.6 CMY-Model<br />

Exam questions:

• Consider the color value C_RGB = (0.8, 0.5, 0.1)^T in the RGB color model.



Definition 14 CMY color model<br />

CMY st<strong>and</strong>s <strong>for</strong>:<br />

C . . . Cyan<br />

M . . . Magenta<br />

Y . . . Yellow<br />

The three-dimensional geometric representation of the CMY model can be done in the same way as for the RGB model, i.e. as a cube.

In contrast to the RGB-Model the CMY-Model uses the principle of subtractive colors.<br />

Subtractive colors are seen when pigments in an object absorb certain wavelengths of white light<br />

while reflecting the rest.<br />

We see examples of this all around us. Any colored object, whether natural or man-made, absorbs<br />

some wavelengths of light <strong>and</strong> reflects or transmits others; the wavelengths left in the reflected/transmitted<br />

light make up the color we see.<br />

Some examples:<br />

• White light falling onto a cyan pigment will be reflected as a mix of blue <strong>and</strong> green since<br />

red will get absorbed.<br />

• White light falling onto a magenta pigment will be reflected as a mix of red <strong>and</strong> blue since<br />

green will get absorbed.<br />

• White light falling onto a yellow pigment will be reflected as a mix of red <strong>and</strong> green since<br />

blue will get absorbed.<br />

There<strong>for</strong>e the conversion of RGB to CMY is supported by the physics of light <strong>and</strong> pigments. This<br />

leads to the following conversion-<strong>for</strong>mulas:<br />

C = 1 − R<br />

M = 1 − G<br />

Y = 1 − B<br />

R = 1 − C<br />

G = 1 − M<br />

B = 1 − Y<br />

The CMY-Model is not used on monitors but in printing.
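A minimal sketch of these conversion formulas in Python (color components assumed to be normalized to the range 0..1):

def rgb_to_cmy(r, g, b):
    """C = 1 - R, M = 1 - G, Y = 1 - B."""
    return (1.0 - r, 1.0 - g, 1.0 - b)

def cmy_to_rgb(c, m, y):
    """Inverse conversion: R = 1 - C, G = 1 - M, B = 1 - Y."""
    return (1.0 - c, 1.0 - m, 1.0 - y)

# The color value used in the exam question below:
print(rgb_to_cmy(0.8, 0.5, 0.1))   # approximately (0.2, 0.5, 0.9)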



Continuing the exam question on C_RGB = (0.8, 0.5, 0.1)^T from Section 5.6:

1. Which spectral color corresponds most closely to the hue defined by C_RGB?

2. Find the corresponding representation of C_RGB in the CMY and in the CMYK color model!

Answer:

C_CMY = (1, 1, 1)^T − C_RGB = (0.2, 0.5, 0.9)^T
K = min(C, M, Y) = 0.2
C_CMYK = (0, 0.3, 0.7, 0.2)^T

The given hue corresponds approximately to orange.

5.7 Using CMYK<br />

Definition 15 CMYK color model<br />

CMYK is a scheme <strong>for</strong> combining primary pigments. The C st<strong>and</strong>s <strong>for</strong> cyan (aqua), M st<strong>and</strong>s <strong>for</strong><br />

magenta (pink), Y is yellow, <strong>and</strong> K st<strong>and</strong>s <strong>for</strong> black. The CMYK pigment model works like an<br />

”upside-down” version of the RGB (red, green, <strong>and</strong> blue) color model. The RGB scheme is used<br />

mainly <strong>for</strong> computer displays, while the CMYK model is used <strong>for</strong> printed color illustrations (hard<br />

copy).<br />

K is being defined as the minimum of C ′ , M ′ , <strong>and</strong> Y ′ so that C is really redefined as C ′ − K, M<br />

as M ′ − K, <strong>and</strong> Y as Y ′ − K.<br />

Conversion from RGB to CMYK:<br />

C ′ = 1 − R<br />

M ′ = 1 − G<br />

Y ′ = 1 − B<br />

K = min(C ′ , M ′ , Y ′ )<br />

C = C ′ − K<br />

M = M ′ − K<br />

Y = Y ′ − K<br />

Defining K (black) from CMY is called undercolor removal. Images become darker than they would be with CMY alone, and there is less need for the expensive printing colors C, M and Y, which also need time to dry on paper.
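A small sketch of the RGB-to-CMYK conversion with undercolor removal in Python (values normalized to 0..1):

def rgb_to_cmyk(r, g, b):
    """Convert RGB to CMYK: K is the minimum of the preliminary C', M', Y',
    and the final C, M, Y are reduced by K (undercolor removal)."""
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b   # preliminary C', M', Y'
    k = min(c, m, y)
    return (c - k, m - k, y - k, k)

def cmyk_to_rgb(c, m, y, k):
    """Inverse conversion back to RGB."""
    return (1.0 - (c + k), 1.0 - (m + k), 1.0 - (y + k))

# The four-color value from the exam question below:
# 70% cyan, 0% magenta, 50% yellow, 30% black.
print(cmyk_to_rgb(0.7, 0.0, 0.5, 0.3))   # approximately (0.0, 0.7, 0.2)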

Exam questions:

• According to which formula is a CMYK color representation converted into an RGB representation?

• Give the conversion rule for an RGB color value into the CMY model and into the CMYK model and explain the meaning of the individual color components! What is the CMYK model used for?

• Compare the method of color generation on a cathode-ray tube monitor with that of offset printing. Which color models are used in each case?

• In four-color printing, a color value is given by 70% cyan, 0% magenta, 50% yellow and 30% black. Convert this color value into the RGB color model and describe the hue in words!



Answer: We have

C_CMYK = (0.7, 0.0, 0.5, 0.3)^T
C_CMY = (1, 0.3, 0.8)^T
C_RGB = (0, 0.7, 0.2)^T

The color corresponds to a slightly bluish green.

5.8 HSI-Model<br />

The hue-saturation-intensity color model derives from a transformation of the RGB color space that is rather complicated. The HSI model is useful when analyzing images where color and intensity are each important by themselves. One may also improve an image in its HSI version rather than in its natural RGB representation.

Slide 5.44 introduces the transition from RGB to HSI. A color located at P in the RGB triangle has its hue H described by the angle with respect to the red axis. Saturation S is the distance from the white point, thus from the point of equal RGB at the center of the triangle.

Intensity is not contained within the triangle of Slide 5.44, but is measured perpendicular to the triangle plane, as Slide 5.45 explains. The HSI model thus has a pyramid-like shape. It is visualized in Slide 5.46.

Conversion of RGB to HSI has been explained in concept, but it is based on a rather elaborate algorithm. The easiest element is the intensity I, which simply is I = (R + G + B)/3. We do not detail H and S here, nor the inverse conversion from HSI to RGB; Algorithms 11 and 12 list both conversions.

5.9 YIQ-Model<br />

Exam questions:

• On the YIQ color model:

1. What is the meaning of the Y component in the YIQ color model?
2. Where is the YIQ color model used?

• A color value C_RGB = (R, G, B)^T in the RGB color model is converted into the corresponding value C_YIQ = (Y, I, Q)^T in the YIQ color model according to the following rule:

C_YIQ = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{pmatrix} · C_RGB

Which biological fact is expressed by the first row of this matrix? (Hint: consider where the YIQ color model is used and what the meaning of the Y component is in that context.)

5.10 HSV and HLS Models

Variations on the HSI model are available.



Definition 16 YIQ color model

This model is used in U.S. TV broadcasting. The RGB to YIQ transformation is based on a well-known matrix M:

M = \begin{pmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.275 & -0.321 \\ 0.212 & -0.523 & 0.311 \end{pmatrix}

The Y component is all one needs for black & white TV. Y gets the highest bandwidth, I and Q get less. Transmission of I and Q is separate from Y; I and Q are encoded in a complex signal.

RGB to YIQ conversion:

Y = 0.299 · R + 0.587 · G + 0.114 · B
I = 0.596 · R − 0.275 · G − 0.321 · B
Q = 0.212 · R − 0.523 · G + 0.311 · B

YIQ to RGB conversion:

R = 1 · Y + 0.956 · I + 0.621 · Q
G = 1 · Y − 0.272 · I − 0.647 · Q
B = 1 · Y − 1.105 · I + 1.702 · Q

Again, simple image processing such as histogram changes can take place on Y only. Color does not get affected, since it is encoded in I and Q.
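A short sketch of the YIQ conversion in Python (numpy assumed), using the matrix M of Definition 16:

import numpy as np

M = np.array([
    [0.299,  0.587,  0.114],
    [0.596, -0.275, -0.321],
    [0.212, -0.523,  0.311],
])

def rgb_to_yiq(rgb):
    """Convert an RGB triple (0..1) to YIQ via the matrix M."""
    return M @ np.asarray(rgb, dtype=float)

def yiq_to_rgb(yiq):
    """Convert a YIQ triple back to RGB."""
    return np.linalg.inv(M) @ np.asarray(yiq, dtype=float)

# A pure gray (equal RGB) carries no chrominance: I and Q are (close to) zero.
print(rgb_to_yiq([0.5, 0.5, 0.5]))   # approx. [0.5, 0.0, 0.0]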

The HSV model (hue-saturation-value) is also called the HSB model, with B standing for brightness. This responds to the intuition of an artist, who thinks in terms of tint, shade and tone. We introduce a cylindrical coordinate system, and the model defines a hexcone.

Compare the coordinates with Slide 5.44. The hue is again measured as an angle around the vertical axis, in this case in steps of 120 degrees going from one primary color to the next (red at 0 degrees, green at 120 degrees, blue at 240 degrees; the intermediate angles are yellow, cyan and magenta). The saturation S is a ratio going from 0 at the central axis of the hexcone to 1 at its side. The value V varies between 0 for black and 1 for white. Note that the top of the hexcone can be obtained by looking at the RGB cube along the diagonal axis from white to black. This is illustrated in Slide 5.45. This also provides the basic idea for converting an RGB input into the HSV color model.

The HLS (hue-lightness-saturation) color model is defined by a double hexcone shown in Slide 5.49. The HLS model is essentially obtained as a deformation of the HSV model by pulling up the center of the base of the hexcone (the V = 1 plane). Therefore a transformation of RGB into the HLS color model is similar to the RGB to HSV transformation.

The HSV color space is visualized in Slide 5.52. Similarly, Slide 5.53 illustrates an entire range of color models in the form of cones and hexcones.

Exam questions:

• Consider the color value C_RGB = (0.8, 0.4, 0.2)^T in the RGB color model. Estimate graphically the position of the color value C_HSV in Figure B.32 (i.e. the equivalent of C_RGB in the HSV model). Also sketch the position of a color value C'_HSV that has the same hue and the same brightness as C_HSV but only half the color saturation!



Algorithm 11 Conversion from RGB to HSI
1: Input: real R, G, B, the RGB color coordinates to be converted.
2: Output: real H, S, I, the corresponding HSI color coordinates.
3: float Z, n, delta
4: Z = ((R − G) + (R − B)) · 0.5
5: n = sqrt((R − G) · (R − G) + (R − B) · (G − B))
6: if n ≠ 0 then
7:   delta = acos(Z / n)
8: else
9:   delta = 0.0
10: end if
11: {hue, saturation and intensity following the standard HSI formulas of [GW92]}
12: if B ≤ G then
13:   H = delta
14: else
15:   H = 2π − delta
16: end if
17: S = 1 − 3 · min(R, G, B) / (R + G + B)
18: I = (R + G + B) / 3
13: if B



Algorithm 12 Conversion from HSI to RGB<br />

1: Input, real H, S, I, the HSI color coordinates to be converted.<br />

2: Output, real R, G, B, the corresponding RGB color coordinates.<br />

3: float H, S, I<br />

4: float rt3, R, G, B, hue<br />

5:<br />

6: if S = 0 then<br />

7: R=I, G=I, B=I<br />

8: else<br />

9: rt3=1/sqrt(3.0)<br />

10: end if<br />

11:<br />

12: if 0.0



Algorithm 13 Conversion from RGB to HSV
1: Input: real R, G, B, the RGB color coordinates to be converted.
2: Output: real H, S, V, the corresponding HSV color coordinates.
3: real rc, gc, bc, rgbmax, rgbmin
4: rgbmax = max(R, G, B)
5: rgbmin = min(R, G, B)
6: V = rgbmax {compute the value}
7: {compute the saturation}
8: if rgbmax ≠ 0.0 then
9:   S = (rgbmax − rgbmin) / rgbmax
10: else
11:   S = 0.0
12: end if
13: {compute the hue}
14: if S = 0.0 then
15:   H = 0.0
16: else
17:   rc = (rgbmax − R) / (rgbmax − rgbmin)
18:   gc = (rgbmax − G) / (rgbmax − rgbmin)
19:   bc = (rgbmax − B) / (rgbmax − rgbmin)
20:   if R = rgbmax then
21:     H = bc − gc
22:   else if G = rgbmax then
23:     H = 2.0 + rc − bc
24:   else
25:     H = 4.0 + gc − rc
26:   end if
27:   H = H · 60.0
28:   H = rmodp(H, 360.0) {make sure H lies between 0 and 360}
29: end if
34: end if



Algorithm 14 Conversion from HSV to RGB
1: Input: real H, S, V, the HSV color coordinates to be converted.
2: Output: real R, G, B, the corresponding RGB color coordinates.
3: real f, hue, p, q, t; integer i
4: if S = 0.0 then
5:   R = V, G = V, B = V
6: else
7:   hue = rmodp(H, 360.0) {make sure hue lies between 0 and 360}
8:   hue = hue / 60.0
9:   i = int(hue)
10:   f = hue − real(i)
11:   p = V · (1.0 − S)
12:   q = V · (1.0 − S · f)
13:   t = V · (1.0 − S + S · f)
14:   if i = 0 then
15:     R = V, G = t, B = p
16:   else if i = 1 then
17:     R = q, G = V, B = p
18:   else if i = 2 then
19:     R = p, G = V, B = t
20:   else if i = 3 then
21:     R = p, G = q, B = V
22:   else if i = 4 then
23:     R = t, G = p, B = V
24:   else if i = 5 then
25:     R = V, G = p, B = q
26:   end if
27: end if

40: end if



Algorithm 15 Conversion from RGB to HLS
1: Input: real R, G, B, the RGB color coordinates to be converted.
2: Output: real H, L, S, the corresponding HLS color coordinates.
3: real rgbmax, rgbmin
4: {compute lightness}
5: rgbmax = max(R, G, B)
6: rgbmin = min(R, G, B)
7: L = (rgbmax + rgbmin) / 2.0
8: {compute saturation; the following lines complete the listing with the standard HLS formulas}
9: if rgbmax = rgbmin then
10:   S = 0.0
11: else
12:   if L ≤ 0.5 then
13:     S = (rgbmax − rgbmin) / (rgbmax + rgbmin)
14:   else
15:     S = (rgbmax − rgbmin) / (2.0 − rgbmax − rgbmin)
16:   end if
17: end if
18: {compute the hue exactly as in Algorithm 13}
14: if L



Algorithm 16 Conversion from HLS to RGB<br />

1: Input, real H, L, S, the HLS color coordinates to be converted.<br />

2: Output, real R, G, B, the corresponding RGB color coordinates.<br />

3: real B, G, H, hlsvalue, L, m1, m2, R, S<br />

4:<br />

5: if L



Figure 5.2: a plane of the HSV color model (the hexagon spanned by red, yellow, green, cyan, blue and magenta, with white at the center)

Answer: We have (see Figure 5.2):

C_HSV = (20°, 75%, 0.8)
C'_HSV = (20°, 37.5%, 0.8)
C'_RGB = (0.8, 0.6, 0.5)

Halving the saturation in the HSV model means halving the distance from the center. The components of the corresponding point in the RGB model lie closer together, but their order is preserved.

• Which color lies "in the middle" when one interpolates linearly between the colors yellow and blue in the RGB color space? Which color space would be better suited for such an interpolation, and which color would lie between yellow and blue in that color space?

5.11 Image Processing with RGB versus HSI Color Models<br />

An RGB color test pattern is shown in Slide 5.51. This test pattern is used to calibrate printers, monitors, scanners and image color throughout a color-based production system. This particular test pattern is digital and offers 8 bits each of red, green and blue. The pattern is symmetric from top to bottom: it consists of one black band on top; bands two, three and four are the primary colors; for the RGB model, bands 5, 6 and 7 are the secondary colors; band 8 should be white; band 9 then is a continuous variation from blue to red; a further band is a gray wedge.

The band of rainbow colors shown in Slide 5.52 is obtained by continuously varying, from left to right, the intensity of blue from 1 to 0 and of red from 0 to full intensity, while green goes from 0 to full and back to 0 across the band. Using the process we have conceptually hinted at for the HSI model, this RGB image is converted into an HSI image. The easy part is the computation of the intensity I; the complex part is the computation of hue and saturation. In Slide 5.46 we are looking at the same pattern in terms of hue: we see that we have lost all sense of color and essentially have a bright image on the left and a dark image on the right in the color band. Looking at the saturation, the variation in the various colors has also disappeared; what remains is the variation of saturation going from left to center to right within the color band. Most of the information is in the intensity band, although some differences in colors have disappeared there.



The advantage of the HSI model is that we can optimize an image by just optimizing the intensity<br />

segment of the HSI presentation. It is not uncommon that one goes from the RGB into the<br />

HSI color model, modifies the intensity b<strong>and</strong> only <strong>and</strong> then does the trans<strong>for</strong>mation back into<br />

RGB. This typically will apply <strong>for</strong> histogram modifications of color images. As stated earlier<br />

this optimization will preserve the color <strong>and</strong> saturation <strong>and</strong> it will only change the contrast as<br />

we perceive it through the intensity of the image. Doing the optimization on each color b<strong>and</strong><br />

separately will give us unpredictable color results.<br />

Slide 5.53 illustrates the approach by means of an underexposed RGB original of a cockatoo. An HSI transformation followed by histogram equalization of just the intensity band produces the result shown next: a much improved and satisfactory image.

A similar ideology is used when one creates color products from multiple input sources: an example<br />

might be a high resolution black <strong>and</strong> white satellite image at one meter pixel size that is being<br />

combined with a lower resolution color image in RGB at 4 meter resolution. A process to combine<br />

those two image sources takes the RGB low resolution image <strong>and</strong> converts it into an HSI-model.<br />

The I component is then removed <strong>and</strong> <strong>for</strong> it one inserts the higher resolution black <strong>and</strong> white<br />

satellite image. The result is trans<strong>for</strong>med back into RGB space. The entire operation requires of<br />

course that all images have the same pixel size <strong>and</strong> are a perfect geometric match.<br />
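The following Python sketch outlines this kind of intensity substitution. The conversion pair rgb_to_hsi / hsi_to_rgb is assumed to exist along the lines of Algorithms 11 and 12 and is passed in as an argument; both images are assumed to be numpy arrays of identical size that are already geometrically co-registered.

def fuse_pan_with_color(rgb_lowres_upsampled, pan_highres, rgb_to_hsi, hsi_to_rgb):
    """Replace the intensity band of a low-resolution color image with a
    high-resolution panchromatic image, as described in the text.

    rgb_lowres_upsampled : (H, W, 3) color image resampled to the pan pixel size
    pan_highres          : (H, W) panchromatic intensity image, values in 0..1
    rgb_to_hsi, hsi_to_rgb : conversion functions in the spirit of Algorithms 11/12,
                             working band-wise on arrays
    """
    h, s, _ = rgb_to_hsi(rgb_lowres_upsampled)   # discard the low-resolution intensity
    return hsi_to_rgb(h, s, pan_highres)          # insert the high-resolution intensity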

5.12 Setting Colors<br />

We have now found that a great number of different color models exist that allow us to define<br />

colors in various ways. Slide 5.60 is a pictorial summary of the various color models. The models<br />

shown are those that are common in the image processing <strong>and</strong> technical arena. The most popular<br />

color model in the very large printing <strong>and</strong> graphic arts industry is not shown here, <strong>and</strong> that is<br />

the CMYK model. Setting a color on a monitor or printer requires that the color be selected on<br />

a model that the output device uses.<br />

Let us assume that we choose the red, green, blue model <strong>for</strong> presentation of an image on a color<br />

monitor. It is customary to change the red, green <strong>and</strong> blue channels in order to obtain a desired<br />

output color. Inversely, an output color could be selected <strong>and</strong> the RGB components from which<br />

that output color is created are being set automatically.<br />

If we were to choose the HSV color model, we would create a sample color by selecting an angle for the hue, shifting the saturation on a slider between 0 and 1, and setting the value, also between 0 and 1, obtaining the resulting color in the process. Inversely, a chosen color could be converted into its HSV components.

Finally, the HSI <strong>and</strong> RGB models can be looked at simultaneously: as we change the HSI values,<br />

the system instantaneously computes the RGB output <strong>and</strong> vice versa. In the process, the<br />

corresponding colors are being shown as illustrated in Slide 5.63.<br />

Optical illusions are possible in comparing colors: Slide 5.64 shows the same color appearing<br />

differently when embedded in various backgrounds.<br />

5.13 Encoding in Color<br />

This is the topic of pseudo-color in image processing where we assign color to gray values in order<br />

to highlight certain phenomena <strong>and</strong> make them more easily visible to a human observer. Slide 5.67<br />

illustrates a medical X-ray image, initially in a monochrome representation. The gray values can<br />

be “sliced” into 8 different gray value regions which then can be encoded in color. The concept of<br />

this segmentation into gray value regions is denoted as intensity slicing, sometimes density slicing<br />

<strong>and</strong> is illustrated in Slide 5.66. The medical image may be represented by Slide 5.66 where the



gray values are encoded as f(x, y). A plane is defined that intersects the gray values at a certain level l_i; one can now assign all pixels with a value greater than l_i to one color. All pixels below the slicing plane can be assigned to another color, and by moving the slicing plane we can see very clearly on a monitor which pixels are higher and which are lower than the slicing plane. This becomes a much more easily interpretable situation than one in which we see the original gray values only.

Another way of assigning color to a black and white image is illustrated in Slide 5.68. The idea is to take an input value f(x, y) and apply to it three different transformations, one into a red image, one into a green image and one into a blue image, so that the three resulting images are assigned to the red, green and blue guns of a monitor. A variety of transformations is available to obtain a colorful output from a black and white image. Of course, the scheme of Slide 5.68 is nothing but a more general version of the specialized slicing plane applied in the previous Slide 5.66.
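A minimal sketch of this idea in Python (numpy assumed; the three transfer functions below are purely illustrative and are not the ones shown on the slides):

import numpy as np

def pseudo_color(gray):
    """Map a gray value image (values 0..1) to RGB by applying three
    different transfer functions, one per color gun."""
    r = np.clip(2.0 * gray, 0.0, 1.0)          # red rises quickly
    g = np.clip(2.0 * gray - 0.5, 0.0, 1.0)    # green follows later
    b = 1.0 - gray                             # blue decreases with the gray value
    return np.stack([r, g, b], axis=-1)

def intensity_slice(gray, level, low_color=(0.0, 0.0, 1.0), high_color=(1.0, 0.0, 0.0)):
    """Two-color intensity slicing: pixels above `level` get one color,
    pixels at or below get another."""
    mask = gray > level
    out = np.empty(gray.shape + (3,))
    out[mask] = high_color
    out[~mask] = low_color
    return out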

Slide 5.69 illustrates the general transformation from gray level to color with the example of an X-ray image obtained from a luggage checking system at an airport. We can see in the example how various color transformations enhance luggage with and without explosives, such that even a casual observer would notice the explosive in the luggage very quickly. We skip the discussion of the details of this very complex color transformation and refer to [GW92, chapter 4.6].

Exam questions:

• What is meant by a false color image and by a pseudo color image? Give a typical application for each.

5.14 Negative Photography<br />

The negative black and white photograph of Slide 5.71 is usually converted to a positive by inverting the gray values, as in Slide 5.72. This is demonstrated in Algorithm ??: we take a transformation that simply inverts the values 0 to 255 into 255 to 0. This trivial approach will not work with color photography. As shown in Slide 5.73, color negatives typically are masked with a protective layer that has a brown-reddish color. If one were to take an RGB scan of that color negative and convert it into a positive by inverting the red, green and blue components directly, one would obtain a fairly unattractive result, as shown in Slide 5.74. One first has to eliminate the protective layer: one goes to the edge of the photograph, finds an area that is not part of the image, and determines the RGB components that represent the protective layer; then one subtracts this R component from all pixel R values, and similarly for the G and B components. As a result we obtain a clean negative, as shown in Slide 5.75. If we now invert that negative we obtain a good color positive, as shown in Slide 5.76. One calls this type of negative a masked negative (compare Algorithm 18). There have been in the past developments of color negative film that is not masked. However, that film is for special purposes only and is not usually available.

Algorithm 18 Masked negative of a color image
1: locate a pixel p whose color is known in all planes {e.g. the unexposed film border}
2: for all planes plane do
3:   diff = grayvalue(p, plane) − known_grayvalue(p, plane) {estimate the "masking layer"}
4:   for all pixels of the picture do
5:     grayvalue(pixel, plane) = grayvalue(pixel, plane) − diff {correct the color}
6:     Invert(pixel) {invert the corrected negative pixel to get the positive}
7:   end for
8: end for
8: end <strong>for</strong>



Exam questions:

• Figure B.62 shows a scanned color film negative. Which steps are necessary to obtain a correct positive image from it by means of digital image processing? Take into account that the optical density of the film is greater than zero even in unexposed areas. Give the mathematical relationship between the pixel values of the negative and of the positive image!

5.15 Printing in Color<br />

As we observe advertisement spaces with their posters, we see colorful photographs <strong>and</strong> drawings<br />

which, when we inspect them from a short distance, are really the sum of four separate screened<br />

images.<br />

We have said earlier that <strong>for</strong> printing the continuous tone images are being converted into half<br />

tones <strong>and</strong> we also specified in a digital environment that each pixel is further decomposed by a<br />

dithering matrix into subpixels.<br />

When printing a color original, one typically uses the four-color approach and bases it on the cyan, magenta, yellow and black pigments, which are the primary colors from which the color images are produced. Each of the four color separates is screened, and the screen has an angle with respect to the horizontal or vertical. In order to avoid a Moiré effect caused by interference of the different screens with one another, the screens are slightly rotated with respect to one another. This type of printing is used in the traditional offset printing industry.

If printing is done directly from a computer onto plotter paper, then the dithering approach is used instead. If we look at a poster that is printed directly with a digital output device and not via an offset press, we can see how the dithering matrix is responsible for each of the dots on the poster. Again each dot is encoded by one of the four basic pigment colors: cyan, magenta, yellow or black.

Exam questions:

• Describe the generation of color in classical offset printing! Which color model is used, and how is the appearance of the Moiré effect prevented?

Answer: Four separate images (one each for the components cyan, magenta, yellow and black) are printed on top of each other (CMYK color model). Each layer is a halftone image, and the layers are slightly rotated against one another in order to prevent the Moiré effect.

5.16 Ratio Processing of Color Images <strong>and</strong> Hyperspectral<br />

Images<br />

We start out from a color image <strong>and</strong> <strong>for</strong> simplicity we make the assumption that we only have two<br />

color b<strong>and</strong>s, R, G, so that we can explain the basic idea of ratio imaging. Suppose a satellite is<br />

imaging the terrain in those two colors. As the sun shines onto the terrain, we will have a stronger<br />

illumination on terrain slopes facing the sun than on slopes that face away from the sun. Yet,<br />

the trees may have the exact same color on both sides of the mountain. When we look now at<br />

the image of the terrain, we will see differences between the slopes facing the sun <strong>and</strong> the slopes<br />

facing away from the sun.



In Slide 5.81 let’s take three particular pixels, one from the front slope, one from the back, <strong>and</strong><br />

perhaps a third pixel from a flat terrain, all showing the same type of object, namely a tree. We<br />

now enter these three pixels into a feature space that is defined by the green <strong>and</strong> red color axes.<br />

Not surprisingly, the three locations <strong>for</strong> the pixels that we have chosen are on a straight line from<br />

the origin. Clearly, the color of all three pixels is the same, but the intensity is different. We are<br />

back again with the ideology of the HSI-model.<br />

We can now create two images from the one color input image. Both of those images are black and white. In one case, we place at each pixel its ratio R/G, which corresponds to the angle that the pixel's vector forms with the abscissa. In the other image we place at each pixel the distance of the pixel from the origin of the feature space. As a result, we obtain one black and white image in Slide 5.82 that is clean of color and shows us essentially the variations in intensity as a function of slope. The other image, in Slide 5.83, shows us the image clean of variations of intensity, as if the terrain were all flat, so that only the variations of color remain. Conceptually, one image is the I component of an HSI transformation and the other one is the H component. Such ratio images have in the past been used to take satellite images and make an estimate of the slope of the terrain, assuming that the terrain cover is fairly uniform. That clearly is the case on glaciers, in the Arctic or Antarctic, or in heavily wooded areas.
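A minimal sketch of ratio imaging for a two-band image in Python (numpy assumed; `red` and `green` are co-registered 2D arrays with positive values):

import numpy as np

def ratio_images(red, green, eps=1e-6):
    """Split a two-band (R, G) image into a color-like and an intensity-like image.

    The ratio R/G (equivalently, the angle of the pixel's vector in the
    feature space) is largely insensitive to illumination, while the vector
    length reflects the illumination and thus the terrain slope."""
    ratio = red / (green + eps)              # hue-like band, illumination-free
    magnitude = np.sqrt(red**2 + green**2)   # intensity-like band
    return ratio, magnitude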


Slide 5.1 through Slide 5.83 (thumbnail overview of the chapter's slides)

Exam questions:

• What is a "ratio image"?

• For what purpose would a user produce a so-called "ratio image"? Please use a sketch in your answer to explain what a ratio image is.




Chapter 6<br />

Image Quality<br />

6.1 Introduction<br />

As image quality we generally denote an overall impression of the crispness, the color, the detail and the composition of an image. Slide 6.2 is an example of an exciting image with a lot of detail, crispness and color. Slide 6.3 adds the excitement of motion and a sentiment of activity and cold. Generally in engineering we do not deal with these concepts, which are more artistic and aesthetic. We deal with exact definitions.

6.2 Definitions<br />

In images we define quality by various components. Slide 6.5 illustrates radiometric concepts of<br />

quality that relate to density <strong>and</strong> dynamic range. Density 0 means that the light can go through<br />

the image unhindered, density 4 means that the image blocks the light. Intensity is the concept<br />

associated with the object. Greater intensity means that more light is coming from the object.<br />

The dynamic range of an image is the greatest density value divided by the least density value<br />

in the image, the darkest value divided by the brightest value. The dynamic range is typically<br />

encoded logarithmically.<br />

Exam questions:

• What is meant by the "dynamic range" of a medium for the reproduction of pictorial information, and how is it related to the quality of the representation? Rank some common media in ascending order of their dynamic range.

6.3 Gray Value <strong>and</strong> Gray Value Resolutions<br />

We have already described in earlier presentations the idea of resolving gray values. Chapter 3 introduced the concept of a gray wedge and how a gray wedge gets scanned to assess the quality of a scanning process. Similarly, we can assess the quality of an image by describing how many different gray values the image can contain. Slide ?? illustrates the resolution of a gray value image.

Note again that in this case we talk about the gray values in an image whereas in the previous<br />

chapter we talked about the quality of the conversion of a given continuous tone image into a<br />

digital rendition in a computer in the process of scanning.<br />


Resolving great radiometric detail means that we can recognize objects in the shadow, while we<br />

also can read writing on a bright roof. Resolution of the gray values in the low density bright<br />

areas does not compromise a resolution in the high density dark areas. Slide ?? is a well resolved<br />

image.<br />

Exam questions:

• What is meant by the gray value resolution of a digital raster image?

Answer: the number of different gray values that can be represented in the image.

6.4 Geometric Resolution<br />

Again, just as in the process of scanning an image, we can judge the image itself independently of its digital or analog format. I refer to an earlier illustration which describes, by means of the US Air Force (USAF) resolution target, how the quality of an image can be characterized by how well it shows small objects on the ground or in the scene. We recall that the USAF target, when photographed, presents to the camera groups of line patterns and, within each group, elements. In the particular case of Slide 6.10, group 6, element 1 is the finest element still resolved. We know from an accompanying table that that particular element in group 6 represents a resolution of 64 line pairs per mm. We can see in the lower portion of the slide that element 6 in group 4 represents 28 line pairs per mm.

We have now in Slide 6.11 a set of numbers typical of the geometric resolution of a digital image. One resolution measure we typically deal with is dots per inch, for example when something is printed. A high resolution is 3000 dots per inch, a low resolution is 100 dots per inch. Note

that at 3000 dots per inch, each point is about 8 micrometers, recall that 1000 dots per inch is 25<br />

micrometers per pixel. Which leads us to the second measure of geometric resolution: the size of<br />

a pixel. When we go to a computer screen, we have a third measure <strong>and</strong> we say the screen can<br />

resolve 1024 by 1024 pixels, irrespective of the size of the screen.<br />

Recall the observations about the eye and the fovea. We said that we had about 150 000 cone elements per mm² on the fovea. So when we focus our attention on the computer monitor, those

1000 by 1000 pixels would represent the resolution of about 3 by 3 mm on the retina. We may<br />

really not have any use <strong>for</strong> a screen with more resolution, bec<strong>aus</strong>e we wouldn’t be able to digest<br />

the in<strong>for</strong>mation on the screen in one glance, bec<strong>aus</strong>e it would overwhelm our retina.<br />

A next measure of resolution is the number of line pairs per mm, mentioned earlier. 25 line pairs per mm is a good average resolution for photography on a paper print, and 50 line pairs per mm is a very good resolution on film. The best resolutions can be obtained with spy photography, which uses very slow film and needs long exposure times, but is capable of resolving great detail. In that case we get in excess of 75 line pairs per mm.

It might be of interest to define the geometric resolution of the unaided eye: that is 3 to 8 pixels per mm at a distance of 25 cm. Again, when a person sits in front of a monitor and starts seeing the images as a continuous pattern, not recognizing individual pixels, then at 3 pixels per mm the screen could have a dimension of 300 by 300 mm. For an eagle-eyed person at 8 pixels per mm, the surface that can still be resolved in this way would be about a 12 by 12 cm square.

It is of interest to relate these resolutions to one another; this is shown in Slide 6.15. Film may have n line pairs per mm. This represents 2.8 × n pixels per mm (see below). If we had film with 25 line pairs per mm, then we would have to represent this image at 14 micrometers per pixel under this relationship. On a monitor with a side length of 250 mm and 1024 pixels, one pixel has a dimension of 0.25 mm.



We can again confirm that if each pixel on a monitor occupies 250 micrometers (equal to 0.25 mm), we have 4 pixels per mm; people with normal vision typically perceive this as a continuous-tone image. The actual range is 125 to 300 micrometers per pixel.

The Kell factor, proposed during World War II in the context of television, suggests that resolving a single black-and-white line pair with 2 pixels is insufficient, because statistically we cannot be certain that those pixels fall directly on each dark line and on each bright line; they may fall halfway in between, and if they do, the line pair will not be resolved. Kell therefore proposed that the proper number of pixels per mm needed to resolve each line pair under all circumstances is 2√2 times the number of line pairs per mm.
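These conversions are simple enough to verify numerically. The following Python snippet is my own illustration, not part of the lecture material; it reproduces the numbers quoted above, assuming the Kell factor 2√2 ≈ 2.8 pixels per line pair.

import math

def dpi_to_micrometers(dpi):
    """Pixel size in micrometers for a given dots-per-inch value."""
    return 25400.0 / dpi                      # 1 inch = 25.4 mm = 25 400 micrometers

def pixel_size_for_line_pairs(lp_per_mm):
    """Maximum pixel size (micrometers) that still resolves lp_per_mm line pairs,
    assuming 2*sqrt(2) ~ 2.8 pixels per line pair (Kell factor)."""
    pixels_per_mm = 2.0 * math.sqrt(2.0) * lp_per_mm
    return 1000.0 / pixels_per_mm

print(dpi_to_micrometers(3000))               # ~8.5, i.e. "about 8 micrometers"
print(dpi_to_micrometers(1000))               # ~25 micrometers per pixel
print(pixel_size_for_line_pairs(25))          # ~14 micrometers, as stated above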

Exam questions:

• A very high-resolution infrared film is advertised with a geometric resolution of 70 line pairs per millimeter. At what maximum pixel size would this film have to be scanned in order to avoid any loss of information compared with the film original?

• Which measure is used to describe the geometric resolution of an image, and with which procedure is this resolution tested and quantified? Please include a sketch.

6.5 Geometric Accuracy<br />

An image always represents the object with a certain geometric accuracy. We have already taken a look at the basic idea when we talked in an earlier chapter about the conversion of a given analog picture into digital form. The geometric accuracy of an image is described by the sensor model, a concept mentioned in the chapter on sensors. There are deviations between the geometric locations of object points as a perfect camera would render them and the locations produced by our real camera. Those discrepancies can be described in a calibration procedure. Calibrating imaging systems is a big issue and has earned many diploma engineers and doctoral students their degrees in vision. The basic idea is illustrated in Slide 6.17.

Exam questions:

• What is meant by the geometric accuracy of a digital raster image?

6.6 Histograms as a Result of Point Processing or Pixel<br />

Processing<br />

The basic element of analyzing the quality of any image is a look at its histogram. Slide 6.19 illustrates a rather dark color input image for which we want to build the histogram: we find many pixels in the darker range and fewer in the brighter range. We can now change this image by redistributing the histogram in a process called histogram equalization. We see, however, in Slide 6.20 that we have a histogram for each of the color component images, while we only show a composite of the colors denoted as luminosity. The summary of this manipulation is shown in Slide 6.22.

A very common improvement of an image’s quality is a change of the assignment of gray values to the pixels of the image. This change is based on the histogram.



Algorithm 19 Histogram equalization

1: For an N × M image of G gray levels (often 256), create an array H of length G initialized with 0 values.

2: Form the image histogram: scan every pixel and increment the relevant member of H; if pixel p has intensity g_p, perform H[g_p] = H[g_p] + 1.

3: Form the cumulative image histogram H_c.

4: Set

H_c[0] = H[0]
H_c[p] = H_c[p − 1] + H[p]   for p = 1, 2, ..., G − 1
T[p] = round( (G − 1) / (N · M) · H_c[p] )

5: Rescan the image and write an output image with gray levels g_q, setting g_q = T[g_p] for every input pixel of gray level g_p.
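A compact implementation of Algorithm 19 might look as follows. This is a sketch in Python/NumPy for illustration; the function name and the use of NumPy are my own choices, not part of the lecture material.

import numpy as np

def histogram_equalization(image, G=256):
    """Histogram equalization of a gray value image following Algorithm 19.
    image: 2D array of integer gray levels in [0, G-1]."""
    N, M = image.shape
    # Step 2: image histogram H
    H = np.bincount(image.ravel(), minlength=G)
    # Steps 3-4: cumulative histogram H_c and gray level mapping T
    Hc = np.cumsum(H)
    T = np.round((G - 1) / (N * M) * Hc).astype(image.dtype)
    # Step 5: rescan the image through the look-up table T
    return T[image]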

Definition 17 Histogram stretching

Stretching or spreading of a histogram means mapping the gray value of each pixel of an image, or of part of an image, through a piecewise continuous function T(r).

Normally the gradation curve T(r) is monotonically increasing and spreads a small range of gray values of the input image over the entire range of available values, so that the resulting image looks as if it had a lot more contrast.



Let us assume that the gray values of an 8-bit input image are occupied very unevenly: gray value 67 may have 10 000 pixels, gray values 68 and 69 may have none, but gray value 70 may again have 7 000 pixels. We can change this image by reallocating the input gray values to new output values depending on their frequency as seen in the histogram.

We aim <strong>for</strong> a histogram that is as uni<strong>for</strong>m as possible. Slide 6.23 shows a detail of the previous<br />

slide in one case with the input histogram <strong>and</strong> in a second case with the equalized histogram: we<br />

have attempted to distribute the gray values belonging to the input pixels such that the histogram<br />

is as uniform as possible. Slide 6.24 shows how we can change the gray values of an input image B into an output image C. Geometrically we describe the operation by a 2D diagram with the abscissa for the input gray values and the ordinate for the output gray values. The relationship between input and output is shown by the curve in the slide, which represents a look-up table.

Slide 6.25 illustrates again how an input image can be changed <strong>and</strong> how a certain area of the<br />

input image can be highlighted in the output image. We simply set all input pixels below a<br />

certain threshold A <strong>and</strong> above a certain threshold B to zero, <strong>and</strong> then set the intermediate range<br />

<strong>and</strong> spread that range to a specific value in the output image. Another method of highlighting is<br />

to take an input image <strong>and</strong> convert it one-on-one to an output image with the exception of a gray<br />

value range from a lower gray value A to an upper gray value B which is set into one output gray<br />

value, thereby accentuating this part of the input image.<br />

Another analysis is shown in Slide 6.26, where we represent the 8 bits of a gray value image as 8 separate images; in each bit plane we see whether that bit is set in the byte of each pixel. Bit plane 7 is the most significant, bit plane 0 the least significant. We obtain information about the contents of an image as shown in Slide 6.27, where we see the 8 levels of an image and note that bit plane 7 looks like a thresholded image while bit plane 0 carries basically no information. There is very little information in the lower three bits of that image; in reality we may therefore not be dealing with an 8-bit image but effectively with a 5-bit image.

Histograms let us see where the pixels are aggregated. Pixels clustered at low digital numbers indicate a dark image; pixels clustered at high digital numbers indicate a bright image. A narrow histogram with a single peak indicates a low-contrast image, because it does not contain many different gray values. However, if all gray values of an image are occupied by pixels, and if those are equally distributed, we obtain a high-contrast, high-quality image.

How do we change the histogram of an image <strong>and</strong> spread it or equalize it? We think of the image<br />

gray values in the input on the abscissa of a 2D-diagramm <strong>and</strong> translate them to output gray<br />

values on the ordinate. We use a curve that relates the input to the output pixels. The curve is<br />

denoted as gradation curve or t(r). Let’s take an example of an image with very low contrast as<br />

signified by a histogram that has only values in the range round 64 <strong>and</strong> 10 gray values to the left<br />

<strong>and</strong> to the right.. We now spread this histogram by a curve t(r) that takes the input values where<br />

we have many <strong>and</strong> spreads them over many values in the output image. As a result we now have<br />

pixels values spread over the entire range of available values, so that the image looks as if it had<br />

a lot more contrast. We may not really be able to change the basic shape of the histogram, but<br />

we can certainly stretch it as shown in slide ??. Equlisation is illustrated in slide ??.<br />

We may want to define a desired histogram <strong>and</strong> try to approach this histogram given an input<br />

image which may have a totally different histogram. How does this work? Slide 6.36 explains.<br />

Let’s take a thermal image of an indoor scene. We show the histogram of this input image <strong>and</strong><br />

<strong>for</strong> compansion we also illustrate the result of equalization. We would like, however, to have a<br />

histogram as shown in the center of the histogram display of the slide. We change the input<br />

histogram to approach the designed histogram as best as we possbile obtaining the third image.<br />

The resulting image permits one to see chairs in the room.<br />

Slide 6.36 summarizes that enhancement is the improvement of an image by processing each pixel separately from the other pixels. This may not only concern contrast but could address noise as well. A noisy input image can become worse if we improve the histogram, since we may amplify the noise. If we perform a type of histogram equalization that changes locally, we might obtain an improvement of the image structure and increased ease of interpretability.



An example is shown in slide 6.38. We have a very noisy image <strong>and</strong> then embedded in the image<br />

are 5 targets of interest. With a global histogram process we may not be able to resolve the detail<br />

within those 5 targets. We might enhance the noise that already exists in the image <strong>and</strong> still not<br />

see what is inside the targets. However, when we go through the image <strong>and</strong> we look at individual<br />

segments via a small window <strong>and</strong> we improve the image locally, moving the window from place<br />

to place with moving new parameters at each location, we might obtain the result as shown in<br />

the next component of the Slid. We find that detail within each target consists of a point <strong>and</strong> a<br />

square around that point.<br />

Algorithm 20 Local image improvement

g(x, y) ... result image
f(x, y) ... input image

g(x, y) = A(x, y) · {f(x, y) − m(x, y)} + m(x, y)

where

A(x, y) = k · M / σ(x, y)

with

k ... constant
M ... global mean gray value
m(x, y) ... local mean gray value in the window
σ(x, y) ... standard deviation of the gray values in the window

We have taken an input image f(x, y) <strong>and</strong> created a resulting image g(x, y), by a <strong>for</strong>mula shown<br />

in slide 6.39. There is a coefficient A(x, y) involved <strong>and</strong> a mean of m(x, y). So in a window we<br />

computed mean gray value m(x, y) as an average gray value, we subtract it from each gray value<br />

in the image f, we multiply the difference by a multiplication factor A(x, y) <strong>and</strong> then add back<br />

the mean m(x, y).<br />

What is this A(x, y)? It is itself a function of (x, y): in each window we compute the mean m(x, y) and the standard deviation σ(x, y) of the gray values. We also compute a global average M, separate from the average of each small window, and we have some constant k. Improvements of images according to this formula, and similar approaches, are heavily used in medical imaging and many other areas where images are presented to the eye for interactive analysis. We are processing images here before we analyze them; therefore, we call this preprocessing. A particular example of preprocessing is shown in the medical image of Slide 6.40, illustrating how a bland image with no apparent detail reveals its detail after some local processing.
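A direct transcription of Algorithm 20 is sketched below. The window size, the constant k, and the use of SciPy's uniform_filter for the local statistics are assumptions of this illustration, not prescriptions from the lecture.

import numpy as np
from scipy.ndimage import uniform_filter

def local_enhancement(f, window=15, k=0.8):
    """g(x,y) = A(x,y) * (f(x,y) - m(x,y)) + m(x,y), with A = k*M/sigma(x,y);
    m and sigma are computed in a sliding window, M is the global mean."""
    f = f.astype(np.float64)
    m = uniform_filter(f, size=window)                 # local mean m(x,y)
    m2 = uniform_filter(f * f, size=window)            # local mean of f^2
    sigma = np.sqrt(np.maximum(m2 - m * m, 1e-6))      # local standard deviation
    M = f.mean()                                       # global mean gray value
    A = k * M / sigma                                  # amplification factor A(x,y)
    g = A * (f - m) + m
    return np.clip(g, 0, 255).astype(np.uint8)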

Another idea is the creation of difference images, <strong>for</strong> example an X-ray image of a brain taken<br />

be<strong>for</strong>e <strong>and</strong> after some injection is given as a contrast agent. We then have an image of the brain<br />

be<strong>for</strong>e <strong>and</strong> after the contrast agent has entered into the blood stream. The two images can then<br />

be subtracted <strong>and</strong> will highlight the vessels that contain the contrast material.<br />

How else can we improve images? We can take several noisy images <strong>and</strong> average them. For<br />

example we can take a microscopic image of some cells, <strong>and</strong> a single image may be very noisy, but<br />

by repeating the image <strong>and</strong> computing the average of the gray values of each pixels we eliminate<br />

the noise <strong>and</strong> obtain a better signal. Slide 6.44 shows the effect of averaging 128 images.<br />

Exam questions:

• Given the gray value image in Figure B.59, determine the histogram of this image. With the help of the histogram, a threshold is to be found that is suitable for segmenting the image into background (small value, dark) and foreground (large value, bright). Give the threshold as well as the result of the segmentation in the form of a binary image (with 0 for the background and 1 for the foreground).



Figure 6.1: Histogram of a gray wedge (a constant count of 50 for every gray value from 0 to 255)

• Figure B.33 shows a gray wedge in which all gray values from 0 to 255 occur in ascending order; the width is 50 pixels. Draw the histogram of this image and pay attention to the correct numerical values. The black border in Figure B.33 only serves to clarify the outline and is not part of the image itself.

Answer: see Figure 6.1

• Figure B.74(a) shows the castle in Budmerice (Slovakia), where a student seminar¹ and the Spring Conference on Computer Graphics take place every year. Figure B.74(b) was generated from it by an automatic process, whereby some details (e.g. the clouds in the sky) were clearly enhanced. Name an operation that could have been applied here and comment on how it works.

• Sketch the histogram of a

1. dark,
2. bright,
3. low-contrast,
4. high-contrast

monochrome digital raster image.

Answer: see Figure 6.2; note that the area under the curve is always the same.

¹ Interested students in the computer graphics specialization track have the opportunity to attend this seminar free of charge and to present their seminar/project or diploma thesis there.



(a) dark
(b) bright
(c) low contrast
(d) high contrast

Figure 6.2: Histograms


(Slides 6.1 through 6.45 of this chapter are reproduced here as thumbnails.)


Chapter 7<br />

Filtering<br />

7.1 Images in the Spatial Domain<br />

We revisit the definition of an image space with its cartesian coordinates x <strong>and</strong> y to denote the<br />

columns <strong>and</strong> rows of pixels. We define a pixel at location (x, y) <strong>and</strong> denote its gray value with<br />

f(x, y). Filtering changes the gray value f of an input image into an output gray value g(x, y) in<br />

accordance with Slide 7.3. The trans<strong>for</strong>mation<br />

g(x, y) = T [f(x, y)]<br />

is represented by an operator T which acts on the pixel at location (x, y) <strong>and</strong> on its neighbourhood.<br />

The neighbourhood is defined by a mask which may also be denoted as template, window or filter<br />

mask. We can there<strong>for</strong>e state in general terms that: a filter is an operation that produces from<br />

an input image <strong>and</strong> its pixels f(x, y) an output image with pixels g(x, y) by a filter operator T .<br />

This operator uses in the trans<strong>for</strong>mation the input pixel <strong>and</strong> its neighbourhood to produce a value<br />

in the output pixel. We will see later that filtering is a concept encompassing many different<br />

types of operations to which the basic definition applies common. It may be of interest to note<br />

that some of the operations we have previously discussed can be classified as filter operations,<br />

namely the trans<strong>for</strong>mation of the image where an operation addresses a neighbourhood of size<br />

1 × 1. Those trans<strong>for</strong>mations produce from an input an output pixel via the transfer function<br />

T that one calls “point operations” or trans<strong>for</strong>mations of individual pixels. We have the special<br />

case of contrast enhancement in Slide 7.4, <strong>and</strong> of “thresholding”. Similarely, these operations on<br />

single pixels included the inversion of a negative to a positive as shown in Slide 7.5. The same<br />

type of operation is shown in Slide 7.6. The astronomic imaging sensors at times produce a very<br />

high density range that challenges the capabilities of film <strong>and</strong> certainly of monitors. On an 8-bit<br />

image we may not really appreciate the detail that a star may provide through a high resolution<br />

telescope. To do better justice to a high density range image a single pixel operation is applied<br />

that non-linearly trans<strong>for</strong>ms the input gray values into the output gray values. Again, in the<br />

narrow sense of our definition of filtering, this is a “filter operation”. However, we have previously<br />

discussed the same trans<strong>for</strong>mation under the name of contrast stretching. In this particular case,<br />

the contrast stretch is logarithmic.<br />

Exam questions:

• The operations “threshold” and “median”, applied to digital raster images, were discussed in the lecture. What is the relationship between these two operations in the context of filtering?




7.2 Low-Pass Filtering<br />

Let us define a mask of 3 by 3 pixels in Slide 7.8. We enter into that mask values w1, w2, ..., w9, which we call “weights”. We now place the 3 by 3 pixel mask on top of the input image, whose gray values are denoted zi. Let us assume that we center the 3 by 3 mask over the pixel z5, so that w5 lies on top of z5. We can now compute a new gray value g5 as the sum of the products wi · zi, in accordance with the slide. This describes an operation on an input image without specifying the values in the filter mask. We still need to assign such values to the 3 by 3 mask: a low-pass filter is filled with the set of values shown in the slide; in this example we assign the value 1/9 to each weight, so that the sum of all values is 1. Similarly, a larger mask of 5 × 5 values may be filled with 1/25, and a 7 × 7 filter mask with 1/49. These three examples are typical low-pass filters.

Slide 7.10 illustrates the effect of low-pass filter masks filled with weights of 1/k, with k being the number of pixels in the filter mask. Slide 7.10 shows the image of a light bulb and how the blur produced by the low-pass filter increases as the filter mask grows from 25 up to 625 values, i.e. from a window with a side length of 5 pixels up to one with a side length of 25 pixels.

We will next consider the analogy between “filtering” and “sampling”. Slide 7.11 shows an image and the gray value profile along a horizontal line (a row of pixels). The continuous gray value trace needs to be “sampled” into discrete pixels. Slide 7.12 shows the basic concept of the transition from the continuous gray value trace to a set of pixels. If we reconstruct the original trace from the discrete pixels, we obtain a new version of the continuous gray value trace. It turns out that this reconstruction is nothing else but a filtered version of the original. Sampling and signal reconstruction are thus an analogy to filtering, and sampling theory is related to filter theory.

Slide 7.13 illustrates one particular <strong>and</strong> important low-pass-filter: the sinc-filter. A sinc function<br />

is<br />

sinc(f) = sin(πf)<br />

πf<br />

<strong>and</strong> represents the Fourier-trans<strong>for</strong>m of a rectangular “pulse” in the Fourier-space (see below).<br />

Slide 7.13 illustrates how a filtered value of the input function is obtained from the large filter mask<br />

representing the sinc-function. By shifting the sinc-function along the abszissa <strong>and</strong> computing a<br />

filter value at each location, we obtain a smoothed version of the input signal. This is analogous<br />

to sampling the input signal <strong>and</strong> reconstructing it from the samples.<br />

Next we consider the median filter. This is a popular <strong>and</strong> frequently used operator. It inspects<br />

each gray value under the filter window <strong>and</strong> picks that gray value under that window which has<br />

half of the pixels with larger <strong>and</strong> the other half with smaller gray values. Essentially the gray<br />

values under the filter window are being sorted <strong>and</strong> the median value is chosen. Where would<br />

this be superior to an arithmetic mean? Clearly, the median filter does suppress high frequency<br />

in<strong>for</strong>mation or rapid changes in the image. Thus it suppresses salt <strong>and</strong> pepper noise. Salt <strong>and</strong><br />

pepper noise results from irregularities where individual pixels are corrupted. They might be<br />

either totally black or totally white.<br />

By applying a median filter one will throw out these individual pixels <strong>and</strong> replace them by one midrange<br />

pixel from the neighbourhood. The effect can sometimes be amazing. Slide 7.16 illustrates<br />

with a highly corrupted image of a female person, <strong>and</strong> a corruption of the image with about 20%<br />

of the pixels. Computing the arithmetic mean will produce a smoother image but will not do away<br />

with the effect of noise. Clusters of corrupted pixels will result in persistent corruptions of the<br />

image. However, the median filter will work a miracle. An image, almost as good as the input<br />

image, without many corruptions, will result.<br />

A median filter also has a limitation: If we have fine details in an image, say individual narrow<br />

linear features (an example would be telegraph wires in an aerial photo) then those pixels marking<br />

such a narrow object will typically get suppressed <strong>and</strong> replaced by the median value in their<br />

environment. As a result the fine linear detail would no longer show in the image.
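A median filter can be written in a few lines; the sketch below is an illustration that assumes SciPy is available and uses a 3 × 3 window.

import numpy as np
from scipy.ndimage import median_filter

def remove_salt_and_pepper(image, window=3):
    """Replace every pixel by the median of the gray values under the window.
    Isolated black/white outliers disappear; note that thin lines of width
    one pixel are suppressed as well (the limitation described above)."""
    return median_filter(image, size=window)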


Figure 7.1: Application of a median filter (worked example for the exam question below; the numeric pixel grids are not reproduced here)

Exam questions:

• Given Figure B.57 with the indicated line-shaped white disturbances, which method of correction do you propose in order to remove these disturbances? Please describe the method and explain why it will remove the disturbances.

• What is a median filter, what are its properties, and in which situations is it used?

• Apply a 3 × 3 median filter to the pixels inside the bold-framed region of the gray value image shown in Figure B.14. You can enter the result directly in Figure B.14.

Answer: see Figure 7.1

• Sketch the shape of the filter kernel of a Gaussian low-pass filter. What must be considered when choosing the filter parameters and the size of the filter kernel?

• Enter filter coefficients into the empty filter masks in Figure B.30 such that

1. Figure B.30(a) becomes a low-pass filter that leaves the DC component of the image signal unchanged,
2. Figure B.30(b) becomes a high-pass filter that completely suppresses the DC component of the image signal.

Answer: see Figure 7.3

(a) low-pass
(b) high-pass

Figure 7.2: Low-pass and high-pass filters



7.3 The Frequency Domain<br />

We have so far looked at images represented as gray values in an (x, y) Cartesian coordinate system. We call this the spatial-domain representation. There is another representation of images using sine and cosine functions, called the spectral representation. The transformation of the spatial-domain image f(x, y) into the spectral-domain representation F(u, v) is done via a Fourier transform:

F{f(x, y)} = F(u, v) = ∫∫ f(x, y) e^(−2jπ(ux+vy)) dx dy

The spectral representation uses the independent variables u and v, which are the frequencies in the two coordinate directions. The spectral representation can be converted back into a spatial representation by the inverse transform:

f(x, y) = ∫∫ F(u, v) e^(+2jπ(ux+vy)) du dv

In the discrete world of pixels, the double integral ∫∫ is replaced by a double summation ∑∑.

A filter operation can be seen as a convolution (Faltung), in accordance with Slide 7.18. The convolution is defined below and illustrated graphically in the slides. For simplicity, the two functions f(x) and g(x) are one-dimensional. They are convolved using an operation denoted by the symbol ∗:

f(x) ∗ g(x) = ∫ from t = −∞ to +∞ of f(t) g(x − t) dt

We define the function f(t) as a simple rectangle on the interval 0 ≤ t ≤ 1. The second function g(t) is also defined as a box on the interval 0 ≤ t ≤ 1. We mirror the function g to obtain g(−t), shift it to g(x − t), and form the product f(t) · g(x − t), shown in Slide 7.24 as the shaded area; this is illustrated at x = x1 and at x = x2. The convolution is the integral of this shaded area as we move g(x − t) into the various positions along the x axis. Where there is no overlap between the two functions, the product f · g is empty. As a result, the integral produces values that increase monotonically from 0 to c and then decrease from c back to 0 as the coordinate x goes from 0 through 1 to 2. This produces a “smoothed” version of the input function f.

It is now of interest to appreciate that a convolution in the spatial domain is a multiplication in the spectral domain, as was previously explained in Slide 7.18. We can thus execute a filter operation by transforming the input image f and the filter function h into the spectral domain, resulting in F and H. We multiply the two spectral representations and obtain G as the spectral representation of the output image. After an inverse Fourier transform of G we have the viewable output image g.

This would be the appropriate point in this course to interrupt the discussion of filtering and insert a “tour d’horizon” of the Fourier transform. We will not do this in this class, and reserve that discussion for a later course as part of the specialization track in “image processing”. The Fourier transform of an image is, however, only one of several transforms used in image processing; there are others, such as the Hadamard transform, the cosine transform, the Walsh transform and similar ones. Of interest now is the question of whether to filter in the spatial domain, as a convolution, or in the spectral domain, as a multiplication. At this point we only state that with large filter masks, at sizes greater than about 15 × 15 pixels, it may be more efficient to use the spectral representation. We do have the cost of three Fourier transforms (f → F, h → H, and the inverse transform G → g), but the actual convolution is replaced by the simple multiplication F · H.
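The equivalence can be demonstrated in a few lines of NumPy. This sketch deliberately ignores zero-padding and boundary effects (the product of the transforms corresponds to a circular convolution), so it is an illustration rather than a production implementation.

import numpy as np

def filter_in_frequency_domain(f, h):
    """Filter image f with spatial mask h by multiplying their Fourier transforms.
    Three transforms are needed: f -> F, h -> H and the inverse transform G -> g."""
    rows, cols = f.shape
    F = np.fft.fft2(f, s=(rows, cols))
    H = np.fft.fft2(h, s=(rows, cols))       # mask zero-padded to the image size
    G = F * H                                # multiplication instead of convolution
    g = np.real(np.fft.ifft2(G))
    return g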

Slide 7.25 now introduces certain filter windows <strong>and</strong> their representation both in the spatial <strong>and</strong><br />

spectral domains. presents a one-dimensional filter functions. We are, there<strong>for</strong>e, looking at a row


7.4. HIGH PASS-FILTER - SHARPENING FILTERS 137<br />

of pixels in the spatial domain or a row of pixels in the spectral domain through the center of the<br />

2D function. The 2D functions themselves are rotationally symmetric.<br />

A typical low-pass filter in the spatial domain will have a G<strong>aus</strong>sian shape. Its representation in<br />

the spatial domain is similar to its representation in the spectral domain. In the spectral domain it<br />

is evident that the filter rapidly approaches a zero-value, there<strong>for</strong>e suppressing higher frequencies.<br />

In the spectral domain a high-pass filter has a large value as frequencies increase <strong>and</strong> is zero at<br />

low frequencies. Such a high pass-filter looks like the so called mexican hat, if presented in the<br />

spatial domain. A b<strong>and</strong> pass-filter in two dimensions is a ring like a “donut shape”, <strong>and</strong> in the<br />

one dimensional case it is a G<strong>aus</strong>sian curve that is displaced with the respect to the origin. In<br />

the spatial domain the b<strong>and</strong>-pass filter-shape is similar to a “mexican hat”. However, the values<br />

in the high pass-filter are negative outside the central area in the spectral domain, whereas is in<br />

the b<strong>and</strong> pass-filter the shape goes first negative, then positive again .<br />

Exam questions:

• Describe, with the help of a sketch, the “appearance” of the following filter types in the frequency domain:

1. low-pass filter
2. high-pass filter
3. band-pass filter

7.4 High Pass-Filter - Sharpening Filters<br />

We now are ready to visit the effect of a high-pass filter. In the spatial domain, the shape of the<br />

high pass-filter was presented in Slide 7.26. In actual numerical values such a filter is shown in<br />

Slide 7.28. The filter window is normalized such that the sum of all values equals zero. Note that<br />

we have a high positive value in the center <strong>and</strong> negative values at the edge of the window. The<br />

pixel at the center of the window in the image will be emphasized <strong>and</strong> the effect of neighbouring<br />

pixels reduced. There<strong>for</strong>e small details will be accentuated. Background will be suppressed. It is<br />

as if we only had the high-frequency detail left <strong>and</strong> the low frequency variations disappear. The<br />

reason is obvious: in areas where the pixel values do not change very much, the output gray values become 0, because there are no differences among the gray values. The input pixels are replaced by the value 0 because we subtract from each gray value the average value of the surrounding pixels.

This high-pass filter can be used to emphasize (highlight) the geometric detail. But if we do<br />

not want to suppress the background, as we have seen in the pure high pass-filter, we need to<br />

re-introduce it. This leads to a particular type of filter that is popular in the graphic arts: the<br />

unsharp masking or USM. The high-pass filtered image really is the difference between the original image and a low-pass version of the image, so that only the high-frequency content survives. In USM we would like to have this high-pass version of the image augmented with the original image. We obtain this by taking the high-pass filtered version of the image and adding to it the original image multiplied by a factor A − 1, where A ≥ 1. If A = 1 we have a standard high-pass filter. As we increase A we add more and more of the original image back. The effect

is shown in Slide 7.30. In that slide we have a 3 by 3 filter window <strong>and</strong> the factor A is shown<br />

variably as being 1.1, 1.15, <strong>and</strong> 1.2. As A increases, the original image gets more <strong>and</strong> more added<br />

back in, to a point where we get overwhelmed by the amount of very noisy detail.<br />

Exam questions:



• Given a filter mask according to Figure ??. What kind of filter is this?

• Given an image according to Figure ??. What are the resulting pixels in the result image at the three marked locations after applying the filter mask from Figure ???

• One of the most popular filters is called “Unsharp Masking” (USM). How does it work? Please give a simple explanation in terms of formulas.

• Figure B.61 shows a digital raster image that, due to a superimposed disturbance, is brighter in the middle than at the border. Describe a procedure that removes this disturbance.

• The photo shown in Figure B.66 has low contrast and therefore appears somewhat “flat”.

1. Describe a procedure that improves the contrast of the image.
2. What other possibilities are there to improve the image quality as perceived by a human?

Is the information content of the image also increased by these methods? Justify your answer.

• Enter filter coefficients into the empty filter masks in Figure B.30 such that

1. Figure B.30(a) becomes a low-pass filter that leaves the DC component of the image signal unchanged,
2. Figure B.30(b) becomes a high-pass filter that completely suppresses the DC component of the image signal.

Answer: see Figure 7.3

(a) low-pass
(b) high-pass

Figure 7.3: Low-pass and high-pass filters

7.5 The Derivative Filter<br />

A very basic image processing function is the creation of a so called edge-image. Recall that we<br />

had one definition of “edge” that related to the binary image early on in this class (Chapter 1).<br />

That definition of an edge will now be revisited <strong>and</strong> we will learn about a second definition of an<br />

edge.<br />

Let us first define what a gradient image is. We apply a gradient operator to the image function<br />

f(x, y). The gradient of f(x, y) is shown in Slide 7.32, denoted as ∇ (Nabla). A gradient is thus a



multidimensional entity; in a two dimensional image we obtain a two dimensional gradient vector<br />

with a length <strong>and</strong> a direction. We now have to associate with each location x, y in the image these<br />

two entities. The length of the gradient vector is of course the Pythagorean sum of its elements,<br />

namely of the derivatives of the gray-value function with respect to x <strong>and</strong> y. We typically use<br />

the magnitude of the gradient vector <strong>and</strong> ignore it’s direction. However, this is not true in every<br />

instance.<br />

We are not dealing with continuous tone images but discrete renditions in the <strong>for</strong>m of pixels <strong>and</strong><br />

discrete matrices of numbers. We can approximate the computation of a gradient function by<br />

means of a three by three matrix as explains. The 3 × 3 matrix has nine values z 1 , z 2 , . . . , z 9 . We<br />

approximate the derivative my means of a first difference, namely z 5 − z 8 , z 5 − z 6 , <strong>and</strong> so <strong>for</strong>th.<br />

The magnitude of the gradient function is being approximated by the expression shown in Slide<br />

7.34. We define a way of computing the gradient in a discrete, sampled digital image avoiding<br />

squares <strong>and</strong> square roots. We can even further simplify the approximation as shown in Slide 7.34<br />

namely as the sum of the absolute values of the differences between pixel gray values. We can<br />

also use gradient approximations by means of cross-differences, thus not by horizontal <strong>and</strong> vertical<br />

differences along rows <strong>and</strong> columns of pixels in the window.<br />

Some of these approximations are associated with their inventors. The gradient operator

∇f ≈ |z5 − z9| + |z6 − z8|

is named after Roberts. Prewitt’s approximation is a little more complicated:

∇f ≈ |(z7 + z8 + z9) − (z1 + z2 + z3)| + |(z3 + z6 + z9) − (z1 + z4 + z7)|

Slide 7.36 shows how the computation is implemented. Two filter functions are applied sequentially to the input image, and the two resulting output images are added up. The Roberts operator works with two windows of dimension 2 × 2; Prewitt uses two windows of dimension 3 × 3, and a third gradient approximation, by Sobel, also uses two 3 × 3 windows.

Lets take a look at an example: Slide 7.37 shows a military fighter plane <strong>and</strong> the gradient image<br />

derived from it using the Prewitt-operator. These gradient images can then be post-processed<br />

by e.g. removing the background details, simply by reassigning gray values above a certain level<br />

to zero or one, or assign gradients of a certain value to a particular colour such as white or black.<br />

This will produce from an original image the contours of its objects, as seen Slide 7.37.<br />

We call the resulting image, after a gradient operator has been applied, an edge-image. However,<br />

in reality we don’t have any edges yet. We still have a gray-tone image that visually appears like<br />

as image of edges <strong>and</strong> contours. To convert this truly to an edge image we need to treshold the<br />

gradient image so that only the highest valued pixels get a value one <strong>and</strong> all lower value pixels are<br />

set to 0 (black) <strong>and</strong> are called “background”.<br />

This means that we have produced a binary image where the contours <strong>and</strong> edgy objects are marked<br />

as binary elements. We now need to remove the noise, <strong>for</strong> example in the <strong>for</strong>m of single pixels<br />

using by a morphological filter. We also have to link up the individual edge pixels along the<br />

contours so that we obtain contour lines. Linking up these edges is an operation that has to do<br />

with “neighbourhoods” (Chapter 1) we also need to obtain skeletons <strong>and</strong> connected sequences of<br />

pixels as discussed previously (Chapter 3).<br />
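These gradient approximations translate directly into small filter masks. The sketch below is illustrative only (the threshold value is an arbitrary assumption); it computes a Sobel gradient magnitude image and thresholds it into a binary edge image, as described above.

import numpy as np
from scipy.ndimage import convolve

def sobel_edge_image(f, threshold=100):
    """Approximate |grad f| with the two 3x3 Sobel masks, then threshold
    the gradient image into a binary edge image."""
    sx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=np.float64)      # responds to horizontal changes
    sy = sx.T                                          # responds to vertical changes
    f = f.astype(np.float64)
    gx = convolve(f, sx, mode='nearest')
    gy = convolve(f, sy, mode='nearest')
    magnitude = np.abs(gx) + np.abs(gy)                # |gx| + |gy| approximation
    return (magnitude > threshold).astype(np.uint8)    # 1 = edge, 0 = background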

Exam questions:

• Define the Sobel operator and apply it to the pixels inside the bold-framed region of the gray value image shown in Figure B.13. You can enter the result directly in Figure B.13.



Figure 7.4: Roberts operator (worked example for the exam question below; the numeric pixel grids are not reproduced here)

• The gradient image of the digital raster image in Figure B.21 is to be found. Specify a suitable operator and apply it to the pixels inside the bold-framed rectangle. You can enter the result directly in Figure B.21. In addition, show the calculation for one of the pixels.

• Apply the Roberts operator for edge detection to the bold-framed region in Figure B.34. You can enter the result directly in Figure B.34.

Answer: see Figure 7.4

7.6 Filtering in the Spectral Domain / Frequency Domain<br />

We define a filter function H(u, v) in the spectral domain as a rectangular function, a so-called box function. Multiplying the Fourier transform F(u, v) of an image by H(u, v) produces the spectral representation of the final image, G(u, v), as the product of H and F. The transfer function H shown in Slide 7.39 has the value 1 from the origin out to a distance D0, and we assume that H is rotationally symmetric. In the frequency domain the value D0 is denoted the cut-off frequency: any frequency beyond D0 is not permitted through the filter.

Let us take a look at how this works. In Slide 7.40 we have the image of the head of a bee; of course, in the spectral domain we would not be able to judge what the image shows. We can create a spectral representation by applying a Fourier transform to the image, and we can then define circles in the spectral representation, centered at the origin of the spectral domain, with radii that contain 90%, 93% or more of the image frequencies, also denoted as the “energy”. If we now apply a filter function H as shown before, which only lets through the frequencies within 90% of the energy, and then transform the resulting function G from the spectral back into the spatial domain to obtain an image g, we obtain a blurred version of the original image. As we let more frequencies pass, the blur becomes less and less. What we have obtained is a series of low-pass filtered images of the head of the bee, and we have also indicated how much of the image content we have filtered out and how much we have let through the low-pass filter.

If we transform the function H from the spectral domain into the spatial domain, we obtain Slide 7.41. If we apply this filter function to an image that contains nothing but 2 white points, we obtain an image g that appears corrupted, presenting us with ghost images. We should therefore be careful with that type of box filter (in the spectral domain): the ghost images of high-contrast content in the input image will be disturbing. It is advisable not to use such a box filter, which is sometimes also called an ideal filter, but to use an approximation instead. We introduce the Butterworth filter, which is represented in the spectral domain by the curve shown in the slide:

H(u, v) = 1 / ( 1 + (D(u, v)/D0)^(2n) )



In two dimensions this is a volcano-like shape. Applying this type of filter to the bee produces a series of low-pass filtered images without ghost images. A direct example of the difference between applying the box filter and the Butterworth filter is shown in the slides.
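The Butterworth transfer function H(u, v) can be built directly on a centered frequency grid and multiplied with the shifted spectrum of the image. The following sketch is my own illustration of that recipe; the cutoff D0 and the order n are free parameters.

import numpy as np

def butterworth_lowpass(shape, D0, n=2):
    """Transfer function H(u,v) = 1 / (1 + (D(u,v)/D0)^(2n)) on a centered grid."""
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    V, U = np.meshgrid(v, u)
    D = np.sqrt(U**2 + V**2)                   # distance from the frequency origin
    return 1.0 / (1.0 + (D / D0)**(2 * n))

def apply_frequency_filter(f, H):
    """Filter image f with a centered transfer function H (e.g. the one above)."""
    F = np.fft.fftshift(np.fft.fft2(f))        # spectrum with the origin centered
    g = np.fft.ifft2(np.fft.ifftshift(F * H))
    return np.real(g)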

Of course this entire discussion of spectral <strong>and</strong> spatial domains <strong>and</strong> of convolutions <strong>and</strong> filters<br />

requires space, time <strong>and</strong> ef<strong>for</strong>t <strong>and</strong> is related to a discussion of the Fourier trans<strong>for</strong>m, <strong>and</strong><br />

the effect of these trans<strong>for</strong>ms <strong>and</strong> of the suppression of certain frequencies on the appearance of<br />

functions. Typically throughout an engineering program the signals are mostly one-dimensional,<br />

whereas in image processing the typical signals are two-dimensional. A quick view of the Fourier transforms of certain functions illustrates some of what one needs to be aware of. Slide 7.47

presents one function F (u) in the spectral domain in the <strong>for</strong>m of a rectangular pulse. Its trans<strong>for</strong>m<br />

into the spatial domain gives us the sinc-function, as previously discussed as f(x) = sin(πx)/(πx).<br />

Now if we cut off the extremities of f(x) <strong>and</strong> then trans<strong>for</strong>m that function back into the spectral<br />

space we obtain a so-called ringing of the function. Giving up there<strong>for</strong>e certain frequencies in the<br />

spectral domain can lead to a certain noisiness of the signal in the other domain.<br />

Exam questions:

• Give the transfer function H(u, v) in the frequency domain of an ideal low-pass filter with cutoff frequency D0, and sketch the transfer function.

7.7 Improving Noisy Images<br />

There are many uses of filters. We have already found the use of filters to enhance edges, <strong>and</strong><br />

pointed out that filters trans<strong>for</strong>m individual pixels. We may use filters also to remove problems<br />

in images. Let us assume that we have compressed an image from 8 bits to 4 bits <strong>and</strong> there<strong>for</strong>e<br />

have reduced a number of available grey values to 16. We have an example in Slide 7.49 where<br />

the low number of gray values creates artefacts in the image in the <strong>for</strong>m of gray value contours.<br />

By applying a low-pass filter we can suppress the unpleasant appearance of false density contours.<br />

Another example also in Slide 7.49 is an image with some corruption by noise. A low pass filter<br />

will produce a new image that is smoother <strong>and</strong> there<strong>for</strong>e more pleasant to look at. Finally, we<br />

want to revisit the relationship between “filter” <strong>and</strong> “sampling”.. Slide 7.51 illustrates again the<br />

monkey-face: smoothing an image by a low-pass filter maybe equivalent to sampling the image,<br />

then reconstructing it from the samples.<br />

Exam questions:

• There is an analogy between the application of a filter and the reconstruction of a discretized image function. Explain this statement.

7.8 The Ideal <strong>and</strong> the Butterworth High-Pass Filter<br />

For visual inspection we may want to use high-pass filters, because our eye likes crisp, sharp edges and a high level of energy in an image. Slide 7.53 introduces such a high-pass filter in the spectral domain. The ideal high-pass filter lets all high frequencies pass and suppresses all low frequencies. This “ideal” filter has the same problems as we have seen in the low-pass case. We may therefore prefer the Butterworth high-pass filter, which is not a box but a monotonic function. In the 2-dimensional domain the ideal and the Butterworth high-pass filters appear like a brick with a hole in it. An application of the high-pass filter is to enhance the contrast and bring out the fine detail of the object, as shown in the slides. The high-pass filter improves the



appearance of the image and suppresses the background. If we add the original image back in to a high-pass filtered version, we arrive again at the type of “emphasis filter” that we have seen earlier under the name “unsharp masking” (USM). The resulting image can then be processed into an equalized histogram for optimum visual inspection.

Again, high-pass filters can be studied in both the spatial and the spectral domain. The sinc function in the spectral domain corresponds in the spatial domain to a box function, a pulse; the sinc² function corresponds to a triangular function in the spatial domain; and a Gaussian function remains a Gaussian function in both the spectral and the spatial domain.

Exam questions:

• Sketch the transfer functions of an ideal and of a Butterworth high-pass filter and compare the advantages and disadvantages of both filter types.

7.9 Anti-Aliasing<br />

7.9.1 What is Aliasing ?<br />

Recall the rasterization or scan-conversion of straight lines <strong>and</strong> curves, <strong>and</strong> the resulting aliasing.<br />

Suppose we work with a trigonometric function of some sort. This function is being sampled<br />

at certain widely spaced intervals. Reconstruction of the function from samples will produce<br />

a particular function that’s not really there. What is shown in Slide 7.58 is a high frequency<br />

function, whereas the samples describe a low frequency sinus-curve. We denote the falsification of<br />

the original function into one of a different frequency with “aliasing”. This type of aliasing is a<br />

widely reviewed subject of sampling theory <strong>and</strong> signal processing <strong>and</strong> is not particular to image<br />

processing or graphics.<br />

Aliasing is a result of our need to sample continous functions, both in the creation of images <strong>and</strong><br />

in the creation of visualizations of objects in computer graphics.<br />

7.9.2 Aliasing by Cutting-off High Frequencies<br />

The slides explain the issue further with an excursion into sampling theory. We have an input image f(x) in the spatial domain that needs to be sampled. As we go into the spectral domain we cannot use all frequencies: we cut them off at w and lose all frequencies outside the interval −w ≤ u ≤ w of F(u). Let us now define a sampling function s(x) in the spatial domain, consisting of a series of Dirac functions at an interval ∆x. The multiplication of f(x) with s(x) produces the sampled function in the spatial domain. As we go into the spectral domain we obtain a corresponding set of discrete frequencies S(u) at 1/∆x, 2/∆x, and so on.

If we now convolve (in the spectral domain) the sampling function S(u) with the original function F(u), we get the spectral view of the sampled function f(x) · s(x): we see the original function F(u) repeated at the locations −1/∆x, +1/∆x, . . . Transforming this back into the spatial domain produces samples from which the original function f(x) can only be incompletely reconstructed.

What is the effect of changing ∆x? If we make it smaller we get a more accurate sampling of the input function, in accordance with the slide: we see in the spectral domain that the repetitions of the original function F(u) in F(u) ∗ S(u) are spaced apart at wider intervals 1/∆x as ∆x gets smaller. Slide 7.62 illustrates that we can then isolate the spectrum of our function f(x) by multiplying F(u) ∗ S(u) by a box filter G(u), recovering F(u), so that we can fully reconstruct f(x) from the samples. If w is the highest frequency contained in our function f(x) or F(u), then we have no



loss if the sampling interval ∆x is smaller than 1/(2w):

∆x ≤ 1/(2w)     (Whittaker–Shannon sampling theorem)

Conversely, for a given sampling interval ∆x we define the cut-off frequency w = 1/(2∆x) and denote it the Nyquist frequency: it is the highest frequency that is fully represented by samples spaced ∆x apart.
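As a small numerical illustration of this condition (my own example): a signal whose highest frequency is w = 50 cycles per mm must be sampled at ∆x ≤ 1/(2 · 50) mm = 10 micrometers.

def max_sampling_interval(w):
    """Largest sampling interval (in the unit implied by 1/w) that still
    satisfies the Whittaker-Shannon condition delta_x <= 1/(2w)."""
    return 1.0 / (2.0 * w)

print(max_sampling_interval(50.0))   # 0.01 mm = 10 micrometers for w = 50 cycles/mm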

7.9.3 Overcoming Aliasing with an Unweighted Area Approach

Of course the implementation is again as smart as possible to avoid multiplications <strong>and</strong> divisions,<br />

<strong>and</strong> replaces them by simpler operations. The approach in Slide 7.63. Aliasing occurs if ∆x<br />

violates the Whittaker-Shannon theorem. Anti-Aliasing by means of a low-pass filter occurs<br />

in the rasterization or scan conversion of geometric elements in computer graphics. We have<br />

discussed this effect in the context of scan conversion by means of the Bresenham-approach.<br />

Slide 7.63 explains another view of the issue, using the scan-conversion of a straight line. We can<br />

assign grey values to those pixels that are being touched by the area representing the “thin-line”.<br />

This would produce a different approach from Bresenham bec<strong>aus</strong>e we are not starting out from<br />

a binary decision that certain pixels are in, all others are out: We instead select pixels that are<br />

“touched” by the straight line, <strong>and</strong> assign a brightness proportional to the area that the overlap<br />

takes up.<br />

7.9.4 Overcoming Aliasing with a Weighted Area Approach<br />

Algorithm 21 Weighted Antialiasing<br />

1: set currentX to x-value of start of line<br />

2: while currentX smaller than x-value of end of line do<br />

3: apply Bresenham’s Line Algorithm to get appropriate currentY -value<br />

4: consider three cones (each with diameter of 2 pixels <strong>and</strong> volume normalized to 1)<br />

erected over the grid positions (currentX, currentY + 1), (currentX, currentY ) <strong>and</strong><br />

(currentX, currentY - 1 )<br />

5: <strong>for</strong> all cones do<br />

6: determine the intersection of the cone’s base with the line<br />

7: calculate the volume above the intersection<br />

8: multiply the obtained volume with the desired gray value<br />

9: take the result <strong>and</strong> set it as the pixel’s gray value<br />

10: end <strong>for</strong><br />

11: increase currentX<br />

12: end while<br />

In weighted area sampling we also decrease a pixel’s brightness as it has less overlap with the area of the “thin line”, but not all overlap areas are treated equally: we introduce a “distance” of the overlap area from the center of the pixel. With this basic idea in mind we can revisit unweighted area sampling, which treats all overlap areas equally and thereby implements a “box filter”: each overlap area is multiplied by the same value, represented as the height of the box, normalized to 1.

A weighted area sampling approach is shown in the slides. The “base” of the filter (its support) is circular and larger than a pixel, typically with a diameter of twice the pixel’s side length. The height of the filter is chosen such that its volume is 1.

The slides also illustrate the effect that a moving small triangle would have on pixels as it travels across an image; the triangle is smaller than a pixel.
The triangle is smaller than a pixel.



Getting Antialiased Lines by Means of the Gupta-Sproull Approach

Algorithm 22 Gupta-Sproull Antialiasing
1: dx := x2 − x1
2: dy := y2 − y1
3: d := 2 ∗ dy − dx
4: incrE := 2 ∗ dy
5: incrNE := 2 ∗ (dy − dx)
6: two_v_dx := 0
7: invDenom := 1/(2 ∗ Sqrt(dx ∗ dx + dy ∗ dy))
8: two_dx_invDenom := 2 ∗ dx ∗ invDenom
9: x := x1
10: y := y1
11: IntensifyPixel(x, y, 0)
12: IntensifyPixel(x, y + 1, two_dx_invDenom)
13: IntensifyPixel(x, y − 1, two_dx_invDenom)
14: while x < x2 do
15:   if d < 0 then
16:     two_v_dx := d + dx
17:     d := d + incrE
18:     x := x + 1
19:   else
20:     two_v_dx := d − dx
21:     d := d + incrNE
22:     x := x + 1
23:     y := y + 1
24:   end if
25:   IntensifyPixel(x, y, two_v_dx ∗ invDenom)
26:   IntensifyPixel(x, y + 1, two_dx_invDenom − two_v_dx ∗ invDenom)
27:   IntensifyPixel(x, y − 1, two_dx_invDenom + two_v_dx ∗ invDenom)
28: end while
29: {IntensifyPixel(x, y, distance) itself is implemented as:}
30: intensity := Filter(Round(Abs(distance))); WritePixel(x, y, intensity)

Using the weighted area method, we can pre-compute a table for lines at different distances from a pixel's center. A line will typically intersect the cones centered on three pixels, as shown in Slide 7.69, but it may intersect as few as 2 and at most 5 such cones. The look-up table is filled with values of the filter, defined as a function F(D, t) of two variables: t, the line's thickness, and D, the distance from a pixel center. Gupta and Sproull, two early pioneers of computer graphics, introduced this table look-up for a 4-bit display device; only 16 values of D are needed since a 4-bit display has only 16 different gray values. The Bresenham method (the midpoint line algorithm) now needs to be modified so that it not only decides on the E or NE pixel but also assigns a grey value. Moreover, we set a grey value not only for the single pixel at E or NE, but also for its two neighbours above and below.

Slide 7.70 illustrates how the distance D is computed using simple trigonometry:

D = v · dx / √(dx² + dy²)

And we need two additional distances, D_above and D_below:

D_above = (1 − v) · dx / √(dx² + dy²)    (7.1)

D_below = (1 + v) · dx / √(dx² + dy²)    (7.2)
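
As an illustrative sketch (not the original Gupta-Sproull table), the following Python fragment evaluates D, D_above and D_below for one step and converts them to intensities with a simple cone-shaped filter of radius 1; the filter shape is an assumption made here for demonstration only, standing in for F(D, t).

import math

def cone_filter(distance, radius=1.0):
    # hypothetical stand-in for F(D, t): linear fall-off over a support of radius 1
    return max(0.0, 1.0 - abs(distance) / radius)

def pixel_intensities(dx, dy, v):
    """Distances of the chosen pixel and its two vertical neighbours to a line with slope dy/dx."""
    inv_denom = 1.0 / math.sqrt(dx * dx + dy * dy)
    d       = v * dx * inv_denom            # distance of the pixel centre to the line
    d_above = (1.0 - v) * dx * inv_denom    # equation (7.1)
    d_below = (1.0 + v) * dx * inv_denom    # equation (7.2)
    return [cone_filter(d), cone_filter(d_above), cone_filter(d_below)]

print(pixel_intensities(dx=8.0, dy=3.0, v=0.25))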

Prüfungsfragen:

• Explain under which circumstances "aliasing" occurs and what can be done against it.

• Figure B.72 shows a perspectively distorted checkerboard-like pattern. Explain how the artefacts at the upper edge of the image come about, and describe one way to prevent their occurrence.

[Thumbnails of Slides 7.1 through 7.70]

Chapter 8

Texture

8.1 Description

Texture is an important subject both in the analysis of natural images of our environment and in the computer generation of images if we want to achieve photo-realism. Slide 8.3 illustrates three different sets of textures: the first may be of pebbles on the ground, the second of a quarry for mining stone, and the third is a texture of fabric. We can describe texture (a) pictorially, by means of a photograph of the surface, or (b) by a set of mathematical methods, which may be statistical, structural or spectral. Finally, we will present a procedural approach to modeling and using texture.

Prüfungsfragen:

• Name three kinds of texture description and give an example for each.

8.2 A Statistical Description of Texture

Recall the image function z = f(x, y) with the image gray values z. We can compute so-called moments of the image gray values as shown in Slide 8.7. The moments are denoted as µn(z). The mean m of the gray values is the first such measure; the second moment µ2(z) represents the variance of the gray values with respect to the mean (see the definitions below). Moments involve the probability p(z) of a gray value z. In a discrete context this probability is represented by the histogram of the gray values: if a gray value is very unlikely to occur, its column in the histogram will be very low or empty.

A measure of texture can be a function of these moments. A very simple one is the value R. If there is no variation in gray value, then the variance σ², the standard deviation σ and the second moment µ2(z) are all 0 or close to 0; in that case the value of R is 0 as well. R therefore represents a measure of the smoothness of the image. We can associate a separate value R with each pixel i by computing it for a window around that pixel.

There are other statistical measures of texture, for example those associated with the "edginess" of an area. In this case we would produce an edge value associated with each pixel, for example representing the number, direction and strength of the edges in a small window surrounding the pixel. Nominally we obtain a different texture parameter at each pixel. However, we are looking to describe an extended image by regions of similar texture. Therefore we classify the texture parameters into a few groups. We may create an equidensity image as discussed previously.

µn(z) = Σ [(zi − m)^n · p(zi)]

z ... the gray-value image, zi the gray value of the i-th pixel in the image
m ... mean value of z (average intensity)
µn ... n-th moment of z about the mean
R ... a measure of relative smoothness
σ² ... variance

R = 1 − 1 / (1 + σ²(z))

If we do that, we might be able to describe a quarry with only two texture parameters, as shown in Slide 8.8. While the quarry itself has been delineated manually by a human operator, a texture parameter is computed within this delineated polygon, and the equidensity method applied to that texture parameter defines two different textures within the quarry.

A very frequently used texture measure is the so-called co-occurrence matrix, which seeks to describe the occurrence of similar patterns in an image. We do not discuss it in this context other than to mention the name.
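
A small numerical sketch of these statistical measures (my own illustration): compute the histogram-based moments µn(z) and the smoothness R = 1 − 1/(1 + σ²) for a window of gray values. The normalization of σ² by (L−1)², so that R stays in [0,1], is an assumption made for this example.

import numpy as np

def texture_statistics(window, levels=256):
    """Histogram-based moments and smoothness R for one image window (2-D array of gray values)."""
    z = np.arange(levels)
    hist, _ = np.histogram(window, bins=levels, range=(0, levels))
    p = hist / hist.sum()                          # p(z_i), the normalized histogram
    m = np.sum(z * p)                              # mean gray value
    mu = lambda n: np.sum((z - m) ** n * p)        # n-th moment about the mean
    var = mu(2)
    var_norm = var / (levels - 1) ** 2             # assumed normalization so that R lies in [0,1]
    R = 1.0 - 1.0 / (1.0 + var_norm)
    return m, var, R

window = np.random.randint(0, 256, size=(16, 16))
print(texture_statistics(window))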

Prüfungsfragen:

• Which statistical properties can be used to describe texture? Explain the meaning of these properties in the context of texture images.

8.3 Structural Methods of Describing Texture

In order to understand the concept of a structural texture description we refer to Slide 8.11. We define a rule that replaces a small window in an image by a pattern; for example, we replace a window by a pattern "aS", where "a" may represent a circle. If we now apply the same operation multiple times, we obtain an arrangement of repetitive patterns located adjacent to one another in a row-and-column pattern "a a a S". We might denote the neighbourhood relationships between adjacent areas by different symbols; Slide 8.12 uses, for instance, "b" for the position below the current location and "c" for the position to its left. We can then describe a certain pattern by a certain sequence of "a", "b" and "c" operations. The texture primitive, shown here as a circle, could be any other kind of pattern. We build up our texture by repeating the pattern. Note again that at this point we are concerned with describing texture as we find it in a natural image; we are not, at this time, generating a texture for an object that we want to visualise.

Prüfungsfragen:

• Explain the structural method of texture description.

8.4 Spectral Representation of Texture

We have previously discussed the technical means of representing an image in a computer. In the spatial domain we use the rows and columns of pixels; in the spectral domain we use the frequencies. We therefore describe next the description of texture using the spectral representation of an image. Slide 8.14 illustrates a typical texture pattern. Its spectral representation shows that there are distinct patterns repeated in the image; these manifest themselves as dominant frequencies.

We describe the spectrum by a two-dimensional spectral function s(r, j), where r is the radius of a certain spectral location from the origin and j is the angle from the x-axis in a counter-clockwise direction. Any location in the spectral representation of the image therefore has the coordinates (r, j). Slide 8.15 explains this further. We can simplify the spectral representation of the image into two one-dimensional functions: one plots the spectrum as a function of the angle j, the other as a function of the radius r for a given value of j. Slide 8.16 illustrates two different texture patterns and the manifestation of those patterns in the j-curve. A texture parameter can now be extracted from the spectral representation, for example by counting the number of peaks or measuring the average distance between the peaks in the spectral domain.

We can also set up a texture vector with several values that consider the number of peaks as a function of the radius r. The aim is to associate with a pixel or a window in the image a simple number or vector that is indicative of the type of texture one finds there. We therefore have a case of classification, where we could take a set of known textures and create from them a feature space in two or more dimensions (see Chapter 14). If we now have an unknown texture, we might try to describe it in terms of the known textures by using the feature space and looking for the nearest known texture, given the texture numbers of the unknown one. In this manner we can replace an input image by a texture image which indicates at each location the kind of texture that exists there. By classifying areas of similar texture as one area we replace a large number of pixels by a small number of textures and a description of the contour of each area of uniform texture.
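
As a sketch of such a spectral texture descriptor (an illustration under my own assumptions, not the exact procedure on the slides), the following Python fragment computes the Fourier magnitude of an image window, converts its coordinates to (r, j), and sums it over rings of radius r and over wedges of angle j; peaks in the two resulting curves indicate repetitive texture.

import numpy as np

def spectral_texture_curves(window, n_r=32, n_angle=36):
    """Ring and wedge sums of the Fourier magnitude of a square gray-value window."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(window)))
    h, w = spec.shape
    y, x = np.indices(spec.shape)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = np.hypot(y - cy, x - cx)
    j = np.arctan2(y - cy, x - cx) % np.pi          # angle, folded to [0, pi)
    r_bins = np.linspace(0, r.max() + 1e-9, n_r + 1)
    j_bins = np.linspace(0, np.pi, n_angle + 1)
    s_of_r = [spec[(r >= r_bins[i]) & (r < r_bins[i + 1])].sum() for i in range(n_r)]
    s_of_j = [spec[(j >= j_bins[i]) & (j < j_bins[i + 1])].sum() for i in range(n_angle)]
    return np.array(s_of_r), np.array(s_of_j)       # candidate entries of a texture vector

window = np.random.rand(64, 64)
s_r, s_j = spectral_texture_curves(window)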

Prüfungsfragen:

• What properties does a (regularly repeating) texture exhibit in the spectral domain? What statements can be made about a texture on the basis of its spectrum?

• The digital raster image of Figure B.71 is to be segmented, with the two buildings forming the foreground and the sky the background. Since the histograms of foreground and background overlap strongly, a simple gray-value segmentation cannot succeed here. Which other image properties can be used to nevertheless distinguish foreground and background in Figure B.71?

8.5 Texture Applied to Visualisation

To achieve photorealism in the visualisation of two- or three-dimensional objects we employ descriptions of texture rather than texture itself. We may apply artificial texture, also denoted as synthetic texture, and place it on the geometric polygons describing the surface shape of an object. The texture itself may consist of texture elements, also denoted as texels. Slide 8.19 shows a wire-frame rendering of a simple indoor scene and illustrates how unrealistic this type of representation appears. Slide 8.20 is the result of placing photographic texture on top of the objects: we obtain a photorealistic representation. The basic concepts in Slide 8.21 are illustrated by various examples: a two-dimensional flag, a three-dimensional indoor scene, a two-dimensional representation of symbols, photo-texture of wood, and the texture of some hand-written material.

How is this photographic texture applied to a geometrically complex object? This is illustrated in Slide 8.22, see also Algorithm 23 below. We deal with three different coordinate systems. First we have the representation on a monitor or display medium, and a window on this display contains a segment of a three-dimensional object which is represented in a world coordinate system. The surface of this object needs to be photo-textured, and it receives that photo-texture from a texture map with its own, third coordinate system. Essentially we are projecting the texture map onto the curved surface of an object and then render the curved surface on the display medium, using a transformation that results from a synthetic camera with a pose consisting of position and angular orientation.

Algorithm 23 Texture mapping
1: surround the object with a virtual cylinder
2: for all pixels of the texture do
3:   make a coordinate transformation from Cartesian to cylindrical coordinates {to wrap the texture onto the cylinder's surface}
4: end for
5: for all points of the object do
6:   project the point perpendicularly from the midpoint of the cylinder to the cylinder's surface
7:   where the projection cuts the edge of the object, assign the object point the color of the corresponding cylinder point
8: end for
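
A minimal sketch of the cylindrical mapping idea behind Algorithm 23 (my own simplification): convert an object point given in Cartesian world coordinates into cylinder coordinates and from there into (u, v) texture coordinates. The choice of a z-aligned axis and the wrapping of the angle are assumptions of this example.

import math

def cylindrical_uv(x, y, z, z_min, z_max):
    """Map a point on (or near) a z-aligned cylinder to texture coordinates u, v in [0,1]."""
    theta = math.atan2(y, x)                 # angle around the cylinder axis
    u = (theta + math.pi) / (2.0 * math.pi)  # wrap the angle to [0,1]
    v = (z - z_min) / (z_max - z_min)        # height along the axis, normalized
    return u, v

def sample_texture(texture, u, v):
    """Nearest-neighbour lookup in a texture given as a 2-D list of colors."""
    rows, cols = len(texture), len(texture[0])
    i = min(int(v * rows), rows - 1)
    j = min(int(u * cols), cols - 1)
    return texture[i][j]

texture = [[(r, c, 0) for c in range(4)] for r in range(4)]   # toy 4x4 texture
print(sample_texture(texture, *cylindrical_uv(1.0, 0.5, 0.3, 0.0, 1.0)))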

Prüfungsfragen:

• Explain how, in visualisation, the quality of a computer-generated image can be improved by the use of textures. Name some surface properties (in particular geometric ones) that are not suitable for representation by means of a texture.

• Exercise B.1 asked for geometric surface properties that are not suitable for visualisation by means of texture. Assume a texture were nevertheless used improperly to represent such properties. Which artefacts are typical in such cases?

8.6 Bump Mapping

In order to give a realistic appearance to a surface which is not smooth but bumpy, there exists a concept called bump mapping. It applies a two-dimensional texture to a three-dimensional object and makes the two-dimensional texture appear as if it were three-dimensional. Slide 8.24 and Slide ?? explain the concept with a donut and a strawberry. Note that the texture really is two-dimensional: the third dimension is suggested by a 2D picture of shadow and detail that is not actually available in the third dimension. This becomes visible at the contours of the object, where the bumps of the texture are not reflected in the geometry of the object. In this case we do not apply the photographic texture used in the previous section, but rather a computed texture.
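
A toy sketch of the underlying idea (a common height-map formulation assumed here, not necessarily the exact method on the slides): perturb the surface normal with the gradient of a 2-D bump (height) map and shade with a simple Lambert term, so that a flat surface appears bumpy without any change of its geometry.

import numpy as np

def bump_shaded(height_map, light_dir, strength=1.0):
    """Lambertian shading of a flat surface whose normals are perturbed by a height map."""
    # the gradient of the bump map perturbs the (0,0,1) normal of the flat surface
    dh_dy, dh_dx = np.gradient(height_map.astype(float))
    normals = np.dstack([-strength * dh_dx, -strength * dh_dy,
                         np.ones_like(height_map, dtype=float)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    L = np.asarray(light_dir, dtype=float)
    L /= np.linalg.norm(L)
    return np.clip(normals @ L, 0.0, 1.0)        # brightness per pixel

bumps = np.random.rand(64, 64)
image_left_light  = bump_shaded(bumps, light_dir=(-1.0, 0.0, 1.0))
image_right_light = bump_shaded(bumps, light_dir=( 1.0, 0.0, 1.0))
# The shading changes with the light direction, but the silhouette of the object would not.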

Prüfungsfragen:

• What does the term "bump mapping" describe?

• Figure B.77 shows a torus with a structured surface, with the light source once to the left (Figure B.77(a)) and once to the right (Figure B.77(b)) of the object. For clarification, Figures B.77(c) and B.77(d) show enlarged details. Which technique was used to visualise the surface structure, and what are the typical properties by which the method can be recognized here?


8.7 3D Texture

Another concept of texture is three-dimensional. In this case we do not texture a surface but an entire three-dimensional body. An example is shown in Slide 8.27, where the surface appearance results from intersecting the three-dimensional texture body with the surface geometry.

Prüfungsfragen:

• What is a "3D texture"?

8.8 A Review of Texture Concepts by Example

Slide 8.29 uses an example from an animated movie to illustrate the complexities of applying photorealistic textures to three-dimensional objects. We begin with the basic shapes of the geometric entity and apply some basic colours. We superimpose on these colours an environment map, which is in turn modified by a bump map and the appropriate illumination effect. The intermediate result is shown in Slide 8.31, where dirt specks are added for additional realism. Slide 8.32 adds further details: a near-photographic texture, more colour, the effect of water droplets, and mirror-like (specular) reflections. We should not be surprised that the creation of such animated scenes consumes considerable computing power and therefore takes time to complete. The final result is shown in Slide 8.33.

8.9 Modeling Texture: Procedural Approach

As previously discussed, we process natural images to find a model of texture and we use those models to create images. Slide 8.35 details the method of analysing existing texture. We have a real scene of an environment, and we understand from the image model that the intensity in the image is a function of the material property f_r and the illumination E_i. The material property is unknown and needs to be determined from the raster image; the illumination is known. We estimate the model parameters for the material property and use them to approximate objects. For the photo texture we then use this model together with a virtual scene and its material and illumination properties to compute an intensity per pixel and thereby obtain a synthetic image. The issue now is to find a method of modeling an unknown texture by simple curves.

Slide 8.36 explains how a reference surface, a light source, a camera and a texture to be analysed can be set up into a sensor system. The resulting image is illustrated in Slide 8.37, with the known reference surface and the unknown texture.

We need the reference texture so that we can calibrate the differences in illumination. As seen in the previous slide, we have an image of texture and an effect of illumination; in particular we may have mirror-like, i.e. specular, reflection. We do not discuss models of reflection at this time but just show a given model in Slide 8.38 for illustration purposes. We have for each pixel a known gray value f, and we know the angle Θ_i under which a pixel is being illuminated and how the reflection occurs. We will discuss the parameters of the illumination model in Chapter ??. We need to compute the parameters of the reflections that are marked.

In Slide 8.39 we study a particular column of pixels that represents a gray value curve of an unknown photo texture. The question is: what is "texture" here? Slide 8.40 explains. We plot the actual brightness along the row of pixels and model the change of brightness as a function of the illumination by an average that we can calibrate with our reference pattern. The deviation from the average is then the actual texture, in the form of an irregular signal. We now need to describe that signal statistically by means of a few simple numbers. How to do this is a topic of "statistical signal analysis", for example in a spectral representation of the signal as previously discussed in Section 8.4.

Let us review the basic idea in a different way. We have an image of a surface and we can take a little window from it for analysis. We can create a textured surface by projecting that window multiple times onto the surface, and in the process we may obtain some type of "tiling effect". The procedural texture discussed before models the surface texture mathematically and avoids the seams between the individual tiles. We can create any kind of shape in our synthetic surface, as shown in Slide 8.43, and Slide 8.44 illustrates that those shapes can be fairly complex even in three dimensions.

Prüfungsfragen:

• What is meant by "procedural textures", how are they generated, and what advantages does their use bring?

[Thumbnails of Slides 8.1 through 8.44]

Chapter 9

Transformations

9.1 About Geometric Transformations

In this chapter we will discuss the transformation of objects in a fixed coordinate system, the change of coordinate systems with a fixed object, and the deformation of objects, so that from an input object a geometrically changed output object results. We will further discuss projections of the 3D world into a 2D display plane, and finally, under the heading of "transformations", the change in the representation of an object when we approximate it by simple functions, which we denote as approximation and interpolation.

Geometric transformations apply when objects move in a fixed world coordinate system, but they also apply when we need to look at objects and have to create images, or use images of objects to reconstruct them. In that case we need to understand the projection of the object into an image or display medium. A very important application of geometric transformations is in robotics. This may be unrelated to the processing of digital visual information, but it employs the same sets of formulae and the same ideas of "transformation". A simple robot may have associated with it numerous coordinate systems which are attached to its rigid elements, and each coordinate system is related to every other coordinate system by a coordinate transformation. Slide 9.3 and Slide 9.4 explain how a world coordinate system is home to the robot's body, which in turn is the reference for the robot arm. The arm holds the hand, the hand holds the fingers, and the fingers seek to relate to an object or box which itself is represented in the world coordinate system. Slide 9.4 illustrates these six coordinate systems in a simplified two-dimensional presentation.

Our interest is in geometric transformations concerning the use of imagery. Slide 9.5 illustrates an early video image of the surface of planet Mercury, from NASA's Mariner mission in the mid 1960's. We need to relate each image to images taken from other orbits, and we need to place each image into a coordinate reference frame that is defined by the meridians, the equator and the poles of the planet. Slide 9.6 represents a geometric rectification of the previous image; the transformation is into a Mercator or stereographic projection. We can see the geometric correction of the image if we note that the craters, which were of elliptical shape in the original image, now approximate circles, as they would appear from an overhead view straight down. Such a view is also denoted as an orthographic projection.

9.2 Problem of a Geometric Transformation

A geometric transformation typically applies to the 2-dimensional space of the plane, to the 3-dimensional space of the natural human environment and, more generally, to n-dimensional space. In the processing of digital visual information, most of our geometric transformations address the 3-dimensional space of our environment and the 2-dimensional space of a display medium. Slide 9.8 illustrates the transformation of objects in a rigid coordinate system (x, y) in 2-dimensional space. We have in this example two objects, 1 and 2, before the transformation, and 1', 2' after the transformation. The general model of a rigid body transformation in 2D space is shown in Slide 9.9: the equation takes the input (x, y) coordinates and produces from them the output (x′, y′) coordinates using transformation parameters a0, a1, a2 and b0, b1, b2. Slide 9.10 illustrates the usefulness of this formulation if we are given objects before and after the transformation and need to determine ("estimate") the unknown parameters of the transformation. Given are therefore x1, y1, x2, y2, x′1, y′1, x′2 and y′2, and we seek to compute a0, a1, a2, b0, b1, b2.

We may also know the transformation parameters and need to compute for each given input coordinate pair (x, y) its associated output coordinate pair (x′, y′), as illustrated in Slide 9.11. This concludes the introduction of the basic ideas of transformations using the example of 2-dimensional space.

9.3 Analysis of a Geometric Transformation

We will use the example of a 2-dimensional object that is transformed in 2D space under a so-called conformal transformation, which does not change the angles of the object. Slide 9.13, Slide 9.14, Slide 9.15 and Slide 9.16 illustrate the elements from which a geometric transformation in 2D space is assembled. A very basic element of a transformation is always the translation: we add to each pair (x, y) of an object the translational components tx and ty to produce the output coordinates (x′, y′).

Definition 18 Conformal transformation

x′ = s · cos(α) · x − s · sin(α) · y + tx
y′ = s · sin(α) · x + s · cos(α) · y + ty

A second important transformational element is scaling. An object gets reduced or enlarged by a scale factor s (see Definition 18); more generally we might use two different scale factors, denoted sx in the x coordinate direction and sy in the y coordinate direction. As a result we may obtain a squished, thus deformed, object. We call a deformation by means of two different scale factors an affine deformation and will discuss it later.

Finally we have rotations, which rotate an object by an angle α. The transformation equation representing the rotation is shown in Slide 9.15. For a rotation we need a point around which we rotate the object; normally this is the origin of the coordinate system. The general expression for a rotation by an angle α produces the output (x′, y′) coordinates from the input (x, y) coordinates by multiplying those coordinates with cos α and sin α, in accordance with Slide 9.16. This can also be presented in matrix notation, resulting in the expression p′ = R · p, and we call R the rotation matrix.

What makes a transformation in 2D space specifically a conformal transformation? We already stated that it does not change any angles. Obviously this requires that our body not change its shape: it may be enlarged or reduced, it may be translated and it may be rotated, but right angles before the transformation will remain right angles after the transformation. Slide 9.17 explains that we combine the three elements of the 2D transformation, namely scaling by a factor s, rotating by an angle α and translating by the translation elements tx and ty. We call this a four-parameter transformation since we have four independent elements: s, α, tx, ty. In matrix notation this transformation is

x′ = s · R x + t,

and s · R can be replaced by the transformation matrix M.
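
A small sketch of this four-parameter transformation (my own illustration): apply x′ = s·Rx + t to a set of points, and recover s, α, tx, ty from two point correspondences by writing the equations of Definition 18 with the substitutions a = s·cos α, b = s·sin α.

import numpy as np

def conformal_apply(points, s, alpha, tx, ty):
    """Apply x' = s*R*x + t to an (N,2) array of points."""
    R = np.array([[np.cos(alpha), -np.sin(alpha)],
                  [np.sin(alpha),  np.cos(alpha)]])
    return s * points @ R.T + np.array([tx, ty])

def conformal_estimate(p, q):
    """Recover (s, alpha, tx, ty) from two input points p and their images q."""
    # unknowns u = (a, b, tx, ty) with a = s*cos(alpha), b = s*sin(alpha)
    K, rhs = [], []
    for (x, y), (X, Y) in zip(p, q):
        K.append([x, -y, 1, 0]); rhs.append(X)
        K.append([y,  x, 0, 1]); rhs.append(Y)
    a, b, tx, ty = np.linalg.solve(np.array(K, float), np.array(rhs, float))
    return np.hypot(a, b), np.arctan2(b, a), tx, ty

p = np.array([[1.0, 0.0], [0.0, 1.0]])
q = conformal_apply(p, s=2.0, alpha=np.pi / 6, tx=3.0, ty=-1.0)
print(conformal_estimate(p, q))   # recovers (2.0, pi/6, 3.0, -1.0)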

We have described the transformation by means of Cartesian coordinates (x, y); one could also use polar coordinates (r, φ), where a point with coordinates (x, y) receives the coordinates (r, φ). A rotation then becomes a very simple operation, changing the angle φ by the rotation angle ω. The relationships between (x, y) and (r, φ) are fairly obvious:

x = r cos φ,   y = r sin φ.

A rotated point p′ will have the coordinates r cos(φ + ω) and r sin(φ + ω).

When performing a transformation we may have a fixed coordinate system and rotate the object, or we may have a fixed object and rotate the coordinate system. Slide 9.20 explains how a point p with coordinates (x, y) obtains coordinates (X, Y) as a result of rotating the coordinate system by an angle α. Note that the angle α is the angle subtended between the input and output axes. We can therefore interpret the rotation matrix not only as being filled with the elements cos α and sin α; we can also interpret it as being filled with the angles subtended between the coordinate axes before and after the rotation. We have the angles xX, xY, yX, yY, and they all enter the rotation matrix as cos(xX), cos(xY), etc.

We have thus found in Slide 9.20 a second definition for the contents of the rotation matrix: the first was the interpretation of R in terms of cos α and sin α of the rotation angle α; the second is that the elements of the rotation matrix are the cosines of the angles subtended by the input and output coordinate axes.

Prüfungsfragen:

• Figure B.12 shows an object A that is transformed into an object B by a linear transformation M. Give (for homogeneous coordinates) the 3 × 3 matrix M that describes this transformation (two different solutions)!

Answer: Two different solutions exist because the object is symmetric and can be mirrored about the y-axis without being changed (matrices below are written row by row, with rows separated by semicolons):

M1 = [2 0 4; 0 0.5 3; 0 0 1]

M2 = [−2 0 12; 0 0.5 3; 0 0 1]

• Compute the transformation matrix M that performs a rotation by 45° counter-clockwise about the point R = (3, 2)^T and at the same time a scaling by the factor √2 (as illustrated in Figure B.27). Give M for homogeneous coordinates in two dimensions (i.e. a 3 × 3 matrix), such that a point p is transformed into the point p′ according to p′ = Mp.

Hint: You save a lot of calculation and writing if you apply the associative law for matrix multiplication appropriately.

Answer:

M = T(3, 2) · S(√2) · R(45°) · T(−3, −2)
  = [1 0 3; 0 1 2; 0 0 1] · [√2 0 0; 0 √2 0; 0 0 1] · [cos 45° −sin 45° 0; sin 45° cos 45° 0; 0 0 1] · [1 0 −3; 0 1 −2; 0 0 1]
  = [1 0 3; 0 1 2; 0 0 1] · [1 −1 0; 1 1 0; 0 0 1] · [1 0 −3; 0 1 −2; 0 0 1]
  = [1 0 3; 0 1 2; 0 0 1] · [1 −1 −1; 1 1 −5; 0 0 1]
  = [1 −1 2; 1 1 −3; 0 0 1]

• In the practical part of the exam, Exercise B.2 asks for a transformation matrix (in two dimensions) composed of a scaling and a rotation about an arbitrary center of rotation. How many degrees of freedom does such a transformation have? Justify your answer!

Answer: The center of rotation (rx, ry), the rotation angle (ϕ) and the scaling factor (s) give four degrees of freedom.

• Given a two-dimensional object whose centroid lies at the coordinate origin. A translation T and a scaling S are now to be applied "simultaneously", where

T = [1 0 tx; 0 1 ty; 0 0 1],   S = [s 0 0; 0 s 0; 0 0 1].

After the transformation the object should appear enlarged according to S, and the centroid should have been shifted according to T. We look for a matrix M that transforms a point p of the object into a point p′ = M · p of the transformed object according to this prescription. Which is the correct solution:

1. M = T · S
2. M = S · T

Justify your answer and give M!

Answer: Answer 1 is correct, since the scaling leaves the centroid unchanged exactly when it lies at the coordinate origin. The subsequent translation then shifts the object (and with it the centroid) to the desired position. Hence

M = T · S = [1 0 tx; 0 1 ty; 0 0 1] · [s 0 0; 0 s 0; 0 0 1] = [s 0 tx; 0 s ty; 0 0 1]

• Given the 3 × 3 transformation matrix

M = [3 4 2; −4 3 1; 0 0 1]

as well as three points a = (2, 0)^T, b = (0, 1)^T, c = (0, 0)^T in two-dimensional space. The matrix M describes, in homogeneous coordinates, a conformal transformation in which a point p is transformed into a point p′ according to p′ = Mp. The points a, b and c form a right triangle, i.e. the segments ac and bc are perpendicular to each other.

1. Compute a′, b′ and c′ by applying the transformation described by M to the points a, b and c!
2. Since M describes a conformal transformation, the points a′, b′ and c′ must also form a right triangle. Show that this is indeed the case here! (Hint: it suffices to show that the segments a′c′ and b′c′ are perpendicular to each other.)

Answer:

1. a′ = (8, −7)^T, b′ = (6, 4)^T, c′ = (2, 1)^T

2. a′ − c′ = (6, −8)^T, b′ − c′ = (4, 3)^T, and (a′ − c′) · (b′ − c′) = 6 · 4 + (−8) · 3 = 0

9.4 Discussing the Rotation Matrix in Two Dimensions

A rotation matrix R is filled with four elements if it concerns rotations in two dimensions.

Definition 19 Rotation in 2D

x′ = x · cos θ − y · sin θ
y′ = x · sin θ + y · cos θ

Written in matrix form (matrix rows separated by semicolons):

R = [cos θ  −sin θ;  sin θ  cos θ],   (x′; y′) = R · (x; y)

As shown in Figure 9.1, the four elements can be combined into two unit vectors, i and j. The rotation matrix R consists of i and j, which are the unit vectors in the direction of the rotated coordinate system. We can show that the rotation matrix has some interesting properties, namely that the scalar product of each unit vector with itself is 1 and that the scalar product of the two different unit vectors is zero (see also Slide 9.22).


Definition 20 2D rotation matrix

A point of an object is rotated about the origin by multiplying it with a so-called rotation matrix. When dealing with rotations in two dimensions, the rotation matrix R consists of four elements. These elements can be combined into two unit vectors i and j:

i = (cos θ; sin θ),   j = (−sin θ; cos θ)

R = [cos θ  −sin θ;  sin θ  cos θ] = (i, j)

(x′; y′) = R · (x; y)

Starting from a given coordinate system with axes X and Y, the vectors i and j correspond to the unit vectors in the direction of the rotated coordinate system (see Figure 9.1).

Figure 9.1: rotated coordinate system

We have now found a third definition of the rotation matrix elements, namely the unit vectors along the axes of the rotated coordinate system as expressed in the input coordinate system. Slide 9.23 summarizes the three interpretations of the elements of a rotation matrix.

Let us take a look at the inverse of a rotation matrix. If we premultiply a rotation matrix by its inverse we obviously get the unit matrix. But we also learn very quickly that premultiplying the rotation matrix by its transpose likewise produces the unit matrix, which proves, in accordance with Slide 9.24, that the inverse of a rotation matrix is nothing else but the transposed rotation matrix.

We now take a look at forward and backward rotation. Suppose we have rotated a coordinate system denoted by x into a new coordinate system X. If we premultiply the new coordinates by the transposed rotation matrix, we obtain the inverse relationship and see, in accordance with Slide 9.25, that we recover the original input coordinates. The transpose of a rotation matrix therefore serves to rotate the rotated coordinate system back into its input state.

Let us now take a look at multiple sequential rotations. We first rotate input coordinates x into output coordinates x1, and then we rotate the output coordinates x1 further into coordinates x2.


We see very quickly that x2 is obtained from the product of the two rotation matrices R1 and R2. It is also quickly evident that multiplying two rotation matrices produces nothing else but a third rotation matrix.

Definition 21 Sequenced rotations

x1 = R1 x
x2 = R2 x1
x2 = R2 R1 x = R x
R = R2 R1

It is important, however, to realize that matrix multiplications are in general not commutative: R2 · R1 is not necessarily identical to R1 · R2!
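
A quick numerical check of these properties (an illustration of my own, using numpy): the transpose of a rotation matrix is its inverse, and chaining two 2-D rotations yields the rotation by the sum of the angles.

import numpy as np

def rot(theta):
    """2-D rotation matrix for the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R1, R2 = rot(0.3), rot(1.1)

print(np.allclose(R1.T @ R1, np.eye(2)))        # R^T is the inverse of R
print(np.allclose(R2 @ R1, rot(1.1 + 0.3)))     # sequenced 2-D rotations add their angles
print(np.allclose(R2 @ R1, R1 @ R2))            # hence 2-D rotations about the origin commute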

Prüfungsfragen:

• The lecture pointed out that matrix multiplication is in general not commutative, i.e. for two transformation matrices M1 and M2 we have M1 · M2 ≠ M2 · M1. If, however, we consider two 2 × 2 rotation matrices R1 and R2 in the two-dimensional case, then indeed R1 · R2 = R2 · R1. Give a geometric or mathematical justification for this fact!

Hint: Note that the center of rotation lies at the coordinate origin!

Answer: When rotating about a fixed rotation axis, the rotation angles add up; the order of the rotations therefore does not matter.

9.5 The Affine Transformation in 2 Dimensions

Slide 9.28 shows an example of an affine transformation created with the help of the letter "F". We see a shearing effect as a characteristic feature of an affine transformation. Similarly, Slide 9.29 illustrates how a unit square is deformed, for example by squishing it only along the x-axis but not along the y-axis, or by shearing the square in one direction or the other. All of these are effects of an affine transformation.

Slide 9.30 provides us with the equation of a general affine transformation in 2 dimensions. We see that this is a six-parameter transformation, defined by the transformation parameters a, b, c, d, tx, ty. We may again ask the question of estimating the unknown transformation parameters if we are given a number of points both before and after the transformation. How many points do we need at a minimum to solve for the six unknown transformation parameters? We need three points, because each point provides us with two equations, so that three points provide the six equations needed to solve for the six unknown parameters. But be aware: those three points must not be collinear!
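
A small sketch of this estimation (my own illustration): each point pair (x, y) → (x′, y′) contributes the two equations x′ = a·x + b·y + tx and y′ = c·x + d·y + ty, so three non-collinear pairs give a 6×6 system.

import numpy as np

def affine_from_3_points(src, dst):
    """Solve the 6 affine parameters from three non-collinear point correspondences."""
    K, rhs = [], []
    for (x, y), (X, Y) in zip(src, dst):
        K.append([x, y, 1, 0, 0, 0]); rhs.append(X)   # X = a*x + b*y + tx
        K.append([0, 0, 0, x, y, 1]); rhs.append(Y)   # Y = c*x + d*y + ty
    a, b, tx, c, d, ty = np.linalg.solve(np.array(K, float), np.array(rhs, float))
    return np.array([[a, b, tx], [c, d, ty]])

src = [(0, 0), (1, 0), (0, 1)]
dst = [(2, 3), (4, 3.5), (1.5, 5)]     # arbitrary example targets
print(affine_from_3_points(src, dst))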

Let us now analyze the elements of an affine transformation and take a look at Slide 9.32 and Slide 9.33, recalling what we saw in Slide 9.13 through Slide 9.15. First, we see a scaling of the input coordinates, in this case denoted px and py, independently by scaling factors sx and sy, to obtain the output coordinates qx and qy. We can denote the scaling operation by means of a 2 × 2 scaling matrix Msc, as shown in Definition ??.

Secondly, we have a shearing deformation which adds to each x coordinate an increment proportional to y, and adds to y an augmentation proportional to the x coordinate, using a proportionality factor g. That shearing transformation can be described by a shearing matrix Msh (see Definition ??). Thirdly, we can introduce a translation, adding to each x and y coordinate the translational elements tx and ty (see Definition ??).

Finally, we can rotate the entire object, identically to the rotation we saw earlier, using a rotation angle α and producing a rotation matrix MR (see Section 9.4). An affine transformation is then the sum total of these transformations, thus the product of the three transformations Msc for scale, Msh for shearing and MR for rotation, with the translation added on as discussed previously.

Slide 9.34 further explains how the transformation of the input coordinate vector p into an output coordinate vector q is identical to the earlier two equations, converting the input coordinate pair (x, y) into an output coordinate pair (x′, y′) via a six-parameter affine transformation.

Definition 22 Affine transformation with 2D homogeneous coordinates

(matrices are written row by row, with rows separated by semicolons; (x; y; w) denotes a column vector)

(x′; y′; w′) = [sx 0 0; 0 sy 0; 0 0 1] · (x; y; w) = Msc · (x; y; w)

(x′; y′; w′) = [1 hx 0; hy 1 0; 0 0 1] · (x; y; w) = Msh · (x; y; w)

(x′; y′; w′) = [1 0 tx; 0 1 ty; 0 0 1] · (x; y; w) = Mtr · (x; y; w)

(x′; y′; w′) = [r11 r12 tx; r21 r22 ty; 0 0 s] · (x; y; w)

Definition 22 shows an example of how to construct an affine transformation that rotates, translates and scales an object in one step. The transformation is done in 2D using homogeneous coordinates (see Section 9.9). The parameters r_ij specify the rotation, t_i the translational elements, and s is a scaling factor (which in this case scales equally in both directions x and y).
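
A brief numerical sketch of such a one-step matrix (my own illustration): build Msc, Msh and Mtr as 3×3 homogeneous matrices with numpy, multiply them into a single matrix, and apply it to a point written as (x, y, 1).

import numpy as np

def M_sc(sx, sy):  return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1.0]])
def M_sh(hx, hy):  return np.array([[1, hx, 0], [hy, 1, 0], [0, 0, 1.0]])
def M_tr(tx, ty):  return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])

# one combined matrix: first shear, then scale, then translate (the order is a choice)
M = M_tr(3, 2) @ M_sc(2, 2) @ M_sh(0.5, 0)

p = np.array([1.0, 1.0, 1.0])       # the point (1,1) in homogeneous coordinates
x_prime, y_prime, w_prime = M @ p
print(x_prime / w_prime, y_prime / w_prime)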

Prüfungsfragen:

• Two "point clouds" are given as in Figure ??. First write down the appropriate transformation of the one point group onto the second point group, using the required set of formulas (without using the given coordinates), such that the three marked points in the left image (those marked as filled circles) coincide after the transformation with the three points in the right image (also marked as filled circles).

• For the computation of the unknown transformation parameters sought in Question ??, set up the coefficient matrix, using the coordinates from Figure ?? rounded to integers only.


9.6 A General 2-Dimensional Transformation

We begin the consideration of a more general 2-dimensional transformation with a look at the bilinear transformation (see Definition 23), which takes the input coordinates (x, y) and converts them into an output coordinate pair (X, Y) via a bilinear expression containing a term with the product x·y of the input coordinates. This transformation is called bilinear because if we freeze either the coordinate x or the coordinate y, we obtain a linear expression for the transformation. Such a transformation has 8 parameters, as we can see from Slide 9.36. Each input point (x, y) produces 2 equations, as shown in that slide. We need four points to compute the transformation parameters a, b, c, e, f, g and the translational parameters d and h. By means of a bilinear transformation we can map any group of four input points onto any group of four output points and thereby achieve a perfect fit.

Definition 23 Bilinear transformation

x′ = a · x + b · y + c · xy + d
y′ = e · x + f · y + g · xy + h
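
A minimal sketch of this exact four-point fit (my own illustration): each correspondence contributes one equation in (a, b, c, d) for x′ and one in (e, f, g, h) for y′, so four points yield two 4×4 systems.

import numpy as np

def bilinear_from_4_points(src, dst):
    """Solve the 8 bilinear parameters from exactly four point correspondences."""
    K = np.array([[x, y, x * y, 1.0] for x, y in src])
    X = np.array([p[0] for p in dst])
    Y = np.array([p[1] for p in dst])
    a, b, c, d = np.linalg.solve(K, X)
    e, f, g, h = np.linalg.solve(K, Y)
    return (a, b, c, d), (e, f, g, h)

def bilinear_apply(params, x, y):
    (a, b, c, d), (e, f, g, h) = params
    return a * x + b * y + c * x * y + d, e * x + f * y + g * x * y + h

src = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst = [(2, 1), (5, 1.5), (2.5, 4), (6, 5)]     # arbitrary example targets
params = bilinear_from_4_points(src, dst)
print(bilinear_apply(params, 1, 1))            # reproduces (6.0, 5.0)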

A more general transformation would be capable of taking a group of input points, as shown in Slide 9.37 with an arrangement of 16 points, into a desired output geometry as shown in Slide 9.38. We suggest that the randomly deformed arrangement of that slide be converted into a rigidly rectangular pattern: how can we achieve this?

Obviously, we need to define a transformation with 16 × 2 = 32 parameters for all 32 coordinate values. Slide 9.39 illustrates the basic concept. We set up a polynomial transformation to take the input coordinate pair (x, y) and translate it into an output coordinate pair (X, Y) by means of two 16-parameter polynomials. These polynomial coefficients a0, a1, a2, . . . and b0, b1, b2, . . . may initially be unknown, but if we have 16 input points with their input locations (x, y) and we know their output locations (X, Y), then we can set up an equation system to solve for the unknown transformation parameters a0, a1, . . . , a15 and b0, b1, . . . , b15.

Slide 9.40 illustrates the type of computation we have to perform. Suppose we are given in the input coordinate system 1 the input coordinates (xi, yi) of n points, and in the output coordinate system 2 the output coordinates (Xi, Yi) of the same n points. We can now set up the equation system that translates the input coordinates into the output coordinates. What we ultimately obtain is an equation system

x = K · u

In this equation, x contains the known output coordinates and u is the vector of unknown transformation parameters; u may have 4 entries for the conformal transformation, 6 for the affine, 8 for the bilinear or, as discussed before, 32 for a polynomial transformation that must fit 16 input points to 16 output locations. What is in the matrix K? It is the coefficient matrix of the equations and is filled with the input coordinates, as given by the polynomial or other transformation equations. How large is the coefficient matrix K? For an affine transformation with the minimum of three points, K has 6 by 6 elements, and in the polynomial case discussed here K has 32 by 32 elements.

What happens if we are given more points in system 1, with their transformed coordinates in system 2, than we need to solve for the unknowns? Suppose we had ten input and ten output points to compute the unknown coefficients of a conformal transformation, where we would only need 2 points, producing 4 equations, to solve for the 4 unknowns. We then have an over-determined equation system and our matrix K is rectangular. We cannot invert a rectangular matrix. So what do we do?


There is a theory in statistics and estimation theory called the Least Squares Method. Slide 9.41 explains: we can solve an over-determined equation system, which has a rectangular rather than a square coefficient matrix, by premultiplying the left and the right side of the equation system by the transpose of the coefficient matrix, K^T. We obtain in this manner a square matrix K^T · K on the right-hand side, and we call this the normal equation matrix. It is square in the shorter of the two dimensions of the rectangular matrix K and it can be inverted. The unknown coefficient vector u thus results from the inverse of the product K^T · K as shown in Slide 9.41.

This is but a very brief glimpse at the matter of "least squares". In reality, this is a concept that can fill many hundreds of textbook pages, but the basic idea is that we estimate the unknown parameters u using observations that are often erroneous, and to be robust against such errors we provide more points (x_i, y_i) in the input system and (X_i, Y_i) in the output system than are needed as a minimum. Because of these errors the equations will not be entirely consistent, and we will have to compute transformation parameters that provide a best approximation of the transformation.

"Least squares" solutions have optimality properties if the errors in the coordinates are statistically normally distributed.
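As a minimal numerical sketch of this recipe (not taken from the lecture; the control-point coordinates below are invented, and a 6-parameter affine transformation stands in for the general case), the normal equations u = (K^T K)^(-1) K^T x can be solved in Python as follows:

import numpy as np

# Hypothetical control points: input (x, y) and corresponding output (X, Y).
xy = np.array([[0.0, 0.0], [1.0, 0.1], [0.9, 1.0], [0.1, 0.9], [0.5, 0.4]])
XY = np.array([[10.0, 20.0], [12.1, 20.3], [11.8, 22.2], [10.2, 21.9], [11.0, 20.9]])

# Affine model: X = a0 + a1*x + a2*y,  Y = b0 + b1*x + b2*y  (6 unknowns).
n = xy.shape[0]
K = np.zeros((2 * n, 6))
K[0::2, 0:3] = np.c_[np.ones(n), xy]   # rows holding the X equations
K[1::2, 3:6] = np.c_[np.ones(n), xy]   # rows holding the Y equations
x_obs = XY.reshape(-1)                 # known output coordinates

# Least squares via the normal equations: u = (K^T K)^(-1) K^T x
u = np.linalg.solve(K.T @ K, K.T @ x_obs)
print(u)                               # a0, a1, a2, b0, b1, b2

With five point pairs the system is overdetermined (10 equations, 6 unknowns); with the minimum number of pairs the same code reproduces the exact solution of the square system.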

Exam questions:

• In 2D space, a bilinear transformation is sought and the unknown transformation parameters are to be computed. For this purpose, N points are known with their coordinates before and after the transformation, where N > 4. Which solution approach applies here?

Answer: the method of least squares:

X = K · u
K^T · X = K^T · K · u
u = (K^T · K)^(-1) · K^T · X

• In the lecture, two procedures for determining the eight parameters of a bilinear transformation in two dimensions were explained:

1. exact determination of the parameter vector u when exactly four input/output point pairs are given
2. approximate determination of the parameter vector u when more than four input/output point pairs are given ("least squares method")

The method of least squares can, however, also be applied when exactly four input/output point pairs are given. Show that in this case one obtains the same result as with the first procedure. What is the geometric meaning of this observation?

Hint: consider why the method of least squares carries this name.

Answer:

u = (K^T K)^(-1) K^T X = K^(-1) ((K^T)^(-1) K^T) X = K^(-1) X

These manipulations are possible because K is a square matrix in this case. Since the equation system is not overdetermined, an exact solution exists (error ε = 0). This solution is also found by the method of least squares, which minimizes the error (ε ≥ 0).

• Describe a bilinear transformation by means of its defining equation!



9.7 Image Rectification and Resampling

We change the geometry of an input image as illustrated in Slide 9.43, Slide 9.44 and Slide 9.45, which show a mesh or a grid superimposed over the input image. We similarly show a differently shaped mesh in the output image. The task is to match the input image onto the output geometry so that the meshes fit one another. We have to establish a geometric relationship between the image in the input and the output using a transformation equation from the input to the output.

If we now do a geometric transformation of an image, we have essentially two tasks to perform. First we need to describe the geometric transformation between the input image and the output image by assigning to every input image location the corresponding location in the output image. This is a geometric operation with coordinates. Second, we need to produce an output gray level for the resulting image based on the input gray levels. We call this second process resampling, as shown in Slide 9.46, and it uses operations on gray values.

Again, what we do conceptually is to take an input image pixel at location (x, y) and to compute by a spatial transform the location in the output image at which this input pixel would fall; this location has the coordinates (x′, y′) in accordance with Slide 9.46. However, that location may not perfectly coincide with the center of a pixel in the output image. Now we have a second problem, and that is to compute the gray value at the center of the output pixel by looking at the area of the input image that corresponds to that output location. One method is to assign the gray value we find at that location in the input image to the specific location in the output image. If we use this method, we have used the so-called nearest-neighbor method.

An application of this matter of resampling and rectification of images is illustrated in Slide 9.47. We have a distorted input image which shows an otherwise perfectly regular grid with some distortions. In the output image that same grid is reconstructed with reasonably perfect vertical and horizontal grid lines. The transition is obtained by means of a geometric rectification, and this rectification includes as an important element the function of resampling. Slide 9.113 is again the image of planet Mercury before the rectification and Slide 9.49 after the rectification, performing a process as illustrated earlier. Let us stop right at this point and delay a further discussion of resampling to a separate later chapter (Chapter 15). Resampling and image rectification were mentioned at this point only to establish the relationship of this task to the idea of 2-dimensional transformations from an input image to an output image.
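A minimal sketch of these two steps in Python (the geometric step is assumed here to be given as an inverse 3 × 3 affine mapping from output to input coordinates; the radiometric step uses the nearest-neighbor rule described above):

import numpy as np

def resample_nearest(src, T_out_to_in, out_shape):
    """Rectify 'src': map each output pixel back into the input image
    (geometric step) and copy the nearest input gray value (radiometric step)."""
    out = np.zeros(out_shape, dtype=src.dtype)
    for r in range(out_shape[0]):
        for c in range(out_shape[1]):
            # geometric step: output location -> input location
            x, y, _ = T_out_to_in @ np.array([c, r, 1.0])
            xi, yi = int(round(x)), int(round(y))
            # radiometric step: nearest-neighbor gray value, if inside the input
            if 0 <= yi < src.shape[0] and 0 <= xi < src.shape[1]:
                out[r, c] = src[yi, xi]
    return out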

Exam questions:

• If a real scene is captured by a camera with non-ideal optics, a distorted image results. Explain the two stages of resampling that are required to rectify such a distorted image!

Answer:

1. geometric resampling: finding corresponding positions in the two images
2. radiometric resampling: finding a suitable gray value for the output image

9.8 Clipping

As part of the process of transforming an object from a world coordinate system into a display coordinate system on a monitor or on a hardcopy output device, we are faced with an interesting problem: we need to take objects represented by vectors and figure out which part of each vector is visible on the display device. This task is called clipping. An algorithm that achieves clipping very efficiently is named after Cohen and Sutherland. Slide 9.51 illustrates the problem: a number of objects are given in world coordinates, and a display window will only show part of those objects. On the monitor the objects will be clipped.

Slide 9.52 outlines the algorithm. The task is to receive on the input side a vector defined by the end points p1 and p2 and to compute the auxiliary points C, D where this vector intersects the display window, which is defined by a rectangle.

9.8.1 Half Space Codes

In order to solve the clipping problem, Cohen and Sutherland defined so-called half-space codes (Slide 9.54); these relate to the half spaces defined by the straight lines delineating the display window. The half-space codes designate the spaces to the right, to the left, above and below the boundaries of the display window, say with subscripts c_r, c_l, c_t, and c_b. For example, if a point is to the right of the right vertical boundary, the point's half-space code c_r is set to "true"; if it is to the left, the code is set to "false". Similarly, a location above the window gets a half-space code c_t of "true" and below gets "false".

We now need to define a procedure called "Encode" (Slide 9.55) which takes an input point p and produces for it the associated four half-space codes, assigning each half-space code to a Boolean variable c. We obtain two values for each of the two coordinates p_x and p_y of point p: a value of true or false depending on where p_x falls with respect to the vertical boundaries of the display window, and on where p_y falls with respect to the horizontal boundaries.
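A small sketch of such an Encode step in Python (the window boundaries w_l, w_r, w_t, w_b and a y axis pointing upward are assumptions made only for this illustration):

def encode(p, wl, wr, wt, wb):
    """Return the four half-space codes (c_l, c_r, c_t, c_b) of point p = (px, py)."""
    px, py = p
    cl = px < wl          # point lies to the left of the window
    cr = px > wr          # point lies to the right of the window
    ct = py > wt          # point lies above the window
    cb = py < wb          # point lies below the window
    return cl, cr, ct, cb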

9.8.2 Trivial acceptance and rejection

Slide 9.56 shows the first part of the procedure Clip as it is presented in [FvDFH90, Section 3.12.3]. The procedure "Encode" is called for the beginning and end points of a straight line, denoted as P1 and P2, and the resulting half-space codes are denoted as C1 and C2. We now have to take a few decisions about the straight line depending on where P1 and P2 fall. We compute two auxiliary Boolean variables, In_1 and In_2. We can easily show that the straight line is entirely within the display window if In_1 and In_2 are both "true". This is called trivial acceptance, shown in Slide 9.57 for points A, B. Trivial rejection is also shown in Slide 9.57 for a straight line connecting points C and D.

9.8.3 Is the Line Vertical?

We need to proceed with the clipping algorithm in Slide 9.58 if we have neither a trivial acceptance nor a trivial rejection. We differentiate among cases where at least one point is outside the display window. The first possibility is that the line is vertical; that case is considered first.

9.8.4 Computing the slope

If the line is not vertical we compute its slope. This is illustrated in Slide 9.59.

9.8.5 Computing the Intersection A in the Window Boundary

With this slope we compute the intersection of the straight line with the relevant boundary lines of the display window at w_l, w_r, w_t and w_b. We work our way through a few decisions to make sure that we do find the intersections of our straight line with the boundaries of the display window.



9.8.6 The Result of the Cohen-Sutherland Algorithm

The algorithm produces a value stating either that the straight line is entirely outside of the window, or it returns the end points of the clipped straight line. These are the end points from the input if the entire line segment is within the window; they are the intersection points of the input line with the window boundaries if the line intersects them.
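The following Python sketch condenses these steps into one routine (trivial acceptance, trivial rejection, and intersection with one window boundary per iteration). It is a compact illustration of the idea rather than the exact procedure of [FvDFH90]:

def clip_line(p1, p2, wl, wr, wb, wt):
    """Cohen-Sutherland style clipping of segment p1-p2 against the window
    [wl, wr] x [wb, wt]. Returns the clipped segment or None if rejected."""
    def encode(px, py):
        return (px < wl, px > wr, py < wb, py > wt)   # (c_l, c_r, c_b, c_t)

    (x1, y1), (x2, y2) = p1, p2
    while True:
        c1, c2 = encode(x1, y1), encode(x2, y2)
        if not any(c1) and not any(c2):               # trivial acceptance
            return (x1, y1), (x2, y2)
        if any(a and b for a, b in zip(c1, c2)):      # trivial rejection
            return None
        # pick an end point that lies outside and move it onto a window boundary
        c, first_is_out = (c1, True) if any(c1) else (c2, False)
        if c[0]:   x, y = wl, y1 + (y2 - y1) * (wl - x1) / (x2 - x1)
        elif c[1]: x, y = wr, y1 + (y2 - y1) * (wr - x1) / (x2 - x1)
        elif c[2]: y, x = wb, x1 + (x2 - x1) * (wb - y1) / (y2 - y1)
        else:      y, x = wt, x1 + (x2 - x1) * (wt - y1) / (y2 - y1)
        if first_is_out:
            x1, y1 = x, y
        else:
            x2, y2 = x, y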

Exam questions:

• Which "half-space codes" are used in clipping, and what role do they play?

• Explain the individual steps of the Cohen-Sutherland clipping algorithm using the example in Figure B.18. The intermediate results with the half-space codes are to be shown. Determine the part of the segment AB that lies inside the rectangle R. The numerical values needed (including those of the intersection points) can be read directly from Figure B.18.

• Apply the Cohen-Sutherland clipping algorithm (in two dimensions) to the points p′1 and p′2 found in Exercise B.2 in order to determine the part of the segment between p′1 and p′2 that lies inside the square Q = {(0, 0)^T, (0, 1)^T, (1, 1)^T, (1, 0)^T}! You may enter the result directly into Figure B.19 and solve the intersection computations graphically.

Answer:

        c_l     c_r     c_t     c_b
p′1     true    false   false   false
p′2     false   true    false   true

9.9 Homogeneous Coordinates

A lot of use is made of homogeneous coordinates in the world of computer graphics. The attraction of homogeneous coordinates is that in a 2- or 3-dimensional transformation of an input coordinate system or object described by x into an output coordinate system or changed object described by x′, we do not have to split our operation into a multiplication by the rotation matrix and scale factor and, separately, an addition of the translation vector t. Instead we simply employ a single matrix multiplication, with a homogeneous coordinate vector X for a point and output coordinates X′ for the same point after the transformation.

Slide 9.62 explains the basic idea of homogeneous coordinates. Instead of working in 2 dimensions in a 2-dimensional Cartesian coordinate system (x, y), we augment the coordinate system by a third coordinate w, and any point in 2D space with location (x, y) receives a third coordinate and therefore is at location (x, y, w). If we define w = 1 we have defined a horizontal plane for the location of a point. Again, Slide 9.63 states that Cartesian coordinates in 2 dimensions represent a point p as (x, y), and homogeneous coordinates in 2 dimensions have that same point represented by the three-element vector (x, y, 1). Let us try to explain how we use homogeneous coordinates, staying with 2 dimensions only. In Slide 9.64 we have another view of a translation in Cartesian coordinates. Slide 9.65 describes scaling; in this particular case an affine scaling occurs with separate scale factors in the two coordinate directions (Slide 9.66 illustrates a rotation). Slide 9.67 illustrates the translation by means of a translation vector and scaling by means of a scaling matrix. Slide 9.68 introduces the relationship between a Cartesian and a homogeneous coordinate system. Slide 9.69 uses homogeneous coordinates for a translation by means of a multiplication of the input coordinates into an output coordinate system. The same operation is used for scaling in Slide 9.70 and for rotation in Slide ??; Slide ?? summarizes.

As Slide 9.73 reiterates, translation and scaling are now both described by matrix multiplication, and of course rotation and scaling have previously also been matrix multiplications in the Cartesian coordinate system. If we now combine these three transformations of translation, scaling and rotation, we obtain a single transformation matrix M which describes all three transformations without the separation into multiplications and additions that is necessary in the Cartesian case.

The simplicity of doing everything in matrix form is the appeal that leads computer graphics software to rely heavily on homogeneous coordinates. In image analysis, homogeneous coordinates are not as prevalent. One reason may be that in image processing we often have to estimate transformation parameters from over-determined equation systems using the method of least squares. That approach is typically easier to apply with Cartesian geometry than with the homogeneous system.
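As an illustrative sketch (the numbers are chosen arbitrarily), the three 2D operations become 3 × 3 matrices acting on homogeneous points (x, y, 1), and they can be combined into a single matrix M:

import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

# one combined matrix: first rotate, then scale, then translate
M = translation(4, 1) @ scaling(2, 2) @ rotation(np.pi / 2)
p = np.array([1.0, 0.0, 1.0])         # the point (1, 0) in homogeneous form
print(M @ p)                          # [4. 3. 1.], i.e. the Cartesian point (4, 3)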

Exam questions:

• Explain the significance of homogeneous coordinates for computer graphics! Which properties do homogeneous coordinates have?

• For homogeneous coordinates, give a 3 × 3 matrix M with as many degrees of freedom as possible that is suited to transform the points p of a rigid body (e.g. a block of wood) according to q = Mp (a so-called "rigid body transformation")!

Hint: simple geometric relationships are contained in the question in "encoded" form. If they were formulated explicitly instead, the answer would really be material of "Group I".

• Given the transformation matrix

M =
     0   2   0   0
     0   0   2   0
     1   0   0  -5
    -2   0   0   8

and the two points p1 = (3, -1, 1)^T and p2 = (2, 4, -1)^T in object coordinates. Transform the two points p1 and p2 with the help of the matrix M into the points p′1 and p′2 in (normalized) screen coordinates (pay attention to the conversions between three-dimensional and homogeneous coordinates)!

Answer:

M · (3, -1, 1, 1)^T = (-2, 2, -2, 2)^T  ⇒  p′1 = (-1, 1, -1)^T
M · (2, 4, -1, 1)^T = (8, -2, -3, 4)^T  ⇒  p′2 = (2, -0.5, -0.75)^T

9.10 A Three-Dimensional Conformal Transformation

In three dimensions things become considerably more complex and more difficult to describe. Slide 9.75 shows that a 3-dimensional conformal transformation rotates objects or coordinate axes, scales them and translates them, just as we had in 2 dimensions. However, the rotation matrix now needs to cope with three coordinate axes.

Definition 24 Rotation in 3D
The three-dimensional rotation transforms an input point P with coordinates (x, y, z) into an output coordinate system (X, Y, Z) by means of a rotation matrix R.
The elements of this rotation matrix can be interpreted as follows:
- as the cosines of the angles subtended by the coordinate axes xX, yX, zX, ..., zZ;
- as the assembly of the three unit vectors directed along the axes of the rotated coordinate system, but described in terms of the input system.

R =
    cos(xX)  cos(yX)  cos(zX)
    cos(xY)  cos(yY)  cos(zY)
    cos(xZ)  cos(yZ)  cos(zZ)

or

R =
    r11  r12  r13
    r21  r22  r23
    r31  r32  r33

P′ = R · P

A 3D rotation can be considered as a composition of three individual 2-D rotations around the coordinate axes x, y, z. It is easy to see that rotating around one axis will also affect the two other axes. Therefore the sequence of the rotations is very important. Changing the sequence of the rotations may result in a different output image.

In analogy to the 2-dimensional case we now know that the rotation matrix takes an input point P with coordinates (x, y, z) into an output coordinate system (X, Y, Z) by means of a rotation matrix R. The elements of this rotation matrix are again, first, the cosines of the angles subtended by the coordinate axes xX, yX, zX, ..., zZ, and second, the assembly of three unit vectors directed along the axes of the rotated coordinate system but described in terms of the input coordinate system (Slide 9.76); the matrix is obtained as the multiplication of three 2-D rotation matrices as shown in Slide 9.77. The composition of the rotation matrix from three individual 2-D rotations around the three coordinate axes x, y and z is the most commonly used approach. Each rotation around an axis needs to consider that that particular axis may already have been rotated by a previous rotation. Note that as we rotate around a particular axis first, that will move the other two coordinate axes. We then rotate around the rotated second axis, affecting the third one again, and then we rotate around the third axis. The sequence of rotations is of importance and will change the ultimate outcome if we change the sequence.

Slide 9.79 illustrates how we might define a three-dimensional rotation and translation by means of three points P1, P2, P3 which represent two straight line segments P1P2 and P1P3. We begin by translating P1 into the origin of the coordinate system. We proceed by rotating P2 into the z axis and complete the rotation by rotating P3 into the yz plane. We thereby obtain the final position. If we track this operation we see that we have applied several rotations. We have first rotated P1P2 into the xz plane. Then we have rotated the result around the y-axis into the z-axis. Finally we have rotated P1P3 around the z-axis into the yz plane. Slide 9.80 and Slide 9.81 explain in detail the sequence of the three rotations by three angles, which are denoted in this case first as angle Θ, second as angle φ, and third as angle α.

Generally, a three-dimensional conformal transformation will be described by a scaling l, a rotation matrix R and a translation vector t. Note that l is a scalar value, the rotation matrix is a 3 by 3 matrix containing three angles, and the translation vector t has three elements with translations along the directions x, y, and z. This type of transformation contains seven parameters for the three dimensions, as opposed to four parameters in the 2D case: the rotation matrix has 3 angles, the scale factor is a fourth value and the translation vector has three values, resulting in a total of seven parameters to define this transformation.
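A small numerical sketch of composing a 3D rotation from three 2-D rotations (the axis order and the angles are chosen arbitrarily), showing that the sequence matters:

import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

a, b, g = 0.3, 0.5, 0.7                 # three arbitrary rotation angles
R1 = rot_z(g) @ rot_y(b) @ rot_x(a)     # rotate about x, then y, then z
R2 = rot_x(a) @ rot_y(b) @ rot_z(g)     # same angles, different sequence
print(np.allclose(R1, R2))              # False: the sequence changes the result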

Exam questions:

• What is meant by a "conformal transformation"?

9.11 Three-Dimensional Affine Transformations

Definition 25 Affine transformation with 3D homogeneous coordinates

case 'translation':
    tr matrix =
         1   0   0   t_x
         0   1   0   t_y
         0   0   1   t_z
         0   0   0   1

case 'rotation x':
    rotation x =
         1       0        0      0
         0    cos φ   -sin φ     0
         0    sin φ    cos φ     0
         0       0        0      1

case 'rotation y':
    rotation y =
         cos φ    0    sin φ    0
           0      1      0      0
        -sin φ    0    cos φ    0
           0      0      0      1

case 'rotation z':
    rotation z =
         cos φ   -sin φ    0    0
         sin φ    cos φ    0    0
           0        0      1    0
           0        0      0    1

case 'scale':
    scale matrix =
         s_x    0     0    0
          0    s_y    0    0
          0     0    s_z   0
          0     0     0    1

By using homogeneous coordinates, all of these transformations are 4 × 4 matrices, so the transformations can easily be combined by multiplying the matrices. This results in a speedup, because every point is multiplied with only one matrix and not with all transformation matrices.
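A short sketch of why this pays off (the matrices and points are arbitrary examples): the product of the individual 4 × 4 matrices is computed once and then applied to every point.

import numpy as np

T = np.array([[1, 0, 0, 2],
              [0, 1, 0, -1],
              [0, 0, 1, 3],
              [0, 0, 0, 1]], dtype=float)          # translation
S = np.diag([2.0, 2.0, 0.5, 1.0])                  # scaling
phi = np.pi / 4
Rz = np.array([[np.cos(phi), -np.sin(phi), 0, 0],
               [np.sin(phi),  np.cos(phi), 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])                      # rotation about z

M = T @ S @ Rz                                     # combined once ...
points = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 1],
                   [1, 1, 1, 1]], dtype=float).T   # homogeneous column vectors
print(M @ points)                                  # ... applied to all points at once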

The three-dimensional transformation may change the object shape. A simple change results from shearing or squishing, and produces an affine transformation. Generally, the affine transformation does not have a single scale factor; we may have up to three different scale factors along the x, y, and z axes, as illustrated in Slide 9.83. Another interpretation of this effect is to state that a coordinate X is obtained from the input coordinates (x, y, z) by means of shearing elements h_yx and h_zy which are really part of the scaling matrix M_sc. Ultimately, a cube will be deformed into a fairly irregular shape, as shown in Slide 9.84 with the example of a building shape. A three-dimensional affine transformation now has 12 parameters, so that the transformations of the x, y, and z coordinates are independent of one another. The transformation will still maintain straight lines as straight lines; however, right angles will not remain right angles.

9.12 Projections

From a higher-dimensional space, projections produce images in a lower-dimensional space. We have projection lines or projectors that connect input to output points, we have projection centers, and we have a projection surface onto which the higher-dimensional space is projected. In the real world we basically project 3-dimensional spaces onto 2-dimensional projection planes. The most common projections are the perspective projections as used by the human eye and by optical cameras.

We differentiate among a multitude of projections. The perspective projections model what happens in a camera or the human eye. However, engineers have long used parallel projections. These are historically used also in the arts and in cartography and have projection rays (also called projectors or projection lines) that are parallel. If they are perpendicular to the projection plane we talk about an orthographic projection. If they are not perpendicular but oblique to the projection plane, we talk about an oblique projection (see Slide 9.86).

A special case of the orthographic projection results in the commonly used presentation of three-dimensional space as a top view, front view and side view (Slide 9.87). The heavy use in architecture and civil engineering of top views, front views and side views of a 3-D space is easy to justify: from these 3 views we can reconstruct the 3 dimensions of that space. Another special case is the axonometric projection, where the projection plane is not in one of the three coordinate planes of a three-dimensional space. Yet another special case is the isometric projection, which occurs if the projection plane is chosen such that all three coordinate axes are changed equally much in the projection (the projection rays are directed along the vector with elements (1, 1, 1)). We highlight two particular oblique projections, the cavalier and the cabinet projection. The cavalier projection produces no scale reduction along the coordinate axes because it projects under exactly 45°. In the cabinet projection we project under an angle α = 63.4°, since tan α = 2; this projection shrinks an object in one direction by a factor of 1/2.
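A compact way of writing these two oblique projections (a common textbook formulation quoted here for reference; z is taken as the depth axis and β is the angle of the projected z axis in the image plane):

x′ = x + L · z · cos β
y′ = y + L · z · sin β

with tan α = 1/L, so that L = 1 for the cavalier projection (α = 45°) and L = 1/2 for the cabinet projection (α ≈ 63.4°).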

9.13 Vanishing Points in Perspective Projections

In order to construct a perspective projection we can take advantage of parallel lines. In the natural world they meet, of course, only at infinity. In the projection they meet at a so-called vanishing point¹. This is a concept of descriptive geometry, a branch of mathematics. Slide 9.91 is the example of a perspective projection as produced by a synthetic camera. Note how parallel lines converge at a point which typically is outside the display area. The vanishing point is the image of the object point at infinity.

Because there exists an infinity of directions for bundles of parallel lines in 3D space, there exists an infinity of vanishing points. However, special vanishing points are associated with bundles of lines that are parallel to the coordinate axes. Such vanishing points are called principal.

¹ in German: Fluchtpunkt



We may have only one axis producing a finite vanishing point, since the other two axes are themselves parallel to the projection plane and their vanishing points are at infinity. Such a perspective projection is therefore called a one-point perspective, in which a cube aligned with the coordinate axes will have only one vanishing point. Analogously, Slide 9.93 and Slide 9.94 present a 2-point and a general 3-point perspective.

9.14 A Classification of Projections

Slide 9.96 presents the customary hierarchy of projections as they are commonly presented in books about architecture, art and engineering. In all cases, these projections are onto a plane and are thus planar projections. The differentiation between perspective and parallel projections is somewhat artificial if one considers that with a perspective center at infinity, one obtains the parallel projection. Nevertheless, the projections are grouped into parallel and perspective projections; the perspective projections are then subdivided into one-point, two-point and three-point perspective projections, and the parallel projections are classified into orthographic and oblique ones, the oblique ones having the cavalier and the cabinet projection as special cases. The orthographic projections have the axonometric projections on one hand and the multi-view orthographic on the other hand, and within the axonometric projections we have the one special case we discussed, the isometric projection.

We do not discuss the world of more complex projections, for example to convert the surface of a sphere into a plane: this is the classical problem of cartography with its need to present a picture of the Earth on a flat sheet of paper.

Exam questions:

• In the lecture, a "tree" for the hierarchy of the various projections into the plane (planar projections) was presented. Please sketch this tree with all the projections it contains.

9.15 The Central Projection

This is the most important projection of all the ones we discuss in this class. The simple reason for this is that it is the geometric model of a classical camera. Slide 9.98 explains the geometry of a camera and defines three coordinate systems. The first is the world coordinate system with X, Y and Z. In this world coordinate system we have a projection center O at location (X0, Y0, Z0). The projection center is the geometric model of a lens. All projection lines are straight lines going from the object space, where there is an object point P at the location (x, y, z), through the projection center O and intersecting the image plane.

We know at this point that the central projection is similar to the perspective projection. There is a small difference, though. We define the projection center with respect to an image plane and insist on some additional parameters that describe the central projection that we do not typically use in the perspective projection.

Note that we have a second coordinate system that lies in the image plane and is denoted in Slide 9.98 by ξ and η. This is a rectangular 2-dimensional Cartesian coordinate system with an origin at point M. The point P in object space is projected onto the image location P′ = (x, y). Third, we have the sensor coordinate system, in which the location of the perspective center O is defined. The sensor coordinate system has its origin at the perspective center O, is a three-dimensional coordinate system, its x and y axes are nominally parallel to the image coordinate system (ξ, η), and its z-axis is perpendicular to the image plane.



We also have an additional point H defined in the central projection, which is the intersection of the image plane with the line that is perpendicular to it and passes through the projection center. Note that this does not necessarily have to be identical to point M. M simply is the origin of the image coordinate system and is typically the point of symmetry with respect to some fiducial marks as shown in Slide 9.98.

In order to describe a central projection we need to know the image coordinate system with its origin M, we need to know the sensor coordinate system with its origin O, and we need to understand the relationship between the sensor coordinate system and the world coordinate system (X, Y, Z).
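A minimal numerical sketch of this projection (a simple pinhole model; the rotation matrix R, the projection center O and the camera constant c are assumed to be known, and the sign convention is chosen only for this illustration):

import numpy as np

def central_projection(P, O, R, c):
    """Project world point P through projection center O onto the image plane.
    R rotates world axes into the sensor axes, c is the camera constant."""
    p_cam = R @ (P - O)                  # the point in the sensor coordinate system
    xi  = -c * p_cam[0] / p_cam[2]       # image coordinates (xi, eta)
    eta = -c * p_cam[1] / p_cam[2]
    return np.array([xi, eta])

# example: camera at (0, 0, 10), axes aligned with the world, looking along the Z axis
print(central_projection(np.array([2.0, 1.0, 0.0]),
                         np.array([0.0, 0.0, 10.0]), np.eye(3), c=1.0))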

Let us take another look at the same situation in Slide 9.99, where the coordinate systems are again illustrated. We have the projection center O as in the previous slide and we have two projection rays, going from object point P1 to image point P′1 and from object point P2 to image point P′2. We also have an optical axis, or the direction of the camera axis, which passes through point O and is perpendicular to the image plane. In Slide 9.99 two image planes are suggested. One is between the perspective center and the object area, and that suggests the creation of a positive image. In a camera, however, the projection center is typically between the object and the image plane, and that leads geometrically to a negative image. Slide 9.99 also defines again the idea of an image coordinate system. In this case it is suggested that the image is rectangular and that the image coordinates are defined by some artificial marks that are placed in the image plane. The marks are connected and define the origin M. We also have again the point H, which is the intersection of the image plane with the line perpendicular to it and passing through the projection center. We also have some arbitrary location for a point P′ that is projected into the image. We will from here on out ignore that M and H may be two different locations. Typically the distance between M and H is small, and it is considered an error of a camera if M and H do not coincide.

Normal cameras that we use as amateurs do not have those fiducial marks, and therefore they are called non-metric cameras, because they do not define an image coordinate system. Users of non-metric cameras who want to measure need to help themselves with some auxiliary definition of an image coordinate system, and they must make sure that the image coordinate system is the same from picture to picture if multiple pictures show the same object. Professional cameras that are used for making measurements and for reconstructing 3D objects typically have those fiducial marks as fixed features of the camera. In digital cameras the rows and columns of a CCD array provide an inherent coordinate system because of the numbering of the pixels.

Slide 9.100 revisits the issue of a 3-dimensional rotation. We have mentioned before that there are three coordinate systems in the camera: two of those are 3-dimensional and the third one is 2-dimensional. The sensor coordinate system with its origin at the projection center O and the world coordinate system (X, Y, Z) need to be related via a 3-dimensional transformation. Slide 9.100 suggests that we have 3 angles that define the relationship between the 2 coordinate systems. Each of those angles represents a 2-dimensional rotation around one of the 3 world coordinate axes. Those are angles in 3D space.

Recall that we have several definitions of rotation matrices and that we can define a rotation matrix by various geometric entities. These can be rotations around axes that rotate themselves, or they can be angles in 3-D space subtended by the original axes and the rotated axes. Slide 9.100 describes the first case. Θ is the angle of rotation around the axis Z, but in the process we will rotate axes X and Y. φ is rotating around the axis X and will obviously take with it the axes Z and Y. And A then is a rotation around the rotated axis Z. Conceptually, everything we said earlier about 3-dimensional transformations, rotations and so forth applies here as well. Our earlier discussion of the 3D conformal transformation applies to the central projection, and the central projection really is mathematically modeled by the 3-dimensional conformal transformation, which elevates that particular projection to a particularly important role.
that particular projection to a particularly important role.



9.16 The Synthetic Camera

There are various places in our class in which we suggest the use of a synthetic camera. We have applications in computer graphics in order to create a picture on a display medium or on a monitor, for augmented or virtual reality. We have it in image processing and photogrammetry to reconstruct the world from images, and we have terminology that has developed separately as follows. What is called a projection plane or image plane in image processing is in computer graphics called a View Plane. What in image processing is a projection center is in computer graphics a View Reference Point VRP. And what in image processing is the optical axis or the camera axis is in computer graphics the View Plane Normal VPN. Slide 9.102 and Slide 9.103 explain this further. We again have a lens center and an image plane and an optical axis Z that is perpendicular to the image plane, which itself is defined by the coordinate axes X and Y. An arbitrary point in object space (X, Y, Z) is projected through the lens center onto the image plane. Note that in a synthetic camera we do not worry much about fine points such as an image coordinate system defined by fiducial marks, or the difference between the points M and H (M being the origin of the image coordinate system and H being the intersection point of the line normal to the image plane and passing through the lens center).

In robotics we typically use cameras that might use rotations around very particular axes. Slide 9.103 defines the world coordinate system with (X, Y, Z) and defines a point or axis of rotation in the world coordinate system at the end of vector w0 at location (X0, Y0, Z0). That location of an axis of rotation then defines the angle under which the camera itself is looking at the world. The camera has coordinate axes (x, y, z) and an optical axis in the direction of coordinate axis z. The image coordinates are 2-dimensional with an origin at the center of the image, and that point itself is defined by an auxiliary vector r with respect to the point of rotation. So we see that we have various definitions of angles and coordinate systems, and we always need to understand these coordinate systems and convert them into one another.

Slide 9.104 explains this further: we have a camera looking at the world; again we have an image coordinate system (x, y) and a sensor system (x, y, z) that are defined in the world coordinate system (X, Y, Z). As we want to define where a camera is in the world coordinate system and in which direction its optical axis is pointing, we have to build up a transformation just as we did previously with the 3-dimensional conformal transformation.

Let us assume that we start out with a perfect alignment of our camera in the world coordinate system, so that the sensor coordinate axes x, y, z and the world coordinate axes X, Y, Z coincide. We now move the camera into an arbitrary position, which represents the translation in 3-D space defined, if you recall, by the translation vector t. Then we orient the camera by rotating it essentially around 3 axes into an arbitrary attitude. The first rotation may be, as suggested in Slide 9.105, around the z axis, which represents the angle A in Slide ??. In this slide it is suggested that the angle is 135°. Next we roll the camera around the x axis, again by an angle of 135°, and instead of having the camera looking up into the sky we now have it look down at the object. Obviously we can apply a third rotation around the rotated axis y to give our camera attitude complete freedom. We now have a rotation matrix that is defined by those angles of rotation that we just described, and we have a translation vector as described earlier. Implied in all of this is also a scale factor. We have not yet discussed the perspective center and the image plane. Obviously, as their distance grows, we go from a wide-angle through a normal-angle to a tele-lens, and that will affect the scale. So the scale of the image is affected by the distance of the camera from the object and also by the distance of the projection center from the image plane.

Note that we need 7 elements to describe the transformation that we have seen in Slide 9.105. We need 3 elements of translation, we have 3 angles of rotation, and we have one scale factor that is defined by the distance of the projection center from the image plane. These are the exact same 7 transformation parameters that we had earlier in the 3-dimensional conformal transformation.

Exam questions:

• Given the 4 × 4 matrix

M =
     8   0   8  -24
     0   8   8    8
     0   0   0   24
     0   0   1    1

and the four points

p1 = (3, 0, 1)^T
p2 = (2, 0, 7)^T
p3 = (4, 0, 5)^T
p4 = (1, 0, 3)^T

in three-dimensional space. The matrix M combines all transformations that are required to carry a point p given in world coordinates into the corresponding point p′ = M · p in device coordinates (see also Figure B.36; the screen plane, and therefore the y axis, is normal to the drawing plane). Applying the transformation matrix M maps the points p1 and p2 onto the points

p′1 = (4, 8, 12)^T
p′2 = (6, 8, 3)^T

in device coordinates. Compute p′3 and p′4 in the same way!

Answer:

We have

p̃′1 = (8, 16, 24, 2)^T ⇒ p′1 = (4, 8, 12)^T
p̃′2 = (48, 64, 24, 8)^T ⇒ p′2 = (6, 8, 3)^T
p̃′3 = (48, 48, 24, 6)^T ⇒ p′3 = (8, 8, 4)^T
p̃′4 = (8, 32, 24, 4)^T ⇒ p′4 = (2, 8, 6)^T
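These results can be checked with a few lines of Python; the division by the fourth (homogeneous) component is the essential step:

import numpy as np

M = np.array([[8, 0, 8, -24],
              [0, 8, 8,   8],
              [0, 0, 0,  24],
              [0, 0, 1,   1]], dtype=float)

for p in [(3, 0, 1), (2, 0, 7), (4, 0, 5), (1, 0, 3)]:
    ph = M @ np.array([*p, 1.0])      # homogeneous device coordinates
    print(ph[:3] / ph[3])             # [4 8 12], [6 8 3], [8 8 4], [2 8 6]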

9.17 Stereopsis

This is a good time to introduce the idea of stereopsis, although we will have a separate chapter on it later in this class. The synthetic camera produces an image that we can look at with one eye, and if we produce a second image and show it to the other eye, we will be able to "trick" the eyes into a 3-dimensional perception of the object that was imaged (Slide 9.107). We model binocular vision by two images: we compute, or present to the eyes, two existing natural images of the object, separately one image to one eye and the other image to the other eye. Those images can be taken by one camera placed in two locations, or they can be synthetic images computed with a synthetic camera. Slide 9.108 explains further that our left eye is seeing point P_left, the right eye is seeing point P_right, and in the brain those two observations are merged into a 3-dimensional location P.

Slide 9.109 illustrates that a few rules need to be considered when creating images for stereoscopic viewing. The image planes and the optical axes for the two images should be parallel. Therefore, one should not create two images with converging optical axes; this would be inconsistent with natural human viewing. Only people who squint² will have converging optical axes. Normal stereoscopic viewing would create a headache if the images were taken with converging optical axes.

² in German: schielen

We call the distance between the two lens centers for the two stereoscopic images the stereo base B. Slide 9.110 shows the same situation in a top view. We have the distance from the lens center to the image plane, which is typically denoted as the camera constant or focal length, an object point W which is projected into the image locations (X1, Y1) and (X2, Y2), and the two optical axes Z, parallel to one another and perpendicular to XY.

Note that we also call the ratio of B to the distance to W the base/height ratio, this being a measure of quality for the stereo view. If we compute a synthetic image from a 3-dimensional object for the left and the right eye, we might get a result as shown in Slide 9.111, which indeed can be viewed stereoscopically under a stereoscope.
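Although a full treatment is postponed to the later chapter, the quantities just introduced already fix the standard relation of the so-called normal case of stereo, quoted here only as a reference (it is not derived in this section): the depth Z of point W follows from Z = c · B / p, where c is the camera constant, B the stereo base and p = X1 − X2 the x-parallax. A larger base/height ratio therefore produces a larger parallax for a given depth, which is why it serves as a measure of quality for the stereo view.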

To make matters a little more complicated yet, it turns out that a human can view stereoscopically two images that do not necessarily have to be made by a camera under a central perspective projection. As long as the two images are similar enough in radiometry and the geometric differences are not excessive, the human observer will be able to merge the two images into a 3-dimensional impression. This fact has been used in the past to represent measurements in 3 dimensions, for example temperature: we could encode temperature as a geometric difference in two otherwise identical images, and we would see a 2-dimensional scene with temperature shown as height. This and similar applications have in the past been implemented by various researchers.
height. This <strong>and</strong> similar applications have in the past been implemented by various researchers.<br />

9.18 Interpolation versus Transformation

One may want to transfer an object such as the distorted photo in Slide 9.113 into an output geometry. This can be accomplished by a simplified transformation based, for example, on 4 points. This will reveal errors (distortions) at other known points (see Slide 9.114 and Slide 9.115). These errors can be used to interpolate a continuous error function d_x(x, y), d_y(x, y) which must be applied to each (x, y) location:

x′ = x + d_x(x, y)
y′ = y + d_y(x, y)

We have replaced a complicated transformation by a much simpler transformation plus an interpolation.
Question: What is the definition of interpolation?<br />

9.19 Transforming a Representation

9.19.1 Presenting a Curve by Samples and an Interpolation Scheme

We may want to represent an object in various ways. We may have a continuous representation of an object, or we might sample that object and represent the intervals between samples by some kind of interpolation or approximation technique. So we have conceptually something similar to a transformation, because we have two different ways of representing an object. Slide 9.117 introduces the basic idea of an object that is described by a set of points p1, p2, ..., pn. If we are in a 2-dimensional space we may want to represent that object not by n points but by a mathematical curve. In 3-dimensional space it may be a surface that represents the set of points: we transform from one representation into another.

A second item is that an object may not be given by points, but by a set of curves x = f_x(t), y = f_y(t), and z = f_z(t). We would like to replace this representation by another mathematical representation which may be more useful for certain tasks.

Again, while we are going to look at this basically in 2 dimensions or for curves, a generalization to 3 dimensions and to surfaces always applies.
into 3 dimensions <strong>and</strong> to surfaces always applies.



9.19.2 Parametric Representations of Curves

We introduce the parametric representation of a curve. We suggest in Slide 9.120 that the 2-dimensional curve Q in an (x, y) Cartesian coordinate system can be represented by two functions, Q(t) = (x(t), y(t)). We call this a parametric representation. The parameter t can typically be the length of the curve: as we proceed along the curve, the coordinate x and the coordinate y change as functions of the curve length t. More typically, t may be the "time" for a point to move along the curve. The advantage of a parametric representation is described in Slide 9.120: the tangent is replaced by a tangent vector Q′(t) = (dx(t)/dt, dy(t)/dt). That vector has a direction and a length.

9.19.3 Introducing Piecewise Curves

We may also choose not to represent the functions x(t) or y(t) with a high-order polynomial; instead we might break up the curve into individual parts, each part being a polynomial of third order (a cubic polynomial). We connect those polynomials at joints by forcing continuity at the joints.

If a curve is represented in 3D space by the equations x(t), y(t), and z(t) as shown in Slide 9.121, we can request that at the joints those polynomial pieces be continuous in the function but also continuous in the first derivative or tangent. We may even want to make them continuous in the curvature or second derivative (the length of the tangent). Note, however, that this kind of geometric continuity is weaker than continuity in "speed" and acceleration, if t is interpreted as time. One represents such a curve by a function Q(t) which is really a vector function (x(t), y(t), z(t)).

9.19.4 Rearranging Entities of the Vector Function Q

In accordance with the equation of Slide 9.121, Q(t) can be represented as a multiplication of a (row) vector T and a coefficient matrix C, where T contains the powers of the independent parameter t as the coefficients of the unknowns a_x, b_x, c_x and d_x. The matrix C can now be decomposed into M · G. As a result we can write Q(t) = T · M · G, where G is called the geometry vector and M is called the basis matrix. We can introduce a new entity, the product B = T · M; its entries are cubic polynomials, the so-called blending functions.
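A minimal numeric sketch of this rearrangement (the coefficients below are invented): the row vector T = (t^3, t^2, t, 1) times the coefficient matrix C evaluates the cubic in all three coordinates at once; factoring C = M · G merely re-expresses the same coefficients through a basis matrix and a geometry vector.

import numpy as np

# coefficient matrix C: one column per coordinate, rows hold (a, b, c, d)
C = np.array([[ 1.0, -2.0, 0.5],     # a_x, a_y, a_z
              [ 0.0,  3.0, 1.0],     # b_x, b_y, b_z
              [-1.0,  0.0, 2.0],     # c_x, c_y, c_z
              [ 2.0,  1.0, 0.0]])    # d_x, d_y, d_z

def Q(t):
    T = np.array([t**3, t**2, t, 1.0])   # row vector of powers of t
    return T @ C                         # = (x(t), y(t), z(t))

print(Q(0.0), Q(1.0))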

Exam questions:

• What are the "geometry vector", the "basis matrix" and the "blending functions" of a parametric curve representation?

Answer:

One decomposes C into C = M · G, so that

x(t) = a_x t^3 + b_x t^2 + c_x t + d_x
y(t) = a_y t^3 + b_y t^2 + c_y t + d_y
z(t) = a_z t^3 + b_z t^2 + c_z t + d_z

Q(t) = (x(t), y(t), z(t))^T = T · C
T = (t^3, t^2, t, 1)
Q(t) = T · C = T · M · G

with G as the geometry vector and M as the basis matrix. Furthermore,

B = T · M

are cubic polynomials, the blending functions.

9.19.5 Showing Examples: Three Methods of Defining Curves

Slide 9.122 introduces three definitions of curves that are frequently used in engineering. Let us take a look at an example in Slide 9.123. In that slide we have a continuous curve represented by two segments S and C. They are connected at a joint. Depending on the tangent vector at the joint we may have different curves. Illustrated in Slide 9.123 are three examples, C0, C1, and C2. C0 is obtained if we simply enforce at the joint that the function be continuous, but we do not worry about the tangent vectors having the same direction. C1 results if we require that the function also have the same derivative at the joint. C2 further requires that the length of the tangent vector also be identical at the joint. So we have three different types of continuity at the joint: function, velocity, acceleration. This type of continuity is narrower than mere geometric continuity with function, slope and curvature.

In computer graphics one describes the type of continuity by the direction and the length of the tangent vector. Slide 9.124 again illustrates how a point P2 is the joint between curve segments Q1 and Q2, two curves passing through P1, P2, Pi and P3. Defining two different lengths for the tangent (representing velocity) leads to two different curve segments Q2, Q3.

Slide 9.125 describes a curve with two segments joining at point P. We indicate equal time intervals, showing a "velocity" that reduces as we approach point P. At point P we change direction and accelerate. In this case, of course, the function is continuous, but as shown in that example the tangent is not continuous. We have a discontinuity in the first derivative at point P.

9.19.6 Hermite’s Approach<br />

There is a concept in the representation of curves by means of cubic parametric equations called the<br />

Hermite’s Curves. We start out with the beginning <strong>and</strong> end point of a curve <strong>and</strong> the beginning<br />

<strong>and</strong> end tangent vector of that curve, <strong>and</strong> with those elements we can define a geometry vector G<br />

as discussed earlier in Slide 9.121. Slide 9.127 explains several cases where we have a beginning<br />

<strong>and</strong> end point of a curve defined, associated with a tangent vector <strong>and</strong> as a result we can now<br />

describe a curve. Two points <strong>and</strong> two tangent vectors define four elements of a curve. In 2D space<br />

this is a third order or cubic curve with coefficients a, b, c <strong>and</strong> d. Slide 9.128’s curves are basically<br />

defined by 2 points <strong>and</strong> 2 tangent vectors. Since the end point of one curve is identical to the<br />

beginning point of the next, we obtain a continuous curve. The tangent vectors are parallel but point in opposite directions. Geometrically we are continuous in the shape, but the velocities oppose one another. This lends itself to describing curves by placing points interactively on a

monitor with a tangent vector. This is being done in constructing complex shapes, say in the car<br />

industry where car bodies need to be designed. A particular approach to accomplishing this has<br />

been proposed by Bezier.<br />

9.20 Bezier’s Approach<br />

Pierre Bezier worked for a French car manufacturer and invented an approach to designing 3-dimensional shapes, but we will discuss this in 2 dimensions only. He wanted to represent a smooth

curve by means of 2 auxiliary points which are not on the curve. Note that so far we have had<br />

curves go through our points, <strong>and</strong> Bezier wanted a different approach. So he defined 2 auxiliary<br />

points <strong>for</strong> a curve <strong>and</strong> the directions of the tangent vectors. Slide 9.130 defines the beginning <strong>and</strong><br />

end points, P 1 <strong>and</strong> P 4 <strong>and</strong> the tangent at P 1 using an auxiliary point P 2 <strong>and</strong> the tangent at P 4 by<br />

using an auxiliary point P 3 . By moving P 2 <strong>and</strong> P 3 one can obtain various shapes as one pleases,<br />

passing through P 1 <strong>and</strong> P 4 .



Definition 26 Bezier-curves in 2D<br />

Sind definierte Punkte P 0 bis P n gegeben, die durch eine Kurve angenähert werden sollen, dann<br />

ist die dazugehörige Bézierkurve:<br />

P(t) = \sum_{i=0}^{n} B_i^n(t) P_i ,   0 ≤ t ≤ 1   (1)

Die Basisfunktionen, Bernsteinpolynome genannt, ergeben sich aus:

B_i^n(t) = \binom{n}{i} t^i (1 − t)^{n−i}   mit   \binom{n}{i} = n! / (i! (n − i)!)   (2)

Sie können auch rekursiv berechnet werden. Bézierkurven haben die Eigenschaften, dass sie:<br />

• Polynome (in t) vom Grad n sind, wenn n+1 Punkte gegeben sind,<br />

• innerhalb der konvexen Hülle der definierenden Punkte liegen,<br />

• im ersten Punkt P 0 beginnen und im letzten Punkt P n enden und<br />

• alle Punkte P 0 bis P n Einfluss auf den Verlauf der Kurve haben.<br />

Slide 9.131 illustrates the mathematics behind it. Obviously, we have a tangent at P 1 denoted as<br />

R 1 , which is according to Bezier 3 · (P 2 − P 1 ). The analogous applies to tangent R 4 . If we define<br />

tangents in that way, we then obtain a third order parametric curve Q(t) as shown in Slide 9.131.<br />

Slide 9.132 recalls what we have discussed be<strong>for</strong>e, how these cubic polynomials <strong>for</strong> a parametric<br />

representation of a curve or surface can be decomposed into a geometric vector <strong>and</strong> a basis matrix<br />

<strong>and</strong> how we define a blending function. Slide 9.133 illustrates geometrically some of those blending<br />

functions <strong>for</strong> Bezier. Those particular ones are called Bernstein-curves.<br />
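A small Python sketch of the Bernstein-polynomial form from Definition 26 (the cubic control points chosen here are purely illustrative; math.comb supplies the binomial coefficient):

from math import comb

def bernstein(n, i, t):
    """B_i^n(t) = C(n, i) * t^i * (1 - t)^(n - i)"""
    return comb(n, i) * t**i * (1 - t)**(n - i)

def bezier_point(points, t):
    """P(t) = sum_i B_i^n(t) * P_i for control points P_0 .. P_n, 0 <= t <= 1."""
    n = len(points) - 1
    x = sum(bernstein(n, i, t) * p[0] for i, p in enumerate(points))
    y = sum(bernstein(n, i, t) * p[1] for i, p in enumerate(points))
    return (x, y)

# Cubic example: the curve starts at P_0, ends at P_3 and stays inside the convex hull.
ctrl = [(0, 0), (1, 2), (3, 2), (4, 0)]
print([bezier_point(ctrl, t / 4.0) for t in range(5)])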

Now let’s proceed in Slide 9.134 to the construction of a complicated curve that consists of 2<br />

polynomial parts. We there<strong>for</strong>e need the beginning <strong>and</strong> end point <strong>for</strong> the first part, P 1 <strong>and</strong> P 4 ,<br />

<strong>and</strong> the beginning <strong>and</strong> end point <strong>for</strong> the second part which is P 4 <strong>and</strong> P 7 . We then need to have<br />

auxiliary points P 2 , P 3 , P 5 <strong>and</strong> P 6 to define the tangent vectors at P 1 , P 4 , P 7 . P 3 defines the<br />

tangent at P 4 <strong>for</strong> the first curve segment <strong>and</strong> P 5 defines the tangent at point P 4 <strong>for</strong> the second<br />

segment. We are operating here with piece-wise functions. If P 3 , P 4 , and P 5 are collinear, then the curve is geometrically continuous. Study Slide 9.134 for details.

Prüfungsfragen:<br />

• Was ist die Grundidee bei der Konstruktion von 2-dimensionalen „Bezier-Kurven“?

• Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von<br />

Kurven, und erläutern Sie anh<strong>and</strong> einer Skizze ein Approximationsverfahren Ihrer Wahl!<br />

9.21 Subdividing Curves <strong>and</strong> Using Spline Functions<br />

We can generalize the ideas of Bezier <strong>and</strong> other people <strong>and</strong> basically define spline functions 3<br />

as functions that are defined by a set of data points P 1 , P 2 , . . . , P n to describe an object <strong>and</strong><br />

we approximate the object by piecewise polynomial functions that are valid on certain intervals.<br />

In the general case of splines the curve does not necessarily have to go through P 1 , P 2 , . . . , P n .<br />

3 in German: Biegefunktionen



Algorithm 24 Casteljau<br />

1: {Input: array p[0:n] of n+1 points <strong>and</strong> real number u}<br />

2: {Output: point on curve, p(u)}<br />

3: {Working: point array q[0:n]}<br />

4: <strong>for</strong> i := 0 to n do<br />

5: q[i] := p[i] {save input}<br />

6: end <strong>for</strong><br />

7: <strong>for</strong> k := 1 to n do<br />

8: <strong>for</strong> i := 0 to n - k do<br />

9: q[i] := (1 - u)q[i] + uq[i + 1]<br />

10: end <strong>for</strong><br />

11: end <strong>for</strong><br />

12: return q[0]<br />
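A minimal runnable version of Algorithm 24 in Python (the control points used in the example call are illustrative):

def casteljau(points, u):
    """De-Casteljau evaluation of a Bezier curve at parameter u, following Algorithm 24.
    points: list of (x, y) control points p[0..n]; returns the curve point p(u)."""
    q = list(points)                                  # save input
    n = len(q) - 1
    for k in range(1, n + 1):
        for i in range(n - k + 1):
            q[i] = ((1 - u) * q[i][0] + u * q[i + 1][0],
                    (1 - u) * q[i][1] + u * q[i + 1][1])
    return q[0]

print(casteljau([(0, 0), (1, 2), (3, 2), (4, 0)], 1 / 3))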

We need to define the locations of the joints, and the type of continuity we want. Note that we abandon here the use of a parametric representation.

Let us examine the idea that our points describing an object may be in error, <strong>for</strong> example those<br />

points may be reconstructions from photographs taken of an object using a stereo reconstruction<br />

process. Bec<strong>aus</strong>e the points may be in error <strong>and</strong> there<strong>for</strong>e be noisy, we do not want the curve<br />

or surface to go through the points. We want an approximation of the shape. In that case we<br />

need to have more points than we have unknown parameters of our function. In the Least Squares<br />

approach discussed earlier, we would get a smooth spline going nearly through the points. Slide<br />

9.136 illustrates the idea of a broken-up curve <strong>and</strong> defines a definition area <strong>for</strong> each curve between<br />

joints P 2 , P 3 <strong>and</strong> P 3 , P 4 . We en<strong>for</strong>ce continuity of the curves at the joints, <strong>for</strong> example by saying<br />

that the tangent has to be identical. A spline that goes exactly through the data points is different<br />

from the spline that approximates the data points only. Note that the data points are called control<br />

points 4 .<br />

Of course the general idea of a spline function can be combined with Bezier curves as suggested in Slide 9.137. For added flexibility we want to replace a single Bezier curve by two Bezier curves which are defined on a first and a second part of the original Bezier curve. We solve this problem by finding auxiliary points and tangents such that the continuity conditions apply, by proportionally segmenting distances as shown.

Slide 9.139 illustrates the process. The technique is named after the French engineer de Casteljau.

The single curve defined by P 1 , P 4 (<strong>and</strong> auxiliary points P 2 , P 3 ) is broken into two smaller curves<br />

defined by L 1 , . . . , L 4 <strong>and</strong> another curve defined by R 1 , . . . , R 4 .<br />

Spline functions of a special kind exist if we en<strong>for</strong>ce that the tangents at the joint are parallel to<br />

the line going through adjacent neighboring joints. Slide 9.140 explains. The technique is named<br />

after Catmull-Rom.<br />

Prüfungsfragen:<br />

• In Abbildung B.26 sehen Sie vier Punkte P 1 , P 2 , P 3 und P 4 , die als Kontrollpunkte für<br />

eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des<br />

Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert t = 1/3, also x(1/3), und

erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26<br />

eintragen, eine skizzenhafte Darstellung ist <strong>aus</strong>reichend.<br />

Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der<br />

Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird.<br />

Antwort: Die Strecken sind rekursiv im Verhältnis 1/3 : 2/3 zu teilen (siehe Abbildung 9.2).

4 in German: Pass-Punkte


Figure 9.2: Konstruktion einer Bezier-Kurve nach Casteljau

9.22 Generalization to 3 Dimensions<br />

Slide 9.142 suggests a general idea of taking the 2-dimensional discussions we just had <strong>and</strong> transporting<br />

them into 3 dimensions. Bezier, splines <strong>and</strong> so <strong>for</strong>th, all exist in 3-D as well. That in<br />

effect is where the applications are. Instead of having coordinates (x, y) or parameters t we now<br />

have coordinates (x, y, z) or parameters t 1 , t 2 . Instead of having points define a curve we now have<br />

a 3-dimensional arrangement of auxiliary points that serve to approximate a smooth 3D-surface.<br />

9.23 Graz <strong>and</strong> Geometric Algorithms<br />

On a passing note, a disproportionate number of people who have been educated at the TU Graz have become well-known and respected scientists in the field of geometric algorithms. Obviously, Graz has been a hotbed of geometric algorithms. Look out for classes on “Geometric Algorithms”. Note that the geometric algorithms we have discussed are very closely related to mathematics and really are associated with theoretical computer science and less so with computer graphics and image processing. The discussion of curves and surfaces is also a topic of descriptive geometry. In that context one speaks of “free-form curves and surfaces”. Look out for classes on that subject as well!


(Slides 9.1 through 9.143 are reproduced here as thumbnails.)


Chapter 10<br />

Data Structures<br />

10.1 Two-Dimensional Chain-Coding<br />

Algorithm 25 Chain coding<br />

1: resample boundary by selecting larger grid spacing<br />

2: starting from top left search the image rightwards until a pixel P[0] belonging to the region is<br />

found<br />

3: initialize orientation d with 1 to select northeast as the direction of the previous move<br />

4: initialize isLooping with true<br />

5: initialize i with 1<br />

6: while isLooping do<br />

7: search the neighbourhood of the current pixel <strong>for</strong> another unvisited pixel P[i] in a clockwise<br />

direction beginning from (d + 7) mod 8, increasing d at every search step<br />

8: if no unvisited pixel found then<br />

9: set isLooping false<br />

10: else<br />

11: print d<br />

12: end if<br />

13: increase i<br />

14: end while<br />

We start from a raster image of a linear object. We are looking <strong>for</strong> a compact <strong>and</strong> economical<br />

representation by means of vectors. Slide 10.3 illustrates the 2-dimensional raster of a contour<br />

image, which is to be encoded by means of a chain-code. We have to make a decision about the<br />

level of generalization or elimination of detail. Slide 10.4 describes the 4 <strong>and</strong> 8 neighborhood <strong>for</strong><br />

each pixel <strong>and</strong> indicates by a sequence of numbers how each neighbor is labeled as 1, 2, 3, 4, . . . , 8.<br />

Using this approach, we can replace the actual object by a series of pixels <strong>and</strong> in the process<br />

obtain a different resolution. We have resampled the contour of the object. Slide 10.6 shows how<br />

a 4-neighborhood <strong>and</strong> an 8-neighborhood will serve to describe the object by a series of vectors,<br />

beginning at an initial point. The encoding itself is represented by a string of integer numbers.<br />

Obviously we obtain a very compact representation of that contour.<br />

Next we can think of a number of normalizations of that coding scheme. We may dem<strong>and</strong> that<br />

the sum of all codes be minimized. Instead of recording the codes themselves to indicate in which<br />

direction each vector points, we can look at code differences only, which would have the advantage<br />

that they are invariant under rotations.<br />

Obviously the object will look different if we change the direction of the grid at which we resample<br />


the contour. An extensive theory of chain codes has been introduced by H. Freeman <strong>and</strong> one of<br />

the best-known coding schemes is there<strong>for</strong>e also called the Freeman-Chain-Code.<br />
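A small Python sketch of this encoding for an already ordered list of boundary pixels; note that the direction labels below run 0–7 counter-clockwise starting at “east”, which is an assumption, since Slide 10.4 numbers the neighbors 1 to 8:

DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(boundary):
    """Return the chain code of consecutive 8-neighboring boundary pixels (x, y)."""
    return [DIRECTIONS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]

def difference_code(code):
    """First differences mod 8 -- invariant under rotations by multiples of 45 degrees."""
    return [(b - a) % 8 for a, b in zip(code, code[1:] + code[:1])]

boundary = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 2), (0, 1), (0, 0)]   # illustrative closed contour
c = chain_code(boundary)
print(c, difference_code(c))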

Prüfungsfragen:<br />

• Gegeben sei eine Punktfolge entsprechend Abbildung ?? und ein Pixelraster, wie dies in<br />

Abbildung ?? dargestellt ist. Geben Sie bitte sowohl grafisch als auch numerisch die kompakte<br />

Kettenkodierung dieser Punktfolge im Pixelraster an, welche mit Hilfe eines 8-Codes<br />

erhalten wird.<br />

10.2 Two-Dimensional Polygonal Representations<br />

Algorithm 26 Splitting<br />

1: Splitting methods work by first drawing a line from one point on the boundary to another.<br />

2: Then, we compute the perpendicular distance from each point along the segment to the line.<br />

3: If this exceeds some threshold, we break the line at the point of greatest error.<br />

4: We then repeat the process recursively <strong>for</strong> each of the two new lines until we don’t need to<br />

break any more.<br />

5:<br />

6: For a closed contour, we can find the two points that lie farthest apart <strong>and</strong> fit two lines<br />

between them, one <strong>for</strong> one side <strong>and</strong> one <strong>for</strong> the other. Then, we can apply the recursive<br />

splitting procedure to each side.<br />

Let us assume that we do have an object with an irregular contour as shown in Slide 10.9 on the<br />

left side. We describe that object by a series of pixels <strong>and</strong> the transition from the actual detailed<br />

contour to the simplification of a representation by pixels must follow some rules. One of those is a minimum-perimeter rule, which takes the idea of a rubber band that is fit along the contour pixels, as shown on the right-hand side of Slide 10.9.

Many times the issue is the simplification of a shape in order to save space, while maintaining the essence of the object. Slide 10.10 explains how one may replace a polygonal representation of an

object by a simplified minimum quadrangle. One will look <strong>for</strong> the longest distance that can be<br />

defined from points along the contour of the object. This produces a line segment ab. We then<br />

further subdivide that shape by looking <strong>for</strong> the longest line that is perpendicular to the axis that<br />

we just found. This produces a quadrangle. We can now continue on <strong>and</strong> further refine this shape<br />

by a simplifying polygon defining a maximum deviation between the actual object contour <strong>and</strong> its<br />

simplification. If the threshold value is set at 0.25 then we obtain the result shown in Slide 10.10.<br />

The process is also denoted as splitting (algorithm 26).<br />
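A compact Python sketch of the splitting procedure of Algorithm 26 for an open curve (the sample contour and the threshold of 0.25 are illustrative):

import math

def point_line_distance(p, a, b):
    """Perpendicular distance of point p from the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.hypot(px - ax, py - ay)
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.hypot(bx - ax, by - ay)

def split(points, threshold):
    """Break the curve at the point of greatest deviation until all deviations are small."""
    a, b = points[0], points[-1]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]
    if not dists or max(dists) <= threshold:
        return [a, b]
    k = dists.index(max(dists)) + 1                   # index of the worst point
    left = split(points[:k + 1], threshold)
    right = split(points[k:], threshold)
    return left[:-1] + right                          # drop the duplicated break point

contour = [(0, 0), (1, 0.1), (2, -0.1), (3, 2.0), (4, 2.1), (5, 0)]
print(split(contour, 0.25))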

Prüfungsfragen:<br />

• Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale<br />

Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie<br />

einen Schritt des Algorithmus im Detail anh<strong>and</strong> Ihrer Zeichnung! Wählen Sie den Schwellwert<br />

so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann<br />

vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung<br />

B.35 einzeichnen.



Definition 27 2D morphing <strong>for</strong> lines<br />

Problems with other kinds of representation can be taken care of by a parametric representation: a single parameter t can represent the complete straight line once the starting and ending points are given. In parametric representation

x = X(t), y = Y (t)<br />

For starting point (x1, y1) <strong>and</strong> ending point (x2, y2)<br />

(x, y) = (x1, y1) if t = 0<br />

(x, y) = (x2, y2) if t = 1<br />

Thus any point (x, y) on the straight line joining two points (x1, y1) <strong>and</strong> (x2, y2) is given by<br />

x = x1 + t(x2 − x1)<br />

y = y1 + t(y2 − y1)<br />

10.3 A Special Data Structure <strong>for</strong> 2-D Morphing<br />

Suppose the task is defined as in Slide 10.13 where an input figure, in this particular case a cartoon<br />

of President Bush, needs to be trans<strong>for</strong>med into an output figure, namely the cartoon of President<br />

Clinton. The approach establishes a relationship between the object contour points of the input<br />

<strong>and</strong> output cartoons. Each point on the input cartoon will correspond to one or no point on the<br />

output cartoon. In order to morph the input into the output one needs now to take these vectors<br />

which link these points. We introduce a parametric representation x = f x (t), y = f y (t). We<br />

gradually increase the value of the parameter t from 0 to 1. At a value of the parameter t = 0 one<br />

has the Bush cartoon, at the parameter t = 1, one has the Clinton cartoon. The transition can be<br />

illustrated in as many steps as one likes. The basic concept is shown in Slide ?? <strong>and</strong> Slide 10.14<br />

<strong>and</strong> the result is shown in Slide 10.15.<br />
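A minimal Python sketch of this parametric in-between computation, using the equations of Definition 27 (the two point lists are purely illustrative stand-ins for the corresponding contour points of the two cartoons):

def morph(source, target, t):
    """Linear 2-D morphing of corresponding points: t = 0 gives the source, t = 1 the target."""
    return [(x1 + t * (x2 - x1), y1 + t * (y2 - y1))
            for (x1, y1), (x2, y2) in zip(source, target)]

source = [(0, 0), (2, 1), (3, 3)]
target = [(0, 1), (2, 2), (4, 3)]
for t in (0.0, 0.5, 1.0):
    print(t, morph(source, target, t))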

Prüfungsfragen:<br />

• In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in<br />

eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder<br />

als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen<br />

werden benötigt? Erläutern Sie Ihre Antwort anh<strong>and</strong> einer beliebigen Strecke<br />

<strong>aus</strong> Abbildung B.3!<br />

10.4 Basic Concepts of Data Structures<br />

For a successful data structure we would like to have a direct access to data independent of how big<br />

a data base is. We would like to have simple arrays, our data should be stored sequentially <strong>and</strong> we<br />

might use pointer lists, thus pointers, chains, trees, <strong>and</strong> rings. This all is applicable in geometric<br />

data represented by coordinates. Slide 10.17 illustrates how we can build a directed graph of some<br />

geometric entities that are built from points in 3-dimensional space with coordinates x, y, z at the<br />

base. From those points, we produce lists of edges which combine two points into an edge. From<br />

the edges, one builds regions or areas which combine edges into contours of areas.<br />

Slide 10.18 shows that we request ease of dynamic changes in the data, so we can insert or delete points and objects or areas. We would also like to be able to change a visualization dynamically: if we delete an object we should not be required to completely recompute everything. We would



like to have support <strong>for</strong> a hierarchical approach so that we can look at an overview as well as at<br />

detail. And we would like to be able to group objects into hyper-objects <strong>and</strong> we need to have a<br />

r<strong>and</strong>om access to arbitrary objects independent of the number of objects in the data base. Let us<br />

now examine a few data structures.<br />

Prüfungsfragen:<br />

• Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch<br />

seine (polygonale) Oberfläche genutzt werden kann!<br />

10.5 Quadtree<br />

Algorithm 27 Quadtree<br />

1: {define datastructure quadtree}<br />

2: quadtree=(SW,SE,NW,NE:Pointer of quadtree,value)<br />

{SW south-western son, SE south-eastern son}<br />

{NW north-western son, NE north-eastern son}<br />

{value holds e.g. brightness}<br />

3: init quadtree = (NULL,NULL,NULL,NULL,0)<br />

4: while the entire image has not been segmented do<br />

5: segment actually processed area into 4 squares<br />

6: if there is no element of the object left in a subdivided square then<br />

7: link a leaf to the quadtree according to the actually processed square {leaf =<br />

quadtree(NULL,NULL,NULL,NULL,value)}<br />

8: else<br />

9: link new node to (SW or SE or NW or NE) of <strong>for</strong>mer quadtree according to the actually<br />

processed square<br />

{node = quadtree (SW,SE,NW,NE,0)}<br />

10: if node holds four leafs containing the same value then<br />

11: replace node with leaf containing value

12: end if<br />

13: end if<br />

14: end while<br />

A quadtree is a tree data structure <strong>for</strong> 2-dimensional graphical data, where we subdivide the root,<br />

the 2-dimensional space, into squares of equal size, so we subdivide an entire area into 4 squares,<br />

we subdivide those 4 squares further into 4 squares <strong>and</strong> so <strong>for</strong>th. We number each quadrant as<br />

shown in Slide 10.20. Now if we have an object in an image or in a plane we describe the object<br />

by a quadtree by breaking up the area sequentially, until such time that there is no element of the<br />

object left in a subdivided square. In this case we call this a leaf of the tree structure, an empty<br />

leaf. So each node is a quadrant, and each quadrant has four pointers to its sons. The sons will be further subdivided until each quadrant is either entirely filled with the object or entirely empty.

A slight difference to the quadtree is the Bin-tree. In it, each node has only two sons <strong>and</strong> not four<br />

like in the quadtree. Slide 10.21 explains.<br />

If there is a mechanical part available as shown in Slide 10.22 then a pixel representation may be<br />

shown on the left <strong>and</strong> the quadtree representation at right. The quadtree is more efficient. There<br />

is an entire literature on geometric operations in quadtrees such as geometric trans<strong>for</strong>mations,<br />

scale changes, editing, visualization, Boolean operations <strong>and</strong> so <strong>for</strong>th. Slide 10.23 represents the<br />

mechanical part of Slide 10.24 in a quadtree representation.



A quadtree has “levels of subdivision”, obviously, and its root is at the highest level, with a single node. The next level is shown in Slide 10.24 and has one empty and three full nodes which are further subdivided into a third level with some empty and some full leaves and some nodes that are further subdivided into a fourth level. The leaves are numbered sequentially from north-west to south-east. Slide 10.25 again illustrates how a raster image with pixels of equal area is converted into a quadtree representation. It is more efficient since there are fewer leaves in a quadtree than there are pixels in an image, except when the image is totally chaotic.

One may store all leaves, whether they are empty or full, or one stores only the full leaves, thereby saving storage space. Typically this may save 60 percent, as in the example of Slide 10.26.
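A small Python sketch in the spirit of Algorithm 27; instead of merging four equal leaves afterwards it checks the uniformity of each square up front, which yields the same tree (the 4 x 4 test image is illustrative and the side length must be a power of two):

def build_quadtree(image, x=0, y=0, size=None):
    """Return 0 or 1 for a uniform square, otherwise a node [NW, NE, SW, SE]."""
    if size is None:
        size = len(image)
    values = {image[y + j][x + i] for j in range(size) for i in range(size)}
    if len(values) == 1:
        return values.pop()                       # leaf: empty (0) or full (1)
    h = size // 2
    return [build_quadtree(image, x,     y,     h),   # NW
            build_quadtree(image, x + h, y,     h),   # NE
            build_quadtree(image, x,     y + h, h),   # SW
            build_quadtree(image, x + h, y + h, h)]   # SE

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1]]
print(build_quadtree(img))        # [0, 1, [0, 1, 1, 1], 1]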

Prüfungsfragen:<br />

• Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung<br />

dieses Bildes. Ich bitte Sie, einen sogenannten „traditionellen“ Quadtree der Abbildung B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen.

• Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen<br />

Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält<br />

sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen?<br />

10.6 Data Structures <strong>for</strong> Images<br />

So far we have looked at data structures <strong>for</strong> binary data, showing objects by means of their<br />

contours, or as binary objects in a raster image. In this chapter, we are looking at data structures<br />

<strong>for</strong> color <strong>and</strong> black <strong>and</strong> white gray value images. A fairly complete list of such data structures can<br />

be seen in PhotoShop (Slide 10.28 <strong>and</strong> Slide 10.29). Let us review a few structures as shown in<br />

Slide 10.30.<br />

We can store an image pixel by pixel, where all information that belongs to a pixel is stored sequentially; or row by row, where we repeat, say, red, green and blue for each image row; or band-sequentially, which means we store a complete image for the red, one for the green and one for the blue channel. Those forms are called BSSF or BIFF (Band Sequential File Format or similar). The next category is the TIFF format, a tagged image file format; another option is to store images in tiles, in little 32 by 32 or 128 by 128 windows.

The idea of hexagonal pixels has been proposed. An important idea is that of pyramids, where a<br />

single image is reproduced at different resolutions, <strong>and</strong> finally representations of images by fractals<br />

or wavelets <strong>and</strong> so <strong>for</strong>th exist. Slide 10.31 illustrates the idea of an image pyramid. The purpose<br />

of pyramids is to start an image analysis process on a much reduced version of an image, e.g. to<br />

segment it into its major parts <strong>and</strong> then guide a process which refines the preliminary segmentation<br />

from resolution level to resolution level. This increases the robustness of an approach <strong>and</strong> also<br />

reduces computing times. At issue is how one takes a full resolution image <strong>and</strong> creates from it<br />

reduced versions. This may be by simple averaging or by some higher level processes <strong>and</strong> filters<br />

that create low resolutions from neighborhoods of higher resolution pixels.<br />
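A minimal Python sketch of building such a pyramid by simple 2 x 2 averaging, as mentioned above (the 4 x 4 gray-value image is illustrative and its side length is assumed to be a power of two):

def pyramid(image):
    """Repeatedly halve the side length by 2 x 2 averaging until one pixel is left."""
    levels = [image]
    while len(levels[-1]) > 1:
        src = levels[-1]
        n = len(src) // 2
        levels.append([[(src[2*j][2*i] + src[2*j][2*i+1] +
                         src[2*j+1][2*i] + src[2*j+1][2*i+1]) / 4.0
                        for i in range(n)] for j in range(n)])
    return levels

for level in pyramid([[10, 20, 30, 40],
                      [20, 30, 40, 50],
                      [90, 80, 70, 60],
                      [80, 70, 60, 50]]):
    print(level)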

Slide 10.32 suggests that data structures <strong>for</strong> images are important in the context of image compression<br />

<strong>and</strong> we will address that subject under the title “Compression” towards the end of this<br />

class.<br />

Prüfungsfragen:<br />

• In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das<br />

erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht



nur mehr <strong>aus</strong> einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo<br />

wird sie eingesetzt (nennen Sie mindestens ein Beispiel)?<br />

• In Aufgabe B.1 wurde nach einer Bildrepräsentation gefragt, bei der ein Bild wiederholt<br />

gespeichert wird, wobei die Seitenlänge jedes Bildes genau halb so groß ist wie die Seitenlänge<br />

des vorhergehenden Bildes. Leiten Sie eine möglichst gute obere Schranke für den gesamten<br />

Speicherbedarf einer solchen Repräsentation her, wobei<br />

– das erste (größte) Bild <strong>aus</strong> N × N Pixeln besteht,<br />

– alle Bilder als Grauwertbilder mit 8 Bit pro Pixel betrachtet werden,<br />

– eine mögliche Komprimierung nicht berücksichtigt werden soll!<br />

Hinweis: Benutzen Sie die Gleichung \sum_{i=0}^{\infty} q^i = 1/(1 − q) für q ∈ R, 0 < q < 1.

Antwort:

S(N) < N^2 · \sum_{i=0}^{\infty} (1/4)^i = N^2 · 1/(1 − 1/4) = N^2 · 4/3 = (4/3) N^2

10.7 Three-Dimensional Data

The requirements <strong>for</strong> a successful data structure are listed in Slide 10.34. Little needs to be added<br />

to the contents of that slide.<br />

Prüfungsfragen:<br />

• Nennen Sie allgemeine An<strong>for</strong>derungen an eine Datenstruktur zur Repräsentation dreidimensionaler<br />

Objekte!<br />

10.8 The Wire-Frame Structure<br />

Definition 28 Wireframe structure<br />

The simplest three-dimensional data structure is the wire-frame. A wireframe model captures the<br />

shape of a 3D object in two lists, a vertex list <strong>and</strong> an edge list. The vertex list specifies geometric<br />

information: where each corner is located. The edge list provides connectivity information,

specifying (in arbitrary order) the two vertices that <strong>for</strong>m the endpoints of each edge.<br />

The vertex-lists are used to build edges, the edges build edge-lists which then build faces<br />

or facets <strong>and</strong> facets may build objects. In a wire-frame, there are no real facets, we simply go<br />

from edges to objects directly.<br />

The simplest three-dimensional data structure is the wire-frame. At the lowest level we have a list<br />

of three-dimensional coordinates. The point-lists are used to build edges, the edges build edge-lists<br />

which then build faces or facets <strong>and</strong> facets may build objects. In a wire-frame, there are no real<br />

facets, we simply go from edges to objects directly. Slide 10.36 shows the example of a cube with



the object, the edge-lists <strong>and</strong> the point-lists. The edges or lines <strong>and</strong> the points or vertices are<br />

again listed in Slide 10.37 <strong>for</strong> a cube. In Slide 10.38 the cube is augmented by an extra-plane <strong>and</strong><br />

represented by two extra vertices <strong>and</strong> three extra lines.<br />
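A minimal wire-frame sketch in Python for the cube example: a vertex list with 3-D coordinates and an edge list that references vertices by index (the unit-cube coordinates are illustrative, not those of the slide):

vertices = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
            (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0),      # bottom face
         (4, 5), (5, 6), (6, 7), (7, 4),      # top face
         (0, 4), (1, 5), (2, 6), (3, 7)]      # vertical edges

def edge_length(e):
    (x0, y0, z0), (x1, y1, z1) = vertices[e[0]], vertices[e[1]]
    return ((x1 - x0)**2 + (y1 - y0)**2 + (z1 - z0)**2) ** 0.5

print(len(vertices), "vertices,", len(edges), "edges,",
      "total edge length", sum(edge_length(e) for e in edges))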

Prüfungsfragen:<br />

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken!<br />

10.9 Operations on 3-D Bodies<br />

Assume that we have 2 cubes, A <strong>and</strong> B, <strong>and</strong> we need to intersect them. A number of Boolean<br />

operations can be defined as an intersection or a union of 2 bodies, subtracting B from A or A<br />

from B leading to different results.<br />

10.10 Sweep-Representations<br />

A sweep-representation creates a 3-D object by means of a 2-D shape. An object will be created<br />

by moving the 2-D representation through 3-D space denoting the movement as sweep. We may<br />

have a translatory or a rotational sweep as shown in Slide ?? <strong>and</strong> Slide 10.43. A translatory sweep<br />

can be obtained by a cutting tool. A rotational sweep obviously will be obtained by a rotational<br />

tool. We have in Slide 10.43 the cutting tool, the model of a part <strong>and</strong> the image of an actual part<br />

as produced in a machine.<br />

Prüfungsfragen:<br />

• Was versteht man unter einer „Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese Art der Objektrepräsentation?

• In Abbildung B.70 ist ein Zylinder mit einer koaxialen Bohrung gezeigt. Geben Sie zwei verschiedene<br />

Möglichkeiten an, dieses Objekt mit Hilfe einer Sweep-Repräsentation zu beschreiben!<br />

10.11 Boundary-Representations<br />

A very popular representation of objects is by means of their boundaries. Generally, these representations<br />

are denoted as B-reps. They are built from faces with vertices <strong>and</strong> edges. Slide 10.45<br />

illustrates an object <strong>and</strong> asks the question of how many objects are we facing here, how many<br />

faces, how many edges <strong>and</strong> so <strong>for</strong>th? A B-rep system makes certain assumptions about the topology<br />

of an object. In Slide 10.46 we show a prism that is <strong>for</strong>med from 5 faces, 6 vertices, 9 edges.<br />

A basic assumption is that differential small pieces on the surface of the object can be represented<br />

by a plane as shown in the left <strong>and</strong> central elements of Slide 10.46. On the right-h<strong>and</strong> side of Slide<br />

10.46 is a body that does not satisfy the dem<strong>and</strong>s on a 2-manifold topology <strong>and</strong> that is the type<br />

of body we may have difficulties with in a B-rep system.<br />

A boundary representation takes advantage of Euler’s Formula. It relates the number of vertices,<br />

faces <strong>and</strong> edges to one another as shown in Slide 10.47. A simple polyhedron is a body that can be<br />

de<strong>for</strong>med into a sphere <strong>and</strong> there<strong>for</strong>e has no holes. In this case, Euler’s Formula applies. Slide<br />

10.48 shows three examples that confirm the validity of Euler’s Formula. Slide 10.49 illustrates<br />

a body with holes. In that case, Euler’s <strong>for</strong>mula needs to be modified.
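For reference, Euler's formula for a simple polyhedron relates these counts as follows; the prism mentioned above, with 6 vertices, 9 edges and 5 faces, satisfies it:

V − E + F = 2,   prism of Slide 10.46: 6 − 9 + 5 = 2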



Prüfungsfragen:<br />

• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die<br />

Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die<br />

Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“!

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken!<br />

10.12 A B-Rep Data Structure<br />

Definition 29 Boundary representation<br />

A B-Rep structure describes the boundary of an object with the help of 3-dimensional Polygon<br />

Surfaces. The B-Rep model consists of three different object types: vertices, edges <strong>and</strong> surfaces.<br />

The B-Rep structure is often organized in:

• V: A set of vertices (points in 3D space)

• E: A set of edges. Each edge is defined by 2 vertices referenced from V

• S: A set of surfaces. Each surface is defined by a sequence of edges from E (at least 3 edges define a surface)

The direction of the normal vector of a surface is usually given by the order of its edges (clockwise or counterclockwise). Due to the referencing, the B-Rep permits a redundancy-free management of the geometric information.

A B-rep structure is not unlike a wire-frame representation, but it represents an object with pointers to polygons and lists of polygons with pointers to edges, and one differentiates between the space outside and inside the object by taking advantage of the sequence of edges. Slide 10.52 illustrates a body that is represented by 2 faces in 3-D. We show the point-list, the list of edges and the list of faces.

Slide 10.53 illustrates a B-rep representation of a cube with the list of faces, the list of edges,<br />

the point-lists, <strong>and</strong> the respective pointers. Slide 10.54 explains the idea of inside <strong>and</strong> outside<br />

directions <strong>for</strong> each face. The direction of the edges defines the direction of the normal vector onto<br />

a face. As shown in Slide 10.54, A would be inside of B in one case, <strong>and</strong> outside of B in the other<br />

depending on the direction of the normal onto face B.<br />
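A minimal Python sketch of such a B-rep with the orientation convention just described; for brevity the face here is stored as an ordered loop of vertex indices rather than as a sequence of edge references, and the unit square in the z = 0 plane is purely illustrative:

V = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]          # vertices
E = [(0, 1), (1, 2), (2, 3), (3, 0)]                      # edges as vertex index pairs
F = [[0, 1, 2, 3]]                                        # one face as an ordered vertex loop

def face_normal(face):
    """Normal from the first three vertices; the vertex order fixes its direction."""
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = V[face[0]], V[face[1]], V[face[2]]
    ux, uy, uz = x1 - x0, y1 - y0, z1 - z0
    vx, vy, vz = x2 - x0, y2 - y0, z2 - z0
    return (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)

print(face_normal(F[0]))          # counter-clockwise order: (0, 0, 1)
print(face_normal(F[0][::-1]))    # reversed order flips the normal: (0, 0, -1)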

10.13 Spatial Partitioning<br />

An entirely different approach to 3-dimensional data structures is the idea of spatial partitioning.

In Slide 10.56 we choose the primitives to be prisms <strong>and</strong> cubes. They build the basic cells <strong>for</strong> a<br />

decomposition. From those basic elements we can now build up various shapes as shown in that<br />

slide. A special case occurs if the primitive is a cube of given size as shown in Slide 10.57. Slide<br />

10.58 introduces the idea of the oct-tree, which is the 3-dimensional analogue of the quadtree. Slide

10.59 explains how the 3 dimensional space as a root is decomposed into 8 sons, which then are<br />

further decomposed until there is no further decomposition necessary bec<strong>aus</strong>e each son is either<br />

empty or full. The example of Slide 10.59 has 2 levels <strong>and</strong> there<strong>for</strong>e the object can be created<br />

from 2 types of cubes. Slide 10.60 illustrates the resulting representation in a computer that takes



the root, subdivides it into 8 sons, calls them either white or black <strong>and</strong> if it needs to be further<br />

subdivided then substitutes <strong>for</strong> the element another expression with 8 sons <strong>and</strong> so <strong>for</strong>th.<br />

Slide 10.61 illustrates an oct-tree representation of a coffee cup. We can see how the surface,<br />

bec<strong>aus</strong>e of its curvature, requires many small cubes to be represented whereas on the inside of<br />

the cup the size of the elements increases. The data structure is very popular in medical imaging<br />

bec<strong>aus</strong>e there exist various sensor systems that produce voxels, <strong>and</strong> those voxels can be generalized<br />

into oct-trees, similar to pixels that can be generalized into quadtrees in 2 dimensions.<br />

Prüfungsfragen:<br />

• Erklären Sie den Begriff „spatial partitioning“ und nennen Sie drei räumliche Datenstrukturen aus dieser Gruppe!

10.14 Binary Space Partitioning BSP<br />

Definition 30 Cell-structure<br />

An example for a 3-dimensional data structure is the idea of spatial partitioning. Therefore some primitives like prisms or cubes are chosen. These primitives build the "cells" for a decomposition of an object. Every geometrical object can be built with these cells. A special case occurs if the primitive is an object of a given size.

A very common data structure to find the decomposition is the oct-tree. The root (the 3-dimensional space) of the oct-tree is subdivided into 8 cubes of equal size, and these resulting cubes are subdivided themselves again until no further decomposition is necessary. A son in the tree is marked as black or white (represented or not), or it is marked as gray, in which case a further decomposition is needed.

This type of data structure is very popular in medical imaging. The different sensor systems, like "Computer Aided Tomography", produce voxels. These voxels can be generalized into oct-trees.

A more specific space partitioning approach is the Binary Space Partitioning or BSP. We subdivide<br />

space by means of planes that can be arbitrarily arranged. The Binary Space Partition is a tree<br />

in which the nodes are represented by the planes. Each node has two sons, which are the spaces that result on the two sides of a plane: the inner and the outer half-space.

Slide 10.63 illustrates the basic idea in 2 dimensions where the plane degenerates into straight<br />

lines. The figure on the left side of Slide 10.63 needs to be represented by a BSP structure. The<br />

root is the straight line a, subdividing a 2-D space into half spaces, defining an outside <strong>and</strong> an<br />

inside by means of a vector shown on the left side of the slide. There are two sons <strong>and</strong> we take<br />

the line b <strong>and</strong> the line j as the two sons. We further subdivide the half-spaces. We go on until<br />

the entire figure is represented in this manner.<br />

A similar illustration representing the same idea is shown in Slide 10.64. At the root is line 1,<br />

the outside half-space is empty, the inside half-space contains line 2, again with an outside space<br />

empty <strong>and</strong> the inside space containing line 3 <strong>and</strong> we repeat the structure.<br />

If we start out with line 3 at the root, we obtain a different description of the same object. We<br />

have in the outside half-space line 4 <strong>and</strong> on the inside half-space line 2, <strong>and</strong> now the half-space<br />

defined by line 4 within the half-space defined by line 3 contains only the line segment 1b <strong>and</strong> the<br />

other half-space as seen from line 3 which is then further subdivided into a half-space by line 2<br />

contains line-segment 1a. The straight line 1 in this case is appearing twice, once in the <strong>for</strong>m of<br />

1a <strong>and</strong> another time in the <strong>for</strong>m of 1b.



Algorithm 28 Creation of a BSP tree<br />

1: polygon root; {current Root-Polygon}<br />

2: polygon *backList, *frontList; {polygons in current Halfspaces}<br />

3: polygon p, backPart, frontPart; {temporary variables}<br />

4: if (polyList == NULL) then<br />

5: return NULL; {no more polygons in this halfspace}<br />

6: else<br />

7: root = selectAndRemovePolygon(&polyList); {prefer polygons defining planes that don’t<br />

intersect with other polygons}<br />

8: backList = NULL;<br />

9: frontList = NULL;<br />

10: <strong>for</strong> (each remaining polygon in polyList) do<br />

11: if (polygon p in front of root) then<br />

12: addToList(p, &frontList);<br />

13: else<br />

14: if (polygon p in back of root) then<br />

15: addToList(p, &backList);<br />

16: else {polygon p must be split}<br />

17: splitPoly(p, root, &frontPart, &backPart);<br />

18: addToList(frontPart, &frontList);<br />

19: addToList(backPart, &backList);<br />

20: end if<br />

21: end if<br />

22: end <strong>for</strong><br />

23: return new BSPTREE(root, makeTree(frontList), makeTree(backList));<br />

24: end if<br />

Prüfungsfragen:<br />

• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein!

10.15 Constructive Solid Geometry, CSG<br />

This data structure takes 3D-primitives as input <strong>and</strong> produces Boolean operations, translations,<br />

scaling <strong>and</strong> rotational operators to construct 3-dimensional objects from the primitives. Slide<br />

10.66 <strong>and</strong> Slide 10.67 explain. A complex object as shown in Slide 10.67 may be composed of a<br />

cylinder with an indentation <strong>and</strong> a rectangular body of which a corner is cut off. The cylinder itself<br />

is obtained by subtracting a smaller cylinder from a larger cylinder. The cut-off is obtained by<br />

subtracting from a fully rectangular shape another rectangular shape. So we have 2 subtractions<br />

<strong>and</strong> one union to produce our object. In Slide 10.67 we have again 2 primitives, a block <strong>and</strong> a<br />

cylinder, we can scale them, so we start out with two types of blocks <strong>and</strong> two types of cylinders.<br />

By an operation of intersection union <strong>and</strong> difference we obtain a complicated object from those<br />

primitives.<br />

Slide 10.68 explains how Constructive Solid Geometry can produce a result in two different ways.<br />

We can take two blocks <strong>and</strong> subtract them from one another or we can take two blocks <strong>and</strong> <strong>for</strong>m<br />

the union of them to obtain a particular shape. We cannot say generally that those two operations<br />

are equivalent, bec<strong>aus</strong>e if we change the shapes of the two blocks, the same two operations may<br />

not result in the same object shown in Slide 10.68.
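A minimal CSG sketch in Python: the primitives are membership tests and the inner nodes of the CSG tree are the Boolean operations; the cylinder with a coaxial bore described above becomes a difference of two cylinders (all radii and heights are illustrative):

def cylinder(radius, height):
    return lambda x, y, z: x * x + y * y <= radius * radius and 0 <= z <= height

def union(a, b):        return lambda x, y, z: a(x, y, z) or b(x, y, z)
def intersection(a, b): return lambda x, y, z: a(x, y, z) and b(x, y, z)
def difference(a, b):   return lambda x, y, z: a(x, y, z) and not b(x, y, z)

ring = difference(cylinder(2.0, 1.0), cylinder(1.0, 1.0))   # cylinder with a coaxial bore
print(ring(1.5, 0.0, 0.5))   # True: inside the outer cylinder, outside the bore
print(ring(0.5, 0.0, 0.5))   # False: inside the bore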



Prüfungsfragen:<br />

• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva<br />

bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der<br />

Konstruktion des Objektes (ohne Lampe).<br />

10.16 Mixing Vectors <strong>and</strong> Raster Data<br />

When we have photo-realistic representations of 3-D objects, we may need to mix data structures,<br />

e.g. vector data or three-dimensional data structures representing 3-D objects <strong>and</strong> raster data<br />

coming from images. The example of city models has illustrated this issue. A particular hierarchical structure for the geometric data is introduced here; it is called the LoD/R-Tree data structure, for Level of Detail and Rectangular Tree structure. The idea is that objects are approximated by boxes

in 3D generalized from rectangles in 2 dimensions. These blocks can overlap <strong>and</strong> so we have the<br />

entire city being at the root of a tree, represented by one block. Each district now is a son of that<br />

root <strong>and</strong> is represented by blocks. Within each district we may have city blocks, within the city<br />

blocks we may have buildings, <strong>and</strong> one particular building may there<strong>for</strong>e be the leaf of this data<br />

structure.<br />

We also have the problem of a level of detail <strong>for</strong> the photographic texture. We create an image<br />

pyramid by image processing <strong>and</strong> then store the pyramids <strong>and</strong> create links to the geometric<br />

elements in terms of level of detail, so that if we want an overview of an object we only get very few pixels to process.

If we take a vantage point to look at the city, we have in the <strong>for</strong>eground a high resolution <strong>for</strong> the<br />

texture <strong>and</strong> in the background low resolution. So we precompute per vantage point a hierarchy<br />

of resolutions that may fall within the so-called View-Frustum. As we change our vantage point<br />

by rotating our eyes, we have to call up from a data base a related element. If we move, thus<br />

change our position, we have to call up from the data base different elements at high resolution<br />

<strong>and</strong> elements at low resolution.<br />

Slide 10.72 illustrates how the vector data structure describes nothing but the geometry whereas<br />

the raster data describes the character of the object in Slide 10.73.<br />

We may also use a raster data structure <strong>for</strong> geometric detail as shown in Slide 10.74. In that<br />

case we have an (x, y) pattern of pixels <strong>and</strong> we associate with each pixel not the gray value but<br />

an elevation representing there<strong>for</strong>e a geometry in the <strong>for</strong>m of a raster which we otherwise have<br />

typically used <strong>for</strong> images only.<br />

10.17 Summary<br />

We summarize the various ideas <strong>for</strong> data structures of spatial objects, be they in 2D or in 3D.<br />

Slide 10.76 addresses 3D.<br />

Prüfungsfragen:<br />

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken!


(Slides 10.1 through 10.76 are reproduced here as thumbnails.)


Chapter 11<br />

3-D Objects <strong>and</strong> Surfaces<br />

11.1 Geometric <strong>and</strong> Radiometric 3-D Effects<br />

We are reviewing various effects we can use to model <strong>and</strong> perceive the 3-dimensional properties of<br />

objects. This could be radiometric or geometric effects of reconstructing <strong>and</strong> representing objects.<br />

When we look at a photograph of a l<strong>and</strong>scape as in Slide 11.3, we notice various depth cues. Slide<br />

11.4 summarizes these and other depth cues. A total of eight different cues are described. For example, colors tend to become bluer as objects are farther away. Obviously, objects that are nearby will cover and hide objects that are farther away. Familiar objects, such as buildings, will appear smaller as the distance grows. Our own motion will make nearby things move faster. We have spatial viewing by stereoscopy. We have brightness that reduces as the distance grows. Focus for one distance will have to change at other distances. The texture of a nearby object will become simple shading on a far-away object.

Slide 11.5 shows that one oftentimes differentiates between so-called two-dimensional, two-and-a-half- and three-dimensional objects. When we deal with two-and-a-half-dimensional objects, we deal with one surface of that object, essentially a function z(x, y) that is single-valued. In contrast, a three-dimensional object may have multiple values of z for a given x and y. Slide 11.5 is a typical example of a two-and-a-half-dimensional object, Slide 11.7 of a 3-D object.

Prüfungsfragen:<br />

• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2½D- oder 3D-Modellen. Definieren Sie die Objektbeschreibung durch 2½D- bzw. 3D-Modelle mittels Gleichungen und erläutern Sie in Worten den wesentlichen Unterschied!

11.2 Measuring the Surface of An Object (Shape from X)<br />

”<strong>Computer</strong> Vision“ is an expression that is particularly used when dealing with 3-D objects.<br />

Methods that determine the surface of an object are numerous. One generally denotes methods that will create a model of one side of an object (a two-and-a-half-dimensional model) as shape-from-X. One typically will include the techniques which use images as the source of information.

In Slide 11.9 we may have sources of shape in<strong>for</strong>mation that are not images. Slide 11.10 highlights<br />

the one technique that is mostly used <strong>for</strong> small objects that can be placed inside a measuring<br />

device. This may or may not use images to support the shape reconstruction. A laser may scan a<br />

profile across the object, measuring the echo-time, <strong>and</strong> creating the profile sequentially across the<br />

211


212 CHAPTER 11. 3-D OBJECTS AND SURFACES<br />

object thereby building up the shape of the object. The object may rotate under a laser scanner,<br />

or the laser scanner may rotate around the object. In that case we obtain a complete three-D<br />

model of the object. Such devices are commercially available. For larger objects airborne laser<br />

scanners exist such as shown in Slide 11.11 <strong>and</strong> previously discussed in the Chapter 2. A typical<br />

product of an airborne laser scanner is shown in Slide 11.12.<br />

The next technique is so-called shape-from-shading. In this technique, an illuminated object's gray tones are used to estimate a slope of the surface at each pixel. Integration of the slopes into a continuous surface leads to a model of the surface's shape. This technique is inherently unstable and under-constrained: there is not a unique slope associated with a pixel's brightness, since the same gray value may be obtained from various illumination directions and therefore slopes. An additional complication is that we must know the reflectance properties of the surface. We do have such knowledge in an industrial environment, where parts of known surface properties are kept in a box and a robot needs to recognize their shape. In natural terrain, shading alone is an insufficient source of information to model the surface shape. Slide 11.14 suggests an example where a picture of a sculpture of Mozart is used to recreate the surface shape. With perfectly known surface properties and a known light source, we can cope with the variables and constrain the problem sufficiently to find a solution.
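To make the under-constrained nature of the problem explicit, a minimal sketch under the assumption of a Lambertian surface with constant albedo ρ and a single distant light source l (a unit vector) is the image irradiance equation

I(x, y) = ρ · ( n(x, y) · l ),

where n(x, y) is the unit surface normal at the pixel. One measured brightness constrains only the angle between n and l, i.e. a whole cone of possible normals (two unknowns per pixel, one equation), which is why additional assumptions such as surface smoothness, known boundaries or known reflectance are needed.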

An analogy to shape-from-shading is photometric stereo, where multiple images of a single surface are taken under several known illumination directions, while the viewing geometry of the individual images is identical; only the illumination changes. This can be used in microscopy, as shown in the example of Slide 11.16.
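As an illustration of the basic idea (a minimal sketch, not the exact procedure of the system shown in the slides), the Lambertian model from above can be inverted per pixel when at least three images with known, linearly independent light directions are available: stacking the equations I_k = ρ (n · l_k) gives a small linear system whose least-squares solution yields albedo and surface normal.

import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Recover albedo and unit normal of one pixel from k >= 3 images.

    intensities : array of shape (k,)   brightness of this pixel in each image
    light_dirs  : array of shape (k, 3) unit light directions (assumed known)
    """
    L = np.asarray(light_dirs, dtype=float)
    I = np.asarray(intensities, dtype=float)
    # Solve L @ g = I in the least-squares sense, where g = albedo * normal.
    g, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(g)
    normal = g / albedo if albedo > 0 else g
    return albedo, normal

# Example: a facet tilted towards +x, observed under three light directions.
lights = np.array([[0.0, 0.0, 1.0],
                   [0.7, 0.0, 0.714],
                   [0.0, 0.7, 0.714]])
true_n = np.array([0.3, 0.0, 0.954])          # roughly unit length
measured = 0.8 * lights @ true_n              # synthetic Lambertian intensities
print(photometric_stereo(measured, lights))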

Shape-from-focus is usable in microscopes, but also in a natural environment with small objects. A shape-from-focus imaging system finds the portion of an object that is in focus, thereby producing a contour of the object. By changing the focal distance we obtain a moving contour and can reconstruct the object. Slide 11.18 illustrates a system that can do such a shape reconstruction in real time using the changing focus. Slide 11.19 illustrates two real-time reconstructions by shape-from-focus. Slide 11.20 has additional examples.

The method of structured light projects a pattern onto an object and makes one or more images of the surface with the pattern. Depending on the type of pattern, we can reconstruct the shape from a single image, or we can use the pattern as a surface texture to make it easy for an algorithm to find corresponding image points in the stereo method we will discuss in a moment. Slide 11.22 through Slide 11.25 illustrate the use of structured light. In the case of Slide 11.22 and Slide 11.23 a stereo pair is created and matching is made very simple. Slide 11.24 illustrates the shape that is being reconstructed. Slide 11.25 suggests that by using a smart pattern we can reconstruct the shape from the gray code that is being projected.

Slide 11.27 illustrates a fairly new technique for mapping terrain using interferometric radar. A single radar pulse is transmitted from an antenna in an aircraft or satellite, is reflected off the surface of the Earth, and is received by the transmitting antenna and by an auxiliary second antenna placed in the vicinity of the first one, say on the two wings of an airplane. The difference in arrival time of the echoes at the two antennas is indicative of the angle under which the pulse has traveled to the terrain and back. The method is inherently accurate to within the wavelength of the radiation used. This technique is available even for satellites, with two antennas on the Space Shuttle (NASA mission SRTM, for Shuttle Radar Topography Mission, 1999), and is also applicable to systems with a single antenna on a satellite, where the satellite repeats an orbit very closely, to within a few hundred meters of the original orbit, and in the process produces a signal as if the two antennas had been carried along simultaneously.

The most popular and most widely used shape-from-X technique is the stereo method. Slide 11.29 suggests a non-traditional arrangement, where two cameras take one image each of a scene, the cameras being separated by the stereo base b. Two objects P_k and P_t are at different depths as seen from the stereo base, and from the two images we can determine a parallactic angle γ which allows us to determine the depth difference between the two points. Obviously, a scene as shown in Slide 11.29 will produce a 2-D representation on a single image in which the depth between P_t and P_k is lost. However, given two images, we can determine the angle γ (and thus the distance to point P_k), and we can also determine the angle difference dγ (and thus obtain the position of point P_t at a depth different from P_k's). Slide 11.30 illustrates two images of a building. The two images are illuminated in the same manner by the sunlight; the difference between the two images is strictly geometric. We have lines in the left image and corresponding lines in the right image that are called "epipolar lines". Those are intersections of a special plane in 3-D space with each of the two images. These planes are formed by the two projection centers and a point on the object. If we have a point on an epipolar line of the left image, we know that its corresponding matching point must be on the corresponding epipolar line in the right image. Epipolar lines help in reducing the search for matching points in automated stereo. Slide 11.31 is a stereo representation from an electron microscope. The structures are very small; pixels may have a size of a few nanometers in object space. We do not have a central-perspective camera model as the basis for this type of stereo. However, the electron-microscopic mode of imaging can be modeled, and we can reconstruct the surface by a method similar to classical camera stereo.
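For the standard arrangement of two identical cameras with parallel optical axes, the depth recovery reduces to a simple triangulation; the following sketch (with assumed toy values, not data from the slides) shows the relation depth = focal length × base / disparity and how a small disparity error propagates into depth.

def depth_from_disparity(f_mm, base_m, disparity_mm):
    """Depth of a point for the normal case of stereo (parallel optical axes).

    f_mm         : focal length of both cameras in millimeters
    base_m       : stereo base (camera separation) in meters
    disparity_mm : x-parallax measured between the two images in millimeters
    """
    return f_mm * base_m / disparity_mm   # result in meters

# Assumed example values: 150 mm focal length, 600 m air base, 25 mm parallax.
z = depth_from_disparity(150.0, 600.0, 25.0)              # 3600 m flying height
dz = abs(z - depth_from_disparity(150.0, 600.0, 25.01))   # effect of a 10 µm error
print(z, dz)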

Slide 11.32 addresses a last shape-from-X technique, tomography. Slide ?? and Slide ?? illustrate, from medical imaging, a so-called computer-aided tomography (CAT) scan of a human skull. Individual images represent slices through the object. By stacking up a number of those images we obtain a replica of the entire original space. Automated methods exist that collect all the voxels that belong to a particular object and in the process determine the surface of that object. The result is shown in Slide 11.34.

Exam questions (Prüfungsfragen):

• Please compile a list of all methods known to you that are referred to as "shape from X".

• What is so-called "photometric stereo" used for, and what is the basic idea behind this method?

• The lecture discussed depth cues that allow the human visual system to reconstruct the third dimension of a viewed scene, which is lost in the projection onto the retina. In digital image processing this task is solved by various "shape from X" methods. Which depth cues correspond directly to a "shape from X" method, and for which methods of natural or artificial depth estimation can no such correspondence be established?

11.3 Surface Modeling

There is an entire field of study devoted to optimally modeling a surface from the data primitives one may have obtained from stereo or other shape-from-X techniques. We deal with point clouds, connect the points into triangles, build polygonal faces from the triangles, and then replace the faces by continuous functions such as bicubic or quadric patches. Slide 11.36 illustrates a successfully constructed network of triangles, using as input a set of points created from stereo. Slide 11.37 illustrates the triangles formed from all the photogrammetrically obtained points of the statue of Emperor Charles in the National Library in Vienna; also shown is a rendering of that surface using photographic texture. This calls to mind that the problems of creating a surface from measured points, triangulating points, etc., have been discussed previously in Chapters 9 and 10.
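A minimal sketch of the first step of this pipeline, triangulating a 2½-D point cloud over its (x, y) footprint (assuming SciPy is available; this is an illustration, not the software used for the model in the slides):

import numpy as np
from scipy.spatial import Delaunay

# Synthetic 2.5-D point cloud: scattered (x, y) positions with one z per point.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 10.0, size=(200, 2))
z = np.sin(xy[:, 0]) + 0.1 * xy[:, 1]

tri = Delaunay(xy)            # triangulate in the (x, y) plane
triangles = tri.simplices     # (n_triangles, 3) indices into the point list

# Each triangle, together with the z-values of its corners, is one planar facet
# of the surface model; these facets could later be replaced by smooth patches.
print(triangles.shape)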



11.4 Representing 3-D Objects

In representing 3-D objects we have to cope with two important subjects:

• hidden edges and hidden surfaces

• the interaction of light and material

In dealing with hidden edges and surfaces, we essentially differentiate between two classes of procedures. The first is the image-space method, where we go through all the pixels of an image and find, for each pixel, the associated object point that is closest to the image plane. This method is rather susceptible to aliasing effects. The object-space method searches among all object elements and checks what can be seen from the vantage point of the viewer. These techniques are less prone to suffer from aliasing.

The issue of hidden lines or surfaces is illustrated in Slide 11.40 with a single-valued function y = f(x, z). We might represent this surface by drawing profiles from the left edge to the right edge of the 2-D domain. The resulting image in Slide 11.40 is not easily interpreted. Slide 11.42 illustrates the effect of removing hidden lines. Hidden lines are removed by going from profile to profile through the data set and plotting them into a 2-D form as shown in Slide 11.43. Each profile is compared with what has already been drawn, and by a method of clipping we can find which surface elements are hidden by previous profiles. This can be done in one dimension as shown in Slide 11.43 and then in a second dimension (Slide 11.44). When we look at Slide 11.44 we might see slight differences between the two methods of hidden-line removal in case (c) and case (d).

Many tricks are applied to speed up the computation of hidden lines and surfaces. One employs neighborhoods or auxiliary geometric transformations, accelerations using bounding boxes around objects, the elimination of surfaces that face away from the view position (back-face culling), a subdivision of the view frustum, and the use of hierarchies. Slide 11.46 illustrates the usefulness of enclosing rectangles or bounding boxes. Four objects exist in a 3-D space and it is necessary to decide which ones cover up which others. Slide 11.47 illustrates that the bounding-box approach, while helpful in many cases, may also mislead one into suspecting overlaps where there are none.
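The attraction of bounding boxes is that the conservative overlap test is only a few comparisons; a minimal sketch for axis-aligned boxes (an illustration, with the caveat from Slide 11.47 that a positive answer may still be a false alarm):

def aabb_overlap(box_a, box_b):
    """Conservative overlap test for two axis-aligned bounding boxes.

    Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    Returns False only if the enclosed objects certainly do not intersect;
    True means they *might* intersect and an exact test is still needed.
    """
    (amin, amax), (bmin, bmax) = box_a, box_b
    return all(amin[i] <= bmax[i] and bmin[i] <= amax[i] for i in range(3))

# Boxes that overlap as volumes, and boxes that clearly do not:
print(aabb_overlap(((0, 0, 0), (2, 2, 2)), ((1, 1, 1), (3, 3, 3))))  # True
print(aabb_overlap(((0, 0, 0), (1, 1, 1)), ((2, 2, 2), (3, 3, 3))))  # False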

Exam questions (Prüfungsfragen):

• When an image is generated by recursive ray tracing, the primary ray for a particular pixel hits an object A and, as shown in Figure B.11, is split into several rays which subsequently (if the recursion depth is not limited) hit the objects B, C, D and E. The numbers in the circles are the local intensities of each object (with respect to the ray hitting it); the numbers next to the connections give the weights of the partial rays. Determine the intensity assigned to the pixel under consideration if

1. the recursion depth is not limited,

2. the ray is split only exactly once,

3. the recursion is terminated as soon as the weight of a partial ray falls below 15%.

For the last two cases, please mark in two sketches those parts of the tree that are traversed for the computation of the total intensity.

Answer:

1. Recursion depth not limited:

I = 2.7 + 0.1 · 2 + 0.5 · (3 + 0.4 · 2 + 0.1 · 4)
  = 2.7 + 0.2 + 0.5 · (3 + 0.8 + 0.4)
  = 2.9 + 0.5 · 4.2
  = 2.9 + 2.1
  = 5

2. Ray split only once (recursion depth limited):

I = 2.7 + 0.1 · 2 + 0.5 · 3
  = 2.7 + 0.2 + 1.5
  = 4.4

3. Termination once the weight falls below 15%:

I = 2.7 + 0.5 · (3 + 0.4 · 2)
  = 2.7 + 0.5 · 3.8
  = 2.7 + 1.9
  = 4.6

11.5 The z-Buffer

Algorithm 29 z-buffer

1: set zBuffer to infinity for all pixels
2: for all polygons plg that have to be drawn do
3:   for all scanlines scl of that polygon plg do
4:     for all pixels pxl of that scanline scl do
5:       if the z-value of pixel pxl is nearer than zBuffer[pxl] then
6:         set zBuffer[pxl] to the z-value of pixel pxl
7:         draw pixel pxl
8:       end if
9:     end for
10:   end for
11: end for

The most popular approach to hidden-line and hidden-surface removal is the well-known z-buffer method (Algorithm 29). It was introduced in 1974 and uses a transformation of an object's surface facets into the image plane, keeping track at each pixel of the distance between the camera and the corresponding element on an object facet. Each pixel keeps the gray value that comes from the object point closest to the image plane.

Another procedure is illustrated in Slide 11.50 with an octree. The view reference point V as shown in that slide leads to a labeling of the octree space and shows that element 7 will be seen the most.

Exam questions (Prüfungsfragen):

• The four points from exercise B.2 form two line segments

A = p_1 p_2 ,   B = p_3 p_4 ,

whose projections into device coordinates in the screen plane fall onto the same scanline. Determine graphically, by applying the z-buffer algorithm, which object (A, B, or neither) is visible at pixel positions 0 to 10 of this scanline. Hint: draw p_1 p_2 and p_3 p_4 into the xz-plane of the device coordinate system.

Answer: see Figure 11.1.

Figure 11.1: graphical evaluation of the z-buffer algorithm (only the result is reproduced here); visible object at pixel positions 0 to 10 of the scanline:

− − B B A A B B B − −

11.6 Ray-tracing

Another very popular method to find hidden surfaces (but one that is also used in other contexts) is the so-called ray-tracing method. Slide 11.52 illustrates the basic idea: we have a projection center, an image window and the object space. We cast a ray from the projection center through a pixel into the object space and check where it hits the objects. To accelerate the ray tracing we subdivide the space, and instead of intersecting the ray with each actual object we first search through the bounding boxes surrounding the objects. In this way we can dismiss many objects because they are not along the path of the ray that is cast for a particular pixel. Pseudocode is given in Algorithm 30.
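A minimal sketch of the core geometric step, intersecting one ray with one sphere (illustrative only; the chapter's Algorithm 30 organizes many such tests with an octree):

import math

def ray_sphere_hit(origin, direction, center, radius):
    """Return the smallest positive ray parameter t of the hit, or None.

    The ray is origin + t * direction with a unit-length direction vector.
    """
    ox, oy, oz = (origin[i] - center[i] for i in range(3))
    dx, dy, dz = direction
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c                # a == 1 for a unit direction
    if disc < 0.0:
        return None                       # ray misses the sphere
    t1 = (-b - math.sqrt(disc)) / 2.0
    t2 = (-b + math.sqrt(disc)) / 2.0
    for t in (t1, t2):
        if t > 1e-9:                      # nearest hit in front of the origin
            return t
    return None

# Primary ray along +z from the origin against a sphere 5 units away:
print(ray_sphere_hit((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))   # 4.0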

Exam questions (Prüfungsfragen):

• Describe the ray-tracing method for determining visible surfaces. Which optimizations can help to reduce the computational effort?

Answer: From the projection center, a ray is cast through each pixel of the image plane into the scene and intersected with all objects. Among all objects hit, the one whose intersection point with the ray lies closest to the projection center determines the color value of the pixel.

– The number of required intersection computations can be greatly reduced by using hierarchical bounding volumes.

– The object that is hit (in recursive ray tracing only for the first intersection) can also be determined with the help of the z-buffer algorithm.

Algorithm 30 Ray tracing for octrees

Ray-tracing algorithm
For each row of the image
  For each pixel of the row
    Determine the ray from the eye through the pixel;
    Pixel color = Raytrace(ray);

Raytrace(ray)
  For all objects of the scene
    If the ray intersects the object and the intersection is the closest so far
      record the intersection;
  If there is no intersection then result := background color else
    result := Raytrace(reflected ray) + Raytrace(refracted ray);
    For all light sources
      For all objects of the scene
        If the ray towards the light source intersects the object
          abort the loop, proceed to the next light source
      If no intersection was found
        result += local illumination

Octree implementation

Construction
  Place a bounding box q around the scene
  For all objects o
    Insert(o, q)

Insert(object o, box q)
  For all eight sub-boxes t of q
    If o fits completely inside t
      Create t if necessary
      Insert(o, t)
      return
  Assign object o to box q

Intersection
  Intersect(box q, ray s)
    If q is empty, return NULL
    If IntersectionTest(q, s)
      For all eight sub-boxes t of q
        res += Intersect(t, s)
      For all objects assigned to q
        res += IntersectionTest(o, s)
    return the closest intersection in res



11.7 Other Methods of Providing Depth Perception

Numerous methods exist to help create the impression of depth in the rendering of a 3-D model. These include coding by brightness or coding by color. Slide 11.55 illustrates depth encoding by means of the brightness of lines: the closer an object is to the viewer, the brighter it is drawn. In Slide 11.56 color is added to help obtain a depth perception. Of course the depth perception improves dramatically if we remove hidden edges, as shown in Slide 11.57; we then take advantage of our knowledge that nearby objects cover up objects that are farther away. Slide 11.60 indicates that the transition to illumination methods for rendering 3-D objects is also relevant for depth perception.

Slide 11.58 introduces the idea of halos to represent 3-D objects, and Slide 11.59 is an example. At first we see a wire-frame model of a human head; then we see the same model after removing the hidden lines, but also interrupting some of the lines where they intersect other lines. This little interruption is denoted as a halo.




Chapter 12

Interaction of Light and Objects

Radiation and the natural environment have a complex interaction. If we assume, as in Slide 12.2, that the sun illuminates the Earth, we have atmospheric scattering as the radiation approaches the surface. We have atmospheric absorption that reduces the power of the light coming from the sun. Then we have reflection at the top surface, which can be picked up by a sensor and used in image formation. The light may pass through an object and be absorbed, while at the same time the object might emit radiation, for example in the infrared wavelengths. Finally the radiation hits the ground and might again be absorbed, reflected or emitted. As the light returns from the Earth's surface to the sensor we again have atmospheric absorption and emission. In remote sensing many of those factors are used to describe and analyze objects based on sensed images. In computer graphics we use a much simplified approach.

12.1 Illumination Models

Definition 31 Ambient light

In the ambient illumination model the light intensity I after reflection from an object's surface is given by the equation

I = I_a · k_a

I_a is the intensity of the ambient light, assumed to be constant for all objects. k_a is a constant between 0 and 1, called the ambient-reflection coefficient. k_a is a material property and must be defined for every object.

Ambient light alone creates unnatural images, because every point on an object's surface is assigned the same intensity. Shading is not possible with this kind of light. Ambient light is used mainly as an additional term in more complex illumination models, to illuminate parts of an object that are visible to the viewer but invisible to the light source. The resulting image then becomes more realistic.

The simplest case is illumination by ambient light (Definition 31). The existing light is multiplied with the properties of an object to produce the intensity of an object point in an image. Slide 12.4 illustrates this with the previously used indoor scene.

Slide 12.5 goes one step further and introduces the diffuse Lambert reflection. There is a light source which illuminates the surface under an angle Θ from the surface normal. The reflected intensity I is the product of the incident light intensity, the surface property k_d, and the cosine of the angle under which the light falls onto the surface.

Definition 32 Lambert model

The Lambert model describes the reflection of the light of a point light source on a matte surface such as chalk or fabric.

Light falling on a matte surface is reflected diffusely. This means that the light is reflected uniformly in all directions. Because of this uniform reflection, the amount of light seen from any angle in front of the surface is the same. Since the point of view does not influence the amount of reflected light seen, the position of the light source has to. This relationship is described by the Lambertian law:

Lambertian law: Assume that a surface facet is directly illuminated, so that the normal vector of the surface is parallel to the vector from the light source to the surface facet. If you now tilt the surface facet by an angle θ, the amount of light falling on the facet is reduced by the factor cos θ.

A tilted surface is illuminated by less light than a surface normal to the light direction, so it reflects less light. This is called diffuse Lambertian reflection:

I = I_p · k_d · cos θ

where I is the amount of reflected light, I_p is the intensity of the point light source, k_d is the material's diffuse reflection coefficient, and θ is the angle between the surface normal and the light vector.

Slide 12.6 illustrates the effect of various values of the parameter k_d. The factor cos Θ can also be expressed as the inner product of two unit vectors, namely the vector towards the light and the surface normal. With this diffuse Lambert reflection our original image becomes Slide 12.7.

The next level of complexity is to add the two brightnesses together, the ambient and the Lambert illumination. A further sophistication is introduced if we add an atmospheric attenuation of the light as a function of the distance to the object, shown in Slide 12.9. So far we have not talked about mirror reflection. For this we need to introduce a new vector. We have the light source direction L, the surface normal N, the mirror-reflection vector R and the direction to the camera or viewer V. The mirror-reflection component of the system is illustrated in Slide 12.10 with a term W · cos^n α, where α is the angle between the viewing direction and the direction of mirror reflection, and W is a value the user can choose to indicate how mirror-like the surface is. Phong introduced this model of mirror reflection in 1975 and explained the effect of the power n in cos^n α: the larger the power, the more focused and smaller the area of mirror reflection will be. Not only the power n defines the type of mirror reflection, but also the parameter W, as shown in Slide 12.12, where the same amount of mirror reflection produces different appearances for varying values of W. W describes the blending of the mirror reflection into the background, whereas the value n indicates how small or large the area is that is affected by the mirror reflection. Slide 12.13 introduces the idea of a light source that is not a point. In that case we introduce a point light source and a reflector, which reflects light onto the scene; the reflector represents the extended light source.
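Putting the pieces described so far together (ambient term, Lambert term, and the Phong-style mirror term with the parameters W and n), a minimal single-wavelength sketch of the combined model could look as follows; the vector names follow the text (L, N, R, V), and any distance attenuation is left out for brevity:

import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def illumination(Ia, ka, Ip, kd, W, n, L, N, V):
    """Ambient + diffuse Lambert + Phong-style specular term (one wavelength)."""
    L, N, V = unit(L), unit(N), unit(V)
    cos_theta = max(0.0, float(np.dot(N, L)))     # Lambert: angle light/normal
    R = 2.0 * np.dot(N, L) * N - L                # mirror direction of L about N
    cos_alpha = max(0.0, float(np.dot(R, V)))     # angle viewer/mirror direction
    return Ia * ka + Ip * (kd * cos_theta + W * cos_alpha ** n)

# Assumed toy parameters: light overhead, viewer slightly off the mirror direction.
print(illumination(Ia=0.2, ka=0.5, Ip=1.0, kd=0.6, W=0.3, n=20,
                   L=(0, 0, 1), N=(0, 0.2, 1), V=(0, -0.2, 1)))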

Exam questions (Prüfungsfragen):

• What is a simple way of realizing specular reflection in the rendering of three-dimensional objects? Please give a sketch, a formula, and the name of a method named after its inventor.

• Figure B.15 shows an object whose surface properties are described by the Phong illumination model. Table B.2 contains all relevant parameters of the scene. Determine, for the marked object point p, the intensity I of this point as perceived by the observer.

Hint: For simplicity, the computation is done in two dimensions only and for a single wavelength only. To evaluate the power of a number close to 1, note that the approximation (1 − x)^k ≈ 1 − kx can be used for small x.

12.2 Reflections from Polygon Facets

Gouraud introduced the idea of interpolated shading. Each pixel on a surface receives a brightness in the image that is interpolated from the three surrounding corners of the triangular facet. The computation is made along a scanline as shown in Slide 12.15, with auxiliary brightness values I_a and I_b. Note that the brightnesses are computed with a sophisticated illumination model at the corner positions I_1, I_2 and I_3 of the triangle, and then a simple interpolation scheme is used to obtain the brightness at I_p. Gouraud does not consider specular reflection, while Phong does.

Gouraud just interpolates brightnesses (Algorithm 31); Phong interpolates surface normals from the corners of a triangle (Algorithm 32). Slide 12.16 explains this. Slide 12.17 illustrates the appearance of a Gouraud illumination model: note how smoothly the illumination changes along the surface, whereas the geometry of the object is not smoothly interpolated. Slide 12.18 adds specular reflection to Gouraud. Phong shading, as shown in Slide 12.19, creates a smoother appearance of the surface because of its interpolation of the surface normal; of course it includes specular reflection. In order to have smoothness not only in the surface illumination but also in the surface geometry, the facets of the object must be replaced by curved surfaces. Slide 12.20 illustrates the idea: the model's appearance is improved, also due to the specular reflection of the Phong model.
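The heart of Gouraud's method is nothing more than linear interpolation of the corner intensities, first along the triangle edges and then along each scanline; a compact sketch of that inner step (a simplified illustration of the same idea that Algorithm 31 spells out in full):

def gouraud_scanline(x_a, I_a, x_b, I_b):
    """Linearly interpolate intensity for the pixel centers between x_a and x_b."""
    if x_b < x_a:
        x_a, x_b, I_a, I_b = x_b, x_a, I_b, I_a
    grad = (I_b - I_a) / (x_b - x_a)          # intensity change per pixel
    first, last = int(x_a) + 1, int(x_b)      # pixel centers inside the span
    return [(x, I_a + (x - x_a) * grad) for x in range(first, last + 1)]

# Edge intensities I_a = 100 at x_a = 2.3 and I_b = 40 at x_b = 8.7:
for x, intensity in gouraud_scanline(2.3, 100.0, 8.7, 40.0):
    print(x, round(intensity, 1))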

Slide 12.21 finally introduces additional light sources. Slide 12.22 summarizes the various types of reflection. We have the basic reflection law, stating that the angle of incidence equals the angle of reflection, both angles being measured with respect to the surface normal. A mirror or specular reflection is very directional: the incoming ray is reflected into a single outgoing direction. The opposite of specular reflection is diffuse reflection; if it is near perfect, it radiates into all directions almost equally. The Lambert reflector is a perfect diffuse reflector, as shown on the right-hand side.

Exam questions (Prüfungsfragen):

• Given the raster representation of an object in Figure B.58, where the object is represented only by its three corner points A, B and C. The brightness of the corner points is I_A = 100, I_B = 50 and I_C = 0. Compute the illumination values according to the Gouraud method for at least five of the pixels lying entirely inside the triangle.

• Describe two methods for interpolating the color values inside a triangle that belongs to an illuminated polygonal scene.

12.3 Shadows

Typically, shadows are computed in two steps or phases. The computations for shadows are related to the computation of hidden surfaces, because areas in shadow are areas that are not seen from the illuminating sun or light source. Slide 12.24 explains the two types of transformation. We first have to transform the 3-D object into a fictitious viewing situation with the viewpoint at the light source; that produces the visible surfaces in that view. A transformation back into model coordinates produces the shadow edges.


Algorithm 31 Gouraud shading

Procedure ScanLine(x_a, I_a, x_b, I_b, y)

1: grad = (I_b − I_a)/(x_b − x_a) {compute the intensity step per pixel}
2: if x_b > x_a then
3:   x_c = (int)x_a + 1 {set x_c and x_d to pixel centers}
4:   x_d = (int)x_b
5: else
6:   x_c = (int)x_b + 1
7:   x_d = (int)x_a
8: end if
9: I = I_a + (x_c − x_a) ∗ grad {compute the starting value for the first pixel}
10: while x_c ≤ x_d do
11:   apply I to pixel (x_c, y)
12:   x_c = x_c + 1 {advance one step}
13:   I = I + grad
14: end while

Function Triangle(x_1, y_1, I_1, x_2, y_2, I_2, x_3, y_3, I_3)

1: sort the points by ascending y-coordinate
2: ∆x_a = (x_2 − x_1)/(y_2 − y_1) {compute step sizes for the left edge}
3: ∆I_a = (I_2 − I_1)/(y_2 − y_1)
4: ∆x_b = (x_3 − x_1)/(y_3 − y_1) {compute step sizes for the right edge}
5: ∆I_b = (I_3 − I_1)/(y_3 − y_1)
6: y = (int)y_1 + 1 {compute the start scanline}
7: y_end = (int)(y_2 + 0.5) {compute the end scanline for the upper sub-triangle}
8: x_a = x_1 + (y − y_1) ∗ ∆x_a {compute starting values}
9: x_b = x_1 + (y − y_1) ∗ ∆x_b
10: I_a = I_1 + (y − y_1) ∗ ∆I_a
11: I_b = I_1 + (y − y_1) ∗ ∆I_b
12: while y < y_end do
13:   compute one scanline with ScanLine(x_a, I_a, x_b, I_b, y)
14:   x_a = x_a + ∆x_a {advance one step}
15:   x_b = x_b + ∆x_b
16:   I_a = I_a + ∆I_a
17:   I_b = I_b + ∆I_b
18:   y = y + 1
19: end while {upper sub-triangle done}
20: ∆x_a = (x_3 − x_2)/(y_3 − y_2) {compute step sizes for the new edge}
21: ∆I_a = (I_3 − I_2)/(y_3 − y_2)
22: y_end = (int)(y_3 + 0.5) {compute the end scanline for the lower sub-triangle}
23: x_a = x_2 + (y − y_2) ∗ ∆x_a {compute the starting value}
24: while y < y_end do
25:   compute one scanline with ScanLine(x_a, I_a, x_b, I_b, y)
26:   x_a = x_a + ∆x_a {advance one step}
27:   x_b = x_b + ∆x_b
28:   I_a = I_a + ∆I_a
29:   I_b = I_b + ∆I_b
30:   y = y + 1
31: end while {lower sub-triangle done}


Algorithm 32 Phong shading

1: for all polygons do
2:   compute the surface normals in the corners of the polygon
3:   project the corners of the polygon into the image plane
4:   for all scanlines that are overlapped by the polygon do
5:     compute the linearly interpolated surface normals on the left and right edge of the polygon
6:     for all pixels of the polygon on the scanline do
7:       compute the linearly interpolated surface normal
8:       normalize the surface normal
9:       evaluate the illumination model and set the color of the pixel to the computed value
10:     end for
11:   end for
12: end for

Algorithm 33 Shadow map

1: make the light-source coordinate system the center of projection
2: render the object using the z-buffer
3: assign the z-buffer to shadowzbuffer
4: make the camera coordinate system the center of projection
5: render the object using the z-buffer
6: for all visible pixels do
7:   map the coordinate from camera space into light space
8:   project the transformed coordinate to 2-D (x', y')
9:   if the transformed z-coordinate > shadowzbuffer[x', y'] then
10:     shadow the pixel {a surface is nearer to the light source than this point}
11:   end if
12: end for

Algorithm 34 Implementation of the Atherton-Weiler-Greenberg algorithm

1: make the light point the center of projection
2: determine the visible parts of the polygons
3: split partially lit polygons into visible and invisible parts
4: transform back to the modelling database
5: merge the original database with the lit polygons {results in objects split into lit and unlit polygons}
6: make (any) eye point the center of projection
7: for all polygons do {render the scene}
8:   if the polygon is in shadow then
9:     set the shading model to the ambient model
10:   else
11:     set the shading model to the default model
12:   end if
13:   draw the polygon
14: end for


We now have to merge the 3-D view and the auxiliary lines from the shadow boundaries into a combined polygon database. The method of computing hidden surfaces from the viewer's perspective is repeated in Slide 12.25. Slide 12.26 illustrates the use of the z-buffer method for the computation of shadow boundaries (Algorithm 33). L is the direction of the light, V is the position of the viewer. We first do a z-buffer pass from the light source, and then we do a z-buffer pass from the viewer's perspective. The view without shadows and the view with them give a dramatically different impression of the realism of the scene with its two objects.

Exam questions (Prüfungsfragen):

• Explain the process of shadow computation with the two-phase method using the z-buffer. Describe two variants as well as their advantages and disadvantages.

12.4 Physically Inspired Illumination Models

There is a complex world of illumination computations concerned with the bidirectional reflectance distribution function (BRDF). In addition we can use ray tracing for illumination, and a very particular method called radiosity. We will spend a few thoughts on each of these three subjects.

A BRDF, as in Slide 12.28, describes the reflection properties of a surface as a function of the illumination and viewing directions. A 3-D shape indicates how the incoming light from a light source is reflected from a particular surface. Many of the mathematical models used to describe those complex shapes bear their inventors' names.

12.5 Regressive Ray-Tracing

As discussed before, we have to cast rays from the light source onto the object and find the points that are in shadow or illuminated. Similarly, rays cast from the observer's position give us the hidden lines from the viewer's reference point. Slide 12.30 illustrates again the geometry of ray tracing used to obtain complex patterns in an image, coming from the object itself and from light cast by other objects onto its surface. Reflections in transparent objects may be obtained from the interface of the object with the air at the back, away from the viewer.

12.6 Radiosity

A very interesting illumination concept that has been studied extensively during the last ten years is called radiosity. It is a method that derives from modeling the distribution of heat between bodies in mechanical engineering (see Algorithm 35).

We subdivide the surfaces of our 3-D scene into small facets. We have a light source illuminating all the facets, but the facets also illuminate one another and thus become a form of secondary illumination source. Each surface facet has associated with it a differential surface area dA. We can set up an equation that relates the incoming light of each facet to all other facets. A very large system of equations comes about. It can, however, be efficiently reduced in the number of unknowns and therefore be solved efficiently.
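Written out (a standard formulation, sketched here for orientation rather than quoted from the slides), the radiosity B_i of facet i consists of its own emission plus the reflected fraction of what arrives from all other facets:

B_i = E_i + ρ_i · Σ_j F_ij · B_j

where E_i is the emission of facet i, ρ_i its reflectivity, and F_ij the form factor, i.e. the fraction of the energy leaving facet i that arrives directly at facet j (the reciprocity relation A_i · F_ij = A_j · F_ji is what allows the equation to be written per unit area in this compact form). Writing this for all n facets gives the large linear system mentioned above.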

Let us have a look at a few examples of this technique. In Slide 12.38 we see radiosity used in the representation of a classroom, Slide 12.39 is an artificial set of cubes, and Slide 12.39 also illustrates one table at two levels of resolution. In the first case the facets used for radiosity are fairly large, in the second the facets are made much smaller. We see how the realism of this illumination model increases.
illumination model increases.


Algorithm 35 Radiosity

1: load scene
2: divide surfaces into patches
3: for all patches do {initialize patches}
4:   if the patch is a light then
5:     patch.emission := amount of light
6:     patch.available emission := amount of light
7:   else
8:     patch.emission := 0
9:     patch.available emission := 0
10:   end if
11: end for
12:
13: repeat {render scene}
14:   for all patches i, starting at the patch with the highest available emission do
15:     place a hemicube on top of patch i {needed to calculate form factors}
16:     for all patches j do
17:       calculate the form factor between patch i and patch j {needed to calculate the amount of light}
18:     end for
19:     for all patches j do
20:       ∆R := amount of light from patch i to patch j {using the form factor and the properties of the patches}
21:       j.available emission := j.available emission + ∆R
22:       j.emission := j.emission + ∆R
23:     end for
24:     i.available emission := 0 {all available light has been distributed to the other patches}
25:   end for
26: until good enough


Similarly, we have radiosity used in modeling a computer room in Slide 12.39. We have internal illumination, and in one case, on the lower right of Slide 12.40, we have illumination from the outside of the room. In the slide we see a radiosity-based computation of an indoor scene, again at two levels of detail in the mesh sizes for the radiosity computation.




Chapter 13

Stereopsis

13.1 Binocular Vision

The 3-dimensional impression of our environment as received by our two eyes is called binocular vision. Slide 13.10 explains that the human perceives two separate images via the two eyes, merges the two images in the brain and reconstructs a depth model of the perceived scene. The two images obtained by the two eyes differ slightly because of the two different vantage points. The stereo base for natural binocular vision is typically six and a half centimeters, i.e. the distance between the eyes.

Recall that natural depth perception is supported by many depth cues other than binocular vision. We talked about depth cues from color, from size, from motion, from objects covering up one another, etc. (see Chapter 11).

Slide 13.4 explains geometrically the binocular stereo effect. Two points, P and Q, will be imaged on top of one another on the retina of one eye, but will be imaged side by side, subtending a small angle dγ, in the other eye. We call γ the parallactic angle or parallax, and dγ a parallactic difference. It is the measure of disparity which is sensed and used in the brain for shape reconstruction. The angle γ itself gives us the absolute distance to a point P and is usually computed from the stereo base b_A. Note that our eyes are sensitive to within a parallactic angle of about 15 seconds of arc (15"), and may be limited to perceiving a parallactic angle no larger than 7 minutes of arc (7').

Slide 13.5, Slide 13.6 and Slide 13.7 illustrate two cases of stereo images taken from space and one from microscopy. Note the difference between binocular viewing and stereo viewing, as discussed in a moment.

What is of interest in a natural binocular viewing environment is the sensitivity of the eyes to depth. Slide 13.8 explains that the difference in depth between two points, d, that we can just perceive is governed by our sensitivity to the parallactic angle, dγ. Since this is typically no smaller than about 15 seconds of arc, we have the depth differentiation ability shown in Slide 13.8. At a distance of 25 cm we may be able to perceive depth differences as small as a few micrometers. At a meter it may be a tenth of a millimeter, but at ten meters distance it may already be about a meter. At a distance of about 900 meters, we may not see any depth at all from our binocular vision.
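The quantitative relationship behind these numbers (a sketch using the small-angle approximation; the exact figures depend on which threshold for dγ one assumes) is: with eye base b and distance D, the parallactic angle is γ ≈ b / D, so a depth difference dD changes it by

dγ ≈ b · dD / D² ,   i.e.   dD ≈ D² · dγ / b .

The just-noticeable depth difference therefore grows with the square of the distance, and binocular depth perception ends where γ itself drops to the threshold: D_max ≈ b / dγ ≈ 0.065 m / (7.3 · 10⁻⁵ rad) ≈ 900 m, which matches the figure quoted above.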

Exam questions (Prüfungsfragen):

• Given a distance y_A = 3 meters from the eye of a sharp-eyed observer with a typical eye base to an object point A. How much farther from the eye may a second object point B be located so that the observer can just no longer perceive the depth difference between the two object points A and B? Please state the corresponding formula, insert the numerical values, and evaluate the formula.

• At the currently running Styrian provincial exhibition "comm.gr2000az" in Schloss Eggenberg in Graz, a robot is installed that is supposed to catch a ball thrown to it by visitors. In order to close the robot's gripper at the right time and in the right place, the position of the ball during its flight must be determined as accurately as possible. For this purpose two cameras are installed that observe the playing field; a simplified sketch of the arrangement is shown in Figure B.63.

Determine the accuracy in the x, y and z directions with which the ball position marked in Figure B.63 can be determined in space. For simplicity, assume the following camera parameters:

– focal length: 10 millimeters

– geometric resolution of the sensor chip: 100 pixels/millimeter

You may dispense with methods for determining the ball position with sub-pixel accuracy. For the computation of the uncertainty in the x and y directions you may neglect one of the two cameras; for the z direction you may use the considerations on the uncertainty of binocular depth perception.

13.2 Stereoscopic Vision

We can trick our two eyes into thinking they see the natural environment when in fact they look at two images presented separately to the left and right eye. Since those images will not be at an infinite distance, but perhaps at 25 cm, we are forced to focus our eyes at 25 cm, yet our brain must behave as if we were looking at a much larger distance where the eyes' optical axes are parallel. Many people have difficulties focusing at 25 cm and simultaneously obtaining a stereoscopic impression.

To help, one uses an auxiliary tool called a mirror stereoscope. Two images are placed on a table, and an assembly of two mirrors and a lens presents each image separately to each eye, whereby the eye is permitted to focus at infinity and not at 25 cm.

Slide 13.12 lists alternative modes of stereo viewing. We mentioned the mirror stereoscope with separate optical axes. A second approach uses anaglyphs, implemented in the form of glasses, where one eye receives only the red, the other one only the green component of an image. A third approach is polarization, where the images presented to the eyes are polarized differently for the left and right eye. A further approach presents the images through shutter glasses, by projection or on a monitor. All four approaches have been implemented on computer monitors.

One can realize a mirror stereoscope on a monitor by putting two optical systems in front of it and having the left half present one image, the right half the other. Anaglyphs are the classical way of looking at stereo on a monitor, presenting two images in the green and red channels and wearing glasses to perceive a stereo impression. The most popular way of presenting soft-copy images on a monitor is by polarization: one wears simple glasses to look at two polarized images on a monitor, or active glasses that are controlled from the monitor, which presents 120 images per second, 60 to one eye and 60 to the other, with the polarization ensuring that the proper image reaches the proper eye. This is called image flickering using polarization.

Slide 13.13 explains how stereoscopic viewing by means of two images increases the human ability to perceive depth far beyond the ability available from binocular vision. The reason is simple.


Definition 33 Total plastic

Let

p = n · v

be the total plastic, whereby

n ... image magnification
v ... eye-base magnification

The natural eye base dA, typically 6.5 cm, can be magnified to the synthetic stereo base dK from which the images are taken. That implies

v = dK/dA

Binocular vision is limited by the six-and-a-half centimeter distance between the two eyes, whereas stereoscopic vision can employ images taken from a much larger stereo base. Take the example of aerial photography, where two images may be taken from an airplane with the perspective centers 600 meters apart. We obtain an increase in our stereo perception that is called the total plastic (see Definition 33). The ratio between the stereo base from which the images are taken and the eye base gives us the factor v. In addition, we look at the images under a magnification n, and the total plastic increases our stereo ability by n · v, thus by a factor of tens of thousands. As a result, even though the object may be, say, a thousand meters away, we may still have a depth acuity of three cm.
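As a worked illustration (with an assumed viewing magnification of n = 4, which is not stated in the text): v = dK/dA = 600 m / 0.065 m ≈ 9200, so p = n · v ≈ 37 000. Applying the depth-difference relation dD ≈ D² · dγ / b from Section 13.1 and improving it by the factor p gives, at D = 1000 m with dγ ≈ 7.3 · 10⁻⁵ rad,

dD ≈ (D² · dγ / dA) / p ≈ (1000² · 7.3 · 10⁻⁵ / 0.065) / 37 000 ≈ 1120 m / 37 000 ≈ 0.03 m ,

i.e. roughly the three centimeters quoted above.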

Exam questions (Prüfungsfragen):

• Using a numerical example of your choice, quantify the "secret" that makes it possible to achieve a much better depth perception in the stereo viewing of overlapping photographic images than is possible with natural binocular vision.

• Name different technical methods of stereoscopically conveying a "real" (three-dimensional) spatial impression of a scene rendered by a computer.

13.3 Stereo Imaging

We need to create two natural images, either with one camera at two positions, taking the images sequentially, or with a camera pair, taking the images simultaneously. Simultaneous imaging is preferred when the object moves. Slide 13.15 illustrates the two camera positions looking at a two-dimensional scene and explains again the concept of a stereo base b, of the convergence angle γ giving us the distance to the object point P_K, and of the parallactic difference angle dγ which is a measure of the depth between the two points P_K and P_T. Slide 13.16 repeats the same idea for the case of aerial photography, where an airplane takes one image at position O_1 and a second image at position O_2. The distance between O_1 and O_2 is the aerial stereo base b, the distance to the ground is the flying height H, and the ratio b/H, called the base-to-height ratio, is a measure of the stereo acuity of an image pair. Slide 13.17 shows again the case of two images taken from an overhead position. Note that the two images look identical to the casual observer. What makes the stereo process work are the minute geometric differences between the two images, which occur in the direction of flight; there are no geometric differences in the direction perpendicular to the flight direction.

Going back to Slide 13.16, we may appreciate the necessity of recreating in the computer the relative position and orientation of the two images in space.


238 CHAPTER 13. STEREOPSIS<br />

unintended motions that will lead the user to not get an accurate measure of the positions O 1 <strong>and</strong><br />

O 2 <strong>and</strong> of the direction of imaging <strong>for</strong> the two camera positions. A stereo-process will there<strong>for</strong>e<br />

typically require that sets of points are extracted from overlapping images representing the same<br />

object on the ground. These are called homologue points. In Slide 13.16 is suggested that a<br />

rectangular pattern of six points has been observed in image 1 <strong>and</strong> the same six points have been<br />

observed in image 2. What now needs to happen mathematically is that two bundles of rays<br />

are created from the image coordinates <strong>and</strong> the knowledge of the perspective centers O 1 <strong>and</strong> O 2<br />

in the camera system. And then the two bundles of rays need to be arranged such, that the<br />

corresponding rays (homologue rays) intersect in the three-dimensional space of the object world.<br />

We call the reconstruction of a bundle of rays from image coordinates the inner orientation. We<br />

call the process by which we arrange the two images such that all corresponding rays intersect in<br />

object space, the relative orientation. And we call the process by which we take the final geometric<br />

arrangement <strong>and</strong> we make it fit into the world coordinate system by a three-dimensional con<strong>for</strong>mal<br />

trans<strong>for</strong>mation the absolute trans<strong>for</strong>mation.<br />
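For the simplified normal case sketched above (parallel viewing directions, stereo base b, focal length f), the x-parallax p of a point relates its distance Z to the camera by Z = f·b/p. A minimal numeric sketch, with invented numbers and a function name of our own:

def distance_from_parallax(b, f, p):
    """Normal-case stereo: object distance Z from stereo base b,
    focal length f and measured x-parallax p (consistent units)."""
    return f * b / p

# example with invented values: b = 600 m, f = 0.15 m, p = 0.09 m  ->  Z = 1000 m
Z = distance_from_parallax(b=600.0, f=0.15, p=0.09)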

Prüfungsfragen:

• How are the two images of the same scene acquired in stereo imaging? Describe typical applications of both methods!

13.4 Stereo-Visualization

The counterpart to using natural images for viewing the natural environment stereoscopically is stereo-visualization: we create artificial images, present them to the eyes, and obtain a three-dimensional impression of an artificial world. We visit Slide 13.19 to explain that we need to create two images, one for the left and one for the right eye, of a geometric scene, represented in the slide by a cube and its point P. Slide 13.20 shows that we compute two images of each world point W, assuming that we have two cameras side by side, at a stereo base b and with their optical axes parallel. Recall that in computer graphics the optical axes are called view plane normals and the lens center is the view point VP. Slide 13.21 is the ground view of the geometric arrangement. We have used Slide 13.22 previously to illustrate the result obtained by creating two images of a three-dimensional scene, in this particular case a wire-frame representation for the left and the right eye. If we present those two images about six and a half centimeters apart on a flat table, look vertically down and imagine that we are looking at infinity (so that the eye axes are parallel), we will be able to merge the two images into a three-dimensional model of that object. However, we will notice that the image is not in focus, because our eyes tend to focus at infinity when we force the eye axes to be parallel.

Computer-generated stereo images are the essence of virtual and augmented environments. Slide 13.23 illustrates how a person looks at artificial images and receives a three-dimensional impression, using motion detectors that feed the head's position and orientation into the computer, so that as the head moves, a new image is projected to the eyes and the motion of the head is consistent with the experience of the natural environment. In contrast, Slide 13.24 illustrates augmented reality, where the monitors are semi-transparent: the human observer does not only see the artificial, virtual impression of computed images, but has superimposed on them the natural environment, which is visible binocularly. Augmented reality thus uses both binocular vision of the real scene and computed stereo vision.
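The parallel-camera arrangement of Slides 13.20 and 13.21 can be sketched as follows; the point, base and focal length are invented values, and the function name is ours:

def stereo_project(point, b, f):
    """Project a world point (x, y, z), z > 0 in front of the cameras, into
    left/right image coordinates for two parallel cameras a stereo base b
    apart; f is the distance to the image plane (focal length)."""
    x, y, z = point
    xl = f * (x + b / 2.0) / z      # left camera sits at (-b/2, 0, 0)
    xr = f * (x - b / 2.0) / z      # right camera sits at (+b/2, 0, 0)
    y_img = f * y / z               # no vertical parallax for parallel axes
    return (xl, y_img), (xr, y_img)

# the screen parallax xl - xr = f * b / z shrinks with increasing depth z
left, right = stereo_project((0.2, 0.1, 2.0), b=0.065, f=0.05)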

13.5 Non-Optical Stereo

Eyes are very forgiving, and the images we observe stereoscopically need not necessarily be taken by a camera and therefore need not be centrally perspective. Slide 13.26 explains how the NASA Space Shuttle has created radar images in sequential orbits. Those images overlap with one another and show the same terrain. Slide 13.27 illustrates a mountain range in Arizona imaged by radar. Note that the two images look more different than our previous optical images did: shadows are longer in one image than in the other. Yet a stereo impression can be obtained in the same way as we have obtained it with optical imagery. The quality of the stereo measurement will be lower, because of the added complexity that the two images are less similar in gray tones.

The basic idea of this type of stereo is repeated in Slide 13.28. We have two antennas illuminating the ground and receiving echoes as in a traditional radar image, and the overlap area can be presented to the eyes as if it consisted of two optical images. The basic idea is also explained in Slide 13.29: in each radar image, point P is projected into position P' or P'', and we get a parallactic distance d_p. The camera positions that would produce from a point P the same positions P' and P'' and the same parallax distance d_p are the camera positions 1 and 2 shown in Slide 13.29.

Prüfungsfragen:

• Give an example and a concrete application of a non-optical sensor in stereo imaging!

13.6 Interactive Stereo-Measurements

If we want to make measurements using the stereo impression from two images, we need to add something to our visual impression: a measuring mark. Slide 13.31 shows the two stereo images and our eyes viewing the same point M in the two images, where it appears as M_1 and M_2. If we add a measuring mark as shown in the slide, we will perceive the measuring mark (M) to float above or below the ground. If we now move the measuring mark in the two images such that it superimposes the points M_1 and M_2, the measuring mark will coincide with the object point M. We can then measure the elevation difference between two points by tracking the motion that we have to apply to the measuring mark in image space. Slide 13.32 explains the object point M, the measuring mark (M) and their positions in image space at M_1 and M_2.

Slide 13.33 is an attempt at illustrating the position of the measuring mark above the ground, on the ground and below the ground. In this particular case, the stereo perception is anaglyphic.

13.7 Automated Stereo-Measurements

See Algorithm ??. The measuring mark for stereo measurements needs to be placed on a pair of homologue points. Knowing the location of the stereo measuring mark permits us to measure the coordinates of the 3D point in the world coordinate system. A systematic description of the terrain shape, or more generally of the shape of 3D objects, requires many surface measurements, which would have to be made by hand. This can be automated if the location of homologue points can be found without manual interference. Slide 13.35 and Slide 13.36 explain.

Two images exist, forming a stereo pair, and a window is taken out of each image to indicate a homologue area. The task, as shown in Slide 13.37, is to automatically find the corresponding locations in such windows. For this purpose we define a master and a slave image. We take a window of the master image and move it over the slave image, and at each location we compute a value describing the similarity between the two image windows. At the maximum value of similarity we have found a point of correspondence. The result is a point 1' in image (') and a point 1'' in image (''). These two points define two perspective rays that run from the perspective centers through the image planes into the world coordinate system and intersect at a surface point 1. We need to verify that the surface point 1 makes sense: we will not accept the point if it is totally inconsistent with its neighborhood (we call this a gross error), and we will accept the point if it is consistent with its neighborhood.

Slide 13.38 explains the process of matching with the master and slave image windows. Note that the master window of size M × N is moved across a larger search area of size K × J, so we obtain many measures of similarity. Slide 13.39 defines the individual pixels within the sub-image and is the basis for one particular measure of similarity shown in Slide 13.40: the normalized correlation, defined by a value R_N^2(m, n) at location (m, n). The values in this formula are the gray values W in the master and S in the slave image. A double summation occurs because of the two-dimensional nature of the windows of size M × N. Slide 13.41 illustrates two additional image correlation measures. The normalized correlation produces a value R that assumes numbers between 0 and 1: full similarity is expressed by the value 1, total dissimilarity results in the value 0. A non-normalized correlation will not have a range between 0 and 1, but will assume much larger values. However, whether the correlation is normalized or not, one will likely find the same extrema and therefore the same match points.

A quite different measure of similarity is the sum of absolute differences of gray values. We essentially sum up the absolute differences in gray value between the master and slave windows at a particular location (m, n). The computation is much faster than the computation of a correlation, since we avoid squaring values; moreover, as soon as the running sum becomes larger than a previously found minimum, we can stop the double summation, since we have already found a smaller sum of absolute differences and therefore a more likely place of maximum similarity. Slide 13.42 explains how the many computations of correlation values result in a window of such correlation values, in which we need to find the extremum, the highest correlation, as marked by a star in Slide 13.42.

Problems occur if we have multiple extrema and we do not know which one to choose.
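The master-and-slave window matching described above can be sketched as follows. The normalized correlation follows the R_N^2(m, n) expression described above, and the sum-of-absolute-differences variant includes the early termination mentioned in the text. Function and variable names are ours; the windows are assumed to be float arrays, and numpy is assumed:

import numpy as np

def normalized_correlation(W, S_sub):
    """R_N^2 between the master window W and an equally sized slave sub-window."""
    c_ws = np.sum(W * S_sub) ** 2
    c_ww = np.sum(W ** 2)
    c_ss = np.sum(S_sub ** 2)
    return c_ws / (c_ww * c_ss)

def best_match_correlation(W, S):
    """Move the master window W over the slave image S and return the offset
    (m, n) with the highest normalized correlation."""
    M, N = W.shape
    best, best_pos = -1.0, (0, 0)
    for m in range(S.shape[0] - M + 1):
        for n in range(S.shape[1] - N + 1):
            r = normalized_correlation(W, S[m:m + M, n:n + N])
            if r > best:
                best, best_pos = r, (m, n)
    return best_pos, best

def best_match_sad(W, S):
    """Sum-of-absolute-differences variant with early termination: stop the
    double summation as soon as the running sum exceeds the best value so far."""
    M, N = W.shape
    best, best_pos = np.inf, (0, 0)
    for m in range(S.shape[0] - M + 1):
        for n in range(S.shape[1] - N + 1):
            s = 0.0
            for j in range(M):
                s += np.sum(np.abs(W[j] - S[m + j, n:n + N]))
                if s >= best:
                    break            # this offset cannot beat the best match
            else:
                best, best_pos = s, (m, n)
    return best_pos, best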

Slide 13.43 suggests that various techniques exist to accelerate the matching process. Slide 13.44 indicates how an image pyramid allows us to do a preliminary match with reduced versions of the two images and then limit the size of the search windows dramatically, thereby increasing the speed of finding successful matches. We call this a hierarchical matching approach. Another trick is shown in Slide 13.45, where an input image is converted into a gradient image or an image of interesting features. Instead of matching two gray value images, we match two edge images. A whole theory exists on how to optimize the search for edges in images in preparation for a stereo matching approach. Slide 13.46 explains that a high-pass filter that suppresses noise and computes edges is preferable. Such a filter is the so-called LoG filter, the Laplacian-of-Gaussian transformation of an image, which yields two lines for each edge since we are looking for zero crossings¹. That subject is an extension of the topic of filtering.

¹ in German: Nulldurchgänge

Prüfungsfragen:

• Using the normalized correlation R_N^2(m, n), determine the sub-image within the bold-framed region of Abbildung B.25 that best matches the mask M given there. State your computed values and mark the region you found in Abbildung B.25!

Antwort:

\[ R_N^2(m,n) = \frac{\left[\sum_{j=1}^{M}\sum_{k=1}^{N} W(j,k)\,S_{m,n}(j,k)\right]^2}{\sum_{j=1}^{M}\sum_{k=1}^{N}\left[W(j,k)\right]^2 \cdot \sum_{j=1}^{M}\sum_{k=1}^{N}\left[S_{m,n}(j,k)\right]^2} \]


With the abbreviations
\[ c_{WS} = \left[\sum_{j=1}^{M}\sum_{k=1}^{N} W(j,k)\,S_{m,n}(j,k)\right]^2, \quad c_{WW} = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[W(j,k)\right]^2, \quad c_{SS} = \sum_{j=1}^{M}\sum_{k=1}^{N}\left[S_{m,n}(j,k)\right]^2, \]
so that R_N^2(m, n) = c_WS / (c_WW · c_SS), one obtains for the four candidate positions:

Position        c_WS   c_WW   c_SS   R_N^2(m, n)
top left          25      6      5       0.833
top right         25      6      6       0.694
bottom left       16      6      6       0.444
bottom right       4      6      6       0.111

The best match is at the top left.

• What is the basic principle of methods that reconstruct, from a stereo image pair, the surface of a body visible in both images?


[Slides 13.1 through 13.47]


Chapter 14

Classification

14.1 Introduction

Concepts of classification are used not only in image analysis and computer vision but also in many other fields where one has to make decisions.

First we define the problem and look at some examples. We then review a heuristic approach called the minimum distance classifier. Finally we go through Bayes' theorem as the basis of statistical classification and round out this chapter with a sketch of a simple implementation based on Bayes' theorem.

Classification is a topic based to a considerable extent on the field of statistics, dealing with probabilities, errors and estimation. We will stay away from deeper statistics here and only take a short look.

What is the definition of classification? We have object classes C_i, i = 1, ..., n, and we search for the class C_i to which a set of observations belongs. The first question is which observations to make; the second is the classification itself, namely the decision to which class the observations belong.

14.2 Object Properties

Let us review object features. Objects have color, texture, height, whatever one can imagine. If we classify the types of land use in Austria, as suggested in Slide 14.5, a set of terrain surface properties will be needed, perhaps from satellite images and public records. Slide 14.6 enumerates the 7 properties of electromagnetic radiation one can sense remotely, say by camera, thermal imaging, radiometry, radar and interferometric sensors. As a sensor collects image data about a scene from a distance, up to 7 characteristics are accessible.

The properties of the sensed signal may also be used to "invert" it into a physical parameter of the object. Examples are the object point's moisture or roughness, possibly its geometric shape. Slide ?? illustrates a camera image of a small segment of skin with a growth, called a lesion, that could be cancer. One can extract from the physically observed color image some geometric properties of the lesion such as length, width, roughness of the edge, etc.

Slide ?? is a fingerprint, Slide 14.9 a set of derived numbers describing the fingerprint. Each number is associated with a pixel, giving a feature vector per pixel, or with a larger object such as the lesion or the fingerprint. The feature vector x is the input to a classification.

Prüfungsfragen:

• Which physical characteristics of the radiation emitted or reflected by a body are suitable for determining its surface properties (e.g. for the purpose of classification)?

14.3 Features, Patterns, and a Feature Space

Algorithm 36 Feature space
1: FeatureSpace = CreateHyperCube(n-Dimensional) {create an n-dimensional hypercube}
2: for all Pixels in Image do
3:   FeatureSpace[Pixel[Plane-1], Pixel[Plane-2], ..., Pixel[Plane-n]] += 1 {increment the corresponding point in the feature space by 1}
4: end for {this algorithm creates a feature space represented by an n-dimensional hypercube}

If we have to do a color classification, then our features will be "color". In a color image we represent color via the red-green-blue (RGB) planes. Recall the eight-bit gray value image representing the R channel, next the G channel and last the B channel, representing red, green and blue.

Slide x suggests color classifications, but has 4 images or channels, for instance infrared (IR) in addition to RGB, or temperature, or whatever we can find as an object feature.

We now build up a feature space. In the case of RGB we would have three dimensions. Slide 14.12 presents just two dimensions for simplicity, for example R and G. If we add more features (B, IR, temperature, ...) we end up with hyperspaces which are hard to visualize.
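Algorithm 36 can be sketched with numpy as an n-dimensional histogram over the image planes; the array names and the use of numpy are our own choices:

import numpy as np

def feature_space(planes, bins=256):
    """Build the n-dimensional feature space of Algorithm 36 from n image
    planes (e.g. R, G, B), each a 2-D array of 8-bit values; the cell at
    (r, g, ...) counts how many pixels carry that feature combination."""
    samples = np.stack([p.ravel() for p in planes], axis=1)   # one row per pixel
    hist, _ = np.histogramdd(samples, bins=[bins] * len(planes),
                             range=[(0, bins)] * len(planes))
    return hist

# two features only, as in Slide 14.12: the R and G planes of a random test image
rng = np.random.default_rng(0)
r, g = rng.integers(0, 256, size=(2, 64, 64))
fs = feature_space([r, g])        # a 256 x 256 array of counts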

14.4 Principle of Decisions

Slide 14.14 illustrates what we would like to get from the classifier's decisions: each object, in this case each pixel, is to be assigned to a class, here denoted by O_1, O_2, O_3, ...

The simplest method of classification is the so-called minimum distance classifier. Slide 14.19 presents a 2-dimensional feature space. Each entry into this 2D space is a vector x = (x_1, x_2)^T or (g_1, g_2)^T, with the observations x_1, x_2 or g_1, g_2 representing, for example, the amount of red (R) or green (G) as an 8-bit digital number DN from an image.

These observations describe one pixel each, and we may find that the value for R is 50 and for G is 90. This determines a unique entry in the feature space. As we make observations of known objects we define a so-called learning phase, in which we find feature pairs defining a specific class. R = 50, G = 90 might be one type of object.

We now calculate the mean value of a distribution, which is nothing else than the expected value of a set of observations; the arithmetic mean is obtained by summing up all the values and dividing by their number. We connect the means of two classes by a straight line and define a line halfway between the means, perpendicular to the connecting line. This boundary between the two classes is called the discriminating function.

If we now make an observation of a new, unknown object (pixel), we simply determine the distances to the various class means. In Slide 14.16 the new object belongs to class O_3. This is the minimum distance classifier.
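A minimal sketch of this decision rule, using class means and Euclidean distance; the class names and the numbers are invented for illustration:

import numpy as np

def minimum_distance_classify(x, class_means):
    """Assign the feature vector x to the class whose mean is nearest."""
    distances = {label: np.linalg.norm(np.asarray(x, float) - np.asarray(m, float))
                 for label, m in class_means.items()}
    return min(distances, key=distances.get)

# the learning phase reduced to per-class means of (R, G) observations
means = {"water": (50, 90), "forest": (40, 120), "urban": (110, 100)}
label = minimum_distance_classify((48, 95), means)     # -> "water"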

What could be a problem with the minimum distance classifier? Suppose that in the learning phase one makes an error and for the class O_3 we record an "odd" observation. This will affect the expected value (the mean) for the entire data set. A further problem is that we have not considered the "uncertainty" of the observations in defining the various classes. This "uncertainty" would be represented by the "variance" of our observations: if the observations are clustered together closely, their variance is small; if they are spread out widely, their variance is larger. Variance is not considered in a minimum distance classifier. Figure x illustrates that each pixel gets classified and assigned to a class. There are no rejections, i.e. cases where the classifier is unable to make a decision and rejects a pixel/object/feature vector as belonging to none of the classes.

Algorithm 37 Classification without rejection
TYPE pattern =
  feature: ARRAY [1 .. NbOfFeatures] of Integer;
  classIdentifier: Integer;

Classify-by-MinimumDistance (input: pattern)
  {this method sets the classIdentifier of input to the class represented by the nearest sample pattern}
  for i := 1 to NbOfSamples do
    Distance := 0                       {initial value}
    {sum all differences between input and SamplePattern[i]:}
    for j := 1 to NbOfFeatures do
      Difference := input.feature[j] - SamplePattern[i].feature[j]
      Distance := Distance + |Difference|
    end for
    if i = 1 then
      minDistance := Distance           {initial value}
    end if
    {setting the class:}
    if Distance ≤ minDistance then
      minDistance := Distance
      input.classIdentifier := SamplePattern[i].classIdentifier
    end if
  end for

Classify-by-DiscriminationFunction (input: pattern)
  {this method sets the classIdentifier of input to the class with the maximum function result}
  for i := 1 to NbOfClasses do
    Sum := 0                            {initial value}
    {sum all function results of the input features, using the function set of class i:}
    for j := 1 to NbOfFeatures do
      functionResult := DiscriminationFunction[i](input.feature[j])   {DiscriminationFunction[i] represents the actual function set}
      Sum := Sum + functionResult
    end for
    if i = 1 then
      maxSum := Sum                     {initial value}
    end if
    {setting the class:}
    if Sum ≥ maxSum then
      maxSum := Sum
      input.classIdentifier := i
    end if
  end for

Prüfungsfragen:

• Given are training pixels with the gray values listed in Tabelle ??, and also a new pixel x_neu = (13, 7).

1. Draw a two-dimensional feature space and mark the positions of the training pixels in it.
2. Describe a simple computational procedure (algorithm) for deciding to which "object class" this new pixel most probably belongs.
3. Carry out the numerical computation of this decision and thereby justify numerically the assignment of the new pixel to one of the object classes represented by the training pixels.

14.5 Bayes Theorem

Algorithm 38 Classification with rejection
1: P_min := 0.6 {chosen threshold below which a pixel is rejected}
2: while there is a pixel to classify do
3:   pick and remove pixel from the to-do list
4:   P_max := −1 {reset for each pixel}
5:   x := f(pixel) {n-dimensional feature vector representing the information about the pixel}
6:   for all existing classes C_i do
7:     with x calculate the a posteriori probability P(C_i | x) for the pixel
8:     if P(C_i | x) > P_max then
9:       P_max := P(C_i | x)
10:      k := i {store the currently most probable class k for the pixel}
11:    end if
12:  end for
13:  if P_max > P_min then
14:    add pixel to the corresponding class k {classification}
15:  else
16:    leave pixel unclassified {rejection}
17:  end if
18: end while

Bayes' theorem looks complicated, but it is not. We define the probability that an observation x belongs to a class C_i. We call it an a posteriori probability because it is the probability of a result, after the classification. This resulting probability is computed from three other probabilities. The first is the result of the learning phase: the probability that, given a class C_i, we make the observation x. Second, we have the a priori knowledge of the expert, providing a probability that a class C_i occurs at all. The third probability is the so-called joint probability of observation x and class C_i.

By itself this formula does not yet give us a software implementation, but the expression in Slide 14.18 serves to explain the relationships. A sketch of a possible implementation follows. First we make a very common assumption, also shown in Slide 14.18, called the closed world assumption over all the classes: there is no unknown class, and an observation will belong to one of the n classes. In statistics this expresses itself as the sum of all a posteriori probabilities being 1. Take colors as an example: there is no pixel in the image for which we do not know a color. Bayes' theorem simplifies under this assumption, since the joint probability becomes a constant factor 1/a.

The problem with all classifiers is the need to model expert knowledge and then to train the system; the hard part is to find a correct computational model. One simple implementation is to calculate, in the learning phase, not only the means of our observations, as we did before, but also their variance or standard deviation. We need to learn our pixels, our colors, our color triplets; we need to assign certain triplets to certain colors, and this gives us our means and our variances as in Slide 14.23. Note that the slide shows 2, not 3, dimensions. The mean value and the variance define a Gaussian function representing the so-called distribution of the observations.

In Slide 14.23 the ellipse may, for instance, define the 1-sigma border: "sigma" or σ is the standard deviation, σ² is the variance. The probability is represented by a curve, or in 2D by a surface, called a Gaussian curve or surface. The probability that an observation of class O_3 falls within the 1σ border is roughly two thirds (about 68% in one dimension; within a two-dimensional 1σ ellipse the value is lower). If the border is drawn at 3σ (three times the standard deviation), the probability exceeds 99%.

By calculating the variance and the sigma border for each class C_i or O_i we produce n Gaussian functions. In Slide ?? we have two dimensions, red and green. We make an observation which we want to classify. We do not calculate the minimum distance, but check into which ellipse the vector of the new observation falls.

To summarize, we have performed two steps: we calculate the mean and variance of each class in the learning phase, and then we "intersect" the unknown observation with the result of the learning phase. A simple Bayes classifier requires no more than determining the Gaussian function discussed above.

The Gaussian function in a single dimension used for classification is
\[ d_j(x) = \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\!\left[-\frac{(x - m_j)^2}{2\sigma_j^2}\right], \]
with x being the feature value, σ_j the standard deviation, m_j the mean, and j the index associated with a specific class. In more than one dimension, m and x are replaced by vectors and σ by a matrix.

This algorithm is summarized in Slide 14.23: m is the mean of each class and C is the variance. In a multi-dimensional context, C is a matrix of numbers, the so-called covariance matrix. It is computed using the coordinates of the mean m. The expression E{·} in Slide 14.23 denotes the expected value and can be estimated by
\[ C = \frac{1}{N} \sum_{k=1}^{N} x_k x_k^T - m\,m^T \]
or, equivalently, element by element,
\[ c_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{k,i} - m_i)(x_{k,j} - m_j), \qquad i, j = 1 \ldots M, \]
where M is the dimension of the feature space and N is the number of feature vectors (pixels) per class available for the learning phase.

As shown in Slide 14.23, each class of objects thus gets defined by an ellipse.
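The two steps just summarized, learning per-class means and covariances and then evaluating the Gaussian densities for a new observation, can be sketched as follows. Equal a priori probabilities, non-singular covariance matrices and the rejection threshold are our own simplifying assumptions:

import numpy as np

def train(class_samples):
    """Learning phase: estimate mean vector m and covariance matrix C per class;
    each value in class_samples is an (N, M) array of N feature vectors."""
    model = {}
    for label, X in class_samples.items():
        m = X.mean(axis=0)
        C = (X - m).T @ (X - m) / len(X)       # C = 1/N * sum (x_k - m)(x_k - m)^T
        model[label] = (m, C)
    return model

def gaussian_density(x, m, C):
    """Multivariate Gaussian density for mean m and (non-singular) covariance C."""
    M = len(m)
    d = x - m
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** M * np.linalg.det(C))
    return norm * np.exp(-0.5 * d @ np.linalg.inv(C) @ d)

def classify(x, model, p_min=0.0):
    """Assign x to the class with the highest density; reject below p_min."""
    scores = {label: gaussian_density(np.asarray(x, float), m, C)
              for label, (m, C) in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > p_min else None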

Prüfungsfragen:

• In image classification one often tries to approximate the unknown probability density function of N known feature vectors in m-dimensional space by a Gaussian normal distribution. For this, the m × m covariance matrix C of the N vectors is needed. Abbildung B.28 shows three feature vectors p_1, p_2 and p_3 in two dimensions (i.e. N = 3 and m = 2). Compute the corresponding covariance matrix C!

Antwort: First compute the mean vector m:
\[ m = \frac{1}{3}\left[\begin{pmatrix}1\\-1\end{pmatrix} + \begin{pmatrix}3\\3\end{pmatrix} + \begin{pmatrix}2\\4\end{pmatrix}\right] = \begin{pmatrix}2\\2\end{pmatrix}, \]
then the products (p_i − m)(p_i − m)^T:
\[ (p_1 - m)(p_1 - m)^T = \begin{pmatrix}-1\\-3\end{pmatrix}\begin{pmatrix}-1 & -3\end{pmatrix} = \begin{pmatrix}1 & 3\\ 3 & 9\end{pmatrix}, \]
\[ (p_2 - m)(p_2 - m)^T = \begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1 & 1\end{pmatrix} = \begin{pmatrix}1 & 1\\ 1 & 1\end{pmatrix}, \]
\[ (p_3 - m)(p_3 - m)^T = \begin{pmatrix}0\\2\end{pmatrix}\begin{pmatrix}0 & 2\end{pmatrix} = \begin{pmatrix}0 & 0\\ 0 & 4\end{pmatrix}. \]
The covariance matrix is
\[ C = \frac{1}{3}\sum_{i=1}^{3}(p_i - m)(p_i - m)^T = \frac{1}{3}\begin{pmatrix}2 & 4\\ 4 & 14\end{pmatrix}. \]

• Let p(x), x ∈ R², be the probability density function of the Gaussian normal distribution whose parameters were estimated from the three feature vectors p_1, p_2 and p_3 of exercise B.2. Further, two points x_1 = (0, 3)^T and x_2 = (3, 6)^T are given in the feature space. Which of the following two statements is correct (justify your answer):

1. p(x_1) < p(x_2)
2. p(x_1) > p(x_2)

Hint: draw the two points x_1 and x_2 into Abbildung B.28 and consider in which direction the eigenvectors of the covariance matrix C from exercise B.2 point.

Antwort: p(x_1) < p(x_2), since x_2 lies in the direction of the eigenvector of C with the largest eigenvalue (measured from the class center m), and therefore the probability density at x_2 is larger than at x_1.

14.6 Supervised Classification

The approach where training/learning data exist is called supervised classification. Unsupervised classification is a method where pixels (or objects) get entered into the feature space without knowing what they are. In that case a search is started to detect clusters in the data. The search comes up with aggregations of pixels/objects and simply defines that each aggregate is a class.
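The cluster search just mentioned can be sketched, for example, with a small k-means procedure; the text does not prescribe a particular clustering algorithm, so this is only one possible choice:

import numpy as np

def kmeans(X, k, iterations=20, seed=0):
    """Tiny k-means sketch: X is an (N, M) array of feature vectors, k the
    number of clusters to look for; returns per-vector labels and the centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iterations):
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers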

In contrast to this approach, common classification starts out from known training pixels or objects. A real-life case is shown in Slide 14.22. A clustering algorithm may find 3 clusters here. In fact, Slide ?? is the actual segmentation of these training pixels into 6 object classes (compare with Slide 14.23).

The computation in the learning or training phase, which leads to Slide ??, is the basis for receiving new pixels. If a new pixel falls within the agreed-upon range of a class, it is assigned to that class. Otherwise it is not assigned to any class: it gets rejected.

14.7 Real Life Example

Slide 14.26 to Slide 14.31 illustrate a classification of the territory of Austria on behalf of a cell-phone project, where the surface cover was needed for wave propagation and signal strength assessment. The classification was unsupervised, thus without training pixels, simply looking for groups of similar pixels (clusters). A rather "noisy" result is obtained in Slide 14.28. Slide 14.29 presents the forest pixels, where many adjacent pixels get assigned to different classes. This is the result of not considering "neighborhoods". One can fix this by means of a filter that aggregates adjacent pixels into one class if this does not totally contradict the feature space. The surface cover and land use result for the city of Vienna is shown in Slide 14.31.

14.8 Outlook

In the specialization class on "Image Processing and Pattern Recognition" we will discuss more details of this important and central topic, including:

• Multi-variable probabilities
• Neural network classification
• Dependencies between features
• Non-statistical classification (shape, chain codes)
• The transition to Artificial Intelligence (AI)


[Slides 14.1 through 14.32]


Chapter 15

Resampling

We have previously discussed the idea of resampling under the heading of Transformations (Chapter 9), where it was a side topic, essentially an application. We will focus on the topic here, using many of the illustrations from previous chapters.

Prüfungsfragen:

• What is meant by (geometric) "resampling", and which options exist for computing the intensities of the pixels in the output image? Describe the different methods with a sketch and, where appropriate, a formula!

15.1 The Problem in Examples of Resampling

Slide 15.3 recalls an input image that is distorted and illustrates, in connection with Slide 15.4, the rectification of the image, a geometric transformation from the input geometry to an output geometry. The basic idea is illustrated in Slide 15.5. On the left we have the input image geometry, representing a distorted image. On the right we have the output geometry, representing a corrected or rectified image. The suggestion is that we take a grid mesh of lines to cut up the input image and stretch each quadrilateral of the input image to fit into a perfect square of the output image. This casual illustration presents reasonably fairly what happens in geometric transformation and resampling in digital image processing.

Resampling is also applicable in a context where we have individual images taken at different times from different vantage points and we need to merge them into a continuous large image. We call this process mosaicking. The images might overlap, and the overlap is used to achieve a match between the images by finding homologue points. Those are the basis for a geometric transformation and resampling process to achieve the mosaic.

Finally, resampling is also an issue in computer graphics when dealing with texture. We may have an input image showing a particular pattern, and as we geometrically transform or change the scale of that pattern, we have to resample the texture. The illustration shows so-called MIP-maps, a pyramid of pre-filtered versions of a detail-rich texture image at successively reduced resolutions.

15.2 A Two-Step Process

Geometric transformation and resampling are typically performed in a two-step process. The first step is the establishment of a geometric relationship between the input and the output images, essentially a coordinate processing issue. We typically have a regular pattern of pixels in the input image, and conceptually we need to find the geometric location in the output image representing the center of each pixel of the input image. Vice versa, we may have a regular image matrix on the output side (the ground), and for each center of an output pixel we need to find the location in the input image from where to pick a gray value. Slide 15.9 explains; Slide 15.10 and the following slide augment that explanation. We have an input image that is geometrically distorted. The object might be a stick figure, as suggested in Slide 15.10. The output or target image is a transformed stick figure. We have regular pixels in the target or output image that need to be assigned gray values as a function of the input image. Slide 15.12 explains the idea of the two-step process: step 1 is a manipulation of coordinates, mapping the input (x, y) into output (x̂, ŷ) coordinates; step 2 is a search for a gray value for each output pixel, starting from the output location of that pixel and looking in the input image for the gray value.

15.2.1 Manipulation of Coordinates

We have correspondence points between the image space and the target or output space. These correspondence points serve to establish a geometric transformation that converts the input (x, y) coordinates of an arbitrary image location into output (i, j) coordinates in the target space. This transformation has unknown transformation parameters which have to be computed in a separate process called spatial transformation. We will discuss in a moment how this is done efficiently.

15.2.2 Gray Value Processing

Once this spatial transformation is known, we go through the output image, and for each pixel center (i, j) we find the corresponding input coordinate location (x, y); we grab the gray value there and place it at the pixel location of the output or target image.

Algorithm 39 Calculation with a node file
1: while there is another quadrangle quadin in the input node file do
2:   if there is a corresponding quadrangle quadout in the output node file then
3:     read the four mesh points of the quadrangle quadin
4:     read the four mesh points of the quadrangle quadout
5:     calculate the (eight) parameters params of the (bilinear) transformation
6:     save the parameters params
7:   else
8:     error {no corresponding quadrangle quadout for quadrangle quadin}
9:   end if
10: end while
11: for all pixels pout of the output image do
12:   get the quadrangle quadout in which pixel pout lies
13:   get the parameters params corresponding to the quadrangle quadout
14:   calculate the input image position pin of pout with the parameters params
15:   calculate the gray value grey of pixel pout according to the position of pin
16:   assign the gray value grey to pout
17: end for


15.3 Geometric Processing Step

See Algorithm 39. We go back to the idea that we cut up the input image into irregular meshes, where each corner of the mesh pattern corresponds to a corner of a regular mesh pattern in the output image. We call these mesh points nodes, and we obtain a node file for the input image that corresponds to a node file for the output image. Slide 15.15 suggests that the geometric transformation relating the irregular meshes of the input image to the rectangular meshes of the output image could be a polynomial transformation as previously discussed. More commonly, we use a simple transformation that takes the four input points into the four output points, as suggested in Slide 15.16: a bilinear transformation with 8 coefficients. The relationships between the mesh points of the input and output image are obtained as a function of control points¹.

¹ in German: Pass-Punkte

Slide 15.16 and Slide 15.17 show control points at the locations marked by little stars. It is those stars that define the parameters of a complex transformation function. The transformation function is applied to the individual mesh points in the input and output images: for each mesh location in the output image, we compute the corresponding input mesh point. Slide 15.18 summarizes the result of these transactions. Recall that we were given control points, which we use to compute the transformation function. With the transformation function we establish the image coordinates that belong to mesh points in the input image representing regularly spaced mesh points in the output image. With this process we have established the geometric relationship between input and output image, using the ideas of transformations and resulting in a node file for the input and the output image.
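A sketch of the bilinear mesh transformation with its 8 coefficients, set up from four corresponding mesh corners; the function names are ours, and an exactly determined linear solve stands in for whatever solver the original software used:

import numpy as np

def bilinear_coefficients(mesh_out, mesh_in):
    """Coefficients (a0..a3, b0..b3) of the bilinear mapping
        x = a0 + a1*i + a2*j + a3*i*j,   y = b0 + b1*i + b2*j + b3*i*j
    from four corresponding corners: mesh_out holds the four (i, j) output
    mesh points, mesh_in the four matching (x, y) input image points."""
    A = np.array([[1.0, i, j, i * j] for i, j in mesh_out])
    xs = np.array([x for x, _ in mesh_in], dtype=float)
    ys = np.array([y for _, y in mesh_in], dtype=float)
    return np.linalg.solve(A, xs), np.linalg.solve(A, ys)   # 4 points, 4 unknowns each

def map_to_input(a, b, i, j):
    """Map the center (i, j) of an output pixel back into the input image."""
    basis = np.array([1.0, i, j, i * j])
    return basis @ a, basis @ b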

Algorithm 40 Nearest neighbor
1: read the float coordinates x_0 and y_0 of the input point
2: x_1 := round(x_0) {result is an integer}
3: y_1 := round(y_0) {result is an integer}
4: return the gray value at the point (x_1, y_1)

15.4 Radiometric Computation Step

After the geometric relationships have been resolved, we go to an arbitrary output pixel and, using its position within a square mesh, compute the corresponding location in the input image via the bilinear relationship within the mesh, as suggested in Slide 15.20. That location will be an arbitrary point (x, y) that is not at the center of any pixel.

We can now select among various techniques to find a gray value for that location to be put into the output pixel. Slide 15.20 suggests 3 different techniques. If we take the gray value of the pixel onto which the location (x, y) falls, we call this the nearest neighbor (see Algorithm 40). If we take the four pixels that are nearest to the location (x, y), we can compute a bilinear interpolation (see Algorithm ??). If we use the 16 closest pixels (a 4 × 4 neighborhood), we can use a bicubic interpolation. We thus differentiate between nearest neighbor, bilinear and bicubic resampling, according to the technique used for the gray value assignment. Slide 15.21 specifically illustrates the bilinear interpolation: which gray value do we assign to the output pixel shown in Slide 15.21? We take the 4 gray values nearest to the location (x, y), call them g_1, g_2, g_3, g_4, and by a simple interpolation, using auxiliary values a and b, we obtain a gray value bilinearly interpolated from the four gray values g_1, g_2, g_3, g_4.
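A minimal sketch of the bilinear gray value assignment with the auxiliary offsets a and b mentioned above; the test image and the sample location are invented:

import numpy as np

def bilinear_gray_value(image, x, y):
    """Interpolate a gray value at the non-integer location (x, y) from the
    four surrounding pixels g1..g4 of a 2-D image (row index y, column x)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    a, b = x - x0, y - y0                              # fractional offsets in [0, 1)
    g1, g2 = image[y0, x0],     image[y0, x0 + 1]      # upper pixel pair
    g3, g4 = image[y0 + 1, x0], image[y0 + 1, x0 + 1]  # lower pixel pair
    top    = (1 - a) * g1 + a * g2
    bottom = (1 - a) * g3 + a * g4
    return (1 - b) * top + b * bottom

img = np.arange(25, dtype=float).reshape(5, 5)         # gray value = 5*row + column
print(bilinear_gray_value(img, 1.5, 2.25))             # 12.75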

Prüfungsfragen:

• Given is an input image with the gray values shown in Abbildung B.8. The input image has 5 rows and 7 columns. After a geometric transformation of the image, certain pixels in the output image must be assigned a gray value, where the corresponding point in the input image has the row and column coordinates given in Tabelle B.1. Compute (or determine graphically) the gray value of each of these output pixels if a bilinear gray value assignment is used.

15.5 Special Case: Rotating an Image by Pixel Shifts

Slide 15.23 shows an aerial oblique image of an urban scene. We want to rotate that image by 45°. We can achieve this by simply shifting rows and columns of pixels (see Algorithm ??). In a first step, we shift each column of the image downward, with the shift growing as we go from right to left. In a second step, we take the rows of the resulting image and shift them horizontally. As a result, we obtain a rotated version of the original image.
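The two shift passes can be sketched as below. This only approximates a true rotation (a distortion-free version uses three shear passes or a full resampling step), and the shift factors, borrowed from the shear decomposition of a rotation, are our own choice:

import numpy as np

def shift_rotate(image, angle_deg=45.0):
    """Rotate-by-shifting sketch: first every column is shifted down, then every
    row is shifted sideways, each by an amount growing linearly with its index."""
    t = np.tan(np.radians(angle_deg) / 2.0)
    s = np.sin(np.radians(angle_deg))
    h, w = image.shape
    pass1 = np.zeros((h + w, w), dtype=image.dtype)
    for x in range(w):                              # pass 1: shift columns
        dy = int(round(x * t))
        pass1[dy:dy + h, x] = image[:, x]
    h2 = pass1.shape[0]
    pass2 = np.zeros((h2, w + h2), dtype=image.dtype)
    for y in range(h2):                             # pass 2: shift rows
        dx = int(round(y * s))
        pass2[y, dx:dx + w] = pass1[y, :]
    return pass2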


[Slides 15.1 through 15.27]


Chapter 16

About Simulation in Virtual and Augmented Reality

16.1 Various Realisms

Recall that we have earlier defined various types of reality. We talked about virtual reality, which presents objects to the viewer that are modeled in a computer. Different from that is photographic reality, which we experience through an actual photograph of the natural environment. It differs in turn from the experience we have when we go into the real world and experience physical reality. You may recall that we also talked about emotions and therefore about psychological reality, different from the physical one. Simulation is an attempt at creating a virtual environment that provides essential aspects of the physical or psychological reality to a human being without the presence of the full physical reality.

16.2 Why simulation?

To save money when training pilots, bus drivers, ship captains, soldiers, etc. Simulation servers may also be used for disaster-preparedness training. Simulation is big business.

How realistic does a simulation have to be? Sufficiently realistic to serve the training purpose. Therefore we do not need photorealism in simulation under all circumstances; we just need enough visual support to challenge the human in a training situation.

16.3 Geometry, Texture, Illumination

Simulation needs information about the geometry of a situation, the illumination and the surface properties. These are the three factors illustrated in Slide 16.8, Slide 16.9 and Slide 16.10. The geometry alone will not suffice if we need to recognize a particular scene: we will have difficulties with depth cues as a function of size. We have a much reduced quality of data if we ignore texture. Texture provides a greatly enhanced sense of realism and helps us to better estimate depth. In a disaster-preparedness scenario, the knowledge of windows and doors may be crucial, and it may only be available through texture, not through geometry.

Illumination is the third factor; it creates shadows and light, again helping to better understand the context of a scene, to estimate distances and to judge intervisibility.


16.4 Augmented Reality

We combine the real world and a computer-generated representation of a modeled world that need not exist in reality. A challenge is the calibration of such a system. We see the real world, and what is superimposed on it is shown on the two monitors; this must match the real environment geometrically and in scale. Therefore we need to define a world coordinate system and communicate it to the computer.

We also need sufficient speed, so that if we turn our head, the two stereo images computed for visual consumption are recomputed instantly as a function of the changed viewing angle. We also need to be accurate in assessing any rotation or change of position.

Magnetic positioning is often too slow and too inaccurate to serve the purpose well. For that reason, an optical auxiliary system may be included in an augmented reality environment, so that the world is observed through a camera and any change in attitude or position of the viewer is tracked more accurately than magnetic positioning could achieve. However, a camera-based optical tracking system may be too slow to act in real time at a rate of about thirty position computations per second. Therefore the magnetic positioning may provide an approximate solution that is then refined by the optical tracking.

Slide 16.13 illustrates an application with a game played by two people seeing the same chess board. An outside observer looking at the two players will see nothing; it is the two chess players who see one another and the game board.

Prüfungsfragen:

• Describe the difference between "Virtual Reality" and "Augmented Reality". Which hardware is required in each case?

16.5 Virtual Environments

If we exclude the real world from being experienced, then we talk about a virtual environment or, more customarily, virtual reality. We immerse ourselves in the world of data. However, we still have our own position and viewing direction. As we move or turn our head, we would like the virtual environment to respond so that we look at a new situation. Therefore, much as in augmented reality, we need to recompute the stereo impression of the data world very rapidly. However, virtual reality is simpler than augmented reality, because we do not have the accuracy requirement of superimposing the virtual onto the real, as we have in augmented reality.

In a virtual reality environment we would like to interact with the computer using our hands, and as a result we need data garments that allow us to provide inputs to the computer, for example by motions of our hands and fingers.

Prüfungsfragen:

• Explain the working principle of two tracking methods frequently used in augmented reality and discuss their advantages and disadvantages!

Antwort:

Tracking    Advantages       Disadvantages
magnetic    robust, fast     short range, inaccurate
optical     accurate         demands on the environment, elaborate


[Slides 16.1 through 16.16]


Chapter 17

Motion

17.1 Image Sequence Analysis

A fixed sensor may observe a moving object, as suggested in Slide 17.3, where a series of images is taken of moving ice in the Arctic Ocean. There is not only a motion of the ice, there is also a change of the ice over time. Slide 17.4 presents a product obtained from an image sequence analysis: a vector diagram of ice floes in the Arctic Ocean. The source of the results was a satellite radar system of NASA, called Seasat, that flew in 1978. Such data are now also available from recent systems such as Canada's Radarsat, currently orbiting the globe.

17.2 Motion Blur

Slide 17.6 illustrates a blurred image that is the result of an exposure taken while the object moved. If the motion is known, then its effect can be removed and we can restore an image as if no motion had happened.

The inverse occurs in Slide 17.7, where the object was stable but the camera moved during the exposure. The same applies: if we can model the motion of the camera, we obtain a successful reconstruction of the object by removing the motion blur of the camera. Slide 17.7 suggests that simple filtering will not remove that blur; we need to model the effect of the motion. The process itself is nevertheless called an anti-blur filter.

Prüfungsfragen:

• What is meant by "motion blur", and under which precondition can this effect be removed from an image again?

Antwort: The image is smeared by motion of the imaged object relative to the camera during the finite exposure time of the shutter. Removing this effect requires that this motion is known exactly.
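If the motion is known and expressed as a blur kernel, the removal can be sketched as a regularized inverse filter in the frequency domain (a simple stand-in for a full Wiener filter); the kernel and the constant eps are illustrative assumptions:

import numpy as np

def remove_motion_blur(blurred, kernel, eps=1e-3):
    """Deblur an image given the known motion blur kernel: divide in the
    frequency domain, damping frequencies where the kernel response is near zero."""
    H = np.fft.fft2(kernel, s=blurred.shape)        # kernel spectrum, zero-padded
    B = np.fft.fft2(blurred)
    F = B * np.conj(H) / (np.abs(H) ** 2 + eps)     # regularized inverse
    return np.real(np.fft.ifft2(F))

# horizontal motion over 9 pixels as the (assumed known) blur kernel
kernel = np.full((1, 9), 1.0 / 9.0)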

17.3 Detecting Change

Change may occur because of motion. Slide 17.9 explains the situation in which a group of people is imaged while a person is moving out of the field of view of the camera. An algorithm can be constructed that detects the change between each image and its predecessors and in the process allows one to map just the changes. The inverse idea is to find what is constant and to eliminate the changes from a sequence of images. An example is computing the texture of a building facade that is partially covered by trees.
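Both ideas, mapping only the changes and recovering what stays constant, can be sketched per pixel; the threshold and the median over an image stack are illustrative choices, not prescribed by the text:

import numpy as np

def change_mask(img_a, img_b, threshold=20):
    """Mark pixels whose gray value changed by more than `threshold`
    between two registered images of the same scene."""
    return np.abs(img_a.astype(int) - img_b.astype(int)) > threshold

def static_background(stack):
    """Inverse idea: suppress moving objects by taking the per-pixel median
    over a stack of registered images, e.g. to recover a facade behind trees."""
    return np.median(np.asarray(stack, dtype=float), axis=0)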

17.4 Optical Flow

A rapid sequence of images may be obtained of a changing situation; an example is the observation of traffic. Optical flow is the analysis of such a sequence of images and the assessment of the motion that is evident from the image stream. A typical representation of optical flow is by vectors representing the moving objects. Slide 17.12 explains.


[Slides 17.1 through 17.17]


Chapter 18<br />

Man-Machine-Interfacing<br />

Our University offers a separate class on Man-Machine Interaction or Human-<strong>Computer</strong>-Interfaces<br />

(HCI) as part of the multi-media program <strong>and</strong> as part of the computer graphics program. This<br />

topic relates to elements of <strong>Computer</strong>-<strong>Graphics</strong> <strong>and</strong> Image Analysis since visual in<strong>for</strong>mation in<br />

created <strong>and</strong> manipulated.<br />

18.1 Visualization of Abstract In<strong>for</strong>mation<br />

Use of color <strong>and</strong> shape are a widely applicable tool in converging in<strong>for</strong>mation. We have seen<br />

examples in the Chapter on Color, encoding terrain elevation or temperature in color, or marking<br />

contours of objects in color.<br />

A very central element in the man-machine interaction is the use of the human visual sense to<br />

present non-visual in<strong>for</strong>mation <strong>for</strong> communication <strong>and</strong> interaction. An example is shown in Slide<br />

18.3 where a diagram is presented that has on one axis the calendar time <strong>and</strong> on the other axis<br />

a measure of popularity of movies. The interface serves to find movies on a computer monitor<br />

by popularity <strong>and</strong> by age. Simultaneously, we can switch between various types of movies, like<br />

drama, mystery, comedy <strong>and</strong> so <strong>for</strong>th.<br />

Slide 18.4 is a so-called table-lens. This is a particular type of Excel sheet which shows the entire<br />

complexity of the sheet in the background <strong>and</strong> provides a magnifying class that can be moved over<br />

the spread sheet.<br />

Another idea is shown in Slide 18.5 with the so-called cone-tree, representing a file structure. It<br />

is a tree, which at its root has an entire directory, this is broken up into folders or subdirectories<br />

which are then further broken up until each leaf is reached representing an individual file. A<br />

similar idea is shown in Slide 18.6 called in<strong>for</strong>mation slices. We have a very large inventory of<br />

files, organized in subdirectories <strong>and</strong> directories. We can take subgroups of these subdirectories<br />

<strong>and</strong> magnify them, until we can recognize each individual file.<br />

18.2 Immersive Man-Machine Interactions<br />

The subject of man-machine interaction also is involved in an immersion of the human in the world<br />

of data, as we previously discussed in virtual reality which is sometimes denoted as immersive<br />

visualization. Of particular interest is the input to the computer by means other than a keyboard<br />

<strong>and</strong> mouse. This of course is increasingly by speech, but also by motions of the h<strong>and</strong>s <strong>and</strong><br />

fingers, or by the recognition of facial expressions. This represents a hot subject in man-machine<br />

interaction <strong>and</strong> ties in with computer graphics <strong>and</strong> image analysis.<br />

269


270 CHAPTER 18. MAN-MACHINE-INTERFACING<br />

Slide 18.1 Slide 18.2 Slide 18.3 Slide 18.4<br />

Slide 18.5 Slide 18.6 Slide 18.7 Slide 18.8<br />

Slide 18.9 Slide 18.10


Chapter 19<br />

Pipelines<br />

19.1 The Concept of an Image Analysis System<br />

Various ideas exist in the literature about a system <strong>for</strong> image analysis. The idea of a pipeline<br />

comes about if we consider that we have many components <strong>and</strong> algorithms in a repository of an<br />

image possessing library. In order to set up an entire image analysis process, we plug the individual<br />

processing steps together, much like a plumber will put a plumbing system in a building together<br />

from st<strong>and</strong>ard components. In computer graphics <strong>and</strong> image processing we call this plumbing also<br />

creation of a pipeline.<br />

As shown in Slide 19.3 an image analysis system always begins with image acquision <strong>and</strong> sensing.<br />

We build up a system by going through preprocessing <strong>and</strong> segmentation to representation,<br />

recognition <strong>and</strong> final use of the results of the image analysis system. All of this is built around<br />

knowledge.<br />

A somewhat different view combines the role of image analysis with the role of computer graphics<br />

<strong>and</strong> separates the role into half worlds, one of reality <strong>and</strong> one of computer models. In the simplest<br />

case, we have the world, within it a scene from which we obtain an image which goes into the<br />

computer. The image will be replaced by an image description which then leads to a scene<br />

description, which ultimately ends up with a description of the world.<br />

We can close the loop from the description of the world, go back to the world, make the transition<br />

from computer to reality by computer graphics.<br />

The idea of active vision is going from the world to a description of the world, closing the loop<br />

from an incomplete description of the world to a new second loop through the selection of a scene,<br />

selection of images <strong>and</strong> so <strong>for</strong>th as shown in Slide 19.12. If as in analogy to the previous model, we<br />

assign a central control element with expert knowledge, we have a similar idea as shown be<strong>for</strong>e.<br />

Prüfungsfragen:<br />

• Skizzieren Sie die ”<br />

Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen<br />

Szene mittels z-buffering und Gouraud-shading!<br />

19.2 Systems of Image Generation<br />

Prüfungsfragen:<br />

• Was wird in der Bildanalyse mit dem Begriff ”<br />

Active Vision“ bezeichnet?<br />

271


272 CHAPTER 19. PIPELINES<br />

19.3 Revisiting Image Analysis versus <strong>Computer</strong> <strong>Graphics</strong><br />

Slide 19.18 suggests that the transition from an image to a model of a scene is the subject of<br />

image underst<strong>and</strong>ing or image processing. The inverse, the transition from a scene model to an<br />

image is the subject of computer graphics. We do have a great overlap between image analysis<br />

<strong>and</strong> computer graphics when it concerns the real world. Image analysis will always address the<br />

real world, whereas computer graphics may deal with a virtual world that does not exist in reality.<br />

In cases where one goes from a model of a non-existing world to an image, we are not dealing with<br />

the inverse of image analysis.<br />

Prüfungsfragen:<br />

• Welche ist die wesentliche Abgrenzung zwischen <strong>Computer</strong>grafik und Bildanalyse, welches<br />

ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung<br />

erwünscht.<br />

Algorithm 41 z-buffer pipeline<br />

1: <strong>for</strong> y = 0 to YMAX do<br />

2: <strong>for</strong> x = 0 to XMAX do<br />

3: WritePixel(x, y, backgroundcolor)<br />

4: Z[x, y] := 0<br />

5: end <strong>for</strong><br />

6: end <strong>for</strong><br />

7: <strong>for</strong> all Polygons polygon do<br />

8: <strong>for</strong> all pixel in the projection of the polygon do<br />

9: pz :=GetZValue(polygon,x,y)<br />

10: if pz ≥ Z[x, y] then<br />

11: Z[x, y] := pz {new point is in front}<br />

12: WritePixel(x, y, Color of polygon at (x,y) )<br />

13: end if<br />

14: end <strong>for</strong><br />

15: end <strong>for</strong><br />

Algorithm 42 Phong pipeline<br />

1: set value ai {ai is the ambient intensity.}<br />

2: set value il {il is the intensity of the light source.}<br />

3: diff:=diffuse() {calculates the amount of light which directly fall in.}<br />

4: reflect:=reflection() {calculates the amount of light which reflect.}<br />

5: result:= ai + il * (diff + reflect) {<strong>for</strong>mula developed by Phong.}


19.3. REVISITING IMAGE ANALYSIS VERSUS COMPUTER GRAPHICS 273<br />

Slide 19.1 Slide 19.2 Slide 19.3 Slide 19.4<br />

Slide 19.5 Slide 19.6 Slide 19.7 Slide 19.8<br />

Slide 19.9 Slide 19.10 Slide 19.11 Slide 19.12<br />

Slide 19.13 Slide 19.14 Slide 19.15 Slide 19.16<br />

Slide 19.17 Slide 19.18 Slide 19.19 Slide 19.20


274 CHAPTER 19. PIPELINES


Chapter 20<br />

Image Representation<br />

The main goal of this chapter is to briefly describe some of the most common graphic file <strong>for</strong>mats<br />

<strong>for</strong> image files, as well as how to determine which file <strong>for</strong>mat to use <strong>for</strong> certain applications.<br />

When an image is saved to a specific file <strong>for</strong>mat, one tells the application how to write the image’s<br />

in<strong>for</strong>mation to disk. The specific file <strong>for</strong>mat which is chosen depends on the graphics software<br />

application one is using (e.g., Illustrator, Freeh<strong>and</strong>, Photoshop) <strong>and</strong> how <strong>and</strong> where the image will<br />

be used (e.g., the Web or a print publication).<br />

There are three different categories of file <strong>for</strong>mats: bitmap, vector <strong>and</strong> metafiles. When an image<br />

is stored as a bitmap file, its in<strong>for</strong>mation is stored as a pattern of pixels, or tiny, colored or black<br />

<strong>and</strong> white dots. When an image is stored as a vector file, its in<strong>for</strong>mation is stored as mathematical<br />

data. The metafile <strong>for</strong>mat can store an image’s in<strong>for</strong>mation as pixels (i.e. bitmap), mathematical<br />

data (i.e., vector), or both.<br />

20.1 Definition of Terms<br />

20.1.1 Transparency<br />

Transparency is the degree of visibility of a pixel against a fixed background. A totally transparent<br />

pixel is invisible. Normal images are opaque, in the sense that no provision is made to allow the<br />

manipulation <strong>and</strong> display of multiple overlaid images. To allow image overlay, some mechanism<br />

must exist <strong>for</strong> the specification of transparency on a per-image, per-strip, per-tile, or per-pixel<br />

bases. In practice, transparency is usually controlled through the addition of in<strong>for</strong>mation to each<br />

element of the pixel data.<br />

The simplest way to allow image overlay is the addition of an overlay bit to each pixel value.<br />

Setting the overlay bit in an area of an image allows the rendering application or output device<br />

to selectively ignore those pixel values with the bit sample.<br />

Another simple way is to reserve one unique color as transparency color, e.g. the background color<br />

of a homogenous background. As all images are usually rectangular - regardless of the contours of<br />

whatever have been drawn within the image - this property of background transparency is useful<br />

<strong>for</strong> concealing image-backgrounds <strong>and</strong> making it appear that they are non rectangular. This<br />

feature is widely used e.g., <strong>for</strong> logos on Web pages.<br />

A more elaborate mechanism <strong>for</strong> specifying image overlays allows variations in transparency between<br />

bottom <strong>and</strong> overlaid images. Instead of having a single bit of overlay in<strong>for</strong>mation, each pixel<br />

value has more (usually eight bits). The eight transparency bits are sometimes called the alpha<br />

channel. The degree of pixel transparency <strong>for</strong> an 8-bit alpha channel ranges from 0 (the pixel is<br />

completely invisible or transparent) to 255 (the pixel is completely visible or opaque).<br />

275


276 CHAPTER 20. IMAGE REPRESENTATION<br />

20.1.2 Compression<br />

This is a new concept not previously discussed in this class, except in the context of encoding<br />

contours of objects. The amount of image data produced from all kinds of sensor, like digital<br />

cameras, remote sensing satellites medical imaging devices, video cameras, increases steadily with<br />

increasing number of sensors, resolution <strong>and</strong> color capabilities. Especially <strong>for</strong> transmission <strong>and</strong><br />

storage of this large amount of image data compression is a big issue.<br />

We separate data compression into two classes, lossless <strong>and</strong> lossy compression. Lossless compression<br />

preserves all in<strong>for</strong>mation present in the original data, the in<strong>for</strong>mation is only stored in an<br />

optimized way. Examples <strong>for</strong> lossless compression are run-length-encoding, where subsequent pixels<br />

of the same color are replaced by one color in<strong>for</strong>mation <strong>and</strong> the number of following identical<br />

pixels, Huffman coding uses codewords of different size instead of the usual strictly 8 or 24 bits,<br />

shorter codewords are assignd to symbols which occur more often, this usually reduces the total<br />

number of bits used to code an image. Compression rates between 2:1 <strong>and</strong> maximum 5:1 can be<br />

achieved using lossless compression.<br />

Lossy compression on the other h<strong>and</strong> removes invisible or only slightly visible in<strong>for</strong>mation from<br />

the image, e.g. only a reduced set of colors is used or high spatial frequencies in the image are<br />

removed. The amount of compression which can be achieved by lossy compression is superior to<br />

lossless compression schemes, at compression rates of 10:1 with no visible difference is feasible, the<br />

quality <strong>for</strong> photographs is usually sufficient after a 20:1 compression. However, the in<strong>for</strong>mation<br />

content is changed by such an operation, there<strong>for</strong>e lossy compressed images are not suitable <strong>for</strong><br />

further image processing stages. We will see exampels of JPEG compressed images further on in<br />

this lecture.<br />

Algorithms 43 <strong>and</strong> ?? illustrate the principles.<br />

Algorithm 43 Pipeline <strong>for</strong> lossless compression<br />

load image;<br />

// find redundancy <strong>and</strong> eliminate redundancy<br />

<strong>for</strong> i = 0 to number of image columns do<br />

<strong>for</strong> j = 0 to number of image rows do<br />

// find out how often each pixel value appears<br />

// (needed <strong>for</strong> the variable-length coding)<br />

<strong>for</strong> pixel value = 0 to 2 b do<br />

histogram[pixel value]++;<br />

end <strong>for</strong><br />

huffman (histogram, image);<br />

// instead of Huffman other procedures can be used that<br />

// produce variable-length code but Huffman leads to<br />

// best compression results<br />

end <strong>for</strong><br />

end <strong>for</strong><br />

save image;<br />

20.1.3 Progressive Coding<br />

Progressive image transmision is based on the fact that transmitting all image data may not be<br />

necessary under some circumstances. Imagine a situation in which an operator is searching an<br />

image database looking <strong>for</strong> a particular image. If the transmission is based on a raster scanning<br />

order, all the data must be transmitted to view the whole image, but often it is not necessary to<br />

have the highest possible image quality to find the image <strong>for</strong> which the operator is looking. Images


20.1. DEFINITION OF TERMS 277<br />

Algorithm 44 Pipeline <strong>for</strong> lossy compression<br />

load image;<br />

// find irrelevancy like high frequencies <strong>and</strong><br />

// eliminate them<br />

split image in nxn subimages;<br />

// a common value <strong>for</strong> n is 8 or 16<br />

trans<strong>for</strong>m in frequency domain;<br />

cut off high frequencies;<br />

// find redundancy <strong>and</strong> eliminate redundancy<br />

<strong>for</strong> i = 0 to number of image columns do<br />

<strong>for</strong> j = 0 to number of image rows do<br />

// find out how often each pixel value appears<br />

// (needed <strong>for</strong> the variable-length coding)<br />

<strong>for</strong> pixel value = 0 to 2 b do<br />

histogram[pixel value]++;<br />

end <strong>for</strong><br />

huffman (histogram, image);<br />

// instead of Huffman other procedures can be used that<br />

// produce variable-length code but Huffman leads to<br />

// best compression results<br />

end <strong>for</strong><br />

end <strong>for</strong><br />

save image;<br />

do not have to be displayed with the hightest available resolution, <strong>and</strong> lower resolution may be<br />

sufficient to reject an image <strong>and</strong> to begin displaying another one. This approach is also commonly<br />

used to decrease the waiting time needed <strong>for</strong> the image to start appearing after transmission <strong>and</strong><br />

is used by WWW image transmission.<br />

In progressive transmissions, the images are represented in a pyramid structure, the higher pyramid<br />

levels (lower resolution) being transmitted first. The number of pixels representing a lowerresolution<br />

image is substantially smaller <strong>and</strong> thus the user can decide from lower resolution images<br />

whether further image refinement is needed.<br />

20.1.4 Animation<br />

A sequence of two or more images displayed in a rapid sequence so as to provide the illusion of<br />

continuous motion. Animations are typically played back at a rate of 12 to 15 frames per second.<br />

20.1.5 Digital Watermarking<br />

A digital watermark is a digital signal or pattern inserted into a digital image. Since this signal<br />

or pattern is present in each unaltered copy of the original image, the digital watermark may<br />

also serve as a digital signature <strong>for</strong> the copies. A given watermark may be unique to each copy<br />

(e.g., to identify the intended recipient), or be common to multiple copies (e.g., to identify the<br />

document source). In either case, the watermarking of the document involves the trans<strong>for</strong>mation<br />

of the original into another <strong>for</strong>m.<br />

Unlike encryption, digital watermarking leaves the original image or (or file) basically intact <strong>and</strong><br />

recognizable. In addition, digital watermarks, as signatures, may not be validated without special<br />

software. Further, decrypted documents are free of any residual effects of encryption, whereas


278 CHAPTER 20. IMAGE REPRESENTATION<br />

digital watermarks are designed to be persistent in viewing, printing, or subsequent re-transmission<br />

or dissemination.<br />

Two types of digital watermarks may be distinguished, depending upon whether the watermark<br />

appears visible or invisible to the casual viewer. Visible watermarks Slide ?? are used in much<br />

the same way as their bond paper ancestors. One might view digitally watermarked documents<br />

<strong>and</strong> images as digitally ”stamped”.<br />

Invisible watermarks Slide ??, on the other h<strong>and</strong>, are potentially useful as a means of identifying<br />

the source, author, creator, owner, distributor or authorized consumer of a document or image.<br />

For this purpose, the objective is to permanently <strong>and</strong> unalterably mark the image so that the credit<br />

or assignment is beyond dispute. In the event of illicit usage, the watermark would facilitate the<br />

claim of ownership, or the receipt of copyright revenues.<br />

20.2 Common Image File Formats<br />

Following are descriptions of some commonly used file <strong>for</strong>mats:<br />

20.2.1 BMP: Microsoft Windows Bitmap<br />

The bitmap file <strong>for</strong>mat is used <strong>for</strong> bitmap graphics on the Windows plat<strong>for</strong>m only. Unlike other<br />

file <strong>for</strong>mats, which store image data from top to bottom <strong>and</strong> pixels in red/green/blue order, the<br />

BMP <strong>for</strong>mat stores image data from bottom to top <strong>and</strong> pixels in blue/green/red order. This<br />

means that if memory is tight, BMP graphics will sometimes appear drawn from bottom to top.<br />

Compression of BMP files is not supported, so they are usually very large.<br />

20.2.2 GIF: <strong>Graphics</strong> Interchange Format<br />

The <strong>Graphics</strong> Interchange Format was originally developed by CompuServe in 1987. It is one of<br />

the most popular file <strong>for</strong>mats <strong>for</strong> Web graphics <strong>for</strong> exchanging graphics files between computers.<br />

It is most commonly used <strong>for</strong> bitmap images composed of line drawings or blocks of a few distinct<br />

colors. The GIF <strong>for</strong>mat supports 8 bits of color in<strong>for</strong>mation or less. There<strong>for</strong>e it is not suitiable<br />

<strong>for</strong> photographs. In addition, the GIF89a file <strong>for</strong>mat supports transparency, allowing you to<br />

make a color in your image transparent. (Please note: CompuServe Gif(87) does not support<br />

transparency). This feature makes GIF a particularly popular <strong>for</strong>mat <strong>for</strong> Web images.<br />

When to use GIF Use the GIF file <strong>for</strong>mat <strong>for</strong> images with only a few distinct colors, such<br />

as illustrations, cartoons, <strong>and</strong> images with blocks of color, such as icons, buttons, <strong>and</strong> horizontal<br />

rules.<br />

GIF, like JPEG, is a “lossy” file <strong>for</strong>mat! It reduces an image’s file size by removing bits of<br />

color in<strong>for</strong>mation during the conversion process. The GIF <strong>for</strong>mat supports 256 colors or less.<br />

When creating images <strong>for</strong> the Web, be aware that only 216 colors are shared between Macintosh<br />

<strong>and</strong> Windows monitors. These colors, called the “Web palette,” should be used when creating<br />

GIFs <strong>for</strong> the Web bec<strong>aus</strong>e colors that are not in this palette display differently on Macintosh <strong>and</strong><br />

Windows monitors. The restriction to only 256 colors is the reason why GIF is not siutable <strong>for</strong><br />

color photographs.


20.2. COMMON IMAGE FILE FORMATS 279<br />

20.2.3 PICT: Picture File Format<br />

The Picture file <strong>for</strong>mat is <strong>for</strong> use primarily on the Macintosh plat<strong>for</strong>m; it is the default <strong>for</strong>mat<br />

<strong>for</strong> Macintosh image files. The PICT <strong>for</strong>mat is most commonly used <strong>for</strong> bitmap images, but can<br />

be used <strong>for</strong> vector images was well. Avoid using PICT images <strong>for</strong> print publishing. The PICT<br />

<strong>for</strong>mat is “lossless,” meaning it does not remove in<strong>for</strong>mation from the original image during the<br />

file <strong>for</strong>mat conversion process. Bec<strong>aus</strong>e the PICT <strong>for</strong>mat supports only limited compression on<br />

Macintoshes with QuickTime installed, PICT files are usually large. When saving an image as a<br />

PICT, add the extension “.pct” to the end of its file name. Use the PICT <strong>for</strong>mat <strong>for</strong> images used<br />

in video editing, animations, desktop computer presentations, <strong>and</strong> multimedia authoring.<br />

20.2.4 PNG: Portable Network <strong>Graphics</strong><br />

The Portable Network <strong>Graphics</strong> <strong>for</strong>mat was developed to be the successor to the GIF file <strong>for</strong>mat.<br />

PNG is not yet widely supported by most Web browsers; Netscape versions 4.04 <strong>and</strong> later <strong>and</strong><br />

Internet Explorer version 4.0b1 <strong>and</strong> later currently support this file <strong>for</strong>mat. However, PNG is<br />

expected to become a mainstream <strong>for</strong>mat <strong>for</strong> Web images <strong>and</strong> could replace GIF entirely. It is<br />

plat<strong>for</strong>m independent <strong>and</strong> should be used <strong>for</strong> single images only (not animations). Compared<br />

with GIF, PNG offers greater color support, better compression, gamma correction <strong>for</strong> brightness<br />

control across plat<strong>for</strong>ms, better support <strong>for</strong> transparency (alpha channel), <strong>and</strong> a better method<br />

<strong>for</strong> displaying progressive images.<br />

20.2.5 RAS: Sun Raster File<br />

The Sun Raster image file <strong>for</strong>mat is the native bitmap <strong>for</strong>mat of the SUN Microsystems UNIX<br />

plat<strong>for</strong>ms using the SunOS operating system. This <strong>for</strong>mat is capable of storing black-<strong>and</strong>-white,<br />

gray-scale, <strong>and</strong> color bitmapped data of any pixel depth. The use of color maps <strong>and</strong> a simple<br />

Run-Length data compression are supported. Typically, most images found on a SunOS system<br />

are Sun Raster images, <strong>and</strong> this <strong>for</strong>mat is supported by most UNIX imaging applications.<br />

20.2.6 EPS: Encapsulated PostScript<br />

The Encapsulated PostScript file <strong>for</strong>mat is a metafile <strong>for</strong>mat; it can be used <strong>for</strong> vector images or<br />

bitmap images. The EPS file <strong>for</strong>mat can be used on a variety of plat<strong>for</strong>ms, including Macintosh<br />

<strong>and</strong> Windows. When you place an EPS image into a document, you can scale it up or down<br />

without in<strong>for</strong>mation loss. This <strong>for</strong>mat contains PostScript in<strong>for</strong>mation <strong>and</strong> should be used when<br />

printing to a PostScript output device. The PostScript language , which was developed by Adobe,<br />

is the industry st<strong>and</strong>ard <strong>for</strong> desktop publishing software <strong>and</strong> hardware. EPS files can be graphics<br />

or images of whole pages that include text, font, graphic, <strong>and</strong> page layout in<strong>for</strong>mation.<br />

20.2.7 TIFF: Tag Interchange File Format<br />

The Tag Interchange File Format is a tag-based <strong>for</strong>mat that was developed <strong>and</strong> maintained by<br />

Aldus (now Adobe). TIFF, which is used <strong>for</strong> bitmap images, is compatible with a wide range of<br />

software applications <strong>and</strong> can be used across plat<strong>for</strong>ms such as Macintosh, Windows, <strong>and</strong> UNIX.<br />

The TIFF <strong>for</strong>mat is complex, so TIFF files are generally larger than GIF or JPEG files. TIFF<br />

supports lossless LZW (Lempel-Ziv-Welch) compression ; however, compressed TIFFs take longer<br />

to open. When saving a file to the TIFF <strong>for</strong>mat, add the file extension “.tif” to the end of its file<br />

name.


280 CHAPTER 20. IMAGE REPRESENTATION<br />

20.2.8 JPEG: Joint Photographic Expert Group<br />

Like GIF, the Joint Photographic Experts Group <strong>for</strong>mat is one of the most popular <strong>for</strong>mats <strong>for</strong><br />

Web graphcis. It supports 24 bits of color in<strong>for</strong>mation, <strong>and</strong> is most commonly used <strong>for</strong> photographs<br />

<strong>and</strong> similar continous-tone bitmap images. The JPEG file <strong>for</strong>mat stores all of the color in<strong>for</strong>mation<br />

in an RGB image, then reduces the file size by compressing it, or saving only the color in<strong>for</strong>mation<br />

that is essential to the image. Most imaging applications <strong>and</strong> plug-ins let you determine the<br />

amount of compression used when saving a graphic in JPEG <strong>for</strong>mat. Unlike GIF, JPEG does not<br />

support transparency.<br />

When to use JPEG? JPEG uses a “lossy” compression technique, which changes the original<br />

image by removing in<strong>for</strong>mation during the conversion process. In theory, JPEG was designed<br />

especially <strong>for</strong> photographs so that changes made to the orginal image during conversion to JPEG<br />

would not be visible to the human eye. Most imaging applications let you control the amount of<br />

lossy compression per<strong>for</strong>med on an image, so you can tade off image quality <strong>for</strong> smaller file size<br />

<strong>and</strong> vice versa. Be aware that the chances of degrading our image when converting it to JPEG<br />

increase proportionally with the amount of compression you use.<br />

JPEG is superior to GIF <strong>for</strong> storing full-color or grayscale images of “realistic” scenes, or images<br />

with continouos variation in color. For example, use JPEG <strong>for</strong> scanned photographs <strong>and</strong> naturalistic<br />

artwork with hightlights, shaded areas, <strong>and</strong> shadows. The more complex <strong>and</strong> subtly rendered<br />

the image is, the more likeley it is that the image should be converted to JPEG.<br />

Do not use JPEG <strong>for</strong> illustrations, cartoons, lettering, or any images that have very sharp edges<br />

(e.g., a row of black pixels adjacent to a row of white pixels). Sharp edges in images tend to<br />

blur in JPEG unless you use only a small amount of compression when converting the image.<br />

The JPEG data compression is being illustrated with an original image shown in Slide ??. We<br />

have an input parameter into a JPEG compression scheme that indicates how many coefficients<br />

one is carrying along. This is expressed by a percentage. Slide ?? shows 75% of the coefficients,<br />

leading to a 15:1 compression of that particular image. We go on to 50% of the coefficients in<br />

Slide ?? <strong>and</strong> 20% in Slide ??. We can appreciate the effect of the compression on the image<br />

by comparing a enlarged segment of the original image with a similarly enlarged segment of the<br />

de-compressed JPEG-image. Note how the decompression reveals that we have contaminated the<br />

image, bec<strong>aus</strong>e objects radiate out under the effect of the <strong>for</strong>ward trans<strong>for</strong>m that cannot fully be<br />

undone by an inverse trans<strong>for</strong>m using a reduced set of coefficients. The effect of the compression<br />

<strong>and</strong> the resulting contamination of the image is larger as we use fewer <strong>and</strong> fewer coefficients of<br />

the trans<strong>for</strong>m as shown in Slide ?? <strong>and</strong> Slide ??. The effect of the compression can be shown<br />

by computing a difference image of just the intensity component (black <strong>and</strong> white component) as<br />

shown in Slide ??, Slide ??, <strong>and</strong> Slide ??.<br />

The basic principle of JPEG compression is illustrated in Algorithm 45.<br />

Prüfungsfragen:<br />

• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern?<br />

20.3 Video File Formats: MPEG<br />

Slide ?? illustrates the basic idea of the MPEG-1 st<strong>and</strong>ard <strong>for</strong> the compression of movies. MPEG<br />

st<strong>and</strong>s <strong>for</strong> Motion Picture Expert Group. Note that the MPEG approach takes key frames <strong>and</strong><br />

compresses them individually as shown as image frames I in Slide ??. Slides P get interpolated between<br />

frames I. Frames are then further interpolated using the frames P . Fairly large compression<br />

rates can be achieved of 200:1. This leads to the ability of showing movies on laptop computers at


20.4. NEW IMAGE FILE FORMATS: SCALABLE VECTOR GRAPHIC - SVG 281<br />

Algorithm 45 JPEG image compression<br />

1: divide the picture into blocks of 8x8 pixels<br />

2: <strong>for</strong> all blocks do<br />

3: trans<strong>for</strong>m the block by DCT-II methode<br />

4: <strong>for</strong> all values in the block do<br />

5: quantize the value dependent from the position in the block {high frequencies are less<br />

important}<br />

6: end <strong>for</strong><br />

7: reorder the values in a zic-zac way {DC value of block is replaced by difference to DC value<br />

of previous block}<br />

8: per<strong>for</strong>m a run-length encoding of the quantized values<br />

9: compress the resulting bytes with Huffmann coding<br />

10: end <strong>for</strong><br />

this time. Slide ?? explains that the requirements <strong>for</strong> the st<strong>and</strong>ard, as they are defined, includes<br />

the need to have the ability to play backwards <strong>and</strong> <strong>for</strong>wards, to compress time, to support fast<br />

motions <strong>and</strong> rapid changes of scenes, <strong>and</strong> to r<strong>and</strong>omly access any part of the movie.<br />

The basic principle of MPEG compression is illustrated in Algorithm 46.<br />

Prüfungsfragen:<br />

• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche<br />

Kompressionsraten können erzielt werden?<br />

20.4 New Image File Formats: Scalable Vector Graphic -<br />

SVG<br />

A Vector graphic differs from a raster graphic in that its content is described by mathematical<br />

statements. The statements instruct a computer’s drawing engine what to display on screen i.e.<br />

pixel in<strong>for</strong>mation <strong>for</strong> a bitmap is not stored in the file <strong>and</strong> loaded into the display device as it is<br />

in the case of JPEG <strong>and</strong> GIF. Instead shapes <strong>and</strong> lines, their position <strong>and</strong> direction, colours <strong>and</strong><br />

gradients are drawn. Vector graphics files contain instructions <strong>for</strong> the rasterisation of graphics<br />

as the statements arrive at the viewer’s browser - ’on the fly’. Vector graphics are resolution<br />

independent. That is, they can be enlarged as much as required with no loss of quality as there<br />

is no raster type image to enlarge <strong>and</strong> pixelate. A vector graphic will always display at the best<br />

quality that the output device is set to. When printing out a vector graphic from a Web page it<br />

will print at the printer’s optimum resolution i.e. without ’jaggies’.<br />

Until recently only proprietary <strong>for</strong>mats such as Macromedia Flash or Apple’s QuickTime have<br />

allowed Web designers to create <strong>and</strong> animate vector graphics <strong>for</strong> the Web. That is going to<br />

change with the implementation of SVG (Scalable Vector <strong>Graphics</strong>).<br />

SVG is the st<strong>and</strong>ard, based on XML (Extensible Mark-up Language), which is currently undergoing<br />

development by the W3C consortium.<br />

An SVG file is itself comprised of text, that is the drawing engine instructions within it are<br />

written in ordinary text <strong>and</strong> not the binary symbols 1 <strong>and</strong> 0. The file can there<strong>for</strong>e be edited in an<br />

application no more complicated than a plain text editor, unlike raster graphics which have to be<br />

opened in image editing applications where pixel values are changed with the use of the program’s<br />

tools. If the appearance of a vector graphic is required to change in the Web browser, then the<br />

text file is edited via:


282 CHAPTER 20. IMAGE REPRESENTATION<br />

Algorithm 46 MPEG compression pipeline<br />

1: Open MPEG stream {Encoder, not specified as part of MPEG st<strong>and</strong>ard. Subject to various<br />

implementation dependant enhancements.}<br />

2: Close MPEG stream<br />

3: Open MPEG stream {Decoder}<br />

4: <strong>for</strong> all PictureGroups in MPEG stream do<br />

5: <strong>for</strong> all Pictures in PictureGroup do<br />

6: <strong>for</strong> all Slices in Picture do<br />

7: <strong>for</strong> all MacroBlock in Slice do<br />

8: <strong>for</strong> all Blocks in MacroBlock do {all I,P,B pictures}<br />

9: Variable Length Decoder {Huffman with fixed DC Tables}<br />

10: Inverse Quantizer<br />

11: Inverse ZigZag<br />

12: Inverse Diskrete Cosine Trans<strong>for</strong>mation {IDCT}<br />

13: end <strong>for</strong><br />

14: end <strong>for</strong><br />

15: end <strong>for</strong><br />

16: if Picture != I then {interpolated pictures P <strong>and</strong> B}<br />

17: average +1/2 interpolation<br />

18: new-Picture = IDCT-Picture + interpolated-Picture<br />

19: else<br />

20: new-Picture is ready<br />

21: end if<br />

22: Dither new-Picture <strong>for</strong> display<br />

23: display new-Picture<br />

24: end <strong>for</strong><br />

25: end <strong>for</strong><br />

26: Close MPEG stream


20.4. NEW IMAGE FILE FORMATS: SCALABLE VECTOR GRAPHIC - SVG 283<br />

• Editing the graphic in an SVG compliant drawing application (e.g. Adobe Illustrator 9)<br />

• Editing the text of which the file is comprised in a text editor<br />

• The actions of the viewer in the Web browser - clicking the mouse which triggers a script<br />

which changes the text in the vector file<br />

As the files are comprised of text the images themselves can be dynamic. For instance CGI <strong>and</strong><br />

PERL can generate images <strong>and</strong> animation based on user choices made in the browser. SVG<br />

graphics can be used to dynamically (in real time) render database in<strong>for</strong>mation, change their<br />

appearance, <strong>and</strong> respond to user input <strong>and</strong> subsequent database queries.<br />

As the SVG st<strong>and</strong>ard is based on XML it is fully compatible with existing Web st<strong>and</strong>ards such as<br />

HTML (HyperText Mark Up Language), CSS (Cascading Style Sheets), DOM (Document Object<br />

Model), JavaScript <strong>and</strong> CGI (Common Gateway Interface) etc.<br />

The SVG <strong>for</strong>mat supports 24-bit colour, ICC color profiles <strong>for</strong> colour management, pan, zoom,<br />

gradients <strong>and</strong> masking <strong>and</strong> other features. Type rendered as SVG will look smoother <strong>and</strong> attributes<br />

such as kerning (spacing between characters), paths (paths along which type is run) <strong>and</strong> ligatures<br />

(where characters are joined together) are as controllable as in DTP <strong>and</strong> drawing applications.<br />

Positioning of SVG graphics in the Web browser window will be achieved with the use of CCS<br />

(Cascading Style Sheets) which are part of the HTML 4 st<strong>and</strong>ard.


284 CHAPTER 20. IMAGE REPRESENTATION


Appendix A<br />

Algorithmen und Definitionen<br />

Algorithmus 1: Affines Matching (siehe Abschnitt 0.6)<br />

Definition 2: Modellieren einer Panoramakamera (siehe Abschnitt 0.15)<br />

Definition 3: Berechnung der Datenmenge eines Bildes (siehe Abschnitt 1.2)<br />

Algorithmus 4: Bildvergrößerung (Raster vs. Vektor) (siehe Abschnitt 1.5)<br />

Definition 5: Berechnung der Nachbarschaftspixel (siehe Abschnitt 1.6)<br />

Definition 6: Berechnung des Zusammenhanges (siehe Abschnitt 1.6)<br />

Definition 7: Berechnung der Distanz zwischen zwei Pixeln (siehe Abschnitt 1.6)<br />

Algorithmus 8: Berechnung logischer Maskenoperationen (siehe Abschnitt 1.7)<br />

Algorithmus 9: Berechnung schneller Maskenoperationen (siehe Abschnitt 1.7)<br />

Definition 10: Modellierung einer perspektiven Kamera (siehe Abschnitt 2.2)<br />

Algorithmus 11: DDA einer Geraden (siehe Abschnitt 3.1)<br />

Algorithmus 12: Bresenham einer Geraden (siehe Abschnitt 3.1)<br />

Algorithmus 13: Füllen eines Polygons (siehe Abschnitt 3.2)<br />

Algorithmus 14: Zeichnen dicker Linien (siehe Abschnitt 3.3)<br />

Definition 15: Skelettberechnung via MAT (siehe Abschnitt 3.4)<br />

Definition 16: Translation (siehe Abschnitt 4.1)<br />

Definition 17: Reflektion (siehe Abschnitt 4.1)<br />

Definition 18: Komplement (siehe Abschnitt 4.1)<br />

Definition 19: Differenz (siehe Abschnitt 4.1)<br />

Algorithmus 20: Dilation (siehe Abschnitt 4.2)<br />

Definition 21: Erosion (siehe Abschnitt 4.2)<br />

Definition 22: Öffnen (siehe Abschnitt 4.3)<br />

Definition 23: Schließen (siehe Abschnitt 4.3)<br />

Definition 24: Filtern (siehe Abschnitt 4.4)<br />

Definition 25: Hit oder Miss (siehe Abschnitt 4.5)<br />

285


286 APPENDIX A. ALGORITHMEN UND DEFINITIONEN<br />

Definition 26: Umriss (siehe Abschnitt 4.6)<br />

Definition 27: Regionenfüllung (siehe Abschnitt 4.6)<br />

Algorithmus 28: Herstellung von Halbtonbildern (siehe Abschnitt 5.1)<br />

Definition 29: Farbtrans<strong>for</strong>mation in CIE (siehe Abschnitt 5.3)<br />

Definition 30: Farbtrans<strong>for</strong>mation in CMY (siehe Abschnitt 5.6)<br />

Definition 31: Farbtrans<strong>for</strong>mation in CMYK (siehe Abschnitt 5.7)<br />

Algorithmus 32: HSV-HSI-HLS-RGB (siehe Abschnitt 5.8)<br />

Definition 33: YIK-RGB (siehe Abschnitt 5.9)<br />

Algorithmus 34: Umw<strong>and</strong>lung von Negativ- in Positivbild (siehe Abschnitt 5.14)<br />

Algorithmus 35: Bearbeitung eines Masked Negative (siehe Abschnitt 5.14)<br />

Algorithmus 36: Berechnung eines Ratiobildes (siehe Abschnitt 5.16)<br />

Definition 37: Umrechnung lp/mm in Pixelgröße (siehe Abschnitt 6.4)<br />

Algorithmus 38: Berechnung eines Histogrammes (siehe Abschnitt 6.6)<br />

Algorithmus 39: Äquidistanzberechnung (siehe Abschnitt 6.6)<br />

Definition 40: Spreizen des Histogrammes (siehe Abschnitt 6.6)<br />

Algorithmus 41: Örtliche Histogrammäqualisierung (siehe Abschnitt 6.6)<br />

Algorithmus 42: Differenzbild (siehe Abschnitt 6.6)<br />

Algorithmus 43: Schwellwertbildung (siehe Abschnitt 7)<br />

Definition 44: Kontrastspreitzung (siehe Abschnitt 7)<br />

Definition 45: Tiefpassfilter mit 3 × 3 Fenster (siehe Abschnitt 7.2)<br />

Algorithmus 46: Medianfilter (siehe Abschnitt 7.2)<br />

Algorithmus 47: Faltungsberechnung (siehe Abschnitt 7.3)<br />

Definition 48: USM Filter (siehe Abschnitt 7.4)<br />

Definition 49: Allgemeines 3 × 3 Gradientenfilter (siehe Abschnitt 7.5)<br />

Definition 50: Roberts-Filter (siehe Abschnitt 7.5)<br />

Definition 51: Prewitt-Filter (siehe Abschnitt 7.5)<br />

Definition 52: Sobel-Filter (siehe Abschnitt 7.5)<br />

Algorithmus 53: Berechnung eines gefilterten Bildes im Spektralbereich (siehe Abschnitt 7.6)<br />

Algorithmus 54: Ungewichtetes Antialiasing (siehe Abschnitt 7.9)<br />

Algorithmus 55: Gewichtetes Antialiasing (siehe Abschnitt 7.9)<br />

Algorithmus 56: Gupte-Sproull-Antialiasing (siehe Abschnitt 7.9)<br />

Definition 57: Statistische Texturberechnung (siehe Abschnitt 8.2)<br />

Definition 58: Berechnung eines spektralen Texturmasses (siehe Abschnitt 8.4)<br />

Algorithmus 59: Aufbringen einer Textur (siehe Abschnitt 8.5)<br />

Definition 60: Berechnung einer linearen Trans<strong>for</strong>mation in 2D (siehe Abschnitt 9.2)<br />

Definition 61: Kon<strong>for</strong>me Trans<strong>for</strong>mation (siehe Abschnitt 9.3)<br />

Definition 62: Modellierung einer Drehung in 2D (siehe Abschnitt 9.4)


287<br />

Definition 63: Aufbau einer 2D Drehmatrix bei gegebenen Koordinatenachsen (siehe Abschnitt<br />

9.4)<br />

Definition 64: Rückdrehung in 2D (siehe Abschnitt 9.4)<br />

Definition 65: Aufein<strong>and</strong>erfolgende Drehungen (siehe Abschnitt 9.4)<br />

Definition 66: Affine Trans<strong>for</strong>mation in 2D in homogenen Koordinaten (siehe Abschnitt 9.5)<br />

Definition 67: Affine Trans<strong>for</strong>mation in 2D in kartesischen Koordinaten (siehe Abschnitt 9.5)<br />

Definition 68: Allgemeine Trans<strong>for</strong>mation in 2D (siehe Abschnitt 9.6)<br />

Algorithmus 69: Berechnung unbekannter Trans<strong>for</strong>mationsparameter (siehe Abschnitt 9.6)<br />

Algorithmus 70: Cohen Sutherl<strong>and</strong> (siehe Abschnitt 9.8)<br />

Definition 71: Aufbau einer homogenen Trans<strong>for</strong>mationsmatrix in 2D (siehe Abschnitt 9.9)<br />

Definition 72: 3D Drehung (siehe Abschnitt 9.10)<br />

Definition 73: 3D affine Trans<strong>for</strong>mation in homogenen Koordinaten (siehe Abschnitt 9.11)<br />

Definition 74: Bezier-Kurven in 2D (siehe Abschnitt 9.20)<br />

Algorithmus 75: Casteljau (siehe Abschnitt 9.21)<br />

Algorithmus 76: Berechnung einer Kettenkodierung (siehe Abschnitt 10.1)<br />

Algorithmus 77: Splitting (siehe Abschnitt 10.2)<br />

Definition 78: Parameterdarstellung einer Geraden für 2D Morphing (siehe Abschnitt 10.3)<br />

Algorithmus 79: Aufbau eines Quadtrees (siehe Abschnitt 10.5)<br />

Definition 80: Aufbau einer Wireframestruktur (siehe Abschnitt 10.8)<br />

Definition 81: Aufbau einer B-Rep-Struktur (siehe Abschnitt 10.12)<br />

Definition 82: Aufbau einer ”<br />

Cell“-Struktur (siehe Abschnitt 10.14)<br />

Algorithmus 83: Aufbau einer BSP-Struktur (siehe Abschnitt 10.14)<br />

Algorithmus 84: z-Buffering für eine Octree-Struktur (siehe Abschnitt 11.5)<br />

Algorithmus 85: Raytracing für eine Octree-Struktur (siehe Abschnitt 11.6)<br />

Definition 86: Ambient Beleuchtung (siehe Abschnitt 12.1)<br />

Definition 87: Lambert Modell (siehe Abschnitt 12.1)<br />

Algorithmus 88: Gouraud (siehe Abschnitt 12.2)<br />

Algorithmus 89: Phong (siehe Abschnitt 12.2)<br />

Algorithmus 90: Objektgenaue Schattenberechnung (siehe Abschnitt 12.3)<br />

Algorithmus 91: Bildgenaue Schattenberechnung (siehe Abschnitt 12.3)<br />

Algorithmus 92: Radiosity (siehe Abschnitt 12.6)<br />

Definition 93: Berechnung der Binokularen Tiefenschärfe (siehe Abschnitt 13.1)<br />

Definition 94: Berechnung der totalen Plastik (siehe Abschnitt 13.2)<br />

Algorithmus 95: Berechnung eines Stereomatches (siehe Abschnitt 13.7)<br />

Definition 96: LoG Filter als Vorbereitung auf Stereomatches (siehe Abschnitt 13.7)<br />

Algorithmus 97: Aufbau eines Merkmalsraums (siehe Abschnitt 14.3)<br />

Algorithmus 98: Pixelzuteilung zu einer Klasse ohne Rückweisung (siehe Abschnitt 14.4)


288 APPENDIX A. ALGORITHMEN UND DEFINITIONEN<br />

Algorithmus 99: Pixelzuteilung zu einer Klasse mit Rückweisung (siehe Abschnitt 14.4)<br />

Algorithmus 100: Zuteilung eines Merkmalsraumes mittels Trainingspixeln (siehe Abschnitt 14.6)<br />

Algorithmus 101: Berechnung einer Knotendatei (siehe Abschnitt 15.3)<br />

Algorithmus 102: Berechnung eines nächsten Nachbars (siehe Abschnitt 15.4)<br />

Algorithmus 103: Berechnung eines bilinear interpolierten Grauwerts (siehe Abschnitt 15.4)<br />

Algorithmus 104: Bilddrehung (siehe Abschnitt 15.5)<br />

Algorithmus 105: z-Buffer Pipeline (siehe Abschnitt 19.2)<br />

Algorithmus 106: Phong-Pipeline (siehe Abschnitt 19.2)<br />

Algorithmus 107: Kompressionspipeline (siehe Abschnitt 20.1.2)<br />

Algorithmus 108: JPEG Pipeline (siehe Abschnitt 20.2.8)<br />

Algorithmus 109: MPEG Pipeline (siehe Abschnitt 20.3)


Appendix B<br />

Fragenübersicht<br />

B.1 Gruppe 1<br />

• Es besteht in der Bildverarbeitung die Idee eines sogenannten Bildmodelles“. Was ist<br />

”<br />

darunter zu verstehen, und welche Formel dient der Darstellung des Bildmodells? [#0001]<br />

(Frage I/8 14. April 2000)<br />

• Bei der Betrachtung von Pixeln bestehen Nachbarschaften“ von Pixeln. Zählen Sie alle<br />

”<br />

Arten von Nachbarschaften auf, die in der Vorlesung beh<strong>and</strong>elt wurden, und beschreiben Sie<br />

diese Nachbarschaften mittels je einer Skizze. [#0003]<br />

(Frage I/9 14. April 2000, Frage I/1 9. November 2001)<br />

• Beschreiben Sie in Worten die wesentliche Verbesserungsidee im Bresenham-Algorithmus<br />

gegenüber dem DDA-Algorithmus. [#0006]<br />

(Frage I/5 11. Mai 2001, Frage 7 20. November 2001)<br />

• Erläutern Sie die morphologische ”<br />

Erosion“ unter Verwendung einer Skizze und eines Formel<strong>aus</strong>druckes.<br />

[#0007]<br />

(Frage I/2 14. April 2000)<br />

• Gegeben sei der CIE Farbraum. Erstellen Sie eine Skizze dieses Farbraumes mit einer<br />

Beschreibung der Achsen und markieren Sie in diesem Raum zwei Punkte A, B. Welche Farbeigenschaften<br />

sind Punkten, welche auf der Strecke zwischen A und B liegen, zuzuordnen,<br />

und welche den Schnittpunkten der Geraden durch A, B mit dem R<strong>and</strong> des CIE-Farbraumes?<br />

[#0012]<br />

(Frage I/3 14. April 2000)<br />

• Zu welchem Zweck würde man als Anwender ein sogenanntes ”<br />

Ratio-Bild“ herstellen? Verwenden<br />

Sie bitte in der Antwort die Hilfe einer Skizze zur Erläuterung eines Ratiobildes.<br />

[#0015]<br />

(Frage I/4 14. April 2000)<br />

• Welches Maß dient der Beschreibung der geometrischen Auflösung eines Bildes, und mit<br />

welchem Verfahren wird diese Auflösung geprüft und quantifiziert? Ich bitte Sie um eine<br />

Skizze. [#0017]<br />

(Frage I/10 14. April 2000)<br />

289


290 APPENDIX B. FRAGENÜBERSICHT<br />

• Eines der populärsten Filter heißt ”<br />

Unsharp Masking“ (USM). Wie funktioniert es? Ich bitte<br />

um eine einfache <strong>for</strong>melmäßige Erläuterung. [#0021]<br />

(Frage I/11 14. April 2000)<br />

• In der Vorlesung wurde ein ”<br />

Baum“ für die Hierarchie diverser Projektionen in die Ebene<br />

dargestellt (Planar Projections). Skizzieren Sie bitte diesen Baum mit allen darin vorkommenden<br />

Projektionen. [#0026]<br />

(Frage I/12 14. April 2000)<br />

• Wozu dient das sogenannte ”<br />

photometrische Stereo“? Und was ist die Grundidee, die diesem<br />

Verfahren dient? [#0033]<br />

(Frage I/5 14. April 2000, Frage I/1 28. September 2001)<br />

• Was ist eine einfache Realisierung der ”<br />

Spiegelreflektion“ (engl.: specular reflection) bei<br />

der Darstellung dreidimensionaler Objekte? Ich bitte um eine Skizze, eine Formel und den<br />

Namen eines Verfahrens nach seinem Erfinder. [#0034]<br />

(Frage I/6 14. April 2000, Frage I/6 28. September 2001, Frage I/6 1. Februar 2002)<br />

• Welche ist die wesentliche Abgrenzung zwischen <strong>Computer</strong>grafik und Bildanalyse, welches<br />

ist ihr Zusammenhang? Hier ist die Verwendung einer grafischen Darstellung in der Beantwortung<br />

erwünscht. [#0041]<br />

(Frage I/1 14. April 2000)<br />

• Was bedeuten die Begriffe ”<br />

geometrische“ bzw. ”<br />

radiometrische“ Auflösung eines Bildes?<br />

Versuchen Sie, Ihre Antwort durch eine Skizze zu verdeutlichen. [#0047]<br />

(Frage I/1 14. Dezember 2001)<br />

• Was versteht man unter ”<br />

Rasterkonversion“, und welche Probleme können dabei auftreten?<br />

[#0058]<br />

(Frage I/1 26. Mai 2000, Frage I/8 15. März 2002)<br />

• Erläutern Sie das morphologische ” Öffnen“ unter Verwendung einer Skizze und eines Formel<strong>aus</strong>druckes.<br />

[#0059]<br />

(Frage I/2 26. Mai 2000, Frage I/4 10. November 2000)<br />

• Erklären Sie das Problem, das bei der Verwendung von ”<br />

einem Pixel breiten“ Linien auftritt,<br />

wenn eine korrekte Intensitätswiedergabe ge<strong>for</strong>dert ist. Welche Lösungsmöglichkeiten gibt<br />

es für dieses Problem? Bitte verdeutlichen Sie Ihre Antwort anh<strong>and</strong> einer Skizze! (Hinweis:<br />

betrachten Sie Linien unterschiedlicher Orientierung!) [#0060]<br />

(Frage I/3 26. Mai 2000)<br />

• Was versteht man unter dem ”<br />

dynamischen Bereich“ eines Mediums zur Wiedergabe bildhafter<br />

In<strong>for</strong>mationen, und im welchem Zusammenhang steht er mit der Qualität der Darstellung?<br />

Reihen Sie einige gebräuchliche Medien nach aufsteigender Größe ihres dynamischen<br />

Bereiches! [#0061]<br />

(Frage I/5 30. Juni 2000, Frage 1 20. November 2001, Frage I/5 15. März 2002)<br />

• Können von einem RGB-Monitor alle vom menschlichen Auge wahrnehmbaren Farben dargestellt<br />

werden? Begründen Sie Ihre Antwort anh<strong>and</strong> einer Skizze! [#0062]<br />

(Frage I/4 26. Mai 2000, Frage I/5 10. November 2000, Frage I/2 9. November 2001, Frage<br />

4 20. November 2001)


B.1. GRUPPE 1 291<br />

• Was ist ein Medianfilter, was sind seine Eigenschaften, und in welchen Situationen wird er<br />

eingesetzt? [#0063]<br />

(Frage I/5 26. Mai 2000, Frage I/7 10. November 2000, Frage I/11 30. März 2001, Frage I/5<br />

28. September 2001, Frage 3 20. November 2001)<br />

• Erklären Sie die Bedeutung von homogenen Koordinaten für die <strong>Computer</strong>grafik! Welche<br />

Eigenschaften weisen homogene Koordinaten auf? [#0066]<br />

(Frage I/6 26. Mai 2000, Frage 1 15. Jänner 2002)<br />

• Was versteht man unter (geometrischem) ”<br />

Resampling“, und welche Möglichkeiten gibt es,<br />

die Intensitäten der Pixel im Ausgabebild zu berechnen? Beschreiben sie verschiedene Verfahren<br />

anh<strong>and</strong> einer Skizze und ggf. eines Formel<strong>aus</strong>drucks! [#0067]<br />

(Frage I/7 26. Mai 2000, Frage I/6 10. November 2000, Frage I/3 28. September 2001, Frage<br />

I/9 9. November 2001, Frage 6 20. November 2001, Frage 6 15. Jänner 2002)<br />

• Beschreiben Sie mindestens zwei Verfahren, bei denen allein durch Modulation der Oberflächenparameter<br />

(ohne Definition zusätzlicher geometrischer Details) eine realistischere Darstellung<br />

eines vom <strong>Computer</strong> gezeichneten Objekts möglich ist! [#0068]<br />

(Frage I/8 26. Mai 2000)<br />

• Ein dreidimensionaler Körper kann mit Hilfe von Zellen einheitlicher Größe (Würfeln), die in<br />

einem gleichmäßigen Gitter angeordnet sind, dargestellt werden. Beschreiben Sie Vor- und<br />

Nachteile dieser Repräsentations<strong>for</strong>m! Begründen Sie Ihre Antwort ggf. mit einer Skizze!<br />

[#0070]<br />

(Frage I/9 26. Mai 2000)<br />

• Erklären Sie (ohne Verwendung von Formeln) das Prinzip des ”<br />

Radiosity“-Verfahrens zur<br />

Herstellung realistischer Bilder mit dem <strong>Computer</strong>. Welche Art der Lichtinteraktion kann<br />

mit diesem Modell beschrieben werden, und welche kann nicht beschrieben werden? [#0073]<br />

(Frage I/10 26. Mai 2000)<br />

• In der Einführungsvorlesung wurde der Begriff ”<br />

Affine Matching“ verwendet. Wozu dient<br />

das Verfahren, welches dieser Begriff bezeichnet? [#0079]<br />

(Frage I/7 14. April 2000)<br />

• Skizzieren Sie die ”<br />

Grafik-Pipeline“ für die Darstellung einer digitalen dreidimensionalen<br />

Szene mittels z-buffering und Gouraud-shading! [#0082]<br />

(Frage I/10 30. Juni 2000, Frage I/9 10. November 2000)<br />

• Beschreiben Sie den Unterschied zwischen ”<br />

Virtual Reality“ und ”<br />

Augmented Reality“.<br />

Welche Hardware wird in beiden Fällen benötigt? [#0083]<br />

(Frage I/9 30. Juni 2000, Frage I/8 28. September 2001, Frage I/8 14. Dezember 2001, Frage<br />

I/2 15. März 2002)<br />

• Wie werden in der Stereo-Bildgebung zwei Bilder der selben Szene aufgenommen? Beschreiben<br />

Sie typische Anwendungsfälle beider Methoden! [#0084]<br />

(Frage I/8 30. Juni 2000, Frage I/8 10. November 2000)<br />

• Erklären Sie den Vorgang der Schattenberechnung nach dem 2-Phasen-Verfahren mittels<br />

z-Buffer! Beschreiben Sie zwei Varianten sowie deren Vor- und Nachteile. [#0086]<br />

(Frage I/7 30. Juni 2000)


292 APPENDIX B. FRAGENÜBERSICHT<br />

• Man spricht bei der Beschreibung von dreidimensionalen Objekten von 2 1 2D- oder 3D-<br />

Modellen. Definieren Sie die Objektbeschreibung durch 2 1 2D- bzw. 3D-Modelle mittels Gleichungen<br />

und erläutern Sie in Worten den wesentlichen Unterschied! [#0087]<br />

(Frage I/6 30. Juni 2000, Frage I/6 9. November 2001, Frage I/6 14. Dezember 2001, Frage<br />

5 15. Jänner 2002)<br />

• Welche Eigenschaften weist eine (sich regelmäßig wiederholende) Textur im Spektralraum<br />

auf? Welche Aussagen können über eine Textur anh<strong>and</strong> ihres Spektrums gemacht werden?<br />

[#0093]<br />

(Frage I/4 30. Juni 2000)<br />

• Erklären Sie, unter welchen Umständen ”<br />

Aliasing“ auftritt und was man dagegen unternehmen<br />

kann! [#0094]<br />

(Frage I/3 30. Juni 2000)<br />

• Geben Sie die Umrechnungsvorschrift für einen RGB-Farbwert in das CMY-Modell und in<br />

das CMYK-Modell an und erklären Sie die Bedeutung der einzelnen Farbanteile! Wofür wird<br />

das CMYK-Modell verwendet? [#0095]<br />

(Frage I/2 30. Juni 2000, Frage 2 20. November 2001)<br />
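Zur Veranschaulichung der Umrechnung eine kleine Skizze in Python; gezeigt wird eine gängige, vereinfachte Variante (Subtraktion des Schwarzanteils K ohne Normierung auf 1−K), die nur eine von mehreren üblichen Definitionen ist:

```python
def rgb_zu_cmy(r, g, b):
    # Subtraktive Grundfarben als Komplement der additiven (alle Werte in [0, 1])
    return 1.0 - r, 1.0 - g, 1.0 - b

def rgb_zu_cmyk(r, g, b):
    c, m, y = rgb_zu_cmy(r, g, b)
    k = min(c, m, y)                  # Schwarzanteil = gemeinsamer Grauanteil
    if k == 1.0:                      # reines Schwarz: C = M = Y = 0
        return 0.0, 0.0, 0.0, 1.0
    # vereinfachte Variante: C' = C - K (ohne Normierung auf 1 - K)
    return c - k, m - k, y - k, k

print(rgb_zu_cmyk(0.8, 0.5, 0.1))     # ≈ (0.0, 0.3, 0.7, 0.2)
```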

• Welche Vor- und Nachteile haben nicht-perspektive (optische, also etwa Zeilen-, Wärme- oder Panorama-) Kameras gegenüber herkömmlichen (perspektiven) Kameras? [#0097]

(Frage I/1 30. Juni 2000)<br />

• Definieren Sie den Begriff „Kante“. [#0105]

(Frage I/1 13. Oktober 2000)<br />

• Erklären Sie anh<strong>and</strong> einer Skizze den zeitlichen Ablauf des Bildaufb<strong>aus</strong> auf einem Elektronenstrahlschirm!<br />

[#0109]<br />

(Frage I/2 13. Oktober 2000, Frage I/2 1. Februar 2002, Frage I/10 15. März 2002)<br />

• Erklären Sie, wie man mit Hilfe der <strong>Computer</strong>tomografie ein dreidimensionales Volumenmodell<br />

vom Inneren des menschlichen Körpers gewinnt. [#0110]<br />

(Frage I/3 13. Oktober 2000)<br />

• Nennen Sie verschiedene Techniken, um „dicke“ Linien (z.B. Geradenstücke oder Kreisbögen) zu zeichnen. [#0111]

(Frage I/4 13. Oktober 2000, Frage I/1 10. November 2000, Frage I/10 9. November 2001)<br />

• Zum YIQ-Farbmodell:<br />

1. Welche Bedeutung hat die Y -Komponente im YIQ-Farbmodell?<br />

2. Wo wird das YIQ-Farbmodell eingesetzt?<br />

(Frage I/5 13. Oktober 2000)<br />

[#0112]<br />

• Skizzieren Sie die Form des Filterkerns eines G<strong>aus</strong>sschen Tiefpassfilters. Worauf muss man<br />

bei der Wahl der Filterparameter bzw. der Größe des Filterkerns achten? [#0115]<br />

(Frage I/6 13. Oktober 2000, Frage I/3 10. November 2000)<br />

• Nennen Sie drei Arten der Texturbeschreibung und führen Sie zu jeder ein Beispiel an.<br />

[#0116]<br />

(Frage I/7 13. Oktober 2000, Frage I/10 10. November 2000)



• Was versteht man unter einer „Sweep“-Repräsentation? Welche Vor- und Nachteile hat diese Art der Objektrepräsentation? [#0117]

(Frage I/8 13. Oktober 2000, Frage I/2 10. November 2000, Frage 4 15. Jänner 2002)<br />

• Welche physikalischen Merkmale der von einem Körper <strong>aus</strong>ges<strong>and</strong>ten oder reflektierten<br />

Strahlung eignen sich zur Ermittlung der Oberflächeneigenschaften (z.B. zwecks Klassifikation)?<br />

[#0118]<br />

(Frage I/9 13. Oktober 2000, Frage I/5 14. Dezember 2001)<br />

• Beschreiben Sie zwei Verfahren zur Interpolation der Farbwerte innerhalb eines Dreiecks,<br />

das zu einer beleuchteten polygonalen Szene gehört. [#0119]<br />

(Frage I/10 13. Oktober 2000)<br />

• Was versteht man in der Sensorik unter Einzel- bzw. Mehrfachbildern? Nennen Sie einige<br />

Beispiele für Mehrfachbilder! [#0121]<br />

(Frage I/1 15. Dezember 2000, Frage I/5 9. November 2001, Frage I/3 14. Dezember 2001)<br />

• Skizzieren Sie drei verschiedene Verfahren zum Scannen von zweidimensionalen Vorlagen<br />

(z.B. Fotografien)! [#0122]<br />

(Frage I/2 15. Dezember 2000)<br />

• Beschreiben Sie das Prinzip der Bilderfassung mittels Radar! Welche Vor- und Nachteile<br />

bietet dieses Verfahren? [#0123]<br />

(Frage I/3 15. Dezember 2000)<br />

• Erklären Sie das Funktionsprinzip zweier in der Augmented Reality häufig verwendeter<br />

Trackingverfahren und erläutern Sie deren Vor- und Nachteile! [#0124]<br />

(Frage I/4 15. Dezember 2000, Frage I/4 1. Februar 2002)<br />

• Beschreiben Sie den Unterschied zwischen der Interpolation und der Approximation von<br />

Kurven, und erläutern Sie anh<strong>and</strong> einer Skizze ein Approximationsverfahren Ihrer Wahl!<br />

[#0125]<br />

(Frage I/5 15. Dezember 2000, Frage 2 15. Jänner 2002)<br />

• Geben Sie die Transferfunktion H(u, v) im Frequenzbereich eines idealen Tiefpassfilters mit der „cutoff“-Frequenz D0 an! Skizzieren Sie die Transferfunktion! [#0127]

(Frage I/6 15. Dezember 2000, Frage I/7 14. Dezember 2001)<br />

• Erklären Sie, wie in der Visualisierung die Qualität eines vom <strong>Computer</strong> erzeugten Bildes<br />

durch den Einsatz von Texturen verbessert werden kann. Nennen Sie einige Oberflächeneigenschaften<br />

(insbesondere geometrische), die sich nicht zur Repräsentation mit Hilfe einer<br />

Textur eignen. [#0128]<br />

(Frage I/7 15. Dezember 2000)<br />

• Erklären Sie, warum bei der Entzerrung von digitalen Rasterbildern meist „Resampling“ erforderlich ist. Nennen Sie zwei Verfahren zur Grauwertzuweisung für das Ausgabebild! [#0130]

(Frage I/8 15. Dezember 2000)<br />

• Erklären Sie, wie ein kreisfreier gerichteter Graph zur Beschreibung eines Objekts durch<br />

seine (polygonale) Oberfläche genutzt werden kann! [#0131]<br />

(Frage I/9 15. Dezember 2000, Frage I/2 28. September 2001)



Figure B.1: wiederholte Speicherung eines Bildes in verschiedenen Größen (128×128, 256×256, 512×512, ...)

• Erklären Sie den Begriff ” Überwachen beim Klassifizieren“. Wann kann man dieses Verfahren<br />

einsetzen? [#0133]<br />

(Frage I/10 15. Dezember 2000)<br />

• Im praktischen Teil der Prüfung wird bei Aufgabe B.2 nach einer Trans<strong>for</strong>mationsmatrix (in<br />

zwei Dimensionen) gefragt, die sich <strong>aus</strong> einer Skalierung und einer Rotation um ein beliebiges<br />

Rotationszentrum zusammensetzt. Wie viele Freiheitsgrade hat eine solche Trans<strong>for</strong>mation?<br />

Begründen Sie Ihre Antwort! [#0167]<br />

(Frage I/1 2. Februar 2001)<br />

• Mit Hilfe von Radarwellen kann man von Flugzeugen und Satelliten <strong>aus</strong> digitale Bilder<br />

erzeugen, <strong>aus</strong> welchen ein topografisches Modell des Geländes (ein Höhenmodell) <strong>aus</strong> einer<br />

einzigen Bildaufnahme erstellt werden kann. Beschreiben Sie jene physikalischen Effekte der<br />

elektromagnetischen Strahlung, die für diese Zwecke genutzt werden! [#0169]<br />

(Frage I/2 2. Februar 2001)<br />

• In Abbildung B.1 ist ein digitales Rasterbild in verschiedenen Auflösungen zu sehen. Das<br />

erste Bild ist 512 × 512 Pixel groß, das zweite 256 × 256 Pixel usw., und das letzte besteht<br />

nur mehr <strong>aus</strong> einem einzigen Pixel. Wie nennt man eine solche Bildrepräsentation, und wo<br />

wird sie eingesetzt (nennen Sie mindestens ein Beispiel)? [#0170]<br />

(Frage I/6 2. Februar 2001, Frage I/1 1. Februar 2002)<br />

• In Abbildung B.2 ist das Skelett eines menschlichen Fußes in verschiedenen Darstellungstechniken<br />

gezeigt. Benennen Sie die vier Darstellungstechniken! [#0175]<br />

(Frage I/3 2. Februar 2001)<br />

• In Abbildung B.3 soll eine Karikatur des amerikanischen Ex-Präsidenten George Bush in<br />

eine Karikatur seines Amtsnachfolgers Bill Clinton übergeführt werden, wobei beide Bilder<br />

als Vektordaten vorliegen. Welches Verfahren kommt hier zum Einsatz, und welche Datenstrukturen<br />

werden benötigt? Erläutern Sie Ihre Antwort anh<strong>and</strong> einer beliebigen Strecke<br />

<strong>aus</strong> Abbildung B.3! [#0177]<br />

(Frage I/5 2. Februar 2001)<br />

• Was ist eine ”<br />

3D Textur“? [#0178]<br />

(Frage I/9 2. Februar 2001, Frage I/4 28. September 2001)



• Welche Rolle spielen die sogenannten ”<br />

Passpunkte“ (engl. Control Points) bei der Interpolation<br />

und bei der Approximation von Kurven? Erläutern Sie Ihre Antwort anh<strong>and</strong> einer<br />

Skizze! [#0179]<br />

(Frage I/7 2. Februar 2001)<br />

• Beschreiben Sie eine bilineare Trans<strong>for</strong>mation anh<strong>and</strong> ihrer Definitionsgleichung! [#0180]<br />

(Frage I/11 2. Februar 2001)<br />

• Zählen Sie Fälle auf, wo in der Bildanalyse die Fourier-Trans<strong>for</strong>mation verwendet wird!<br />

[#0184]<br />

(Frage I/8 2. Februar 2001)<br />

• Nach welchem Prinzip arbeitet die JPEG-Komprimierung von digitalen Rasterbildern? [#0185]<br />

(Frage I/10 2. Februar 2001, Frage I/9 19. Oktober 2001)<br />

• Geben Sie zu jedem der Darstellungsverfahren <strong>aus</strong> Abbildung B.2 an, welche In<strong>for</strong>mationen<br />

über das Objekt gespeichert werden müssen! [#0187]<br />

(Frage I/4 2. Februar 2001)<br />

• Erläutern Sie den Begriff ”<br />

Sensor-Modell“! [#0193]<br />

(Frage I/1 30. März 2001, Frage I/7 19. Oktober 2001)<br />

• Wie wird die geometrische Auflösung eines Filmscanners angegeben, und mit welchem Verfahren<br />

kann man sie ermitteln? [#0194]<br />

(Frage I/2 30. März 2001)<br />

• Was versteht man unter ”<br />

passiver Radiometrie“? [#0195]<br />

(Frage I/3 30. März 2001, Frage I/9 1. Februar 2002)<br />

• Gegeben sei ein Polygon durch die Liste seiner Eckpunkte. Wie kann das Polygon ausgefüllt (also mitsamt seinem Inneren) auf einem Rasterbildschirm dargestellt werden? Welche Probleme treten auf, wenn das Polygon sehr „spitze“ Ecken hat (d.h. Innenwinkel nahe bei Null)? [#0196]

(Frage I/4 30. März 2001, Frage I/2 14. Dezember 2001)<br />

• Wie ist der „Hit-or-Miss“-Operator A ⊛ B definiert? Erläutern Sie seine Funktionsweise zur Erkennung von Strukturen in Binärbildern! [#0199]

(Frage I/5 30. März 2001)<br />
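Als Ergänzung eine minimale Skizze in Python (reines NumPy), die eine übliche Definition des Hit-or-Miss-Operators mit Fenster W nachbildet; Randbehandlung, Ankerpunkt im Zentrum und das Beispielbild A sind dabei Annahmen:

```python
import numpy as np

def erosion(A, S):
    """Binäre Erosion A ⊖ S (0/1-Bilder, Ankerpunkt im Zentrum von S)."""
    h, w = S.shape
    P = np.pad(A, ((h // 2, h // 2), (w // 2, w // 2)), constant_values=0)
    E = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            # Treffer, wenn S vollständig in den Vordergrund "hineinpasst"
            E[i, j] = np.all(P[i:i + h, j:j + w] >= S)
    return E

def hit_or_miss(A, X, W):
    """A ⊛ X = (A ⊖ X) ∩ (A^c ⊖ (W − X)); W ist ein Fenster, das X enthält."""
    return erosion(A, X) & erosion(1 - A, W - X)

A = np.array([[0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 1, 1],
              [0, 0, 0, 0, 0]])
X = np.zeros((3, 3), int); X[1, 1] = 1   # gesuchte Struktur: isoliertes Pixel
W = np.ones((3, 3), int)                 # 3x3-Fenster um X
print(hit_or_miss(A, X, W))              # markiert nur das isolierte Pixel (1, 1)
```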

• Was versteht man unter einem Falschfarbenbild (false color image) bzw. einem Pseudofarbbild<br />

(pseudo color image)? Nennen Sie je einen typischen Anwendungsfall! [#0200]<br />

(Frage I/6 30. März 2001)<br />

• Vergleichen Sie die Methode der Farberzeugung bei einem Elektronenstrahlbildschirm mit<br />

der beim Offset-Druck. Welche Farbmodelle kommen dabei zum Einsatz? [#0202]<br />

(Frage I/7 30. März 2001, Frage I/10 19. Oktober 2001, Frage I/4 14. Dezember 2001)<br />

• Was versteht man unter ”<br />

prozeduralen Texturen“, wie werden sie erzeugt und welche Vorteile<br />

bringt ihr Einsatz? [#0206]<br />

(Frage I/8 30. März 2001)<br />

• Erklären Sie den Begriff ”<br />

spatial partitioning“ und nennen Sie drei räumliche Datenstrukturen<br />

<strong>aus</strong> dieser Gruppe! [#0208]<br />

(Frage I/9 30. März 2001)



• Erklären Sie die Begriffe „feature“ (Merkmal), „feature space“ (Merkmalsraum) und „cluster“ im Zusammenhang mit Klassifikationsproblemen und verdeutlichen Sie Ihre Antwort anhand einer Skizze! [#0209]

(Frage I/10 30. März 2001, Frage I/9 28. September 2001, Frage I/7 1. Februar 2002)<br />

• Im Folgenden sehen Sie drei 3 × 3-Transformationsmatrizen, wobei jede der Matrizen einen bestimmten Transformationstyp für homogene Koordinaten in 2D beschreibt:
A = ⎛ a11   0   0 ⎞          B = ⎛  b11  b12  0 ⎞          C = ⎛ 1  0  c13 ⎞
    ⎜  0   a22  0 ⎟              ⎜ −b12  b11  0 ⎟              ⎜ 0  1  c23 ⎟
    ⎝  0    0   1 ⎠              ⎝   0    0   1 ⎠              ⎝ 0  0   1  ⎠
mit a11, a22 beliebig (A), b11^2 + b12^2 = 1 (B) und c13, c23 beliebig (C).
Um welche Transformationen handelt es sich bei A, B und C? [#0213]

(Frage I/1 11. Mai 2001)<br />
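Zur Einordnung eine kleine Skizze in Python, die die drei in der vorstehenden Frage auftretenden Matrixtypen (Skalierung, Rotation, Translation in homogenen 2D-Koordinaten) auf einen Beispielpunkt anwendet; der Punkt (2, 1) ist frei gewählt:

```python
import numpy as np

def skalierung(sx, sy):   # Struktur wie Matrix A (Diagonalelemente a11, a22)
    return np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1.0]])

def rotation(phi):        # Struktur wie Matrix B (orthonormaler 2x2-Block)
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

def translation(tx, ty):  # Struktur wie Matrix C (letzte Spalte c13, c23)
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])

p = np.array([2.0, 1.0, 1.0])        # Punkt (2, 1) in homogenen Koordinaten
print(skalierung(2, 3) @ p)          # [4. 3. 1.]
print(rotation(np.pi / 2) @ p)       # [-1.  2.  1.] (bis auf Rundung)
print(translation(5, -1) @ p)        # [7. 0. 1.]
```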

• In der <strong>Computer</strong>grafik ist die Abbildung eines dreidimensionalen Objekts auf die zweidimensionale<br />

Bildfläche ein mehrstufiger Prozess (Abbildung B.4), an dem verschiedene Trans<strong>for</strong>mationen<br />

und Koordinatensysteme beteiligt sind. Benennen Sie die Koordinatensysteme A,<br />

B und C in Abbildung B.4! [#0215]<br />

(Frage I/1 26. Juni 2001)<br />

• Gegeben sei ein verrauschtes monochromes digitales Rasterbild. Gesucht sei ein Filter, das zur Bereinigung eines solchen Bildes geeignet ist, wobei folgende Anforderungen gestellt werden:
– Kanten müssen erhalten bleiben und dürfen nicht „verwischt“ werden.
– Im Ausgabebild dürfen nur solche Grauwerte enthalten sein, die auch im Eingabebild vorkommen.
Schlagen Sie einen Filtertyp vor, der dafür geeignet ist, und begründen Sie Ihre Antwort! [#0216]

(Frage I/2 11. Mai 2001, Frage I/5 19. Oktober 2001)<br />
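Die geforderten Eigenschaften (kantenerhaltend, nur bereits vorhandene Grauwerte) deuten auf ein Rangordnungsfilter hin; dazu eine minimale Skizze eines 3×3-Medianfilters in Python. Die Randbehandlung durch Randwiederholung und das Testbild sind Annahmen:

```python
import numpy as np

def medianfilter_3x3(bild):
    """3x3-Medianfilter; Randpixel per Randwiederholung (Annahme)."""
    P = np.pad(bild, 1, mode="edge")
    out = np.empty_like(bild)
    for i in range(bild.shape[0]):
        for j in range(bild.shape[1]):
            # Der Median von 9 Werten ist selbst einer der 9 Nachbarschaftswerte
            out[i, j] = np.median(P[i:i + 3, j:j + 3])
    return out

test = np.array([[0, 0, 0, 0], [0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]])
print(medianfilter_3x3(test))   # der Ausreißer 9 verschwindet
```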

• In der <strong>Computer</strong>grafik kennt man die Begriffe ”<br />

Phong-shading“ und ”<br />

Phong-illumination“.<br />

Erklären Sie diese beiden Begriffe! [#0219]<br />

(Frage I/3 11. Mai 2001)<br />

• Bei der Erstellung realistischer Szenen werden in der Computergrafik u.a. die zwei Konzepte „shading“ und „shadow“ verwendet, um die Helligkeit der darzustellenden Bildpunkte zu ermitteln. Was ist der Unterschied zwischen diesen beiden Begriffen? [#0220]

(Frage I/3 26. Juni 2001, Frage I/2 19. Oktober 2001, Frage I/4 15. März 2002)<br />

• Nennen Sie Anwendungen von Schallwellen in der digitalen Bildgebung! [#0225]<br />

(Frage I/4 11. Mai 2001)<br />

• Nennen Sie allgemeine An<strong>for</strong>derungen an eine Datenstruktur zur Repräsentation dreidimensionaler<br />

Objekte! [#0230]<br />

(Frage I/7 11. Mai 2001)



• Beschreiben Sie das „ray-tracing“-Verfahren zur Ermittlung sichtbarer Flächen! Welche Optimierungen können helfen, den Rechenaufwand zu verringern? [#0231]

(Frage I/9 11. Mai 2001, Frage I/8 19. Oktober 2001, Frage 8 15. Jänner 2002)<br />

• Beschreiben Sie Anwendungen von ”<br />

Resampling“ und erläutern Sie den Prozess, seine Varianten<br />

und mögliche Fehlerquellen! [#0232]<br />

(Frage I/10 11. Mai 2001)<br />

• Nennen Sie verschiedene technische Verfahren der stereoskopischen Vermittlung eines ”<br />

echten“<br />

(dreidimensionalen) Raumeindrucks einer vom <strong>Computer</strong> dargestellten Szene! [#0233]<br />

(Frage I/11 11. Mai 2001)<br />

• Erklären Sie den Unterschied zwischen „supervised classification“ und „unsupervised classification“! Welche Rollen spielen diese Verfahren bei der automatischen Klassifikation der Bodennutzung anhand von Luftbildern? [#0234]

(Frage I/8 11. Mai 2001)<br />

• Erklären Sie die Arbeitsweise der MPEG-Kompression von digitalen Videosequenzen! Welche<br />

Kompressionsraten können erzielt werden? [#0235]<br />

(Frage I/6 11. Mai 2001, Frage I/9 14. Dezember 2001, Frage I/1 15. März 2002)<br />

• Was versteht man unter ”<br />

motion blur“, und unter welcher Vor<strong>aus</strong>setzung kann dieser Effekt<br />

<strong>aus</strong> einem Bild wieder entfernt werden? [#0238]<br />

(Frage I/2 26. Juni 2001, Frage I/10 14. Dezember 2001)<br />

• Welchem Zweck dient ein sogenannter ”<br />

Objektscanner“? Nennen Sie drei verschiedene Verfahren,<br />

nach denen ein Objektscanner berührungslos arbeiten kann! [#0239]<br />

(Frage I/4 26. Juni 2001)<br />

• Erklären Sie anh<strong>and</strong> eines Beispiels den Vorgang des morphologischen Filterns! [#0240]<br />

(Frage I/6 26. Juni 2001)<br />

• Was versteht man unter der geometrischen Genauigkeit (geometric accuracy) eines digitalen<br />

Rasterbildes? [#0243]<br />

(Frage I/5 1. Februar 2002)<br />

• Beschreiben Sie anhand einer Skizze das „Aussehen“ folgender Filtertypen im Frequenzbereich:
1. Tiefpassfilter
2. Hochpassfilter
3. Bandpassfilter
[#0245]
(Frage I/8 26. Juni 2001)

• Welche statistischen Eigenschaften können zur Beschreibung von Textur herangezogen werden?<br />

Erläutern Sie die Bedeutung dieser Eigenschaften im Zusammenhang mit Texturbildern!<br />

[#0246]<br />

(Frage I/5 26. Juni 2001)



• Wird eine reale Szene durch eine Kamera mit nichtidealer Optik aufgenommen, entsteht ein<br />

verzerrtes Bild. Erläutern Sie die zwei Stufen des Resampling, die er<strong>for</strong>derlich sind, um ein<br />

solches verzerrtes Bild zu rektifizieren! [#0249]<br />

(Frage I/10 26. Juni 2001)<br />

• In der <strong>Computer</strong>grafik gibt es zwei grundlegend verschiedene Verfahren, um ein möglichst<br />

(photo-)realistisches Bild einer dreidimensionalen Szene zu erstellen. Verfahren A kommt<br />

zum Einsatz, wenn Spiegelreflexion, Lichtbrechung und Punktlichtquellen simuliert werden<br />

sollen. Verfahren B ist besser geeignet, um diffuse Reflexion, gegenseitige Lichtabstrahlung<br />

und Flächenlichtquellen darzustellen und die Szene interaktiv zu durchw<strong>and</strong>ern. Benennen<br />

Sie diese beiden Verfahren und erläutern Sie kurz deren jeweilige Grundidee! [#0253]<br />

(Frage I/7 26. Juni 2001)<br />

• Was versteht man unter einem „LoD/R-Tree“? [#0254]
(Frage I/9 26. Juni 2001)
• Was versteht man unter „immersiver Visualisierung“? [#0256]

(Frage I/11 26. Juni 2001)<br />

• Beschreiben Sie die Farberzeugung beim klassischen Offsetdruck! Welches Farbmodell wird<br />

verwendet, und wie wird das Auftreten des Moiree-Effekts verhindert? [#0265]<br />

(Frage I/10 28. September 2001)<br />

• Nennen Sie ein Beispiel und eine konkrete Anwendung eines nicht-optischen Sensors in der<br />

Stereo-Bildgebung! [#0266]<br />

(Frage I/7 28. September 2001)<br />

• Was versteht man unter „data garments“ (Datenkleidung)? Nennen Sie mindestens zwei Geräte dieser Kategorie! [#0273]

(Frage I/4 19. Oktober 2001)<br />

• Skizzieren Sie die Übertragungsfunktion eines idealen und eines Butterworth-Hochpassfilters<br />

und vergleichen Sie die Vor- und Nachteile beider Filtertypen! [#0274]<br />

(Frage I/1 19. Oktober 2001)<br />

• Was versteht man unter einer ”<br />

kon<strong>for</strong>men Trans<strong>for</strong>mation“? [#0275]<br />

(Frage I/6 19. Oktober 2001)<br />

• Nach welchem Grundprinzip arbeiten Verfahren, die <strong>aus</strong> einem Stereobildpaar die Oberfläche<br />

eines in beiden Bildern sichtbaren Körpers rekonstruieren können? [#0276]<br />

(Frage I/3 19. Oktober 2001)<br />

• Beschreiben Sie mindestens zwei Verfahren oder Geräte, die in der Medizin zur Gewinnung<br />

digitaler Rasterbilder verwendet werden! [#0278]<br />

(Frage I/3 9. November 2001, Frage I/7 15. März 2002)<br />

• Was ist ”<br />

Morphologie“? [#0279]<br />

(Frage I/7 9. November 2001)<br />

• Was versteht man unter einem dreidimensionalen Farbraum (bzw. Farbmodell)? Nennen Sie<br />

mindestens drei Beispiele davon! [#0280]<br />

(Frage I/4 9. November 2001)



• Erläutern Sie die strukturelle Methode der Texturbeschreibung! [#0281]<br />

(Frage I/8 9. November 2001)<br />

• Nennen Sie ein Verfahren zur Verbesserung verrauschter Bilder, und erläutern Sie dessen Auswirkungen auf die Qualität des Bildes! Bei welcher Art von Rauschen kann das von Ihnen genannte Verfahren eingesetzt werden? [#0296]

(Frage I/3 1. Februar 2002)<br />

• Erläutern Sie die Octree-Datenstruktur und nennen Sie mindestens zwei verschiedene Anwendungen<br />

davon! [#0298]<br />

(Frage I/10 1. Februar 2002)<br />

• Erklären Sie den z-buffer-Algorithmus zur Ermittlung sichtbarer Flächen! [#0299]<br />

(Frage I/8 1. Februar 2002)<br />

• Beschreiben Sie die Arbeitsweise des Marr-Hildreth-Operators 1 ! [#0311]<br />

(Frage I/9 15. März 2002)<br />

• Nennen Sie vier dreidimensionale Farbmodelle, benennen Sie die einzelnen Komponenten<br />

und skizzieren Sie die Geometrie des Farbmodells! [#0313]<br />

(Frage I/6 15. März 2002)<br />

• Versuchen Sie eine Definition des Histogramms eines digitalen Grauwertbildes! [#0314]<br />

(Frage I/3 15. März 2002)<br />

1 Dieser Operator wurde in der Vorlesung zur Vorbearbeitung von Stereobildern besprochen und erstmals im Wintersemester 2001/02 namentlich genannt.



Figure B.2: dreidimensionales Objekt mit verschiedenen Darstellungstechniken gezeigt ((a) Verfahren 1, (b) Verfahren 2, (c) Verfahren 3, (d) Verfahren 4)



Figure B.3: Überführung einer Vektorgrafik in eine <strong>and</strong>ere<br />

Figure B.4: Prozesskette der Abbildung eines dreidimensionalen Objekts auf die zweidimensionale Bildfläche (Modellierungs-Transformation und Projektion zwischen den Koordinatensystemen A, B und C)
Bildfläche



Figure B.5: Pixelraster<br />

B.2 Gruppe 2<br />

Figure B.6: binäres Rasterbild<br />

• Gegeben sei ein Druckverfahren, welches einen Graupunkt mittels eines Pixelrasters darstellt,<br />

wie dies in Abbildung B.5 dargestellt wird. Wieviele Grauwerte können mit diesem Raster<br />

dargestellt werden? Welcher Grauwert wird in Abbildung B.5 dargestellt? [#0011]<br />

(Frage II/13 14. Dezember 2001)<br />

• Gegeben sei das binäre Rasterbild in Abbildung B.6. Gesucht sei die Quadtree-Darstellung dieses Bildes. Ich bitte Sie, einen sogenannten „traditionellen“ Quadtree der Abbildung B.6 in einer Baumstruktur darzustellen und mir die quadtree-relevante Zerlegung des Bildes grafisch mitzuteilen. [#0029]

(Frage II/14 14. April 2000)<br />
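Zur Illustration eine kleine Skizze in Python, die einen traditionellen Quadtree rekursiv aufbaut; das Beispielbild B ist frei erfunden und nicht Abbildung B.6:

```python
import numpy as np

def quadtree(bild):
    """Traditioneller Quadtree: Blatt = 0 oder 1, innerer Knoten = Liste der
    vier Quadranten (NW, NO, SW, SO). Annahme: Bildgröße ist eine Zweierpotenz."""
    if bild.min() == bild.max():          # homogener Block -> Blatt
        return int(bild[0, 0])
    h = bild.shape[0] // 2
    return [quadtree(bild[:h, :h]), quadtree(bild[:h, h:]),
            quadtree(bild[h:, :h]), quadtree(bild[h:, h:])]

B = np.array([[1, 1, 0, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
print(quadtree(B))   # [1, [0, 0, 0, 1], 0, 1]
```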

• Welche Speicherplatzersparnis ergibt sich im Fall der Abbildung B.6, wenn statt eines traditionellen<br />

Quadtrees jener verwendet wird, in welchem die Nullen entfernt sind? Wie verhält<br />

sich dieser spezielle Wert zu den in der Literatur genannten üblichen Platz-Ersparnissen?<br />

[#0030]<br />

(Frage II/15 14. April 2000)<br />

• Gegeben sei der in Abbildung B.7 dargestellte Tisch (ignorieren Sie die Lampe). Als Primitiva<br />

bestehen Quader und Zylinder. Beschreiben Sie bitte einen CSG-Verfahrensablauf der<br />

Konstruktion des Objektes (ohne Lampe). [#0031]<br />

(Frage II/17 14. April 2000)



Figure B.7: Tisch<br />

• Quantifizieren Sie bitte an einem rechnerischen Beispiel Ihrer Wahl das „Geheimnis“, welches es gestattet, in der Stereobetrachtung mittels überlappender photographischer Bilder eine wesentlich bessere Tiefenwahrnehmung zu erzielen, als dies bei natürlichem binokularem Sehen möglich ist. [#0037]

(Frage II/13 14. April 2000)<br />

• Gegeben sei ein Inputbild mit den darin mitgeteilten Grauwerten (Abbildung B.8). Das<br />

Inputbild umfasst 5 Zeilen und 7 Spalten. Durch eine geometrische Trans<strong>for</strong>mation des<br />

Bildes gilt es nun, einigen bestimmten Pixeln im Ergebnisbild nach der Trans<strong>for</strong>mation<br />

einen Grauwert zuzuweisen, wobei der Entsprechungspunkt im Inputbild die in Tabelle B.1<br />

angegebenen Zeilen- und Spaltenkoordinaten aufweist. Berechnen Sie (oder ermitteln Sie mit<br />

grafischen Mitteln) den Grauwert zu jedem der Ergebnispixel, wenn eine bilineare Grauwertzuweisung<br />

erfolgt. [#0039]<br />

1 2 3 4 5 6 7<br />

2 3 4 5 6 7 8<br />

3 4 5 6 7 8 9<br />

4 5 6 7 8 9 10<br />

5 6 7 8 9 10 11<br />

Figure B.8: Inputbild<br />

Zeile Spalte<br />

2.5 1.5<br />

2.5 2.5<br />

4.75 5.25<br />

Table B.1: Entsprechungspunkte im Inputbild<br />

(Frage II/16 14. April 2000, Frage II/17 30. März 2001, Frage II/14 14. Dezember 2001)<br />
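Dazu eine kleine Rechen-Skizze in Python; die Grauwerte folgen dem in Abbildung B.8 gezeigten Muster g(Zeile, Spalte) = Zeile + Spalte − 1, die Abfragepunkte stammen aus Tabelle B.1:

```python
import numpy as np

# Grauwerte aus Abbildung B.8 (5 Zeilen, 7 Spalten), 1-basierte Koordinaten
B8 = np.array([[z + s - 1 for s in range(1, 8)] for z in range(1, 6)], float)

def bilinear(bild, zeile, spalte):
    """Bilineare Grauwertzuweisung an einer nicht ganzzahligen Position."""
    z0, s0 = int(np.floor(zeile)), int(np.floor(spalte))
    dz, ds = zeile - z0, spalte - s0
    g = bild[z0 - 1:z0 + 1, s0 - 1:s0 + 1]      # 2x2-Nachbarschaft (1-basiert)
    return ((1 - dz) * (1 - ds) * g[0, 0] + (1 - dz) * ds * g[0, 1]
            + dz * (1 - ds) * g[1, 0] + dz * ds * g[1, 1])

for z, s in [(2.5, 1.5), (2.5, 2.5), (4.75, 5.25)]:
    print(z, s, bilinear(B8, z, s))   # 3.0, 4.0, 9.0
```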

• Zeichnen Sie in Abbildung B.9 jene Pixel ein, die vom Bresenham-Algorithmus erzeugt<br />

werden, wenn die beiden markierten Pixel durch eine (angenäherte) Gerade verbunden werden.<br />

Geben Sie außerdem die Rechenschritte an, die zu den von Ihnen gewählten Pixeln<br />

führen. [#0057]



Figure B.9: Die Verbindung zweier Pixel soll angenähert werden (Pixelraster mit zwei markierten Endpunkten)

Figure B.10: Objekt bestehend <strong>aus</strong> zwei Flächen<br />

(Frage II/11 26. Mai 2000)<br />
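Als Ergänzung eine minimale Skizze des ganzzahligen Bresenham-Algorithmus in Python; die Endpunkte (2, 3) und (10, 7) sind frei gewählt und nicht notwendigerweise die in Abbildung B.9 markierten Pixel:

```python
def bresenham(x0, y0, x1, y1):
    """Ganzzahlige Linienapproximation nach Bresenham (alle Oktanten)."""
    pixel = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                     # Fehlerterm
    while True:
        pixel.append((x0, y0))
        if (x0, y0) == (x1, y1):
            break
        e2 = 2 * err
        if e2 >= dy:                  # Schritt in x-Richtung
            err += dy
            x0 += sx
        if e2 <= dx:                  # Schritt in y-Richtung
            err += dx
            y0 += sy
    return pixel

print(bresenham(2, 3, 10, 7))   # (2,3), (3,4), (4,4), (5,5), ...
```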

• Finden Sie eine geeignete Bezeichnung der Elemente in Abbildung B.10 und geben Sie die Boundary-Representation dieses Objekts an (in Form von Listen). Achten Sie dabei auf die Reihenfolge, damit beide Flächen „in die gleiche Richtung weisen“! [#0069]

(Frage II/12 26. Mai 2000, Frage II/12 10. November 2000, Frage II/15 11. Mai 2001, Frage<br />

II/11 14. Dezember 2001)<br />

• Bei der Erstellung eines Bildes mittels „recursive raytracing“ trifft der Primärstrahl für ein bestimmtes Pixel auf ein Objekt A und wird gemäß Abbildung B.11 in mehrere Strahlen aufgeteilt, die in weiterer Folge (sofern die Rekursionstiefe nicht eingeschränkt wird) die Objekte B, C, D und E treffen. Die Zahlen in den Kreisen sind die lokalen Intensitäten jedes einzelnen Objekts (bzgl. des sie treffenden Strahles), die Zahlen neben den Verbindungen geben die Gewichtung der Teilstrahlen an. Bestimmen Sie die dem betrachteten Pixel zugeordnete Intensität, wenn
1. die Rekursionstiefe nicht beschränkt ist,
2. der Strahl nur genau einmal aufgeteilt wird,
3. die Rekursion abgebrochen wird, sobald die Gewichtung des Teilstrahls unter 15% fällt!
Kennzeichnen Sie bitte für die letzten beiden Fälle in zwei Skizzen diejenigen Teile des Baumes, die zur Berechnung der Gesamtintensität durchlaufen werden! [#0072]

(Frage II/15 26. Mai 2000)



Figure B.11: Aufteilung des Primärstrahls bei „recursive raytracing“ (lokale Intensitäten: A = 2,7, B = 2, C = 3, D = 2, E = 4; Gewichte: A→B 0,1, A→C 0,5, C→D 0,4, C→E 0,1)

Figure B.12: Lineare Transformation M eines Objekts A in ein Objekt B

• In Abbildung B.12 ist ein Objekt A gezeigt, das durch eine lineare Trans<strong>for</strong>mation M in das<br />

Objekt B übergeführt wird. Geben Sie (für homogene Koordinaten) die 3 × 3-Matrix M an,<br />

die diese Trans<strong>for</strong>mation beschreibt (zwei verschiedene Lösungen)! [#0074]<br />

(Frage II/13 26. Mai 2000, Frage II/13 10. November 2000)<br />

• Definieren Sie den Sobel-Operator und wenden Sie ihn auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.13 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.13 eintragen. [#0075]

(Frage II/14 26. Mai 2000)<br />

• Wenden Sie ein 3 × 3-Median-Filter auf die Pixel innerhalb des fett umrandeten Bereiches des in Abbildung B.14 gezeigten Grauwertbildes an! Sie können das Ergebnis direkt in Abbildung B.14 eintragen. [#0080]

(Frage II/11 30. Juni 2000, Frage II/14 10. November 2000)<br />

• In Abbildung B.15 ist ein Objekt gezeigt, dessen Oberflächeneigenschaften nach dem Beleuchtungsmodell von Phong beschrieben werden. Tabelle B.2 enthält alle relevanten Parameter der Szene. Bestimmen Sie für den eingezeichneten Objektpunkt p die vom Beobachter wahrgenommene Intensität I dieses Punktes!
Hinweis: Der Einfachheit halber wird nur in zwei Dimensionen und nur für eine Wellenlänge gerechnet. Zur Ermittlung der Potenz einer Zahl nahe 1 beachten Sie bitte, dass die Näherung (1 − x)^k ≈ 1 − kx für kleine x verwendbar ist. [#0085]

(Frage II/12 30. Juni 2000, Frage II/15 15. Dezember 2000)
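Dazu eine kleine Rechen-Skizze in Python unter der Annahme der üblichen Phong-Formel I = I_a·k_a + I_p·(k_d·(N·L) + k_s·(R·V)^n) mit R = 2(N·L)N − L; die Zahlenwerte stammen aus Tabelle B.2:

```python
import numpy as np

# Zahlenwerte aus Tabelle B.2 (2D, eine Wellenlänge)
kd, ks, n = 0.2, 0.5, 3
L = np.array([-0.6, 0.8])       # Richtung zur Lichtquelle
V = np.array([0.8, 0.6])        # Richtung zum Beobachter
N = np.array([0.0, 1.0])        # Oberflächennormale
Ip = 2.0                        # Intensität der Lichtquelle (Ia = 0)

R = 2 * (N @ L) * N - L                      # Reflexionsrichtung
I = Ip * (kd * (N @ L) + ks * (R @ V) ** n)  # ambienter Term entfällt, da Ia = 0
print(R, round(I, 3))                        # R = [0.6 0.8], I ≈ 1.205 (Näherung aus dem Hinweis: ≈ 1.2)
```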



Figure B.13: Anwendung des Sobel-Operators auf ein Grauwertbild
Figure B.14: Anwendung eines Median-Filters auf ein Grauwertbild

• Ermitteln Sie zu dem Grauwertbild aus Abbildung B.16 eine Bildpyramide, wobei jedem Pixel einer Ebene der Mittelwert der entsprechenden vier Pixel aus der übergeordneten (höher aufgelösten) Ebene zugewiesen wird! [#0088]

(Frage II/13 30. Juni 2000, Frage II/11 10. November 2000, Frage II/12 15. Dezember 2000,<br />

Frage II/14 28. September 2001)<br />

• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten für das Polygon aus Abbildung B.17 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein! [#0089]

(Frage II/14 30. Juni 2000, Frage II/15 10. November 2000, Frage II/15 1. Februar 2002)<br />

• Erklären Sie die einzelnen Schritte des Clipping-Algorithmus nach Cohen-Sutherland anhand des Beispiels in Abbildung B.18. Die Zwischenergebnisse mit den half-space Codes sind darzustellen. Es ist jener Teil der Strecke AB zu bestimmen, der innerhalb des Rechtecks R liegt. Die dazu benötigten Zahlenwerte (auch die der Schnittpunkte) können Sie direkt aus Abbildung B.18 ablesen. [#0092]

(Frage II/15 30. Juni 2000)<br />
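Als Ergänzung eine minimale Skizze der Halbraum-Codes (Outcodes) in Python; der iterative Schnitt mit den Rechteckkanten ist nur angedeutet:

```python
LINKS, RECHTS, UNTEN, OBEN = 1, 2, 4, 8

def outcode(x, y, xmin, ymin, xmax, ymax):
    """4-Bit-Halbraum-Code eines Punktes bezüglich des Clip-Rechtecks."""
    code = 0
    if x < xmin: code |= LINKS
    elif x > xmax: code |= RECHTS
    if y < ymin: code |= UNTEN
    elif y > ymax: code |= OBEN
    return code

def trivialfall(c1, c2):
    """Trivial akzeptieren (beide Codes 0) bzw. trivial verwerfen (UND != 0)."""
    if c1 == 0 and c2 == 0:
        return "akzeptieren"
    if c1 & c2:
        return "verwerfen"
    return "Schnitt mit einer Rechteckkante berechnen und wiederholen"
```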

• Gegeben seien die Transformationsmatrix
M = ⎛  0  2  0   0 ⎞
    ⎜  0  0  2   0 ⎟
    ⎜  1  0  0  −5 ⎟
    ⎝ −2  0  0   8 ⎠
und zwei Punkte p1 = (3, −1, 1)^T, p2 = (2, 4, −1)^T in Objektkoordinaten. Führen Sie die beiden Punkte p1 und p2 mit Hilfe der Matrix M in die Punkte p′1 bzw. p′2 in (normalisierten) Bildschirmkoordinaten über (beachten Sie dabei die Umwandlungen zwischen dreidimensionalen und homogenen Koordinaten)! [#0099]



Figure B.15: Beleuchtetes Objekt mit spiegelnder Oberfläche nach dem Phong-Modell (Lichtquelle, Beobachter, Vektoren N, L, V, Punkt p)

Parameter | Formelzeichen | Wert
diffuser Reflexionskoeffizient | k_d | 0.2
Spiegelreflexionskoeffizient | W(θ) = k_s | 0.5
Spiegelreflexionsexponent | n | 3
Richtung zur Lichtquelle | L | (−0.6, 0.8)^T
Richtung zum Beobachter | V | (0.8, 0.6)^T
Oberflächennormalvektor | N | (0, 1)^T
Intensität des ambienten Umgebungslichtes | I_a | 0
Intensität der Lichtquelle | I_p | 2
Table B.2: Parameter für das Phongsche Beleuchtungsmodell in Abbildung B.15

(Frage II/11 13. Oktober 2000)<br />

• Wenden Sie den Clipping-Algorithmus von Cohen-Sutherland (in zwei Dimensionen) auf die in Beispiel B.2 gefundenen Punkte p′1 und p′2 an, um den innerhalb des Quadrats Q = {(0, 0)^T, (0, 1)^T, (1, 1)^T, (1, 0)^T} liegenden Teil der Verbindungsstrecke zwischen p′1 und p′2 zu finden! Sie können das Ergebnis direkt in Abbildung B.19 eintragen und Schnittberechnungen grafisch lösen. [#0100]

(Frage II/12 13. Oktober 2000)<br />

• Das Quadrat Q in normalisierten Bildschirmkoordinaten aus Beispiel B.2 wird in ein Rechteck R mit den Abmessungen 10 × 8 in Bildschirmkoordinaten transformiert. Zeichnen Sie die Verbindung der zwei Punkte p′1 und p′2 in Abbildung B.20 ein und bestimmen Sie grafisch jene Pixel, die der Bresenham-Algorithmus wählen würde, um die Verbindung diskret zu approximieren! [#0102]

(Frage II/13 13. Oktober 2000)<br />

• Zu dem digitalen Rasterbild in Abbildung B.21 soll das Gradientenbild gefunden werden.<br />

Geben Sie einen dazu geeigneten Operator an und wenden Sie ihn auf die Pixel innerhalb des<br />

fett umr<strong>and</strong>eten Rechtecks an. Sie können das Ergebnis direkt in Abbildung B.21 eintragen.<br />

Führen Sie außerdem für eines der Pixel den Rechengang vor. [#0103]<br />

(Frage II/14 13. Oktober 2000)



Figure B.16: Grauwertbild als höchstauflösende Ebene einer Bildpyramide (4 × 4 Grauwerte)
Figure B.17: Polygon für BSP-Darstellung (Eckpunkte 1–4)

• Nehmen Sie an, der Gradientenoperator in Aufgabe B.2 hätte das Ergebnis in Abbildung B.22 ermittelt. Zeichnen Sie das Histogramm dieses Gradientenbildes und finden Sie einen geeigneten Schwellwert, um „Kantenpixel“ zu identifizieren. Markieren Sie in Abbildung B.22 rechts alle jene Pixel (Kantenpixel), die mit diesem Schwellwert gefunden werden. [#0104]

(Frage II/15 13. Oktober 2000)<br />

• Beschreiben Sie mit Hilfe morphologischer Operationen ein Verfahren zur Bestimmung des Randes einer Region. Wenden Sie dieses Verfahren auf die in Abbildung B.23 eingezeichnete Region an und geben Sie das von Ihnen verwendete 3 × 3-Formelement an. In Abbildung B.23 ist Platz für das Endergebnis sowie für Zwischenergebnisse. [#0106]

(Frage II/16 13. Oktober 2000)<br />

• In Abbildung B.24 sind zwei Binärbilder A und B gezeigt, wobei schwarze Pixel logisch „1“ und weiße Pixel logisch „0“ entsprechen. Führen Sie die Boolschen Operationen
1. A and B,
2. A xor B,
3. A minus B
aus und tragen Sie die Ergebnisse in Abbildung B.24 ein! [#0132]

(Frage II/11 15. Dezember 2000, Frage II/15 15. März 2002)<br />

• Gegeben sei ein Farbwert C_RGB = (0.8, 0.5, 0.1)^T im RGB-Farbmodell.
1. Welche Spektralfarbe entspricht am ehesten dem durch C_RGB definierten Farbton?
2. Finden Sie die entsprechende Repräsentation von C_RGB im CMY- und im CMYK-Farbmodell!
[#0134]
(Frage II/13 15. Dezember 2000, Frage II/14 19. Oktober 2001)



Figure B.18: Anwendung des Clipping-Algorithmus von Cohen-Sutherland (Strecke AB, Clip-Rechteck R)
Figure B.19: Clipping nach Cohen-Sutherland (Quadrat Q)

• Bestimmen Sie mit Hilfe der normalisierten Korrelation R_N^2(m, n) jenen Bildausschnitt innerhalb des fett umrandeten Bereichs in Abbildung B.25, der mit der ebenfalls angegebenen Maske M am besten übereinstimmt. Geben Sie Ihre Rechenergebnisse an und markieren Sie den gefundenen Bereich in Abbildung B.25! [#0135]

(Frage II/14 15. Dezember 2000)<br />

• In Abbildung B.26 sehen Sie vier Punkte P1, P2, P3 und P4, die als Kontrollpunkte für eine Bezier-Kurve x(t) dritter Ordnung verwendet werden. Konstruieren Sie mit Hilfe des Verfahrens von Casteljau den Kurvenpunkt für den Parameterwert t = 1/3, also x(1/3), und erläutern Sie den Konstruktionsvorgang! Sie können das Ergebnis direkt in Abbildung B.26 eintragen, eine skizzenhafte Darstellung ist ausreichend.
Hinweis: der Algorithmus, der hier zum Einsatz kommt, ist der gleiche, der auch bei der Unterteilung einer Bezier-Kurve (zwecks flexiblerer Veränderung) verwendet wird. [#0164]

(Frage II/13 2. Februar 2001, Frage II/12 9. November 2001, Frage II/15 14. Dezember 2001,<br />

Frage II/14 15. März 2002)<br />
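Dazu eine kleine Skizze des Casteljau-Verfahrens in Python (wiederholte lineare Interpolation); die Kontrollpunkte im Beispiel sind frei gewählt und nicht die aus Abbildung B.26:

```python
def casteljau(punkte, t):
    """Wiederholte lineare Interpolation benachbarter Kontrollpunkte."""
    P = [tuple(p) for p in punkte]
    while len(P) > 1:
        P = [((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])
             for a, b in zip(P, P[1:])]
    return P[0]

# hypothetische Kontrollpunkte einer kubischen Bezier-Kurve
print(casteljau([(0, 0), (1, 3), (3, 3), (4, 0)], 1 / 3))   # ≈ (1.26, 2.0)
```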

• Berechnen Sie jene Transformationsmatrix M, die eine Rotation um 45° im Gegenuhrzeigersinn um den Punkt R = (3, 2)^T und zugleich eine Skalierung mit dem Faktor √2 bewirkt



R<br />

Figure B.20: Verbindung zweier Punkte nach Bresenham<br />

0 0 0 1 2 2 3<br />

0 1 2 3 3 3 3<br />

1 2 3 7 7 6 3<br />

1 2 7 8 9 8 4<br />

2 2 8 8 8 9 5<br />

Figure B.21: Anwendung eines Gradientenoperators<br />

(wie in Abbildung B.27 veranschaulicht). Geben Sie M für homogene Koordinaten in zwei Dimensionen an (also eine 3 × 3-Matrix), sodass ein Punkt p gemäß p′ = Mp in den Punkt p′ übergeführt wird.
Hinweis: Sie ersparen sich viel Rechen- und Schreibarbeit, wenn Sie das Assoziativgesetz für die Matrixmultiplikation geeignet anwenden. [#0166]

(Frage II/15 2. Februar 2001)<br />
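Dazu eine kleine Skizze in Python, die M als Verkettung T(R) · S(√2) · Rot(45°) · T(−R) aufbaut (Rotationszentrum in den Ursprung verschieben, drehen und skalieren, zurückschieben):

```python
import numpy as np

def T(tx, ty): return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])
def S(f):      return np.array([[f, 0, 0], [0, f, 0], [0, 0, 1.0]])
def Rot(p):
    c, s = np.cos(p), np.sin(p)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])

# ins Rotationszentrum verschieben, drehen + skalieren, zurückschieben
M = T(3, 2) @ S(np.sqrt(2)) @ Rot(np.pi / 4) @ T(-3, -2)
print(np.round(M, 6))
# [[ 1. -1.  2.]
#  [ 1.  1. -3.]
#  [ 0.  0.  1.]]
```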

• In der Bildklassifikation wird oft versucht, die unbekannte Wahrscheinlichkeitsdichtefunktion der N bekannten Merkmalsvektoren im m-dimensionalen Raum durch eine Gaußsche Normalverteilung zu approximieren. Hierfür wird die m × m-Kovarianzmatrix C der N Vektoren benötigt. Abbildung B.28 zeigt drei Merkmalsvektoren p1, p2 und p3 in zwei Dimensionen (also N = 3 und m = 2). Berechnen Sie die dazugehörige Kovarianzmatrix C! [#0173]

(Frage II/17 2. Februar 2001)<br />
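Als Ergänzung eine minimale Skizze der Kovarianzberechnung in Python; die Beispielvektoren sind frei gewählt (nicht die aus Abbildung B.28), und die Normierung mit N statt N−1 ist eine Annahme:

```python
import numpy as np

def kovarianzmatrix(P):
    """m x m-Kovarianzmatrix von N Merkmalsvektoren (zeilenweise in P)."""
    mittel = P.mean(axis=0)
    D = P - mittel                   # zentrierte Merkmalsvektoren
    return D.T @ D / P.shape[0]      # Normierung mit N (in der Statistik oft N-1)

# hypothetische Merkmalsvektoren (N = 3, m = 2)
P = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]])
print(kovarianzmatrix(P))
```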

• Skizzieren Sie das Histogramm des digitalen Grauwertbildes <strong>aus</strong> Abbildung B.29, und kommentieren<br />

Sie Ihre Skizze! [#0176]<br />

(Frage II/12 2. Februar 2001, Frage II/13 19. Oktober 2001, Frage II/12 14. Dezember 2001)



0 1 2 2 0 0 0<br />

2 3 5 7 6 4 2<br />

1 4 8 7 7 7 4<br />

0 6 8 3 2 6 3<br />

0 8 8 1 0 5 0<br />

Figure B.22: Auffinden der Kantenpixel<br />

Figure B.23: R<strong>and</strong> einer Region<br />

• Tragen Sie in die leeren Filtermasken in Abbildung B.30 jene Filterkoeffizienten ein, sodass
1. in Abbildung B.30(a) ein Tiefpassfilter entsteht, das den Gleichanteil des Bildsignals unverändert lässt,
2. in Abbildung B.30(b) ein Hochpassfilter entsteht, das den Gleichanteil des Bildsignals vollständig unterdrückt!
[#0182]
(Frage II/14 2. Februar 2001, Frage II/15 19. Oktober 2001)

• Wenden Sie auf das Binärbild in Abbildung B.31 links die morphologische Operation „Öffnen“ mit dem angegebenen Formelement an! Welcher für das morphologische Öffnen typische Effekt tritt auch in diesem Beispiel auf?
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.31 eintragen. [#0186]

(Frage II/16 2. Februar 2001)<br />

• Gegeben sei ein Farbwert C_RGB = (0.8, 0.4, 0.2)^T im RGB-Farbmodell. Schätzen Sie grafisch die Lage des Farbwertes C_HSV in Abbildung B.32 (also die Entsprechung von C_RGB im HSV-Modell). Skizzieren Sie ebenso die Lage eines Farbwertes C′_HSV, der den gleichen Farbton und die gleiche Helligkeit aufweist wie C_HSV, jedoch nur die halbe Farbsättigung! [#0201]

(Frage II/13 11. Mai 2001, Frage II/13 1. Februar 2002)



Figure B.24: Boolsche Operationen auf Binärbildern (A, B, and, xor, minus)

0 1 1 1 2 2 2<br />

0 0 1 0 1 1 2<br />

1 1 1 0 0 1 2<br />

1 2 2 1 2 1 1<br />

2 2 1 0 1 0 0<br />

0 1 0 0 1 1 0<br />

0 1<br />

1 2<br />

M<br />

Figure B.25: Ermittlung der normalisierten Korrelation<br />

• Abbildung B.33 zeigt einen Graukeil, in dem alle Grauwerte von 0 bis 255 in aufsteigender<br />

Reihenfolge vorkommen, die Breite beträgt 50 Pixel. Zeichnen Sie das Histogramm dieses<br />

Bildes und achten Sie dabei auf die korrekten Zahlenwerte! Der schwarze R<strong>and</strong> in Abbildung<br />

B.33 dient nur zur Verdeutlichung des Umrisses und gehört nicht zum Bild selbst. [#0203]<br />

(Frage II/12 30. März 2001)<br />

• Wenden Sie auf den fett umrandeten Bereich in Abbildung B.34 den Roberts-Operator zur Kantendetektion an! Sie können das Ergebnis direkt in Abbildung B.34 eintragen. [#0204]

(Frage II/14 30. März 2001)<br />

• Wenden Sie den Splitting-Algorithmus auf Abbildung B.35 an, um eine vereinfachte zweidimensionale<br />

Polygonrepräsentation des gezeigten Objekts zu erhalten, und kommentieren Sie<br />

einen Schritt des Algorithmus im Detail anh<strong>and</strong> Ihrer Zeichnung! Wählen Sie den Schwellwert<br />

so, dass die wesentlichen Details des Bildes erhalten bleiben (der Mund der Figur kann<br />

vernachlässigt werden). Sie können das Ergebnis (und die Zwischenschritte) direkt in Abbildung<br />

B.35 einzeichnen. [#0207]<br />

(Frage II/13 30. März 2001)<br />

• Gegeben seien eine 4 × 4-Matrix
M = ⎛ 8  0  8  −24 ⎞
    ⎜ 0  8  8    8 ⎟
    ⎜ 0  0  0   24 ⎟
    ⎝ 0  0  1    1 ⎠
sowie vier Punkte p1 = (3, 0, 1)^T, p2 = (2, 0, 7)^T, p3 = (4, 0, 5)^T, p4 = (1, 0, 3)^T im dreidimensionalen Raum. Die Matrix M fasst alle Transformationen zusammen, die zur Überführung eines Punktes p in Weltkoordinaten in den entsprechenden Punkt p′ = M · p



Figure B.26: Konstruktion eines Kurvenpunktes auf einer Bezier-Kurve nach Casteljau
Figure B.27: allgemeine Rotation mit Skalierung (Rotationszentrum R)

in Gerätekoordinaten erforderlich sind (siehe auch Abbildung B.36; die Bildschirmebene und daher die y-Achse stehen normal auf die Zeichenebene). Durch Anwendung der Transformationsmatrix M werden die Punkte p1 und p2 auf die Punkte p′1 = (4, 8, 12)^T und p′2 = (6, 8, 3)^T in Gerätekoordinaten abgebildet. Berechnen Sie in gleicher Weise p′3 und p′4! [#0210]

(Frage II/15 30. März 2001)<br />

• Die vier Punkte aus Aufgabe B.2 bilden zwei Strecken A = p1p2 und B = p3p4, deren Projektionen in Gerätekoordinaten in der Bildschirmebene in die gleiche Scanline fallen. Bestimmen Sie grafisch durch Anwendung des z-Buffer-Algorithmus, welches Objekt (A, B oder keines von beiden) an den Pixelpositionen 0 bis 10 dieser Scanline sichtbar ist!
Hinweis: Zeichnen Sie p1p2 und p3p4 in die xz-Ebene des Gerätekoordinatensystems ein! [#0211]

(Frage II/16 30. März 2001)<br />

• In Abbildung B.37 ist ein Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f(x) = x im angegebenen Koordinatensystem, zur Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil
1. ein lineares Tiefpassfilter F1,



Figure B.28: drei Merkmalsvektoren im zweidimensionalen Raum
Figure B.29: digitales Grauwertbild (Histogramm gesucht)

2. ein lineares Hochpassfilter F2
mit 3 × 3-Filterkernen Ihrer Wahl an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.37 oder als Funktionen f1(x) und f2(x) an! Zeichnen Sie außerdem die von Ihnen verwendeten Filterkerne. Randpixel müssen nicht gesondert berücksichtigt werden. [#0214]

(Frage II/12 11. Mai 2001, Frage II/11 9. November 2001)<br />

• In Abbildung B.38(a) ist ein digitales Grauwertbild gezeigt, in dem mittels normalisierter<br />

Kreuzkorrelation das Strukturelement <strong>aus</strong> Abbildung B.38(b) gesucht werden soll. Markieren<br />

Sie in Abbildung B.38(a) die Position, an der der Wert der normalisierten Kreuzkorrelation<br />

maximal ist! Die Aufgabe ist grafisch zu lösen, es sind keine Berechnungen er<strong>for</strong>derlich.<br />

[#0223]<br />

(Frage II/14 11. Mai 2001)<br />

• Wenden Sie die „medial axis“-Transformation von Blum auf das Objekt in Abbildung B.39 links an! Sie können das Ergebnis direkt in Abbildung B.39 rechts eintragen. [#0226]

(Frage II/16 11. Mai 2001)



(a) Tiefpass<br />

(b) Hochpass<br />

Figure B.30: leere Filtermasken<br />

Formelement<br />

Figure B.31: morphologisches Öffnen<br />

• Gegeben seien eine 3 × 3-Transformationsmatrix
M = ⎛  3  4  2 ⎞
    ⎜ −4  3  1 ⎟
    ⎝  0  0  1 ⎠
sowie drei Punkte a = (2, 0)^T, b = (0, 1)^T, c = (0, 0)^T im zweidimensionalen Raum. Die Matrix M beschreibt in homogenen Koordinaten eine konforme Transformation, wobei ein Punkt p gemäß p′ = Mp in einen Punkt p′ übergeführt wird. Die Punkte a, b und c bilden ein rechtwinkeliges Dreieck, d.h. die Strecken ac und bc stehen normal aufeinander.
1. Berechnen Sie a′, b′ und c′ durch Anwendung der durch M beschriebenen Transformation auf die Punkte a, b und c!
2. Da M eine konforme Transformation beschreibt, müssen auch die Punkte a′, b′ und c′ ein rechtwinkeliges Dreieck bilden. Zeigen Sie, dass dies hier tatsächlich der Fall ist! (Hinweis: es genügt zu zeigen, dass die Strecken a′c′ und b′c′ normal aufeinander stehen.)
[#0229]



Figure B.32: eine Ebene im HSV-Farbmodell (grün, gelb, cyan, weiß, rot, blau, magenta)
Figure B.33: Graukeil (Breite 50 Pixel, Grauwerte 0 bis 255)

(Frage II/17 11. Mai 2001)<br />

• Geben Sie je eine 3 × 3-Filtermaske zur Detektion<br />

1. horizontaler<br />

2. vertikaler<br />

Kanten in einem digitalen Rasterbild an! [#0247]<br />

(Frage II/12 26. Juni 2001, Frage II/11 15. März 2002)<br />

• In Abbildung B.40 ist ein Graukeil gezeigt, in dem alle Grauwerte von 0 bis 255 in aufsteigender Reihenfolge vorkommen (also f(x) = x im angegebenen Koordinatensystem, zur Verdeutlichung ist ein Ausschnitt vergrößert dargestellt). Wenden Sie auf den Graukeil die in Aufgabe B.2 gefragten Filterkerne an und geben Sie Ihr Ergebnis in Form eines Bildausschnitts wie in Abbildung B.40 oder als Funktionen f1(x) und f2(x) an! Randpixel müssen nicht gesondert berücksichtigt werden. [#0248]

(Frage II/14 26. Juni 2001)<br />

• Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.41 links an. Verwenden Sie das angegebene Strukturelement X (Zentrumspixel ist markiert) und definieren Sie ein geeignetes Fenster W! Sie können das Ergebnis direkt in Abbildung B.41 rechts eintragen. [#0255]

(Frage II/16 26. Juni 2001, Frage II/14 1. Februar 2002)<br />

• Gegeben seien eine Kugel mit Mittelpunkt m_S, ein Punkt p_S auf der Kugeloberfläche und eine Lichtquelle an der Position p_L mit der Intensität I_L. Die Intensität soll physikalisch korrekt mit dem Quadrat der Entfernung abnehmen. Die Oberfläche der Kugel ist durch das



9 9 8 8 6 7 6 6<br />

7 8 9 8 7 2 3 1<br />

6 8 7 8 3 2 0 1<br />

8 7 8 2 3 1 1 2<br />

7 6 7 1 0 2 3 1<br />

7 6 8 2 2 1 2 0<br />

Figure B.34: Roberts-Operator<br />

Lambert’sche Beleuchtungsmodell beschrieben, der diffuse Reflexionskoeffizient ist k_d. Die Szene wird von einer synthetischen Kamera an der Position p_C betrachtet. Berechnen Sie die dem Punkt p_S zugeordnete Intensität I_S unter Verwendung der Angaben aus Tabelle B.3!
Hinweis: der Punkt p_S ist von der Kameraposition p_C aus sichtbar, diese Bedingung muss nicht überprüft werden. [#0257]

Parameter | Formelzeichen | Wert
Kugelmittelpunkt | m_S | (−2, 1, −4)^T
Oberflächenpunkt | p_S | (−4, 5, −8)^T
Position der Lichtquelle | p_L | (2, 7, −11)^T
Intensität der Lichtquelle | I_L | 343
diffuser Reflexionskoeffizient | k_d | 1
Position der Kamera | p_C | (−e^2, 13.7603, −4π)^T
Table B.3: Geometrie und Beleuchtungsparameter der Szene

(Frage II/13 26. Juni 2001)<br />
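Dazu eine kleine Rechen-Skizze in Python unter der Annahme der üblichen Lambert-Formel I_S = I_L · k_d · (N·L) / d²; die Zahlenwerte stammen aus Tabelle B.3 (die Kameraposition geht beim reinen Lambert-Modell nicht ein):

```python
import numpy as np

# Zahlenwerte aus Tabelle B.3
mS = np.array([-2.0, 1.0, -4.0])      # Kugelmittelpunkt
pS = np.array([-4.0, 5.0, -8.0])      # Oberflächenpunkt
pL = np.array([2.0, 7.0, -11.0])      # Position der Lichtquelle
IL, kd = 343.0, 1.0

N = (pS - mS) / np.linalg.norm(pS - mS)   # Kugelnormale = Radiusrichtung
zu_L = pL - pS
d = np.linalg.norm(zu_L)                  # Abstand zur Lichtquelle (= 7)
L = zu_L / d
IS = IL * kd * max(N @ L, 0.0) / d**2     # Lambert mit 1/d^2-Abfall
print(IS)                                 # 4/3 ≈ 1.333
```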

• In Abbildung B.42(a) ist eine diskret approximierte Linie eingezeichnet. Erzeugen Sie daraus auf zwei verschiedene Arten eine „drei Pixel dicke“ Linie und beschreiben Sie die von Ihnen verwendeten Algorithmen! Sie können die Ergebnisse direkt in die Abbildungen B.42(b) und B.42(c) einzeichnen. [#0258]

• Geben Sie eine 4 × 4-Matrix für homogene Koordinaten in drei Dimensionen an, die eine<br />

perspektivische Projektion mit dem Projektionszentrum p 0 = (2, 3, −1) T beschreibt!<br />

Hinweis: das Projektionszentrum wird in homogenen Koordinaten auf den Punkt (0, 0, 0, 0) T<br />

abgebildet. [#0260]<br />

(Frage II/17 26. Juni 2001)<br />

• Gegeben seien eine Kugel S (durch Mittelpunkt M und Radius r), ein Punkt p S auf der<br />

Kugeloberfläche und ein Dreieck T (durch die drei Eckpunkte p 1 , p 2 und p 3 ). Berechnen<br />

Sie unter Verwendung der Angaben <strong>aus</strong> Tabelle B.4<br />

1. den Oberflächennormalvektor n S der Kugel im Punkt p S ,<br />

2. den Oberflächennormalvektor n T des Dreiecks!<br />

Eine Normierung der Normalvektoren auf Einheitslänge ist nicht er<strong>for</strong>derlich. [#0262]<br />

(Frage II/13 28. September 2001)



Figure B.35: zweidimensionale Polygonrepräsentation<br />

• Zeichnen Sie in Abbildung B.43 die zweidimensionale Figur ein, die durch den dort angeführten Kettencode definiert ist. Beginnen Sie bei dem mit „ד markierten Pixel. Um welche Art von Kettencode handelt es sich hier (bzgl. der verwendeten Nachbarschaftsbeziehungen)? [#0268]

(Frage II/15 28. September 2001)<br />

• Ein Laserdrucker hat eine Auflösung von 600dpi. Wie viele Linienpaare pro Millimeter sind<br />

mit diesem Gerät einw<strong>and</strong>frei darstellbar (es genügen die Formel und eine grobe Abschätzung)?<br />

[#0269]<br />

(Frage II/12 28. September 2001, Frage II/15 9. November 2001, Frage 8 20. November 2001)<br />
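Dazu die grobe Abschätzung als kurze Rechen-Skizze in Python (Annahme: ein Linienpaar benötigt mindestens zwei Druckpunkte):

```python
dpi = 600
punkte_pro_mm = dpi / 25.4               # 1 Zoll = 25.4 mm  ->  ca. 23.6
linienpaare_pro_mm = punkte_pro_mm / 2   # ein Linienpaar braucht 2 Punkte
print(round(punkte_pro_mm, 1), round(linienpaare_pro_mm, 1))   # 23.6  11.8
```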

• Gegeben seien ein Punkt p_O = (3, −2, −1)^T in Objektkoordinaten sowie die Matrizen
M = ⎛ 4  0  0  −3 ⎞          P = ⎛ 1  0  0  0 ⎞
    ⎜ 0  2  0   4 ⎟              ⎜ 0  1  0  0 ⎟
    ⎜ 0  0  3   6 ⎟              ⎜ 0  0  0  1 ⎟
    ⎝ 0  0  0   1 ⎠              ⎝ 0  0  1  0 ⎠
wobei M die Modellierungs- und P die Projektionsmatrix beschreiben. Berechnen Sie
1. den Punkt p_W = M · p_O in Weltkoordinaten,
2. den Punkt p_S = P · p_W in Bildschirmkoordinaten,
3. die Matrix M′ = P · M!



Figure B.36: Objekt und Kamera im Weltkoordinatensystem (xz-Ebene)
Figure B.37: Graukeil (Grauwerte 0 bis 255, Ausschnitt vergrößert)

Hinweis zu 3: die Multiplikation mit P entspricht hier lediglich einer Zeilenvertauschung. [#0272]

(Frage II/12 19. Oktober 2001)<br />

• In Abbildung B.44 sind vier Punkte A, B, C und D eingezeichnet. Transformieren Sie diese Punkte nach der Vorschrift
x′ = 2x − 3y + xy + 4
y′ = 4x + y − 2xy + 2
und zeichnen Sie Ihr Ergebnis (A′, B′, C′ und D′) direkt in Abbildung B.44 rechts ein! Um welche Art von Transformation handelt es sich hier? [#0277]

(Frage II/11 19. Oktober 2001)<br />

• Abbildung B.45 zeigt ein digitales Rasterbild, das als Textur verwendet wird. Durch die große<br />

Entfernung von der virtuellen Kamera erscheint die Fläche im Verhältnis 1:3 verkleinert,<br />

wobei <strong>aus</strong> Gründen der Effizienz der einfache Sub-Sampling-Algorithmus für die Verkleinerung<br />

verwendet wird. Zeichnen Sie in Abbildung B.45 rechts das Bild ein, wie es am Ausgabegerät



(a)<br />

(b)<br />

Figure B.38: Anwendung der normalisierten Kreuzkorrelation<br />

Figure B.39: Anwendung der medial axis Trans<strong>for</strong>mation<br />

erscheint, und markieren Sie links die verwendeten Pixel. Welchen Effekt können Sie hier<br />

beobachten, und warum tritt er auf? [#0284]<br />

(Frage II/13 9. November 2001)<br />

• Abbildung B.46 zeigt drei digitale Grauwertbilder und deren Histogramme. Geben Sie für<br />

jedes der Bilder B.46(a), B.46(c) und B.46(e) an, welches das dazugehörige Histogramm ist<br />

(B.46(b), B.46(d) oder B.46(f)), und begründen Sie Ihre jeweilige Antwort! [#0285]<br />

(Frage II/14 9. November 2001)<br />

• Zeichnen Sie in Abbildung B.47 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der Veranschaulichung des Halbtonverfahrens übliche Konvention, dass „on“-Pixel durch einen dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0289]

(Frage II/11 1. Februar 2002)<br />

• Zeichnen Sie in Abbildung B.48 jene Pixel ein, die benötigt werden, um im Halbtonverfahren die angegebenen Grauwerte 0 bis 9 darzustellen! Verwenden Sie dazu die bei der Veranschaulichung des Halbtonverfahrens übliche Konvention, dass „on“-Pixel durch einen dunklen Kreis markiert werden. Achten Sie auf die Reihenfolge der Werte 0 bis 9! [#0294]

(Frage II/13 1. Februar 2002)<br />

• Wenden Sie auf das Binärbild in Abbildung B.49 links die morphologische Operation „Schließen“ mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf?



Figure B.40: Graukeil (Grauwerte 0 bis 255, Ausschnitt vergrößert)
Figure B.41: Anwendung des Hit-or-Miss-Operators auf ein Binärbild (Strukturelement X)

Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.49 eintragen. [#0297]

(Frage II/12 1. Februar 2002)<br />

• Wenden Sie den Hit-or-Miss-Operator auf das Binärbild in Abbildung B.50 links an. Verwenden Sie das angegebene Strukturelement X (Zentrumspixel ist markiert) und definieren Sie ein geeignetes Fenster W! Sie können das Ergebnis direkt in Abbildung B.50 rechts eintragen. [#0301]

(Frage II/12 1. Februar 2002)<br />

• Wenden Sie auf das Binärbild in Abbildung B.51 links die morphologische Operation „Schließen“ mit dem angegebenen Formelement an! Welcher für das morphologische Schließen typische Effekt tritt auch in diesem Beispiel auf?
Weiße Pixel gelten als logisch „0“, graue Pixel als logisch „1“. Sie können das Ergebnis rechts in Abbildung B.51 eintragen. [#0303]

(Frage II/14 1. Februar 2002)<br />

• Geben Sie einen „Binary Space Partitioning Tree“ (BSP-Tree) mit möglichst wenig Knoten



Figure B.42: Erstellen dicker Linien ((a) „dünne“ Linie, (b) „dicke“ Linie, Variante 1, (c) „dicke“ Linie, Variante 2)

Parameter Formelzeichen Wert<br />

Kugelmittelpunkt M S (−2, 1, −4) T<br />

Kugelradius r 6<br />

Punkt auf Kugeloberfläche p S (−4, 5, −8) T<br />

Dreieckseckpunkt p 1 (2, 1, 3) T<br />

Dreieckseckpunkt p 2 (3, 5, 3) T<br />

Dreieckseckpunkt p 3 (5, 2, 3) T<br />

Table B.4: Geometrie der Objekte<br />

für das Polygon aus Abbildung B.52 an und zeichnen Sie die von Ihnen verwendeten Trennebenen ein! [#0305]

(Frage II/15 1. Februar 2002)<br />

• Gegeben seien das Farbbildnegativ in Abbildung B.53 sowie die durch die Kreise markierten Farbwerte A, B und C laut folgender Tabelle:
Farbe | Farbwert (RGB)
A | (0.6, 0.4, 0.3)^T
B | (0.3, 0.2, 0.1)^T
C | (0.5, 0.3, 0.1)^T
Berechnen Sie die Farbwerte A′, B′ und C′, die das entsprechende Positivbild an den gleichen markierten Stellen wie in Abbildung B.53 aufweist! [#0312]

(Frage II/12 15. März 2002)<br />
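
A quick sketch of the presumably intended computation, inverting each normalized channel (film-base density is ignored in this sketch).

    # With normalised RGB values an ideal negative is inverted channel-wise,
    # C_pos = 1 - C_neg.
    negatives = {"A": (0.6, 0.4, 0.3),
                 "B": (0.3, 0.2, 0.1),
                 "C": (0.5, 0.3, 0.1)}

    for name, rgb in negatives.items():
        positive = tuple(round(1.0 - c, 2) for c in rgb)
        print(name, "->", positive)
    # A -> (0.4, 0.6, 0.7), B -> (0.7, 0.8, 0.9), C -> (0.5, 0.7, 0.9)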

• In Figure B.54 a supervised classification is to be carried out on the basis of given training data and then applied to new data that are also given. The feature space is one-dimensional, i.e. only a single scalar feature has to be considered. The values of this feature are entered in Figure B.54(a) for a 3 x 3 pixel digital gray-value image; Figure B.54(b) shows the corresponding assignments to the classes A and B. The classification is to be performed under the assumption of normally distributed data (Gaussian probability density). Determine the class assignment of the pixels in Figure B.54(c) (enter your result in Figure B.54(d)) and also state your intermediate results.
Hint: the standard deviation σ is the same for both classes and does not have to be computed. [#0315]
(Question II/13, 15 March 2002)

Figure B.43: Definition of a two-dimensional object by the chain-code sequence "221000110077666434544345"
Figure B.44: Transformation of four points (two 10 x 10 coordinate grids; the left grid marks the points A, B, C, and D, the right grid is left empty for the result)
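
A hedged sketch of the 1-D Gaussian classification with equal standard deviations, where the maximum-likelihood decision reduces to "nearest class mean"; the training values below are made up, not taken from Figure B.54.

    import numpy as np

    train = {"A": [2, 3, 1, 4],          # hypothetical class-A samples
             "B": [9, 11, 10, 12]}       # hypothetical class-B samples
    means = {c: np.mean(v) for c, v in train.items()}

    def classify(x: float) -> str:
        # equal sigma for both classes -> pick the class with the nearer mean
        return min(means, key=lambda c: abs(x - means[c]))

    for x in [3, 6, 10]:
        print(x, "->", classify(x))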


Figure B.45: Sub-sampling
Figure B.46: Three digital gray-value images and their histograms ((a) Vancouver, (b) histogram 1, (c) Kluane, (d) histogram 2, (e) Steiermark, (f) histogram 3)


Figure B.47: Halftoning (gray values labelled 0, 1, 2, ..., 9 in ascending order)
Figure B.48: Halftoning (gray values labelled 9, 8, ..., 1, 0 in descending order)
Figure B.49: Morphological closing (binary image and structuring element)
Figure B.50: Application of the hit-or-miss operator to a binary image (structuring element X)
Figure B.51: Morphological closing (binary image and structuring element)
Figure B.52: Polygon for the BSP representation (edges labelled 1 to 4)
Figure B.53: Color-image negative (marked positions A, B, and C)
Figure B.54: Supervised classification ((a) training data, (b) classification, (c) new data, (d) result)


Figure B.55: Rectangle with disturbing objects
Figure B.56: Pixel arrangement

B.3 Group 3

• Figure B.55 shows a rectangular object together with several smaller disturbing objects. Explain a morphological filtering procedure that eliminates the disturbing objects. Use formula notation and illustrate the course of the procedure with graphical sketches. Also show the resulting image. [#0008]
(Question III/20, 14 April 2000)

• Consider the pixel arrangement shown in Figure B.56. Describe, graphically, by a formula, or in words, an algorithm for determining the centroid of this pixel arrangement. [#0010]
(Question III/18, 14 April 2000)

• Consider Figure B.57 with the indicated line-shaped white disturbances. Which correction method do you propose to remove these disturbances? Present the method and justify why it will remove the disturbances. [#0018]
(Question III/19, 14 April 2000)

Figure B.57: Image with disturbances
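
A minimal sketch of one possible centroid computation for a binary pixel arrangement (the arrangement itself is made up).

    import numpy as np

    # The centre of gravity of a binary pixel arrangement is the mean of
    # the foreground pixel coordinates.
    mask = np.array([[0, 1, 1, 0],
                     [0, 1, 1, 1],
                     [0, 0, 1, 0]], dtype=bool)

    rows, cols = np.nonzero(mask)
    centroid = (rows.mean(), cols.mean())
    print("centroid (row, col):", centroid)    # -> (0.833..., 1.833...)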

• Consider the raster representation of an object in Figure B.58, where the object is given only by its three corner points A, B, and C. The brightness of the corner points is I_A = 100, I_B = 50, and I_C = 0. Compute the illumination values according to the Gouraud method in at least five of the pixels that lie entirely within the triangle. [#0035]
(Question III/21, 14 April 2000)

• Consider the gray-value image in Figure B.59. Determine the histogram of this image. Using the histogram, a threshold is to be found that is suitable for segmenting the image into background (small value, dark) and foreground (large value, bright). State the threshold as well as the result of the segmentation in the form of a binary image (with 0 for the background and 1 for the foreground). [#0064]
(Question III/16, 26 May 2000; Question 9, 20 November 2001)

• The transformation matrix M from Figure B.60 is composed of a translation T and a scaling S, i.e. M = T · S (a point p is transformed into the point q according to q = M · p). Determine T, S, and M^-1 (the inverse of M). [#0065]
(Question III/17, 26 May 2000; Question III/16, 10 November 2000; Question III/19, 28 September 2001)

• The lecture discussed depth cues that allow the human visual system to reconstruct the third dimension of a viewed scene, which is lost in the projection onto the retina. In digital image processing this task is solved by various "shape from X" methods. Which depth cues are directly related to a corresponding "shape from X" method, and for which methods of natural or artificial depth estimation can no such relation be established? [#0071]
(Question III/18, 26 May 2000)


Figure B.58: Raster representation of an object (corner points A, B, and C)

Figure B.59: Gray-value image

    1 3 6 2
    5 6 6 1
    6 7 5 2
    6 4 1 0

• Figure B.61 shows a digital raster image that, because of a superimposed disturbance, is brighter in the middle than at the edges. Describe a procedure that removes this disturbance. [#0076]
(Question III/19, 26 May 2000)

• Figure B.62 shows a scanned color-film negative. Which steps are necessary to obtain a correct positive image from it by means of digital image processing? Take into account that the optical density of the film is greater than zero even in unexposed areas. Give the mathematical relationship between the pixel values of the negative and of the positive image. [#0077]
(Question III/20, 26 May 2000)

• At the currently running Styrian provincial exhibition "comm.gr2000az" in Schloss Eggenberg in Graz, a robot is installed that is supposed to catch a ball thrown to it by visitors. In order to close the robot's gripper at the right time and at the right place, the position of the ball must be determined as accurately as possible during its flight. For this purpose two cameras are installed that observe the playing field; a simplified sketch of the set-up is shown in Figure B.63.
Determine the accuracy in the x-, y-, and z-direction with which the ball position marked in Figure B.63 can be determined in space. For simplicity, assume the following camera parameters:
  – focal length: 10 millimeters
  – geometric resolution of the sensor chip: 100 pixels/millimeter
You may dispense with methods for determining the ball position with sub-pixel accuracy. For the uncertainty in the x- and y-direction you may neglect one of the two cameras; for the z-direction you may use the considerations on the uncertainty of binocular depth perception. [#0078]
(Question III/17, 30 June 2000)

Figure B.60: Transformation matrix

        ( 1    0    0    4 )
    M = ( 0   2.5   0   -3 )
        ( 0    0    2    0 )
        ( 0    0    0    1 )

Figure B.61: Digital raster image whose intensity falls off toward the edges
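
A hedged sketch of the accuracy estimate: focal length and sensor resolution come from the question, while the object distance Z and the stereo baseline B are assumptions standing in for values read off Figure B.63.

    # One pixel on the sensor corresponds to roughly Z * pixel / f laterally,
    # and the binocular depth uncertainty is roughly Z^2 * pixel / (f * B).
    f = 0.010                 # focal length: 10 mm
    pixel = 1.0 / 100_000     # 100 pixels/mm  ->  pixel size 10 micrometres
    Z = 4.0                   # assumed camera-to-ball distance in metres
    B = 2.0                   # assumed stereo baseline in metres

    delta_xy = Z * pixel / f
    delta_z = Z ** 2 * pixel / (f * B)
    print(f"lateral uncertainty ~ {delta_xy * 1000:.1f} mm per pixel")
    print(f"depth   uncertainty ~ {delta_z * 1000:.1f} mm per pixel")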

• A coordinate system K_1 is transformed into another coordinate system K_2 by a rotation, so that a point with coordinates p in K_1 is transformed into the point q = Mp in K_2. Table B.5 gives four corresponding points between the two coordinate systems. Determine the 3 x 3 matrix M. (Footnote 2: Homogeneous coordinates offer no advantage here, since there is no translation.)
Hint: note that (since this is a rotation) ||a|| = ||b|| = ||c|| = 1 and furthermore a · b = a · c = b · c = 0, where "·" denotes the dot product. [#0081]

    Point in K_1              Point in K_2
    (0, 0, 0)^T               (0, 0, 0)^T
    a = (a_1, a_2, a_3)^T     (1, 0, 0)^T
    b = (b_1, b_2, b_3)^T     (0, 1, 0)^T
    c = (c_1, c_2, c_3)^T     (0, 0, 1)^T

  Table B.5: Corresponding points between the two coordinate systems K_1 and K_2
(Question III/16, 30 June 2000)

• For homogeneous coordinates, give a 3 x 3 matrix M with as many degrees of freedom as possible that is suitable for transforming the points p of a rigid body (e.g. a block of wood) according to q = Mp (a so-called "rigid body transformation").
Hint: simple geometric relationships are contained in the question in "encoded" form. Had they been stated explicitly, the answer would really be Group I material. [#0090]
(Question III/18, 30 June 2000)

Figure B.62: Color-film negative
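
A small numeric check of the construction suggested by the hint: with orthonormal a, b, c, the matrix that maps a, b, c to the unit vectors simply has a, b, c as its rows (and M^-1 = M^T); the concrete vectors are an arbitrary orthonormal example.

    import numpy as np

    a = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
    b = np.array([-1.0, 1.0, 0.0]) / np.sqrt(2)
    c = np.array([0.0, 0.0, 1.0])

    M = np.vstack([a, b, c])              # rows are a, b, c
    print(np.allclose(M @ a, [1, 0, 0]),
          np.allclose(M @ b, [0, 1, 0]),
          np.allclose(M @ c, [0, 0, 1]))  # -> True True True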

• A regular disturbance (coherent noise) is superimposed on the digital raster image in Figure B.64. Describe a procedure that removes this disturbance. [#0091]
(Question III/19, 30 June 2000)

• The morphological operation "erosion" is to be applied to the binary image shown at the top left of Figure B.65. Show how the duality between erosion and dilation can be used to reduce an erosion to a dilation. (In other words: instead of the erosion, other morphological operations are to be used which, executed one after another in a suitable order, deliver the same result as an erosion.) Enter your result (and your intermediate results) in Figure B.65 and name the operations marked with the numbers 1, 2, and 3. The structuring element to be used is also shown in Figure B.65.
Hint: note that the binary image shown is only a small section of the domain Z². [#0096]
(Question III/20, 30 June 2000; Question III/17, 10 November 2000; Question III/17, 14 December 2001)

• The duality of erosion and dilation with respect to complementation and reflection can be formulated by the equation
    (A ⊖ B)^c = A^c ⊕ B̂
Why is the reflection (B̂) important in this equation? [#0107]
(Question III/18, 13 October 2000)

• The photo shown in Figure B.66 has little contrast and therefore looks somewhat "flat".
1. Give a procedure that improves the contrast of the image.
2. Which other possibilities are there to improve the quality of the image as perceived by a human?
Is the information content of the image also increased by these methods? Justify your answer. [#0108]
(Question III/20, 13 October 2000; Question III/19, 10 November 2000)
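
A sketch of the duality these two questions are about, with explicit Minkowski definitions of erosion and dilation; the test image and the deliberately asymmetric structuring element are made up.

    import numpy as np

    # (A erosion B)^c = A^c dilation B_hat: an erosion can be replaced by
    # complement -> dilation with the REFLECTED structuring element ->
    # complement.
    def erode(X, offsets):            # {p : p + s in X for all s in offsets}
        P = np.pad(X, 2)
        return np.array([[all(P[i + 2 + di, j + 2 + dj] for di, dj in offsets)
                          for j in range(X.shape[1])] for i in range(X.shape[0])])

    def dilate(X, offsets):           # {p : p - s in X for some s in offsets}
        P = np.pad(X, 2)
        return np.array([[any(P[i + 2 - di, j + 2 - dj] for di, dj in offsets)
                          for j in range(X.shape[1])] for i in range(X.shape[0])])

    A = np.zeros((7, 7), dtype=bool)
    A[2:6, 1:6] = True
    B     = [(0, 0), (-1, 0), (0, -1), (-1, -1)]      # structuring element
    B_hat = [(-di, -dj) for di, dj in B]              # its reflection

    print(np.array_equal(erode(A, B), ~dilate(~A, B_hat)))   # -> True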


Figure B.63: Simplified set-up of the ball-catching robot at the provincial exhibition comm.gr2000az (the sketch labels the robot, cameras 1 and 2, the ball's trajectory and current position, the coordinate axes x, y, z, and distances of 4 m and 2 m)

• How do the following manifest themselves to the human eye:
1. too low a geometric resolution,
2. too low a gray-value resolution
of a digital raster image? [#0113]
(Question III/19, 13 October 2000; Question III/20, 10 November 2000; Question III/20, 28 September 2001)

• What statements can be made about the sums of the mask components of a ("reasonable") low-pass or high-pass filter? Justify your answer. [#0114]
(Question III/17, 13 October 2000; Question III/18, 10 November 2000)

• A color value C_RGB = (R, G, B)^T in the RGB color model is converted into the corresponding value C_YIQ = (Y, I, Q)^T in the YIQ color model according to the following rule:

              ( 0.299   0.587   0.114 )
    C_YIQ  =  ( 0.596  -0.275  -0.321 ) · C_RGB
              ( 0.212  -0.528   0.311 )

Which biological fact is expressed by the first row of this matrix? (Hint: consider where the YIQ color model is used and what meaning the Y component has in that context.) [#0120]
(Question III/19, 14 December 2001)

Figure B.64: Image with superimposed coherent noise
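
A quick numeric illustration of the first matrix row: Y is a weighted sum of R, G, B whose weights mirror the eye's different sensitivity to the three primaries (largest for green).

    import numpy as np

    RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                           [0.596, -0.275, -0.321],
                           [0.212, -0.528,  0.311]])

    for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)),
                      ("blue", (0, 0, 1)), ("white", (1, 1, 1))]:
        y, i, q = RGB_TO_YIQ @ np.array(rgb, dtype=float)
        print(f"{name:5s}  Y = {y:.3f}")
    # green appears brightest (Y = 0.587), blue darkest (Y = 0.114),
    # and white sums to Y = 1.000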

• To strengthen the effect of the morphological opening (A ◦ B), the underlying operations (erosion and dilation) can be executed repeatedly (footnote 3: apart from enlarging the mask element B, that is). Which of the following two procedures leads to the desired result?
1. First the erosion is executed n times and then the dilation n times, i.e.
       ((A ⊖ B) ⊖ ... ⊖ B) ⊕ B ⊕ ... ⊕ B      (n times ⊖, then n times ⊕)
2. The erosion is executed and then the dilation, and this process is repeated n times, i.e.
       (((A ⊖ B) ⊕ B) ... ⊖ B) ⊕ B            (⊖ and ⊕ alternating, n times each)
Justify your answer and explain why the other procedure fails. [#0126]
(Question III/16, 15 December 2000; Question III/20, 11 May 2001; Question III/16, 14 December 2001; Question III/17, 15 March 2002)

• Exercise B.1 asked for geometric surface properties that are not suitable for visualization by means of a texture. Assume a texture were used improperly to represent such properties. Which artifacts are typical for such cases? [#0129]
(Question III/17, 15 December 2000)

• Figure B.67 shows the sketch, familiar from the lecture, of the effect of morphological opening on an object (Figure B.67(a) is transformed into Figure B.67(b) by opening with the structuring element shown). How do the rounded corners in Figure B.67(b) come about, and how could their occurrence be prevented? [#0149]
(Question III/18, 15 December 2000; Question III/23, 30 March 2001)

Figure B.65: Alternative computation of the morphological erosion (the intermediate steps are labelled 1, 2, and 3; the structuring element is also shown)

• Figure B.68 shows a line segment g between the points A and B as well as two further points C and D. Compute the distance (shortest Euclidean distance) between g and the points C and D, respectively. [#0150]
(Question III/19, 15 December 2000)

• Figure B.69 shows a digital raster image together with the results of applying three different filter operations. Find the operations that, applied to Figure B.69(a), led to Figures B.69(b), B.69(c), and B.69(d), and describe the properties of the result images by which you recognized the filters. [#0151]
(Question III/20, 15 December 2000; Question III/19, 19 October 2001)

• There is an analogy between the application of a filter and the reconstruction of a discretized image function. Explain this claim. [#0158]
(Question 4, 16 January 2001; Question III/18, 14 December 2001)

• The lecture explained two methods for determining the eight parameters of a bilinear transformation in two dimensions:
1. exact determination of the parameter vector u when exactly four input/output point pairs are given,
2. approximate determination of the parameter vector u when more than four input/output point pairs are given ("least squares method").
However, the least squares method can also be applied when exactly four input/output point pairs are given. Show that in this case one obtains the same result as with the first method. What is the geometric meaning of this observation?
Hint: consider why the least squares method bears that name. [#0163]
(Question III/23, 2 February 2001)

Figure B.66: Photo with low contrast
Figure B.67: Morphological opening ((a) original object, (b) result; the structuring element is also shown)
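
A small sketch of the point-to-segment distance computation (clamping the projection parameter to [0, 1]); the coordinates are made up rather than read from Figure B.68.

    import numpy as np

    def point_segment_distance(P, A, B):
        P, A, B = (np.asarray(v, dtype=float) for v in (P, A, B))
        AB = B - A
        t = np.clip(np.dot(P - A, AB) / np.dot(AB, AB), 0.0, 1.0)
        return np.linalg.norm(P - (A + t * AB))

    A, B = (4, 2), (8, 8)                 # hypothetical endpoints of g
    for name, P in [("C", (9, 3)), ("D", (6, 2))]:
        print(name, "->", round(point_segment_distance(P, A, B), 3))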

• Figure B.70 shows a cylinder with a coaxial bore. Give two different ways of describing this object with the help of a sweep representation. [#0165]
(Question III/19, 2 February 2001; Question III/18, 19 October 2001)

• Exercise B.1 asked about an image representation in which an image is stored repeatedly, with the side length of each image being exactly half the side length of the preceding image. Derive an upper bound, as tight as possible, for the total memory requirement of such a representation, where
  – the first (largest) image consists of N x N pixels,
  – all images are considered gray-value images with 8 bits per pixel,
  – possible compression is not to be taken into account.
Hint: use the equation ∑_{i=0}^{∞} q^i = 1/(1 - q) for q ∈ R, 0 < q < 1. [#0171]
(Question III/18, 2 February 2001; Question III/20, 1 February 2002)
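
A quick numeric check of the bound suggested by the hint: with one byte per pixel the pyramid needs N² · (1 + 1/4 + 1/16 + ...) bytes, which the geometric series bounds by (4/3) · N².

    N = 1024
    total = sum((N // 2 ** i) ** 2 for i in range(11))   # levels down to 1x1
    bound = N ** 2 * 4 / 3
    print(total, "<=", bound, "->", total <= bound)      # True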


Figure B.68: Distance computation (coordinate grid showing the segment g between the points A and B and the points C and D)

• Given a plane ε and an arbitrary (connected) polyhedron P in three-dimensional space, how can one easily determine whether the plane intersects the polyhedron (i.e. P ∩ ε ≠ {})? [#0172]
(Question III/21, 2 February 2001)

• Let p(x), x ∈ R², be the probability density function of a Gaussian normal distribution whose parameters were estimated from the three feature vectors p_1, p_2, and p_3 of Exercise B.2. Furthermore, let two points x_1 = (0, 3)^T and x_2 = (3, 6)^T in the feature space be given. Which of the following two statements is correct (justify your answer):
1. p(x_1) < p(x_2)
2. p(x_1) > p(x_2)
Hint: draw the two points x_1 and x_2 in Figure B.28 and consider in which direction the eigenvectors of the covariance matrix C from Exercise B.2 point. [#0174]
(Question III/22, 2 February 2001)

• The digital raster image in Figure B.71 is to be segmented, with the two buildings forming the foreground and the sky the background. Since the histograms of foreground and background overlap strongly, a simple gray-value segmentation cannot succeed here. Which other image properties can be used to distinguish foreground and background in Figure B.71 nevertheless? [#0181]
(Question III/20, 2 February 2001; Question III/20, 14 December 2001)

• The lecture pointed out that matrix multiplication is in general not commutative, i.e. for two transformation matrices M_1 and M_2 we have M_1 · M_2 ≠ M_2 · M_1. If, however, one considers two 2 x 2 rotation matrices R_1 and R_2 in the two-dimensional case, then R_1 · R_2 = R_2 · R_1 does hold. Give a geometric or mathematical justification for this fact.
Hint: note that the center of rotation lies at the origin of the coordinate system. [#0192]
(Question III/18, 30 March 2001; Question III/16, 9 November 2001)

• Suppose you had to apply the morphological operations "erosion" and "dilation" to a binary image but only have an ordinary image-processing package at your disposal that does not support these operations directly. Show how the erosion and the dilation can be expressed by a convolution followed by thresholding.
Hint: the convolution operation sought is most comparable to a low-pass filter. [#0197]
(Question III/19, 30 March 2001)

Figure B.69: Different filter operations ((a) original image, (b) filter 1, (c) filter 2, (d) filter 3)
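
A sketch of the convolution-plus-threshold construction for a symmetric 3 x 3 structuring element: convolving the 0/1 image with the 0/1 mask counts the foreground pixels under the mask, so erosion and dilation fall out of two different thresholds (the test image is made up).

    import numpy as np
    from scipy.ndimage import convolve, binary_erosion, binary_dilation

    img = np.zeros((7, 7), dtype=np.uint8)
    img[2:6, 1:6] = 1
    selem = np.ones((3, 3), dtype=np.uint8)

    counts = convolve(img, selem, mode="constant", cval=0)
    erosion = counts == selem.sum()      # all mask pixels are foreground
    dilation = counts >= 1               # at least one mask pixel is foreground

    # sanity check against the built-in morphological operators
    print(np.array_equal(erosion, binary_erosion(img, selem)),
          np.array_equal(dilation, binary_dilation(img, selem)))   # True True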

• Given a two-dimensional object whose centroid lies at the origin of the coordinate system. A translation T and a scaling S are now to be applied "simultaneously", where

        ( 1  0  t_x )          ( s  0  0 )
    T = ( 0  1  t_y ) ,    S = ( 0  s  0 ) .
        ( 0  0   1  )          ( 0  0  1 )

After the transformation the object should appear enlarged according to S, and the centroid should have been shifted according to T. We seek a matrix M that transforms a point p of the object into a point p' = M · p of the transformed object according to the rule above. Which is the correct solution:
1. M = T · S
2. M = S · T
Justify your answer and state M. [#0198]
(Question III/22, 30 March 2001)

Figure B.70: Cylinder with a coaxial bore

• Figure B.72 shows a perspectively distorted checkerboard-like pattern. Explain how the artifacts at the upper edge of the image come about, and describe one way of preventing their occurrence. [#0205]
(Question III/21, 30 March 2001; Question III/20, 19 October 2001)

• Why is the sum of the mask elements of a pure high-pass filter always equal to zero and that of a pure low-pass filter always equal to one? [#0212]
(Question III/20, 30 March 2001)

• Exercise B.1 asked about the terms "Phong shading" and "Phong illumination". Describe a situation in which both concepts are usefully employed together. [#0218]
(Question III/19, 11 May 2001)

• If a median filter is applied to each of the three color channels of a digital (RGB) color image, the result is visually similar to a median-filtered gray-value image. Which property of the median filter is lost, however, in such an application to color images? Justify your answer. [#0221]
(Question III/18, 26 June 2001)

• As in Question B.2, apply a 3 x 3 median filter F_3 to the gray wedge in Figure B.37 and justify your answer. [#0222]
(Question III/18, 11 May 2001)

• 1. Comment on the effect of the high noise level of Figure B.38(a) (from Exercise B.2) on the normalized cross-correlation.
2. What result would one obtain when applying the normalized cross-correlation with the same structuring element (Figure B.38(b)) to the rotated image in Figure B.73? Justify your answer. [#0224]
(Question III/21, 11 May 2001)

Figure B.71: Segmentation of a gray-value image
Figure B.72: Artifacts in a checkerboard-like pattern
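
A hedged sketch of what gets lost in the channel-wise median: it can create a color that occurs nowhere in the input, because the three channel medians may stem from different pixels; the tiny image below is constructed to force exactly that.

    import numpy as np
    from scipy.ndimage import median_filter

    R = np.array([[  0,   0,   0],
                  [  0, 100, 200],
                  [200, 200, 200]], dtype=np.uint8)
    G = np.array([[ 50,  50,  50],
                  [ 50,   0,  50],
                  [ 50,  50,  50]], dtype=np.uint8)
    B = np.zeros((3, 3), dtype=np.uint8)
    img = np.dstack([R, G, B])

    filtered = np.dstack([median_filter(ch, size=3) for ch in (R, G, B)])
    center = tuple(int(v) for v in filtered[1, 1])
    print("filtered center pixel:", center)           # (100, 50, 0)
    print("present in the input image:",
          any((img[i, j] == center).all() for i in range(3) for j in range(3)))
    # -> False: per pixel, the filtered color no longer has to be one of the
    #    colors that actually occur in the image, unlike in the gray case.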

• Which color lies "in the middle" if one interpolates linearly between the colors yellow and blue in the RGB color space? Which color space would be better suited for such an interpolation, and which color would lie between yellow and blue in that color space? [#0227]
(Question III/23, 11 May 2001)

• Figure B.74(a) shows the castle in Budmerice (Slovakia), in which a student seminar (footnote 4: Interested students from the computer graphics specialization have the opportunity to take part in this seminar free of charge and to present their seminar/project or diploma thesis there.) and the Spring Conference on Computer Graphics take place every year. Figure B.74(b) was generated from it by an automatic process, whereby some details (e.g. the clouds in the sky) were clearly enhanced. Name an operation that could have been applied here and comment on how it works. [#0228]
(Question III/22, 11 May 2001)

Figure B.73: Application of the normalized cross-correlation to a rotated image
Figure B.74: Automatic contrast enhancement ((a) original image, (b) improved version)

• Question B.1 established that the mapping of a three-dimensional object onto the two-dimensional image plane can be described by a chain of transformations. Explain mathematically how this process can be optimized by using the associative law of matrix multiplication. [#0237]
(Question III/19, 26 June 2001)

• The lecture discussed the operations "threshold" and "median", to be applied to digital raster images. What is the relationship between these two operations in the context of filtering? [#0244]
(Question III/20, 26 June 2001)

Figure B.75: Blurred edge in a digital gray-value image

    1 1 1 1 3 7 7 7 7
    1 1 1 1 3 7 7 7 7
    1 1 1 1 3 7 7 7 7
    1 1 1 1 3 7 7 7 7
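
A small numeric illustration of the associativity argument: collapsing the chain into one matrix once and applying it to all points gives the same result as applying the matrices one after another (the matrices and points are random examples).

    import numpy as np

    rng = np.random.default_rng(0)
    M1, M2, M3 = (rng.standard_normal((4, 4)) for _ in range(3))
    points = rng.standard_normal((4, 1000))        # homogeneous 3D points

    step_by_step = M3 @ (M2 @ (M1 @ points))       # three passes over the data
    combined = (M3 @ M2 @ M1) @ points             # precompose once, apply once
    print(np.allclose(step_by_step, combined))     # True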

• In order to assign the correct brightness to a point p on the surface of a three-dimensional object, all realistic illumination models need the surface normal vector n at this point p. If the object is now subjected to a geometric transformation so that the point p is transformed into the point p' = Mp (footnote 5: This particular example is easier to solve in Cartesian coordinates than in homogeneous coordinates; we therefore consider only 3 x 3 matrices, without a translation component.), the normal vector also changes, namely according to n' = (M^-1)^T n. Give a mathematical justification for this claim.
Hint: the tangent planes defined by p and n before and after the transformation are given in matrix notation by the equations n^T x = n^T p and n'^T x' = n'^T p', respectively. [#0250]
(Question III/21, 26 June 2001)

• Figure B.75 shows an enlarged section of a digital gray-value image representing a blurred edge. Describe what this edge looks like if
1. a linear low-pass filter,
2. a median filter
with mask size 3 x 3 is applied to the image several times in succession. Justify your answer. [#0251]
(Question III/22, 26 June 2001; Question III/19, 9 November 2001; Question III/16, 1 February 2002)

• In four-color printing, let a color value be given by 70% cyan, 20% magenta, 50% yellow, and 30% black. Convert the color value to the RGB color model and describe the hue in words. [#0252]
(Question III/23, 26 June 2001)

• In four-color printing, let a color value be given by 70% cyan, 0% magenta, 50% yellow, and 30% black. Convert the color value to the RGB color model and describe the hue in words. [#0261]
(Question III/20, 9 November 2001)

• Sketch the histogram of a
1. dark,
2. bright,
3. low-contrast,
4. high-contrast
monochrome digital raster image. [#0263]
(Question III/16, 28 September 2001)
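
A hedged sketch for the two four-color-print questions: adding the black component K back onto C, M, Y (clamped at 1) and complementing is one common convention, which may or may not be exactly the conversion used in the lecture script.

    def cmyk_to_rgb(c, m, y, k):
        # undo black generation by adding K back, then complement to RGB
        return tuple(round(1.0 - min(1.0, v + k), 2) for v in (c, m, y))

    print(cmyk_to_rgb(0.7, 0.2, 0.5, 0.3))   # -> (0.0, 0.5, 0.2): a dark green
    print(cmyk_to_rgb(0.7, 0.0, 0.5, 0.3))   # -> (0.0, 0.7, 0.2): a green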

• Many algorithms in computer graphics require a distinction between the "front side" and the "back side" of a triangle (e.g. BSP tree, back-face culling, etc.). How can the surface normal vector of a triangle be used to formulate this distinction mathematically (i.e. with which method can one determine, for a given point p, on which side of a likewise given triangle T it lies)? State, in addition, whether under this definition the vector n_T from Exercise 2 points into the half-space facing the front side or the back side of the triangle. Justify your answer. [#0264]
(Question III/17, 28 September 2001)

• Explain how a monochrome digital raster image representing a black-and-white film negative can be converted into the corresponding positive image by manipulating its histogram. [#0267]
(Question III/18, 28 September 2001)
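
A minimal sketch of the usual side test: the sign of n · (p - v0), with n the triangle normal from the cross product of two edges, tells on which side of the triangle's plane a point lies (triangle and points are made up).

    import numpy as np

    v0, v1, v2 = (np.array([0., 0., 0.]), np.array([1., 0., 0.]),
                  np.array([0., 1., 0.]))
    n = np.cross(v1 - v0, v2 - v0)         # here (0, 0, 1)

    def side(p):
        d = np.dot(n, np.asarray(p, dtype=float) - v0)
        return "front" if d > 0 else "back" if d < 0 else "in the plane"

    for p in [(0.2, 0.2, 1.0), (0.2, 0.2, -1.0), (5.0, 5.0, 0.0)]:
        print(p, "->", side(p))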

• Figure B.76 shows the histograms of two different digital gray-value images A and B. Assume the operation "histogram equalization" were now applied to both images, producing the new images A' and B'.
1. Sketch the histograms of A' and B'.
2. Comment on the effect of the histogram equalization on images A and B with respect to brightness and contrast.
Justify your answers! [#0270]
(Question III/17, 19 October 2001)

Figure B.76: Histograms of two different images ((a) histogram of image A, (b) histogram of image B)

• In the perspective transformation, distant objects are imaged smaller, yet straight lines remain straight lines in the projection. Give a mathematical justification of this property using the projection matrix

        ( 1  0  0  0 )
    M = ( 0  1  0  0 ) ,
        ( 0  0  0  1 )
        ( 0  0  1  0 )

which transforms a point p into the point p' according to p' = Mp.
Hint: the x- and z-coordinates of a straight line are related by the equation x = kz + d (special cases may be neglected). Show that after the transformation x' = k'z' + d' holds, and proceed analogously for y. [#0271]
(Question III/16, 19 October 2001)
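
A numeric check of the property to be proven: points on a 3-D line x = kz + d, y = mz + e remain collinear after applying M and the homogeneous division (k, d, m, e are arbitrary).

    import numpy as np

    M = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    k, d, m, e = 2.0, 1.0, -0.5, 3.0

    zs = np.array([1.0, 2.0, 4.0, 8.0])
    pts = np.stack([k * zs + d, m * zs + e, zs, np.ones_like(zs)])  # homogeneous
    proj = M @ pts
    proj = proj[:3] / proj[3]          # x' = x/z, y' = y/z, z' = 1/z

    # x' is again an affine function of z' (and likewise y'):
    print(np.allclose(proj[0], d * proj[2] + k))   # True
    print(np.allclose(proj[1], e * proj[2] + m))   # True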

• Figure B.77 shows a torus with a structured surface, with the light source once to the left (Figure B.77(a)) and once to the right (Figure B.77(b)) of the object. For clarity, enlarged details are shown in Figures B.77(c) and B.77(d). Which technique was used to visualize the surface structure, and what are the typical properties by which the method can be recognized here? [#0282]
(Question III/18, 9 November 2001)

Figure B.77: Torus with surface structure ((a) illumination from the left, (b) illumination from the right, (c) detail of Figure B.77(a), (d) detail of Figure B.77(b))


• The morphological dilation A ⊕ B can be written as
    A ⊕ B = ⋃_{x ∈ A} B_x ,
i.e. as the set union of the mask element B shifted to every pixel x ∈ A. Using this definition, show the commutativity of the dilation, i.e. A ⊕ B = B ⊕ A.
Hint: write A ⊕ B = A ⊕ (B ⊕ E), where E is the 1 x 1 pixel "unit mask element" that leaves the object unchanged under dilation. [#0283]
(Question III/17, 9 November 2001)

• Which of the following transformations can be represented in homogeneous coordinates by a matrix multiplication (x' = M · x)? Justify your answer!
  – translation
  – perspective projection
  – rotation
  – bilinear transformation
  – shear
  – scaling
  – bicubic transformation
[#0290]
(Question III/18, 15 March 2002)
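
A tiny numeric check of the commutativity, using the set definition directly on two made-up point sets.

    # A dilation B = union over x in A of B shifted to x, i.e. all sums x + b.
    def dilate(A, B):
        return {(ax + bx, ay + by) for (ax, ay) in A for (bx, by) in B}

    A = {(0, 0), (1, 0), (1, 1)}
    B = {(0, 0), (0, 2), (5, 5)}
    print(dilate(A, B) == dilate(B, A))   # True: the Minkowski sum is symmetric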

• Given the matrix

        ( 2  -2   3 )
    M = ( 2   2  -4 ) ,
        ( 0   0   1 )

with whose help a point p in two-dimensional space, in homogeneous coordinates, is transformed into a point p̃' = M · p̃. In Cartesian coordinates this operation can alternatively be written as
    p' = s · R(ϕ) · p + t ,
where s is the scaling factor, R(ϕ) the rotation matrix (rotation angle ϕ), and t the translation vector. Determine s, ϕ, and t. [#0292]
(Question III/19, 1 February 2002)

• The eye of the Canadian bighorn sheep in Figure B.78(a) is shown enlarged in Figures B.78(b) to B.78(d) (footnote 6: The detail was contrast-stretched to make the results clearer.). Nearest-neighbor, bilinear, and bicubic interpolation were used for the enlargement. Assign these interpolation methods to the three images B.78(b) to B.78(d) and justify your answer. [#0293]
(Question III/16, 1 February 2002)
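
A worked check of the decomposition using the matrix values from the question: the upper-left 2 x 2 block equals s · R(ϕ), so s and ϕ follow from its first column and t is the last column.

    import math

    M = [[2, -2,  3],
         [2,  2, -4],
         [0,  0,  1]]

    s = math.hypot(M[0][0], M[1][0])                    # length of first column
    phi = math.degrees(math.atan2(M[1][0], M[0][0]))    # its direction
    t = (M[0][2], M[1][2])
    print(f"s = {s:.4f} (= 2*sqrt(2)), phi = {phi:.1f} degrees, t = {t}")
    # -> s = 2.8284, phi = 45.0 degrees, t = (3, -4)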


• Suppose you are the manager of the company Rasen&Mäher and, for an advertising campaign, have to obtain offers from print shops for a single-color green poster. Print shop 1 offers the poster in the color C^(1)_CMYK, print shop 2 makes an offer for a poster of the color C^(2)_CMYK, where
    C^(1)_CMYK = (0.6, 0.1, 0.7, 0.0)^T ,
    C^(2)_CMYK = (0.2, 0.0, 0.3, 0.3)^T .
Which print shop would you award the contract to if
1. the lowest possible production costs,
2. the most intense hue possible
is the selection criterion? Justify your answer! [#0295]
(Question III/19, 1 February 2002)

• Suppose you are the manager of the company Rasen&Mäher and, for an advertising campaign, have to obtain offers from print shops for a single-color green poster. Print shop 1 offers the poster in the color C^(1)_CMYK, print shop 2 makes an offer for a poster of the color C^(2)_CMYK, where
    C^(1)_CMYK = (0.5, 0.0, 0.6, 0.1)^T ,
    C^(2)_CMYK = (0.5, 0.3, 0.6, 0.0)^T .
Which print shop would you award the contract to if
1. the lowest possible production costs,
2. the most intense hue possible
is the selection criterion? Justify your answer! [#0300]
(Question III/17, 1 February 2002)

• The eye of the Canadian bighorn sheep in Figure B.79(a) is shown enlarged in Figures B.79(b) to B.79(d) (footnote 7: The detail was contrast-stretched to make the results clearer.). Nearest-neighbor, bilinear, and bicubic interpolation were used for the enlargement. Assign these interpolation methods to the three images B.79(b) to B.79(d) and justify your answer. [#0304]
(Question III/18, 1 February 2002)

• The BSP tree shown in Figure B.80 describes a two-dimensional polygon. The separating planes (or lines, since we are considering the two-dimensional case) in each node are given by equations of the form ax + by = c, where the outside is characterized by the inequality ax + by > c and the inside by ax + by < c. Furthermore (as shown in Figure B.80), the "outside" paths lead to the left and the "inside" paths to the right.
In a suitably labelled coordinate system, draw the polygon described by this BSP tree and indicate which edge belongs to which equation. [#0307]
(Question III/20, 15 March 2002)


• Explain the terms "cutoff frequency" and ideal vs. non-ideal filter in the context of digital raster images. How are these concepts related to the appearance of a filter's output image? [#0309]
(Question III/19, 15 March 2002)

• In Figure B.81 the well-known Stanford bunny is rendered with three different illumination models. Which illumination models are used in Figures B.81(a), B.81(b), and B.81(c)? By which properties of the images did you recognize the illumination models in question? [#0310]
(Question III/16, 15 March 2002)


Figure B.78: Enlargement of an image detail using different interpolation methods ((a) original image, (b) method 1, (c) method 2, (d) method 3)
Figure B.79: Enlargement of an image detail using different interpolation methods ((a) original image, (b) method 1, (c) method 2, (d) method 3)
Figure B.80: BSP tree (the inner nodes carry plane equations of the form ax + by = c; "out" branches lead to the left, "in" branches to the right)
Figure B.81: Rendering of a 3D model using different illumination models ((a) model 1, (b) model 2, (c) model 3)


Index<br />

xy-stitching, 50<br />

z-Buffer, 209<br />

Weber-Ratio, 87<br />

8-Code, 190<br />

absolute transformation, 232

Abtastung, 34, 117<br />

Active Vision, 265<br />

active vision, 265<br />

Affine matching, 285<br />

anaglyphs, 230<br />

anterior chamber, 45<br />

Anti-Blur filter, 259<br />

Approximation, 179, 287, 289<br />

approximation, 155<br />

Auflösung, geometrische, 34, 117, 283, 284<br />

Auflösung, radiometrische, 34, 284<br />

Augenabst<strong>and</strong>, 229<br />

Augmented Reality, 256, 285<br />

augmented reality, 56<br />

Augmented Reality, 57, 256, 287

back-face culling, 208<br />

basis matrix, 177<br />

Basisfunktion, 177<br />

Bezier-Kurve, 181, 307<br />

Bezier-Kurven, 179<br />

bi-directional reflectivity function, 222<br />

Bilderkennung, 35<br />

Bildmodell, 34, 283<br />

bilineare Transformation, 164, 329

binokulares Sehen, 231, 297<br />

blending functions, 177<br />

Blending Funktionen, 177<br />

blind spot, 45<br />

Boundary-Representation, 196, 298<br />

bounding boxes, 208<br />

box function, 134<br />

Bresenham-Algorithmus, 65, 283, 297, 301<br />

BSP-Tree, 198, 300, 315<br />

Bump-Mapping, 148, 285<br />

bump-mapping, 148<br />

cabinet, 171<br />

calibration, 256<br />

Casteljau-Algorithmus, 181, 307<br />

cavalier, 171<br />

chain code, 312, 317<br />

chain-code, 189<br />

chromaticity, 91<br />

classification<br />

supervised, 291<br />

unsupervised, 291<br />

Clipping, 167<br />

clipping, 165<br />

CMY-Farbmodell, 95, 302<br />

CMYK-Farbmodell, 93, 95, 289, 302<br />

Cohen-Sutherland, 167, 300, 301

color model, 107, 292<br />

Computer Graphics/Visualization, 57
computer-aided tomographic, 207
Computergrafik und Bildanalyse, 266, 284
Computertomografie, 53, 286

cone-tree, 263<br />

cones, 45<br />

control points, 180, 289<br />

convolution, 130<br />

cornea, 45<br />

CSG, 199, 296<br />

cut-off-frequency, 134

cutoff-Frequenz, 135, 287<br />

data garmets, 56, 292<br />

data-garments, 56<br />

DDA-Algorithmus, 65, 283<br />

density, 88<br />

density slicing, 105<br />

depth cues, 207, 323<br />

descriptive geometry, 171<br />

direct capture system, 52<br />

Diskretisierung, 34<br />

distance<br />

between pixels, 38<br />

dither matrix, 88<br />

dots, 107<br />

dynamic range, 88<br />

dynamischer Bereich, 88, 115, 284<br />

edge-image, 132<br />

Elektronenstrahlschirm, 35<br />

Entzerrung, 287<br />

Erosion, 75, 326<br />

exterior orientation, 48, 56<br />

Füllen von Polygonen, 65, 289<br />

Farbfilmnegativ, 107, 324<br />

Farbmodell, CIE, 92, 93, 283, 284<br />

Farbmodell, CMY, 95, 286<br />

Farbmodell, CMYK, 95, 286<br />

Farbmodell, RGB, 93, 95, 284<br />

feature, 290<br />

feature space, 290<br />

feature vector, 46<br />

Fenster, 40<br />

fiducial marks, 173<br />

Filter, 40<br />

filter<br />

high pass<br />

Butterworth, 136, 292<br />

ideal, 136, 292<br />

filter mask, 127<br />

Fourier-Transformation, 289, 326

fovea, 45<br />

Freiheitsgrad, 168, 325<br />

gamma, 49<br />

G<strong>aus</strong>s-filter, 129, 286<br />

Geometrievektor, 177<br />

geometry vector, 177<br />

Gouraud-shading, 219, 287, 323<br />

gradation curve, 119<br />

Gradientenbild, 134, 301<br />

Grafik-Pipeline, 265, 285<br />

Grauwertzuweisung, 287<br />

gross fog, 49<br />

Halbraumcodes, 167<br />

half tone, 314<br />

half-space codes, 166<br />

halo, 212<br />

head-mounted displays, 56<br />

hierarchical matching, 234<br />

histogram, 337<br />

equalization, 50, 105, 117, 119<br />

spreading, 119<br />

Histogramm, 120, 323<br />

Hit-or-miss Operator, 80, 289<br />

Hochpassfilter, 132, 324<br />

homogene Koordinaten, 157, 168, 285, 299, 300, 325

homologue points, 232<br />

HSV-Farbmodell, 97, 305<br />

hue, 96<br />

Human-<strong>Computer</strong>-Interfaces, 263<br />

hyper-spectral, 46<br />

ideal filter, 134<br />

illuminate, 51<br />

image<br />

black & white, 87<br />

color, 87<br />

false color, 90<br />

half tone, 88<br />

image flickering, 230<br />

Image Processing/Computer Vision, 57

image quality, 115<br />

immersive visualization, 263<br />

in<strong>for</strong>mation slices, 263<br />

inner orientation, 232<br />

intensity slicing, 105<br />

Interpolation, 179, 287, 289<br />

interpolation, 155, 176<br />

Interpolation, bilineare, 252, 297<br />

Kante, 38, 286<br />

Kantendetektion, 134, 306<br />

Kell-factor, 117<br />

Kettenkodierung, 190<br />

Klassifikation, 240, 244, 287, 304<br />

Klassifizierung, 242<br />

Koeffizientenmatrix, 162<br />

Koordinatentrans<strong>for</strong>mation, 325<br />

Korrelation, normalisiert, 234, 303<br />

leaf, 192<br />

Least Squares, 180<br />

Least Squares Method, 164<br />

Least squares method, 164, 329<br />

Level of Detail, 199<br />

light, 90<br />

line pairs per millimeter, 50<br />

Linie, 38<br />

Linienpaar, 117<br />

listening mode, 53<br />

logische Verknüpfung, 40<br />

luminance, 90<br />

luminosity, 117<br />

Man-Machine Interaction, 263<br />

Maske, 40<br />

masked negative, 106<br />

median filter, 128<br />

Median-Filter, 129, 299<br />

Medianfilter, 129, 285, 323<br />

Mehrfachbilder, 46, 287<br />

Merkmalsraum, 242<br />

mexican hat, 131<br />

MIP-maps, 249



mirror stereoscope, 230<br />

Moiree effect, 107, 292<br />

moments, 145<br />

morphological<br />

closing, 314, 315<br />

erosion, 75, 283<br />

filtering, 79, 291, 322<br />

opening, 77, 78, 284, 305, 328<br />

morphology, 75, 82, 326<br />

mosaicing, 249<br />

motion blur, 259, 291<br />

Motion Picture Expert Group, 274<br />

multi illumination, 56<br />

multi-images, 46<br />

multi-position, 46<br />

multi-sensor, 46<br />

multi-spectral, 46<br />

multi-temporal, 46<br />

multiple path, 50<br />

Multispektralbild, 32<br />

Multispektrales Abtastsystem, 52<br />

Nachbarschaft, 38, 283<br />

nearest neighbor, 165, 251<br />

negative color, 91<br />

nicht-perspektive Kamera, 51, 53, 286<br />

node file, 251<br />

nodes, 251<br />

normal equation matrix, 164<br />

offset print, 107, 292<br />

one-point perspective, 172<br />

Operationen, algebraische, 40<br />

operator<br />

Marr-Hildreth, 293<br />

optische Dichte, 107, 324<br />

parallactic angle, 229<br />

parallax, 229<br />

parallel difference, 229<br />

Parametervektor, 164, 329<br />

parametrische Kurvendarstellung, 177<br />

paraphernalia, 48<br />

passive Radiometrie, 54, 289<br />

Passpunkte, 289<br />

Phong-Modell, 218, 299<br />

Phong-shading, 219, 287<br />

photo detector, 49<br />

photo-multiplier, 49<br />

photography<br />

negative, 337<br />

Photometric Stereo, 206<br />

pigments, 90<br />

pipeline, 265<br />

polarization, 230<br />

pose, 48, 56<br />

preprocessing, 120<br />

projection, oblique, 171<br />

projection, orthographic, 171<br />

Projektionen, planar, 172, 284<br />

prozedurale Texturen, 150, 289<br />

pseudo-color, 90, 105<br />

push-broom technology, 49<br />

Quadtree, 193, 296<br />

Radar, 54, 287<br />

Radiosity, 285<br />

radiosity, 222<br />

Rasterdarstellung, 35<br />

Rasterkonversion, 36, 284<br />

ratio imaging, 107<br />

Ratio-Bild, 113, 283<br />

R<strong>aus</strong>chen, kohärentes, 326<br />

ray tracing, 210, 291<br />

ray-tracing, 210<br />

Raytracing, recursive, 208, 298<br />

Rectangular Tree, 199<br />

Region, 38<br />

relative orientation, 232<br />

remote sensing, 52<br />

Resampling, 287<br />

resampling, 165, 291<br />

Resampling, geometrisches, 249, 285<br />

resolution, 45<br />

RGB-Farbmodell, 93, 95, 97, 289, 302, 305<br />

rigid body transformation, 168, 325

ringing, 135<br />

Roberts-Operator, 134, 306<br />

rods, 45<br />

Rotation, 157, 303<br />

Sampling, 34<br />

Scannen, 50, 287<br />

scanning electron-microscopes, 55<br />

Schwellwert, 120, 323<br />

Schwellwertbild, 32<br />

Schwerpunkt, 73, 322<br />

screening, 88<br />

Segmentierung, 120, 323<br />

sensor<br />

non-optical, 233, 292<br />

sensor model, 46<br />

Sensor-Modell, 48, 289<br />

Shape-from-Focus, 206<br />

Shape-from-Shading, 206<br />

Shape-from-X, 207, 323<br />

sinc-filter, 128



Skalierung, 157, 303<br />

Sobel-Operator, 133, 299<br />

sound, navigation and range, 54

spatial partitioning, 197, 289<br />

spatial-domain representation, 130<br />

spectral representation, 130<br />

Spektralraum, 147, 286<br />

Spiegelreflexion, 218, 284<br />

splitting, 190, 306<br />

spy photography, 116<br />

starrer Körper, 168, 325<br />

step <strong>and</strong> stare, 49<br />

Stereo, 230, 232, 285, 324<br />

Stereo, photometrisches, 207, 284<br />

stereo-method, 206<br />

stereopsis, 175, 233, 292<br />

Structured Light, 206<br />

structured light, 55<br />

Strukturelement, 82<br />

support, 137<br />

sweep, 195<br />

Sweeps, 195, 287<br />

View Plane Normal, 174<br />

view point, 232<br />

view point normals, 232<br />

View Reference Point, 174<br />

View-Frustum, 199<br />

Virtual Reality, 256, 285<br />

vitreous humor, 45<br />

volume element, 53<br />

voxel, 53<br />

Voxel-Darstellung, 285<br />

Wahrscheinlichkeitsdichtefunktion, 244, 304<br />

window, 127<br />

wire-frame, 194<br />

XOR, 40<br />

YIQ-Farbmodell, 96, 286, 327<br />

Zusammenhang, 38<br />

table-lens, 263<br />

template, 127<br />

texels, 147<br />

Textur, 147, 286<br />

Texture-Mapping, 285<br />

Tiefenunterschied, 229<br />

Tiefenwahrnehmungshilfen, 207, 323<br />

Tiefpassfilter, 135, 287<br />

total plastic, 231<br />

track, 55<br />

Tracking, 57, 256, 287<br />

transform
  medial axis, 68, 308
Transformation, 157, 162, 252, 297, 299
transformations
  conform, 170, 292
Transformationsmatrix, 157, 168, 299, 300, 303, 323

tri-chromatic coefficients, 91<br />

tri-stimulus values, 91<br />

trivial acceptance, 166<br />

Trivial rejection, 166<br />

undercolor removal, 95<br />

Unsharp Masking, 132, 284<br />

unsharp masking, 131<br />

US Air Force Resolution Target, 50<br />

vanishing point, 171<br />

Vektordarstellung, 35<br />

View Plane, 174


List of Algorithms<br />

1 Affine matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14<br />

2 Threshold image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37<br />

3 Simple raster image scaling by pixel replication . . . . . . . . . . . . . . . . . . . . 42<br />

4 Image resizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42<br />

5 Logical mask operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45<br />

6 Fast mask operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45<br />

7 Digital differential analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69<br />

8 Thick lines using a rectangular pen . . . . . . . . . . . . . . . . . . . . . . . . . . . 72<br />

9 Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80<br />

10 Halftone-Image (by means of a dither matrix) . . . . . . . . . . . . . . . . . . . . . 95<br />

11 Conversion from RGB to HSI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104<br />

12 Conversion from HSI to RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105<br />

13 Conversion from RGB to HSV . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

14 Conversion from HSV to RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107<br />

15 Conversion from RGB to HLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108<br />

16 Conversion from HLS to RGB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109<br />

17 hlsvalue(N1,N2,HLSVALUE) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109<br />

18 Masked negative of a color image . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112<br />

19 Histogram equalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124<br />

20 Local image improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126<br />

21 Weighted Antialiasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143<br />

22 Gupta-Sproull-Antialiasing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144<br />

23 Texture mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154<br />

24 Casteljau . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186<br />

25 Chain coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195<br />

26 Splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196<br />

27 Quadtree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198<br />

28 Creation of a BSP tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204<br />

355


356 LIST OF ALGORITHMS<br />

29 z-buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215<br />

30 Raytracing <strong>for</strong> Octrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217<br />

31 Gouraud shading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226<br />

32 Phong - shading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227<br />

33 Shadow map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227<br />

34 Implementation of the Atherton-Weiler-Greenberg Algorithm . . . . . . . . . . 227

35 Radiosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229<br />

36 Feature space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246<br />

37 Classification without rejection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247<br />

38 Classification with rejection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248<br />

39 Calculation with a node file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256<br />

40 Nearest neighbor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257<br />

41 z-buffer pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272<br />

42 Phong pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272<br />

43 Pipeline <strong>for</strong> lossless compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276<br />

44 Pipeline <strong>for</strong> lossy compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277<br />

45 JPEG image compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281<br />

46 MPEG compression pipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282


List of Definitions

1 Amount of data in an image . . . . . . . . . . . . 39
2 Connectivity . . . . . . . . . . . . 43
3 Distance . . . . . . . . . . . . 43
4 Perspective camera . . . . . . . . . . . . 53
5 Skeleton . . . . . . . . . . . . 73
6 Difference . . . . . . . . . . . . 80
7 Erosion . . . . . . . . . . . . 81
8 Opening . . . . . . . . . . . . 83
9 Closing . . . . . . . . . . . . 83
10 Morphological filter . . . . . . . . . . . . 84
11 Hit-or-Miss operator . . . . . . . . . . . . 86
12 Contour . . . . . . . . . . . . 87
13 Conversion from CIE to RGB . . . . . . . . . . . . 98
14 CMY color model . . . . . . . . . . . . 100
15 CMYK color model . . . . . . . . . . . . 101
16 YIQ color model . . . . . . . . . . . . 103
17 Histogram stretching . . . . . . . . . . . . 124
18 Conformal transformation . . . . . . . . . . . . 162
19 Rotation in 2D . . . . . . . . . . . . 165
20 2D rotation matrix . . . . . . . . . . . . 166
21 Sequenced rotations . . . . . . . . . . . . 167
22 Affine transformation with 2D homogeneous coordinates . . . . . . . . . . . . 168
23 Bilinear transformation . . . . . . . . . . . . 169
24 Rotation in 3D . . . . . . . . . . . . 175
25 Affine transformation with 3D homogeneous coordinates . . . . . . . . . . . . 176
26 Bezier curves in 2D . . . . . . . . . . . . 185
27 2D morphing for lines . . . . . . . . . . . . 197
28 Wireframe structure . . . . . . . . . . . . 200


29 Boundary representation . . . . . . . . . . . . 202
30 Cell structure . . . . . . . . . . . . 203
31 Ambient light . . . . . . . . . . . . 223
32 Lambert model . . . . . . . . . . . . 224
33 Total plastic . . . . . . . . . . . . 237


List of Figures

4.1 Morphological erosion as the sequence complement → dilation → complement . . . . . . . . . . . . 82
4.2 Morphological opening . . . . . . . . . . . . 84
5.1 Histogram of Figure B.29 . . . . . . . . . . . . 95
5.2 A plane in the HSV color model . . . . . . . . . . . . 110
6.1 Histogram of a gray wedge . . . . . . . . . . . . 127
6.2 Histograms . . . . . . . . . . . . 128
7.1 Application of a median filter . . . . . . . . . . . . 135
7.2 Low-pass and high-pass filters . . . . . . . . . . . . 135
7.3 Low-pass and high-pass filters . . . . . . . . . . . . 138
7.4 Roberts operator . . . . . . . . . . . . 140
9.1 Rotated coordinate system . . . . . . . . . . . . 166
9.2 Construction of a Bezier curve according to Casteljau . . . . . . . . . . . . 187
11.1 Graphical evaluation of the z-buffer algorithm . . . . . . . . . . . . 216
B.1 Repeated storage of an image at different sizes . . . . . . . . . . . . 294
B.2 Three-dimensional object shown with different rendering techniques . . . . . . . . . . . . 300
B.3 Conversion of one vector graphic into another . . . . . . . . . . . . 301
B.4 Process chain for mapping a three-dimensional object onto the two-dimensional image plane . . . . . . . . . . . . 301
B.5 Pixel raster . . . . . . . . . . . . 302
B.6 Binary raster image . . . . . . . . . . . . 302
B.7 Table . . . . . . . . . . . . 303
B.8 Input image . . . . . . . . . . . . 303
B.9 The connection between two pixels is to be approximated . . . . . . . . . . . . 304
B.10 Object consisting of two surfaces . . . . . . . . . . . . 304
B.11 Splitting of the primary ray in "recursive raytracing" . . . . . . . . . . . . 305


B.12 Linear transformation M of an object A into an object B . . . . . . . . . . . . 305
B.13 Application of the Sobel operator to a gray-value image . . . . . . . . . . . . 306
B.14 Application of a median filter to a gray-value image . . . . . . . . . . . . 306
B.15 Illuminated object with a specular surface according to the Phong model . . . . . . . . . . . . 307
B.16 Gray-value image as the highest-resolution level of an image pyramid . . . . . . . . . . . . 308
B.17 Polygon for BSP representation . . . . . . . . . . . . 308
B.18 Application of the Cohen-Sutherland clipping algorithm . . . . . . . . . . . . 309
B.19 Clipping according to Cohen-Sutherland . . . . . . . . . . . . 309
B.20 Connecting two points according to Bresenham . . . . . . . . . . . . 310
B.21 Application of a gradient operator . . . . . . . . . . . . 310
B.22 Finding the edge pixels . . . . . . . . . . . . 311
B.23 Boundary of a region . . . . . . . . . . . . 311
B.24 Boolean operations on binary images . . . . . . . . . . . . 312
B.25 Determination of the normalized correlation . . . . . . . . . . . . 312
B.26 Construction of a curve point on a Bezier curve according to Casteljau . . . . . . . . . . . . 313
B.27 General rotation with scaling . . . . . . . . . . . . 313
B.28 Three feature vectors in two-dimensional space . . . . . . . . . . . . 314
B.29 Digital gray-value image (histogram to be determined) . . . . . . . . . . . . 314
B.30 Empty filter masks . . . . . . . . . . . . 315
B.31 Morphological opening . . . . . . . . . . . . 315
B.32 A plane in the HSV color model . . . . . . . . . . . . 316
B.33 Gray wedge . . . . . . . . . . . . 316
B.34 Roberts operator . . . . . . . . . . . . 317
B.35 Two-dimensional polygon representation . . . . . . . . . . . . 318
B.36 Object and camera in the world coordinate system . . . . . . . . . . . . 319
B.37 Gray wedge . . . . . . . . . . . . 319
B.38 Application of normalized cross-correlation . . . . . . . . . . . . 320
B.39 Application of the medial axis transformation . . . . . . . . . . . . 320
B.40 Gray wedge . . . . . . . . . . . . 321
B.41 Application of the hit-or-miss operator to a binary image . . . . . . . . . . . . 321
B.42 Drawing thick lines . . . . . . . . . . . . 322
B.43 Definition of a two-dimensional object by the chain-code sequence "221000110077666434544345" . . . . . . . . . . . . 323
B.44 Transformation of four points . . . . . . . . . . . . 323
B.45 Sub-sampling . . . . . . . . . . . . 324
B.46 Three digital gray-value images and their histograms . . . . . . . . . . . . 325
B.47 Halftoning . . . . . . . . . . . . 326
B.48 Halftoning . . . . . . . . . . . . 326



B.49 Morphological closing . . . . . . . . . . . . 326
B.50 Application of the hit-or-miss operator to a binary image . . . . . . . . . . . . 326
B.51 Halftoning . . . . . . . . . . . . 327
B.52 Polygon for BSP representation . . . . . . . . . . . . 327
B.53 Color image negative . . . . . . . . . . . . 327
B.54 Supervised classification . . . . . . . . . . . . 327
B.55 Rectangle with interfering objects . . . . . . . . . . . . 328
B.56 Pixel arrangement . . . . . . . . . . . . 328
B.57 Image with disturbances . . . . . . . . . . . . 329
B.58 Raster representation of an object . . . . . . . . . . . . 330
B.59 Gray-value image . . . . . . . . . . . . 330
B.60 Transformation matrix . . . . . . . . . . . . 331
B.61 Digital raster image with intensity falling off towards the edge . . . . . . . . . . . . 331
B.62 Color film negative . . . . . . . . . . . . 332
B.63 Simplified setup of the ball-catching robot at the state exhibition comm.gr2000az . . . . . . . . . . . . 333
B.64 Image with superimposed coherent noise . . . . . . . . . . . . 334
B.65 Alternative computation of morphological erosion . . . . . . . . . . . . 335
B.66 Photograph with low contrast . . . . . . . . . . . . 336
B.67 Morphological opening . . . . . . . . . . . . 336
B.68 Distance calculation . . . . . . . . . . . . 337
B.69 Various filter operations . . . . . . . . . . . . 338
B.70 Cylinder with a coaxial bore . . . . . . . . . . . . 339
B.71 Segmentation of a gray-value image . . . . . . . . . . . . 340
B.72 Artifacts in a checkerboard-like pattern . . . . . . . . . . . . 340
B.73 Application of normalized cross-correlation to a rotated image . . . . . . . . . . . . 341
B.74 Automatic contrast enhancement . . . . . . . . . . . . 341
B.75 Blurred edge in a digital gray-value image . . . . . . . . . . . . 342
B.76 Histograms of two different images . . . . . . . . . . . . 343
B.77 Torus with a surface texture . . . . . . . . . . . . 344
B.78 Enlargement of an image section using different interpolation methods . . . . . . . . . . . . 348
B.79 Enlargement of an image section using different interpolation methods . . . . . . . . . . . . 349
B.80 BSP tree . . . . . . . . . . . . 350
B.81 Rendering of a 3D model using different illumination models . . . . . . . . . . . . 350




