
Fachbereich Informatik
Department of Computer Science

Master Thesis

Visual Inspection of Fast Moving Heat Shrink Tubes in Real-Time

Alexander Barth

A thesis submitted to the Bonn-Rhein-Sieg University of Applied Sciences
in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science

Date of submission: December 16, 2005

Examination Committee: Prof. Dr.-Ing. Rainer Herpers (Supervisor)
                       Prof. Dr. Dietmar Reinert


Declaration

I hereby declare that the work presented in this thesis is solely my work and that, to
the best of my knowledge, this work is original, except where indicated by references
to other authors. This thesis has neither been submitted to another committee, nor has
it been published before.

St. Augustin, December 16, 2005

Alexander Barth


Acknowledgments

First of all, I would like to thank my thesis advisor Prof. Dr.-Ing. Rainer Herpers and
Prof. Dr. Dietmar Reinert for guiding this work and for their helpful input and
discussions.

Many thanks to the company DSG-Canusa for funding this work and for supporting me
in designing the hardware setup. A special thanks to Thomas Schminke, Markus Greßnich,
Manfred Hirn, Andreas Dederichs and Klaus Lanzerath.

During my thesis work, several people, fellow students and friends working in the
Computer Vision Lab at Bonn-Rhein-Sieg University of Applied Sciences provided me with
useful comments and ideas. In particular, I would like to thank Stefan Hahne, Axel Hau,
Bernd Göbel, Ingmar Burak, Christian Becker and Nils Neumaier, who also modeled the
nice 3D figures. I also thank Philipp Wegner and Patrick Schmitz for assisting me during
the experiments and for measuring several hundred heat shrink tubes by hand.

Furthermore, I appreciate the time I could spend at York University in Toronto and at
the Centre for Vision Research, Toronto. Thanks to all professors and students who
contributed to my interest in Computer Vision. A special thanks goes to Markus
Enzweiler; we had a great time during our stay in Canada and have kept in touch for
productive discussions ever since.

Many thanks to Gemma Adcock from New Zealand, who gave me great native-speaker support
in writing my thesis in English.

Finally, I would like to thank Steffi for being so understanding during stressful times.


Abstract

Heat shrink tubing is widely used in electrical and mechanical applications for
insulating and protecting cable splices. Especially in the automotive supply industry,
accuracy demands are very high, and quality assurance is an important factor in
establishing and maintaining customer relationships. In production, the heat shrink
tubes are cut into lengths (between 20 and 100mm) from a continuous tube. During this
process, however, deviations from the target length can occur.

In this thesis, a prototype of a vision-based length measuring sensor for a range of
heat shrink tubes is presented. The measuring is performed on a conveyor belt in
real-time at velocities of up to 40m/min. The tubes can differ in color, diameter and
length.

In a multi-measurement strategy, the total length of each tube is computed based on up
to 11 single measurements while the tube is in the visual field of the camera. Tubes
that do not meet the allowed tolerances, between ±0.5mm and ±1mm depending on the
target length, are sorted out by air pressure. Both the engineering and the software
development are part of this thesis work.

About 70% of all manufactured tubes are transparent, i.e. they show poor contrast
against the background. Thus, sophisticated but fast algorithms are developed which
reliably detect even low-contrast tube edges with subpixel accuracy in the presence of
background clutter (e.g. belt texture or dirt). For this purpose, special tube edge
templates are defined and combined with model knowledge about the inspected objects.
In addition, perspective and lens-specific distortions have to be compensated.

An easy-to-operate calibration and teach-in step has been developed, which is important
to be able to produce different tube types on the same production line at short
intervals.

The prototype system has been tested in extensive experiments at varying velocities and
for different tube diameters and lengths. The measuring precision for non-deformed
tubes can reach 0.03mm at a conveyor velocity of 30m/min. Even with elliptical
deformations of the cross-section or deflections, it is still possible to achieve an
average precision of < 0.1mm. The results have been compared to manually acquired
ground truth measurements, which also show a standard deviation of about 0.1mm under
ideal laboratory conditions. Finally, 100% control during production is possible with
this system, reaching the same accuracy and precision as humans without getting tired.


Contents

Acknowledgments
Abstract
List of Tables
List of Figures

1. Introduction
   1.1. Machine Vision - State of the Art
   1.2. Problem Statement
   1.3. Requirements
   1.4. Related Work
   1.5. Thesis Outline

2. Technical Background
   2.1. Visual Measurements
        2.1.1. Accuracy and Precision
        2.1.2. Inverse Projection Problem
        2.1.3. Camera Models
        2.1.4. Camera Calibration
   2.2. Illumination
        2.2.1. Light Sources
        2.2.2. Incident Lighting
        2.2.3. Backlighting
   2.3. Edge Detection
        2.3.1. Edge Models
        2.3.2. Derivative Based Edge Detection
        2.3.3. Common Edge Detectors
        2.3.4. Subpixel Edge Detection
   2.4. Template Matching

3. Hardware Configuration
   3.1. Conveyor
   3.2. Camera Setup
        3.2.1. Camera Selection
        3.2.2. Camera Positioning
        3.2.3. Lens Selection
   3.3. Illumination
   3.4. Blow Out Mechanism

4. Length Measurement Approach
   4.1. System Overview
   4.2. Model Knowledge and Assumptions
        4.2.1. Camera Orientation
        4.2.2. Image Content
        4.2.3. Tubes Under Perspective
        4.2.4. Edge Model
        4.2.5. Translucency
        4.2.6. Tube Orientation
        4.2.7. Background Pattern
   4.3. Camera Calibration
        4.3.1. Compensating Radial Distortion
        4.3.2. Fronto-Orthogonal View Generation
   4.4. Tube Localization
        4.4.1. Gray Level Profile
        4.4.2. Profile Analysis
        4.4.3. Peak Evaluation
   4.5. Measuring Point Detection
        4.5.1. Edge Enhancement
        4.5.2. Template Based Edge Localization
        4.5.3. Template Design
        4.5.4. Subpixel Accuracy
   4.6. Measuring
        4.6.1. Distance Measure
        4.6.2. Perspective Correction
        4.6.3. Tube Tracking
        4.6.4. Total Length Calculation
   4.7. Teach-In
        4.7.1. Required Input
        4.7.2. Detection Sensitivity
        4.7.3. Perspective Correction Parameters
        4.7.4. Calibration Factor

5. Results and Evaluation
   5.1. Experimental Design
        5.1.1. Parameters
        5.1.2. Evaluation Criteria
        5.1.3. Ground Truth Measurements
        5.1.4. Strategies
   5.2. Test Scenarios
   5.3. Experimental Results
        5.3.1. Noise
        5.3.2. Minimum Tube Spacing
        5.3.3. Conveyor Velocity
        5.3.4. Tube Diameter
        5.3.5. Repeatability
        5.3.6. Outlier
        5.3.7. Tube Length
        5.3.8. Performance
   5.4. Discussion and Future Work

6. Conclusion

Appendix

A. Profile Analysis Implementation Details
   A.1. Global ROI
   A.2. Profile Subsampling
   A.3. Scan Lines
   A.4. Notes on Convolution

B. Hardware Components
   B.1. Camera
   B.2. Illumination Hardware

Bibliography




List of Tables

1.1. Range of tube types considered in this thesis
1.2. Tolerance specifications
3.1. Lens selection - Overview
3.2. Lens selection - Field of View at minimum object distance
3.3. Lens selection - Working distances
3.4. Blow out control protocol
4.1. Threshold comparison of profile analysis
4.2. Comparison of different edge detectors
4.3. Template curvature test set parameters
5.1. Overview on different test parameters
5.2. Constant software parameter settings throughout the experiments
5.3. Test set used to determine the human variance in measuring
5.4. Results of 50mm tubes at different velocities (black)
5.5. Results of 50mm tubes at different velocities (transparent)
5.6. Results of 50mm tubes with different diameter at 30m/min
5.7. Results of blow out experiment
5.8. Results of 30mm and 70mm tubes at 30m/min
B.1. Camera specifications for the AVT Marlin F-033C and F-046B
B.2. Light Source (A20800.2) with DDL Lamp
B.3. Backlight specifications
B.4. Lamp specifications




List of Figures

2.1. Accuracy and Precision
2.2. Parallel lines under perspective
2.3. Pinhole geometry
2.4. Thin lens model
2.5. Incident lighting setups
2.6. Edge models
2.7. Comparison of different edge detectors
2.8. Orientation selective filters
2.9. Subpixel accuracy using interpolation techniques
3.1. Hardware setup of the prototype
3.2. BAYER mosaic
3.3. Comparison of color and gray level camera
3.4. Color information of transparent tubes
3.5. Telecentric lens
3.6. Field of View geometry
3.7. Tubes at different front lighting setups
3.8. Backlighting through a conveyor belt
3.9. Polarized backlighting
3.10. Backlight panel
3.11. Blow out setup
4.1. System overview
4.2. Potential image states
4.3. Tube models
4.4. Measuring plane definition
4.5. Characteristic intensity distribution of transparent tubes
4.6. Tube orientation error
4.7. Camera calibration - Calibration images
4.8. Camera calibration - Subpixel corner extraction
4.9. Camera calibration - Extrinsic parameters
4.10. Camera calibration - Radial distortion model
4.11. Camera positioning - Online Grid Calibration
4.12. Camera positioning - Control points
4.13. Scan lines for profile analysis
4.14. Profile analysis
4.15. Motivation for a region-based profile threshold
4.16. Ghost effect
4.17. Characteristic tube edge responses
4.18. Template Design
4.19. Template Occurrence
4.20. Template with extreme height weighting coefficient
4.21. Template Weighting
4.22. Template rotation - Motivation
4.23. Template curvature occurrence
4.24. Subpixel accurate template matching
4.25. Perspective correction function
5.1. Measuring slide used for acquiring ground truth measurements by hand
5.2. Intra and inter human measuring variance
5.3. Supply tube
5.4. Accuracy evaluation of length measurements at synthetic sequences
5.5. Results of minimum spacing experiment
5.6. Minimum tube spacing for black tubes
5.7. Measuring results at 20m/min
5.8. Results of 8mm black tubes at 30m/min
5.9. Results of 8mm transparent tubes at 30m/min
5.10. Brightness variance of an empty conveyor belt at backlight
5.11. Bent 6mm tube
5.12. Experimental results of black tubes with 6 and 12mm diameter
5.13. Ground truth distance of black tubes with 6 and 12mm diameter
5.14. Influence of cross-section deformations at 12mm diameter tubes
5.15. Experimental results of transparent tubes with 6 and 12mm diameter
5.16. Ground truth distance of transparent tubes with 6 and 12mm diameter
5.17. Failure of tube edge detection due to a poor contrast
5.18. Repeatability of the measurement of one tube
5.19. Repeatability of the measurement of a metallic cylinder
5.20. Results of outlier experiment
5.21. Results of 30mm and 70mm tubes at 30m/min
5.22. Performance evaluation results
5.23. Background suppression in the frequency domain
A.1. Comparison of different scan lines


1. Introduction

Heat shrinkable tubing is widely used for electrical and mechanical insulation,
sealing, identification and connection solutions. Customers come mainly from the
automotive, electronics, military and aerospace sectors. In terms of competition in
world markets, high quality assurance standards are essential in establishing and
maintaining customer relationships. Especially in the automotive supply industry,
accuracy demands are very high, and tolerated outliers are specified in only a few
parts per million.

In this master thesis, a prototype of a vision-based sensor for real-time length
measurement of heat shrink tubes in line production is presented. The main objectives
are accuracy, reliability and meeting time constraints.

The thesis work has been accomplished in cooperation with the company DSG-Canusa,
Meckenheim, Germany.

1.1. Machine Vision - State of the Art

This section gives an overview of the term Machine Vision (MV), the use of vision
systems in industrial applications, and a brief historical review. In addition, the
advantages and drawbacks of MV are discussed and related applications are presented.
The term Machine Vision is defined by Davies [16] as follows:

“Machine Vision is the study of methods and techniques whereby artificial vision
systems can be constructed and usefully employed in practical applications. As such, it
embraces both the science and engineering of vision.”

Researchers and engineers argue whether the terms Machine Vision and Computer Vision
can be used synonymously [7]. Both terms are part of a larger field called Artificial
Vision and have many things in common. The main objective is to make artificial systems
‘see’. However, the priorities of the two subjects differ.

Computer Vision has arisen in the academic field and concentrates mainly on theoretical
problems with a strong mathematical background. Usually, as the term Computer Vision
indicates, a computer processes an input image or a sequence of images. Nevertheless,
many methods and algorithms developed in Computer Vision can be adapted to practical
applications.

Machine Vision, on the other hand, implies practical solutions for many applications,
and covers not only the image processing itself, but also the engineering that makes a
system work [16]. This includes the right choice of the sensor, optics, illumination,
etc. MV systems are often used in industrial environments, making robustness,
reliability and cost-effectiveness very important. If an application is highly
time-constrained or computationally expensive, specific hardware (e.g. DSPs, ASICs, or
FPGAs) is used instead of an off-the-shelf computer [42]. A current trend is to develop
imaging sensors that have on-chip capabilities for image processing algorithms. Thus,
the image processing moves from the computer into the camera, superseding the
bottleneck of data transfer.

During the 1970s and 1980s, western companies faced a new challenge from the Asian
market [7]. Countries like Japan in particular established new production methods,
leading to an increased significance of quality in manufacturing in international
markets. Many western companies proved unable to meet the challenge and failed to
survive, while others realized the importance of quality assurance and started to
investigate the use of new technologies like Machine Vision. MV has many advantages and
is able to improve product quality, enhance processing efficiency, and increase
operational safety.

In the early 1980s, development in the field of Artificial Vision was slow and mainly
academic, and industrial interest remained low until the late 1980s and early 1990s
[7]. Significant progress in computer hardware has since allowed real-time
implementations of image processing algorithms, developed over the past 25 years, on
standard platforms. The decreasing cost of computational power made MV systems more and
more attractive, leading to a growth of MV applications and of companies developing
such systems. Today, the field of MV has become a thriving multi-million dollar
industry [7].

The objectives of MV systems include position recognition, identification, shape and
dimension checks, completeness checks, image and object comparison, and surface
inspection [18]. Usually, the goal is to detect and sort out production errors or to
guide a robot arm (or other devices) in a particular task [42].

MV systems can be found in all industrial sectors and cover a huge range of inspected
objects. Dimensional measuring tasks can be found, for example, in the inspection of
bottles on assembly lines [72], wood [15, 50], screw threads [34], or thin-film disk
heads [61]. Measuring objects is often related to 3D CAD models [23, 43]. An example
for guiding a robot arm in grasping 3D sheet metal parts is given in [52]. A detailed
overview of all potential applications is beyond the scope of this thesis.

Guaranteed product quality can help to establish and maintain customer relationships,
enhancing the competitive position of a company. The main advantage of visual
inspection in quality control is, besides its versatile range of applications, that it
is non-contact, clean, and fast [7].

Although the interpretative capability of today’s vision systems cannot match the
human visual system in the general case, it is possible to develop systems that perform
better than people at some quantitative tasks. However, this assumes controlled and
circumscribable conditions, reducing the problem to a defined and repetitive task.
Usually, such conditions can be established at manufacturing lines.

A human operator can be expected to be only 70-80% efficient, even under ideal
conditions [7]. In practice, there are many factors that can further reduce this
productive efficiency, such as tiredness, sickness, boredom, alcohol or drugs. For
example, if a human is instructed to observe objects on a conveyor, this task is tiring
and it is not unlikely that the operator becomes distracted after a while. A MV system,
on the other hand, could theoretically perform the same task 24 hours a day and 365
days a year without getting tired.

If the inspection is performed in surroundings where working is unpleasant,
intolerable, dangerous or harmful to health for a human being, MV is a welcome option.
This includes working under high (or low) temperatures, chemical exhalation, smoke,
biological hazards, risk of explosion, x-rays, radioactive material, loud noise levels,
etc. [7]. On the other hand, in applications that require aseptic conditions, as in the
food or pharmaceutical industry, a human operator can be a ‘polluter’ as a source of
dirt particles (hair, dander, bacteria, etc.). In this case, a MV system is a clean
alternative.

Machines usually exceed humans in all kinds of accurate vision-based measurements.
Human vision performs well in comparing objects and in detecting differences, for
example in shape, color or texture [27]. Large deviations can be detected quickly. As
the difference gets smaller, however, the inspection time increases, or the deviation
cannot be detected at all without technical tools. With respect to the task considered
in this thesis, a human is not able to determine the length of an object at
sub-millimeter precision just by looking at it. Manual measurements are slow and not
practicable in line production if 100% control is desired, and can thus be used only
for random inspection of a few objects. MV systems, on the other hand, can measure the
length (or other features) of an object without contact, down to nm precision depending
on the optical system and the size of the object [16]. Furthermore, humans soon reach
their limits if the number of objects to inspect per minute increases significantly.
Many manufacturing processes are so fast that the human eye has problems even
perceiving the objects, not to mention accomplishing any inspection task. MV systems,
however, can handle several hundred objects per minute with high accuracy.

Although MV systems have many advantages for manufacturers, there are also drawbacks.
Usually, a MV system is designed and optimized for a specific task in a constrained
environment. If the requirements of the application change, the system has to be
adapted, which can be difficult and expensive [7]. Furthermore, the system can be
sensitive to many influences of the (industrial) environment, like heat, humidity,
dirt, dust, or ambient lighting. Respective precautions have to be taken to protect the
system. Finally, as in automation in general, vision systems that exceed the
performance of a human at some specific task replace human operators and will therefore
supersede mostly low-skilled jobs in the future. Addressing this problem in more detail
is outside the scope of this thesis.

1.2. Problem Statement

A large variety of heat shrink tubes of different sizes, materials and shrinking
properties is available on the market. The focus in this thesis is on the DSG-Canusa
DERAY-SPLICEMELT series. These tubes are commonly used for insulation of cable splices
in the automotive industry (see Figure 1.1 for an example). A film of hotmelt adhesive
inside the heat shrink tubes provides a waterproof seal around the splice after
shrinking. In addition, the DERAY-SPLICEMELT series shows a strong resistance against
thermal, chemical and mechanical strains. The easy and fast handling allows for
application in series production. Accordingly, if the heat shrinking is performed in an
automated fashion, the accuracy demands increase.

In production, the heat shrink tubes are cut into lengths from a continuous tube.
During this process, however, deviations from a specific target length can occur. In
terms of quality assurance, any deviations above a tolerable level must be detected so
that defective parts can be sorted out.
sorted out.


Figure 1.1: Application of a transparent heat shrink tube of type DERAY-SPLICEMELT.
After shrinking, the heat shrink tube provides a robust, waterproof insulation of the
cable splice. (Source: DSG-Canusa)

Property    Attributes
Color       transparent, black
Length      20-100mm
Diameter    6, 8, 12mm

Table 1.1: Range of tube types considered in this thesis.

Delivering defective parts must be avoided with highest priority to satisfy the
customer and to retain a good reputation. In this context, tolerable failure rates are
specified in parts per million. Rejected goods can be very expensive.

Up to now, length measurements have been performed manually by a human operator. This
has several drawbacks. First, only random samples can be controlled by hand, since 10
or more parts per second considerably exceed human capabilities. Furthermore, one
operator is occupied with the monotonous measuring task at one machine and cannot be
deployed to other tasks. This leads to a low effective productivity. In practice, more
than one production line that cuts the heat shrink tubes into lengths runs in parallel,
requiring even more human resources, which is very expensive. In addition, there is
always a non-negligible possibility of subjective errors when human operators carry out
the inspections, since they also show symptoms of fatigue over time in this highly
repetitive task. The measuring quality varies detectably between the morning and late
shifts.

In this thesis work, a machine vision inspection system is developed that is able to
replace the human operator at this particular measuring task, allowing for a reliable
100% control.

1.3. Requirements

The system must cover a range of tube types, differing in diameter, length or material
properties. An overview of the variety of tube types can be found in Table 1.1.

The two main classes of DERAY-SPLICEMELT heat shrink tubes considered in this thesis
are black or transparent in color; transparent tubes cover about 70% of the production.
Unlike black tubes, the transparent ones are translucent and appear slightly yellowish
or reddish due to a film of hotmelt adhesive inside the tube.

Most tubes have printing on the surface that can consist of both letters and numbers
(e.g. DSG2). Since this printing is plotted onto the continuous tube before it is cut
into lengths, the position of the printing is not consistent among the tubes and must
not affect the measuring results.
the measuring results.<br />

The tube length ranges from 20mm to 100mm. In this thesis, however, the focus will<br />

be on 50mm tubes since this is the dominant length in production. The outer diameter<br />

varies between 6mm and 12mm.<br />

The tolerances differ between 0.5 and1.0mm depending on the tube length as can be<br />

seen in Table 1.2. This table includes the tolerable deviations from a given target length<br />

in mm.<br />

The measurements have to be accomplished in line production on a conveyor in real-time.
The system is intended to reach 100% control without reducing production velocity.
Currently, the conveyor runs at approximately 20m/min, i.e. 3-17 tubes per second are
cut depending on the segment size. Theoretically, the cutting machine is able to run at
up to 40m/min. A faster velocity results in less processing time per tube segment. The
system design must be robust with respect to industrial use. Theoretically, it must be
able to run stably 24 hours/day, 7 days/week and 365 days/year.

Although there are many different tube types, only one kind of tube is processed at one
production line over a certain period of time. This means that the tube segments to be
inspected on the conveyor are all of the same kind. However, to be flexible to customer
demands, a production line must be able to be rearranged to a different kind of tube
several times a day. This emphasizes the importance of an easy-to-operate calibration
and teach-in step of the inspection system for practical application.

The goal of the visual inspection is a reliable good/bad decision for each tube
segment: whether it has to be sorted out or not. In the following, tube segments
wrongly classified as proper, but nevertheless deviating from the given target length
by more than the allowed tolerances (see Table 1.2), are denoted as false positives. On
the other hand, false negatives are tube segments that are classified for sorting out,
although their actual length meets the tolerances. To reach optimal product quality,
the number of false positives must be reduced to zero. Large numbers of false negatives
indicate that the system is not adjusted properly and has to be reconfigured.
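
To make the decision rule concrete, the following minimal sketch (Python; the function
names are illustrative and not part of the thesis) encodes the tolerance bands of
Table 1.2 and the resulting good/bad decision:

    # Tolerance bands from Table 1.2: (min length, max length, tolerance), all in mm.
    TOLERANCE_BANDS = [(20, 30, 0.5), (31, 50, 0.7), (51, 100, 1.0)]

    def tolerance_for(target_mm: float) -> float:
        """Return the tolerable deviation in mm for a given target length."""
        for lo, hi, tol in TOLERANCE_BANDS:
            if lo <= target_mm <= hi:
                return tol
        raise ValueError(f"target length {target_mm}mm outside supported range")

    def is_good(measured_mm: float, target_mm: float) -> bool:
        """Good/bad decision: True if the measured length meets the tolerance."""
        return abs(measured_mm - target_mm) <= tolerance_for(target_mm)

    # A 50mm tube measured at 50.6mm is within +/-0.7mm; one at 50.8mm is not.
    assert is_good(50.6, 50.0) and not is_good(50.8, 50.0)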

1.4. Related Work

In Section 1.1, several examples of vision-based measuring systems in industrial
applications have been presented. Much more work in this area has been done over the
past 20 years [4]. However, MV-related publications of academic interest often consider
only specific subproblems and do not present a detailed insight into the whole system.
On the other hand, commercial manufacturers of MV systems hide the technical details in
order to keep their competitive advantage [18].


There are several useful books addressing the fundamental methods, techniques and
algorithms used to develop machine vision applications in a comprehensive fashion [7,
16, 18, 62].

Dimensional measuring of objects requires knowledge of an object’s boundaries. A common
indicator for object boundaries, both in human and artificial vision, are edges. Edge
detection is a widely investigated area of vision research, dating back to 1959 in the
field of TV signal processing [37] and continuing to the present. The edge detection
methods considered in this thesis are related to the work of Sobel [36, 51], Marr and
Hildreth [45] and Canny [13].

In addition, anisotropic approaches have been proposed [69], i.e. orientation selective
edge detectors. These filters have many applications, for example in texture analysis
or in the design of steerable filters that efficiently control the orientation and
scale of filters to extract certain features in an adaptive way [25, 49]. Many of these
approaches are motivated by early human vision. In their investigation of the visual
cortex, Hubel and Wiesel discovered orientation selective cells in the striate cortex
V1 [33]. Several theories assume that humans perceive low-level features such as edges
or lines through combinations of the responses of these cells [27]. Many computer
vision researchers have adapted the idea of orientation selective cells to filters
which can be combined to produce a certain response. Such sets of filters are often
called filter banks. Malik and Perona [44] used a filter bank based on even symmetric
differences of offset Gaussians (DOOG) for texture discrimination.

The discrete pixel grid resolution of CCD camera images limits the measuring accuracy.
Thus, several techniques have been proposed that compute subpixel edge positions [6,
41, 66, 56, 71].

A common task in vision applications is to determine whether a particular pattern is
part of an image, and if so, where it is located [28]. Template matching is one method
to tackle this problem. Cross-correlation techniques are widely used as a measure of
similarity [64, 18, 62]. In stereo vision, correlation is used to solve the problem of
correspondences between the left and right view [21, 65]. Other practical applications
can be found in feature trackers, pattern recognition, or the registration of e.g.
medical image data.
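
As an illustration of cross-correlation as a similarity measure, the following minimal
sketch (NumPy; not code from the thesis) computes the normalized cross-correlation of a
template against an image patch at one position:

    import numpy as np

    def ncc(image: np.ndarray, template: np.ndarray, top: int, left: int) -> float:
        """Normalized cross-correlation of the template with the image patch at
        (top, left). Returns a value in [-1, 1]; 1 indicates a perfect match."""
        h, w = template.shape
        patch = image[top:top + h, left:left + w].astype(float)
        t = template.astype(float)
        patch -= patch.mean()
        t -= t.mean()
        denom = np.sqrt((patch ** 2).sum() * (t ** 2).sum())
        return float((patch * t).sum() / denom) if denom > 0 else 0.0

Sliding this score over all image positions and taking the maximum yields the best
template location.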

Accurate visual measurements often require a camera calibration step to relate 3D
points in the real world to image coordinates and to compensate for lens distortions.
One early approach was presented by Tsai [39, 67]. An extensive introduction to
calibration is given by Faugeras [21] and by Hartley and Zisserman [30]. The
calibration approach in this thesis work is closely related to the work of Zhang [74]
and Heikkilä and Silvén [31].

1.5. Thesis Outline

The remainder of this thesis is organized as follows: Chapter 2 provides the
theoretical background on models and techniques used in later sections with regard to
measuring with video cameras. This chapter also gives an overview of different
illumination techniques used in machine vision applications.

In Chapter 3, the physical design of the system is introduced. In particular, the
camera and lens selection as well as the illumination setup are discussed in detail.

The vision part of the system is presented in Chapter 4. After describing the
assumptions and model knowledge used throughout the inspection, the different steps of
the length measuring are proposed. This chapter also covers the calibration and
teach-in of the system as well as the algorithms and techniques used to perform the
measuring task with respect to real-time demands.

The system is systematically evaluated in Chapter 5. To this end, several quantitative
and qualitative evaluation criteria as well as different test scenarios are introduced.
The automated measurements are compared to human measurements in terms of accuracy and
precision. Finally, the results are discussed, and ideas for future work are given. The
thesis concludes with a summary of the presented work in Chapter 6.




2. Technical Background

2.1. Visual Measurements

This section introduces the basic concepts and techniques that make visual measurements
possible. Understanding the fundamental process of image acquisition, as well as the
underlying camera models and geometries, is essential for understanding which
parameters influence the measurement of real-world objects in video images. Based on
these concepts, one can determine the factors that influence accuracy and precision.

Extracting information about real-world objects from images in machine vision
applications is closely related to the area of photogrammetry. In [5], photogrammetry
is defined as the art, science, and technology of obtaining reliable information about
physical objects and the environment through the processes of recording, measuring, and
interpreting photographic images and patterns of electromagnetic radiant energy and
other phenomena.

There are many traditional applications of photogrammetry in geography, remote sensing,
medicine, archaeology, or crime detection. In machine vision applications, there is a
wide range of measuring tasks, including dimensional measuring (size, distance,
diameter, etc.) or angles. Although sophisticated algorithms can increase accuracy, the
quality and repeatability of measurements are always related to the hardware used (e.g.
camera sensor, optical system, digitizer) as well as the environmental conditions (e.g.
illumination).

2.1.1. Accuracy and Precision

Throughout this thesis, the terms accuracy and precision are used quite often and are
mostly related to measuring quality. Although these terms may be used synonymously in a
different context, with respect to measurements they have very distinct meanings.

Accuracy relates a measured length to a known reference truth or ground truth. The
closer a measurement approximates the ground truth, the more accurate the measuring
system is. Precision represents the repeatability of measurements, i.e. how much
different measurements of the same object vary. The more precise a measuring system is,
the closer the measured values lie together.

Figure 2.1 visualizes the definition of accuracy and precision in a mathematical sense.
The distribution of a set of measurements can be expressed in terms of a Gaussian
probability density function. The peak of this distribution corresponds to the mean
value of the measurements. The distance between the mean value and the reference ground
truth value determines the accuracy of the measurement. The standard deviation of the
distribution can be used as a measure of precision.

It is important to state that accuracy does not imply precision, and vice versa. For
example, the measuring result for a tube of 50mm length could be 50 ± 20mm. This
statement is very accurate but not very precise. On the other hand, a measuring system
can be very precise, but not accurate, if it is not calibrated correctly. Thus, good
measurements for industrial inspection tasks have to be both accurate and precise.
for industrial inspection tasks have to be both accurate and precise.<br />

9


10 CHAPTER 2. TECHNICAL BACKGROUND<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

0<br />

49.5 49.6 49.7 49.8 49.9 50 50.1 50.2 50.3 50.4 50.5<br />

Reference<br />

Value<br />

Figure 2.1: Visualization of the difference between accuracy and precision in terms of measurements.<br />

A good measuring system must be both accurate and precise.<br />

2.1.2. Inverse Projection Problem

A general problem of human vision, denoted the inverse projection problem [27], also
applies to artificial systems. It states that the (perspective) projection of
three-dimensional world objects onto a two-dimensional image plane cannot be inverted
in a well-defined way. The loss of a dimension implies a loss of information, which in
general cannot be compensated, since different origins can produce the same stimulus on
the human retina or the camera sensor. Thus, several objects of different size or shape
can look identical in an image. One important property to consider in this context is
the influence of perspective. The term perspective is discussed further in Section
2.1.3.

Humans can compensate for the inverse projection problem in many situations by certain
heuristics and model knowledge of the scene. Similar techniques can be adapted to
artificial systems. Especially in machine vision applications, where conditions are
well-defined and known, model knowledge of the inspection task can be derived and
integrated.

2.1.3. Camera Models

There are several approaches to model the geometry of a camera. Addressing all these
models is outside the scope of this thesis. In the following, only the most common
camera models are introduced, which provide a theoretical basis for visual measurements
with CCD cameras.

Pin Hole Camera The simplest form of a camera, known as the camera obscura, was
invented in the 16th century. The underlying principle of this camera was already known
long before by Aristotle (384-322 BC): Light enters an image plane through an (ideally)
infinitely small hole, so only one ray of light from the world passes through the hole
for each point in the 2D image plane, leading to a one-to-one correspondence. Objects
at a wide range of distances from the camera can be imaged sharply and without
distortion [65, 73]. The camera obscura is formally named the pinhole camera. In the
non-ideal case, the pinhole has a finite size; thus each image point collects light
from a cone of rays.


Figure 2.2: Parallel lines intersect at the horizon under perspective. Image taken by
F. Wagenfeld on the Alaska Highway between Watson Lake and Whitehorse, Canada.

In the 15th century, Filippo Brunelleschi used the pinhole camera model to demonstrate
the laws of perspective discovered earlier [24, 38]. Two main effects characterize the
pinhole perspective or central perspective:

- Close objects appear larger than far ones
- Parallel lines intersect at the horizon

Figure 2.2 visualizes these effects of perspective with an example.

A drawback of the pinhole camera with respect to practical use in combination with a
photosensitive device is its long exposure time, since only a small amount of light
enters the image plane at one time [65]. However, the pinhole model can be used to
derive fundamental properties, in a mathematical sense, that describe the imaging
process. These properties can be extended by more realistic models to account for real
imaging devices.
properties can be extended by more realistic models to imply real imaging devices.<br />

Figure 2.3(a) gives an overview of the pinhole geometry. The camera center O, also
denoted as the optical center or center of projection, is the origin of a 3D coordinate
system with the axes X, Y and Z. This 3D coordinate system is denoted as the camera
reference frame or simply camera frame. The image plane Π_I is defined to be parallel
to the XY plane, i.e. perpendicular to the Z axis. The point o where the Z axis
intersects the image plane is referred to as the image center. The Z axis, i.e. the
line through O and o, is denoted as the optical axis.

The fundamental equations of a perspective camera describe the relationship between a
point P = (X, Y, Z)^T in the camera frame and a point p = (x, y)^T in the image plane:

    x = f \frac{X}{Z}    (2.1)

    y = f \frac{Y}{Z}    (2.2)

where f is the focal length of the camera. p can be seen as the point of intersection
of a line through P and the center of projection with the image plane Π_I [30]. This
relationship can easily be derived from Figure 2.3(b). In the following, lower-case
letters will always indicate image coordinates, while upper-case letters refer to 3D
coordinates outside the image plane.

Figure 2.3: (a) Pinhole geometry. (b) Projection of a point P in the camera frame onto
the image plane Π_I (here with regard to Y).
image plane.<br />

Weak-Perspective Camera If the relative distance between points in the camera frame
with respect to the Z axis (scene depth) is small compared to the average distance from
the camera, these points are projected onto the image plane approximately as if they
were all lying on one Z-plane Z_0. Thus, the Z coordinate of each point can be
approximated by Z_0 as:

    x \approx f \frac{X}{Z_0}, \qquad y \approx f \frac{Y}{Z_0}    (2.3)

This has the effect of all points being projected with a constant magnification [24].
If the distance between the camera and the plane Z_0 increases to infinity, there is a
direct mapping between 3D points in the camera frame and points in the image plane:

    x = X, \qquad y = Y    (2.4)

This projection is denoted as orthographic projection [65]. To overcome the described
problems of pinhole cameras, real imaging systems are usually equipped with a lens,
which collects rays of light and brings them into focus on the image plane.

Thin Lens Camera The simplest optical system can be modeled by a thin lens. The main
characteristics of a thin lens are [65]:

- Any ray entering the lens parallel to the axis on one side goes through the focus on
  the other side
- Any ray entering the lens from the focus on one side emerges parallel to the axis on
  the other side

Figure 2.4: Thin lens camera model.

The geometry of a thin lens imaging system is shown in Figure 2.4. F and F̂ are the
focus points in front of and behind the lens. From this model one can derive the
fundamental equation of thin lenses [65]:

    \frac{1}{Z} + \frac{1}{z} = \frac{1}{f}    (2.5)

where Z is the distance or depth of a point to the lens and z the distance between the
lens and the image plane. The focal length f, i.e. the distance between the focus point
and the lens, is equal on both sides of the thin lens in the ideal model.
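
As a worked example (illustrative numbers, not from the thesis), solving equation (2.5)
for the image distance z with a focal length f = 16mm and an object at depth
Z = 200mm:

    z = \left( \frac{1}{f} - \frac{1}{Z} \right)^{-1}
      = \left( \frac{1}{16\,\mathrm{mm}} - \frac{1}{200\,\mathrm{mm}} \right)^{-1}
      \approx 17.4\,\mathrm{mm}

so the image plane must sit slightly farther from the lens than the focal length to
bring the object into focus.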

Thick Lens Camera Real lenses are represented much better by a thick lens model. The
thin lens model does not consider several aberrations that come with real lenses. These
include defocusing of rays that are neither parallel to the axis nor pass through the
focus (spherical aberration), different refraction depending on the wavelength or color
of light rays entering the lens (chromatic aberration), and the focusing of objects at
different depths. Another factor that is important for accurate measuring applications
with real lenses is lens distortion. Ideally, a world point, its image point and the
optical center are collinear, and world lines are imaged as lines [30]. For real
cameras this model does not hold. Especially at the image boundaries, straight lines
appear curved (radially distorted). The effect of distortion will be addressed again in
the following sections.

2.1.4. Camera Calibration

Until now, all relationships between 3D points and image coordinates have been defined
with respect to a common (camera) reference frame. Usually, the location of a point in
the world is not known in camera coordinates. Thus, if one wants to relate world
coordinates to image coordinates, or vice versa, one has to consider geometric models
and physical parameters of the camera. At this stage, one can distinguish between
intrinsic and extrinsic parameters [24].

Intrinsic Parameters The intrinsic parameters describe the projection of a point in the
camera frame onto the image plane, i.e. the transformation of camera coordinates into
image coordinates. This transformation extends the ideal perspective camera model
introduced in the previous section with respect to the properties of real CCD cameras.
One can derive the following projection matrix M_i:

    M_i = \begin{pmatrix} -f/s_x & k & o_x \\ 0 & -f/s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}    (2.6)

where f represents the focal length, s_x and s_y the effective pixel size in x and y
direction respectively, k the skew coefficient, and (o_x, o_y) the coordinates of the
image center. \alpha = s_y / s_x is the aspect ratio of the camera. If \alpha = 1, the
sensor elements of the CCD array are ideally square. The skew coefficient k determines
the angle between the pixel axes and is usually zero, i.e. the x and y axes are
perpendicular. (o_x, o_y) can be seen as an offset that translates the projection of
the camera origin onto the image origin in pixel dimensions. If s_x = s_y = 1 and
o_x = o_y = k = 0, M_i represents an ideal pinhole perspective camera.

Extrinsic Parameters The extrinsic parameters take the transformation between a fixed
world coordinate system (or object coordinate system) and the camera coordinate system
into account. This comprises the translation and rotation of the coordinate axes [65],
i.e. a translation vector T = (T_x, T_y, T_z)^T and a 3 x 3 rotation matrix R, such
that:

    M_e = \begin{pmatrix} r_{11} & r_{12} & r_{13} & -R_1^T T \\ r_{21} & r_{22} & r_{23} & -R_2^T T \\ r_{31} & r_{32} & r_{33} & -R_3^T T \end{pmatrix}    (2.7)

where r_{ij} (i, j \in {1, 2, 3}) are the elements of R at (i, j) and R_i denotes the
ith row of R.

Thus, the relationship between world and image coordinates can be written in terms of
two matrix multiplications [65]:

    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_i M_e \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}    (2.8)

with (X, Y, Z, 1)^T representing a 3D world point in homogeneous coordinates; image
coordinates are then computed as x = x_1/x_3 and y = x_2/x_3, respectively.
M = M_i M_e is denoted as the projection matrix in the following.
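
The following minimal sketch (NumPy; all parameter values are illustrative assumptions)
assembles M_i and M_e as in equations (2.6)-(2.8) and projects a homogeneous world
point to pixel coordinates:

    import numpy as np

    f, sx, sy = 16.0, 0.01, 0.01     # focal length and pixel size in mm (assumed)
    k, ox, oy = 0.0, 320.0, 240.0    # skew and image center in pixels (assumed)

    Mi = np.array([[-f / sx, k,       ox],
                   [0.0,     -f / sy, oy],
                   [0.0,     0.0,     1.0]])

    R = np.eye(3)                        # camera aligned with the world frame
    T = np.array([0.0, 0.0, -500.0])     # world origin 500mm in front of the camera
    Me = np.hstack([R, (-R @ T).reshape(3, 1)])   # 3x4 extrinsic matrix, eq. (2.7)

    P = np.array([30.0, 10.0, 0.0, 1.0])          # homogeneous world point
    x1, x2, x3 = Mi @ Me @ P                      # eq. (2.8)
    x, y = x1 / x3, x2 / x3                       # resulting pixel coordinates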

Image Distortion The resulting image coordinates may be distorted by the lens, i.e.
linear projection is not guaranteed. If high accuracy and precision are required, the
simple mathematical relationships introduced before are not sufficient.

To overcome this effect, a model of the distortion has to be defined. A common radial
distortion model [30] can be written as:

    \begin{pmatrix} x_d \\ y_d \end{pmatrix} = L(\tilde{r}) \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix}    (2.9)

where (\tilde{x}, \tilde{y})^T is the undistorted and (x_d, y_d)^T the corresponding
distorted image position. The function L(\tilde{r}) determines the amount of distortion
depending on the radial distance \tilde{r} = \sqrt{\tilde{x}^2 + \tilde{y}^2} from the
center of radial distortion.

The correction of the distortion at a measured position p = (x, y)^T can be computed
as:

    \hat{x} = x_c + L(r)(x - x_c)    (2.10)

    \hat{y} = y_c + L(r)(y - y_c)    (2.11)

where (\hat{x}, \hat{y})^T is the undistorted (corrected) position, (x_c, y_c)^T the
center of the radial distortion, and r = \sqrt{(x - x_c)^2 + (y - y_c)^2} the radial
distance between p and the center of distortion.

An arbitrary distortion factor L(r) can be approximated by the following equation [30]:

    L(r) = 1 + \sum_{i=1}^{m} \kappa_i r^i    (2.12)

which is defined for r > 0 with L(0) = 1. The distortion coefficients \kappa_i as well
as the center of radial distortion (x_c, y_c)^T can be seen as additional intrinsic
parameters of the camera model. The number of coefficients m depends on the required
accuracy and the available computation time. Usually, no more than the first three or
four coefficients are considered. In common calibration procedures, such as the
calibration method proposed by Tsai [67], only the even coefficients (i.e. \kappa_2,
\kappa_4, ...) are taken into account, while the odd coefficients are set to zero. In
this case, one or two coefficients are sufficient to compensate for the distortion in
most cases [31].
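
A minimal sketch of the radial correction in equations (2.10)-(2.12) (NumPy; the
coefficient values are purely illustrative), using only even coefficients as in
Tsai-style models:

    import numpy as np

    def undistort_point(x, y, xc, yc, kappa):
        """Correct a radially distorted image point (eqs. 2.10-2.12).

        `kappa` maps the coefficient index i to kappa_i, e.g. {2: 1e-7, 4: 1e-13}.
        """
        r = np.hypot(x - xc, y - yc)                       # radial distance to center
        L = 1.0 + sum(k * r**i for i, k in kappa.items())  # eq. (2.12)
        return xc + L * (x - xc), yc + L * (y - yc)        # eqs. (2.10), (2.11)

    # Correct a point measured near the image corner (assumed coefficients):
    x_hat, y_hat = undistort_point(600.0, 420.0, 320.0, 240.0, {2: 1e-7, 4: 1e-13})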

Besides the radial distortion model, there are several other models, including
tangential, linear, and thin prism distortion [31]. Usually, a radial distortion model
is combined with a tangential model, as proposed in [11, 12].

There are several approaches to compute the unknown intrinsic and extrinsic parameters
of a camera. The most common methods are based on known correspondences between real
world points and image coordinates. A chessboard-like calibration grid has become quite
common as a calibration pattern. The corners of the grid provide a set of coplanar
points.

The world coordinates can easily be determined if one defines a coordinate system with
the X and Y axes lying orthogonally in the chessboard plane and Z = 0 for all points. A
corner not too close to the center represents the world origin. Based on these
definitions, each corner of the calibration pattern can be described in the form
(X, Y, Z)^T. Three-dimensional calibration rigs composed of orthogonal chessboard
planes are also used quite often.

In a captured image of the calibration pattern, the corners can be extracted at pixel
(or subpixel) level and mapped to world coordinates. If there is a sufficient number of
correspondences, one can try to solve a homogeneous linear system of equations based on
the projection matrix M. The solution is also denoted as an implicit camera
calibration, since the resulting parameters do not have any physical meaning [31]. In
the next stage, the intrinsic and extrinsic camera parameters can be extracted from the
computed solution of M [21].

There are linear and nonlinear methods to solve for the projection matrix. Linear methods assume an ideal pinhole camera and ignore distortion effects. Thus, these methods can be solved in closed form. Abdel-Aziz and Karara [1] introduced a direct linear transform (DLT) to compute the parameters in a noniterative algorithm. If higher accuracy is needed, nonlinear optimization techniques have been investigated to account for distortion models. Usually, the parameters are estimated by minimizing, in a least-squares sense, the pixel error between a measured point correspondence and the reprojected position of the world point using the projection matrix. This is an iterative process that may end up with a bad solution unless a good initial guess is available [70]. Therefore, linear and nonlinear methods are combined, as the DLT can be used for initialization of the nonlinear optimization. One well-known two-step calibration method was proposed by Tsai [67, 39].

2.2. Illumination

In machine vision applications, the right choice of illumination can simplify the further image processing considerably [16]. It can preprocess the input signal (e.g. enhance contrast or object boundaries, eliminate background, diminish unwanted features etc.) without consuming any computational power. On the other hand, even the best imaging sensor cannot compensate for the loss in image quality induced by poor illumination.

There are several different approaches to illumination in MV. Depending on the application, one has to consider what has to be inspected (e.g. boundaries, surface patterns or color), what the material properties of the objects to be inspected are (e.g. lighting reflection characteristics or translucency), and what the environmental conditions are (e.g. background characteristics, object dimension, camera position or the available space to install light sources). In the following, different types of light sources and lighting setups used in MV are introduced.

2.2.1. Light Sources

The light sources commonly used in machine vision include high-frequency fluorescent tubes, halogen lamps, xenon light bulbs, laser diodes and light emitting diodes (LED) [18].

High-frequency fluorescent lights High-frequency fluorescent light sources are widely used in machine vision applications, since they produce a homogeneous, uniform, and very bright illumination. They provide white or ultraviolet light with little heat development; thus, there is no need for fan cooling.

Standard fluorescent light tubes are not suitable for vision applications since they flicker cyclically with the power supply frequency. This yields unwanted changes in intensity or color in the video image, and the effect increases if the capturing rate of the camera is close to the power supply frequency (e.g. 50Hz in Germany). High-frequency fluorescent tubes alternate at about 25kHz, which is far beyond what can be captured by a video camera.

Fluorescent lights exist in different sizes, shapes, and setups. Beside the common light tube, there are also fluorescent ring lights or rectangular area lights. Low cost and a long lifetime make fluorescent lights even more attractive.

Light Emitting Diodes An LED is a semiconductor device that emits incoherent, monochromatic light, with the wavelength depending on the chemical composition of the semiconductor. Today, different wavelengths of the spectrum visible to humans, ranging from about 400 to 780nm, as well as ultraviolet or infrared wavelengths, can be covered by LEDs. The emitted visible light appears for example red, green, blue or yellow. Furthermore, it is possible to produce LEDs that appear “white” by combining a blue LED with a yellowish phosphor coating.

LEDs have many advantages compared to other light sources. Due to their small size, they can be used for a variety of lighting geometries [18]. This includes ring lights, dome lights, area or line lights, spot lights, dark-field lights and backlights. Theoretically, each single LED in a cluster can be controlled independently. Thus, it is possible to generate different illumination conditions (for example different lighting angles or intensities) with a single setup by enabling and disabling certain LEDs, e.g. automated or software controlled. It is also possible to use LEDs in strobe light mode.

Another advantage of LEDs is their energy efficiency and long lifetime with only little loss in intensity over time. Thus, LEDs have low maintenance costs. Operated at DC power, LEDs do not produce any flickering visible as intensity changes in the video image.

Halogen lights Halogen lamps are an extension of light bulbs and are filled with a halogen gas (e.g. bromine or iodine). With respect to machine vision applications, halogen lamps are often used in combination with fiber optic light guides [18]. The emitted light of the light source is transferred through these fiber optic light guides, allowing for very flexible illumination setups and geometries. This includes ring lights, dome lights, area or line lights, spot lights, dark-field lights and backlights, as for LEDs. Furthermore, there is a range of fiber optic bundles in different sizes to route and position the light for user-defined lighting.

One disadvantage of halogen lamps is their considerable heat development. Thus, active cooling is usually required. Nevertheless, due to the bright “white” light emitted by halogen lights (color temperature of about 6000K), they are also called cold light sources in the literature. If heat development of the light source can be harmful to heat sensitive objects, fiber optics can be useful to keep the light source away from the point of inspection. Like LEDs, halogen lamps do not produce flickering effects if the light source is DC-regulated. Thus, halogen lamps qualify for high accuracy inspection tasks.

Xenon lights, often used for strobe light mode, are quite similar to halogen lamps. These lights allow for very short and bright light pulses, which are used to reduce the effect of motion blur.

Beside the different ways of light generation, there are multiple possible setups of how light sources are arranged. Especially LED lights and fiber optics are very flexible, as introduced before. They can be adapted to a wide range of machine vision tasks at almost any size and geometry.


Figure 2.5: Incident lighting setups. (a) Indirect diffuse illumination over a hemisphere. (b) Diffuse ring- or area light setup. (c) Darkfield illumination. (d) Coaxial illumination.

2.2.2. Incident Lighting

Incident lighting or front lighting is characterized by one or more light sources illuminating the object of interest from the camera's viewing direction. This includes diffuse front lighting, directional front lighting, polarized light, axial/in-line illumination and structured lighting [18]. Figure 2.5 gives an overview of different incident lighting setups.

Diffuse Lighting Diffuse lighting reduces specular surface reflections and can be seen as a uniform, undirected illumination. It is usually generated by one or more light sources that are placed behind a diffuser at a certain distance. This yields the effect of one uniform area of light. A diffuser can be a pane of white translucent (acrylic) glass, mylar or another synthetic material. Instead of using a diffuser in front of a light source, indirect lighting can also result in diffuse illumination. A simple but effective method reported in the literature [7] converts a Chinese wok into a hemisphere for diffuse lighting. The inner side of the wok is painted white. The camera can be placed at a hole on the top of the hemisphere (the bottom of the wok). The light sources are arranged such that they cannot directly illuminate the object; instead, the emitted light is reflected at the white screen inside the hemisphere (see Figure 2.5(a)). A diffuse illumination can also be achieved using a ring or area light source as in Figure 2.5(b).

Directional Lighting Directional lighting is achieved by one or more directed light sources at a very low angle of incidence. The main characteristic of this type of illumination is the effect that completely smooth objects appear dark in the image, since the light rays are not reflected toward the camera, while unevenness leads to brighter image intensities. Due to this effect, directional lighting is also denoted as dark field illumination in the literature [18] (see Figure 2.5(c)). Directional lighting mostly qualifies for surface inspection tasks that consider the surface structure, revealing irregularities or bumpiness.

Polarized light In combination with a polarizing filter in front of the camera lens, incident lighting with polarized light can be used to avoid specular reflections. Such reflections preserve the polarization of a light ray; thus, with the right choice of filter, only scattered light rays can pass the filter and reach the camera. A maximal filter effect can be reached if the polarization of the light source and the filter are perpendicular to each other. Polarized light is often combined with a ring light setup to avoid both shadows and reflections.

Structured lighting Structured lighting is used to obtain three-dimensional information about objects. A certain pattern of light (e.g. crisp lines, grids or circles [18]) is projected onto the object. Based on deflections of this known pattern in the image, one can infer the object's three-dimensional characteristics. For example, in [58], a 3D scanner using structured lighting is presented that integrates a real-time range scanning pipeline. In machine vision applications, structured lighting can be used for dimensional measuring tasks where the contrast between object and background is poor.

Axial illumination In this type of illumination setup (see Figure 2.5(d)), also denoted as coaxial illumination in the literature, the light rays are directed to run along the optical axis of the camera [18]. This is achieved using an angled beam splitter or half-silvered mirror in combination with a diffuse light source. The beam of light usually has the same size as the camera's field of view. The main application of axial illumination systems is to illuminate highly reflective, shiny materials such as plastic, metal or other specular materials, or for example to inspect the inside of bore holes. Axial illumination is typically used for inspection of small objects such as electrical connectors or coins.

One potential problem with most incident lighting methods are shadows. The shadow contrast can be lowered using several light sources at different positions around the object (e.g. ring lights) or axial illumination setups. Nevertheless, objects with sharp corners or concavities might have regions that cannot be illuminated; therefore, especially regions close to the object's boundaries appear darker in the image. Thus, dark objects on a bright background may appear enlarged [16]. The effect of shadows is less significant for bright objects on a dark background. In applications that require totally shadow-free conditions for highly accurate measurements of object contours, another lighting setup called back lighting can be used, as introduced in the following.

2.2.3. Back lighting

The setup where the object is placed between the light source and the camera, as opposed to incident lighting, is denoted as back light illumination. In this arrangement, the light enters the camera directly, leading to bright intensity values at non-occluded regions. The object, on the other hand, casts a shadow on the image plane, thus leading to darker intensity values. Non-translucent materials result in a very strong, shadow-free contrast, which makes back lighting interesting for dimensional measuring tasks. Furthermore, surface structures or textures can be suppressed. If the only light source is placed below the object, there will be no shadows around it. Back lighting can also be used for the localization of holes and cracks, or for measuring translucency.

In combination with polarized light, back lighting can also be adapted to enhance the contrast of transparent materials, which are difficult to detect in an image under other lighting setups. In a typical scenario, polarized light entering the camera directly is filtered out by an adequate polarization filter in front of the camera lens, while the polarization of the light is changed when passing through the object. Thus, in contrast to back lighting without polarization, background regions appear dark in the image while (translucent) objects result in brighter intensities. Figure 3.9 in Section 3.3 visualizes the effect of back lighting in combination with a polarization filter.

2.3. Edge Detection

An edge can be defined as a particularly sharp change in (image) brightness [24], or, more mathematically speaking, a strong discontinuity of the spatial image gray level function [36].

Beside edges due to object boundaries, there are many more causes for edges in images, such as shadows, reflectance, texture or depth. Thus, simply extracting edges in images is no general indicator for object boundaries. To yield a semantic meaning, edge information can be combined with other features including shape, color, texture, or motion. Model knowledge about expected properties can be useful to group these low-level features into objects.

In real images there are many changes in brightness (or color), but with respect to a certain application it may be of interest to extract only the strongest edges or edges of a certain orientation. Thus, information such as edge strength and orientation has to be taken into account to link the results of the filter response. Furthermore, in real images there is also a certain amount of noise in the data, which has to be handled carefully.

2.3.1. Edge Models

Edges can be modeled according to their intensity profiles [65]. The two edge models considered in this thesis are shown in Figure 2.6.

The ideal step edge is the basis for most theoretic approaches. It can be defined as:

$$E_{ideal}(x) = \begin{cases} i_1, & x < 0 \\ i_2, & x \geq 0 \end{cases} \qquad (2.13)$$


Figure 2.6: (a) Ideal step edge model. (b) Ramp edge model.

In real images, sampled edges typically extend over several pixels and are better described by the ramp edge model. The slope of the ramp can be exploited by subpixel techniques to reach a higher precision than the discrete pixel grid (see Section 2.3.4). Ramp edges also appear if an object is not in focus, or if imaged in motion (motion blur).

There are three common criteria for optimal edge detectors proposed by Canny:

- Good detection
- Good localization
- Uniqueness of response

The first criterion states that an optimal edge detector must not be affected by noise, i.e. it must be robust against false positives (edges due to noise). On the other hand, edges of interest have to be conserved.

The good localization criterion takes into account the precision of the detected edge position. The distance between the real edge location and the detected position must be minimal.

The last criterion requires distinct and unique results, where only the local maximum of an edge is relevant. Responses of more than one pixel describe an edge location only poorly and should be suppressed.

The Canny edge detector [13] is designed to optimize all three criteria (see Section 2.3.3 for more details). However, there is a definite tradeoff between the detection and the localization criterion, since it is not possible to improve both criteria simultaneously [65].

2.3.2. Derivative Based Edge Detection

A common way to localize strong discontinuities in a mathematical function is to search for local extrema in the function's first-order derivative, or for zero crossings in the second-order derivative. This principle can be easily adapted to images, thus replacing the problem of edge detection by a search for extrema or zero crossings.

In the discrete case, differentiation of the image gray level function f(x, y) can be approximated by finite differences. Since an image can be seen as a two-dimensional function, it can be differentiated in both the horizontal and vertical direction, i.e. with respect to the x- and y-axis respectively. Following the notation of [21], the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ can be calculated as:


$$\frac{\partial f}{\partial x}(x, y) \cong \Delta_x f(x, y) = f(x+1, y) - f(x, y) \qquad (2.14)$$

$$\frac{\partial f}{\partial y}(x, y) \cong \Delta_y f(x, y) = f(x, y+1) - f(x, y) \qquad (2.15)$$

The partial derivative operators $\Delta_x$ and $\Delta_y$ can be expressed by a discrete convolution of the image with the filter kernels $[\mathbf{1}\ {-1}]$ and $[\mathbf{1}\ {-1}]^T$ for the x- and y-direction respectively (the ‘center’ elements of the asymmetric kernels are printed in bold). Other approximations are possible, including the mirrored versions of the kernels above or a symmetric kernel $\frac{1}{2}[1\ 0\ {-1}]$ [36].

Accordingly, the second-order derivative can be approximated by the discrete operators $\Delta_x^2 = [1\ {-2}\ 1]$ and $\Delta_y^2 = [1\ {-2}\ 1]^T$.

In the presence of noise (as usual in real images), edge detectors using the approximations introduced before perform poorly. This is due to the fact that noise is mostly uncorrelated and is characterized by local changes in intensity. Assuming a uniform region, a good edge detector should result in a value of zero in this region. With noise, the local intensity variations lead to noticeable responses (and local extrema) if estimates of partial derivatives are used. Therefore, all common edge detectors include a certain smoothing step to reduce the influence of noise. The selection of the smoothing function, however, can differ between approaches. The most common smoothing function is a Gaussian.
differ between approaches. The most common smoothing function is a Gaussian.<br />

The Gaussian function is a widespread choice, since it comes with several advantages.<br />

This includes the property of a Gaussian that convolving a Gaussian with a Gaussian<br />

results in another Gaussian. Assume a Gaussian function G1 with standard deviation σ1<br />

and G2 with standard deviation σ2. The result of convolving G1 and G2 is a Gaussian<br />

with standard deviation σG1∗G2 :<br />

σG1∗G2 =<br />

�<br />

σ2 1 + σ2 2<br />

(2.16)<br />

Thus, instead of resmoothing a smoothed image to get a stronger smoothing, it is<br />

possible to use a single convolution with a Gaussian with larger standard deviation. This<br />

obviously saves computational costs, which is important since convolution is an expensive<br />

operation.<br />

Another advantage of a Gaussian kernel is its separability. This means a two-dimensional, circularly symmetric Gaussian function $G_\sigma(x, y)$ can be factored into two one-dimensional Gaussians (see [24]) as:

$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)\right) \left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right)\right) \qquad (2.17)$$

Since convolution is an associative operation, the same result can be achieved by convolving an image with the two-dimensional kernel, or by applying a convolution with the separated version in x-direction and convolving the result with the y-version. In practice, a convolution with a discrete N × N kernel can be replaced by two convolutions with N × 1 kernels. This increases the performance significantly for large images and N. More information about convolution and filter separation can be found for example in [64].
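To illustrate the separability argument numerically, the following sketch (assuming NumPy and SciPy, which are not part of the thesis software) smooths an image once with the full two-dimensional Gaussian kernel and once with two one-dimensional passes; both results agree up to floating point rounding.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gaussian_1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()                        # normalize to preserve brightness

sigma, radius = 2.0, 6
g1 = gaussian_1d(sigma, radius)
g2 = np.outer(g1, g1)                         # separable: 2D kernel = outer product

img = np.random.rand(240, 320)                # stand-in for a gray level image

full = convolve(img, g2, mode="nearest")      # one N x N convolution
sep = convolve1d(convolve1d(img, g1, axis=0, mode="nearest"),
                 g1, axis=1, mode="nearest")  # two N x 1 convolutions

print(np.max(np.abs(full - sep)))             # ~1e-16: identical up to rounding
```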

The general procedure of edge enhancement in common derivative-based edge detectors can be summarized into two steps:

1. Smoothing of the image by convolving with a smoothing function
2. Differentiation of the smoothed image

Mathematically, this can be expressed as follows (here with respect to x):

$$I_{edge}(x, y) = K_{\partial/\partial x} * (S * I(x, y)) = (K_{\partial/\partial x} * S) * I(x, y) = \frac{\partial S}{\partial x} * I(x, y) \qquad (2.18)$$
where $K_{\partial/\partial x}$ indicates the filter kernel approximating the partial derivative with respect to x, and S represents the kernel of the smoothing function. Again, the associativity of the convolution can be used to optimize processing. Thus, instead of first smoothing the image with kernel S and then calculating the partial derivative, it is possible to reduce the problem to a single convolution with the partial derivative of the smoothing kernel, $\partial S/\partial x$. Hence, the first-order derivative of a Gaussian is suited as an edge detector that is less sensitive to noise than finite difference filters [24]. The response of the edge detector can be parametrized by the standard deviation of the Gaussian to control the scale of detected edges, i.e. the level of detail. A larger σ suppresses high-frequency edges, for example.

2.3.3. Common Edge Detectors

Due to the large number of approaches, only a selection of common edge detectors can be presented in this section. Figure 2.7 visualizes the edge responses of different edge detectors that will be introduced in the following in more detail.

Sobel Edge Detector A very early edge detector that is still used quite often today is the Sobel operator. It was first described in [51] and attributed to Sobel. It is the smallest difference filter with an odd number of coefficients that averages the image in the direction perpendicular to the differentiation [36]. The corresponding filter kernels for x and y are:

$$\mathrm{SOBEL}_X = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \qquad (2.19)$$

$$\mathrm{SOBEL}_Y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \qquad (2.20)$$


Figure 2.7: Comparison of different edge detectors. (a) Common LENA test image. (b) Gradient magnitude based on the Sobel operator. (c) Edges enhanced via the discrete Laplace operator. (d) Result of the Canny edge detector (hysteresis thresholds: 150, 100).


These operators compute the horizontal and vertical components of a smooth gradient [21], denoted as $g_x$ and $g_y$ in the following. The total gradient magnitude g at a pixel position p in an image can be computed by the following equation:

$$g(p) = \sqrt{g_x^2(p) + g_y^2(p)} \qquad (2.21)$$

An example of the gradient magnitude based on the Sobel operator can be found in Figure 2.7(b). The following approximations can be used in order to save computational costs:

$$g(p) \approx |g_x(p)| + |g_y(p)| \qquad (2.22)$$
$$g(p) \approx \max(|g_x(p)|, |g_y(p)|) \qquad (2.23)$$

These approximations yield equally accurate results on average [22]. Beside the gradient magnitude, it is possible to compute the angle of the gradient as:

$$\phi(p) = \arctan\left(\frac{g_y(p)}{g_x(p)}\right) \qquad (2.24)$$

Although there is a certain angular error with the Sobel gradient [36], it is used very often in practice, since it provides a good balance between computational load and orientation accuracy [16]. Equations 2.21–2.24 are defined not only for the Sobel operator, but for every other operator that computes the horizontal and vertical gradient components.
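A brief sketch of Equations 2.19–2.24, assuming SciPy's built-in Sobel filter as a stand-in implementation (its kernel matches Equations 2.19/2.20 up to sign conventions, which do not affect the magnitude):

```python
import numpy as np
from scipy.ndimage import sobel

img = np.random.rand(240, 320)        # stand-in for a gray level image

gx = sobel(img, axis=1)               # horizontal gradient component (cf. Eq. 2.19)
gy = sobel(img, axis=0)               # vertical gradient component   (cf. Eq. 2.20)

g = np.sqrt(gx**2 + gy**2)            # gradient magnitude            (Eq. 2.21)
g_fast = np.abs(gx) + np.abs(gy)      # cheaper approximation         (Eq. 2.22)
phi = np.arctan2(gy, gx)              # gradient angle                (cf. Eq. 2.24)
```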

Canny Edge Detector Today, the Canny edge detector [13] is probably the most used edge detector, and it is proven to be optimal in a precise, mathematical sense [65]. It is designed to detect noisy step edges of all orientations and consists of three steps:

1. Edge enhancement
2. Nonmaximum suppression
3. Hysteresis thresholding

The first step is based on a first-order Gaussian derivative as introduced before. For fast implementations, the separability of the filter kernel can be used to improve the performance. Gradient magnitude and orientation can be computed as in Equations 2.21 and 2.24, or using the approximations. The standard deviation parameter σ of the Gaussian function influences the scale of the detected edges. A lower σ preserves more details (high frequencies), but also noisy edges, while a larger σ leaves only the strongest edges. The appropriate σ depends on the image content and what kind of edges should be detected.

The goal of the nonmaximum suppression step is to thin out ridges around local maxima and return a number of one pixel wide edges [65]. The dominant direction of the gradient calculated in step one determines the considered neighbors of a pixel. The gradient magnitude at this position must be larger than at both neighbors; otherwise it is not a maximum and its position is set to zero (suppressed) in the edge image.


In the last stage of the Canny edge detector, edge tracking combined with hysteresis thresholding is applied. Starting at a local maximum that meets the upper threshold of the hysteresis function, the algorithm follows the contour of neighboring pixels that have not been visited before and meet the lower threshold. Due to step two, a set of one-pixel wide contours is the output of the edge detection (see Figure 2.7(d) for an example with an upper threshold of 150 and a lower threshold of 100).

As in most cases, thresholding is always a tradeoff between false positives (in this case edges due to noise) and false negatives (suppressed or fragmented edges of interest). As with the standard deviation of the Gaussian in step one, the hysteresis thresholds have to be adapted depending on the particular image content. Methods for estimating the threshold parameters dynamically from image statistics are reported for example in [68] or [29]. There are many variations and extensions of the Canny edge detector. One popular approach motivated by Canny's work is the edge detector of Deriche [19].
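For illustration, OpenCV ships a ready-made implementation of this detector; the sketch below reproduces the threshold choice of Figure 2.7(d). The file name is hypothetical, and the Gaussian smoothing of step one is applied explicitly, since cv2.Canny itself only performs the gradient, suppression, and hysteresis steps.

```python
import cv2

img = cv2.imread("lena.png", cv2.IMREAD_GRAYSCALE)  # hypothetical test image
img = cv2.GaussianBlur(img, (5, 5), 1.0)            # step 1: Gaussian smoothing
edges = cv2.Canny(img, 100, 150)     # hysteresis thresholds as in Fig. 2.7(d)
cv2.imwrite("lena_canny.png", edges)
```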

Laplace The Laplace edge detector is a common representative of second-order derivative edge detectors. Recalling that edges are localized at zero crossings in the second-order derivative of an image's two-dimensional intensity function, the goal is to find zero crossings that are surrounded by strong peaks.

The Laplacian of a function can be seen as a sensible analogue to the second derivative and is rotationally invariant [24]. It is defined as:

$$\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \qquad (2.25)$$

As with first-order derivative edge detectors, a smoothing operation to reduce noise is performed before applying the edge detector, usually with a Gaussian. Analogous to Equation 2.18, the two steps can be combined by applying the Laplacian to the Gaussian smoothing kernel before convolution. This leads to an edge detector denoted as Laplacian of Gaussian (LoG), proposed by Marr and Hildreth [45]. It is quite common to replace the LoG with a Difference of Gaussians (DoG) [24] to reduce the computational load.

A discrete Laplace operator can be derived directly from the operators $\Delta_x^2$ and $\Delta_y^2$ as:

$$L_{\nabla^2} = \Delta_x^2 \oplus \Delta_y^2 = [1\ {-2}\ 1] \oplus [1\ {-2}\ 1]^T = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \qquad (2.26)$$

where the ⊕ operator denotes the tensor product [10] in this context. The result of the discrete Laplace operator applied to the LENA test image can be found in Figure 2.7(c). Edge detectors based on the Laplacian are isotropic, meaning the response is equal for all orientations [36]. One drawback of this approach is that second-order derivative based methods are much more sensitive to noise than gradient-based methods.
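A minimal sketch of the LoG approach, assuming SciPy: smoothing and second-order differentiation are combined in one filter, and edge candidates correspond to zero crossings of the response.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

img = np.random.rand(240, 320)          # stand-in for a gray level image

log = gaussian_laplace(img, sigma=2.0)  # LoG: Gaussian smoothing + Laplacian

# Zero crossings: sign changes between horizontal or vertical neighbors
zc = ((np.sign(log[:, :-1]) != np.sign(log[:, 1:]))[:-1, :] |
      (np.sign(log[:-1, :]) != np.sign(log[1:, :]))[:, :-1])
```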


Figure 2.8: Orientation selective filters based on rotated versions of a first derivative Gaussian; panels (a)–(c) show filters oriented at 0°, 90°, and 30° respectively (images taken from [25]).

Orientation Selective Edge Detection Until now, all presented approaches for edge detection have been more or less isotropic, but there are also many approaches that consciously exploit anisotropy, leading to orientation selective edge detectors. A good overview of anisotropic filters can be found for example in [69]. These filters have many applications, for example in texture analysis or in the design of steerable filters that efficiently control the orientation and scale of filters to extract certain features in an adaptive way.

An orientation selective filter can be generated from a rotated version of an elongated Gaussian derivative. Figure 2.8 shows an example of different filters that are mostly sensitive to 0°, 90°, and 30° oriented edges respectively. If many different orientations should be detected independently in one image, common optimizations exploit the associativity of the convolution operation. Instead of convolving the image with a large number of different orientation specific filters, the image is convolved with a few basis filters only. Then, an anisotropic response of an arbitrary orientation can be estimated via a weighted sum of the basis filter responses. For more information on the technical background of this approach, the reader is referred to the original papers [25, 49].

2.3.4. Subpixel Edge Detection

At image acquisition (e.g. with CCD cameras), light intensity is integrated over a finite, discrete array of sensor elements. Following the Sampling Theorem [36], this sampling can be seen as a low-pass filter on the incoming signal, cutting off high frequencies. Hence, strong edges, which can be seen as high-frequency content, may not be imaged precisely by the discrete grid. On the other hand, edge detectors that work on pixel level can detect the real edge position only roughly. The average localization error is 0.5 pixels, since the center of the real edge could be anywhere within the pixel [65].

In many applications such as high precision measuring tasks, edges detected at pixel grid accuracy are often not accurate enough. Thus, subpixel techniques have been developed to overcome the limits of discrete images and to compute continuous values that lie in between the sampled grid.


Figure 2.9: (a) Subpixel accuracy using bilinear interpolation. Pixel position P is a local maximum if the gradient magnitude of gradient g at P is larger than at the positions A and B respectively. These positions can be computed using bilinear interpolation between the neighboring pixels 0, 7 and 3, 4 respectively. The gradient direction determines which neighbors contribute to the interpolation. The edge direction is perpendicular to the gradient vector. (b) The discrete first derivative of a noisy step edge is approximated using cubic spline interpolation. The subpixel tube edge location is assumed to be at the maximum of the continuous spline function, which can lie in between two discrete positions (here at x = 9.5).

Interpolation is the most common technique to compute values between pixels by consideration of the local neighborhood of a pixel. This includes for example bilinear, polynomial, or B-spline interpolation. In [21], a linear interpolation of the gradient values within a 3 × 3 neighborhood around a pixel is proposed. Here, the gradient direction determines which of the 8 neighbors are considered (see Figure 2.9(a)). Since the gradient does not have to fall exactly on pixel positions on the grid, the gradient value is interpolated using a weighted sum of the two pixel positions that are next to the position where the gradient intersects the pixel grid (denoted as A and B in the figure). In a nonmaximum suppression step, the center pixel is classified as an edge pixel only if the gradient magnitude at this position is larger than at the interpolated neighbors. If so, the corresponding edge is perpendicular to the gradient direction.

Since the center pixel P still lies on the discrete pixel grid, one has to perform a second interpolation step if higher precision is needed. The image gradient within a certain neighborhood along the gradient direction (e.g. A-P-B) can be approximated for example by a one-dimensional spline function [17, 66]. Figure 2.9(b) shows an example of a noisy step edge between the discrete pixel positions 9 and 10 in x-direction. The discrete first derivative of the intensity profile is approximated with cubic splines. The extremum of this continuous function can theoretically be detected with an arbitrary precision, representing the subpixel edge position. However, there are obviously limits to what is still meaningful with respect to the underlying input data. In this example, a resolution of 1/10 pixel was used. The maximum is found at 9.5, i.e. exactly in between the discrete positions.
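The spline step of Figure 2.9(b) can be sketched as follows, assuming SciPy; the discrete derivative profile below is a made-up stand-in for measured data, constructed symmetric about x = 9.5.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical discrete first derivative of a step edge (cf. Figure 2.9(b))
x = np.arange(20)
d = np.array([1, 1, 2, 2, 3, 4, 6, 15, 40, 90,
              90, 40, 15, 6, 4, 3, 2, 2, 1, 1], dtype=float)

spline = CubicSpline(x, d)                 # continuous approximation
xs = np.arange(0.0, 19.0 + 1e-9, 0.1)      # sample at 1/10 pixel resolution
subpixel = xs[np.argmax(spline(xs))]       # maximum of the continuous function
print(subpixel)                            # 9.5: in between the discrete grid
```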

Rockett [56] analyzes the subpixel accuracy of a Canny implementation that uses interpolation by least-squares fitting of a quadratic polynomial to the gradient normal to the detected edge. He found that for high-contrast edges the edge localization reaches an accuracy of 0.01 pixels, while the error increases to about 0.1 pixels for low-contrast edges. Lyvers et al. [41] proposed a subpixel edge detector based on spatial moments of a gray level edge with an accuracy of better than 0.05 pixels for real image data. Aström [6] analyzes subpixel edge detection using stochastic models. A survey on subpixel measurement techniques can be found in [71].

2.4. Template Matching

A common task in vision applications is to search whether a particular pattern is part of an image, and if so, where it is located [28]. Template matching is one method to tackle this problem. The search pattern or template can be represented as an image and is usually considerably smaller than the inspected input image. The template is shifted over the input image and compared with the underlying values, and a measure of similarity is computed at each position. Positions reaching a high score are likely to match the pattern; or, the other way around, if the template matches at a certain location, the score has a maximum at this location.

A technique denoted as cross-correlation is widely used as a measure of similarity between image patches [64]. It can be derived from the sum of squared differences (SSD):

$$c_{SSD}(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left(T(i, j) - I(x+i, y+j)\right)^2 \qquad (2.27)$$

where I is the discrete image function and T the discrete template function. W and H indicate the template width and height respectively. Expanding the squared quantity yields:

$$c_{SSD}(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T^2(i, j) - 2\,T(i, j)\,I(x+i, y+j) + I^2(x+i, y+j) \qquad (2.28)$$

Since the template is constant, the sum over the template patch $T^2(i, j)$ is constant as well and does not contain any information on similarity. The same holds approximately for the sum over the image patch $I^2(x+i, y+j)$ if there are no strong variances in image intensity. Hence, the term $T(i, j)\,I(x+i, y+j)$ remains the only real indicator of similarity that depends on both the image and the template. This leads to the cross-correlation equation:

$$c(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T(i, j)\,I(x+i, y+j) \qquad (2.29)$$

It turns out that the correlation looks very similar to the discrete convolution. Indeed, the only difference between correlation and convolution is the sign of the summation in the second term [28]. Thus, theoretically, a correlation can be replaced by a convolution with a flipped version of the template [64]. Like convolution, correlation is an expensive operation if applied to large images and templates. In some cases it is faster to convert the spatial images into the frequency domain using the (discrete) Fast Fourier Transformation (FFT), multiply the resulting transform of one image with the complex conjugate of the other, and finally reconvert the result to the spatial domain using the inverse FFT [62, 64, 53].
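A sketch of this frequency-domain route, assuming NumPy: the correlation theorem states that cross-correlation corresponds to multiplying the transform of the image with the complex conjugate of the (zero-padded) template transform.

```python
import numpy as np

def cross_correlate_fft(image, template):
    """(Circular) cross-correlation (cf. Eq. 2.29) via the FFT."""
    F_img = np.fft.rfft2(image)
    F_tpl = np.fft.rfft2(template, s=image.shape)   # zero-pad to image size
    return np.fft.irfft2(F_img * np.conj(F_tpl), s=image.shape)

img = np.random.rand(240, 320)
tpl = img[100:120, 150:180].copy()        # template cut from the image itself
score = cross_correlate_fft(img, tpl)
print(np.unravel_index(np.argmax(score), score.shape))   # (100, 150)
```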

Unfortunately, the assumption of image brightness constancy is weak. If there is, for example, a bright spot in the image, the cross-correlation results in much larger values at this position than at darker regions. This may lead to incorrect matches. To overcome this problem, several normalized correlation methods have been introduced. One common measure is denoted as the correlation coefficient. It can be computed as:

$$c_{coeff}(x, y) = \frac{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left(T(i, j) - \bar{T}\right)\left(I(x+i, y+j) - \bar{I}(x, y)\right)}{W H \, \sigma_T \, \sigma_{I(x, y)}} \qquad (2.30)$$

where $\bar{T}$ represents the mean template brightness and $\bar{I}(x, y)$ the mean image brightness within the particular window at position $(x, y)$. $\sigma_T$ and $\sigma_{I(x,y)}$ indicate the standard deviation of the template and the image patch respectively. The resulting values lie in the range between −1 and 1. Obviously, the correlation coefficient is computationally more expensive. If the standard cross-correlation yields sufficiently accurate results in a certain application, it may be of interest to use a less expensive normalization that simply maps the results of the cross-correlation into the range of −1 to 1. This can be achieved with the following equation:

$$c'(x, y) = \frac{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T(i, j)\,I(x+i, y+j)}{\sqrt{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T(i, j)^2 \;\; \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} I(x+i, y+j)^2}} \qquad (2.31)$$

The term cross-correlation is usually used if two different images are correlated. If an image is correlated with itself, i.e. I = T, the term autocorrelation is commonly used [28].
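In practice, Equation 2.30 is rarely implemented by hand; OpenCV, for example, exposes the correlation coefficient as the TM_CCOEFF_NORMED mode of its template matching routine. The file names below are hypothetical.

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)     # hypothetical input
tpl = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)   # hypothetical template

# Correlation coefficient (cf. Eq. 2.30), normalized to [-1, 1]
score = cv2.matchTemplate(img, tpl, cv2.TM_CCOEFF_NORMED)
min_v, max_v, min_loc, max_loc = cv2.minMaxLoc(score)
print("best match at", max_loc, "with score", max_v)
```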

In practical applications it is often necessary to adapt the template by changing the orientation or scale to reach maximum matching results [28]. This increases the number of correlation operations, and thus the computational load. Therefore, optimization strategies are used that try to exclude as many positions as possible that are very unlikely to match a template.
to match a template.


3. Hardware Configuration

This chapter introduces the physical design of the visual inspection prototype. This includes the conveyor, the camera setup, the choice of illumination, as well as the blow out mechanism. Figure 3.1 gives an overview of the hardware setup of the prototype.

3.1. Conveyor

For the prototype, a 200cm long and 10cm wide conveyor is used to simulate a production line. It can be manually fitted with several tube segments, where the exact number depends on the target length and the distance between two consecutive segments. The measuring is performed at a certain area of the conveyor, denoted as the measuring area in the following. The field of view of the camera is adjusted to this area, as well as the illumination, as will be introduced in Sections 3.2 and 3.3 respectively.

The dimension of the measuring area depends on the size of the tubes to be measured. Therefore, with respect to the range of tube sizes, the measuring area is designed to cover the maximum tube size of 100mm in length and about 12mm in diameter. It must be even larger to be able to capture several images of each tube while it passes the visual field of the camera.

Since in production the tubes are cut to length from a continuous tube using a rotating knife (flying knife), there would not be a notable spacing between two consecutive tube segments if they were transferred to the measuring area at the same speed as they enter the knife. Thus, it can be difficult, both for humans and artificial vision sensors, to determine by looking where one tube starts and ends in the continuous line. To overcome this problem, after cutting, the tube segments have to fall onto another conveyor with a higher velocity to separate them. The faster the second conveyor is compared to the first one, the larger the gap.

Since processing time is expensive, the goal is to simplify the measuring conditions as much as possible using an elaborate hardware setup. One easy but effective simplification is to mount two guide bars on the conveyor that guarantee almost horizontally oriented tube segments. The guide bars are arranged like a narrow ‘V’ (see Figure 3.1(b)). The tubes enter the guide bars at the wider end and are adjusted into horizontal position while moving. At the measuring area the guide bars are almost parallel and just slightly wider than the diameter of the tubes. The distance between the guide bars can be easily changed using adjusting screws if the tube type changes.

The color and structure of the conveyor belt are crucial to maximize the contrast between objects and background for the inspection task. Therefore, a white-colored belt is used. The advantage of this choice with respect to the range of tube types to be inspected, in combination with the illumination setup, will be discussed in more detail in Section 3.3.


Figure 3.1: Hardware setup of the prototype in the laboratory environment. (a) Total view. (b) View on the measuring area.


3.2. Camera setup

Machine vision applications have high demands on the imaging system, especially if high accuracy and precision are required. The camera and optical system, i.e. the lens, have to be selected with respect to the particular inspection task. This section gives an overview of the imaging system used in this application and how it was selected.

3.2.1. Camera Selection

The main criteria for camera selection with respect to the application in this thesis are:

- Image quality
- Speed
- Resolution

The image quality is essential to allow for precise measurements. This includes a high signal-to-noise ratio, little or no cross-talk between neighboring pixels, and square pixel elements. As introduced in Section 1.3, the system is intended to work in continuous mode. Therefore, the speed, i.e. the possible frame rate, of the camera determines how many images of a tube can be captured within a given time period. Of course, this number also depends on the velocity of the conveyor. Especially at higher velocities, a fast camera is important, since the idea of multi-image measurements fails if the camera is not able to capture more than one evaluable image of each tube. The final frame rate should depend purely on the per frame processing time. This means the camera must be able to capture at least as many frames as can be processed; otherwise the camera would be a bottleneck. The frame rate of a camera is closely related to the image resolution. Higher resolutions mean a larger amount of data to be transferred and processed. Thus, there is a tradeoff between resolution and speed. A higher resolution means smaller elements on the CCD sensor array; hence, an object can be imaged in more detail. With respect to length measurements, the effective pixel size decreases at a higher resolution, and a pixel represents a smaller unit in the real world.

Three cameras have been tested and compared:

- Sony DFW VL-500
- AVT Marlin F-033C
- AVT Marlin F-046B

These cameras are all IEEE 1394 (FireWire) progressive scan CCD cameras.

The Sony camera has a 1/3” image device (Sony Wfine CCD) and provides VGA (640 × 480) resolution color images at a frame rate of 30 frames per second (fps). It is equipped with an integrated 12× zoom lens which can be adjusted via a motor.

The Marlin F-033C is a color camera with a maximum resolution of 656 × 492 pixels in raw mode, while the F-046B is a gray scale camera with a resolution of 780 × 582 pixels in raw mode. Both cameras have a 1/2” image device (Sony IT CCD). The Marlin cameras reach much higher frame rates compared to the Sony: at full resolution, the F-033C features 74 fps and the F-046B 53 fps respectively. Since these cameras do not come with an integrated optical system, a particular lens (C-Mount) must be provided additionally. A more detailed specification of the Marlin cameras can be found in Appendix B.1.

Figure 3.2: The sensor elements of single chip color cameras like the Marlin F-033C are provided with color filters, so that each sensor element gathers light of a certain range of wavelengths only, corresponding to red, green, and blue respectively. The arrangement of the filters (R1 G1 R2 G2 in the first row, G3 B1 G4 B2 in the second, with virtual points P1, P2, P3 at the positions where four cells meet) is denoted as BAYER mosaic. Interpolation is needed to compute the missing two channels at each pixel. Image taken from [3].

It turned out that the Sony camera is not suited for this particular application. The main reason is the limited frame rate of 30fps; thus, a new image is captured approximately every 30ms. As mentioned before, the camera speed should not be the bottleneck of the application. However, as will be shown in Section 5.3.8, the processing time of one image is significantly less than 30ms, which excludes the Sony camera for this particular application.

The Marlin cameras reach much higher frame rates and come with another advantage. Since the tube orientation can be considered horizontal due to the guide bars, as introduced in the previous section, one does not need the whole image height that a camera can provide. It is possible to reduce the image size to user-defined proportions, also denoted as the area of interest (AOI). This function is used to decrease the number of image rows to be transferred over the FireWire connection, while keeping the full resolution in the horizontal direction. For example, in a typical setup an image height of 160 pixels is large enough to include the whole region between the guide bars. The reduced image size is about 1/3 of the original size. Combined with a short shutter time, the reduced number of image rows increases the effective frame rate significantly, so it is possible to reach frame rates of > 100fps.

The decision whether to use the Marlin F-033C or the F-046B depends mainly on the question of whether color is a useful feature in this particular application. In general, single chip color cameras like the F-033C map a scene less accurately compared to gray scale cameras if image brightness is considered.

This is due to how these cameras are designed. Each sensor cell of a single chip color camera is provided with a color filter for either red (R), green (G), or blue (B). Without these filters the sensor cells are equal to those in gray scale cameras. Usually, the filters are arranged in a pattern denoted as BAYER mosaic (see Figure 3.2). Within each 2 × 2 region there are two green, one red, and one blue filter. This distribution is motivated by human vision and leads to more natural looking images, since the human optical system is most sensitive to green light. The drawback of this approach is that the resolution of each color channel is reduced. To overcome this problem, one has to interpolate the two missing color channels at each pixel position. There are several interpolation approaches, also denoted as BAYER demosaicing. With respect to speed it is important to use a not too expensive computation. The F-033C computes R-G-B values at virtual points Pi at the center of each local 2 × 2 neighborhood as follows [3]:

= R1<br />

P 1red<br />

P 1green = 1<br />

P 1blue<br />

P 2red<br />

P 2green = 1<br />

P 2blue<br />

P 3red<br />

P 3green = 1<br />

P 3blue<br />

2 (G1+G3)<br />

= B1<br />

= R2<br />

2 (G1+G4)<br />

= B1<br />

= R2<br />

2 (G2+G4)<br />

= B2<br />

(3.1)<br />

where the location of the different points can be found in Figure 3.2. Obviously, this<br />

interpolation technique reduces the resolution of the sensor in all channels, since values<br />

can be computed only at positions where four pixels meet and not at the boundaries of<br />

the image. 1<br />
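A rough NumPy sketch of this 2 × 2 interpolation scheme, assuming the filter layout of Figure 3.2 (even rows R G R G ..., odd rows G B G B ...); real driver code works on the raw sensor stream and is certainly more optimized than this illustrative loop.

```python
import numpy as np

def demosaic(raw):
    """Interpolate R, G, B at every point where four sensor cells meet (Eq. 3.1).

    Assumes the layout of Figure 3.2. The result is one row and one column
    smaller than the raw image, as noted in the text.
    """
    h, w = raw.shape
    out = np.zeros((h - 1, w - 1, 3), dtype=float)
    for y in range(h - 1):
        for x in range(w - 1):
            win = raw[y:y+2, x:x+2]            # local 2 x 2 neighborhood
            if y % 2 == 0 and x % 2 == 0:      # window holds R,G / G,B
                r, g, b = win[0, 0], 0.5 * (win[0, 1] + win[1, 0]), win[1, 1]
            elif y % 2 == 0:                   # G,R / B,G
                r, g, b = win[0, 1], 0.5 * (win[0, 0] + win[1, 1]), win[1, 0]
            elif x % 2 == 0:                   # G,B / R,G
                r, g, b = win[1, 0], 0.5 * (win[0, 0] + win[1, 1]), win[0, 1]
            else:                              # B,G / G,R
                r, g, b = win[1, 1], 0.5 * (win[0, 1] + win[1, 0]), win[0, 0]
            out[y, x] = (r, g, b)
    return out

rgb = demosaic(np.random.rand(6, 8))           # stand-in for raw sensor data
```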

If the inspection task can be performed on gray scale images, gray scale cameras should be used instead of color cameras. Intuitively, the accuracy of a color camera cannot be the same as that of a gray scale camera, because it requires two interpolation steps: first, one interpolates the R-G-B color channels as introduced before, and then one has to estimate the image brightness from these interpolated values. A gray scale camera offers a more direct transformation between light intensity and image values, thus leading not only to more accurate images, but also to higher frame rates. This can be supported by the following experiment.

A test image of graph paper has been captured once with the F-033C and once with the F-046B. A 16mm fixed focal length lens was used in each case, and the distance between camera and graph paper as well as the viewing direction were the same. The focus of the optical system was adjusted to obtain a sharp image in both cases. The results can be found in Figure 3.3. The color image in (a) has been converted into gray level values using the following equation:

$$I(x, y) = 0.299\,R(x, y) + 0.587\,G(x, y) + 0.114\,B(x, y) \qquad (3.2)$$

where R, G, and B represent the three color channels for red, green, and blue respectively, and I is the resulting gray level image.
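Equation 3.2 is the common luminance weighting of the R-G-B channels; as a one-line NumPy sketch:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 R-G-B image to gray levels using Eq. 3.2."""
    return rgb @ np.array([0.299, 0.587, 0.114])
```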

The grid appears sharper in the image of the gray scale camera, although the color image was also in focus during acquisition. The profiles of two scan lines of equal length through an edge of the grid (visualized in (b) and (d)) can be found in Figure 3.3(e).

¹ There are also color cameras that are provided with three-chip sensors. The incoming light is split into different wavelength ranges via a prism. Thus, each sensor yields a full resolution image of one color channel and interpolation is not necessary. These cameras, however, are quite expensive and could not be tested.


Figure 3.3: Comparison of the F-033C color and F-046B gray level camera. The test images show graph paper captured from a distance of approximately 250mm using a 16mm fixed focal length lens. (a) Color image of the F-033C. (b) Zoom view showing the location of the scan line through a grid edge in the converted gray scale image of (a). (c) Gray scale image acquired with the F-046B. (d) Zoom view showing the location of the scan line through a grid edge in (c). (e) Profiles of the two scan lines. The F-046B acquires a significantly sharper edge compared to the color camera, which can be seen at the slope of the edge ramp.


Figure 3.4: Color information of transparent tubes in HSV color space. Rows include, from top to bottom: color input image, hue channel, saturation channel, value (brightness) channel, and in the bottom row the computed gray scale image using Eq. 3.2. Although all images are taken from the same sequence, tubes and background can have very different color.

The positions of the scan lines correspond to the same real world location. It can be seen that both edges are ramp edges (see Section 2.3.1). The slope of the edge profile, however, is larger for the gray level camera, i.e. the edge can be located more precisely. This is an important advantage with respect to accurate measuring. Therefore, if color has no other significant advantage over gray scale images, a gray scale camera should be preferred in this application.

One can think of using color information to segment the transparent tubes from the background, since they appear yellowish or reddish while the conveyor belt should be white. For black tubes, color obviously has no significant benefit; hence, it is adequate to concentrate on the transparent tubes in this context.

The idea is to use color as a measure to distinguish between transparent tubes and the background, since here the gray scale contrast is lower compared to black tubes. However, as can be seen in Figure 3.4, in real images of transparent heat shrink tubes on a conveyor, the color of the conveyor belt can appear quite different. The test images have been taken from a sequence of tubes on a moving conveyor. The images have been illuminated via a back light setup, which will be introduced in Section 3.3. It can be observed that some regions of the same conveyor belt look yellowish, while others appear blueish in the image (see left column in Figure 3.4).

There are several color models beside the R-G-B model. Humans intuitively perceive and describe color experiences in terms of hue (chromatic color), saturation (absence of white) and brightness [27]. A corresponding color model is the H-S-V model, where H stands for hue, S for saturation, and V for (brightness) value. More detailed information on color models can be found for example in [35].

In the hue domain, a yellowish transparent tube differs significantly from a blueish background (left column in Figure 3.4). If the background is also yellowish, the difference between tube and background decreases (center column). Strong discontinuities in background color (as in the right column) could be wrongly classified as a tube. The saturation domain is also a quite unstable feature. If the background contains a lot of white, it is more desaturated than the object (as in the center column) and yields a quite strong contrast. The example in the left column, however, shows that the difference in saturation does not always have to be that clear. The brightness channel (fourth row) is very close to the gray level image computed using Equation 3.2 (bottom row). Thus, it equals approximately what a gray level camera would see.

In this experiment it has been shown that color can be a very unstable feature. With respect to precise length measurements, there are clearly a lot of artifacts at the tube edges in the H and S color channels. In the brightness channel, edges appear much sharper. The small artifacts in this channel are due to camera noise, motion blur effects, or imperfectly adjusted camera focus. Since the brightness channel is closely related to the gray value image converted from R-G-B values using Equation 3.2, one could replace the brightness channel by this image. As can be seen in Figure 3.4, the bottom row yields even better contrast between object and background.
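As an illustration of this channel comparison, a minimal sketch in Python using OpenCV (not part of the original system; the file name is hypothetical, and Eq. 3.2 is assumed to be the usual luminance weighting 0.299 R + 0.587 G + 0.114 B implemented by cv2.cvtColor):

    import cv2
    import numpy as np

    # Compare the H, S, V channels of a tube image against the gray
    # scale conversion along a horizontal scan line.
    img = cv2.imread("tube.png")  # hypothetical test image (BGR)
    h, s, v = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # A simple contrast proxy: standard deviation of each channel
    # along a scan line through the tube edges.
    row = img.shape[0] // 2
    for name, chan in [("H", h), ("S", s), ("V", v), ("gray", gray)]:
        print(name, float(np.std(chan[row, :])))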

With the observations made above, one can conclude that a gray level camera is best suited for this particular application. It yields the best edge quality, which is important for precise measurements, and both black and transparent tubes are imaged with sufficient contrast between object and background, making it possible to locate a tube in the image without using color information. Hence, the Marlin F-046B camera has been selected for this prototype. It offers the best compromise between image quality, resolution, and speed.

3.2.2. Camera Positioning

The camera is placed at a fixed position and viewing angle above the measuring area of the conveyor (see Figure 3.1(b)). In a calibration step, it is adjusted to position the image plane parallel to the surface of the conveyor, with the optical center above the center of the measuring area, thus minimizing the perspective effects in this area. The exact calibration procedure will be explained in Section 4.3.2. The moving direction of the conveyor (and therefore of the tube segments) is horizontal in the image.

The distance between camera and conveyor depends on the optical system, i.e. the lens that is used, and on the tube size to be inspected. In Section 4.2.2, the basic assumptions and constraints regarding the image content with respect to the image processing are presented. This includes the assumption that only one tube can be seen completely in an image at one time. Correspondingly, the camera's field of view has to be adapted to satisfy this criterion for different tube lengths.

Placing the camera above the conveyor has the additional advantage of not extending the dimensions of the production line, since space in a production hall is limited and therefore expensive.

3.2.3. Lens Selection

Parameters such as object size, sensor size of the camera, camera distance, and accuracy requirements determine the right optical system (objective) for a particular application. In the following, the term lens will be used synonymously with the term optical system or objective, although an objective is actually more than just a single lens (iris, case, mount, adjusting screws, etc.). The lens, however, is the most important factor that determines the properties of the objective.

The most important parameters specifying a lens include the focal length, F-number, magnification, angle of view, depth of focus, minimum object distance, and finally the price. In addition, lenses can have a number of aberrations as introduced in Section 2.1.3. Lens manufacturers try to minimize, for example, chromatic or spherical aberrations, but it is not possible to produce a completely aberration-free lens in the general case (e.g. for all wavelengths of light or angles). In practice, lenses are composed of different layers of special glass. High precision is needed to produce high quality lenses, thus such lenses can be very expensive. There are different lens types available, including fix-focal and zoom lenses. While fix-focal length lenses, as the term indicates, have a fixed focal length, zoom lenses cover a range of focal lengths. The actual focal length can be adjusted manually or motorized. For machine vision applications, fix-focal length lenses are usually preferable [40]. If the conditions are highly constrained, the best suited lens can be selected a priori.

This section gives a brief overview of the most important lens parameters and motivates the selection of the lens used in this application.

Focal Length In the ideal thin lens camera model, the focal length is defined as the distance between the lens and the focal point, i.e. the point where parallel rays entering the lens intersect on the other side (see Figure 2.4). In practice, the focal length value specified by the manufacturer depends on the lens model used (which is usually unknown) and does not have to be accurate. In applications that require high accuracy, a camera calibration step is important to determine the intrinsic parameters of the camera, including the effective focal length with respect to the underlying camera model.

F-number The F-number describes the relation of the focal length to the relative aperture size [18]:

    F = f/d    (3.3)

where d is the diameter of the aperture. Thus, the F-number is an indicator of the light-gathering power of the lens. Typical values are 1.0, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, and 32, with a constant ratio of √2 between consecutive values. A smaller F-number indicates that more light can pass the lens, and vice versa. Camera lenses are often specified by their minimum and maximum F-number, also denoted as the iris range.

Magnification In the weak perspective camera model (see Section 2.1.3), the ratio between the focal length and the average scene depth Z0 can be seen as magnification, i.e. following Equation 2.3, the magnification m is expressed as [24]:

    m = f/Z0    (3.4)



Figure 3.5: (a) Standard perspective lens. Closer objects appear larger in the image than objects of equal size further away. (b) Telecentric lenses map objects of equal size to the same image size independent of depth, within a certain range of distances. Images are taken from Carl Zeiss AG (www.zeiss.de).

where Z0 can be seen as the lens-object distance, also denoted as the working distance in the following. This gives a good estimate of how large an object will appear on the image plane at a given distance Z0 to the camera with a lens of focal length f.
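As a quick plausibility check of Eq. 3.4, a tiny sketch in Python (the 16mm focal length and 250mm working distance are values used later in this thesis; 6.4mm is the horizontal width of a 1/2" CCD sensor):

    # Sketch: weak-perspective magnification m = f / Z0 (Eq. 3.4).
    f = 16.0        # focal length in mm
    Z0 = 250.0      # working distance (lens-object distance) in mm
    sensor_w = 6.4  # horizontal width of a 1/2" CCD sensor in mm

    m = f / Z0               # magnification, 0.064
    image_w = m * 100.0      # a 100mm tube projects to ~6.4mm
    print(m, image_w, image_w <= sensor_w)  # the tube just fits the sensor width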

Depth of Focus Following the thin lens camera model, only points at a defined distance to the camera are focused on the image plane. Points at shorter or larger distances appear blurred in the ideal model. In practice, however, points within some range of distances are in acceptable focus [24]. This range is denoted as depth of focus or depth of field. It is due to the finite size of each sensor element: there is no visible difference in the image between a point that is focused exactly on the image plane and one that is not, as long as its blur does not spread over several pixels [18]. The depth of focus increases with a larger F-number [18].

Minimum Object Distance (MOD) All real lenses have a certain distance below which points can no longer be focused. This has both mechanical and physical reasons. The MOD value is important, since it determines the minimum distance between the camera and the objects in an application.

Angle of View The angle of view is the maximum angle from which rays of light are imaged onto the camera sensor by the lens. Short focal length lenses usually have a wider angle of view and are therefore also denoted as wide-angle lenses, while lenses with a longer focal length have a narrower angle of view. The angle of view determines the field of view of the camera at a given distance and a certain sensor size, i.e. which part of the world is imaged onto the sensor array of the camera.

Commonly, short focal length lenses are used to capture images of a larger field of view, for example in video surveillance applications that have to cover a large area. With respect to machine vision applications, such lenses can also be used for close-up images at a short camera-object distance. The amount of radial distortion increases with a shorter focal length; the fish-eye lens is an extreme example of a very short focal length lens. Increasing the focal length increases the magnification. Thus, even smaller objects at a further distance can be imaged over the whole image size with such lenses. However, the minimum object distance is larger for long focal length lenses.

For two-dimensional measuring tasks, the most accurate and precise results can be achieved with telecentric lenses (see Figure 3.5). These special lenses are designed to map objects of the same size in the world to the same image size, even if the object-to-lens distance differs. It is important to note that the maximum object size cannot be larger than the diameter of the lens. This makes telecentric lenses useful only in connection with relatively small objects. In addition, such lenses reach a size of over 0.5m for objects of about 100mm and a mass of approximately 4kg [18]. Finally, telecentric lenses are very expensive.

Although a telecentric lens would be advantageous regarding its imaging properties, a less expensive solution had to be found for the prototype development in this application. The optical system must be able to map objects between 20 and 100mm onto a 1/2" CCD sensor at a relatively short camera-object distance, while not being affected too much by aberrations and radial distortion.

However, this is an optimization problem that has no universal solution for all tube lengths. Different tube lengths need different magnification factors and fields of view if the maximum possible resolution is to be exploited to reach the highest accuracy. Changing the magnification factor means changing either the focal length of the optical system or the distance between object and camera, or both. If the camera is moved toward the object, the minimum object distance of the lens has to be considered in order to yield sharp images. Zoom lenses could be used to change the focal length without changing the whole optical system. However, zoom lenses should be avoided in machine vision applications [40], since they have to make larger compromises than fix-focal lenses and usually have a minimum working distance of one meter or more. Hence, using a fix-focal lens implies changing the camera-object distance to adapt to different tube lengths, or physically exchanging the lens when the machine cuts a new length that cannot be covered by the current lens.

Several commercial lenses designed for machine vision applications have been compared to find the lens that is best suited to inspect different tube sizes (see Table 3.1). Figure 3.6 gives an overview of the parameters that influence a camera's field of view. The angle of view θ is specified by the lens manufacturer and depends on the focal length and the camera sensor size. All values in the following refer to a 1/2" CCD sensor, since this is the sensor size of the Marlin F-033C and F-046B. The working distance d is here defined as the distance between lens and conveyor. O represents the object size, and L indicates the size of the measuring area with respect to a certain tube size. L can be approximated as twice the object size O. The goal is to find a combination of a lens and a working distance that yields a visual field such that the size V of the imaged region of the conveyor equals the measuring area L. Note that in this context, size can be replaced by length in the horizontal direction, i.e. in the moving direction of the conveyor, since this is the measuring direction in this constrained application. Thus, in the following only this direction is considered.

The geometry in Figure 3.6 leads to the following relationship between θ, d and V:

    V = 2d · tan(θrad/2)    (3.5)



Figure 3.6: Parameters that influence the field of view (FoV) of a camera. θ indicates the angle of view of the optical system, d the distance between lens and conveyor, O the object size, V the size of the region on the conveyor that is imaged, and L the size of the measuring area depending on the current tube size. The goal is to find a lens that yields a field of view such that V ≈ L at a short distance.

Model            f      θ      dmin
Pentax H1214-M   12mm   28.91  250mm
Pentax C1614-M   16mm   22.72  250mm
Pentax C2514-M   25mm   14.60  250mm
Pentax C3516-M   35mm   10.76  400mm
Pentax C5028-M   50mm   7.32   900mm

Table 3.1: Different commercial machine vision lenses and their specifications, including focal length f, horizontal angle of view θ (in degrees) with respect to a 1/2" sensor, and minimum object distance dmin.

where θrad represents the angle of view θ in radians. Using this equation, one can compute the length of the conveyor that is imaged in the horizontal direction at the minimum object distance of a lens. The results can be found in Table 3.2.
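The following sketch (not from the thesis) evaluates Eq. 3.5 in Python for the lenses of Table 3.1 at their minimum object distances and reproduces the values of Table 3.2:

    import math

    # Field of view V = 2 * d * tan(theta/2)  (Eq. 3.5), evaluated at
    # each lens's minimum object distance from Table 3.1.
    lenses = [  # (model, focal length mm, horizontal angle of view deg, MOD mm)
        ("Pentax H1214-M", 12, 28.91, 250),
        ("Pentax C1614-M", 16, 22.72, 250),
        ("Pentax C2514-M", 25, 14.60, 250),
        ("Pentax C3516-M", 35, 10.76, 400),
        ("Pentax C5028-M", 50, 7.32, 900),
    ]

    for model, f, theta_deg, d_min in lenses:
        V = 2 * d_min * math.tan(math.radians(theta_deg) / 2)
        print(f"{model}: f={f}mm, V={V:.0f}mm at d={d_min}mm")
    # Output matches Table 3.2: 129, 100, 64, 75, 115 mm.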

This shows that none of the compared lenses is able to image small objects (< 30mm) in focus onto the camera sensor in such a way that the object covers about half the full image width. Thus, the minimum tube size that can be inspected at full resolution under this assumption is 30mm. However, if one shrinks the image width manually (for example using the AOI function of the camera), the constraints can be met even for tubes below 30mm.

The real world representation s of one pixel in the image plane can be approximated as follows:

    s = V/Wimg    (3.6)

where Wimg represents the image width in pixels. For example, for a 16mm focal length lens at a working distance of 250mm, one pixel represents about 0.12mm in the real world if the horizontal image resolution is 780 pixels. At the same distance, a 25mm focal length lens yields a pixel representation of about 0.08mm at the



f      V
12mm   129mm
16mm   100mm
25mm   64mm
35mm   75mm
50mm   115mm

Table 3.2: Field of view of different fix-focal length lenses at the specified minimum object distance.

V     f=12mm  16mm   25mm   35mm   50mm
40    •77     •99    •156   •212   •312
60    •116    •149   •234   •318   •469
100   •193    248    390    530    •782
200   381     497    780    1006   1563

Table 3.3: Working distances (in mm) needed to yield a certain field of view V (in mm) for different focal length lenses. Distances that fall significantly below the minimum working distance are marked with a •.

same resolution. Thus, smaller tubes can theoretically be measured at higher precision. The minimum object distance of the compared lenses, however, represents a certain limit in precision. Tubes below 30mm cannot be measured with higher precision than 30mm tubes, only with the same precision. Recalling the tolerances introduced in Section 1.3, smaller tubes have a smaller tolerance than larger tubes, and 20-30mm tubes share the same tolerance.
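A small sketch of the resolution estimate from Eq. 3.6 for the two example configurations above (values as given in the text):

    import math

    # Real-world size of one pixel, s = V / W_img (Eq. 3.6), with V
    # from Eq. 3.5 at a 250mm working distance and a horizontal
    # resolution of 780 pixels.
    W_img = 780
    for f, theta_deg in [(16, 22.72), (25, 14.60)]:
        V = 2 * 250.0 * math.tan(math.radians(theta_deg) / 2)
        print(f"f={f}mm: s = {V / W_img:.3f} mm/pixel")
    # ~0.129 mm/pixel for the 16mm lens, ~0.082 mm/pixel for the 25mm lens.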

At the upper bound, larger tubes need a wider field of view of the camera. Hence, a larger region is mapped onto the same image sensor, so one pixel represents more. For a 200mm measuring area, the pixel representation is about 0.25mm. The required field of view can be achieved by placing the camera further away from the object. The distance increases with the focal length of the lens. Table 3.3 shows the approximate working distances for the compared lenses that are needed to achieve a certain field of view. Distances that fall below the minimum object distance are marked with a '•'.

It turns out that a 16mm focal length lens is the best choice for tube lengths between 50 and 100mm, since this lens maps the required measuring areas onto the image plane at the smallest working distance. However, tubes below 50mm cannot be inspected with higher precision with this lens. In this case, a 25mm focal length lens has to be selected. This lens is the best compromise for small and large tube sizes. It has the drawback of a large working distance of up to 780mm for 100mm tubes. Both a 16mm (PENTAX C1614-M) and a 25mm (PENTAX C2514-M) fix-focal lens have been used in the experiments.

3.3. Illumination

As introduced in Section 2.2, the right choice of illumination is essential in machine vision applications. Accurate length measurement of heat shrink tubes requires a sharp contrast at the tube's outline, especially at the boundaries that serve as measuring points. Any shadows that would enlarge the tube's dimensions in the 2D image projection



Figure 3.7: Heat shrink tubes under different front lighting setups. (a) Illumination by two desktop halogen lamps. Specular reflections at the tube boundaries complicate an accurate detection. (b) Varying the angle and distance of the light sources as in (a) can reduce reflections. (c) Professional front lighting setup with two line lights at both tube ends. (d) Resulting image of the setup in (c). In both (b) and (d), shadows cannot be eliminated completely. (Images (c) and (d) by Polytec GmbH, Waldbronn, Germany)
completely. (Images (c) and (d) by Polytec GmbH, Waldbronn, Germany)



Figure 3.8: Back lighting through different types of conveyor belts. The structure of the belt determines the amount of light entering the camera, thus influencing the image quality significantly.

must be avoided. In addition, the illumination setup should cover both black and transparent tubes; the transparent tubes are translucent while the black ones are not. The surface of both materials appears matte under diffuse illumination, but shows specular reflections if illuminated directly with point light sources.

In a first experiment, a front lighting setup with standard desktop halogen lamps was tested. Two light sources were placed at a low angle to illuminate the tube boundaries from two sides at the measuring area inside the guide bars. The results are shown in Figures 3.7(a) and 3.7(b). This setup yielded good results with black heat shrink tubes, but it turned out to produce unacceptable reflections right at the measuring points with the transparent ones. Such reflections could be reduced by changing the angle of light incidence, but the results remained strongly non-uniform. Although the halogen lamps are operated at DC power, the AC/DC conversion of off-the-shelf desktop lamps is often not stabilized, leading to temporal and spatial variations in image intensity and color. This effect has been observed throughout the experiments with the desktop lamps at video frame rates of 50fps.

Using a professional, flicker-free front lighting system with two fiber optic line lights illuminating the tube ends (see Figure 3.7(c)), the image quality could be increased, as can be seen in Figure 3.7(d). However, a few shadows still remain.

Experiments with a back light setup have been carried out as well. A calibrated fiber optic area light is placed at a certain distance (about 1-2cm) below the conveyor belt. The light has to shine through the belt, thus it is important to use a material that is translucent. A typical belt core consists of a canvas (e.g. cotton) and a rubber coating, where the thickness, structure and density of the canvas as well as the color of the rubber determine how much light can enter the camera. In the optimal case, no light at all would be absorbed by the belt, which is technically hardly possible.

Five different belt types have been tested. Some of the results can be seen in Figure 3.8. Each sample in this experiment consists of a transparent rubber coating and a white canvas base. The structure of the belt canvas is visible in each image as a background pattern. Obviously, the background should not influence the detection of the tube's boundary. Thus, the goal is to find a belt type that allows for back lighting without adding too much unwanted information to the image that could complicate the measurements.

In Figure 3.8(a), the coarse texture of the background significantly affects the tube ends of the transparent tube at the bottom. A sharp boundary is missing, making accurate and reliable measurements impossible. The belt type in Figure 3.8(b) has a finer texture but transmits only a small amount of light. Figure 3.8(c) shows the belt type that yielded



Figure 3.9: Polarized back lighting. (a) Image of diffuse back light through polarized glasses used for viewing 3D stereo projections, with no filter in front of the camera. (b) Setup as in (a) with an opposing polarization filter in front of the camera. Almost no light enters the camera at the polarized area. (c) Transparent heat shrink tube under polarized back light. There is a strong contrast at the polarized area, while it is impossible to locate the tube's boundaries at the unpolarized area (bottom right). (d) Polarized back light through a conveyor belt. The polarization is changed both by the belt and the tube, leading to a poor contrast. (e) For comparison: back light setup without polarization.

best results in terms of both background texture and transmittance. As can be seen, there are no shadows at the tube boundaries.

Since the black tubes do not let any light pass, the contrast between background and tube is excellent with all belt types tested. One advantage of black tubes follows from this property: the printing on the tube's surface is not visible in the image. The transparent tubes, on the other hand, do transmit the light coming from below. Positions covered by the printing show a lower transmittance, hence the printing is visible in terms of darker intensity values in the image.

As introduced in Section 2.2.3, polarized back lighting can be used to emphasize transparent, translucent objects. In an experiment, shown in Figure 3.9, the integration of polarization filters has been tested. Two polarized glasses originally intended for viewing 3D stereo projections have been employed to polarize the light coming from the area back light. First, the principle is tested without a conveyor belt. Two opposing polarization filters are placed between light source and camera. As can be seen in Figure 3.9(b), the area covered by the two polarization filters at right angles appears black in the image, while the areas without polarization filters are ideally white. A transparent tube between the two filters changes the polarization, thus making it possible for light to enter the camera at locations that were black before. There is an almost binary contrast between object and background (see Figure 3.9(c)). At regions that are not affected by the filters, there is no contrast at all, making the tube invisible. Unfortunately, these good results have no practical relevance, since in the real application the light has to pass through the conveyor belt, too. If the belt is placed between the first polarization filter and the object, it also changes the polarization at regions that belong to the background (see Figure 3.9(d)). The binary segmentation is lost and the structure of the conveyor belt is visible again. Since it is not possible to install the first polarization filter between conveyor and tube, the polarized back light approach has no advantages over the unpolarized one in this



Figure 3.10: (a) Installation of the back light panel. The measuring area is illuminated from below through a translucent conveyor belt. A diffuser is used to yield more uniform light and to protect the fiber optic light panel. (b) SCHOTT PANELight Backlight A23000 used for illumination (Source: SCHOTT).

application. On the contrary, it has the effect of less light entering the camera, which yields darker images and increases the amount of sensor noise.

As a result of the experiments with different lighting techniques, the back lighting setup has been chosen for the prototype. It offers excellent properties for black tubes and also yielded very good results for the transparent tubes in connection with a finely structured, translucent conveyor belt. The incident lighting did not perform better in the experiments.

A light source (SCHOTT DCR III) with a DDL halogen lamp (150W, 20V) has been selected in combination with the fiber optic area light (SCHOTT PANELight Backlight A23000) (see Figure 3.10(b)). The panel size is 102 × 152mm. It is installed 20mm below a cut-out in the conveyor below the belt, as can be seen in Figure 3.10(a). A diffuser between light panel and conveyor belt provides uniform illumination and protects the light area against dirt. More details regarding the illumination hardware can be found in Appendix B.2.

Using a fiber optic area light below the conveyor belt has the advantage of very low heat development, since the light source can be placed outside at a certain distance. With respect to the characteristics of heat shrink tubes, the avoidance of heat is essential at this step to prevent deformations. The light is transmitted through a flexible tube of fibers. If the lamp fails, it can be exchanged easily without changing anything at the conveyor. The lifetime of one halogen lamp is about 500 hours at maximum brightness.

To eliminate the influence of light sources other than the back light, the whole measuring area including the camera is darkened. This guarantees constant illumination conditions. For the prototype, a wooden rack has been constructed that is placed around the measuring area on the conveyor. A thick black, non-translucent fabric can be spanned around the rack, leaving only two openings where the tubes enter and leave



Figure 3.11: Air pressure is used to sort out tubes that do not meet the tolerances. The blow out unit, consisting of an air blow nozzle, a light barrier and a controller (not visible in the image), is placed at a certain distance behind the measuring area.

the darkened area. For industrial use, this interim solution has to be replaced by a more robust and compact (metal) case that excludes environmental illumination and additionally protects the whole measuring system against other outside influences. A slight overpressure inside the closed case or an air filtering system could be integrated to prevent dust particles from entering the case through the required openings. Any accumulation of dust or other dirt on the lens is critical and must be prevented.

3.4. Blow Out Mechanism

After a tube has passed the measuring area, the measured length is evaluated with respect to the given target length and tolerance. The result is a binary good/bad decision for each particular tube. Good tubes are allowed to pass the blow out unit, which is placed at a certain distance behind the measuring area. Tubes that do not meet the tolerances, on the other hand, have to be sorted out. This is done by air pressure: an air blow nozzle is arranged to blow tubes off the conveyor. For this purpose, the guide bars have to end behind the measuring area. The whole blow out setup can be seen in Figure 3.11.

The visual inspection system sends the good/bad decision over an RS-232 connection (serial interface) to a controller unit, in the form of a certain character followed by a carriage return ('\r'). The protocol used can be seen in Table 3.4. Once the controller receives an A or B, this message is stored in a first-in-first-out (FIFO) buffer.

Message    Code
TUBE GOOD  'A\r'
TUBE BAD   'B\r'
RESET      'Z\r'

Table 3.4: Protocol used for communication between the inspection system and the blow out controller.



A light barrier is used to send a signal to the controller when a tube is placed in front of the air blow nozzle. If the first entry in the FIFO buffer contains a B, the tube has to be blown out and the air blow nozzle is activated. On the other hand, if the first entry contains an A, the tube can pass. In both cases, the first entry in the buffer is deleted. The advantage of this approach is that the current conveyor velocity does not have to be known to compute the time a tube needs to move from a point x in the measuring area to the position of the air blow nozzle. The light barrier guarantees that the blow out is activated when the tube is exactly at the intended position.
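The controller behavior described above can be summarized in a short simulation sketch (hypothetical host-side Python, not the actual controller firmware; the reset behavior on 'Z' is an assumption):

    from collections import deque

    # Decisions arrive over RS-232 as 'A\r' (good), 'B\r' (bad) or
    # 'Z\r' (reset), see Table 3.4, and are buffered in arrival order.
    fifo = deque()

    def on_serial_message(msg: str) -> None:
        code = msg.rstrip("\r")
        if code in ("A", "B"):
            fifo.append(code)
        elif code == "Z":
            fifo.clear()  # assumed: reset discards pending decisions

    def on_light_barrier() -> bool:
        """Called when a tube reaches the nozzle; True means blow out."""
        if not fifo:
            return False           # no pending decision for this tube
        return fifo.popleft() == "B"

    # Example: two tubes measured, the second one out of tolerance.
    on_serial_message("A\r")
    on_serial_message("B\r")
    print(on_light_barrier())  # False -> first tube passes
    print(on_light_barrier())  # True  -> second tube is blown out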




4. Length Measurement Approach

While the previous chapter focused on the hardware setup, this chapter presents the methodical part of the system. After a brief overview, the different steps are introduced, including camera calibration and the teach-in step as well as tube localization, measuring point detection, tube tracking, and the good/bad classification. All assumptions and the model knowledge used throughout these steps are presented first.

4.1. System Overview

The fundamental concept of the developed system is a so-called multi-image measuring strategy. This means the goal is to measure each tube not only once, but in as many images as possible while it is in the visual field of the camera. The advantage of this approach is that the decision whether a particular tube meets the length tolerances can be made based on a set of measurements. The total length is computed by averaging over these single measurements, leading to more robust results. Furthermore, the system is less sensitive to detection errors. Depending on the conveyor velocity and the tube length, between 2 and 10 measurements per tube can be reached.
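A minimal sketch of this averaging and decision rule in Python (function and variable names are hypothetical; the rest of the pipeline is omitted):

    from statistics import mean

    def classify_tube(measurements: list[float],
                      target: float, tolerance: float) -> bool:
        """Average all single-image measurements of one tube and
        compare the result against the target length and tolerance."""
        total_length = mean(measurements)
        return abs(total_length - target) <= tolerance

    # Example: five single-image measurements of one 50 mm tube.
    print(classify_tube([49.9, 50.1, 50.0, 49.8, 50.2], 50.0, 0.5))  # True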

The system is designed to work without any external trigger that prompts the camera to grab a frame on a certain event, e.g. a tube passing a light barrier. Instead, the camera is operated in continuous mode, i.e. images are captured at a constant frame rate using an internal trigger. The absence of an external trigger, however, requires fast algorithms to evaluate whether a frame is useful, i.e. whether a measurement is possible. In addition, the system must be able to track a tube while it is in the visual field of the camera in order to assign measurements to this particular tube. Accurate length measurements of tubes require very accurate detection of the tube edges. A template based tube edge localization method has been developed, allowing for reliable, subpixel accurate detection results even in the presence of tube-edge-like background clutter. Once there is evidence that a tube has left the visual field of the camera, all corresponding measurements have to be evaluated with respect to the given target length and tolerances. The resulting good/bad decision must be delegated to the external controller handling the air pressure based blow out mechanism. Model knowledge regarding the inspected tubes under the constrained conditions is exploited where possible to optimize the processing.

Before any measurements can be performed, the system has to be calibrated and trained for the particular target length. This includes camera positioning, radial distortion compensation, and an online teach-in step.

Figure 4.1 gives an overview of the different stages of the system. It can also be seen as an outline of this chapter. The underlying methods and concepts will be introduced in more detail in the following.

Throughout this chapter, all parameters are handled abstractly. The corresponding value assignments used in the experiments are given in Section 5.1.1.




[Figure 4.1 flowchart: Camera calibration → Teach-In → Next image → Tube localization → Measurement possible? (No → Next image) → Measuring point detection → Length measuring → Tube passed? (No → Next image) → Total length computation → Good/bad classification → Blow out control]

Figure 4.1: System overview. After camera calibration and a teach-in step, the system evaluates the acquired images continuously. If a tube is located and assessed as measurable, the exact measuring points on the tube edges are detected and the tube length is calculated. Once a tube has passed the visual field of the camera, the computed total length is compared to the allowed tolerances for a good/bad classification. Finally, the blow out controller is notified whether the current tube is allowed to pass.
notified whether the current tube is allowed to pass.



(a) ‘empty’ (b) ‘entering’ (c) ‘leaving’ (d) ‘centered’ (e) ‘entering + centered’ (f) ‘entering + centered + leaving’ (g) ‘centered + leaving’ (h) ‘entering + leaving’ (i) ‘full’

Figure 4.2: Potential image states. Each image can be categorized into one of these nine states. States that contain one tube completely, with a clear spacing to neighboring tubes, can be used for length measuring, i.e. states (d), (e), (f) and (g). The remaining states do not allow for a measurement and can thus be skipped. State (i) might be due to a too small field of view of the camera (i.e. tubes are too large) or to a failure in separation (i.e. the spacing between two or more tubes is missing). If this state is detected, a warning must be issued.

4.2. Model Knowledge and Assumptions

The visual length measurement of heat shrink tubes proposed throughout this chapter is based on several assumptions and on model knowledge regarding the inspected objects, which are introduced in the following.

4.2.1. Camera Orientation

As introduced in Section 3.2.2, the camera is placed above the conveyor. It must be adjusted to fulfill the following criteria:

- The optical ray is perpendicular to the conveyor
- The image plane is parallel to the conveyor

This camera view is commonly denoted as fronto-parallel view [30]. If the image plane is parallel to the conveyor, the average scene depth is quite small. Therefore, it is possible to approximate the perspective projection with a weak-perspective camera model. In this model (see Section 2.1.3), objects are projected onto the image plane up to a constant magnification factor. This means distances between two points lying in the same plane are preserved in the image plane up to a constant scale factor. This property is important to allow for affine distance measurements in a fronto-parallel image view.

4.2.2. Image Content

The following assumptions concern the image content and the capture properties of the camera:



(a) Ideal tube model (b) Perspective tube model

Figure 4.3: (a) In the ideal model, the (parallel) projection of a 3D tube corresponds to a rectangle in the image. The distance d between the left and right edge is equal at each height. Under a perspective camera, objects closer to the camera appear larger in the image. Hence, the distance d1, belonging to the points on the tube edges that are closest to the camera, is larger than d2, and d2 is larger than d3 (the distance of the edge points that are farthest away). Note that the dashed lines are not visible in the image under back light, and the tube edges appear convex.

- Only one tube is visible completely (with left and right end) in each image at one time
- There is a clear spacing between two consecutive tubes
- The guide bars cover the upper and lower border of each image
- The guide bars are parallel and in horizontal direction
- The moving direction is from left to right
- The mean intensity of the background (conveyor belt) is brighter than the foreground (heat shrink tubes)
- There is a sufficient contrast between background and objects
- The video capture rate is fast enough to take at least one usable image of each tube segment so that a length measurement can be performed (potentially the production speed has to be reduced to satisfy this constraint)
- The image is not distorted, i.e. straight lines in the world are imaged as straight lines and parallel lines are also parallel in the image

In this application, the variety of image situations to be observed is highly limited and constrained by the physical setup (see Chapter 3). Thus, it is possible to reduce the number of potential situations to nine defined states. Each image can be categorized into exactly one of these states, as shown in Figure 4.2 by means of synthetic representatives. Only four of the nine states are measurable, i.e. states (d), (e), (f) and (g). In these states, a tube is completely in the image.
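The nine states and the measurability rule could be encoded as follows (an illustrative Python sketch; the state names are taken from Figure 4.2, and the classification of a frame into a state is omitted):

    from enum import Enum

    class ImageState(Enum):
        EMPTY = "empty"
        ENTERING = "entering"
        LEAVING = "leaving"
        CENTERED = "centered"
        ENTERING_CENTERED = "entering + centered"
        ENTERING_CENTERED_LEAVING = "entering + centered + leaving"
        CENTERED_LEAVING = "centered + leaving"
        ENTERING_LEAVING = "entering + leaving"
        FULL = "full"

    # Only the four states that contain one complete tube allow a measurement.
    MEASURABLE = {
        ImageState.CENTERED,
        ImageState.ENTERING_CENTERED,
        ImageState.ENTERING_CENTERED_LEAVING,
        ImageState.CENTERED_LEAVING,
    }

    def process(state: ImageState) -> None:
        if state is ImageState.FULL:
            print("warning: tubes too large or separation failed")
        elif state in MEASURABLE:
            print("measure tube")
        else:
            print("skip frame")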

4.2.3. Tubes Under Perspective

Under ideal conditions, i.e. with a parallel projection, a tube on the conveyor is represented by a rectangle in the image plane with the camera setup used (see Figure 4.3(a)). Due



Figure 4.4: The plane parallel to the conveyor plane ΠC that goes through the measuring points PL and PR is denoted as the measuring plane ΠM. The red line in ΠM between PL and PR corresponds to the measured distance d1 in Figure 4.3(b), i.e. the distance between the outermost points of the projected tube edge in an image.

to the guide bars, this rectangle is oriented parallel to the x-axis in the horizontal direction and parallel to the y-axis in the vertical direction. The length can be measured between the left and right edge of the tube in the horizontal direction. The horizontal distance d between the left and right tube boundary is equal, independent of the height. This is an ideal property for length measurements.

However, if the camera is not equipped with a telecentric lens or is not placed at infinity, the tube's projection is influenced by perspective. In general, objects that are closer to the camera are imaged larger than objects further away. Thus, the left and right tube edges do not appear straight in the image, but curved in a convex fashion, due to the different distances between a point on the tube's surface and the camera. Figure 4.3(b) visualizes a synthetic tube under perspective. The distance d1 between the two edge points closest to the camera is larger than the distances between points farther away. Accordingly, d2 is larger than d3, although in the real world d1 = d2 = d3 (assuming the tube is not cut skew). The perspective curvature increases with the distance to the image center. Thus, the maximum curvature is reached at the image boundaries, while an edge that lies directly below the optical center of the camera (approximately the image center) appears straight.

With the constraints regarding the image content, it is not possible to look inside a tube from the camera view if the tube is completely in the image. Therefore, one can assume that the outermost point of the tube edge always corresponds to the point that is closest to the camera, i.e. measuring between these two points always corresponds to the same distance in the world.

In the following, PL and PR denote the points on the left and right side, respectively, that are closest to the camera. The tube length in the real world is defined as the length of the line connecting these two points (corresponding to d1 in Figure 4.3(b)). Assuming a tube has the same height on the left and right side, PL and PR lie in the same plane, denoted as the measuring plane ΠM. This plane is assumed to lie parallel to the image plane, as can be seen in Figure 4.4. The measuring points have two correspondences in the image, denoted as pL = (xpL, ypL)ᵀ and pR = (xpR, ypR)ᵀ. The distance between pL and pR in the image can be related to the real world length up to a certain scale factor.



However, this scale factor may differ depending on the image position. Due to perspective, it is expected that the distance between pL and pR will be slightly shorter at the image boundaries and maximal at the image center.

4.2.4. Edge Model

The tube edges are modeled as ramp edges as introduced in Section 2.3.1, since this model describes the real data most adequately, both for transparent and black tubes. The slope of the ramp determines the sharpness of an edge: the steeper the rise (or fall), the sharper the edge. Obviously, the edge position can be located much more precisely if the ramp has only a minimal spatial extension.

As mentioned in the technical background section, there are several factors that can cause ramp edges, including the discrete pixel grid, the camera focus, and motion blur. The first factor can be reduced by using a high-resolution camera (keeping in mind the trade-off between resolution and speed as discussed in Section 3.2.1). The camera focus depends mainly on the depth of an object. In this application, the depth of an object does not change over time, since all tubes in a row have the same diameter and are lying on the planar conveyor belt, which is parallel to the image plane. In the following, it is assumed that the camera and the optical system are adjusted such that a tube is imaged as sharply as possible. Motion is another common factor influencing the appearance of an edge. Since the tubes are inspected in motion (up to 40m/min), a short shutter time (exposure time) of the camera is required. If the shutter time is too long, light rays from one point on the tube contribute to the integrated intensity values of several sensor elements along the moving direction. Especially the left and right tube boundaries considered for measuring are affected by motion blur, as they lie in the moving direction.

Therefore, it is assumed that the shutter of the camera is set to a very short exposure time to suppress motion blur as much as possible. A short shutter time requires a large amount of light to enter the camera at one time. The iris of the optical system has to be wide open (corresponding to a small F-number) and the illumination must be sufficiently bright.
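A back-of-the-envelope sketch of why a short exposure is needed (conveyor speed and pixel scale are values given in this thesis; the exposure times are example values, not the system's settings):

    # Motion blur in pixels as a function of exposure time.
    v = 40_000 / 60   # conveyor speed: 40 m/min = 666.7 mm/s
    s = 0.12          # real-world size of one pixel in mm (16mm lens, 250mm)

    for exposure_ms in (1.0, 0.5, 0.1):
        blur_mm = v * exposure_ms / 1000.0  # distance traveled during exposure
        print(f"{exposure_ms} ms exposure -> {blur_mm / s:.1f} pixels of blur")
    # 1.0 ms -> 5.6 px, 0.5 ms -> 2.8 px, 0.1 ms -> 0.6 px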

4.2.5. Translucency

Translucency is the main property distinguishing transparent from black tubes. Black tubes do not transmit light, leading to one uniform black region with strong edges in the image under back light. In this case, the local edge contrast at a certain position depends only on the background. Transparent tubes, on the other hand, transmit light. However, some part of the light is also absorbed or reflected in directions that do not reach the camera. Therefore, a tube appears darker in the image than the background. It appears even darker at positions where the light has to pass through more material. This leads to two characteristic dark horizontal stripes at the top and bottom of a transparent tube, as can be seen in Figure 4.5. This model knowledge has been exploited to define a robust feature for edge localization which can still be detected in situations where the contrast at the center of the edge is poor.

The printing on the tubes also reduces the translucency and is therefore visible on transparent tubes in the image. On average, it covers about 8% of a tube's surface along the perimeter for 6, 8, and 12mm diameter tubes.



Figure 4.5: The image intensity of transparent tubes is not uniform as it is for black tubes. Depending on how much light can pass through a tube, regions appear darker or brighter. One characteristic of transparent tubes under back light are the two dark horizontal stripes at the top and the bottom of a tube, indicated by the arrows. The printing also reduces the translucency and thus appears darker in the image.

4.2.6. Tube Orientation

The tube orientation is highly constrained by the guide bars as introduced in Section 3.1. Thus, an approximately horizontal orientation can be assumed throughout the design of the inspection algorithms.

In practice, the distance between the guide bars is slightly larger than the outer diameter of a tube to prevent a blockage, since tubes may not be ideally round. This means the cross-section of a tube can be elliptical instead of circular. Let dGB denote the vertical distance between the guide bars, and hmax the maximum expected tube extension in the vertical direction with respect to the image projection. The remaining spacing can then be expressed as dspace = dGB − hmax, as can be seen in Figure 4.6(a).

The maximum possible rotation is reached when the tube touches both guide bars at two points (see Figure 4.6(b)). The maximum angle of rotation θmax can be defined as the angle between the longitudinal axis of the tube and the x-axis. One can define an unrotated version of the tube with the longitudinal axis parallel to the x-axis and shifted so that the two axes intersect at the center of gravity of the rotated tube. In Figure 4.6(b), this virtual tube is visualized as a dashed rectangle. The distances between the measuring points of the rotated and the ideal horizontal tube are also shown in the figure and are denoted as dL and dR for the left and right tube side respectively. Both dL and dR are ≤ dspace/2. If the tube is not bent, dL = dR. The maximum error between the ideal distance l and the rotated distance l′ can be estimated as follows:

    errθ = l′ − l = √(l² + dspace²) − l    (4.1)

For example, in a typical setup for 50mm tubes of 8mm diameter, one tube has a length of approximately 415 pixels and dspace = 15 pixels. This leads to an error of errθ = 0.27 pixels. Thus, with one pixel representing 0.12mm in the measuring plane, the maximum error due to orientation would be about 0.03mm, which is acceptable. On average this error will be even smaller. Based on this estimate, the orientation error is neglected in the following, i.e. all tubes are assumed to be oriented ideally horizontally.
are assumed to be oriented ideally horizontal.



(a) (b)

Figure 4.6: (a) The guide bar distance dGB and the maximum extension of a tube in the vertical direction hmax define the maximum space dspace between a tube and the guide bars at ideal horizontal orientation. (b) The maximum possible tube orientation is limited by the guide bars. The angle θ between the longitudinal axis of the tube and the ideal measuring distance parallel to the x-axis determines the maximum distance the measuring point can be displaced by rotation (dL = dR if the tube is not bent). This distance is ≤ dspace/2 and can be used to estimate the error due to rotation between the ideal tube length l and the rotated distance l′.



4.2.7. Background Pattern

As introduced in Section 3.3, the measuring area is illuminated by a back light setup below the conveyor belt. This setup emphasizes the structure of the belt, which appears as a characteristic pattern in the image. This pattern may differ between belt types. Depending on the light intensity, it is possible to eliminate the background completely. If the light source is bright enough, the background appears uniformly white even with a short shutter time. For black tubes, such an overexposed image would lead to an almost binary image. Transparent tubes, however, also disappear under too bright an illumination. Hence, in practice there will always be a certain amount of background structure visible in the image. The strength of the background pattern increases with lower light intensity.

In the following, it is generally assumed that the illumination is adjusted to allow for distinguishing between a tube edge and edges in the background. Larger amounts of dirt or particles other than heat shrink tubes on the conveyor must be prevented.

4.3. Camera Calibration

In the previous section, several assumptions regarding the camera position and the image content have been presented. With respect to accurate measurements, it is important that an object is imaged as faithfully as possible: straight lines should appear straight and not curved in the image, parallelism should be preserved, and objects of the same size should be mapped to the same size in the image. Unfortunately, the latter properties do not hold in the perspective camera model as introduced before. However, under certain constraints it is possible to minimize the perspective effects.

If the internal camera parameters, including the radial and tangential distortion coefficients, are known, it is possible to compute an undistorted version of an image. After undistorting, straight lines in the world will appear as straight lines in the image. Furthermore, if one can arrange the camera such that objects of equal size are projected onto the same size in the image within the camera's field of view at a constant depth, one can assume that the image plane is approximately parallel to the conveyor.

In the following, the calibration method used to obtain the intrinsic camera parameters is presented, as well as a method to arrange the camera such that perspective effects are minimized.

4.3.1. Compensating Radial Distortion

To compensate for the radial distortion of an optical system, one needs to compute the intrinsic camera parameters. Since the intrinsic parameters can be assumed constant if the focal length is not changed, the calibration procedure does not have to be repeated every time the system is started and can therefore be precomputed offline.

The widely used Camera Calibration Toolbox for Matlab by Jean-Yves Bouguet [9] is used for this purpose. It is closely related to the calibration methods proposed in [74] and [31]. The calibration pattern required by this method is a planar chessboard of known grid size. The calibration procedure has to be performed for each lens separately. The camera is placed at a working distance of approximately 250mm over the measuring area with a 16mm fix-focal lens. It is adjusted to bring tubes with a diameter of 8mm at this distance into focus (in the measuring plane ΠM).
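The thesis itself uses the Matlab toolbox; purely for illustration, an equivalent chessboard calibration can be sketched in Python with OpenCV (file names are hypothetical, and treating the 21 × 10 grid as inner corners is an assumption):

    import glob
    import cv2
    import numpy as np

    pattern = (21, 10)   # assumed: inner corners per row/column
    square = 2.5         # grid size in mm

    # 3D reference points of the chessboard corners in the board frame.
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_pts, img_pts = [], []
    for fname in glob.glob("calib_*.png"):  # hypothetical image names
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        ok, corners = cv2.findChessboardCorners(gray, pattern)
        if ok:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)

    # Intrinsic matrix K and distortion coefficients (radial + tangential).
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    undistorted = cv2.undistort(cv2.imread("tube.png"), K, dist)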



Figure 4.7: 16 sample images used for calibrating the intrinsic camera parameters.

16 images of a 21 × 10 chessboard with 2.5mm grid size, at different spatial orientations around the measuring plane ΠM, have been acquired. A selection of these images can be seen in Figure 4.7.

In each image, the outer grid corners have to be selected by hand. The remaining corners are then extracted automatically at subpixel accuracy, as can be seen in Figure 4.8. The coordinate axes of the world reference frame are also visualized. The Z axis is perpendicular to the chessboard plane, pointing toward the camera.

The result of this calibration procedure is the set of intrinsic camera parameters, including the radial distortion coefficients. The Camera Calibration Toolbox for Matlab also allows for the visualization of the extrinsic location of each of the 16 calibration patterns with respect to the camera, as shown in Figure 4.9. The actual working distance of approximately 250mm is reconstructed very well. The resulting radial distortion model can be found in Figure 4.10. In Section 3.2, the area of interest function of the camera has been introduced, since the whole image height is not needed. Obviously, the goal is to select the location of this area with respect to minimum distortion. The position of the AOI within a full size image is visualized by the red lines, i.e. only pixels between these lines are considered.

4.3.2. Fronto-Orthogonal View Generation

Once distortion effects have been compensated, the goal is to obtain a view of the measuring area in which the world plane, i.e. the conveyor belt, is parallel to the image plane. There are two main strategies that can be applied.

In the first strategy, the camera is positioned only roughly. Afterwards, the perspective image is warped to yield an optimal synthetic fronto-orthogonal view of the scene. In the second strategy, the camera is adjusted as precisely as possible so that the resulting image is approximately fronto-orthogonal and does not need any correction.



Figure 4.8: Extracted grid corners at subpixel accuracy. The upper right corner is defined as the origin O of the world reference frame. The directions of the X and Y axes are also visualized, while the Z axis is perpendicular to the chessboard plane, pointing toward the camera.

[Figure 4.9 plot: "Extrinsic parameters (camera centered)" — 3D visualization of the 16 calibration board poses in the camera frame Oc, Xc, Yc, Zc]

Figure 4.9: Reconstructed extrinsic location of each calibration pattern relative to the camera. The working distance of approximately 250mm is recovered very well.



Figure 4.10: Visualization of the resulting radial distortion model. The computed center of distortion, indicated by the '◦', is slightly displaced from the optical center ('×'). The image area of interest considered in this application lies in between the red lines.

Perspective Warping One possibility to compute a synthetic fronto-orthogonal view of an image is based on the extrinsic relationship between the camera plane and a particular world plane (e.g. the conveyor plane), which can be extracted in a calibration step. With the extrinsic parameters it is possible to describe the position and orientation of the world plane in the camera reference frame. Finally, one can compute a transformation that maps the world plane into a plane parallel to the image plane, or vice versa, and warp the image to a synthetic fronto-orthogonal view. This approach has significant drawbacks. First of all, the accuracy of the results is closely tied to the calibration accuracy. Furthermore, the extrinsic parameters of a camera change if the camera is moved even slightly, in contrast to the intrinsic parameters, which can be assumed constant as long as the focus is not changed. Thus, one has to recalibrate the extrinsic parameters as well as the transformation parameters every time the camera is moved, which is not practicable in this particular application.

There are other methods that can be used to compute a fronto-orthogonal view of a perspective image, which are based on characteristic image features such as parallel or orthogonal lines, angles, or point correspondences, and do not need any knowledge of the interior or exterior camera parameters [30]. One common approach is based on point correspondences of at least 4 points x_i and x'_i with x'_i = H x_i (1 ≤ i ≤ 4) and

H = \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix}    (4.2)

the projective transformation matrix representing the 2D homography.

The unknown parameters of H can be computed in terms of the vector cross product x'_i × H x_i = 0 using a Direct Linear Transformation (DLT) [30]. To correct the perspective of an image one has to find four points in the image that lie on the corners of a rectangle in the real world, but are perspectively distorted in the image. These points x_i have to be mapped to points x'_i that represent the corners of a rectangle in the image.


Then, after H is computed, each point in the image is transformed by H. Obviously, this is an expensive operation for larger images. Furthermore, in practice the question is where to place the calibration points. One possibility is to place them on top of the guide bars. The system could automatically detect the calibration points and check whether these points lie on a rectangle in the affine image space. This requires a very accurate positioning of the guide bars, and all marker points should be coplanar, i.e. lie in one plane. Assuming one can solve this mechanical problem, there is still another problem: depending on how the destination rectangle is defined, the warped image may be scaled. In any case, warping discrete image points requires interpolation, since transformed points may fall in between the discrete grid. Obviously, this can reduce the image quality.
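As an illustration, the DLT estimation of H from four point correspondences can be sketched as follows; this is a minimal Python sketch with assumed names, not the thesis implementation. Each correspondence x'_i = H x_i contributes two independent rows to a linear system A h = 0, which is solved via the singular value decomposition:

import numpy as np

def dlt_homography(src, dst):
    # src, dst: arrays of shape (4, 2) with corresponding image points
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the solution h is the right singular vector of A belonging to the
    # smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, p):
    # transform a single point with the estimated homography
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]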

Online Grid Calibration Although the previously described approach does not require an accurate positioning of the camera, it has several drawbacks, especially with respect to performance and image reliability. If there is a way to adjust the camera perfectly, warping and perspective correction are not needed at all. However, a human operator must be able to perform this positioning task within an acceptable amount of time.

Therefore, an interactive camera positioning method has been developed, denoted as Online Grid Calibration.

First, the distance of the parallel guide bars has to be adjusted to the current tube size. Then, a planar chessboard pattern of known size is placed between the guide bars on the conveyor, within the visual field of the camera. The horizontal lines on the chessboard must be parallel to the guide bars (see Figure 4.11). To simplify the adjustments, a mechanical device may be developed that can be placed in between the guide bars, combining the function of a spacer that brings the guide bars to the right distance with that of a calibration grid that perfectly fits into the space between the guide bars at the designated orientation.

The underlying idea is as follows: if the chessboard is imaged in such a way that vertical lines in the world are vertical in the image and horizontal lines appear horizontal, while each grid cell of the chessboard has the same size in the image, the camera is adjusted accurately enough to yield a fronto-orthogonal view.

The process of camera adjustment can be simplified if the operator gets real-time feedback on how close the current viewing position is to the optimal position. Therefore, the live images of the camera are overlaid with an optimal virtual grid of squares. This grid can be parametrized by two points, the upper left and the lower right corner, as well as the vertical and horizontal size of each grid cell. The operator can move the grid in horizontal and vertical direction and adjust its size. This is a good feature to initialize the grid or to perform the fine adjustments.

For each image, the correspondence between the overlaid virtual grid and the underlying image data is computed. A two-step method has been developed. First, the image gradient in both vertical and horizontal direction is extracted using the SOBELX and SOBELY operators. This information can be used to approximate the gradient magnitude and orientation (see Equations 2.21 and 2.24). Since there is a strong contrast between the black and white chessboard cells, the gradient magnitude at the edges is strong as well. If the virtual grid matches the current image data, the gradient orientation φ(p) on horizontal grid lines must ideally be π/2 or 3π/2, depending on whether an edge is a black-white or white-black transition. Recall that the gradient direction is always perpendicular to the edge.



Figure 4.11: Online Grid Calibration using a 5 × 5mm chessboard pattern. (a) Calibration image distorted by perspective. The goal in this calibration step is to adjust the camera in such a way that the chessboard pattern perfectly fits the overlaid grid as in (b).

Correspondingly, vertical grid lines have orientations of 0 or π. In practice, the gradient orientation is allowed to lie in a narrow range around the ideal orientation, since the computation of φ(p) is only an approximation that estimates the real orientation up to an epsilon (see Figure 4.12(b)). Thus, theoretically each position on the virtual grid must meet the orientation constraints. In addition, the gradient magnitude must exceed a certain threshold to prevent edges induced by noise from influencing the calibration procedure.

To reduce the computational load, only a selection of points on the grid, denoted as control points, is considered. The position of these points can be seen in Figure 4.12(a). The ratio of grid matches to the total number of control points can be seen as a score of correspondence. If the score exceeds a threshold, e.g. more than 95% of all checked positions on the virtual grid match the real image data, the second step of the calibration is started.
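A minimal sketch of this matching step (Python for illustration; function names, the tolerance and the magnitude threshold are assumptions, not values from the thesis):

import numpy as np
import cv2

def grid_match_score(img, control_pts, ideal_phi, tol=0.3, mag_thresh=50.0):
    # control_pts: (N, 2) integer pixel positions of the control points;
    # ideal_phi: (N,) ideal gradient orientations (0, pi/2, pi or 3*pi/2)
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)   # SOBELX
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)   # SOBELY
    mag = np.hypot(gx, gy)                  # gradient magnitude
    phi = np.arctan2(gy, gx) % (2 * np.pi)  # gradient orientation
    matches = 0
    for (x, y), p in zip(control_pts, ideal_phi):
        # angular difference folded into [0, pi]
        d = abs((phi[y, x] - p + np.pi) % (2 * np.pi) - np.pi)
        if mag[y, x] > mag_thresh and d < tol:
            matches += 1
    return matches / len(control_pts)       # score of correspondence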

The second step concentrates on the size of each grid cell. Assuming negligible perspective effects if the camera is perfectly positioned, all grid cells should have the same size in the image. To compute the size of each grid cell as accurately as possible, the real edge location of the grid is detected with subpixel precision within a local neighborhood of each control point on the virtual grid. To this end, the gradient magnitude of a 7 × 1 neighborhood perpendicular to the grid orientation at a given control point is interpolated using cubic splines. Then, the width and height of a grid cell can be determined via the affine distance between two opposed subpixel grid positions. Finally, the mean grid size and standard deviation can be computed for both width and height. The standard deviation is used as a measure of how closely the current camera viewing position matches a fronto-orthogonal view. Ideally, if all squares have equal size, the standard deviation is zero. In practice the standard deviation is always larger than zero, for example due to noise, edge localization errors, or a small remaining perspective error.



Figure 4.12: (a) Control points (marked as crosses) are used to adjust the virtual calibration grid of width w and height h to the underlying image data. (b) Gradient orientation φ at each control point. Since the computed values are only an approximation, a narrow range of orientations, indicated by the gray cones around the ideal orientation, is also counted as a match.

Experiments have shown that it is possible to adjust the camera within an acceptable time to yield 100% coverage in step one and a grid standard deviation of less than 0.3 pixels. In this case the camera is assumed to be adjusted well enough for accurate measurements.
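A minimal sketch of the subpixel edge localization used in the second step, assuming SciPy's cubic spline interpolation (the thesis does not specify the implementation, so this is an illustrative sketch):

import numpy as np
from scipy.interpolate import CubicSpline

def subpixel_edge(mag_profile):
    # mag_profile: 7 gradient magnitude samples across the edge,
    # taken perpendicular to the grid line at a control point.
    # Returns the edge offset in [-3, 3] relative to the center sample.
    x = np.arange(-3, 4, dtype=float)
    spline = CubicSpline(x, mag_profile)
    # evaluate densely and take the maximum of the interpolated magnitude
    xs = np.linspace(-3, 3, 601)
    return xs[np.argmax(spline(xs))]

The cell width or height then follows as the distance between two such opposed subpixel positions, from which mean and standard deviation are accumulated.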

4.4. Tube Localization

Since the system is intended to work without any external trigger (e.g. a light barrier) that gives a signal whenever a tube is completely in the visual field of the camera, the first step before further processing of a frame is to check whether there is a tube in the image that can be measured or not. If there is no tube in the image, or only part of one, this image can be neglected. This decision has to be very fast and reliable.

4.4.1. Gray Level Profile

To classify an image into one of the states proposed in Section 4.2.2, an analysis of the intensity profile along the x-axis is performed. Strong changes in intensity indicate potential boundaries between tubes and background.

In ideal images, as seen in Figure 4.2, the localization of object boundaries is almost trivial with standard edge detectors (see Section 2.3). In real image sequences, however, there are many changes in intensity of different origin that do not belong to the boundaries of a tube, e.g. caused by the background pattern (see Figure 3.8) or by dirt on the conveyor belt. Furthermore, the printing on transparent tubes, visible in the image under back light illumination, influences the intensity profile, as will be seen later on.

Figure 4.13: Sample images with 11 equally distributed vertical scan lines used for profile analysis within a certain region of interest: (a) transparent, 50mm length, ∅8mm; (b) black, 50mm length, ∅8mm. (c) and (d) show the resulting gray level profiles of images (a) and (b) respectively.

The intensity profile \hat{P}_y of an image row y can be formally defined as

\hat{P}_y(x) = I(x, y)    (4.3)

where I(x, y) denotes the gray level value of an image I at pixel position (x, y). Since a single scan line (e.g. \hat{P}_{h/2} with h the image height) is very sensitive to noise and local intensity variations, the localization of the tube boundaries based on the profile of a single row can be error-prone. Hence, a set of n parallel scan lines is considered. The mean profile P_n of all n lines is calculated by averaging the intensity values at each position:

P_n = \frac{1}{n} \sum_{i=1}^{n} \hat{P}_{y_i}    (4.4)

One property of the resulting profile P_n is the projection of a two-dimensional to a one-dimensional problem, which can be solved much faster (processing speed is a very important criterion at this step of the computation). Since the further processing steps with respect to P_n are independent of the number of scan lines n (n ≥ 1), P_n is simply denoted as P in the following. A more detailed view on the number of scan lines and the scan line distribution with respect to robustness and performance is given in Appendix A. In the following, N_scan denotes the number of scan lines used.
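As a minimal illustration of Equations 4.3 and 4.4 (Python used for illustration only; the names are assumptions):

import numpy as np

def mean_profile(img, scan_rows):
    # img: 2D gray level image; scan_rows: list of N_scan row indices y_i.
    # Each row img[y, :] is one profile P_hat_y; they are averaged
    # element-wise into the 1D mean profile P.
    return img[np.asarray(scan_rows), :].mean(axis=0)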

4.4.2. Profile Analysis

Step 1: The first step is smoothing the profile P by convolving it with a large 1D mean filter kernel of dimension K_smooth:

P_{smooth} = P \ast \underbrace{\left[ \tfrac{1}{K_{smooth}}\ \tfrac{1}{K_{smooth}}\ \ldots\ \tfrac{1}{K_{smooth}} \right]}_{K_{smooth}\ \text{times}}    (4.5)

The idea of this low pass filtering operation is to reduce the high-frequency components in the profile, in particular the structure of the background pattern.

Obviously, this step also blurs the tube edges, and therefore reduces the detection precision significantly. Bearing in mind the goal of the profile analysis, it is only intended to verify whether a measurement is possible in the current frame or not. In a subsequent step, the proper measurements have to be performed on the original image data, not on the profile. However, the knowledge gained in this first step does not have to be discarded and can instead be used to optimize the following steps. In other words, if it is possible to predict a tube's boundaries reliably, though not precisely, this information is then used to define a region of interest (ROI) as close as possible around the exact location.

Step 2: The next step is to detect strong changes in the profile. Large peaks in the first derivative of the profile indicate such changes and can be considered as candidates for tube boundaries. Therefore, a convolution with a symmetric 1D kernel approximating the first derivative of a Gaussian is performed:

P_{drv} = P_{smooth} \ast D_x    (4.6)

The odd symmetric 9 × 1 filter kernel D_x is given by the following filter taps, as proposed in [25] for the design of steerable filters:

tap    0     1       2      3      4
value  0.0   0.5806  0.302  0.048  0.0028

With this kernel a dark-bright edge results in a positive response, while a bright-dark edge leads to a negative response. The magnitude of the response is proportional to the contrast at the edge.

Assuming the potential tube boundaries have sufficient contrast, only the strongest peaks of P_drv are of interest for later processing. To simplify the task of peak detection, only the absolute values of the differentiated profile are taken into account. This is denoted as P^{+}_{drv}:

P^{+}_{drv} = |P_{drv}|    (4.7)

Note that the information on the sign of a peak in P_drv is still useful for later classification and does not have to be discarded.

Step 3: A thresholding is performed on P^{+}_{drv} to eliminate smaller peaks that correspond, for example, to changes in intensity due to the background pattern or dirt:

P_{thresh}(x) = \begin{cases} P^{+}_{drv}(x) & \text{if } P^{+}_{drv}(x) > \tau_{peak} \\ 0 & \text{otherwise} \end{cases}    (4.8)


The threshold τ_peak is calculated dynamically based on the mean of P^{+}_{drv}, denoted as \bar{P}^{+}_{drv}, with

\tau_{peak} = \alpha_{peak} \, \bar{P}^{+}_{drv}    (4.9)

The factor α_peak indirectly relates to the number of peaks left to be processed further. τ_peak is also denoted as the profile peak threshold. The goal is to remove as many peaks as possible that do not belong to a tube's boundary, without eliminating any relevant peak. If the images are almost uniform over larger regions, as for black tubes, there are only a few strong changes in intensity. Thus, \bar{P}^{+}_{drv} is expected to be quite low compared to max(P^{+}_{drv}), and the peaks belonging to the tube boundaries are preserved even for a larger α_peak. On the other hand, for transparent tubes the contrast between foreground and background is lower. Hence, the distance between intensity changes due to background clutter and those at the tube boundaries is much smaller. The choice of the right threshold is more critical in this situation, and α_peak has to be selected carefully. If it is too low, too many peaks will survive the thresholding. If it is too large, important peaks will be eliminated as well. The profile peak threshold is closely related to the detection sensitivity of the system, as will be discussed in more detail in later sections. More sophisticated calculations of τ_peak, considering the difference between maximum value and mean, or the median, did not perform better.

Step 4: The x-coordinates of the remaining peaks, defined as local maxima in P_thresh, are stored in a list denoted as candidate positions Ω, in ascending order. N_Ω indicates the number of elements in Ω, i.e. the number of potential tube boundaries in an image.
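The four steps can be summarized in a short sketch. The following Python fragment is a minimal illustration; K_smooth = 15 and α_peak = 3.0 are placeholder values, and the sign orientation of the derivative kernel (built from the taps above) is an assumption:

import numpy as np

# odd-symmetric 9-tap derivative kernel from taps [0.0, 0.5806, 0.302, 0.048, 0.0028]
TAPS = [0.0028, 0.048, 0.302, 0.5806, 0.0, -0.5806, -0.302, -0.048, -0.0028]

def candidate_positions(P, k_smooth=15, alpha_peak=3.0):
    # Step 1: mean filter of dimension k_smooth (Eq. 4.5)
    P_smooth = np.convolve(P, np.ones(k_smooth) / k_smooth, mode="same")
    # Step 2: first derivative of the smoothed profile (Eq. 4.6)
    P_drv = np.convolve(P_smooth, TAPS, mode="same")
    P_abs = np.abs(P_drv)                       # P+_drv (Eq. 4.7)
    # Step 3: dynamic profile peak threshold (Eqs. 4.8, 4.9)
    tau_peak = alpha_peak * P_abs.mean()
    P_thresh = np.where(P_abs > tau_peak, P_abs, 0.0)
    # Step 4: local maxima of the thresholded profile -> candidate list Omega
    omega = [x for x in range(1, len(P) - 1)
             if P_thresh[x] > 0
             and P_thresh[x] >= P_thresh[x - 1]
             and P_thresh[x] > P_thresh[x + 1]]
    return omega, P_smooth, P_drv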

4.4.3. Peak Evaluation

The process described in the previous section results in a number of candidate positions that have to be evaluated, since it is possible that there are more candidate positions than tube boundaries. This is due to the fact that the thresholding is parametrized to avoid the elimination of relevant positions. The actual number of tube boundaries, which indicates the current state as introduced in Section 4.2, is not known at this point and has to be extracted by applying model knowledge to the candidate positions.

Since only four of the nine possible states can be used for measuring, it is of interest to know whether the current image matches one of these four states. If this is the case, it is sufficient to localize the boundaries of the centered tube. Under the assumptions made in Section 4.2, only one tube can be completely in the visual field of the camera at a time. In the following, an approach is presented that reduces this problem to an iterative search for boundaries belonging to a single foreground object.

First, Ω is extended to Ω′ by two more x-positions: x = 0 at the front and x = x_max at the back of the list, where x_max is the largest possible x-coordinate in the profile. Then, any segment s(i), defined as the region between two consecutive positions Ω′(i) and Ω′(i + 1), can be assigned to one of the two classes in {BG, TUBE}, representing background and foreground respectively. In this way, the whole profile is partitioned into N_Ω + 1 segments if there are N_Ω peaks.



Global Threshold The classification into BG and TUBE is based on the general assumption that the mean intensity of objects is darker than that of the background. In more detail, taking the mean value \bar{P}_{smooth} of the smoothed profile as a global reference and calculating the local mean value for each segment s(i), the classification C_1 can be expressed as:

C_1(s) = \begin{cases} \text{TUBE} & \text{if } \operatorname{mean}(s(i)) \le \bar{P}_{smooth} \\ \text{BG} & \text{otherwise} \end{cases}    (4.10)

In image segmentation the mean value is widely used as an initial guess of a threshold separating two classes of data distinguishable via their gray level [48, 2]. There are many more sophisticated approaches for threshold selection, including histogram shape analysis [57, 63, 26], entropy [54], fuzzy sets [20, 14], and cluster-based approaches [55, 46]. The different techniques are summarized and compared in several surveys [59, 47, 60]. However, in this application the threshold is used for classification; it is not intended for the calculation of a binary image that segments the tubes from the background. Since processing time is strictly limited and critical in this application, it is essential to save computation time wherever possible. As introduced before, the actual segmentation is based on strong vertical edges in the profile, but does not include any semantic meaning of the segments. In the classification step, the mean turned out to be a reliable and fast choice to distinguish between foreground and background segments, both for black and transparent tubes, if there is a uniform and sufficient contrast between tubes and background over the whole image. In this case there is no need for a threshold other than the mean, saving additional operations.

Instead of comparing the global mean with the local mean, the local median can be used, which was observed to result in a more distinct measure for discrimination:

C_2(s) = \begin{cases} \text{TUBE} & \text{if } \operatorname{median}(s(i)) \le \bar{P}_{smooth} \\ \text{BG} & \text{otherwise} \end{cases}    (4.11)

The better performance of measure C_2 originates in the characteristic of the median of being less sensitive to outliers than the mean [32]. This is important since the input data can be very unsteady due to the background texture or printing visible on transparent tubes (independent of the additional camera noise level). As mentioned before, the smoothing of the profile in the first step also blurs the tube edges, causing the segment boundaries to be not totally precise. In this case, the local mean tends to move closer to the global mean, which does not necessarily imply a misclassification. The median, however, turned out to be more distinct in most cases. Figure 4.14 shows the smoothed profile of (a) a transparent and (b) a black tube. The examples represent the states entering + centered and entering + centered + leaving. The segment boundaries, which correspond to the locations of the strongest peaks in the first derivative of the profile, are visualized, as well as the global mean and the local median. Segments that have a median above the global mean are classified as background.

Regional Threshold One drawback of the global threshold approach is that different background segments are assumed to be almost equal in image brightness, i.e. the tube-background contrast is approximately uniform within one image. This assumption, however, does not hold if there are larger variations in background brightness (for example due to material properties or dirt on the belt). Such variations can occur between images,



Figure 4.14: Different steps and results of the profile analysis for (a) a transparent tube (state: entering + centered) and (b) a black tube (state: entering + centered + leaving). After smoothing the profile, strong peaks in the first derivative indicate potential tube boundaries. The segments between the strongest peaks are classified into foreground and background based on the difference between the local median of each segment and the global mean. The background is assumed to be brighter on average. Neighboring segments of the same class are merged. The crosses mark the correctly predicted boundaries of the centered tube. Note the stronger contrast of black tubes.



but also over the whole image width, or locally within a single image. The first case is uncritical as long as there is sufficient contrast between a tube and the background. The latter case, i.e. local variations in background brightness, can lead to failures of the global threshold. Figure 4.15(a) shows one characteristic situation which occurs quite often with transparent tubes. The background intensity on the left is much darker than on the right. The global threshold fails, since the much brighter background regions on the right increase the global mean. Thus, the local median of the leftmost segment falls below the threshold, and the segment is therefore classified as foreground. Due to this misclassification no measurement will be performed on this frame, although it would be possible.

A region based threshold can overcome this problem. The idea is to compute the classification threshold not globally, but based on regional image brightness. While the local median is computed for each segment, a good classification threshold must consider at least one transition between background and foreground. Following the assumptions made in Section 4.2, two tubes cannot be completely in the image at the same time. Furthermore, the number of connected background regions in the image cannot exceed two. If there are two connected background regions, one has to lie in the left half of the image while the other falls in the right half. Thus, one can define two regions, left and right of the image center, and compute the mean for each region analogously to the global mean. In the following, the means of the left and right side of the (smoothed) profile are denoted as \bar{P}_{left} and \bar{P}_{right} respectively.

If there is only one background region (states empty, entering, leaving, entering + leaving), splitting the image at the center has no negative effect. The left and right means are computed either over a tube and a background region, or over background only. In the very special case that the image width is exactly twice a tube's length and the tube enters (or leaves) the scene with its right (or left) boundary exactly on the image center, the regional threshold is computed only over the tube, and the classification may be either foreground or background. However, in both cases this situation can be detected as a state where a measurement is not possible, which is therefore a sufficient solution.

The region based classification of the segments can now be expressed as:

C_3(s) = \begin{cases} \text{TUBE} & \text{if } \operatorname{median}(s(i)) \le \tau_{region} \\ \text{BG} & \text{otherwise} \end{cases}    (4.12)

where τ_region is defined as follows:

\tau_{region} = \begin{cases} \bar{P}_{left} & \text{if } s(i) \text{ falls into the left region only} \\ \bar{P}_{right} & \text{if } s(i) \text{ falls into the right region only} \\ \max(\bar{P}_{left}, \bar{P}_{right}) & \text{if } s(i) \text{ falls into both regions} \end{cases}    (4.13)

In Figure 4.15(b) one can see the difference between the global and the regional classification threshold. The regional threshold of the left half is much lower than the global threshold. On the other hand, since the second segment, belonging to the tube, intersects the center, the maximum of both regional thresholds is taken into account, which lies significantly above the global threshold. Finally, all segments are classified correctly. With this threshold, the classification is less sensitive to darker background regions.
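As an illustration, the segment classification with the regional threshold (Equations 4.12 and 4.13) can be sketched as follows; the Python function below is a minimal sketch with assumed names, not the thesis implementation:

import numpy as np

def classify_segments(P_smooth, bounds):
    # bounds: sorted x-positions Omega' including 0 and x_max.
    # Returns one 'TUBE' / 'BG' label per segment.
    center = len(P_smooth) // 2
    mean_left = P_smooth[:center].mean()
    mean_right = P_smooth[center:].mean()
    labels = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = P_smooth[a:b + 1]
        if b <= center:                 # segment in the left region only
            tau = mean_left
        elif a >= center:               # segment in the right region only
            tau = mean_right
        else:                           # segment intersects the center
            tau = max(mean_left, mean_right)
        labels.append("TUBE" if np.median(seg) <= tau else "BG")
    return labels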

The two methods have been compared in the following experiment: a sequence of transparent tubes (50mm length, 8mm diameter) has been captured, comprising 467 frames that have been manually classified as measurable, i.e. a tube is completely in the image.



Figure 4.15: (a) The background intensity on the left is much darker than on the right. The global mean as threshold cannot compensate for such local variations, as can be seen in (b): the left background region is wrongly classified as foreground, since the global threshold is larger than the local median of the corresponding segment. A region based threshold that considers the left and right image halves independently can overcome this problem (see text).



               Global   Regional
Total number:  467      467
Measurable:    353      414
Average PTM:   5.98     7.01

Table 4.1: Comparison of the global and regional threshold used for classification in the profile analysis. The table shows the number of images that have been correctly detected as measurable compared to the total number, as well as the average number of per tube measurements (PTM). Using the regional threshold increases the number of measurements significantly.

Figure 4.16: Ghost effect: if the parameters of the profile analysis are too sensitive, darker parts on the conveyor (e.g. due to dirt or background structure) can be wrongly classified as a tube.

The sequence has been analyzed once with the global threshold and once with the regional threshold; all other parameters have been kept constant. The results can be found in Table 4.1. In this context it is important to understand that the term measurable relates to a single image. If the system fails to detect a tube in one image, this does not mean that the tube can pass the visual field of the camera undetected. This occurs only if the tube is missed in all images that include it, which is very unlikely. The experiment shows that the average number of measurements per tube can be increased by approximately one when using the regional instead of the global mean as threshold for the tube classification. In particular, situations as in Figure 4.15(a) can be prevented.

The reason why neither of the two methods has detected all measurable frames lies in other parameters, for example too little contrast between a tube and the background. The dynamic threshold τ_peak, as introduced before, defines the strongest peaks of the profile derivative. If it is too large, low contrast tube edges may not be detected. On the other hand, if it is too low, darker regions in the background may be wrongly classified as foreground. This leads to ghost effects, i.e. the system detects a tube where actually no tube is present, as can be seen in Figure 4.16. Therefore, the weighting factor α_peak of τ_peak (see Equation 4.9) must be adjusted in the teach-in step to the smallest value that does not produce ghost effects when inspecting an empty, moving conveyor belt. Obviously, this compromise becomes more difficult with an increasing amount of dirt.

Merging Segments In Figure 4.14(a), one can find two more segments than needed to represent the actual state entering + leaving. Two strong peaks on the right, which are due to a dark dirt spot on the conveyor belt, have not been eliminated by the thresholding. However, the corresponding segments are correctly classified as background, leading to three consecutive background segments which can be merged into one large segment.

In general, once all segments s(i) are classified, the goal is to iteratively merge neighboring segments of the same class and to eliminate foreground segments that do not qualify



Input:   coordinate list Ω′ with N = |Ω′|
         smoothed gray level profile (for the regional thresholds and local medians)
         minimum tube segment size MIN_SIZE

Step 1:  define segments: S = { s[i] = [ Ω′[i], Ω′[i+1] ] }
         classify each segment based on Eq. 4.12: s[i].label = C3(s[i])
         if s[i].label == TUBE for all i: return ERROR

Step 2:  /* remove foreground segments at the borders */
         let i1 be the index of the first and i2 the index of the last BG segment
         set s[j].label = BG for all j with 0 ≤ j < i1 or i2 < j < N
         /* merge neighboring segments of the same class, reclassify TUBE
            segments smaller than MIN_SIZE as BG, and repeat Step 2 if any
            segment was reclassified */

Listing 4.1: Classification and merging of profile segments.



for measuring. An overview of the algorithm is shown in Listing 4.1. A size filter operation, which can be parametrized with respect to the given target length, is used to remove too small foreground segments (e.g. caused by dirt on the conveyor belt).

The output of the algorithm is either one large background segment (i.e. all existing foreground segments have been removed since they did not fulfill the criteria) or three segments of the form BG-TUBE-BG. In the latter case, the peaks belonging to the left and right boundary of the remaining foreground segment are finally verified with respect to the sign of the derivative. With the derivative operator used, the position of the left boundary must result in a negative first-order derivative value (bright-dark edge) and the right boundary in a positive value (dark-bright edge). If the predicted tube boundaries are consistent with this last criterion, they are used to define two local ROIs of width W_ROI as a starting point for a more precise detection of the measuring points. The local ROI height is defined via the distance between the two guide bars.

The merging of the segments is a linear operation of complexity O(N_Ω). Since it is only allowed to reclassify a former foreground segment into background in this procedure, and never vice versa, Step 2 of the algorithm is repeated at most once. Hence, the algorithm is guaranteed to terminate.
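The merge and size filter step of Listing 4.1 can be sketched as follows; this is a minimal Python sketch under the assumption that segments are represented as (start, end, label) tuples, not the thesis implementation:

def merge_segments(segments, min_size):
    changed = True
    while changed:                       # repeats at most once (see text)
        changed = False
        merged = [segments[0]]
        for a, b, label in segments[1:]:
            pa, pb, plabel = merged[-1]
            if label == plabel:          # merge neighbors of the same class
                merged[-1] = (pa, b, plabel)
            else:
                merged.append((a, b, label))
        segments = merged
        # size filter: reclassify too small TUBE segments as background
        out = []
        for a, b, label in segments:
            if label == "TUBE" and (b - a) < min_size:
                label = "BG"
                changed = True
            out.append((a, b, label))
        segments = out
    return segments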

If all segments are classified as TUBE in the first step, an error is returned. This error indicates the presence of the state full (see Figure 4.2(i)). The reason can be a too small field of view of the camera or a missing spacing between consecutive tubes. In either case it is not possible to perform a measurement. Since this state is critical compared to other states that cannot be used for measuring, it is important to detect this situation. In practice, if this situation occurs an alert must be produced.

4.5. Measuring Point Detection

The previous sections described a fast method to decide whether a frame is useful or not. If a measurement is possible, two regions around the potential left and right boundary of the tube to be measured are the output of this first step. In the following, the exact tube boundaries have to be detected with subpixel accuracy.

4.5.1. Edge Enhancement

As introduced in Section 2.3, there is a large number of approaches for edge detection. Four common methods, namely the Sobel operator, the Laplace operator, the Canny edge detector [13] and a steerable filter edge detector based on the derivative of a parametrized Gaussian, have been applied to test images. The results can be found in Table 4.2. It includes experiments with two transparent tubes (left boundary) of the same sequence and one black tube boundary. All tubes have an inner diameter of 8mm. The difference in size between the transparent and black tubes is due to a different camera-object distance. As can be seen, the edges of the transparent tubes can differ in brightness, contrast and background pattern between frames.

The goal was to find an edge detection operation that adequately extracts the tube boundaries in the presence of background structure and noise, and which is in addition computationally inexpensive.



Table 4.2: Comparison of different edge detectors applied to the test images: the input images, SOBELX, SOBELY, Laplace, the Canny edge detector with thresholds (50/100), (90/230) and (185/210), and first-order Gaussian derivative filters of sizes 5×5, 7×7 and 11×11. The parameters of the Canny edge detector indicate the lower and upper threshold. The Gaussian derivative based edge detection results are all of first order. An (a) indicates that edges of all orientations (in discrete steps of 5°) are enhanced with a steerable filter approach, while (v) represents only vertical edges.



The results for the transparent tubes are crucial for the selection of an appropriate edge detection approach in this application, since, owing to the strong contrast, the detection of the black tube boundaries is uncritical with all tested methods. For both tube types the edge detection results differ in detected orientation, edge elongation (i.e. how precisely an edge can be localized), and edge representation (signed/unsigned values, floating point/binary, etc.).

Canny Edge Detector The Canny edge detector results in a skeletonized, one pixel wide response that precisely describes edges of arbitrary orientation. In this application the main drawback of Canny's approach is the importance of the threshold choice. As can be seen in Table 4.2, different parameter sets yield very different results. If the upper hysteresis threshold used as a starting point for edge linking is low (e.g. 100), combined with a lower second threshold (e.g. 50), too many background edges are detected as well. A larger upper threshold (e.g. > 200) reduces the number of detected edge pixels, but also eliminates parts of the tube edge, which may break up into pieces. If the distance between the upper and lower threshold is large, it is likely that background and tube edges are merged. In any case, a threshold set working fine for one image can lead to very poor results for another. The result of the Canny edge detector is a binary image where non-edge pixels have a value of zero and edge pixels a value of one (or 255 in 8bit gray level images). Binary contour algorithms can be applied to analyze chains of connected edge pixels. As can be seen in the test images, depending on how many edge pixels survive the thresholding, such an analysis can be very complex and time-consuming. Gaps within edges belonging to the tube boundary make this search even more complicated.

Sobel The Sobel operator approximates a Gaussian smoothing combined with differentiation. It can be applied with respect to the x- and y-direction. According to the filter direction, vertical or horizontal edges are enhanced. Since the tube boundaries have a vertical orientation, the SOBELX operator is an adequate choice in this application. Edges are located at local extrema, i.e. local minima at bright-dark edges and local maxima at dark-bright edges with respect to the gradient direction. A drawback is that the background pattern is also dominantly vertically oriented; thus, background edges are detected as well. The intensity of an edge is related to the image contrast. Assuming a certain contrast between tubes and background, a large amount of background clutter could be removed by thresholding, leaving only tube edges and edges due to high-contrast dirt particles. However, this would lead to an approach similar to the Canny edge detector, with the drawbacks stated before.

Laplace The implementation used to test the Laplacian calculates the second-order derivatives in x- and y-direction using the Sobel operator and sums the results. The output is an image of signed floating point values. Edges are located at the zero crossings between strong peaks. The Laplacian is an isotropic operator; thus, edges of all orientations are detected equally. One drawback of this method is its sensitivity to noise: the resulting response contains many zero crossings. Compared to first-order derivatives, the edge criterion is more complex. A pixel is an edge pixel if the closest neighbor in the direction of the gradient is a local maximum while the opposite neighbor is a local minimum, and both



neighbors must meet a certain threshold. However, the zero crossing can be computed with subpixel accuracy.

Steerable Filters The idea of filters that are steerable, for example in scale and orientation, is to design a filter that performs best for a particular edge detection task (see Section 2.3.3). In this application the goal is to find a filter that extracts the tube edges with maximum precision, while background edges and dirt are suppressed. The steerable filter approach allows for testing a large range of different edge detection kernels. Experiments with systematically varied parameter sets of first-derivative Gaussian filters, following the approach of Freeman and Adelson [25], have been applied to the test images. Some of the results are visualized in Table 4.2.

As can be seen, the background clutter cannot be eliminated even with larger kernel sizes, while the tube edges get blurred. No parameter setting for a Gaussian derivative kernel has been found that performs significantly better as a tube edge detector than the computationally less expensive Sobel operator.

All tested methods besides the Canny edge detector can be seen more as edge enhancers than as real edge detectors. This means the results do not fulfill the second and third criterion for good edge detection (see Section 2.3.1). Further processing of the edge responses, such as nonmaximum suppression, is necessary. An alternative is a template based edge localization step, which is introduced in the next section.

4.5.2. Template Based Edge Localization

It is important to state that even precisely detected edges (including those of Canny's approach) still have no semantic meaning. With all tested methods there have been false positives, i.e. edges belonging to the background, dirt, or noise. Hence, model knowledge has to be applied to the detected edges to verify whether an edge really corresponds to a tube's boundary or not.

In this application, the highly constrained conditions reduce the number of expected situations to a small, well defined minimum. The edges belonging to the tube boundaries of interest are always approximately vertical. Due to perspective, a tube boundary appears straight or slightly curved in a convex fashion under back light, depending on the position of the tube with respect to the optical ray of the camera. The more the tube boundary is displaced from the camera center, the larger the curvature.

At this stage it is of interest to locate a tube's boundaries within the two local ROIs (left and right respectively). Strong changes in image intensity in x-direction (vertical edges) have been enhanced using the SOBELX operator. The goal is not only to find the strongest peaks in the edge image, but also the strongest connected ridge along such peaks, which most likely corresponds to the tube boundary. This task can be performed by template matching (see Section 2.4).

If the feature to be detected can be modeled by a template, the response of the cross-correlation with this template computes a match probability within a given search region. The idea is to design a template that models the response of the edge enhancer and to correlate this template with the local ROI. The position where the correlation has its maximum provides close information on the tube boundary location.



Figure 4.17: Edge detection results of the SOBELX operator applied to different tubes (right boundary). The tube boundary corresponds to the strongest ridge in vertical direction in each plot. It can be seen that the edge response differs in curvature, intensity and background clutter. (a) Almost straight edge (close to the optical center of the camera) of a transparent tube, with a quite uniform region left of the ridge belonging to the tube and a more varying area on the right due to the background structure. (b) The tube boundary looks convex if further away from the camera center due to perspective. The edge response is much stronger at the ends of the ridge than at the center; this is due to the amount of light which is transmitted by the tube (see text). (c) Edge of a transparent tube with a printing close to the boundary, visible as a smaller ancillary ridge on the left. (d) Boundary of a black tube. The edge response is about three times stronger compared to transparent tubes due to the strong image contrast.



Therefore, it is important to have a closer look at the response of the edge detection with respect to the input data. Consistent characteristics can be used for the design of the right template.

Figure 4.17 shows examples of the SOBELX operator applied to test images. In this case, the responses correspond to the right ROI of three transparent tubes (Figure 4.17(a)-(c)) and one black tube (Figure 4.17(d)) at different positions in the image with respect to the x-axis. The tube boundary can be detected intuitively by humans even in the presence of background clutter. However, one can see that the edge response differs between the plots due to image contrast and perspective.

Figure 4.17(a) shows an almost straight edge (close to the optical center of the camera) with a quite uniform region left of the ridge belonging to the tube, and a more varying area on the right due to the background structure. It can be observed that the edge response is stronger at the ends of the ridge than in the center, which is due to the transmittance characteristic of transparent tubes (see Section 4.2). More light is transmitted at the center, leading to brighter intensity values and a poorer contrast, while the corners ('L'-corners between the horizontal and vertical boundary of a tube) are darker and yield a better contrast. This effect can also be seen very clearly in Figure 4.17(b). In addition, the tube boundary looks convex due to perspective, since it is further away from the camera center. Vertical edges of printings on a tube's surface are also extracted by the edge detection step, as can be seen in Figure 4.17(c). In this case, the straight line of an upside-down capital 'D' falls into the right local ROI, causing the smaller ancillary ridge to the left of the tube boundary. Figure 4.17(d) shows the boundary of a black tube. Due to the strong image contrast the edge response is about three times stronger compared to transparent tubes. The influence of the background clutter is reduced to a minimum, and since printings are not visible on black tubes under back light, this problem vanishes completely. The edge response does not differ in intensity at the ends as with transparent tubes.

4.5.3. Template Design

The goal is to design a universal, minimal set of templates that covers all potential edge responses of both transparent and black tube boundaries. The templates must model different curvatures to be able to handle perspective effects. Assuming a constant horizontal orientation and a constant size, the curvature is the only varying parameter between templates. The following two-dimensional function has been developed, which can be parametrized to approximate the expected edge responses:

T_\psi(x, y) = a \exp\left( b \left(\frac{y}{H_T}\right)^2 - \frac{(x - \psi y^2)^2}{2\sigma^2} \right)    (4.14)

It is based on a Gaussian with standard deviation σ in x-direction, extended with respect to y. The curvature is denoted by ψ. A value of ψ = 0 represents no curvature, while the curvature increases with increasing values of ψ (ψ ≤ 1). The first summand in the exponent of the exponential function can be used to emphasize the ends of the template in y-direction, which is motivated by the characteristic response of transparent tubes: the edge detector yields higher values at the ends than at the center. b controls the amount of height displacement; if b = 0, the template is equally weighted. H_T corresponds to the template height. a determines the sign of the template values. For bright-dark edges, as at the left boundary, the edge response is negative, thus a < 0. At the right



Figure 4.18: Different templates generated using Equation 4.14. (a) Straight edge: ψ = 0, b = 0. (b) Curved edge: ψ = 0.005, b = 0. (c) Curved edge: ψ = 0.02, b = 0. (d) Curved edge with emphasized ends: ψ = 0.002, b = 3. (σ = 0.8 and a = 1 have been used for all templates in this figure.) Note the differently scaled x and y axes.



        # templates   ψ_min    ψ_max   χ         a    b
Left:   30            0.0      0.02    0.00066   −1   3
Right:  30            −0.02    0.0     0.00066   1    3

Table 4.3: A set of 30 templates with curvatures equally distributed between ψ_min and ψ_max at a curvature resolution (step size) χ has been used to determine the occurrence of certain curvatures empirically.

side a > 0 is used to model the positive response of dark-bright edges. Figure 4.18 shows some examples that visualize Equation 4.14 and the effect of the different parameters.
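A minimal sketch of template generation according to Equation 4.14 (Python for illustration; the centered coordinate convention and the default parameter values are assumptions):

import numpy as np

def make_template(width=11, height=80, psi=0.005, a=1.0, b=3.0, sigma=0.8):
    # Returns a (height, width) template T_psi(x, y) for one curvature.
    T = np.empty((height, width))
    for yi in range(height):
        y = yi - height / 2.0        # y measured from the template center
        for xi in range(width):
            x = xi - width / 2.0     # x measured from the ridge center
            T[yi, xi] = a * np.exp(b * (y / height) ** 2
                                   - (x - psi * y ** 2) ** 2
                                   / (2 * sigma ** 2))
    return T

# one template per curvature step, e.g. ten curvatures between 0 and 0.005
templates = [make_template(psi=p) for p in np.linspace(0.0, 0.005, 10)]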

Template Dimension A constant template width of 11 pixels is used, which is large enough to represent both straight and maximally curved tube boundaries. The template height is derived from the global ROI height. Assuming the guide bars are always arranged such that the guide bar distance is only slightly larger than the tube's perimeter, the global ROI height is a good reference for the tube size. A good guess of the template height can be computed by the following equation:

H_T = \gamma \, H_{ROI_G}    (4.15)

where H_{ROI_G} is the global ROI height and γ a factor between 0 and 1.

Curvature The question is what range of curvatures occurs in practice and how many templates are needed to cover that range. Therefore, several test sequences with both black and transparent tubes of different diameters have been captured. 30 templates of different curvature have been generated for each tube side. The parameters can be found in Table 4.3.

For each measurable frame, the curvature of the template that reaches the maximum correlation value is recorded to build a histogram of curvature occurrence, for both the left and the right tube side. The normalized cross-correlation (see Equation 2.31) is used as the measure evaluating the match quality of each template at a certain location. The results can be found in Figure 4.19.

They show that the occurring curvatures are limited to a small range, denoted as R_{ψ,left} = [0, 0.005] for the left and R_{ψ,right} = [−0.005, 0] for the right side respectively. In order to reduce the number of templates, all curvatures outside these ranges can be ignored.

Another important criterion is the step size or curvature resolution χ, i.e. how many steps between ψ_min and ψ_max are taken into account. Theoretically one could quantize the curvature ranges into very small steps. However, since correlation is an expensive operation, a compromise between accuracy and performance has to be made. It was observed that if more than 15 templates have to be tested on each tube side per frame, the system starts to drop frames, i.e. this is a quantitative indicator that the overall processing time exceeds the current frame rate. Therefore the total number of templates is restricted to 10 in this application. The corresponding step size between two curvatures is 0.0005.
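A minimal sketch of evaluating such a template bank with the normalized cross-correlation (Python with OpenCV's matchTemplate as an assumed stand-in for Equation 2.31; names are illustrative):

import cv2
import numpy as np

def best_template_match(edge_roi, templates):
    # edge_roi: SOBELX response of the local ROI (float32 image).
    # Returns (best x position, index of the best curvature, best score).
    best = (None, None, -np.inf)
    for idx, T in enumerate(templates):
        scores = cv2.matchTemplate(edge_roi, T.astype(np.float32),
                                   cv2.TM_CCORR_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)
        if max_val > best[2]:
            best = (max_loc[0], idx, max_val)
    return best

Since each additional template costs one full correlation over the ROI, restricting the bank to 10 curvatures per side directly bounds the per-frame processing time.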



Figure 4.19: Histogram of template curvature occurrence for (a) the left and (b) the right tube side. It can be seen that only a small range of curvatures occurs, which reduces the number of templates that have to be tested each time.

[Figure 4.20: If the height weighting coefficient gets too large (here: b = 20), the center of the tube edge does not contribute to the matching score anymore.]

Template Weighting
The weighting coefficient b in Equation 4.14 is important for transparent tubes. Due to poor contrast, the overall edge response of a transparent tube might be low. Considering only the center region of an edge, the contrast might, in the worst case, be even lower than that of a background edge. The cross-correlation only computes the similarity of a template at a certain location in the image. The maximum response is taken as the match, since it is assumed that there must be a tube edge in the search region, even if the same or another template matches the real tube edge perfectly but with a lower score. This finally leads to a wrong measurement.

With model knowledge about the tube characteristics one can assume that the contrast at the edge ends is significantly stronger. If the template is weighted uniformly at the center and the ends, the correlation score depends on the whole edge equally. If, on the other hand, the ends of the template are weighted more strongly than the center, a template that perfectly fits the tube edge will yield a larger score, since background edges are usually uniform. Thus, the template is designed to prefer tube edges.

The weighting coefficient b has to be larger than one to yield the desired effect. On the other hand, b must not be too large either, since then the ends get too much influence. In the extreme case, the template equals two spots at a certain distance that do not represent a tube edge anymore (see Figure 4.20).
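Equation 4.14 itself is defined earlier in this chapter. As a rough illustration only, the following numpy sketch generates a bank of curved, end-weighted edge templates; both the parabolic ridge model and the particular weighting form are assumptions for illustration and not the exact Equation 4.14:

```python
import numpy as np

def make_edge_template(height, width, psi, a, b):
    """Illustrative curved edge template (not the exact Equation 4.14):
    a vertical dark-bright step edge whose x-position follows a parabola
    with curvature psi; rows near the template ends are weighted more
    strongly for b > 0."""
    t = np.zeros((height, width))
    yc = (height - 1) / 2.0
    for y in range(height):
        # parabolic ridge: horizontal offset of the edge in this row
        x_edge = (width - 1) / 2.0 + psi * (y - yc) ** 2
        # signed step edge; a = -1 (left tube side) or a = +1 (right side)
        row = a * np.sign(np.arange(width) - x_edge)
        # end weighting: weight 1 at the center row, larger at the ends
        w = abs(y - yc) / yc
        t[y, :] = row * (1.0 + w ** b)
    return t / np.linalg.norm(t)  # normalized for cross-correlation

# one bank entry per curvature step, cf. Table 4.3 and R_psi,left
left_bank = [make_edge_template(60, 11, psi, a=-1, b=3)
             for psi in np.arange(0.0, 0.005 + 1e-9, 0.0005)]
```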


[Figure 4.21: Effect of the weighting coefficient b in Equation 4.14. (a) Tube edge detection results with a uniformly weighted template (b = 0). (b) Results of a template with enhanced ends (b = 3). (c) Corresponding cross-correlation results of (a), and (d) the cross-correlation results of (b) respectively. The maximum in (c) and (d) corresponds to the pixel position where the template matches best. In this example, the ridge closer to the observer is due to a background edge while the ridge further away corresponds to the real tube edge.]

Figure 4.21 shows an example of how the weighting of the template ends improves the tube edge detection. In this example, the contrast of the right boundary of a transparent heat shrink tube is quite low. Using a uniformly weighted template, i.e. b = 0, the maximum correlation score is reached at a background edge (see Figure 4.21(a) and (c)). In this case, the tube would be measured as larger than it really is. With an enhancement of the template ends, on the other hand, the tube edge yields a larger score than the background edge, leading to a correct detection, as can be seen in Figure 4.21(b) and (d).

The enhancement of the template ends is motivated by the characteristics of transparent tubes. For black tubes, b = 0 describes the response of the SOBELX operator best. However, there is no disadvantage in using the same weighting coefficient as for transparent tubes. Due to the strong contrast of black tubes, the curvature and size of the template are the dominant factors influencing the matching results.

Template Rotation
The templates generated by Equation 4.14 are symmetric along the y-axis with respect to the template center. Thus, the ends of the template always lie on one line perpendicular to the x-axis. In the ideal case, the edge response of a heat shrink tube has the same characteristic. In practice, however, a tube can be slightly angular within the guide bars, or the tube edge might be cut skew. In both cases, the strong edge responses at the ends do not have to lie on one line perpendicular to the x-axis as in the template.


[Figure 4.22: (a) Edge response of an angularly oriented (transparent) tube edge. The characteristic peaks at the ends of transparent tube edges do not have to lie on one line perpendicular to the x-axis. The red line visualizes the slight angular orientation of the tube edge. (b) Example detection result with k = 1 orientations. (c) Corresponding result with k = 3 orientations.]

Figure 4.22(a) visualizes the edge response of a slightly angular tube edge of a transparent tube (left side). In such a situation, no template will fit the edge perfectly. This can be critical if the edge contrast is poor. In this case, as mentioned before, the stronger weighting of the template ends helps to support a match at the real tube boundary instead of at a background edge. With an angular tube edge, a symmetric template cannot be shifted over the image in a way that it matches both edge ends. Thus, the cross-correlation score is significantly smaller, and the probability increases that a background edge yields a larger score.

A small rotation of the template can overcome this problem. Therefore, the bank of templates is extended by k − 1 rotated versions of each template. It turned out to be sufficient to rotate each template by ±2 degrees to cover the range of expected deviations from the ideal symmetric model. Thus, k = 3 has been used throughout the experiments. It is assumed that larger angular deviations cannot occur due to the guide bars.
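A minimal sketch of how the template bank from the previous snippet could be extended with the ±2 degree rotations (k = 3), using scipy's image rotation; the function name and the boundary handling are illustrative choices:

```python
import numpy as np
from scipy.ndimage import rotate

def expand_with_rotations(bank, angles=(-2.0, 0.0, 2.0)):
    """Extend a template bank by rotated copies of each template
    (k = 3: the original plus -2 and +2 degree versions). The rotation
    is performed around the template center, so the center keeps its
    role as the measuring point."""
    expanded = []
    for t in bank:
        for angle in angles:
            r = rotate(t, angle, reshape=False, order=1, mode='nearest')
            expanded.append(r / np.linalg.norm(r))
    return expanded
```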

Model Knowledge Optimization
The number of templates to be checked each time on the left and right side increases with the number of rotations. Instead of 2 × 10 templates, one has to consider 2 × 10 × 3 templates if k = 3. Since correlation is an expensive operation, the processing time increases significantly even if the local ROIs are relatively small. It turned out that no more than 15 templates can be checked at each side without skipping frames at a frame rate of 50fps on an AMD Athlon 64 FX-55 processor with 2GB RAM.

[Figure 4.23: Curvature of the best matching template depending on the x-position of the match. (a) Left tube side. (b) Right tube side.]

One thinkable optimization is to reduce the curvature resolution, i.e. to quantize the same range of curvatures into ≤ 5 templates at each side. Obviously, this reduces the accuracy of the edge localization and is no satisfying solution in this application.

Instead, one can apply model knowledge to exclude several curvatures depending on the horizontal image position. It can be assumed that the curvature is maximal at the image boundaries and decreases toward the image center. Real sequences support this assumption. Figure 4.23 shows the occurrence of different curvatures with respect to x. The data was acquired over several sequences including transparent and black tubes. It turns out that the curvature decreases linearly within a certain band. The upper and lower boundary of this band determine which curvatures can be excluded at a given position. The range distance of curvatures dψ at a position x is defined as:

dψ(x) = ψmax(x) − ψmin(x)    (4.16)

where ψmax(x) and ψmin(x) are the maximum and minimum curvature occurring at this position. d̄ψ is the average range distance over all x. This range must be checked each time and is covered by n templates. In practice, n = 5 is used since, as mentioned before, the maximum number of templates that can be processed with the given hardware in real-time is 15 (in addition to all further processing that is needed), and 5 curvatures × 3 rotations = 15 templates to be checked each frame at one tube side. To yield the desired resolution over the whole range of curvatures, the total number of curvatures Nψ,total is computed as follows:

Nψ,total = n(ψmax − ψmin) / d̄ψ    (4.17)


where ψmax and ψmin indicate the overall maximum and minimum curvature a template can have. Hence, one has to compute Nψ,total × k templates for each side. This can be done in a preprocessing step to reduce the computational load. During inspection, one has to determine which templates have to be checked at a given position, defined by the center of the local ROI around a predicted tube edge. For an efficient implementation, a look-up table (LUT) is used for this task.
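A sketch of how such a LUT might be set up, assuming the band of admissible curvatures from Figure 4.23 has been fitted by two linear boundary functions psi_lower(x) and psi_upper(x); these fitted functions, like all names here, are illustrative:

```python
import numpy as np

def build_curvature_lut(image_width, psi_lower, psi_upper, curvatures, n=5):
    """For each x-position, precompute the indices of the (at most n)
    template curvatures that fall inside the linearly decreasing band
    [psi_lower(x), psi_upper(x)]; 'curvatures' is the full, finely
    quantized curvature set of one tube side."""
    lut = []
    for x in range(image_width):
        lo, hi = psi_lower(x), psi_upper(x)
        inside = np.where((curvatures >= lo) & (curvatures <= hi))[0]
        # keep at most n indices, spread evenly over the band
        take = np.linspace(0, len(inside) - 1, min(n, len(inside)))
        lut.append(inside[take.astype(int)])
    return lut

# during inspection: test only bank[i] for i in lut[local_roi_center_x],
# each in its k rotated versions (5 curvatures x 3 rotations = 15)
```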

4.5.4. Subpixel Accuracy

The maximum accuracy of the template-based edge localization so far is limited by the discrete pixel grid. The templates are shifted pixelwise within the local ROIs to find the position that reaches the maximum correlation score. Following the assumptions of tubes under perspective (see Section 4.2.3), the measuring is performed between the outermost points of the convex tube edges.

By the way the templates are defined, the template center always corresponds to the outermost point of the generated ridge. This is consistent with template rotation, since the rotation is performed around the template center. In the special case that the template is not curved, the template center is still the valid measuring point. With the knowledge of this point within the template and the position where this template matches best in the underlying image, the position of the measuring point in the image can easily be computed.

However, pixel grid resolution is not accurate enough in this application. For example, one pixel represents about 0.12mm in the measuring plane ΠM in a typical setup for 50mm tubes. The allowed tolerance for 50mm tubes is ±0.7mm. As a rule of thumb for reliable results, the measuring system should be as accurate as 1/10th of the tolerance, i.e. 0.07mm in this example. To reach that accuracy, one has to apply subpixel techniques to overcome the pixel limits.

Figure 4.24(a) visualizes the results of the cross-correlation of an image ROI around the right boundary of a transparent tube with the template that yields the maximum score. The maximum is located at position Mmax = (19, 5). These coordinates refer directly to the edge position in the image, since the template function is known and therefore also the exact location of the template ridge.

The real maximum that describes the tube edge location most accurately may lie between two grid positions. With respect to the measuring task, the edge has to be detected as accurately as possible. Interpolation methods have been introduced in Section 2.3.4 to overcome the pixel grid limits in edge detection. The same can be applied at this stage to the template matching results.

Cubic spline interpolation is used to compute the subpixel maximum within a certain neighborhood around the discrete maximum. Cubic splines approximate a function based on a set of sample points using piecewise third-order polynomials. They have the advantage of being smooth in the first derivative and continuous in the second derivative, both within an interval and at its boundaries [53].

The interpolation is performed only with respect to the x direction, since this is the measuring direction. A subpixel location with respect to y has only a marginal effect on the measurements. Ideally, the measuring points on the left and right side have the same y value.


[Figure 4.24: (a) Cross-correlation results of an image patch around the right boundary of a transparent tube and the best scoring template. The maximum is located at position (19, 5). (b) Cubic spline interpolation in a local neighborhood around the maximum. In this case, the interpolated maximum is equal to the discrete position. (c) Matching results of a different image. Here, the interpolated subpixel maximum differs from the discrete maximum and can be found at x = 12.2.]

Assuming the real maximum location is displaced by at most 0.5 pixels at each side of the tube, the worst-case displacement is 0.5 at one side and −0.5 at the other side, leading to a total displacement of 1. A straight line connecting the two measuring points in a Euclidean plane is slightly longer than the distance in x. Following Pythagoras' theorem, the maximum expectable error due to a vertical inaccuracy is:

erry = √(l² + 1) − l    (4.18)

where l is the pixel length between the left and right measuring point. With respect to the definition of the camera's field of view and the image resolution, the length of a tube is about 415 pixels in an image. In this case, the worst-case error is about 0.0012 pixel. Assuming one pixel represents 0.12mm (a typical value for 50mm tubes), this corresponds to an acceptable error of 0.14µm, which is far beyond the imaging capabilities of the camera used (each sensor element has a size of about 8.3 × 8.3µm).

Other than in the vertical direction, a subpixel shift of the best matching template position in the horizontal direction has a significant influence on the length measurement results. Again, assuming a maximum error of 0.5 pixels if discrete pixel grid resolution is used, the total error at both sides sums up to 1 in the worst case. If one pixel corresponds to 0.12mm as in the example above, this means the measuring system has an inaccuracy of the same length purely depending on the edge localization. Obviously, this error depends on the resolution of the camera and can become even worse if one pixel represents a larger distance.
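As a quick check of Equation 4.18 with the numbers above: for l = 415 pixels, erry = √(415² + 1) − 415 = √172226 − 415 ≈ 415.0012 − 415 ≈ 0.0012 pixel, and 0.0012 pixel × 0.12mm/pixel ≈ 0.14µm, matching the values stated.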

The interpolation considers five discrete points: the maximum matching position Mmax and the two nearest neighbors to the left and right of Mmax in x-direction respectively. In Figure 4.24(b), the interpolation results of the local neighborhood around the discrete maximum of Figure 4.24(a) are drawn into the plot of the match profile at y = 5. It shows that the interpolated values describe the sampled values quite well. In this example, the interpolated subpixel maximum equals the discrete maximum. This does not always have to be the case, as can be seen in Figure 4.24(c). Here, the discrete maximum is located at x = 12, whereas the subpixel maximum lies at x = 12.2. In the first case, the neighboring pixels of the maximum yield almost equal results on both sides. In the second example, on the other hand, the right neighbor of the maximum is significantly larger than the left one. This explains the shift of the subpixel maximum toward the right. The precision of the subpixel match localization is 1/10 pixel. Mathematically, much higher precision is possible, but the significance of such results is questionable with respect to the imaging system and noise, and it increases the computational costs unnecessarily.

4.6. Measuring

The result of the template matching is two subpixel positions indicating the left and right measuring point of a tube. This section introduces how a pixel distance is transformed into a real-world length and how the measurements of one tube are combined. For this, a tracking mechanism is required that assures the correct assignment of a measurement to a particular tube. This means one has to detect when a tube enters or leaves the visual field of the camera.
field of the camera.


[Figure 4.25: Perspective correction. (a) The measured length varies depending on the image position in terms of the left measuring point. Due to perspective, the length of one tube appears larger at the image center than at the image boundaries. The effect of perspective can be approximated by a 2nd-order polynomial. (b) The correction function computed from the polynomial coefficients. (c) The result of the perspective correction.]

4.6.1. Distance Measure

The distance between the two measuring points pL and pR (see Section 4.2) is computed as the Euclidean distance. Thus, the pixel length l of a tube is defined as follows:

l = √((pR − pL)²)    (4.19)

where l is expressed in terms of pixels. In the following, l(x) denotes the pixel length of a tube at position x, where x = xpL, i.e. the position of a measurement is defined by the x-coordinate of the left measuring point.

4.6.2. Perspective Correction

Figure 4.25(a) shows the measured pixel length l(x) of a metal reference tube (gage) at different image positions. The sequence was acquired at the slowest conveyor velocity. In the ideal case, l should be equal independent of the measuring position. However, the measured length is smaller at the boundaries and maximal at the image center due to perspective. This property is consistent between tubes. To approximate the ideal case, a perspective correction can be applied to the real measurements. Mathematically, this can be expressed as:

lcor(x) = l(x) + fcor(x)    (4.20)

where lcor is the perspective-corrected pixel length and fcor a correction function. The perspective variation in the measurements can be approximated by a 2nd-order polynomial of the form:

f(x) = c1x² + c2x + c3    (4.21)

where the coefficients ci of the polynomial have to be determined in the teach-in step by fitting the function f(x) to measured length values l(x) in a least-squares sense. Then, the correction function fcor can be computed as:


fcor(x) = −(c1x² + c2x) + c1s² + c2s    (4.22)

where s is the x-coordinate of the peak of f(x) with s = −c2/(2c1), i.e. the point where the first derivative of f(x) is zero. Thus, fcor is the 180° rotated version of f(x), shifted so that fcor(s) = 0, as can be seen in Figure 4.25(b).

Applied to the measurements, this function has the effect that all values are adjusted to approximately one length l(s). The corrected length values lcor(x) are shown in Figure 4.25(c). As one can see, the mean value over all measurements describes the data much better after perspective correction.

To reduce the computational load, the correction function is computed only once for each position at discrete steps and stored in a look-up table for fast access.

4.6.3. Tube Tracking

Assuming a sufficient frame rate, one tube is measured several times at different positions while moving through the visual field of the camera. One constraint in Section 4.2.2 regarding the image content states that only one tube is allowed to be measurable at one time. The question is whether the current measurement belongs to an already inspected tube or whether there is a new tube in the visual field of the camera. Since there is no external trigger, this task has to be solved by the software.

Consecutive tubes appear quite equal in shape, size, or texture (especially black tubes). It is difficult up to impossible to find reliable features in the form of a unique fingerprint that can be used to distinguish between tubes. In addition, the extraction and comparison of such fingerprints would be computationally expensive. Standard tracking approaches such as Kalman filtering [24] or condensation [8] are also not suited to this particular application, since such approaches are quite complex and are worthwhile only if an object is expected to be in the scene over a certain time period. At faster velocities, however, a tube is in the image for only about 4-7 frames.

Since processing time is highly limited, it is a better choice to develop fast heuristics based on model knowledge that replace the problem of tube tracking by detecting when a tube has left the visual field. Therefore, the following very fast heuristics have been defined:

1. Backward motion
2. Timeout

Backward motion
Since the conveyor always moves in one direction (e.g. from left to right in the image), it is impossible that a tube moves backward. Thus, if the horizontal image position of the tube at time t is smaller than at time t − 1 (i.e. the tube would have moved further to the left), this can be used as an indicator that the current measurement belongs to the next tube. The position of a tube can be defined as the x-coordinate of the left measuring point. Hence, with the image content assumption, the tube measured at time t − 1 has left the visual field if xpL(t) < xpL(t − 1).


Timeout
The backward motion heuristic assumes a tube has passed the visual field of the camera when the next tube is measured for the first time. This requires a successor for each tube within a certain time period. With respect to the blow-out mechanism, it is important that the good/bad decision is made quickly, since the controller (see Section 3.4) must receive the result before the tube has passed the light barrier. Thus, a timeout mechanism is integrated. If no new tube arrives for more than ∆t frames, it is assumed that the previously measured tube has passed the measuring area and the total length can be computed. In practice, ∆t should be based on the average number of per-tube measurements and the distance between the measuring area and the light barrier.
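The two heuristics can be sketched as a small per-frame state update (a minimal illustration; the state container and names are not from the thesis):

```python
def update_tracking(state, x_left, frame_no, dt_timeout=5):
    """Per-frame update implementing the two heuristics as a sketch.
    x_left is the left measuring point of the current frame, or None if
    no tube was measurable; returns True when the previously tracked
    tube can be finalized (its total length computed)."""
    finished = False
    if x_left is None:
        # timeout: no new measurement for more than dt frames means the
        # previous tube has passed the measuring area
        if state['active'] and frame_no - state['frame_last'] > dt_timeout:
            finished = True
            state['active'] = False
    else:
        # backward motion: the conveyor moves in one direction only, so
        # a smaller x-position than before must belong to the next tube
        if state['active'] and x_left < state['x_last']:
            finished = True  # close the previous tube, start the next
        state.update(active=True, x_last=x_left, frame_last=frame_no)
    return finished

state = {'active': False, 'x_last': None, 'frame_last': -1}
```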

4.6.4. Total Length Calculation

Once there is evidence that a tube has passed the visual field of the camera, the single measurements have to be combined into a total length. Let mi denote the number of measurements assigned to tube i, and lj(i) the pixel length of the jth measurement (0 < j ≤ mi) of that tube. The mean length l̄(i) of tube i can be computed as:

l̄(i) = (1/mi) · Σ_{j=1}^{mi} lj(i)    (4.23)

The mean has the significant drawback that it is quite sensitive to outliers. For example, assume that in one of five measurements a background edge is wrongly classified as a tube edge, and the resulting length is therefore larger than the actual length. This outlier would also enlarge the resulting mean length, even if the remaining measurements have approximately the same (correct) value. To reduce the influence of outliers, the k strongest outliers are excluded from the averaging. Therefore, the measurements are sorted in ascending order based on the squared distance dj(i) to the mean l̄(i) with

dj(i) = (lj(i) − l̄(i))²    (4.24)

In the following, only the first mi − k measurements in the sorted list are averaged to the total length ltotal(i) of tube i as:

ltotal(i) = (1/(mi − k)) · Σ_{j=1}^{mi−k} l′j(i)    (4.25)

where l′ indicates that the measurements are sorted based on Equation 4.24, i.e. dj(i) ≤ dj+1(i) for 0 < j < mi.


The calibration factor fpix2mm denotes the length in the measuring plane ΠM that can be represented by one pixel in the image plane. The total length in mm, Ltotal, of tube i can be computed as follows:

Ltotal(i) = ltotal(i) · fpix2mm    (4.27)

The length Ltotal is used for the good/bad classification of whether a tube meets the allowed tolerances. This can be formalized as:

class(i) = GOOD if |Ltotal(i) − Ltarget| ≤ tol, BAD otherwise    (4.28)

where Ltarget is the target length and tol the allowed tolerance.
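A compact sketch of Equations 4.23 to 4.28 combined; the choice k = round(αoutlier · m) follows the description in Section 5.1.1 and is an assumption here:

```python
import numpy as np

def total_length_mm(lengths_px, f_pix2mm, alpha_outlier=0.25):
    """Combine the single pixel measurements of one tube: discard the k
    strongest outliers (k = round(alpha_outlier * m), an assumption),
    average the rest, and convert to mm."""
    l = np.asarray(lengths_px, dtype=float)
    k = int(round(alpha_outlier * len(l)))
    d = (l - l.mean()) ** 2                # squared distance to the mean
    kept = l[np.argsort(d)][:len(l) - k]   # drop the k strongest outliers
    return kept.mean() * f_pix2mm

def classify(L_total, L_target, tol):
    """Good/bad decision of Equation 4.28."""
    return 'GOOD' if abs(L_total - L_target) <= tol else 'BAD'

# e.g. classify(total_length_mm(measurements, 0.12), 50.0, 0.7)
```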


4.7. Teach-In

4.7.2. Profile Peak Threshold

The profile peak threshold separates tube and background. It is computed dynamically based on the regional mean of the profile and a constant factor αpeak (see Equation 4.9). Although this parameter is assumed to be constant, it has to be trained once with respect to the conveyor belt used.

The teach-in of this parameter is very simple and intuitive. The visual system is set to inspection mode, i.e. it is started as for standard measuring. The conveyor is empty but moving. The operator can adjust αpeak online, starting at a quite low value. This value is slightly increased as long as the system detects tubes (ghosts) where there actually are none. Until now, this procedure has to be performed manually, but one could think of an automated version to reduce the influence of the human operator, who is always a source of errors.

To ensure the threshold has not become too large, several tubes are placed on the conveyor. If the system is able to successfully detect all tubes (detection does not mean the length has to be computed correctly in this context), the profile threshold factor is assumed to be trained sufficiently. If the conveyor belt is not uniformly translucent, i.e. the overall image brightness changes significantly over time, one has to assure that the system is able to detect a tube both at the brightest and at the darkest region of the belt.

4.7.3. Perspective Correction Parameters

As introduced in Section 4.6.2, perspective effects in the measuring data can be reduced using a perspective correction function fcor(x). This function has two parameters, c1 and c2, that have to be learned in the teach-in step from real data.

One intuitive method to do this is to measure a tube at a very slow conveyor velocity. The result is a set of pixel length measurements (see Figure 4.25(a)) at almost every position in the image. Then, the parameters of a second-order polynomial f(x) = c1x² + c2x + c3 can be computed using nonlinear least-squares (NLLS) methods. In this case, a standard Levenberg-Marquardt algorithm [53] is used. The resulting parameters c1 and c2 can be directly inserted into Equation 4.22 to compute fcor(x).

For robust results, this procedure can be repeated several times and the final parameter set averaged. Alternatively, one could first acquire measurements of several tubes and fit the correction function to the total data.
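A sketch of this teach-in fit; numpy's polyfit is used here in place of Levenberg-Marquardt, which gives the same least-squares solution for a polynomial model:

```python
import numpy as np

def teach_perspective_correction(xs, lengths, lut_size=768):
    """Fit l(x) with a 2nd-order polynomial and build the correction
    function f_cor of Equation 4.22 as a look-up table over all
    x-positions (lut_size is an illustrative image width)."""
    c1, c2, c3 = np.polyfit(xs, lengths, deg=2)
    s = -c2 / (2.0 * c1)                   # vertex of the parabola
    x = np.arange(lut_size, dtype=float)
    f_cor = -(c1 * x**2 + c2 * x) + c1 * s**2 + c2 * s   # Equation 4.22
    return f_cor                           # l_cor(x) = l(x) + f_cor[x]
```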

4.7.4. Calibration Factor

The most important parameter to be trained in the teach-in step is the calibration factor that relates a length in the image to a real-world length in the measuring plane ΠM. This factor has been introduced as fpix2mm. The idea is to learn the calibration factor based on correspondences between measurements and ground truth data.

In an interactive process, the operator places a tube of known length onto the moving conveyor. The velocity of the conveyor is set to production velocity, i.e. the velocity at which the tubes will be measured later. When the tube reaches the visual field of the camera, it is measured with the described approach, but at pixel level only. Once the tube has left the measuring area, the total pixel length is computed and the user is asked to enter the real-world length of this tube into a dialog box. Again, the input device is a standard keyboard in the prototype version of the system.


The pair of a pixel length l(i) and a real-world reference L(i) can be used to compute the ideal factor fpix2mm(i) that converts pixels into mm for a measurement i as follows:

fpix2mm(i) = L(i) / l(i)    (4.29)

This procedure has to be repeated several times for different reference tubes. Finally, the estimated calibration factor is computed analogously to Equation 4.25 using a k-outlier filter before averaging:

fpix2mm = (1/(N − k)) · Σ_{j=1}^{N−k} f′pix2mm(j)    (4.30)

where k is the number of outliers, N the number of iterations, and f′pix2mm indicates the single calibration factors sorted by the squared distance to the mean in ascending order. The median could also be used instead of averaging.

The root-mean-square error at iteration i between the known real-world lengths and the lengths computed based on the estimated calibration factor can be used as a measure of quality:

Err(i) = √( (1/i) · Σ_{j=1}^{i} (L(j) − l(j) · fpix2mm)² )    (4.31)

If the error is low, this can be used as an indicator that the learned calibration factor is a good approximation of the ideal magnification factor that relates a pixel length in the image to a real-world length in the measuring plane ΠM, without any knowledge of the distance between ΠM and the camera.

In practice, the learning of the calibration factor is an interactive process. One can define a minimum and maximum number of iterations, Nmin and Nmax respectively. Once Nmin correspondences have been acquired, fpix2mm and Err(i) are computed for the first time. The operator continues the procedure as long as the calibration at iteration i + 1 changes by more than a small epsilon compared to iteration i. This means the learning can be stopped if |Err(i + 1) − Err(i)| < ɛ, or once Nmax iterations are reached.
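The calibration teach-in can be sketched as follows (the outlier fraction mirrors αoutlier from Table 5.2; the loop condition is paraphrased from the stopping rule above):

```python
import numpy as np

def calibration_factor(pixel_lengths, real_lengths, alpha_outlier=0.25):
    """Outlier-filtered mean of the single factors L(i)/l(i),
    cf. Equations 4.29 and 4.30."""
    f = np.asarray(real_lengths, float) / np.asarray(pixel_lengths, float)
    k = int(round(alpha_outlier * len(f)))
    d = (f - f.mean()) ** 2              # squared distance to the mean
    return f[np.argsort(d)][:len(f) - k].mean()

def calibration_error(pixel_lengths, real_lengths, f_pix2mm):
    """Root-mean-square error of Equation 4.31."""
    L = np.asarray(real_lengths, float)
    l = np.asarray(pixel_lengths, float)
    return float(np.sqrt(np.mean((L - l * f_pix2mm) ** 2)))

# interactive loop (sketch): after each new correspondence i >= N_min,
# recompute f_pix2mm and Err(i); stop when |Err(i+1) - Err(i)| < eps
# or i == N_max.
```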




5. Results and Evaluation

5.1. Experimental Design

There are several parameters influencing the measuring results, both in the hardware setup and in the vision algorithms. To yield meaningful results, it is important to vary no more than one parameter within the same experiment. In the following, the parameters that are tested as well as the evaluation criteria and the strategies used are described.

5.1.1. Parameters

The different parameters of the system can be grouped into four main categories: tube, conveyor, camera, and software. Table 5.1 summarizes the most important representatives of each category.

Obviously, there are many more parameters, described in the previous chapter, that theoretically fall into the last category. However, most of these parameters do not have to be changed (e.g. the number of profile scanlines or the local ROI width). The corresponding value assignments have been determined empirically on representative sequences and are summarized in Table 5.2.

αpeak = 4.0 has been determined in a teach-in step as proposed in Section 4.7.2 and yields the best results for transparent tubes with the conveyor belt and the illumination used. This assignment also covers black tubes, although the threshold could be much larger in that case. As long as the conveyor belt is not changed and the amount of dirt on the conveyor does not change significantly, the detection sensitivity does not have to be re-initialized each time.

A timeout period of ∆t = 5 frames for the tube tracking (see Section 4.6.3) has been used throughout the experiments, which is a good compromise between the number of expected per-tube measurements and the distance to the light barrier.

Approximately 1/4 of all measurements (rounded to the next integer value) are not considered for the total length computation with αoutlier = 0.25, to eliminate outliers in the single measurements as introduced in Section 4.6.4. The same value is used for the outlier filter in the teach-in step (see Section 4.7.4).

The teach-in of the calibration factor fpix2mm (see Section 4.7.4) terminates if the root mean square error does not change by more than ɛ = 0.0001 between two iterations.

Since it is still very complex to test all permutations and assignments of the remaining parameters, one has to make compromises in the experimental design. Therefore, some of the parameters listed above have been adjusted before the experiments to meet the assumptions made in Section 4.2. This includes the guide bar distance as well as the illumination (fiber-optical backlight setup through the conveyor belt) and all camera parameters, i.e. lens, working distance, exposure time and F-number. For all experiments with 50mm tubes, a 16mm focal length lens at a working distance of approximately 250mm is used. The shutter time has been adjusted to 1.024ms, which is a good compromise between light efficiency and motion blur effects. This shutter time requires a small F-number of 1.4 to yield sufficiently bright images.
97


Category   Parameter
Tube       Color
           Length
           Diameter
Conveyor   Velocity
           Tube spacing
           Guide bar distance
Camera     Lens
           Working distance
           Exposure time
           F-number
Software   Profile peak threshold τpeak (sensitivity)
           Number of templates (scale, orientation, curvature)
           Perspective correction
           Calibration factor

Table 5.1: Overview of the different test parameters.

Parameter   Category              Description                      Value          Section
Nscan       Profile Analysis      Number of scanlines              11             4.4.1
Ksmooth     Profile Analysis      Smoothing kernel size            19             4.4.2
αpeak       Profile Analysis      Peak threshold factor            4.0            4.4.2
WROI        Edge Detection        Local ROI width                  15             4.4.3
γ           Template Generation   Template height ratio            0.95           4.5.3
Rψ,right    Template Generation   Curvature range right            [−0.005, 0]    4.5.3
Rψ,left     Template Generation   Curvature range left             [0, 0.005]     4.5.3
χ           Template Generation   Curvature resolution             0.0005         4.5.3
b           Template Generation   Height weighting coefficient     3              4.5.3
k           Template Generation   Number of rotations              3              4.5.3
∆t          Tube Tracking         Timeout period                   5              4.6.3
αoutlier    Total Length          Outlier factor                   0.25           4.6.4
ɛ           Teach-In              Allowed calibration error        0.0001         4.7.4

Table 5.2: Constant software parameter settings throughout the experiments.

In all experiments, it is assumed that the system is calibrated correctly, the radial distortion coefficients are known, and a teach-in step has been performed to learn fpix2mm. In addition, the perspective correction function has been determined before each experiment to compensate for perspective distortions.

5.1.2. Evaluation Criteria

There are several criteria that can be used to compare and evaluate the results of different experiments. These can be classified into quantitative and qualitative criteria.

Quantitative Criteria

Total Detection Ratio
The system must detect exactly the number of tubes that pass the visual field of the camera. Formally, this can be expressed in the following score Ωtotal:

Ωtotal = Ndetected / Ntotal    (5.1)

where Ndetected indicates the number of detected tubes and Ntotal the total number of tubes respectively. Ωtotal = 1 is a necessary but not sufficient criterion for a correctly working inspection system.

Per Tube Measurements
The average number of single measurements for each tube depends mainly on the velocity of the conveyor and the camera frame rate. If N tubes have been measured, the mean number of per-tube measurements can be computed as:

ΩPTM = (1/N) · Σ_{i=1}^{N} mi    (5.2)

where mi is the number of single measurements of the ith tube.

False Positives/False Negatives
Each tube T can be classified into one of the three groups G0 (good), G− (too short), and G+ (too long) if measured manually. G0 is defined by the target length and the allowed tolerance for this length. It contains all tubes that meet the tolerance in the real world. G− and G+ include all tubes of a real-world length that lies below the lower or above the upper tolerance threshold respectively.

In the same way, each tube can be categorized into one of the three groups G′0, G′−, or G′+ based on the length measured by the visual inspection system. In the ideal case, these three groups are equal to the corresponding ground truth classifications, i.e. G′0 = G0, G′− = G−, and G′+ = G+.¹

In practice, however, the measurements are biased by many factors like perspective errors, curved tubes, skew tube edges, noise, motion blur, or failures in measuring point detection. In addition, as will be introduced in Section 5.1.3, the manually acquired ground truth data also has a certain variance. Thus, the distributions measured by humans and by a machine vision system may differ. This gets critical if the two distributions intersect.

Tubes that are actually too short or too long but are measured to be within the tolerance are denoted as false positives (FP). On the other hand, tubes of an allowed length can be wrongly classified as outliers and are denoted as false negatives (FN). More mathematically, false positives and false negatives can be defined as follows:

FP = {T | T ∈ G′0 ∧ T ∉ G0}    (5.3)
FN = {T | T ∉ G′0 ∧ T ∈ G0}    (5.4)

In terms of system evaluation, the following measures can be used:

ΩFP = NFP / Ntotal    (5.5)
ΩFN = NFN / Ntotal    (5.6)

where NFP and NFN indicate the number of false positives and false negatives respectively. Both the false positive ratio ΩFP and the false negative ratio ΩFN should be zero in the optimal case. As already discussed in the introduction, ΩFP is more critical than ΩFN, since it is less bad to sort out a good tube than to deliver a failure to the customer.

¹ Theoretically, a fourth group U for unsure can be defined, including all tubes that could not be detected at all. These tubes have to be handled by different mechanisms, as will be discussed in later sections.

Performance
The performance of the system can be evaluated with respect to the average processing time that is needed to analyze a frame:

ΩTIME = (1/M) · Σ_{i=1}^{M} ti    (5.7)

where M is the number of frames considered and ti represents the processing time of frame i. ΩTIME is expressed in terms of ms/frame. This measure can be used to determine the maximum possible capture rate. Skipped frames indicate that the camera captures more frames than the system is able to process.

Qualitative Criteria

Standard Deviation Per Tube
The multi-image measuring approach is based on the idea that more robust measuring results can be reached if each tube is measured several times. In the ideal case, all measurements should yield an equal length value. In practice, however, the single measurements can differ. The standard deviation σtube(i) can be used as an indicator of how much these measurements vary. It is computed as:

σtube(i) = √( (1/(mi − 1)) · Σ_{j=1}^{mi} (lj(i) − l̄(i))² )    (5.8)

where lj(i) indicates the length of the jth single measurement of tube i, l̄(i) the mean over all single measurements of this tube, and mi the total number of single measurements of tube i. σtube is expressed in terms of pixels.

A large per-tube standard deviation represents an uncertainty in the results. In this case, the mean describes the data only roughly. If the uncertainty is too large, it may be better to blow out the particular tube, since the probability of a false positive decision increases proportionally with the standard deviation.

Sequence Standard Deviation
The standard deviation of a sequence, σseq, is computed analogously to σtube, but not with respect to the single measurements of one tube; instead it refers to the computed total lengths ltotal of N tubes:

σseq = √( (1/(N − 1)) · Σ_{i=1}^{N} (ltotal(i) − l̄total)² )    (5.9)

where l̄total is the mean over all total measurements. Finally, all measurements can be represented by a Gaussian distribution function G(x) as:

G(x) = (1/(σseq √(2π))) · exp(−(x − µseq)² / (2σ²seq))    (5.10)

where µseq = l̄total. The production is most accurate if the distance between the given target length and the mean of this distribution is small.

Ground Truth Distance
The difference between the vision-based length measurement results and the manually acquired ground truth data can be seen as a relative error, assuming the ground truth data is correct. Of interest are the minimum and maximum ground truth distance (GTD) of a sequence of tubes, defined as:

GTDmin = min {(ltotal(i) − lgt(i)) | 1 ≤ i ≤ N}    (5.11)
GTDmax = max {(ltotal(i) − lgt(i)) | 1 ≤ i ≤ N}    (5.12)

where ltotal(i) is the computed total length of tube i, lgt(i) the corresponding ground truth length, and N the number of tubes considered. If the mean ground truth distance GTD is approximately zero, the deviation is distributed equally. Otherwise, if GTD > 0, the measured length is predominantly larger than the ground truth measurement. Accordingly, if GTD < 0, the opposite holds. In both cases, the systematic error indicates that the system is probably not calibrated correctly.

Root Mean Square Error (RMSE)
The root mean square error measure is used to compare the measurements of the visual inspection system to manually acquired ground truth data over a sequence, as follows:

RMSE = √( (1/N) · Σ_{i=1}^{N} (ltotal(i) − lgt(i))² )    (5.13)

with ltotal(i), lgt(i) and N as defined before. A small root mean square error indicates that the measurements are close to the ground truth data.
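The quantitative criteria above can be collected in one small helper, a sketch assuming per-tube total lengths and ground truth lengths in mm:

```python
import numpy as np

def evaluate(measured, ground_truth, detected, total, target, tol):
    """Evaluation criteria of Section 5.1.2 for one sequence."""
    m, gt = np.asarray(measured, float), np.asarray(ground_truth, float)
    omega_total = detected / total                      # Equation 5.1
    good_meas = np.abs(m - target) <= tol               # G'_0
    good_true = np.abs(gt - target) <= tol              # G_0
    omega_fp = np.sum(good_meas & ~good_true) / total   # Equation 5.5
    omega_fn = np.sum(~good_meas & good_true) / total   # Equation 5.6
    rmse = np.sqrt(np.mean((m - gt) ** 2))              # Equation 5.13
    sigma_seq = np.std(m, ddof=1)                       # Equation 5.9
    return omega_total, omega_fp, omega_fn, rmse, sigma_seq
```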

5.1.3. Ground Truth Measurements

The acquisition of ground truth data is important for evaluating the vision-based inspection system with respect to human measurements. For this purpose, a special digital measuring slide, shown in Figure 5.1, has been used. The precision of this device is up to 1/100mm.

[Figure 5.1: Measuring slide used for acquiring ground truth measurements by hand.]

However, there is a significant deviation in human measurements, since heat shrink tubes are flexible. Depending on the force the human operator applies to the measuring slide, the measured length becomes smaller or larger. This variation has been investigated empirically.

12 sample tubes of different diameters (6, 8 and 12mm) were selected as the test set (see Table 5.3). One half of the samples are black, the other half transparent tubes. For each combination of color and diameter, one tube has a length of approximately 50mm and one was manipulated, i.e. made slightly longer or shorter than the tolerance allows for.

No.   Color         Diameter [mm]   Mean length [mm]
1     Transparent   8               49.95
2     Transparent   6               49.77
3     Transparent   12              49.82
4     Transparent   8               48.19
5     Transparent   6               51.33
6     Transparent   12              51.88
7     Black         8               50.98
8     Black         6               50.19
9     Black         12              50.00
10    Black         6               50.84
11    Black         8               49.66
12    Black         12              51.56

Table 5.3: Test set used to determine the human variance in measuring.

The results are shown in Figure 5.2. In a first experiment, the variance of a single person was investigated, denoted as the intra-human variance. Each tube in the test set was measured 10 times by the same person with the goal of being as precise as possible.


[Figure 5.2: Intra- and inter-human variance for the test set in Table 5.3 under ideal laboratory conditions. The error bars indicate the maximum and minimum length for each of the 12 tubes as well as the mean value of the measurements, once for one person (a) and once for 10 persons (b). The average inter-human variance is slightly larger compared to the intra-human variance.]

The error bars indicate the maximum and minimum length as well as the mean value of all measurements. The computed mean standard deviation is 0.078mm.

In a second experiment, the inter-human variance was determined. For this, 10 persons were asked to measure the same test set, again as precisely as possible. The inter-human variance is slightly larger than the intra-human variance (see Figure 5.2(b)). In this case, the mean standard deviation was observed to be 0.083mm.

Furthermore, it is important to state that the manual measurements for the ground truth data have been acquired very carefully, with elevated concentration, under laboratory conditions, and with the aim of being as precise as possible using the digital measuring slide (see Figure 5.1). Fewer than 5 tubes can be measured within one minute at this precision. At production, the sample measurements are performed with a standard sliding caliper and at a much higher rate. There is a definite tradeoff between accuracy and speed. The expected individual measuring error at production is much larger. Furthermore, factors like tiredness or distraction can significantly increase the inter- and intra-human measuring variance.

The accuracy and precision of the visual inspection system, however, should be evaluated with respect to the maximum possible accuracy humans can reach with the given measuring slide under ideal conditions. Throughout this thesis, manual ground truth measurements always refer to the ideal, laboratory-condition measurements. One has to keep in mind that there is still a certain uncertainty in these measurements. The real absolute length of a tube cannot be determined exactly.

For the following experiments, all tubes have been measured three times to reduce the influence of the human variance. The mean of the three measurements is taken as the ground truth reference. All measurements are stored in a database, and each measured tube is labeled by hand with a four-digit ID using a white touch-up pen.


[Figure 5.3: At velocities > 30m/min, larger sequences of tubes with a small spacing have to be placed on the conveyor using a special supply tube.]

5.1.4. Strategies

Online vs Offline Inspection
There are two main strategies for the evaluation of the inspection system. The first strategy analyzes the tubes online, i.e. in real-time on the conveyor. This includes tube localization, tracking, and measuring as well as the good/bad classification. The results are stored in a file and can be further processed or visualized afterward. This is closely related to the application at production. The drawback of this approach is that if some interesting or strange behavior is observed in the resulting data, it is difficult to localize its origin.

Therefore, the second evaluation strategy is based on an offline inspection. This means a sequence of tubes is first captured into a file at the maximum frame rate that can be processed online. Then, the sequence can be analyzed repeatedly with different sets of parameters or methods. This is a significant advantage if one wants to compare different techniques or parameter settings.

In the following experiments, both strategies will be applied.

Tube Placement
The prototype setup in the laboratory has one significant drawback. The tubes to be inspected have to be added to the conveyor manually, since there is no second conveyor from which the tubes fall onto it continuously, as in production. The size of the conveyor allows for about 21 tubes of 50mm length with a spacing of 10mm in between. If all tubes are placed on the inactive conveyor, it takes some time until the desired velocity is reached. Therefore, at faster velocities, the first tubes pass the measuring area at a slower velocity, leading to unequal conditions between measurements.

Hence, either fewer tubes have to be placed on the conveyor (starting further away from the measuring area) or the tubes have to be placed onto the conveyor while it is running at the desired velocity. The latter is hardly possible for a human without producing large spacings between two consecutive tubes. Instead, a supply tube of about 1.30m length, with a diameter slightly larger than the current tube diameter, can be used as a magazine for about 25 tubes of 50mm length (see Figure 5.3). The supply tube is placed at a steep angle at the front of the conveyor (in moving direction). If the conveyor is not moving, the tubes are blocked and cannot leave the supply tube. If, on the other hand, the conveyor is moving, the bottom tube is gripped by the belt and can leave the supply tube through a bevel opening in moving direction. If the velocity of the conveyor is fast enough, the time until the next tube in the supply tube is gripped by the belt is sufficient to produce a spacing. Experiments have shown that the supply tube works only for velocities > 30m/min. Otherwise, it is possible that two consecutive tubes are not separated.

Thus, one has two alternative methods to fill the conveyor with tubes: one works well for lower velocities, the other for faster ones. In both cases, the maximum number of tubes is limited. Therefore, larger experiments have to be partitioned over several sequences.

Test Data
Since it is not worthwhile to manually measure thousands of tubes as ground truth references, the number of tubes that can be compared to such reference lengths is limited. However, it is possible to increase the number of ground truth comparisons by repeating the automated visual measurement of a manually measured tube. For example, one can manually measure 20 tubes of each particular type (the number that can be placed onto the conveyor or into the supply tube at one time) and repeat the automated inspection several times. From the algorithmic perspective, the system is confronted with a new situation every time, regardless of whether there are 100 different tubes to be inspected or 5 × 20.

In the following, a distinction is made between tubes of a length that meets the given target length within the allowed tolerance and tubes of manipulated length falling outside this tolerance. The system must be able to separate the manipulated tubes from the proper ones.

5.2. Test Scenarios

Eight test scenarios have been developed to evaluate the system. In each scenario, only one parameter is varied while the others are kept constant. The different scenarios are introduced in the following.

Noise
Before the system is tested on real data, the accuracy and precision of the measuring approach are evaluated on synthetic images. A rectangle of known pixel size simulates the projection of an ideal tube that is not deformed by perspective. The 'tube edges' as well as the measuring points are detected with subpixel precision as in real images. The resulting length in pixels must equal the rectangle width. To evaluate the accuracy under the presence of noise, Gaussian noise of different standard deviations is added systematically to the sequences.

Minimum Tube Spacing
In this scenario, the minimum spacing between tubes is investigated on real images, both for black and transparent tubes. The test objects have a size of about 50mm within the allowed tolerance and a diameter of 8mm. The velocity of the belt is 30m/min. Starting with sequences that allow for only one tube in the visual field, i.e. the spacing is larger than the tube length, the spacing is decreased until the detection rate Ωtotal falls below 1, i.e. at least one tube could not be detected.

Conveyor Velocity
The goal of this scenario is to investigate how the accuracy and precision of the measurements depend on the velocity of the conveyor. The focus is on four different velocities: slow (10m/min), medium (20m/min), fast (30m/min), and very fast (40m/min), the maximum velocity that can be reached at production. Currently, the production line runs at approximately 20m/min. To test the limits of the system, even higher velocities of up to 55m/min are tested. For all velocities > 30m/min, the tubes have to be placed onto the conveyor using the supply tube.

Again, the inspected tube size is about 50mm in length within the allowed tolerance with a diameter of 8mm, both for black and transparent tubes. The spacing between the tubes must be large enough, following the results of the minimum tube spacing experiments.

In this scenario, all evaluation criteria introduced in Section 5.1.2 are considered, including a comparison to ground truth measurements. The evaluation is performed offline.

Tube Diameter If the distance between camera and conveyor belt does not change, the diameter of a tube influences the distance between the measuring plane ΠM (see Section 4.2) and the image plane. Tubes with a smaller diameter are further away and appear smaller in the image, while tubes with a larger diameter are magnified in the image. Thus, the calibration factor that relates a pixel length to a real-world length in mm has to be adapted, as sketched below.
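The following sketch illustrates how such an adaptation could look under simple pinhole reasoning: the scale grows linearly with the distance between camera and measuring plane. The thesis only states that the factor must be adapted, so the function and all numeric values here are illustrative assumptions.

```python
def adapted_fpix2mm(f_ref, z_ref_mm, z_belt_mm, diameter_mm):
    """Adapt the pixel-to-mm calibration factor to a tube diameter.

    f_ref is the factor calibrated for the reference measuring plane at
    distance z_ref_mm from the camera; the new plane sits on top of the
    tube, i.e. at the camera-belt distance minus the tube diameter.
    """
    z_new = z_belt_mm - diameter_mm       # measuring plane on top of the tube
    return f_ref * z_new / z_ref_mm

# Example: factor calibrated on 8mm tubes, adapted to 6mm and 12mm tubes
# (0.12 mm/pixel and a 300mm camera-belt distance are illustrative values).
f_8mm = 0.12
z_8mm = 300.0 - 8.0
print(adapted_fpix2mm(f_8mm, z_8mm, 300.0, 6.0))    # larger factor: plane farther away
print(adapted_fpix2mm(f_8mm, z_8mm, 300.0, 12.0))   # smaller factor: plane closer
```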

The test data includes transparent and black tubes with diameters of 6, 8, and 12mm and a length of 50mm that meet the allowed tolerances. The conveyor velocity is constant at 30m/min. Again, all evaluation criteria are considered and the evaluation is performed offline.

Repeatability In this scenario, a tube of known size is measured many times in a row at a constant velocity of 30m/min. Theoretically, the system should measure the same length each time, since one can assume the length of the tube does not change throughout the experiments. As mentioned before, there are several parameters that can influence the repeatability in practice, such as a varying background.

In the same experiment one can not only determine the repeatability, i.e. the precision of the system, but also the accuracy, if one uses an ideal tube gage instead of a heat shrink tube. Such a gage can be made from metal with much higher precision, overcoming the human variance in measuring deformable heat shrink tubes. For comparable results, the gage should have the same shape and dimensions as a heat shrink tube. Since it does not transmit light, a metallic gage can simulate black tubes only.

The real-world length of the gage is known very accurately and precisely. Thus, the RMSE of the measuring results becomes almost independent of errors in the ground truth data.

The measurements are best performed online, i.e. in real-time, due to the amount of accumulating data. The resulting lengths are stored in a file for later evaluation.

Outlier Detection Until now, all experiments have been based on test data that is known to meet the given tolerances. In this scenario, tubes of approximately 50mm length are mixed with tubes that are too long or too short, i.e. that differ from the target length by more than 0.7mm. The position and the number of the outliers in a sequence are known. The system must be able to detect the outliers correctly. Thus, the false positive and false negative rates are the main criteria of interest in this scenario; a sketch of the underlying tolerance test is given below.

The evaluation can be performed both offline and online.
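The decision itself is a simple tolerance test. The following minimal sketch shows the classification and the resulting error rates; the normalization over all inspected tubes matches how the rates are reported in Section 5.3.6, everything else (names, signatures) is illustrative.

```python
def meets_tolerance(length_mm, target_mm=50.0, tol_mm=0.7):
    """Accept a tube whose measured length lies within the tolerance band."""
    return abs(length_mm - target_mm) <= tol_mm

def error_rates(lengths_mm, is_outlier):
    """False positive rate (outliers accepted) and false negative rate
    (good tubes rejected), relative to the total number of inspected tubes,
    given ground truth outlier labels."""
    fp = sum(meets_tolerance(l) for l, o in zip(lengths_mm, is_outlier) if o)
    fn = sum(not meets_tolerance(l) for l, o in zip(lengths_mm, is_outlier) if not o)
    n = len(lengths_mm)
    return fp / n, fn / n
```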

Tube Length As mentioned before, the focus in this thesis is set on tubes of 50mm length. In addition, it is shown that the system is also able to measure tubes of different lengths, exemplarily for tubes of 30 and 70mm length.



The tolerances for these lengths differ, i.e. the 30mm tubes are allowed to deviate only up to 0.5mm around the target length, while 70mm tubes have a larger tolerance of 1mm. The measuring precision can be directly linked to these tolerances. Accordingly, the system must measure smaller tubes with a higher precision than larger ones.

In this scenario, the accuracy and precision are evaluated based on the mean and standard deviation of a sequence of tubes, measured online, that approximately meet the given target length. Corresponding ground truth data is available.

Performance Finally, it is of interest to determine the performance of the system in terms of the average per-frame processing time ΩTIME. It is investigated how the total processing time is distributed over the different stages of the inspection, including radial distortion compensation, profile analysis, edge detection and template matching, as well as the total length computation and tracking.

5.3. Experimental Results

In this section the experimental results of the different scenarios are presented and discussed. Further discussion as well as an outlook on future work is given in Section 5.4.

5.3.1. Noise

The influence of noise on the measuring accuracy is tested on synthetic sequences. Rectangles of 200 pixels width are placed on a uniform background with a contrast of 70 gray levels between the object and the brighter background. The image size is 780 × 160, and the sequence is analyzed like a real sequence, with two differences. First, the perspective correction function is disabled, since the synthetic 'tube' is not influenced by perspective, i.e. the width of the rectangle is constant, independent of the image position. Furthermore, the dynamic selection of template curvatures based on the image position is not applicable in this scenario either, since the model knowledge assumptions do not hold. Thus, in this experiment all templates are tested at each position (computation time is not critical here).

Gaussian noise of standard deviation σN has been added to the ideal images, with σN ∈ {5, 10, 25}. Sample images of each noise level are shown in Figure 5.4(a)-(d).

The measuring results are evaluated using the root mean square error (RMSE) between the ground truth length of 200 pixels and the results of the single measurements; a sketch of this evaluation follows below. The results show that in the ideal (noise-free) case, the pixel length is always measured correctly. In the presence of noise, the measured length varies at subpixel level. Figure 5.4(e) shows how the measurements differ in accuracy and precision in the presence of noise. The maximum deviation from the target length occurs at the largest standard deviation (σN = 25). The RMSE results can be found in Figure 5.4(f). For sequences with only a small amount of noise (σN = 5) the RMSE is acceptably low at 0.122 pixels. If one pixel represents 0.12mm in the measuring plane, the real-world error is about 1/100mm. Even under strong noise (σN = 25), which is far beyond the noise level of real images, the measuring error is 0.252 pixels, or 0.03mm in the example. This is still significantly below the human measuring variance.
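A minimal sketch of this evaluation, including the conversion of the pixel error into a real-world error with the scale quoted above (the sample measurement values are illustrative):

```python
import numpy as np

def rmse(measured_px, true_px=200.0):
    """Root mean square error of single measurements against the known length."""
    m = np.asarray(measured_px, dtype=float)
    return float(np.sqrt(np.mean((m - true_px) ** 2)))

print(rmse([200.1, 199.8, 200.3, 199.9]))   # RMSE in pixels (sample values)
print(0.252 * 0.12)                         # approx. 0.03mm at sigma_N = 25
```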



Figure 5.4: Accuracy evaluation of length measurements on synthetic sequences under the influence of noise. (a)-(d) Rectangles of known size (length = 200 pixels) simulate a tube on a uniform background without perspective effects; Gaussian noise of different standard deviations σN ∈ {5, 10, 25} has been added to the ideal images. (e) Gaussian distribution of the measurements for std = 0, 5, 10, 25 over the length range 199-201 pixels. (f) Root mean square error (RMSE) for each noise level: σN = 0: 0; σN = 5: 0.122; σN = 10: 0.158; σN = 25: 0.252.



Figure 5.5: Detection rate of black and transparent tubes depending on the spacing between consecutive tubes.

Thus, one can conclude that the system is able to detect the synthetic tube edges very accurately, even in the presence of noise, if there is a sufficient contrast between background and foreground.

5.3.2. Minimum Tube Spacing

10 black and 10 transparent tubes are used to investigate the influence of the spacing on the detection rate. The tubes have been placed on the conveyor at an approximately constant spacing. Five gap sizes are tested: 60, 30, 20, 10, and 5mm. Each load of tubes passes the measuring area five times for each gap size at a conveyor velocity of 30m/min. In this experiment only the total detection rate Ωtotal is considered, i.e. how many tubes are detected by the system at least once. The results are averaged over the 5 iterations.

As can be seen in Figure 5.5, the detection of black tubes is uncritical, indicated by Ωtotal = 1, until the tube spacing is less than 10mm. This means no black tube can pass the measuring area without being measured if the spacing is ≥ 10mm. The decrease at 5mm gaps to Ωtotal = 0.98 (i.e. 1 tube out of 50 is not detected) may be due to the fact that the manual tube placing cannot guarantee an exact spacing of 5mm. It is likely that the distance between two tubes has become even smaller, leading to the failure. Since the tests have been performed online, it is not possible to locate the origin of the outlier. Therefore, it has been investigated how small the gap between two black tubes must be until the profile analysis fails to locate the tube; a strongly simplified sketch of such a gap search is given after this paragraph. The results are shown in Figure 5.6. Even a spacing of about 2mm as in (a) is large enough to reliably detect the background regions between the tubes, as can also be seen in the corresponding profile analysis results in (c). A gap of about 1mm, however, is too small even for black tubes. Due to perspective, the points closer to the camera merge (see Figure 5.6(b) and (d)).
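The following sketch illustrates the basic idea of finding background gaps in a 1D column-intensity profile. It is a deliberate simplification: the actual profile analysis (Chapter 4) additionally uses local medians, regional means, and segment boundaries, and the window size and threshold fraction here are illustrative.

```python
import numpy as np

def background_runs(profile, win=5, frac=0.85):
    """Find candidate gaps between dark tubes on a 1D intensity profile.

    A column counts as background where the smoothed intensity exceeds a
    fraction of the profile mean (the belt shines through between tubes);
    contiguous background runs are gap candidates, returned as
    (start, end) index pairs.
    """
    p = np.convolve(np.asarray(profile, float), np.ones(win) / win, mode="same")
    bg = p > frac * p.mean()
    d = np.diff(bg.astype(int))
    starts = np.flatnonzero(d == 1) + 1
    ends = np.flatnonzero(d == -1) + 1
    if bg[0]:
        starts = np.r_[0, starts]
    if bg[-1]:
        ends = np.r_[ends, bg.size]
    return list(zip(starts.tolist(), ends.tolist()))
```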

The transparent tubes show a detection rate of < 1 even for the largest tested gap size of 60mm. This can be explained by the much lower contrast to the background. If the system must be able to overcome a strongly non-uniform background brightness, one has to make a larger compromise in terms of detection sensitivity. As it turns out, there is no parameter setting that can guarantee that all tubes are detected independent of the gap



Figure 5.6: Minimum tube spacing for black tubes. (a) A spacing of about 2mm is still sufficient to locate the measurable tube correctly. (b) The detection fails if the two tubes appear to touch under perspective, as on the left side. (c) Profile analysis of (a). (d) Profile analysis of (b). The profile plots show the smoothed profile together with segment boundaries, local median, global mean, regional mean, and predicted tube boundaries.

size. However, the results have shown that the detection rate decreases drastically below 10mm (see Figure 5.5).

As a result of these experiments, the minimum spacing used in the following experiments is 10mm for black tubes and 20mm for transparent tubes.

5.3.3. Conveyor Velocity

The test data in this scenario includes 17 transparent and 21 black tubes of 50mm length and 8mm diameter. Manual ground truth measurements of these tubes are available. The number of tubes of each color is geared to the number of tubes that can be placed on the conveyor with a sufficient spacing. To increase the probability of a 100% detection rate, the spacing between two transparent tubes has to be larger than for black tubes. Each charge of tubes is measured 5-6 times at each velocity of 10, 20, 30, and 40m/min to yield a total number of > 100 measurements (based on even more single measurements) in each experiment. Thus, all tubes have to pass the measuring area many times.

Before presenting the results in detail, Figure 5.7 shows an example of how the system has measured (a) the charge of black tubes and (b) the charge of transparent tubes at 20m/min. Both the single measurements per tube (indicated by the crosses) as well as the computed total length and the corresponding ground truth length are visualized. The lengths measured by the system are quite close to the ground truth data.

These results are just an example to show what kind of data is evaluated in the following. Since it is not possible to visualize longer sequences as detailed as in Figure 5.7 due to the



Figure 5.7: Measuring results at 20m/min for (a) 21 black and (b) 17 transparent tubes. The red crosses indicate single measurements, while the dashed vertical lines represent the boundaries between measurements belonging to the same tube. The averaged total length as well as the corresponding ground truth length are also shown in the plots, together with the upper and lower tolerance. All measured tubes of this sequence meet the tolerances. However, while the transparent tubes have approximately the target length of 50mm on average, the mean of the black tubes is slightly shifted, i.e. all tubes tend to be shorter than the target length.



v [m/min] | Ωtotal | ΩPTM | σtube | GTDmin | GTDmax | GTD (mean) | RMSE
10 | 1 | 11.4 | 0.05 | -0.12 | 0.14 | 0.01 | 0.07
20 | 1 | 6.9 | 0.04 | -0.16 | 0.11 | -0.02 | 0.07
30 | 1 | 4.6 | 0.05 | -0.19 | 0.19 | 0.0 | 0.07
40 | 1 | 3.2 | 0.07 | -0.21 | 0.17 | -0.01 | 0.09
55 | 1 | 2.3 | 0.07 | -0.16 | 0.16 | 0.01 | 0.08

Table 5.4: Evaluation results at different conveyor velocities v for black tubes (50mm length, ∅8mm). The accuracy of the measurements, indicated by the RMSE, does not decrease significantly with faster velocities nor with a decreasing number of per tube measurements ΩPTM. σtube is the per tube standard deviation and GTD stands for ground truth distance (see Section 5.1.2).

amount of data, more comprehensive representations will be used, based on the proposed evaluation criteria.

Black Tubes The results of the velocity experiments with black tubes are summarized in Table 5.4.

The black tubes show a detection rate Ωtotal of 1 for all velocities, i.e. no tube has passed the measuring area without being measured, independent of how fast the tubes are moved. The average number of per tube measurements ΩPTM decreases from 11.4 at the slowest velocity (10m/min) to 3.2 at the maximum possible production velocity. Even at 55m/min each tube is measured at least twice. The average standard deviation σtube of the measurements per tube ranges from 0.04 to 0.07mm; again, there is only a very small rise from the slower to the faster velocities. The absolute ground truth distance does not exceed 0.21mm, and measurements that are shorter or longer than the ground truth are equally distributed, as indicated by the mean ground truth distance GTD, which is approximately zero.

As an example, the ground truth distance at 30m/min is shown in Figure 5.8(a). If the distance is larger than 0, the manually measured length is shorter than the vision-based measurement, and vice versa. Due to the variance in the ground truth data it is not very likely that the distance is zero for all values. However, the distance should be as small as possible. If the ground truth distance is one-sided, i.e. all measurements of the system are longer or shorter than the corresponding ground truth measurement, this indicates an imprecise calibration factor. The conversion of the pixel length into a real-world length then results in a systematic error, which has to be compensated by adapting the calibration factor, as sketched below.
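A sketch of how such a one-sided bias could be detected and compensated. The correction rule is an illustrative assumption, not a procedure from the thesis:

```python
import numpy as np

def gtd_statistics(vision_mm, manual_mm):
    """Ground truth distances and their mean, minimum, and maximum; a
    clearly one-sided mean points to a systematic scale (calibration) error."""
    gtd = np.asarray(vision_mm, float) - np.asarray(manual_mm, float)
    return float(gtd.mean()), float(gtd.min()), float(gtd.max())

def rescaled_factor(f_old, vision_mm, manual_mm):
    """Illustrative correction: rescale the pixel-to-mm factor so the mean
    vision-based length matches the mean manual length."""
    return f_old * np.mean(manual_mm) / np.mean(vision_mm)
```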

The RMSE differs only marginally between the tested velocities. The largest RMSE is computed at 40m/min with 0.09. This value is only slightly larger than the deviation of human measurements. For lower velocities it is even better with 0.07. Another indicator of how the vision-based measurements converge to the ground truth data is the Gaussian distribution over the sequence of all measurements. This distribution is based on the mean µseq and standard deviation σseq (see Section 5.1.2). Figure 5.8(b) compares the vision-based distribution (solid line) at 30m/min with the corresponding ground truth distribution (dashed line). The mean is 49.66 in both cases. σseq is slightly larger with 0.1193, compared to the ground truth with 0.1027.

In terms of accuracy and precision, this means the vision-based measurements of black tubes are as accurate as human measurements (laboratory conditions) and



Figure 5.8: (a) Ground truth distance GTD in mm for black tubes (50mm length, ∅8mm) at 30m/min. (b) Gaussian distribution of all measurements compared to the ground truth distribution.

v [m/min] | Ωtotal | ΩPTM | σtube | GTDmin | GTDmax | GTD (mean) | RMSE
10 | 0.99 | 9.6 | 0.06 | -0.14 | 0.32 | 0.09 | 0.13
20 | 0.98 | 5.2 | 0.09 | -0.16 | 0.29 | 0.08 | 0.11
30 | 1 | 3.9 | 0.15 | -0.16 | 0.66 | 0.15 | 0.20
40 | 0.97 | 2.4 | 0.18 | -0.27 | 0.75 | 0.23 | 0.28

Table 5.5: Evaluation results at different conveyor velocities v for transparent tubes (50mm length, ∅8mm). The accuracy seems to decrease with faster velocities, as can be seen from the RMSE and the mean per tube standard deviation σtube. The number of per tube measurements ΩPTM is smaller for transparent tubes: due to the lower contrast it is more likely that a tube is not detected as measurable.

are only marginally less precise. Furthermore, as an additional benefit, it is possible to show that a sequence of tubes is systematically shorter than the target length (although still within the tolerances). This information could be used to adjust the cutting machine until µseq approximates the given target length.

Transparent Tubes The same experiments have been repeated with transparent tubes. The results are summarized in Table 5.5.

The detection rate Ωtotal tends to decrease with an increasing velocity, although all tubes have been detected at 30m/min in this experiment. 3% of the tubes have passed the visual field of the camera without being measured at 40m/min.

Due to the poorer contrast of transparent tubes, the probability increases that a tube cannot be located in the profile analysis step. This can be seen from the average number of per tube measurements ΩPTM. While black tubes are measured about 11.4 times at v = 10m/min, the transparent tubes reach only 9.6 measurements per tube at the same velocity. At 40m/min this number decreases to 2.4. At faster velocities, e.g. 55m/min, the number of per tube measurements falls short of 1. Reliable measurements are not



Figure 5.9: (a) Ground truth distance GTD in mm for transparent tubes (50mm length, ∅8mm) at 30m/min. The measurements marked by a '+' all belong to the same tube, which reached the maximum GTD at measurement 68. As one can see, it is not systematically measured wrongly. A poor-contrast region on the conveyor belt is rather the origin of the strong deviations from the ground truth. (b) Gaussian distribution of all measurements compared to the ground truth distribution.

possible at this velocity for transparent tubes so far and are therefore not considered in Table 5.5.

The standard deviation σtube of transparent tubes moved at 40m/min is three times larger than at 10m/min. This can be explained by the smaller number of per tube measurements. The ground truth distance also increases with the velocity. Especially GTD becomes conspicuously larger, i.e. the measured lengths are on average larger than the ground truth length. This trend can also be observed in the absolute values of GTDmax and GTDmin. At a velocity of 40m/min the maximum ground truth distance is 0.75mm, which is more than the allowed tolerance. In this context one has to keep in mind that these values are only the extrema and do not describe the average distribution. This makes the ground truth distance measure very sensitive to outliers. However, a large GTD value does not automatically imply poor accuracy. On the other hand, if the ground truth distance is low in the extrema, as with the black tubes in this experiment, this is an additional indicator of high accuracy. The ground truth distance of the transparent tubes at 30m/min is shown in Figure 5.9(a). The deviations are significantly larger compared to Figure 5.8(a).

Instead of being approximately equally distributed as for the black tubes, the error of transparent tubes seems to increase and decrease randomly, but always over a range of consecutive measurements. This observation can be explained by the varying background intensity under back light through the conveyor belt. The periodic intensity changes obviously influence the transparent tubes much more strongly than the black tubes, since the detection quality depends mostly on the image contrast. Figure 5.10 shows how the mean image intensity of a moving empty conveyor belt changes over time. If a tube is measured at a part of the conveyor belt that yields a poor contrast under back light, the GTD is likely to increase. Keeping in mind that each tube passes the measuring area 6 times in this experiment, the probability is small that it is always measured at the same position on the conveyor. The tube measured with the maximum GTD has been marked in the plot



Figure 5.10: Mean image intensity (gray level) of a moving empty conveyor belt over time. The deviation between the brightest and the darkest region on the conveyor exceeds 40 gray levels and originates in the non-uniform translucency characteristics of the belt. Example images showing this non-uniformity can be found in Figure 3.4.

as well as all other measurements belonging to this particular tube. It turns out that the average ground truth distance of this tube is 0.3mm, which is still larger than the RMSE of the whole sequence due to the outliers. However, it is shown that this tube is not measured wrongly in general. Furthermore, one can see that all neighboring tubes that lie in the same region on the conveyor are also measured inaccurately. It is assumed that with a more uniform conveyor belt such deviations could be avoided.

The mean over all measurements is 50.04 at 30m/min, compared to 49.96 in the ground truth. This is still very accurate. The precision of the vision-based measurements is 0.15, compared to 0.09 for human measurements under ideal laboratory conditions. The corresponding Gaussian distributions are plotted in Figure 5.9(b).

Finally, the RMSE increases with faster velocities, and the total error is larger compared to black tubes. The lowest error was measured at 20m/min (approximately the current production velocity) with 0.11. This error is still only slightly larger than the human variance.

One can conclude that the results of the black tubes are very accurate both for slow and fast conveyor velocities. The RMSE even falls below the standard deviation of human measurements. The accuracy of transparent tubes decreases with faster velocities, but is still in a range that allows for measurements with the given tolerance specifications. The best results have been achieved at a velocity of 20m/min. As it turns out, all tubes meeting the tolerances in the real world (based on manual ground truth data) have also been reliably measured to be within the tolerances by the system, i.e. ΩFN = 0. Thus, no tube would have been blown out wrongly at any velocity.



Diameter | Ωtotal | ΩPTM | σtube | GTDmin | GTDmax | GTD (mean) | RMSE
6mm (B) | 1 | 4.8 | 0.05 | -0.40 | 0.29 | -0.13 | 0.18
8mm (B) | 1 | 4.6 | 0.05 | -0.19 | 0.19 | 0.0 | 0.07
12mm (B) | 1 | 4.6 | 0.07 | -0.44 | 0.31 | -0.11 | 0.19
6mm (T) | 0.92 | 2.8 | 0.18 | -1.15 | 0.87 | 0.01 | 0.20
8mm (T) | 1 | 3.9 | 0.15 | -0.16 | 0.66 | 0.15 | 0.20
12mm (T) | 0.98 | 3.12 | 0.24 | -0.69 | 0.67 | 0.07 | 0.20

Table 5.6: Measuring results of 50mm length tubes with different diameters at a velocity of 30m/min. The first three rows show black tubes (B) and the last three rows transparent (T) ones.

Figure 5.11: The thin 6mm tubes are likely to be bent. The distance between the defined measuring points in the image does not represent the length of the straight tube correctly.

5.3.4. Tube Diameter

Besides the tubes of 8mm diameter investigated in the velocity experiments, there are also 6 and 12mm diameter tubes to be considered in the DERAY-SPLICEMELT series. Therefore, the test data in this scenario includes transparent and black tubes of 50mm length with these diameters. The velocity is constant at 30m/min. Again, more than 100 tubes are measured for each combination of color and diameter. The summarized evaluation results can be found in Table 5.6.

Black Tubes As for the 8mm tubes, 100% of the black tubes of both 6 and 12mm diameter are measured by the system, indicated by a score of Ωtotal = 1. The number of per tube measurements is also approximately equal, with 4.8 for 6mm diameter tubes and 4.6 for 12mm tubes. The per tube standard deviation σtube is slightly larger for 12mm with 0.07, compared to 0.05 for the 6 and 8mm tubes. One significant difference to the 8mm tubes are the larger extrema in the ground truth distances GTDmin and GTDmax and the definite shift in the average ground truth distance GTD. Values of −0.13 for 6mm and −0.11 for 12mm indicate that the vision-based lengths are mostly shorter than the manual measurements.

This has basically two different origins. Tubes with a diameter of 6mm are bent much more strongly than tubes of larger diameters, as can be seen for example in Figure 5.11. In this case, both manual as well as vision-based measurements are difficult. The length of a tube in the image is defined as the distance between the left and right end of the tube at the outermost points of the corresponding edges. If the tube is bent, however, the distance between the measuring points is obviously smaller than the real length. This can be seen in the ground truth distance as well as in the resulting RMSE, which is significantly larger with 0.18 compared to the 8mm tubes. Figure 5.12(a) visualizes the results of a sequence of 21 black tubes at 30m/min. The bent tube in Figure 5.11 corresponds to the 10th tube in this plot (located between measurement numbers 45 and 50) and is measured significantly



Figure 5.12: Length measurement results of black tubes with different diameters ((a) 6mm, (b) 12mm) at 30m/min. The plots show only a section of the total number of measured tubes. Although the RMSE is larger for both 6 and 12mm tubes compared to the 8mm results, the measurements are still accurate enough to correctly detect all tubes within the allowed tolerances.

shorter than the ground truth. The total results of the experiment with 6mm diameter black tubes are shown in terms of the ground truth distance in Figure 5.13(a).

Only a few tubes are measured too long, while most measurements are shorter than the ground truth, depending on how much a tube is bent, i.e. how much it differs from the assumed straight tube model. However, all 100 tubes are measured correctly to lie within the allowed tolerances, leading to a false negative rate of ΩFN = 0 (ΩFP = 0 is implicit, since there are no outliers in the test data).

While bending is no problem for black tubes with a diameter of 12mm, these tubes have another drawback. The larger diameter makes the tubes more susceptible to deformations of the circular cross-section shape. This means only a little pressure is needed to deform the cross-section of a tube into an ellipse. These deformations occur if the tubes are stored, for example, in a bag or box and many tubes lie on top of each other. The tubes used as the test set have been delivered in such a way. In addition, the effect is increased since most tubes are grabbed by hand several times, e.g. to measure the ground truth length or if experiments have been repeated with the same tubes. Each manual handling is a potential source of deformation. With respect to the vision-based measuring results, the elliptical cross-section of a tube leads to a significant problem. In the model assumptions, the measuring plane ΠM is defined at a certain distance above the conveyor belt. This distance is assumed to be exactly the outer diameter of an ideal circular tube (see Figure 5.14(a)). The magnification factor that relates a pixel length to a real-world length is valid only in the measuring plane. With a weak-perspective camera model it is assumed that this factor is also valid within a certain range of depth around this plane.

For a deformed tube, the measuring points in the image pL and pR do not originate from points that lie in the measuring plane. If the cross-section is elliptical, it is most likely that the tube will automatically roll onto its largest contact area. In this case, the points closest to the camera will be further away than the measuring plane. Under perspective, the resulting length in the image will be shorter. This is exactly what is observed in



Figure 5.13: Ground truth distance in mm of all measured black tubes with a diameter of (a) 6mm and (b) 12mm at 30m/min.

the experiments. Although it is less likely, it is also possible that a tube lies on the side with the smaller contact area. This happens, for example, if the tube is leaned against a guide bar. This results in measuring points above the measuring plane, leading to a larger length in the image; both effects can be quantified with the pinhole sketch below.
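The magnitude of these effects can be estimated with the pinhole relation behind the weak-perspective model; the working distance used here is an illustrative assumption.

```python
def apparent_length_mm(true_len_mm, z_plane_mm, delta_mm):
    """Pinhole estimate of the measured length when the tube's highest
    points lie delta_mm beyond the measuring plane (positive delta: tube
    flattened and farther from the camera, appears shorter; negative:
    tube on its narrow side, closer to the camera, appears longer)."""
    return true_len_mm * z_plane_mm / (z_plane_mm + delta_mm)

# A 50mm tube flattened by 1mm at an assumed 300mm working distance:
print(apparent_length_mm(50.0, 300.0, 1.0))    # ~49.83mm, i.e. a 0.17mm error
print(apparent_length_mm(50.0, 300.0, -1.0))   # ~50.17mm for the narrow side
```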

Figure 5.12(b) shows a section of 21 black tubes with a diameter of 12mm, measured at 30m/min. The larger distance to the ground truth data is clearly visible. However, the system is again able to reliably detect all tubes correctly within the tolerances, without any false negatives (ΩFN = 0). As an example of how the deformation of a tube influences the measuring results, images of the 7th and the 11th tube² of this sequence are shown in Figure 5.14(b) and (c) respectively. The vertical extension of tube No. 7 is definitely smaller than for the neighboring tubes. This is an indicator that the tube is deformed and lies on its smaller side; thus, it is measured longer than it actually is. On the other hand, tube No. 11 is larger in its vertical extension, indicating it is lying on its larger contact area. The result is a much shorter length measured by the vision-based system, which can also be seen in Figure 5.13(b). As for the 6mm tubes, the measurements are mostly shorter compared to the ground truth, although the origin is different, as introduced above.

These results show the accuracy limits of the weak-perspective model. If higher accuracy is needed, a telecentric lens could be used to overcome the perspective effects of different depths, or the height of a tube in the image could be exploited to adapt the calibration factor fpix2mm dynamically.

Transparent Tubes The experiments with different diameters have been repeated with transparent tubes. Only 92% of all transparent tubes with a diameter of 6mm are detected and measured by the system in this experiment. This is mainly due to the non-uniform translucency of the conveyor belt. Especially the thin 6mm tubes are very sensitive to

² Note: The tube number does not correspond to the (single) measurement number. The dashed lines indicate which measurements belong to the same tube.



Figure 5.14: (a) Idealized cross-section of deformed tubes (frontal view). The measuring plane ΠM is defined based on an ideal circular tube (center). Deviations denoted as ∆1 (left) and ∆2 (right) influence the length measurement in the image projection. (b) Example of a deformed tube (No. 7 in Figure 5.12(b)) lying on its smaller side. The measuring points are closer to the camera and, due to perspective, the tube appears measurably larger in the image. (c) The opposite effect occurs if a deformed tube (No. 11 in Figure 5.12(b)) lies on its larger contact area.

changes in brightness, since they are more translucent than the 8mm and 12mm tubes. At regions of the conveyor belt that transmit more light, the thin tubes almost disappear. Thus, one has to reduce the intensity of the light source. This is a tradeoff, because other regions that transmit less light become even darker, while the structure of the belt is emphasized. If the contrast is too low, the tube cannot be located in the profile. This problem could be prevented by using a more homogeneously translucent conveyor belt.

The 12mm diameter tubes generally yield a better contrast, which can be seen from the detection rate of 98%. The number of per tube measurements ΩPTM is 3.12, compared to 2.8 for the 6mm tubes. However, the average standard deviation is larger for the 12mm tubes with 0.24. An RMSE of 0.2 for both 6 and 12mm transparent tubes indicates that the measuring results are almost as accurate as for black tubes of the same diameter, although the extrema are significantly larger. As already mentioned, these values can be influenced by a few outliers. The values of GTD show a much more uniform distribution of the deviations compared to black tubes. This is due to the fact that transparent tubes are more sensitive to strong background edges, which can be wrongly detected as tube edges. Figure 5.17 gives an example of how the system can fail, leading to a larger measured length. The poor contrast at the tube boundary cannot be compensated by the stronger responses at the tube edge ends. The maximum correlation score is reached at the background edge. This problem does not occur with black tubes due to the stronger contrast.

Thus, in addition to the problems described for black tubes of 6 and 12mm diameter, transparent tubes may be measured longer than they really are. Figure 5.15 visualizes the experimental results with different diameters of transparent tubes. Again, this is only a section of the total number of measurements, which are summarized more comprehensively in Figure 5.16 based on the ground truth distance. Compared to the experiments with black tubes, there have been false negatives among the transparent tubes, i.e. tubes have



Figure 5.15: Experimental results of transparent tubes (50mm length) with a diameter of (a) 6mm and (b) 12mm at 30m/min. The plots show only a section of the total number of tubes.

Figure 5.16: Ground truth distance in mm of all measured transparent tubes with a diameter of (a) 6mm and (b) 12mm at 30m/min.



Figure 5.17: The tube edge detection can fail if the contrast between tube and background is poor. (a) Zoomed region of an input image. (b) Edge response of this image within the local ROI around the assumed edge location. Only the ends of the tube edge yield a significant response, which is of little account compared to the edge response of the background. (c) The maximum correlation score between a template and the image within the local ROI (blue bounding box) is reached at the background edge, indicated by the red dots. The resulting measured length is obviously wrong.

been wrongly classified as too long or too short. For 6mm tubes the false negative rate is ΩFN = 0.02 and for 12mm tubes ΩFN = 0.01. This means 1-2 tubes out of a hundred would have been sorted out wrongly by the system.

5.3.5. Repeatability

A transparent tube of 50.0mm and a black tube of 49.7mm (manual ground truth length) have each been measured 100 times (based on several single measurements in each case) by the system at a constant velocity of 30m/min. The tubes have a diameter of 8mm. The measuring results of the black tube are shown in Figure 5.18(a) and the results of the transparent tube in Figure 5.18(c). The corresponding Gaussian distribution functions, based on the mean and standard deviation over all measurements, can be found in Figure 5.18(b) and (d) respectively. The narrower the distribution, the better the repeatability of the measurements.

The mean of the 100 measurements of the black tube is 49.66, which is very close to the ground truth length. The standard deviation of the black tube is 0.0614mm. Thus, the deviation between measurements of the same tube is less than 1/10th of the tolerance and significantly smaller than the deviation between human measurements.

The measuring results of the transparent tube show a mean of 49.99 and a standard deviation of 0.051. Based on the results of the previous experiments, one could have expected the deviation of a transparent tube to be larger than for a black tube. In this experiment the transparent tube has been detected correctly 100 times in a row (like the black tube). The only difference between the two tubes is the shape of the cross-section. Neither tube is ideally circular, but the material of the black tubes is slightly softer, i.e. more susceptible to deformations, than that of the transparent tubes. In this experiment each tube is manually put onto the conveyor belt 100 times. Thus, even if the operator tries to grab the tubes as carefully as possible, deformations cannot be prevented for either tube type, leading to the observed deviations in the measurements. Obviously, the total deviation



Figure 5.18: Repeatability of the measurement of one tube. (a) 100 measurements of one black tube with a ground truth length of 49.7mm. (b) Corresponding Gaussian distribution of all measurements in (a) with µ = 49.66 and σ = 0.0614. (c) 100 measurements of one transparent tube with a ground truth length of 50.0mm. (d) Corresponding Gaussian distribution of all measurements in (c) with µ = 49.99 and σ = 0.051. The belt velocity is 30m/min in both experiments.



Figure 5.19: Repeatability results of a metallic cylinder simulating a tube of 49.99mm ground truth length. (a) 100 measurements of the gage at 30m/min. (b) Gaussian distribution of the results with µ = 49.94 and σ = 0.033.

is also influenced by other parameters, such as the tube orientation within the guide bars and the limits of the discrete input image (although subpixel techniques are applied). This experiment shows how accurately the vision-based system is able to measure even transparent tubes if the tube edge detection is successful.

The experiment has been repeated with a metallic cylinder of 49.99mm length simulating an ideal tube (gage). The cross-section of this gage is circular and cannot be deformed manually. The results of this experiment are shown in Figure 5.19(a) and (b). The mean over all 100 measurements is 49.94, with a standard deviation of 0.0331. This deviation is close to the error that has been estimated in Section 4.2.6 with respect to the maximum possible tube orientation within the guide bars.

One can conclude that, as long as the orientation within the guide bars is neglected, the maximum precision of the system is about 0.03mm for tubes that are ideally round and not bent. This is more than twice as precise as human measurements. It is assumed that this precision could be increased even further if the tubes were not only approximately but ideally horizontally oriented.
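In terms of bookkeeping, the repeatability experiments reduce to simple statistics over the stored lengths; a minimal sketch, assuming the lengths have been read back from the result file (function names are illustrative):

```python
import numpy as np

def repeatability(lengths_mm, true_len_mm):
    """Accuracy (mean offset from the known length) and precision (sample
    standard deviation) over repeated measurements of the same object."""
    x = np.asarray(lengths_mm, float)
    return float(x.mean() - true_len_mm), float(x.std(ddof=1))

# For the gage experiment the reported statistics correspond to a mean
# offset of about 49.94 - 49.99 = -0.05mm and a precision of ~0.033mm.
```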

5.3.6. Outlier

The system is evaluated with respect to outliers in two steps. First, more than 150 tubes (about 50mm, ∅8mm) are measured by the system at 30m/min. Approximately 1/3 of the tubes meet the tolerances, while the other 2/3 have a manipulated length. The ground truth length of the tubes is known, as well as the measuring order, i.e. each measurement can be assigned to a corresponding ground truth length. Based on the results of the previous experiments, one can assume that the results for the black tubes will be better than or equal to the transparent tube results.

The results of this experiment are visualized in Figure 5.20. All of the 150 tubes are classified correctly. There is not a single false positive or false negative in the data.

In the second stage of this experiment, 30 manipulated and 22 good tubes are randomly mixed. All tubes are measured online at 30m/min while the blow out mechanism is



Figure 5.20: 150 transparent tubes, both good and manipulated, have been measured by the system at 30m/min and compared to ground truth data. The system is able to reliably separate the tubes that meet the tolerances around the target length of 50mm from the manipulated tubes, without a single false positive or false negative.

activated. This means tubes that do not meet the tolerances should be sorted out. Once all tubes have passed the measuring area, it is checked how many of the manipulated tubes have also passed the blow out mechanism (false positives) and how many good tubes have been sorted out (false negatives). To simplify this task, the manipulated tubes have been marked beforehand. This experiment is repeated 22 times, leading to a total number of 1144 inspected tubes. The results can be found in Table 5.7. The total detection rate is Ωtotal = 0.99, i.e. 6 tubes out of 1144 passed the measuring area without being measured. Three tubes have been sorted out wrongly, representing a false negative rate of ΩFN = 0.0026, i.e. 2.6‰.

The false positives are more critical. 5 outliers have not been blown out correctly; thus, ΩFP = 0.0043. However, it turns out that 4 of the 5 false positives occur in sequences with at least one undetected tube. With the ratio of good to manipulated tubes of about 2:3, the probability is larger that the uninspected tube is a manipulated one. In this case the false positives are most likely not due to failures in measuring, but originate from the fact that these tubes have not been measured at all. In production, all uninspected tubes should be sorted out and revised to be sure that no outlier can pass.

5.3.7. Tube Length

Measuring tubes of a different length requires an adaptation of the visual field of the camera. For tubes < 50mm this means placing the camera closer to the conveyor. However, due to the minimum object distance (250mm) of the 16mm lens used in the previous experiments, and with the considerations made in Section 3.2.1, a lens with a longer focal length is needed to yield the desired field of view. In this case a 25mm focal length lens is used.
is needed to yield the desired field of view. In this case a 25mm focal length lens is used.


5.3. EXPERIMENTAL RESULTS 125<br />

Total | Detected | Missed | FN | FP
52 | 52 | 0 | 1 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 1 | 0
52 | 51 | 1 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 50 | 2 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 1 | 0
52 | 52 | 0 | 0 | 0
52 | 51 | 1 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 51 | 1 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 51 | 1 | 0 | 0
52 | 52 | 0 | 0 | 0
1144 | 1138 | 6 | 3 | 5

Table 5.7: Results of repeated blow out experiments. 22 × 52 transparent tubes have been measured at 30m/min. Each run included 22 tubes within the allowed tolerances and 30 outliers. Detected outliers should have been sorted out by the blow out mechanism. 3 tubes have been sorted out wrongly (false negatives) and 5 outliers have passed (false positives). Conspicuously, 4 of the 5 false positives occur when at least one tube has not been detected at all by the system.
by the system.


126 CHAPTER 5. RESULTS AND EVALUATION<br />

Larger tubes can be covered by the 16mm focal length lens like 50mm tubes, but the<br />

camera has to be placed further away from the conveyor to yield a larger field of view. The<br />

resulting pixel representation, i.e. the length a pixel represents in the measuring plane,<br />

increases as mentioned before. Hence, the precision decreases.<br />

In each experiment a charge of 50 tubes (transparent and black) of 30mm and 70mm<br />

length and 8mm diameter is used as test data. Each charge has been measured by hand and<br />

is evaluated with respect to mean and standard deviation. Each tube passes the measuring<br />

area once in this experiment and is measured as often as possible (single measurements)<br />

while it is in the visual field of the camera. The mean over the computed total lengths<br />

as well as the standard deviation are determined and compared to the ground truth data.<br />

The results are summarized in Table 5.8 and visualized in Figure 5.21 in terms of Gaussian<br />

distributions.<br />

The number of per tube measurements ΩPTM of 30mm tubes is slightly smaller compared<br />

to experiments with 50mm tubes at the same velocity. This is due to the smaller<br />

field of view of the camera. Obviously the tubes leave the measuring area faster. However,<br />

there are still more than 3 single measurements of each tube both for black and transparent<br />

tubes on average. The larger 70mm tubes have been measured even more often<br />

than 50mm tubes with 6.12 single measurements for black and 4.85 for transparent tubes<br />

respectively. This can be explained by a larger field of view.<br />

The mean value over a sequence of tubes µseq equals the expectation µGT in almost<br />

all experiments. Only the 30mm transparent tubes differ from the ground truth of about<br />

0.01mm which is acceptable small. This indicates the calibration factor between pixels<br />

and mm has been trained perfectly in all experiments.<br />

The standard deviation is much smaller for 30mm tubes both in the manual and automated<br />

measurements compared to 70mm tubes. In general black tubes are measured with<br />

higher precision than transparent tubes by the system according to the observations in<br />

previous experiments. The higher precision for 30mm tubes is important with respect to<br />

the specified tolerances (see Table 1.2). In all experiments beside the 70mm black tubes<br />

the manual precision is only slightly better than the precision of the visual inspection<br />

system. However, the results of the system have been always precise enough to allow for<br />

reliable measurements in terms of the allowed tolerances. At 70mm black tubes the system<br />

performed even better than humans with a standard deviation of 0.14 compared to 0.16<br />

measured by hand.<br />

It is important to state that the precision in these experiments depends both on the<br />

measuring variance of the system and the real variance of the tubes. Accordingly one<br />

can not compare the results directly with those in Section 5.3.5 where only one tube was<br />

measured several times in one experiment.<br />

One can conclude the visual inspection system is able to measure also tubes of different<br />

lengths as accurate as humans on average.<br />

5.3.8. Performance

Finally, the performance of the system is evaluated on an Athlon64 FX-55 (2.6GHz, 2GB RAM) platform.

The total processing time can be divided into five main groups, comprising profile analysis, compensation for radial distortion, edge detection and template matching, as well as length
compensation for radial distortion, edge detection and template matching, as well as length


5.3. EXPERIMENTAL RESULTS 127<br />

Color Ltarget ΩPTM µseq µGT σseq σGT<br />

(a) Black 30 3.43 30.06 30.06 0.09 0.08<br />

(b) Transparent 30 3.18 30.07 30.06 0.12 0.08<br />

(c) Black 70 6.12 69.76 69.76 0.14 0.16<br />

(d) Transparent 70 4.85 70.21 70.21 0.27 0.20<br />

Table 5.8: Results of 30mm and 70mm tubes at 30m/min. Ltarget represents the target<br />

length and ΩPTM the average number of per tube measurements. The mean and standard<br />

deviation of the length measuring distributions are denoted as µseq and σseq for the automated,<br />

and µGT and σGT for the human measurements respectively. The results are also visualized<br />

in Figure 5.21.<br />

[Figure 5.21, four panels: (a) 30mm black, (b) 30mm transparent, (c) 70mm black, (d) 70mm transparent. Each panel plots the measurement distribution and the ground truth distribution over the tube length in mm.]

Figure 5.21: Length distribution of 30mm and 70mm tubes at 30m/min for automated (solid line) and manual measurements (dashed line). All experiments show a very good accuracy, i.e. the vision system measures the same length on average. Black tubes are generally measured slightly more precisely than transparent tubes. For 70mm black tubes the vision system is even more precise than the human measurements.



Many thousands of frames have been timed, with and without tubes in the visual field of the camera. The results of the performance evaluation can be found in Figure 5.22. It turns out that the processing of a measurable frame requires 17.8ms on average. Thus, all images at a capture rate of 50fps (i.e. a new image is acquired every 20ms) can be processed.

The dominant part of the processing time is consumed by edge detection and template matching, where the latter is by far the most expensive. On average, 82% of the total processing time is needed for this step, although the number of pixels considered is highly restricted by the local ROIs. The undistortion operation is the second most expensive operation with 10%, followed by the length computation and tracking with 4%. The profile analysis, intended as a fast heuristic to locate a tube roughly, proves to be very fast indeed with only 0.29ms/frame. The remaining 3% represent operations such as image conversions, copying, or drawing functions to visualize the detection results. The latter can be omitted in production if visualization is not required.

If the profile analysis detects a non-measurable frame, the template matching is not performed. Thus, the remaining time could be used for side operations in the future, e.g. to save logging information or to run certain self-monitoring mechanisms. Such mechanisms could check, for example, whether the illumination is still bright enough or if the camera position has changed.



Task                                Ωtime [ms/frame]
Profile Analysis                          0.29
Undistortion                              1.79
Edge Detection/Template Matching         14.57
Length Computation/Tracking               0.69
Other                                     0.48
Total                                    17.82

[Pie chart: Edge detection/Template matching 82%, Undistortion 10%, Length computation/Tracking 4%, Other 3%, Profile Analysis 2%.]

Figure 5.22: (a) Average processing time per frame divided into the different steps of the visual inspection. (b) Corresponding pie chart. As one can see, the edge detection and template matching is the dominant operation throughout inspection.



5.4. Discussion and Future Work

The main difficulties with transparent tubes stem from the nonuniform brightness and the texture of the background. A conveyor belt which is equally translucent over its whole length could prevent many problems. The parameters controlling the detection sensitivity must cover both the brightest and the darkest region of the conveyor belt. This is always a compromise, leading to poorer results on average. However, if the contrast between tubes and the background does not depend on where the tube is located on the conveyor belt, the parameters can be adjusted much more specifically.

The background texture of the conveyor belt used for the prototype has the drawback of regular vertical structures. If the tube edge contrast is poor, the edge response of the background may be stronger than that of the tube edge. Model knowledge can be used to improve the tube edge localization even in the presence of strong vertical background edges. However, there remains a certain error probability, which can be drastically reduced if vertical background edges are suppressed. The best solution would be to use a conveyor belt with a canvas of horizontal structure. This would simplify the detection task without requiring any computation time.

If no conveyor belt can be found that provides the desired horizontal structure in combination with good translucency characteristics, one can think of suppressing the background pattern within the local ROI around a tube edge algorithmically, by exploiting the regularity of the background pattern. One idea is to transform the spatial image into the frequency domain using the Fourier transform. For more information on the Fourier transform and the frequency domain, the reader is referred to [64]. If it is possible to find characteristic frequencies belonging to the background pattern, one can remove these frequencies in the frequency domain and apply the inverse Fourier transform to the filtered spectrum. The result is a filtered spatial image with reduced background structure. The filter must be designed carefully to preserve the tube edges.
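
This idea can be prototyped in a few lines. The following is a minimal sketch, assuming Python with NumPy (it is not the thesis implementation); the boolean mask marking the spectral peaks of the background pattern is assumed to be provided, e.g. located by hand as in Figure 5.23(c):

    import numpy as np

    def suppress_background(image, peak_mask):
        # image:     2D float array (local ROI around a tube edge)
        # peak_mask: boolean array of the same shape, True at the
        #            spectral peaks of the regular background pattern
        spectrum = np.fft.fftshift(np.fft.fft2(image))
        spectrum[peak_mask] = 0.0   # corresponds to the black regions in (c)
        filtered = np.fft.ifft2(np.fft.ifftshift(spectrum))
        return np.real(filtered)    # imaginary part is numerical noise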

In a first experiment, test images of the conveyor both with and without a tube have been acquired and transformed into the frequency domain. Figure 5.23(a) and (b) show an example of the spectrum of an image with background only and with transparent tubes in the image respectively. The spatial domain of (b) can be seen in (d). One eye-catching feature in both spectra is a set of bright spots. If one removes these spots in the spectrum of an image, indicated by the black regions in (c), and applies the inverse Fourier transform to this filtered spectrum, the result is an image with a significantly reduced background pattern. The actual tube edges, however, are quite well preserved. In this case the spectrum has been filtered by hand and only coarsely. Much more work has to be invested in designing more sophisticated and reliable filters that perform well for a large number of images without removing or blurring any relevant edges. Removing a frequency from the spectrum always influences the whole image. The filter in the example produces new structure at the tube regions, especially around the printing. In addition, the darker stripe in the background on the right of the input image is still present in the filtered version, since it does not belong to the regular pattern of the background. Although in this example the dark stripe is not critical, it might be in other situations. This shows the limits of this approach. Any deviations from the regular background pattern are difficult to suppress in the frequency domain. If the conveyor belt is changed, the texture of the belt might be completely different. In this case the filter has to be adapted. An automated filter adaptation and background learning is non-trivial.



[Figure 5.23, five panels: (a) Background only, (b) Background + Tubes, (c) Masked spectrum, (d) Source Image, (e) Filtered Image.]

Figure 5.23: Background suppression in the frequency domain. (a) Fourier transform of an image of an empty conveyor. (b) Fourier transform of (d). (c) Certain frequencies have been removed by hand, indicated by the black regions. (e) Inverse Fourier transform of the filtered spectrum. The characteristic vertical background pattern could be reduced quite well while the tube edges are preserved.

The experiments have shown that tubes of 8mm diameter are most robust against deformations. While thinner tubes of 6mm diameter tend to be bent, 12mm tubes may have an elliptical cross-section. In both cases the accuracy and precision decrease. The question is whether such deformations are only caused by the way the tubes have been stored, transported and handled throughout the experiments in the laboratory, or if they also occur at production. The latter can be assumed, at least to a certain extent. A telecentric lens could overcome the perspective problem occurring with deformed 12mm tubes.

A less expensive improvement would be to measure not only the length, but also the height of a tube in the image. A larger height indicates the tube is closer to the camera and vice versa. The calibration factor relating pixels to mm could then be defined as a function of the tube height. Obviously, this requires a more complex teach-in step.
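
For illustration only, such a height-dependent calibration could be realized by interpolating between factors taught at known reference heights; the following Python/NumPy sketch uses hypothetical teach-in values, not data from the actual system:

    import numpy as np

    # Hypothetical teach-in data: pixel-to-mm factors measured for
    # reference tubes whose image heights (in pixels) are known.
    ref_heights_px = np.array([52.0, 60.0, 68.0])
    ref_factors_mm = np.array([0.108, 0.104, 0.100])   # mm per pixel

    def calibration_factor(tube_height_px):
        # A taller tube image means the tube lies closer to the camera,
        # so one pixel corresponds to fewer millimeters.
        return np.interp(tube_height_px, ref_heights_px, ref_factors_mm)

    length_mm = 288.5 * calibration_factor(61.0)   # length in pixels * factor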

Another potential source of deviations in the measurements is the tube orientation. The guide bars already reduce the maximum tube rotation to a minimum. The remaining error has been approximated. Although it is very small, it could be reduced even further by tilting the whole conveyor slightly around its longitudinal axis. The tilt guarantees that all tubes roll to the lower guide bar. If the guide bar is horizontal in the image, so are the tubes. Accordingly, the camera position has to be adapted to reestablish the fronto-orthogonal view. The proposed camera positioning method is independent of the orientation of the conveyor and the camera in 3D space.



The blow out mechanism was tested successfully in the prototype setup. The advantage of this mechanism is that it works almost independently of the conveyor velocity and of the position of the light barrier relative to the measuring area. One only has to ensure that no tube passes the light barrier before the good/bad decision of the measuring system reaches the blow out controller.

One drawback of the current strategy is the sensitivity to ghosts. If the system detects a tube where actually no tube is, the resulting classification of the ghost is sent to the controller anyway and stored in the FIFO memory. Since a ghost is never detected by the light barrier, the good/bad decision of the ghost is still in the memory when the next tube passes the light barrier. Instead of considering the decision belonging to this tube (appended to the FIFO memory), the decision of the ghost is evaluated. This leads to a loss of synchronization, i.e. a tube T is related to the decision of tube T − 1. Over time this effect can accumulate, and the reliability of the system is obviously compromised.

A potential solution to this problem is to replace the FIFO memory with a single register that stores only the latest decision. Without loss of generality, a 0 in this register might correspond to blowing out the next tube, while a 1 indicates the next tube can pass. The register is set to 0 by default. Each time the inspection system measures a tube to be within the allowed tolerances, a signal is sent to the controller that sets the bit in the register to 1. As soon as the tube has passed the light barrier, the register is reset to 0. This has to be done before the next tube is measured. Therefore the light barrier has to be placed quite close to the measuring area. The advantage of this approach is that the memory always contains the current decision belonging to the tube that passes the light barrier next. A timer can be used to reset the register if no tube intersects the light barrier within the expected time. Thus, ghosts become uncritical.
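
A minimal Python sketch of this single-register strategy, including the timer-based reset, might look as follows (class and method names are illustrative and not taken from the actual controller):

    import threading

    class BlowOutRegister:
        # 0 = blow out the next tube (default), 1 = let it pass.
        def __init__(self, timeout_s):
            self._bit = 0
            self._timer = None
            self._timeout = timeout_s  # expected max. time to the light barrier

        def tube_measured_good(self):
            # Called by the inspection system for an in-tolerance tube.
            self._bit = 1
            # Reset automatically if no tube reaches the light barrier in
            # time, so a ghost decision cannot let a later tube pass.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._reset)
            self._timer.start()

        def light_barrier_triggered(self):
            # Called when a tube passes the barrier; returns the decision.
            decision = self._bit
            self._reset()
            return decision  # 0 -> activate the blow out nozzle

        def _reset(self):
            self._bit = 0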

Furthermore, since the register is reset each time, this also helps to prevent the problem of non-detected tubes, i.e. tubes that have passed the visual field of the camera without being measured. In the outlier experiment (see Section 5.3.6), the false positive rate increased drastically if tubes could not be detected. In this case the system does not send a good/bad decision for the missed tube to the controller. The light barrier, however, detects every tube independently of whether it has been measured or not. With the single register strategy these tubes are blown out by default. Thus, only tubes that have been measured by the system and meet the allowed tolerances are able to pass the blow out nozzle.

If tubes are not detected at all, or measurements do not result in a meaningful length value (e.g. the standard deviation of the single measurements is too large), the corresponding tubes define another group U including all unsure measurements that cannot definitely be assigned to G′₀, G′₋, or G′₊. All tubes of this class should be blown out by default to ensure no outlier can pass the quality control. These tubes do not have to be considered as rejections, but could be measured by hand afterward or recirculated to be inspected again by the vision-based measuring system, depending on the frequency of occurrence.

The experiments have shown that more than 80% of the total processing time is needed for the template-based edge localization. In the current implementation the left and right ROI are processed sequentially. One possible optimization could be to parallelize this problem. This means the computation within the left and right ROI could be performed in separate threads to exploit the power of current dual-core architectures. This is possible, since the processing in the two ROIs is independent of each other.
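
A minimal Python sketch of this idea is given below; locate_edge is a hypothetical stand-in for the sequential per-ROI routine. (In CPython, true parallelism requires that the underlying routine releases the GIL, as native image processing code typically does; the principle carries over directly to a threaded C++ implementation.)

    from concurrent.futures import ThreadPoolExecutor

    def locate_edge(image, roi):
        # Hypothetical stand-in for the template-based edge localization
        # within one ROI; returns a subpixel edge position.
        raise NotImplementedError

    def locate_both_edges(image, left_roi, right_roi):
        # The two calls touch disjoint image regions, so no
        # synchronization between the threads is required.
        with ThreadPoolExecutor(max_workers=2) as pool:
            left = pool.submit(locate_edge, image, left_roi)
            right = pool.submit(locate_edge, image, right_roi)
            return left.result(), right.result()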


6. Conclusion

In this thesis a functioning prototype for a vision-based heat shrink tube measuring system has been presented, allowing for a 100% online inspection in real-time. Extensive experiments have shown the accuracy and precision of the developed system, which reaches the quality of accurate human measurements under ideal laboratory conditions. The advantage of the developed system is that this accuracy can be achieved even at conveyor velocities of up to 40m/min.

A multi-measurement approach has been investigated in which each decision whether a tube has to be sorted out is based on 2-11 single measurements, depending on the tube type and conveyor velocity. This requires video frame rates of ≥ 50fps to be processed in real-time. Fast algorithms, heuristics and model knowledge are used to improve the performance in this constrained application. Tube-edge-specific templates have been defined that are able to locate a tube edge with subpixel accuracy even in low contrast images in the presence of background clutter. In the prototype setup, the tube edge detection has been complicated by the strong vertical structure of the conveyor belt and an inhomogeneous translucency leading to nonuniformly bright background regions. The consequences for transparent tubes have been discussed, including the possibility of tubes passing the visual field of the camera without being detected.

Since black tubes are not translucent, they yield an optimal contrast to the background with a back-lighting setup. Transparent tubes, on the other hand, are much more sensitive to the structure of the background and the local tube edge contrast. All parameters adjusted for transparent tubes turned out to have no disadvantage for black ones. Thus, the parameters for transparent tubes are used in general, leading to a more uniform solution in the system design.

Besides the algorithmic part of the work, the engineering of the whole system, including the proper selection of a camera, optical system, and illumination, has been solved. The integration of the microcontroller and the air blow nozzle completes the prototype, allowing for concrete demonstrations of how tubes that do not meet the tolerances are blown out.

A simple and intuitive initialization of the system has been developed. Most parameters can be trained interactively and automatically without complicated user interaction. Even an unskilled worker should be able to perform the teach-in step after a few instructions. The only critical part of the teach-in is the camera positioning. To exclude as many sources of error as possible, the camera should be mounted as stably as possible at a fixed orientation (which has to be calibrated only once). The required height adjustments to cover the range of tube lengths should be automated if possible.

The maximum measuring precision of 0.03mm was reached for a metallic tube model simulating an ideal tube (at a conveyor velocity of 30m/min). During the experiments it has been observed that deformations of real heat shrink tubes (elliptical cross-section or bending) have a certain influence on the measuring precision. However, the average precision is still < 0.1mm for real tubes. In general, tubes of 8mm diameter have been measured more precisely than 6mm or 12mm tubes.

The average accuracy (root mean square error) of the automated measurements, i.e. the distance to some ground truth reference, is about 0.1mm for black tubes and about 0.2mm for transparent tubes at velocities of 30m/min. The ground truth has been acquired manually under ideal laboratory conditions and also has a certain inter- and intra-human deviation of about 0.1mm. While the velocity has only a minor influence on the accuracy of black tubes, the accuracy of transparent tubes decreases significantly at higher velocities. The main reason for this observation is the decreasing number of per-tube measurements, since averaging over the single measurements becomes more sensitive to outliers. In addition, the probability increases that a transparent tube is not detected at all if the background contrast is poor. However, in general, the accuracy and precision have been good enough in all experiments to reliably measure both black and transparent tubes of different lengths and diameters with respect to the specified tolerances.

and diameter with respect to the specified tolerances. Experiments with transparent tubes<br />

of manipulated lengths have shown the system is able to separate the good ones from the<br />

tubes that do not meet the tolerances successfully. The false negative rate, i.e. the number<br />

of tubes that have been sorted out wrongly, is 2.6 .Lessthan4.3 of failures could<br />

pass the measuring area. However, 80% of the false positives have not been detected at all<br />

by the system. With the adaptation of the blow out strategy as suggested in Section 5.4<br />

these tubes would have been blown out, too. Hence, the theoretically remaining false<br />

positive rate is 0.87 for transparent tubes. Following the experimental results one can<br />

assume that the false positive rate for black tubes will be less or equal.<br />

The measuring results have a positive side effect, since it is possible to compute the moving average over the last N measurements. An operator can compare the current mean length to the given target length. This can be especially useful during the teach-in of the machine. At production, deviations can be corrected before the tolerances are exceeded. In a more sophisticated solution the adjustment could be automated. If one can ensure that the current mean length measured by the vision system equals the target length, the blow out mechanism may never need to be activated and the probability of false positives can be decreased further.
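
Such a moving average is straightforward to implement; a minimal Python sketch (not part of the actual system) could be:

    from collections import deque

    class MovingAverage:
        # Running mean over the last n accepted length measurements.
        def __init__(self, n):
            self._window = deque(maxlen=n)

        def add(self, length_mm):
            self._window.append(length_mm)
            return sum(self._window) / len(self._window)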

In addition, the system is able to store the inspection results in a file or database. Such statistics can also be useful for management or controlling, since they include not only the length distribution of the production, but also information about the total number of tubes produced, the time of production, as well as the number of defectives.

The good results of the prototype support the use of an optical inspection system for length measurements of heat shrink tubes. Manual sample inspections, as currently used in production, are influenced by many factors such as concentration, speed, motivation, or tiredness of the individual operator. In general, less precision can be assumed for measurements in production compared to the ideal laboratory measurements used for evaluating the system. The advantage of the automated vision-based system is the ability to inspect each tube at laboratory precision without getting tired.


Appendix



A. Profile Analysis Implementation Details

Details regarding the implementation of the profile analysis, with a focus on performance aspects, are introduced in the following.

A.1. Global ROI

A simple but very effective way to decrease the computational load is to restrict the image processing to a certain region of interest (ROI). Following the assumption that parts of the guide bars are visible at the top and the bottom of the images without containing any information, the guide bars can be excluded from further processing; thus, the ROI lies in between these guide bars. The height of the ROI is given by the guide bar distance, which should be almost constant over the whole image since the bars are adjusted to be parallel to the x-axis in the image. The ROI extends in horizontal direction over the whole image width minus a certain offset at both sides. This offset is due to the fact that the image distortion is maximal at the boundaries. The actual value of the offset depends on the ability to overcome the distortion at measuring. If the measurements are accurate even at the image boundaries, the offset tends toward zero. In the following, the ROI between the guide bars is also referred to as global ROI.

Section 3.2 states it is possible to adapt the camera resolution to a user-defined size. The reason why the image size is not adjusted to cover the global ROI exactly (which would make it redundant) is a very practical one. First of all, the guide bars provide a valuable clue for adjusting the field of view of the camera. In addition, smaller images mean less data has to be transferred and consequently a larger number of images can be transferred in the same time. If the image size is too small, however, the actual frame rate exceeds the number of frames that can be processed, so that frames would have to be skipped, which should be avoided.

The extraction of the global ROI can be automated using a similar profile analysis approach as used for tube localization, but in vertical direction. Again, several vertical scan lines are used to build the profile. If there is no tube in the image (empty scene), the guide bars can be detected clearly, since the contrast between the bright conveyor belt and the black guide bars is very strong. A smoothing step, as used in horizontal direction to overcome the background clutter, is not necessary. This has the benefit that the two strongest peaks in the profile describe the guide bar location quite accurately. The detection of the global ROI has to be performed only once during initialization, assuming a static setup of camera and conveyor that does not change over time. In the future, it is conceivable that every time the state 'empty' is detected, the ROI is reinitialized and compared with the previous location. A difference indicates that something changed with the setup and may induce an alert or some specific reaction.
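
A minimal Python/NumPy sketch of this initialization step could look as follows (names and the peak selection are illustrative; a robust implementation would additionally verify that the two strongest peaks belong to two distinct guide bars):

    import numpy as np

    def detect_global_roi(empty_image, n_scan_columns=11):
        # Build a vertical gray-level profile from a few equally spaced
        # columns of an image of the empty conveyor.
        h, w = empty_image.shape
        cols = np.linspace(0, w - 1, n_scan_columns).astype(int)
        profile = empty_image[:, cols].mean(axis=1)
        # The two strongest gradient magnitudes mark the transitions
        # between the bright belt and the dark guide bars.
        gradient = np.abs(np.diff(profile))
        top, bottom = sorted(np.argsort(gradient)[-2:])
        return top + 1, bottom   # rows enclosed by the guide bars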




A.2. Profile Subsampling

In many computer vision tasks it is common to perform a specific operation on lower resolution images than the input to increase computation speed. For example, one could simply discard every second row or column to obtain an image of half the size of the original image. However, to avoid a violation of the sampling theorem, it is important to apply a low-pass filter operation to the data before. This mechanism can be used to generate pyramids of images at different resolutions or scales. Each layer in the pyramid has half the size of the layer above, with the top layer corresponding to the original size. Before subsampling the data, a Gaussian smoothing operation is performed to suppress higher frequencies. Thus, such pyramids are called Gaussian Pyramids in the literature [24].

The same can be applied to one-dimensional signals such as gray level profiles. In this application, experiments have shown that the information about the tube boundaries is conserved at a coarser scale. Thus, a subsampled version two levels down the pyramid is used in practice instead of the original profile. The data to be processed after this step is only a quarter of the input. Obviously, the profile analysis can be accelerated by this step. Experiments investigating whether the profile subsampling could replace step one of the profile analysis, i.e. the smoothing with a large mean kernel, came to the conclusion that in connection with transparent tubes and dark printing, the strong contrast of the letters could be misclassified as tube boundary. The system then tries to detect the real tube location in a certain region around the wrong position and is likely to fail. The mean filter, in contrast, is able to reduce the influence of the lettering and therefore must not be replaced.
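
A minimal Python/NumPy sketch of this two-level subsampling, assuming a simple sampled Gaussian kernel (the kernel width is illustrative):

    import numpy as np

    def pyramid_down(profile, sigma=1.0):
        # One pyramid step: Gaussian low-pass, then drop every second sample.
        radius = int(3 * sigma)
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-x**2 / (2 * sigma**2))
        kernel /= kernel.sum()
        smoothed = np.convolve(profile, kernel, mode='same')
        return smoothed[::2]

    def subsample_two_levels(profile):
        # Two levels down the pyramid: a quarter of the input samples.
        return pyramid_down(pyramid_down(profile))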

A.3. Scan Lines

As mentioned in Section 4.4.2, the profile to be evaluated is based on the normalized sum of N_scan scan lines equally distributed over the global ROI. The reason why a single scan line is not sufficient is shown in Figure A.1(b). Three sample profiles at different heights (61, 80 and 100) are selected to visualize the influence of the printing. One can see the strong contrast at the letters as well as a poor contrast at the right tube boundary. Since it is not deterministic whether the printing of a particular tube is visible in an image, one has to consider the worst case. This is a scan line passing through the printing at as many positions as possible. The global mean of the resulting profile is much lower in this case, and it is possible that the intensity of the tube at regions outside the printing is wrongly classified as background. The result of this effect is shown in Figure A.1(d). On the other hand, using several scan lines decreases the influence of the printing significantly. The probability that more than a few scan lines will pass through the printing is low. For example, among the sample tubes used for testing the prototype, the coverage of the printing is about 16% with respect to the diameter. Thus, it is very likely to have more than one scan line passing through tube regions without printing. In total, the influence of the printing decreases with the number of scan lines. However, Figure A.1(c) shows that 11 scan lines equally distributed over the global ROI in y-direction are sufficient to yield almost the same results as considering all rows of the ROI. Here, the profile consisting of 11 scan lines is shifted, i.e. the intensity values are lower compared to the profile calculated from all ROI rows (90 in this example). This is due to the location of



[Figure A.1, panels (a)-(e): gray-value profiles (0-400) over the image x-coordinate; (b) single scan lines at y = 61, 80 and 100; (c) normalized sum of 11 scan lines vs. normalized sum of all rows.]

Figure A.1: Comparison of the single and multi scan line approach. (a) Input gray scale image. (b) Profiles of three selected scan lines at heights 61, 80 and 100 respectively. The first two scan lines pass through the printing, leading to strong variations in the profile. Compared to these variations, the poor contrast of the right tube border makes a correct detection difficult. (c) The normalized sum of several scan lines reduces the effect of the printing, bringing out the location of the tube much more clearly. It can be seen that 11 scan lines equally distributed over the global ROI are sufficient to yield almost equivalent results as considering every row. (Note: The profile of the 11 scan lines is shifted since the global ROI included parts of the guide bars at the upper and bottom row. Since these pixels have a value near zero, they do not contribute much to the profile sum but are considered in the normalization. The scale, however, does not affect the actual tube location.) (d) Wrong detection of the tube boundaries when using a single scan line. (e) Result of the multi scan line approach.



the global ROI. As can be seen in Figure A.1(e), the global ROI is a bit too large; thus, the upper and bottom rows hit the border of the guide bars. Scan lines through these rows do not contribute much to the overall profile, but have an effect in the normalization. This shift, however, does not affect the actual tube location. With respect to performance, rows that have no influence should be ignored.

Obviously, the problem with the printing on a tube's surface arises only with transparent tubes, since the printing is not visible on the black tubes under back light. If black tubes are inspected, a single scan line in the image center is sufficient to localize the tube correctly, but more scan lines do not impair the results. To have a more universal solution, the multi scan line approach is used for all tube types and no distinction is made at this part of the system, to keep it simple.
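
For illustration, a minimal Python/NumPy sketch of the multi scan line profile (the names are hypothetical, not taken from the actual implementation):

    import numpy as np

    def tube_profile(image, roi_top, roi_bottom, n_scan=11):
        # Normalized sum of n_scan scan lines equally distributed over
        # the global ROI; averaging reduces the influence of the printing
        # on transparent tubes (cf. Figure A.1).
        rows = np.linspace(roi_top, roi_bottom, n_scan).astype(int)
        return image[rows, :].astype(float).sum(axis=0) / n_scan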

A.4. Notes on Convolution

At several steps of the profile analysis a convolution operation is performed. With respect to the derivation of the profile by convolving with a first derivative Gaussian kernel in step two, it is important to note which boundary condition is used, since in discrete convolution there are positions at the image boundaries that are undefined. There are many different strategies to address this problem, including padding the image with constant values (e.g. zero), reflecting the image boundaries periodically, or simply ignoring the boundaries [24]. Here, a symmetric reflection strategy is used:

    P(−i) = P(i − 1)                (A.1)
    P(N_P + i) = P(N_P + 1 − i)     (A.2)

where the first equation is used for the left and the second equation for the right boundary respectively. N_P indicates the length of P, and P(x) the intensity value of the profile at position x. The advantage of this strategy compared to padding with zeros, for example, is that no artificial edges are introduced.
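
A minimal Python/NumPy sketch of a 1D convolution with this boundary condition; NumPy's 'symmetric' padding mode realizes exactly the reflection of equations (A.1) and (A.2), and an odd kernel length is assumed:

    import numpy as np

    def convolve_reflect(profile, kernel):
        # Pad by symmetric reflection so that no artificial edges are
        # introduced at the profile boundaries, then convolve and crop
        # back to the original length.
        radius = len(kernel) // 2
        padded = np.pad(profile, radius, mode='symmetric')
        result = np.convolve(padded, kernel, mode='same')
        return result[radius:radius + len(profile)]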


B. Hardware Components

B.1. Camera

Specification: MF-033C | MF-046B

Image Device: 1/2" (diag. 8 mm) type progressive scan SONY IT CCD | same
Effective Picture Elements: 656 (H) × 492 (V) | 780 (H) × 580 (V)
Lens Mount: C-mount: 17.526 mm (in air); ∅ 25.4 mm (32 T.P.I.); Mechanical Flange Back to filter distance: 8.2 mm | same
Picture Sizes: 640 × 480 pixels (Format 0); 656 × 492 pixels (Format 7; Mode 0) | 640 × 480 pixels (Format 0; Mode 5); 780 × 580 pixels (Format 7; Mode 0); 388 × 580 pixels (Format 7; Mode 1); 780 × 288 pixels (Format 7; Mode 2); 388 × 288 pixels (Format 7; Mode 3)
Cell Size: 9.9 µm × 9.9 µm | 8.3 µm × 8.3 µm
ADC: 10 Bit | 10 Bit
Color Modes: Raw 8, YUV 4:2:2, YUV 4:1:1 | -
Data Path: 8 Bit | 8 Bit
Frame Rates: 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 74 Hz in Format 7 (RAW); 68 Hz (YUV 4:1:1); up to 51 Hz in YUV 4:2:2 | 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 53 Hz in Format 7
Gain Control: Manual: 0-16 dB (0.035 dB/step); Auto gain (select. AOI) | Manual: 0-24 dB (0.035 dB/step); Auto gain (select. AOI)
White Balance: Manual (U/V); One Push; Auto (select. AOI) | -
Shutter Speed: 20 ... 67,108,864 µs (∼ 67 s); Auto shutter (select. AOI) | same
External Trigger Shutter: Trigger Mode 0, Trigger Mode 1; Advanced feature: Trigger Mode 15 (bulk); image transfer by command; Trigger delay | same
Internal FIFO Memory: up to 17 frames | up to 13 frames
Look-Up Tables: One, user programmable (10 Bit → 8 Bit); Gamma (0.45) | same
Smart Functions: Real-time shading correction, image sequencing, two configurable inputs, two configurable outputs, image mirror (L-R ↔ R-L), serial port (IIDC v. 1.31) | same, plus binning
Transfer Rate: 100 Mb/s, 200 Mb/s, 400 Mb/s | same
Digital Interface: IEEE 1394 IIDC v. 1.3 | same
Power Requirements: DC 8 V - 36 V via IEEE 1394 cable or 12-pin HIROSE | same
Power Consumption: less than 3 Watts (@ 12 V DC) | same
Dimensions: 58 mm × 44 mm × 29 mm (L × W × H); without tripod and lens | same
Mass: < 120 g (without lens) | same
Operating Temperature: +5 to +45 °C | same
Storage Temperature: −10 to +60 °C | same
Regulations: EN 55022, EN 61000, EN 55024, FCC Class A, DIN ISO 9022 | same
Options: Host adapter card, locking IEEE 1394 cable, API (FirePackage), TWAIN (WIA)- and WDM stream driver | Removable IR-cut filter, host adapter card, locking IEEE 1394 cable, API (FirePackage), TWAIN (WIA)- and WDM stream driver

Table B.1: Camera specifications for the AVT Marlin F-033C and F-046B.




B.2. Illumination Hardware

Description                                   Value
Rated Power Output                            200 Watts
Output Voltage                                0.0, 0.5 to 20.5 VDC
Input Voltage Rating, 50/60 Hz                90 to 265 VAC
Power Factor Correction @ 230 VAC, 50 Hz      > 0.99, < 4°
Hold-up Time, Nominal AC Input, Full Load     8.3 ms
Line Regulation, Over Entire Input Range      ±0.5%
Current Limit Set Point                       8.5 Amps
Temperature Range: Operating                  0° to 45° C
                   Storage                    −25° to 85° C
Relative Humidity, Non-condensing             5% to 95%

Table B.2: Light Source (A20800.2) with DDL Lamp.

Description          Value
Calibrated Area      3" × 5" (76 × 127mm)
Panel Size           4" × 6" (102 × 152mm)
Overall Thickness    .05" (1.3mm)

Table B.3: SCHOTT PANELite Backlight (A23000) (flexible fiber optical area light).
Table B.3: SCHOTT PANELite Backlight (A23000) (flexible fiber optical area light).



Description               Value
Bulb Type                 DDL
Voltage                   20
Wattage                   150
Lamp Base                 GX5.3
Bulb Finish               Clear
Burn Position             Base/Down Horz.
Shape                     MR-16
Color Temp.               3150
Filament                  CC-6
Lamp Fill                 Halogen
Lamp Life                 500 Hrs.
Overall Length [mm]       44.5
Reflector Design          Dichroic
Reflector Size [mm]       50.7
Working Distance [mm]     194.5

Table B.4: Lamp specifications.




Bibliography

[1] Y. I. Abdel-Aziz and H. M. Karara. Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Proc. of the Symposium on Close-Range Photogrammetry, pages 1–18, 1971.

[2] M. B. Ahmad and T. S. Choi. Local threshold and boolean function based edge detection. IEEE Trans. on Consumer Electronics, 45(3):674–679, August 1999.

[3] Allied Vision Technologies GmbH, Taschenweg 2a, D-07646 Stadtroda, Germany. AVT Marlin – Technical Manual, July 2004.

[4] A. Alper. An inside look at machine vision. Managing Automation, 2005.

[5] American Society for Photogrammetry and Remote Sensing (ASPRS). Manual of Photogrammetry. ASPRS Pubns, 4th edition, 1980.

[6] K. Astrom and A. Heyden. Stochastic modelling and analysis of sub-pixel edge detection. In International Conference on Pattern Recognition (ICPR), pages 86–90, 1996.

[7] B. Batchelor and F. Waltz. Intelligent Machine Vision. Springer, 2001.

[8] A. Blake. Active Contours. Springer, 1999.

[9] J. Y. Bouguet. Camera calibration toolbox for matlab.

[10] I. N. Bronstein, G. Musiol, H. Mühlig, and K. A. Semendjajew. Taschenbuch der Mathematik. Harri Deutsch, 2001.

[11] D. C. Brown. Decentering distortion of lenses. Photometric Engineering, 32(3):444–462, 1966.

[12] D. C. Brown. Lens distortion for close-range photogrammetry. Photometric Engineering, 37(8):855–866, 1971.

[13] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 8:679–698, 1986.

[14] T. Chaira and A. K. Ray. Threshold selection using fuzzy set theory. Pattern Recognition Letters (PRL), 25(8):865–874, June 2004.

[15] R. W. Conners, D. E. Kline, P. A. Araman, and T. H. Drayer. Machine vision technology for the forest products industry. Computer, 30(7):43–48, 1997.

[16] E. R. Davies. Machine Vision – Theory, Algorithms, Practicalities. Elsevier, 2005.

[17] C. de Boor. A Practical Guide to Splines. Springer, 1978.


[18] C. Demant, B. Streicher-Abel, and P. Waszkewitz. Industrial Image Processing – Visual Quality Control in Manufacturing. Springer, 1999.

[19] R. Deriche. Using Canny's criteria to derive a recursively implemented optimal edge detector. International Journal of Computer Vision (IJCV), 1(2):167–187, 1987.

[20] S. di Zenzo, L. Cinque, and S. Levialdi. Image thresholding using fuzzy entropies. IEEE Transactions on Systems, Man, and Cybernetics (SMC-B), 28(1):15–23, February 1998.

[21] O. Faugeras. Three-Dimensional Computer Vision. A Geometric Viewpoint. MIT Press, Cambridge, 1993.

[22] J. Föglein. On edge gradient approximations. Pattern Recognition Letters (PRL), 1:429–434, 1983.

[23] P. J. Flynn and A. K. Jain. CAD-based computer vision: From CAD models to relational graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(2):114–132, 1991.

[24] D. A. Forsyth and J. Ponce. Computer Vision – A Modern Approach. Pearson Education International, 2003.

[25] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(9):891–906, 1991.

[26] C. A. Glasbey. An analysis of histogram-based thresholding algorithms. Graphical Models and Image Processing, 55(6):532–537, November 1993.

[27] E. B. Goldstein. Sensation and Perception. California: Brooks/Cole Publishing Co., 1996.

[28] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 2nd edition, 2002.

[29] E. R. Hancock and J. V. Kittler. Adaptive estimation of hysteresis thresholds. In Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 196–201, 1991.

[30] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2003.

[31] J. Heikkila and O. Silven. A four-step camera calibration procedure with implicit image correction. In Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 1106–1112, 1997.

[32] R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics. Prentice Hall, 5th edition, 1994.

[33] D. H. Hubel. Exploration of the primary visual cortex, 1955–1978. Nature, 299:515–524, 1982.


[34] R. J. Hunsicker, J. Patten, A. Ledford, C. Ferman, et al. Automatic vision inspection and measurement system for external screw threads. Journal of Manufacturing Systems, 1994.

[35] R. W. Hunt. Measuring Colour. Ellis Horwood Ltd. Publishers, 2nd edition, 1991.

[36] B. Jähne. Digital Image Processing. Springer, 6th edition, 2005.

[37] B. Julesz. A method of coding TV signals based on edge detection. Bell System Tech., 38(4):1001–1020, July 1959.

[38] R. King. Brunelleschi's Dome: How a Renaissance Genius Reinvented Architecture. Penguin Books, 2001.

[39] R. K. Lenz and R. Y. Tsai. Calibrating a cartesian robot with eye-on-hand configuration independent of eye-to-hand relationship. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(9):916–928, September 1989.

[40] J. Linkemann. Optics recommendation guide. http://www.baslerweb.com/.

[41] E. P. Lyvers, O. R. Mitchell, M. L. Akey, and A. P. Reeves. Subpixel measurements using a moment-based edge operator. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(12):1293–1309, December 1989.

[42] E. N. Malamas, E. G. M. Petrakis, M. E. Zervakis, L. Petit, and J. D. Legat. A survey on industrial vision systems, applications and tools. Image and Vision Computing (IVC), 21(2):171–188, February 2003.

[43] M. Malassiotis and G. Strintzis. Stereo vision system for precision dimensional inspection of 3D holes. Machine Vision and Applications, 15(2):101–113, December 2003.

[44] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America, 7(5):923–932, May 1990.

[45] D. Marr and E. C. Hildreth. Theory of edge detection. Proc. Royal Soc. London, B207:187–217, 1980.

[46] N. Otsu. A threshold selection method from grey-level histograms. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 9(1):62–66, January 1979.

[47] N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern Recognition, 26(9):1277–1294, September 1993.

[48] J. R. Parker. Algorithms for Image Processing and Computer Vision. John Wiley & Sons, Inc., 1997.

[49] P. Perona. Deformable kernels for early vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 17(5):488–499, May 1995.

[50] D. T. Pham and R. J. Alcock. Automated visual inspection of wood boards: Selection of features for defect classification by a neural network. In Proc. of the IMechE Part E: Journal of Process Mechanical Engineering, volume 213, pages 231–245. Professional Engineering Publishing, 1999.


[51] K. K. Pingle. Visual perception by a computer. In Proc. of Analogical and Inductive Inference (AII), pages 277–284, 1969.

[52] W. J. Plut and G. M. Bone. Grasping of 3-D sheet metal parts for robotic fixtureless assembly. In Proc. of the CSME Forum – Engineering Applications of Mechanics, pages 221–228, Hamilton, Ont., 1996.

[53] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 2nd edition, 1993.

[54] T. Pun. Entropic thresholding: A new approach. Computer Graphics and Image Processing (CGIP), 16(3):210–239, July 1981.

[55] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 8(8):629–632, August 1978.

[56] P. Rockett. The accuracy of sub-pixel localisation in the Canny edge detector. In Proc. of the British Machine Vision Conference (BMVC), 1999.

[57] A. Rosenfeld and P. de la Torre. Histogram concavity analysis as an aid in threshold selection. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 13(3):231–235, March 1983.

[58] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3D model acquisition. ACM Transactions on Graphics, 21(3):438–446, July 2002.

[59] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. C. Chen. A survey of thresholding techniques. Computer Vision, Graphics, and Image Processing (CVGIP), 41(2):233–260, February 1988.

[60] B. Sankur and M. Sezgin. A survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165, 2004.

[61] J. L. Sanz and D. Petkovic. Machine vision algorithms for automated inspection of thin-film disk heads. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 10(6), 1988.

[62] M. Seul, L. O'Gorman, and M. J. Sammon. Practical Algorithms For Image Analysis. Cambridge University Press, 2000.

[63] M. I. Sezan. A peak detection algorithm and its application to histogram-based image data reduction. Computer Vision, Graphics, and Image Processing (CVGIP), 49(1):36–51, January 1990.

[64] S. W. Smith. The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing, 1997.

[65] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall PTR, 1998.


[66] F. Truchetet, F. Nicolier, and O. Laligant. Subpixel edge detection for dimensional control by artificial vision. Journal of Electronic Imaging, 10(1):234–239, January 2001.

[67] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. Robotics and Automation, IEEE Journal, 3(4):323–344, 1987.

[68] H. Voorhees and T. Poggio. Detecting textons and texture boundaries in natural images. In Proc. of the International Conference on Computer Vision (ICCV), pages 250–258, 1987.

[69] J. Weickert. Anisotropic Diffusion in Image Processing. ECMI. Teubner, Stuttgart, 1998.

[70] J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 14(10):965–980, October 1992.

[71] G. A. W. West and T. A. Clarke. A survey and examination of subpixel measurement techniques. ISPRS Int. Conf. on Close Range Photogrammetry and Machine Vision, 1395:456–463, 1990.

[72] P. C. West. High speed, real-time machine vision. Technical report, Imagenation and Automated Vision Systems, 2001.

[73] M. Young. The pinhole camera: Imaging without lenses or mirrors. The Physics Teacher, pages 648–655, December 1989.

[74] Z. Y. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(11):1330–1334, November 2000.
