
Fachbereich Informatik
Department of Computer Science

Master Thesis

Visual Inspection of Fast Moving Heat Shrink Tubes in Real-Time

Alexander Barth

A thesis submitted to the Bonn-Rhein-Sieg University of Applied Sciences
in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science

Date of submission: December 16, 2005

Examination Committee: Prof. Dr.-Ing. Rainer Herpers (Supervisor)
                       Prof. Dr. Dietmar Reinert


Declaration

I hereby declare that the work presented in this thesis is solely my work and that, to
the best of my knowledge, this work is original, except where indicated by references
to other authors. This thesis has neither been submitted to another committee, nor has
it been published before.

St. Augustin, December 16, 2005

Alexander Barth


Acknowledgments

First of all, I would like to thank my thesis advisor Prof. Dr.-Ing. Rainer Herpers and
Prof. Dr. Dietmar Reinert for guiding this work and for their helpful input and
discussions.

Many thanks to the company DSG-Canusa for funding this work and for supporting me
in designing the hardware setup. A special thanks to Thomas Schminke, Markus Greßnich,
Manfred Hirn, Andreas Dederichs and Klaus Lanzerath.

During my thesis work, several people, fellow students and friends working in the
Computer Vision Lab at Bonn-Rhein-Sieg University of Applied Sciences provided me with
useful comments and ideas. In particular, I would like to thank Stefan Hahne, Axel Hau,
Bernd Göbel, Ingmar Burak, Christian Becker and Nils Neumaier, who also modeled the
nice 3D figures. I also thank Philipp Wegner and Patrick Schmitz for assisting me during
the experiments and for measuring several hundred heat shrink tubes by hand.

Furthermore, I appreciate the time I could spend at York University in Toronto and at
the Centre for Vision Research, Toronto. Thanks to all professors and students who
contributed to my interest in Computer Vision. A special thanks goes to Markus
Enzweiler; we had a great time during our stay in Canada and have kept in touch for
productive discussions ever since.

Many thanks to Gemma Adcock from New Zealand, who gave me great native-speaker support
in writing my thesis in English.

Finally, I would like to thank Steffi for being so understanding during stressful times.


Abstract

Heat shrink tubing is widely used in electrical and mechanical applications for
insulating and protecting cable splices. Especially in the automotive supply industry,
accuracy demands are very high, and quality assurance is an important factor in
establishing and maintaining customer relationships. In production, the heat shrink
tubes are cut into lengths (between 20 and 100mm) from a continuous tube. During this
process, however, deviations from the target length can occur.

In this thesis, a prototype of a vision-based length measuring sensor for a range of
heat shrink tubes is presented. The measuring is performed on a conveyor belt in
real-time at velocities of up to 40m/min. The tubes can differ in color, diameter and
length.

In a multi-measurement strategy, the total length of each tube is computed based on up
to 11 single measurements while the tube is in the visual field of the camera. Tubes
that do not meet the allowed tolerances, between ±0.5mm and ±1mm depending on the
target length, are sorted out by air pressure. Both the engineering and the software
development are part of this thesis work.

About 70% of all manufactured tubes are transparent, i.e. they show poor contrast
against the background. Thus, sophisticated but fast algorithms are developed which
reliably detect even low-contrast tube edges with subpixel accuracy in the presence of
background clutter (e.g. belt texture or dirt). For this purpose, special tube edge
templates are defined and combined with model knowledge about the inspected objects.
In addition, perspective and lens-specific distortions have to be compensated.

An easy-to-operate calibration and teach-in step has been developed, which is important
to be able to produce different tube types on the same production line at short
intervals.

The prototype system has been tested in extensive experiments at varying velocities and
for different tube diameters and lengths. The measuring precision for non-deformed
tubes can reach 0.03mm at a conveyor velocity of 30m/min. Even with elliptical
deformations of the cross-section or deflections, it is still possible to achieve an
average precision of < 0.1mm. The results have been compared to manually acquired
ground truth measurements, which also show a standard deviation of about 0.1mm under
ideal laboratory conditions. Finally, 100% control during production is possible with
this system, reaching the same accuracy and precision as humans without getting tired.


Contents

Acknowledgments
Abstract
List of Tables
List of Figures

1. Introduction
   1.1. Machine Vision - State of the Art
   1.2. Problem Statement
   1.3. Requirements
   1.4. Related Work
   1.5. Thesis Outline

2. Technical Background
   2.1. Visual Measurements
        2.1.1. Accuracy and Precision
        2.1.2. Inverse Projection Problem
        2.1.3. Camera Models
        2.1.4. Camera Calibration
   2.2. Illumination
        2.2.1. Light Sources
        2.2.2. Incident Lighting
        2.2.3. Backlighting
   2.3. Edge Detection
        2.3.1. Edge Models
        2.3.2. Derivative Based Edge Detection
        2.3.3. Common Edge Detectors
        2.3.4. Subpixel Edge Detection
   2.4. Template Matching

3. Hardware Configuration
   3.1. Conveyor
   3.2. Camera Setup
        3.2.1. Camera Selection
        3.2.2. Camera Positioning
        3.2.3. Lens Selection
   3.3. Illumination
   3.4. Blow Out Mechanism

4. Length Measurement Approach
   4.1. System Overview
   4.2. Model Knowledge and Assumptions
        4.2.1. Camera Orientation
        4.2.2. Image Content
        4.2.3. Tubes Under Perspective
        4.2.4. Edge Model
        4.2.5. Translucency
        4.2.6. Tube Orientation
        4.2.7. Background Pattern
   4.3. Camera Calibration
        4.3.1. Compensating Radial Distortion
        4.3.2. Fronto-Orthogonal View Generation
   4.4. Tube Localization
        4.4.1. Gray Level Profile
        4.4.2. Profile Analysis
        4.4.3. Peak Evaluation
   4.5. Measuring Point Detection
        4.5.1. Edge Enhancement
        4.5.2. Template Based Edge Localization
        4.5.3. Template Design
        4.5.4. Subpixel Accuracy
   4.6. Measuring
        4.6.1. Distance Measure
        4.6.2. Perspective Correction
        4.6.3. Tube Tracking
        4.6.4. Total Length Calculation
   4.7. Teach-In
        4.7.1. Required Input
        4.7.2. Detection Sensitivity
        4.7.3. Perspective Correction Parameters
        4.7.4. Calibration Factor

5. Results and Evaluation
   5.1. Experimental Design
        5.1.1. Parameters
        5.1.2. Evaluation Criteria
        5.1.3. Ground Truth Measurements
        5.1.4. Strategies
   5.2. Test Scenarios
   5.3. Experimental Results
        5.3.1. Noise
        5.3.2. Minimum Tube Spacing
        5.3.3. Conveyor Velocity
        5.3.4. Tube Diameter
        5.3.5. Repeatability
        5.3.6. Outlier
        5.3.7. Tube Length
        5.3.8. Performance
   5.4. Discussion and Future Work

6. Conclusion

Appendix

A. Profile Analysis Implementation Details
   A.1. Global ROI
   A.2. Profile Subsampling
   A.3. Scan Lines
   A.4. Notes on Convolution

B. Hardware Components
   B.1. Camera
   B.2. Illumination Hardware

Bibliography




List of Tables

1.1. Range of tube types considered in this thesis
1.2. Tolerance specifications
3.1. Lens selection - Overview
3.2. Lens selection - Field of View at minimum object distance
3.3. Lens selection - Working distances
3.4. Blow out control protocol
4.1. Threshold comparison of profile analysis
4.2. Comparison of different edge detectors
4.3. Template curvature test set parameters
5.1. Overview on different test parameters
5.2. Constant software parameter settings throughout the experiments
5.3. Test set used to determine the human variance in measuring
5.4. Results of 50mm tubes at different velocities (black)
5.5. Results of 50mm tubes at different velocities (transparent)
5.6. Results of 50mm tubes with different diameter at 30m/min
5.7. Results of blow out experiment
5.8. Results of 30mm and 70mm tubes at 30m/min
B.1. Camera specifications for the AVT Marlin F-033C and F-046B
B.2. Light Source (A20800.2) with DDL Lamp
B.3. Backlight specifications
B.4. Lamp specifications




List of Figures

2.1. Accuracy and Precision
2.2. Parallel lines under perspective
2.3. Pinhole geometry
2.4. Thin lens model
2.5. Incident lighting setups
2.6. Edge models
2.7. Comparison of different edge detectors
2.8. Orientation selective filters
2.9. Subpixel accuracy using interpolation techniques
3.1. Hardware setup of the prototype
3.2. BAYER mosaic
3.3. Comparison of color and gray level camera
3.4. Color information of transparent tubes
3.5. Telecentric lens
3.6. Field of View geometry
3.7. Tubes at different front lighting setups
3.8. Backlighting through a conveyor belt
3.9. Polarized backlighting
3.10. Backlight panel
3.11. Blow out setup
4.1. System overview
4.2. Potential image states
4.3. Tube models
4.4. Measuring plane definition
4.5. Characteristic intensity distribution of transparent tubes
4.6. Tube orientation error
4.7. Camera calibration - Calibration images
4.8. Camera calibration - Subpixel corner extraction
4.9. Camera calibration - Extrinsic parameters
4.10. Camera calibration - Radial distortion model
4.11. Camera positioning - Online Grid Calibration
4.12. Camera positioning - Control points
4.13. Scan lines for profile analysis
4.14. Profile analysis
4.15. Motivation for a region-based profile threshold
4.16. Ghost effect
4.17. Characteristic tube edge responses
4.18. Template Design
4.19. Template Occurrence
4.20. Template with extreme height weighting coefficient
4.21. Template Weighting
4.22. Template rotation - Motivation
4.23. Template curvature occurrence
4.24. Subpixel accurate template matching
4.25. Perspective correction function
5.1. Measuring slide used for acquiring ground truth measurements by hand
5.2. Intra and inter human measuring variance
5.3. Supply tube
5.4. Accuracy evaluation of length measurements at synthetic sequences
5.5. Results of minimum spacing experiment
5.6. Minimum tube spacing for black tubes
5.7. Measuring results at 20m/min
5.8. Results of 8mm black tubes at 30m/min
5.9. Results of 8mm transparent tubes at 30m/min
5.10. Brightness variance of an empty conveyor belt at backlight
5.11. Bent 6mm tube
5.12. Experimental results of black tubes with 6 and 12mm diameter
5.13. Ground truth distance of black tubes with 6 and 12mm diameter
5.14. Influence of cross-section deformations at 12mm diameter tubes
5.15. Experimental results of transparent tubes with 6 and 12mm diameter
5.16. Ground truth distance of transparent tubes with 6 and 12mm diameter
5.17. Failure of tube edge detection due to a poor contrast
5.18. Repeatability of the measurement of one tube
5.19. Repeatability of the measurement of a metallic cylinder
5.20. Results of outlier experiment
5.21. Results of 30mm and 70mm tubes at 30m/min
5.22. Performance evaluation results
5.23. Background suppression in the frequency domain
A.1. Comparison of different scan lines


1. Introduction

Heat shrinkable tubing is widely used for electrical and mechanical insulation,
sealing, identification and connection solutions. Customers come mainly from the
automotive, electronics, military and aerospace sectors. In terms of competition in
world markets, high quality assurance standards are essential in establishing and
maintaining customer relationships. Especially in the automotive supply industry,
accuracy demands are very high, and tolerated outliers are specified in only a few
parts per million.

In this master thesis, a prototype of a vision-based sensor for real-time length
measurement of heat shrink tubes in line production is presented. The main objectives
are accuracy, reliability and meeting time constraints.

The thesis work has been accomplished in cooperation with the company DSG-Canusa,
Meckenheim, Germany.

1.1. Machine Vision - State of the Art

This section gives an overview of the term Machine Vision (MV), the use of vision
systems in industrial applications, and a brief historical review. In addition, the
advantages and drawbacks of MV are discussed and related applications are presented.
The term Machine Vision is defined by Davies [16] as follows:

“Machine Vision is the study of methods and techniques whereby artificial vision
systems can be constructed and usefully employed in practical applications. As such, it
embraces both the science and engineering of vision.”

Researchers and engineers argue whether the terms Machine Vision and Computer Vision
can be used synonymously [7]. Both terms are part of a larger field called Artificial
Vision and have many things in common. The main objective is to make artificial systems
‘see’. However, the priorities of the two subjects differ.

Computer Vision has arisen in the academic field and concentrates mainly on theoretical
problems with a strong mathematical background. Usually, as the term Computer Vision
indicates, a computer processes an input image or a sequence of images. Nevertheless,
many methods and algorithms developed in Computer Vision can be adapted to practical
applications.

Machine Vision, on the other hand, implies practical solutions for many applications,
and covers not only the image processing itself, but also the engineering that makes a
system work [16]. This includes the right choice of the sensor, optics, illumination,
etc. MV systems are often used in industrial environments, making robustness,
reliability and cost-effectiveness very important. If an application is highly
time-constrained or computationally expensive, specific hardware (e.g. DSPs, ASICs, or
FPGAs) is used instead of an off-the-shelf computer [42]. A current trend is to develop
imaging sensors that have on-chip capabilities for image processing algorithms. Thus,
the image processing moves from the computer into the camera, superseding the
bottleneck of data transfer.

During the 1970s and 1980s, western companies faced a new challenge from the Asian
market [7]. Countries like Japan in particular established new production methods,
leading to an increased significance of quality in manufacturing in international
markets. Many western companies proved unable to meet the challenge and failed to
survive, while others realized the importance of quality assurance and started to
investigate the use of new technologies like Machine Vision. MV has many advantages and
is able to improve product quality, enhance processing efficiency, and increase
operational safety.

In the early 1980s, development in the field of Artificial Vision was slow and mainly
academic, and industrial interest remained low until the late 1980s and early 1990s
[7]. Significant progress in computer hardware has since allowed real-time
implementations of image processing algorithms, developed over the past 25 years, on
standard platforms. The decreasing cost of computational power made MV systems more and
more attractive, leading to a growth of MV applications and of companies developing
such systems. Today, the field of MV has become a thriving multi-million dollar
industry [7].

The objectives of MV systems include position recognition, identification, shape and
dimension checks, completeness checks, image and object comparison, and surface
inspection [18]. Usually, the goal is to detect and sort out production errors or to
guide a robot arm (or other devices) in a particular task [42].

MV systems can be found in all industrial sectors and cover a huge range of inspected
objects. Dimensional measuring tasks can be found, for example, in the inspection of
bottles on assembly lines [72], wood [15, 50], screw threads [34], or thin-film disk
heads [61]. Measuring objects is often related to 3D CAD models [23, 43]. An example
for guiding a robot arm in grasping 3D sheet metal parts is given in [52]. A detailed
overview of all potential applications is beyond the scope of this thesis.

Guaranteed product quality can help to establish and maintain customer relationships,
enhancing the competitive position of a company. The main advantage of visual
inspection in quality control is, besides its versatile range of applications, that it
is non-contact, clean, and fast [7].

Although the interpretative capability of today’s vision systems cannot match the
human visual system in the general case, it is possible to develop systems that perform
better than people at some quantitative tasks. However, this assumes controlled and
circumscribable conditions, reducing the problem to a defined and repetitive task.
Usually, such conditions can be established at manufacturing lines.

A human operator can be expected to be only 70-80% efficient, even under ideal
conditions [7]. In practice, there are many factors that can further reduce this
productive efficiency, such as tiredness, sickness, boredom, alcohol or drugs. For
example, if a human is instructed to observe objects on a conveyor, this task is tiring
and it is not unlikely that the operator becomes distracted after a while. A MV system,
on the other hand, could theoretically perform the same task 24 hours a day and 365
days a year without getting tired.

If the inspection is performed in surroundings where working is unpleasant,
intolerable, dangerous or harmful to health for a human being, MV is a welcome option.
This includes working under high (or low) temperatures, chemical exhalation, smoke,
biological hazards, risk of explosion, x-rays, radioactive material, loud noise levels,
etc. [7]. On the other hand, in applications that require aseptic conditions, as in the
food or pharmaceutical industry, a human operator can be a ‘polluter’ as a source of
dirt particles (hair, dander, bacteria, etc.). In this case, a MV system is a clean
alternative.

Machines usually exceed humans in all kinds of accurate vision-based measurements.
Human vision performs well in comparing objects and in detecting differences, for
example in shape, color or texture [27]. Large deviations can be detected quickly. As
the difference gets smaller, however, the inspection time increases, or the deviation
cannot be detected at all without technical tools. With respect to the task considered
in this thesis, a human is not able to determine the length of an object at
sub-millimeter precision just by looking at it. Manual measurements are slow and not
practicable in line production if 100% control is desired, and can thus be used only
for random inspection of a few objects. MV systems, on the other hand, can measure the
length (or other features) of an object without contact, down to nm precision depending
on the optical system and the size of the object [16]. Furthermore, humans soon reach
their limits if the number of objects to inspect per minute increases significantly.
Many manufacturing processes are so fast that the human eye has problems even
perceiving the objects, not to mention accomplishing any inspection task. MV systems,
however, can handle several hundred objects per minute with high accuracy.

Although MV systems have many advantages for manufacturers, there are also drawbacks.
Usually, a MV system is designed and optimized for a specific task in a constrained
environment. If the requirements of the application change, the system has to be
adapted, which can be difficult and expensive [7]. Furthermore, the system can be
sensitive to many influences of the (industrial) environment, like heat, humidity,
dirt, dust, or ambient lighting. Respective precautions have to be taken to protect the
system. Finally, as in automation in general, vision systems that exceed the
performance of a human at some specific task replace human operators and will therefore
supersede mostly low-skilled jobs in the future. Addressing this problem in more detail
is outside the scope of this thesis.

1.2. Problem Statement

A large variety of heat shrink tubes of different sizes, materials and shrinking
properties is available on the market. The focus in this thesis is on the DSG-Canusa
DERAY-SPLICEMELT series. These tubes are commonly used for insulation of cable splices
in the automotive industry (see Figure 1.1 for an example). A film of hotmelt adhesive
inside the heat shrink tubes provides a waterproof seal around the splice after
shrinking. In addition, the DERAY-SPLICEMELT series shows a strong resistance against
thermal, chemical and mechanical strains. The easy and fast handling allows for
application in series production. Accordingly, if the heat shrinking is performed in an
automated fashion, the accuracy demands increase.

In production, the heat shrink tubes are cut into lengths from a continuous tube.
During this process, however, deviations from a specific target length can occur. In
terms of quality assurance, any deviations above a tolerable level must be detected so
that defective parts can be sorted out.
sorted out.


Figure 1.1: Application of a transparent heat shrink tube of type DERAY-SPLICEMELT.
After shrinking, the heat shrink tube provides a robust, waterproof insulation of the
cable splice. (Source: DSG-Canusa)

Property    Attributes
Color       transparent, black
Length      20-100mm
Diameter    6, 8, 12mm

Table 1.1: Range of tube types considered in this thesis.

Delivering defective parts must be avoided with highest priority to satisfy the
customer and to retain a good reputation. In this context, tolerable failure rates are
specified in parts per million. Rejected goods can be very expensive.

Up to now, length measurements have been performed manually by a human operator. This
has several drawbacks. First, only random samples can be controlled by hand, since 10
or more parts per second considerably exceed human capabilities. Furthermore, one
operator is occupied with the monotonous measuring task at one machine and cannot be
deployed to other tasks. This leads to a low effective productivity. In practice, more
than one production line that cuts the heat shrink tubes into lengths runs in parallel,
requiring even more human resources, which is very expensive. In addition, there is
always a non-negligible possibility of subjective errors when human operators carry out
the inspections, since they also show symptoms of fatigue over time in this highly
repetitive task. The measuring quality varies detectably between the morning and late
shifts.

In this thesis work, a machine vision inspection system is developed that is able to
replace the human operator at this particular measuring task, allowing for a reliable
100% control.

1.3. Requirements

The system must cover a range of tube types, differing in diameter, length or material
properties. An overview of the variety of tube types can be found in Table 1.1.

The two main classes of DERAY-SPLICEMELT heat shrink tubes considered in this thesis
are black or transparent in color; transparent tubes cover about 70% of the production.
Unlike black tubes, the transparent ones are translucent and appear slightly yellowish
or reddish due to a film of hotmelt adhesive inside the tube.

Most tubes have printing on the surface that can consist of both letters and numbers
(e.g. DSG2). Since this printing is plotted onto the continuous tube before it is cut
into lengths, the position of the printing is not consistent among the tubes and must
not affect the measuring results.
the measuring results.<br />

The tube length ranges from 20mm to 100mm. In this thesis, however, the focus will<br />

be on 50mm tubes since this is the dominant length in production. The outer diameter<br />

varies between 6mm and 12mm.<br />

The tolerances differ between 0.5 and1.0mm depending on the tube length as can be<br />

seen in Table 1.2. This table includes the tolerable deviations from a given target length<br />

in mm.<br />

The measurements have to be accomplished in line production on a conveyor in real-time.
The system is intended to reach 100% control without reducing production velocity.
Currently, the conveyor runs at approximately 20m/min, i.e. 3-17 tubes per second are
cut depending on the segment size. Theoretically, the cutting machine is able to run at
up to 40m/min. A faster velocity results in less processing time per tube segment. The
system design must be robust with respect to industrial use. Theoretically, it must be
able to run stably 24 hours/day, 7 days/week and 365 days/year.

Although there are many different tube types, only one kind of tube is processed at one
production line over a certain period of time. This means that the tube segments to be
inspected on the conveyor are all of the same kind. However, to be flexible to customer
demands, a production line must be able to be rearranged to a different kind of tube
several times a day. This emphasizes the importance of an easy-to-operate calibration
and teach-in step of the inspection system for practical application.

The goal of the visual inspection is a reliable good/bad decision for each tube
segment: whether it has to be sorted out or not. In the following, tube segments
wrongly classified as proper, but nevertheless deviating from the given target length
by more than the allowed tolerances (see Table 1.2), are denoted as false positives. On
the other hand, false negatives are tube segments that are classified for sorting out,
although their actual length meets the tolerances. To reach optimal product quality,
the number of false positives must be reduced to zero. Large numbers of false negatives
indicate that the system is not adjusted properly and has to be reconfigured.
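
To make the decision rule concrete, the following minimal sketch (Python; the function
names are illustrative and not part of the thesis) encodes the tolerance bands of
Table 1.2 and the resulting good/bad decision:

    # Tolerance bands from Table 1.2: (min length, max length, tolerance), all in mm.
    TOLERANCE_BANDS = [(20, 30, 0.5), (31, 50, 0.7), (51, 100, 1.0)]

    def tolerance_for(target_mm: float) -> float:
        """Return the tolerable deviation in mm for a given target length."""
        for lo, hi, tol in TOLERANCE_BANDS:
            if lo <= target_mm <= hi:
                return tol
        raise ValueError(f"target length {target_mm}mm outside supported range")

    def is_good(measured_mm: float, target_mm: float) -> bool:
        """Good/bad decision: True if the measured length meets the tolerance."""
        return abs(measured_mm - target_mm) <= tolerance_for(target_mm)

    # A 50mm tube measured at 50.6mm is within +/-0.7mm; one at 50.8mm is not.
    assert is_good(50.6, 50.0) and not is_good(50.8, 50.0)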

1.4. Related Work

In Section 1.1, several examples of vision-based measuring systems in industrial
applications have been presented. Much more work in this area has been done over the
past 20 years [4]. However, MV-related publications of academic interest often consider
only specific subproblems and do not present a detailed insight into the whole system.
On the other hand, commercial manufacturers of MV systems hide the technical details in
order to keep their competitive advantage [18].


There are several useful books addressing the fundamental methods, techniques and
algorithms used to develop machine vision applications in a comprehensive fashion [7,
16, 18, 62].

Dimensional measuring of objects requires knowledge of an object’s boundaries. A common
indicator for object boundaries, both in human and artificial vision, are edges. Edge
detection is a widely investigated area of vision research, dating back to 1959 in the
field of TV signal processing [37] and continuing to the present. The edge detection
methods considered in this thesis are related to the work of Sobel [36, 51], Marr and
Hildreth [45] and Canny [13].

In addition, anisotropic approaches have been proposed [69], i.e. orientation selective
edge detectors. These filters have many applications, for example in texture analysis
or in the design of steerable filters that efficiently control the orientation and
scale of filters to extract certain features in an adaptive way [25, 49]. Many of these
approaches are motivated by early human vision. In their investigation of the visual
cortex, Hubel and Wiesel discovered orientation selective cells in the striate cortex
V1 [33]. Several theories assume that humans perceive low-level features such as edges
or lines through combinations of the responses of these cells [27]. Many computer
vision researchers have adapted the idea of orientation selective cells to filters
which can be combined to produce a certain response. Such sets of filters are often
called filter banks. Malik and Perona [44] used a filter bank based on even symmetric
differences of offset Gaussians (DOOG) for texture discrimination.

The discrete pixel grid resolution of CCD camera images limits the measuring accuracy.
Thus, several techniques have been proposed that compute subpixel edge positions [6,
41, 66, 56, 71].

A common task in vision applications is to determine whether a particular pattern is
part of an image, and if so, where it is located [28]. Template matching is one method
to tackle this problem. Cross-correlation techniques are widely used as a measure of
similarity [64, 18, 62]. In stereo vision, correlation is used to solve the problem of
correspondences between the left and right view [21, 65]. Other practical applications
can be found in feature trackers, pattern recognition, or the registration of e.g.
medical image data.
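
As an illustration of cross-correlation as a similarity measure, the following minimal
sketch (NumPy; not code from the thesis) computes the normalized cross-correlation of a
template against an image patch at one position:

    import numpy as np

    def ncc(image: np.ndarray, template: np.ndarray, top: int, left: int) -> float:
        """Normalized cross-correlation of the template with the image patch at
        (top, left). Returns a value in [-1, 1]; 1 indicates a perfect match."""
        h, w = template.shape
        patch = image[top:top + h, left:left + w].astype(float)
        t = template.astype(float)
        patch -= patch.mean()
        t -= t.mean()
        denom = np.sqrt((patch ** 2).sum() * (t ** 2).sum())
        return float((patch * t).sum() / denom) if denom > 0 else 0.0

Sliding this score over all image positions and taking the maximum yields the best
template location.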

Accurate visual measurements often require a camera calibration step to relate 3D
points in the real world to image coordinates and to compensate for lens distortions.
One early approach was presented by Tsai [39, 67]. An extensive introduction to
calibration is given by Faugeras [21] and by Hartley and Zisserman [30]. The
calibration approach in this thesis work is closely related to the work of Zhang [74]
and Heikkilä and Silvén [31].

1.5. Thesis Outline

The remainder of this thesis is organized as follows: Chapter 2 provides the
theoretical background on models and techniques used in later sections with regard to
measuring with video cameras. This chapter also gives an overview of different
illumination techniques used in machine vision applications.

In Chapter 3, the physical design of the system is introduced. In particular, the
camera and lens selection as well as the illumination setup are discussed in detail.

The vision part of the system is presented in Chapter 4. After describing the
assumptions and model knowledge used throughout the inspection, the different steps of
the length measuring are proposed. This chapter also covers the calibration and
teach-in of the system as well as the algorithms and techniques used to perform the
measuring task with respect to real-time demands.

The system is systematically evaluated in Chapter 5. To this end, several quantitative
and qualitative evaluation criteria as well as different test scenarios are introduced.
The automated measurements are compared to human measurements in terms of accuracy and
precision. Finally, the results are discussed, and ideas for future work are given. The
thesis concludes with a summary of the presented work in Chapter 6.




2. Technical Background

2.1. Visual Measurements

This section introduces the basic concepts and techniques that make visual measurements
possible. Understanding the fundamental process of image acquisition, as well as the
underlying camera models and geometries, is essential for understanding which
parameters influence the measurement of real-world objects in video images. Based on
these concepts, one can determine the factors that influence accuracy and precision.

Extracting information about real-world objects from images in machine vision
applications is closely related to the area of photogrammetry. In [5], photogrammetry
is defined as the art, science, and technology of obtaining reliable information about
physical objects and the environment through the processes of recording, measuring, and
interpreting photographic images and patterns of electromagnetic radiant energy and
other phenomena.

There are many traditional applications of photogrammetry in geography, remote sensing,
medicine, archaeology, or crime detection. In machine vision applications, there is a
wide range of measuring tasks, including dimensional measuring (size, distance,
diameter, etc.) or angles. Although sophisticated algorithms can increase accuracy, the
quality and repeatability of measurements are always related to the hardware used (e.g.
camera sensor, optical system, digitizer) as well as the environmental conditions (e.g.
illumination).

2.1.1. Accuracy and Precision

Throughout this thesis, the terms accuracy and precision are used quite often and are
mostly related to measuring quality. Although these terms may be used synonymously in a
different context, with respect to measurements they have very distinct meanings.

Accuracy relates a measured length to a known reference truth or ground truth. The
closer a measurement approximates the ground truth, the more accurate the measuring
system is. Precision represents the repeatability of measurements, i.e. how much
different measurements of the same object vary. The more precise a measuring system is,
the closer the measured values lie together.

Figure 2.1 visualizes the definition of accuracy and precision in a mathematical sense.
The distribution of a set of measurements can be expressed in terms of a Gaussian
probability density function. The peak of this distribution corresponds to the mean
value of the measurements. The distance between the mean value and the reference ground
truth value determines the accuracy of the measurement. The standard deviation of the
distribution can be used as a measure of precision.

It is important to state that accuracy does not imply precision, and vice versa. For
example, the measuring result for a tube of 50mm length could be 50 ± 20mm. This
statement is very accurate but not very precise. On the other hand, a measuring system
can be very precise, but not accurate, if it is not calibrated correctly. Thus, good
measurements for industrial inspection tasks have to be both accurate and precise.
for industrial inspection tasks have to be both accurate and precise.<br />

9


10 CHAPTER 2. TECHNICAL BACKGROUND<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0.4<br />

0.3<br />

0.2<br />

0.1<br />

0<br />

49.5 49.6 49.7 49.8 49.9 50 50.1 50.2 50.3 50.4 50.5<br />

Reference<br />

Value<br />

Figure 2.1: Visualization of the difference between accuracy and precision in terms of measurements.<br />

A good measuring system must be both accurate and precise.<br />

2.1.2. Inverse Projection Problem

A general problem of human vision, denoted the inverse projection problem [27], also
applies to artificial systems. It states that the (perspective) projection of
three-dimensional world objects onto a two-dimensional image plane cannot be inverted
in a well-defined way. The loss of a dimension implies a loss of information, which in
general cannot be compensated, since different origins can produce the same stimulus on
the human retina or the camera sensor. Thus, several objects of different size or shape
can look identical in an image. One important property to consider in this context is
the influence of perspective. The term perspective is discussed further in Section
2.1.3.

Humans can compensate for the inverse projection problem in many situations by certain
heuristics and model knowledge of the scene. Similar techniques can be adapted to
artificial systems. Especially in machine vision applications, where conditions are
well-defined and known, model knowledge of the inspection task can be derived and
integrated.

2.1.3. Camera Models

There are several approaches to model the geometry of a camera. Addressing all these
models is outside the scope of this thesis. In the following, only the most common
camera models are introduced, which provide a theoretical basis for visual measurements
with CCD cameras.

Pin Hole Camera The simplest form of a camera, known as the camera obscura, was
invented in the 16th century. The underlying principle of this camera was already known
long before by Aristotle (384-322 BC): Light enters an image plane through an (ideally)
infinitely small hole, so only one ray of light from the world passes through the hole
for each point in the 2D image plane, leading to a one-to-one correspondence. Objects
at a wide range of distances from the camera can be imaged sharply and without
distortion [65, 73]. The camera obscura is formally named the pinhole camera. In the
non-ideal case, the pinhole has a finite size; thus each image point collects light
from a cone of rays.


Figure 2.2: Parallel lines intersect at the horizon under perspective. Image taken by
F. Wagenfeld on the Alaska Highway between Watson Lake and Whitehorse, Canada.

In the 15th century, Filippo Brunelleschi used the pinhole camera model to demonstrate
the laws of perspective discovered earlier [24, 38]. Two main effects characterize the
pinhole perspective or central perspective:

- Close objects appear larger than far ones
- Parallel lines intersect at the horizon

Figure 2.2 visualizes these effects of perspective with an example.

A drawback of the pinhole camera with respect to practical use in combination with a
photosensitive device is its long exposure time, since only a small amount of light
enters the image plane at one time [65]. However, the pinhole model can be used to
derive fundamental properties, in a mathematical sense, that describe the imaging
process. These properties can be extended by more realistic models to account for real
imaging devices.
properties can be extended by more realistic models to imply real imaging devices.<br />

Figure 2.3(a) gives an overview of the pinhole geometry. The camera center O, also
denoted as the optical center or center of projection, is the origin of a 3D coordinate
system with the axes X, Y and Z. This 3D coordinate system is denoted as the camera
reference frame or simply camera frame. The image plane Π_I is defined to be parallel
to the XY plane, i.e. perpendicular to the Z axis. The point o where the Z axis
intersects the image plane is referred to as the image center. The Z axis, i.e. the
line through O and o, is denoted as the optical axis.

The fundamental equations of a perspective camera describe the relationship between a
point P = (X, Y, Z)^T in the camera frame and a point p = (x, y)^T in the image plane:

    x = f \frac{X}{Z}    (2.1)

    y = f \frac{Y}{Z}    (2.2)

where f is the focal length of the camera. p can be seen as the point of intersection
of a line through P and the center of projection with the image plane Π_I [30]. This
relationship can easily be derived from Figure 2.3(b). In the following, lower-case
letters will always indicate image coordinates, while upper-case letters refer to 3D
coordinates outside the image plane.

Figure 2.3: (a) Pinhole geometry. (b) Projection of a point P in the camera frame onto
the image plane Π_I (here with regard to Y).
image plane.<br />

Weak-Perspective Camera If the relative distance between points in the camera frame
with respect to the Z axis (scene depth) is small compared to the average distance from
the camera, these points are projected onto the image plane approximately as if they
were all lying on one Z-plane Z_0. Thus, the Z coordinate of each point can be
approximated by Z_0 as:

    x \approx f \frac{X}{Z_0}, \qquad y \approx f \frac{Y}{Z_0}    (2.3)

This has the effect of all points being projected with a constant magnification [24].
If the distance between the camera and the plane Z_0 increases to infinity, there is a
direct mapping between 3D points in the camera frame and points in the image plane:

    x = X, \qquad y = Y    (2.4)

This projection is denoted as orthographic projection [65]. To overcome the described
problems of pinhole cameras, real imaging systems are usually equipped with a lens,
which collects rays of light and brings them into focus on the image plane.

Thin Lens Camera The simplest optical system can be modeled by a thin lens. The main
characteristics of a thin lens are [65]:

- Any ray entering the lens parallel to the axis on one side goes through the focus on
  the other side
- Any ray entering the lens from the focus on one side emerges parallel to the axis on
  the other side

Figure 2.4: Thin lens camera model.

The geometry of a thin lens imaging system is shown in Figure 2.4. F and F̂ are the
focus points in front of and behind the lens. From this model one can derive the
fundamental equation of thin lenses [65]:

    \frac{1}{Z} + \frac{1}{z} = \frac{1}{f}    (2.5)

where Z is the distance or depth of a point to the lens and z the distance between the
lens and the image plane. The focal length f, i.e. the distance between the focus point
and the lens, is equal on both sides of the thin lens in the ideal model.
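
As a worked example (illustrative numbers, not from the thesis), solving equation (2.5)
for the image distance z with a focal length f = 16mm and an object at depth
Z = 200mm:

    z = \left( \frac{1}{f} - \frac{1}{Z} \right)^{-1}
      = \left( \frac{1}{16\,\mathrm{mm}} - \frac{1}{200\,\mathrm{mm}} \right)^{-1}
      \approx 17.4\,\mathrm{mm}

so the image plane must sit slightly farther from the lens than the focal length to
bring the object into focus.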

Thick Lens Camera Real lenses are represented much better by a thick lens model. The
thin lens model does not consider several aberrations that come with real lenses. These
include defocusing of rays that are neither parallel to the axis nor pass through the
focus (spherical aberration), different refraction depending on the wavelength or color
of light rays entering the lens (chromatic aberration), and the focusing of objects at
different depths. Another factor that is important for accurate measuring applications
with real lenses is lens distortion. Ideally, a world point, its image point and the
optical center are collinear, and world lines are imaged as lines [30]. For real
cameras this model does not hold. Especially at the image boundaries, straight lines
appear curved (radially distorted). The effect of distortion will be addressed again in
the following sections.

2.1.4. Camera Calibration

Until now, all relationships between 3D points and image coordinates have been defined
with respect to a common (camera) reference frame. Usually, the location of a point in
the world is not known in camera coordinates. Thus, if one wants to relate world
coordinates to image coordinates, or vice versa, one has to consider geometric models
and physical parameters of the camera. At this stage, one can distinguish between
intrinsic and extrinsic parameters [24].

Intrinsic Parameters The intrinsic parameters describe the projection of a point in the
camera frame onto the image plane, i.e. the transformation of camera coordinates into
image coordinates. This transformation extends the ideal perspective camera model
introduced in the previous section with respect to the properties of real CCD cameras.
One can derive the following projection matrix M_i:

    M_i = \begin{pmatrix} -f/s_x & k & o_x \\ 0 & -f/s_y & o_y \\ 0 & 0 & 1 \end{pmatrix}    (2.6)

where f represents the focal length, s_x and s_y the effective pixel size in x and y
direction respectively, k the skew coefficient, and (o_x, o_y) the coordinates of the
image center. \alpha = s_y / s_x is the aspect ratio of the camera. If \alpha = 1, the
sensor elements of the CCD array are ideally square. The skew coefficient k determines
the angle between the pixel axes and is usually zero, i.e. the x and y axes are
perpendicular. (o_x, o_y) can be seen as an offset that translates the projection of
the camera origin onto the image origin in pixel dimensions. If s_x = s_y = 1 and
o_x = o_y = k = 0, M_i represents an ideal pinhole perspective camera.

Extrinsic Parameters The extrinsic parameters take the transformation between a fixed
world coordinate system (or object coordinate system) and the camera coordinate system
into account. This comprises the translation and rotation of the coordinate axes [65],
i.e. a translation vector T = (T_x, T_y, T_z)^T and a 3 x 3 rotation matrix R, such
that:

    M_e = \begin{pmatrix} r_{11} & r_{12} & r_{13} & -R_1^T T \\ r_{21} & r_{22} & r_{23} & -R_2^T T \\ r_{31} & r_{32} & r_{33} & -R_3^T T \end{pmatrix}    (2.7)

where r_{ij} (i, j \in {1, 2, 3}) are the elements of R at (i, j) and R_i denotes the
ith row of R.

Thus, the relationship between world and image coordinates can be written in terms of
two matrix multiplications [65]:

    \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = M_i M_e \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}    (2.8)

with (X, Y, Z, 1)^T representing a 3D world point in homogeneous coordinates; image
coordinates are then computed as x = x_1/x_3 and y = x_2/x_3, respectively.
M = M_i M_e is denoted as the projection matrix in the following.
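
The following minimal sketch (NumPy; all parameter values are illustrative assumptions)
assembles M_i and M_e as in equations (2.6)-(2.8) and projects a homogeneous world
point to pixel coordinates:

    import numpy as np

    f, sx, sy = 16.0, 0.01, 0.01     # focal length and pixel size in mm (assumed)
    k, ox, oy = 0.0, 320.0, 240.0    # skew and image center in pixels (assumed)

    Mi = np.array([[-f / sx, k,       ox],
                   [0.0,     -f / sy, oy],
                   [0.0,     0.0,     1.0]])

    R = np.eye(3)                        # camera aligned with the world frame
    T = np.array([0.0, 0.0, -500.0])     # world origin 500mm in front of the camera
    Me = np.hstack([R, (-R @ T).reshape(3, 1)])   # 3x4 extrinsic matrix, eq. (2.7)

    P = np.array([30.0, 10.0, 0.0, 1.0])          # homogeneous world point
    x1, x2, x3 = Mi @ Me @ P                      # eq. (2.8)
    x, y = x1 / x3, x2 / x3                       # resulting pixel coordinates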

Image Distortion The resulting image coordinates may be distorted by the lens, i.e.
linear projection is not guaranteed. If high accuracy and precision are required, the
simple mathematical relationships introduced before are not sufficient.

To overcome this effect, a model of the distortion has to be defined. A common radial
distortion model [30] can be written as:

    \begin{pmatrix} x_d \\ y_d \end{pmatrix} = L(\tilde{r}) \begin{pmatrix} \tilde{x} \\ \tilde{y} \end{pmatrix}    (2.9)

where (\tilde{x}, \tilde{y})^T is the undistorted and (x_d, y_d)^T the corresponding
distorted image position. The function L(\tilde{r}) determines the amount of distortion
depending on the radial distance \tilde{r} = \sqrt{\tilde{x}^2 + \tilde{y}^2} from the
center of radial distortion.

The correction of the distortion at a measured position p = (x, y)^T can be computed
as:

    \hat{x} = x_c + L(r)(x - x_c)    (2.10)

    \hat{y} = y_c + L(r)(y - y_c)    (2.11)

where (\hat{x}, \hat{y})^T is the undistorted (corrected) position, (x_c, y_c)^T the
center of the radial distortion, and r = \sqrt{(x - x_c)^2 + (y - y_c)^2} the radial
distance between p and the center of distortion.

An arbitrary distortion factor L(r) can be approximated by the following equation [30]:

    L(r) = 1 + \sum_{i=1}^{m} \kappa_i r^i    (2.12)

which is defined for r > 0 with L(0) = 1. The distortion coefficients \kappa_i as well
as the center of radial distortion (x_c, y_c)^T can be seen as additional intrinsic
parameters of the camera model. The number of coefficients m depends on the required
accuracy and the available computation time. Usually, no more than the first three or
four coefficients are considered. In common calibration procedures, such as the
calibration method proposed by Tsai [67], only the even coefficients (i.e. \kappa_2,
\kappa_4, ...) are taken into account, while the odd coefficients are set to zero. In
this case, one or two coefficients are sufficient to compensate for the distortion in
most cases [31].
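
A minimal sketch of the radial correction in equations (2.10)-(2.12) (NumPy; the
coefficient values are purely illustrative), using only even coefficients as in
Tsai-style models:

    import numpy as np

    def undistort_point(x, y, xc, yc, kappa):
        """Correct a radially distorted image point (eqs. 2.10-2.12).

        `kappa` maps the coefficient index i to kappa_i, e.g. {2: 1e-7, 4: 1e-13}.
        """
        r = np.hypot(x - xc, y - yc)                       # radial distance to center
        L = 1.0 + sum(k * r**i for i, k in kappa.items())  # eq. (2.12)
        return xc + L * (x - xc), yc + L * (y - yc)        # eqs. (2.10), (2.11)

    # Correct a point measured near the image corner (assumed coefficients):
    x_hat, y_hat = undistort_point(600.0, 420.0, 320.0, 240.0, {2: 1e-7, 4: 1e-13})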

Besides the radial distortion model, there are several other models, including
tangential, linear, and thin prism distortion [31]. Usually, a radial distortion model
is combined with a tangential model, as proposed in [11, 12].

There are several approaches to compute the unknown intrinsic and extrinsic parameters
of a camera. The most common methods are based on known correspondences between real
world points and image coordinates. A chessboard-like calibration grid has become quite
common as a calibration pattern. The corners of the grid provide a set of coplanar
points.

The world coordinates can easily be determined if one defines a coordinate system with
the X and Y axes lying orthogonally in the chessboard plane and Z = 0 for all points. A
corner not too close to the center represents the world origin. Based on these
definitions, each corner of the calibration pattern can be described in the form
(X, Y, Z)^T. Three-dimensional calibration rigs composed of orthogonal chessboard
planes are also used quite often.

In a captured image of the calibration pattern, the corners can be extracted at pixel
(or subpixel) level and mapped to world coordinates. If there is a sufficient number of
correspondences, one can try to solve a homogeneous linear system of equations based on
the projection matrix M. The solution is also denoted as an implicit camera
calibration, since the resulting parameters do not have any physical meaning [31]. In
the next stage, the intrinsic and extrinsic camera parameters can be extracted from the
computed solution of M [21].

There are linear and nonlinear methods to solve for the projection matrix. Linear methods assume an ideal pinhole camera and ignore distortion effects. Thus, these methods can be solved in closed form. Abdel-Aziz and Karara [1] introduced a direct linear transform (DLT) to compute the parameters in a noniterative algorithm. If higher accuracy is needed, nonlinear optimization techniques have been investigated to account for distortion models. Usually, the parameters are estimated by minimizing, in a least-squares sense, the pixel error between a measured point correspondence and the reprojected position of the world point using the projection matrix. This is an iterative process that may end up with a bad solution unless a good initial guess is available [70]. Therefore, linear and nonlinear methods are combined, as the DLT can be used for initialization of the nonlinear optimization. One well-known two-step calibration method was proposed by Tsai [67, 39].

2.2. Illumination

In machine vision applications, the right choice of illumination can simplify the further image processing considerably [16]. It can preprocess the input signal (e.g. enhance contrast or object boundaries, eliminate background, diminish unwanted features etc.) without consuming any computational power. On the other hand, even the best imaging sensor cannot compensate for the loss in image quality induced by poor illumination.

There are several different approaches to illumination in MV. Depending on the application, one has to consider what has to be inspected (e.g. boundaries, surface patterns or color), what the material properties of the objects to be inspected are (e.g. lighting reflection characteristics or translucency), and what the environmental conditions are (e.g. background characteristics, object dimension, camera position or the available space to install light sources). In the following, different types of light sources and lighting setups used in MV are introduced.

2.2.1. Light Sources

The light sources commonly used in machine vision include high-frequency fluorescent tubes, halogen lamps, xenon light bulbs, laser diodes and light emitting diodes (LED) [18].

High-frequency fluorescent lights High-frequency fluorescent light sources are widely used in machine vision applications, since they produce a homogeneous, uniform, and very bright illumination. They provide white or ultraviolet light with little heat development; thus, there is no need for fan cooling.

Standard fluorescent light tubes are not suitable for vision applications since they flicker cyclically with the power supply frequency. This yields unwanted changes in intensity or color in the video image, and the effect increases if the capturing rate of the camera is close to the power supply frequency (e.g. 50Hz in Germany). High-frequency fluorescent tubes alternate at about 25kHz, which is far beyond what can be captured by a video camera.

Fluorescent lights exist in different sizes, shapes, and setups. Beside the common light tube, there are also fluorescent ring lights or rectangular area lights. Low cost and a long lifetime make fluorescent lights even more attractive.

Light Emitting Diodes An LED is a semiconductor device that emits incoherent, monochromatic light, with the wavelength depending on the chemical composition of the semiconductor. Today, different wavelengths of the spectrum visible to humans, ranging from about 400 to 780nm, as well as ultraviolet or infrared wavelengths, can be covered by LEDs. The emitted visible light appears for example red, green, blue or yellow. Furthermore, it is possible to produce LEDs that appear “white” by combining a blue LED with a yellowish phosphor coating.

LEDs have many advantages compared to other light sources. Due to their small size, they can be used for a variety of lighting geometries [18]. This includes ring lights, dome lights, area or line lights, spot lights, dark-field lights and backlights. Theoretically, each single LED in a cluster can be controlled independently. Thus, it is possible to generate different illumination conditions (for example different lighting angles or intensities) with a single setup by enabling and disabling certain LEDs, e.g. automated or software controlled. It is also possible to use LEDs in strobe light mode.

Another advantage of LEDs is their energy efficiency and long lifetime with only little loss in intensity over time. Thus, LEDs have low maintenance costs. Operated at DC power, LEDs do not produce any flickering visible as intensity changes in the video image.

Halogen lights Halogen lamps are an extension of light bulbs and are filled with a halogen gas (e.g. bromine or iodine). With respect to machine vision applications, halogen lamps are often used in combination with fiber optic light guides [18]. The emitted light of the light source is transferred through these fiber optic light guides, allowing for very flexible illumination setups and geometries. This includes ring lights, dome lights, area or line lights, spot lights, dark-field lights and backlights, as for LEDs. Furthermore, there is a range of fiber optic bundles in different sizes to route and position the light for user-defined lighting.

One disadvantage of halogen lamps is their considerable heat development. Thus, active cooling is usually required. Nevertheless, due to the bright “white” light emitted by halogen lights (color temperature of about 6000K), they are also called cold light sources in the literature. If heat development of the light source can be harmful to heat sensitive objects, fiber optics can be useful to keep the light source away from the point of inspection. Like LEDs, halogen lamps do not produce flickering effects if the light source is DC-regulated. Thus, halogen lamps qualify for high accuracy inspection tasks.

Xenon lights, often used for strobe light mode, are quite similar to halogen lamps. These lights allow for very short and bright light pulses, which are used to reduce the effect of motion blur.

Beside the different ways of light generation, there are multiple possible setups of how light sources are arranged. Especially LED lights and fiber optics are very flexible, as introduced before. They can be adapted to a wide range of machine vision tasks at almost any size and geometry.


Figure 2.5: Incident lighting setups. (a) Indirect diffuse illumination over a hemisphere. (b) Diffuse ring- or area light setup. (c) Darkfield illumination. (d) Coaxial illumination.

2.2.2. Incident Lighting

Incident lighting or front lighting is characterized by one or more light sources illuminating the object of interest from the camera's viewing direction. This includes diffuse front lighting, directional front lighting, polarized light, axial/in-line illumination and structured lighting [18]. Figure 2.5 gives an overview of different incident lighting setups.

Diffuse Lighting Diffuse lighting reduces specular surface reflections and can be seen as a uniform, undirected illumination. It is usually generated by one or more light sources that are placed behind a diffuser at a certain distance. This yields the effect of one uniform area of light. A diffuser can be a pane of white translucent (acrylic) glass, mylar or another synthetic material. Instead of using a diffuser in front of a light source, indirect lighting can also result in diffuse illumination. A simple but effective method reported in the literature [7] converts a Chinese wok into a hemisphere for diffuse lighting. The inner side of the wok is painted white. The camera can be placed at a hole on the top of the hemisphere (the bottom of the wok). The light sources are arranged such that they cannot directly illuminate the object; instead, the emitted light is reflected at the white screen inside the hemisphere (see Figure 2.5(a)). A diffuse illumination can also be achieved using a ring or area light source as in Figure 2.5(b).

Directional Lighting Directional lighting is achieved by one or more directed light sources at a very low angle of incidence. The main characteristic of this type of illumination is the effect that completely smooth objects appear dark in the image, since the light rays are not reflected toward the camera, while unevenness leads to brighter image intensities. Due to this effect, directional lighting is also denoted as dark field illumination in the literature [18] (see Figure 2.5(c)). Directional lighting mostly qualifies for surface inspection tasks that consider the surface structure, revealing irregularities or bumpiness.

Polarized light In combination with a polarizing filter in front of the camera lens, incident lighting with polarized light can be used to avoid specular reflections. Such reflections preserve the polarization of a light ray; thus, with the right choice of filter, only scattered light rays can pass the filter and reach the camera. A maximal filter effect can be reached if the polarization of the light source and the filter are perpendicular to each other. Polarized light is often combined with a ring light setup to avoid both shadows and reflections.

Structured lighting Structured lighting is used to obtain three-dimensional information about objects. A certain pattern of light (e.g. crisp lines, grids or circles [18]) is projected onto the object. Based on deflections of this known pattern in the image, one can infer the object's three-dimensional characteristics. For example, in [58], a 3D scanner using structured lighting is presented that integrates a real-time range scanning pipeline. In machine vision applications, structured lighting can be used for dimensional measuring tasks where the contrast between object and background is poor.

Axial illumination In this type of illumination setup (see Figure 2.5(d)), also denoted as coaxial illumination in the literature, the light rays are directed to run along the optical axis of the camera [18]. This is achieved using an angled beam splitter or half-silvered mirror in combination with a diffuse light source. The beam of light usually has the same size as the camera's field of view. The main application of axial illumination systems is to illuminate highly reflective, shiny materials such as plastic, metal or other specular materials, or for example to inspect the inside of bore holes. Axial illumination is typically used for inspection of small objects such as electrical connectors or coins.

One potential problem with most incident lighting methods are shadows. The shadow contrast can be lowered using several light sources at different positions around the object (e.g. ring lights) or axial illumination setups. Nevertheless, objects with sharp corners or concavities might have regions that cannot be illuminated; therefore, especially regions close to the object's boundaries appear darker in the image. Thus, dark objects on a bright background may appear enlarged [16]. The effect of shadows is less significant for bright objects on a dark background. In applications that require totally shadow-free conditions for highly accurate measurements of object contours, another lighting setup called back lighting can be used, as introduced in the following.

2.2.3. Back lighting

The setup where the object is placed between the light source and the camera, as opposed to incident lighting, is denoted as back light illumination. In this arrangement, the light enters the camera directly, leading to bright intensity values at non-occluded regions. The object, on the other hand, casts a shadow on the image plane, thus leading to darker intensity values. Non-translucent materials result in a very strong, shadow-free contrast, which makes back lighting interesting for dimensional measuring tasks. Furthermore, surface structures or textures can be suppressed. If the only light source is placed below the object, there will be no shadows around it. Back lighting can also be used for the localization of holes and cracks, or for measuring translucency.

In combination with polarized light, back lighting can also be adapted to enhance the contrast of transparent materials, which are difficult to detect in an image under other lighting setups. In a typical scenario, polarized light entering the camera directly is filtered out by an adequate polarization filter in front of the camera lens, while the polarization of the light is changed when passing through the object. Thus, in contrast to back lighting without polarization, background regions appear dark in the image while (translucent) objects result in brighter intensities. Figure 3.9 in Section 3.3 visualizes the effect of back lighting in combination with a polarization filter.

2.3. Edge Detection

An edge can be defined as a particularly sharp change in (image) brightness [24], or, more mathematically speaking, a strong discontinuity of the spatial image gray level function [36].

Beside edges due to object boundaries, there are many more causes for edges in images, such as shadows, reflectance, texture or depth. Thus, simply extracting edges in images is no general indicator for object boundaries. To yield a semantic meaning, edge information can be combined with other features including shape, color, texture, or motion. Model knowledge about expected properties can be useful to group these low-level features into objects.

In real images there are many changes in brightness (or color), but with respect to a certain application it may be of interest to extract only the strongest edges or edges of a certain orientation. Thus, information such as edge strength and orientation has to be taken into account to link the results of the filter response. Furthermore, in real images there is also a certain amount of noise in the data, which has to be handled carefully.

2.3.1. Edge Models

Edges can be modeled according to their intensity profiles [65]. The two edge models considered in this thesis are shown in Figure 2.6.

The ideal step edge is the basis for most theoretic approaches. It can be defined as:

$$E_{ideal}(x) = \begin{cases} i_1, & x < 0 \\ i_2, & x \geq 0 \end{cases} \qquad (2.13)$$


Figure 2.6: (a) Ideal step edge model. (b) Ramp edge model.

In real images, sampled edges typically extend over several pixels and are better described by the ramp edge model. The slope of the ramp can be exploited by subpixel techniques to reach a higher precision than the discrete pixel grid (see Section 2.3.4). Ramp edges also appear if an object is not in focus, or if imaged in motion (motion blur).

There are three common criteria for optimal edge detectors proposed by Canny:

- Good detection
- Good localization
- Uniqueness of response

The first criterion states that an optimal edge detector must not be affected by noise, i.e. it must be robust against false positives (edges due to noise). On the other hand, edges of interest have to be conserved.

The good localization criterion takes into account the precision of the detected edge position. The distance between the real edge location and the detected position must be minimal.

The last criterion requires distinct and unique results, where only the local maximum of an edge is relevant. Responses of more than one pixel describe an edge location only poorly and should be suppressed.

The Canny edge detector [13] is designed to optimize all three criteria (see Section 2.3.3 for more details). However, there is a definite tradeoff between the detection and the localization criterion, since it is not possible to improve both criteria simultaneously [65].

2.3.2. Derivative Based Edge Detection

A common way to localize strong discontinuities in a mathematical function is to search for local extrema in the function's first-order derivative, or for zero crossings in the second-order derivative. This principle can be easily adapted to images, thus replacing the problem of edge detection by a search for extrema or zero crossings.

In the discrete case, differentiation of the image gray level function f(x, y) can be approximated by finite differences. Since an image can be seen as a two-dimensional function, it can be differentiated in both the horizontal and vertical direction, i.e. with respect to the x- and y-axis respectively. Following the notation of [21], the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ can be calculated as:


$$\frac{\partial f}{\partial x}(x, y) \cong \Delta_x f(x, y) = f(x+1, y) - f(x, y) \qquad (2.14)$$

$$\frac{\partial f}{\partial y}(x, y) \cong \Delta_y f(x, y) = f(x, y+1) - f(x, y) \qquad (2.15)$$

The partial derivative operators $\Delta_x$ and $\Delta_y$ can be expressed by a discrete convolution of the image with the filter kernels $[\mathbf{1}\ {-1}]$ and $[\mathbf{1}\ {-1}]^T$ for the x- and y-direction respectively (the ‘center’ elements of the asymmetric kernels are printed in bold). Other approximations are possible, including the mirrored versions of the kernels above or a symmetric kernel $\frac{1}{2}[1\ 0\ {-1}]$ [36].

Accordingly, the second-order derivative can be approximated by the discrete operators $\Delta_x^2 = [1\ {-2}\ 1]$ and $\Delta_y^2 = [1\ {-2}\ 1]^T$.

In the presence of noise (as usual in real images), edge detectors using the approximations introduced before perform poorly. This is due to the fact that noise is mostly uncorrelated and is characterized by local changes in intensity. Assuming a uniform region, a good edge detector should result in a value of zero in this region. With noise, the local intensity variations lead to noticeable responses (and local extrema) if estimates of partial derivatives are used. Therefore, all common edge detectors include a certain smoothing step to reduce the influence of noise. The selection of the smoothing function, however, can differ between approaches. The most common smoothing function is a Gaussian.
differ between approaches. The most common smoothing function is a Gaussian.<br />

The Gaussian function is a widespread choice, since it comes with several advantages.<br />

This includes the property of a Gaussian that convolving a Gaussian with a Gaussian<br />

results in another Gaussian. Assume a Gaussian function G1 with standard deviation σ1<br />

and G2 with standard deviation σ2. The result of convolving G1 and G2 is a Gaussian<br />

with standard deviation σG1∗G2 :<br />

σG1∗G2 =<br />

�<br />

σ2 1 + σ2 2<br />

(2.16)<br />

Thus, instead of resmoothing a smoothed image to get a stronger smoothing, it is<br />

possible to use a single convolution with a Gaussian with larger standard deviation. This<br />

obviously saves computational costs, which is important since convolution is an expensive<br />

operation.<br />

Another advantage of a Gaussian kernel is its separability. This means a two-dimensional, circularly symmetric Gaussian function $G_\sigma(x, y)$ can be factored into two one-dimensional Gaussians (see [24]) as:

$$G_\sigma(x, y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{x^2}{2\sigma^2}\right)\right) \left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{y^2}{2\sigma^2}\right)\right) \qquad (2.17)$$

Since convolution is an associative operation, the same result can be achieved by convolving an image with the two-dimensional kernel, or by applying a convolution with the separated version in x-direction and convolving the result with the y-version. In practice, a convolution with a discrete N × N kernel can be replaced by two convolutions with N × 1 kernels. This increases the performance significantly for large images and N. More information about convolution and filter separation can be found for example in [64].
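To illustrate the separability argument numerically, the following sketch (assuming NumPy and SciPy, which are not part of the thesis software) smooths an image once with the full two-dimensional Gaussian kernel and once with two one-dimensional passes; both results agree up to floating point rounding.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gaussian_1d(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()                        # normalize to preserve brightness

sigma, radius = 2.0, 6
g1 = gaussian_1d(sigma, radius)
g2 = np.outer(g1, g1)                         # separable: 2D kernel = outer product

img = np.random.rand(240, 320)                # stand-in for a gray level image

full = convolve(img, g2, mode="nearest")      # one N x N convolution
sep = convolve1d(convolve1d(img, g1, axis=0, mode="nearest"),
                 g1, axis=1, mode="nearest")  # two N x 1 convolutions

print(np.max(np.abs(full - sep)))             # ~1e-16: identical up to rounding
```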

The general procedure of edge enhancement in common derivative-based edge detectors can be summarized into two steps:

1. Smoothing of the image by convolving with a smoothing function
2. Differentiation of the smoothed image

Mathematically, this can be expressed as follows (here with respect to x):

$$I_{edge}(x, y) = K_{\partial/\partial x} * (S * I(x, y)) = (K_{\partial/\partial x} * S) * I(x, y) = \frac{\partial S}{\partial x} * I(x, y) \qquad (2.18)$$
where $K_{\partial/\partial x}$ indicates the filter kernel approximating the partial derivative with respect to x, and S represents the kernel of the smoothing function. Again, the associativity of the convolution can be used to optimize processing. Thus, instead of first smoothing the image with kernel S and then calculating the partial derivative, it is possible to reduce the problem to a single convolution with the partial derivative of the smoothing kernel, $\partial S/\partial x$. Hence, the first-order derivative of a Gaussian is suited as an edge detector that is less sensitive to noise than finite difference filters [24]. The response of the edge detector can be parametrized by the standard deviation of the Gaussian to control the scale of detected edges, i.e. the level of detail. A larger σ suppresses high-frequency edges, for example.

2.3.3. Common Edge Detectors

Due to the large number of approaches, only a selection of common edge detectors can be presented in this section. Figure 2.7 visualizes the edge responses of different edge detectors that will be introduced in the following in more detail.

Sobel Edge Detector A very early edge detector that is still used quite often today is the Sobel operator. It was first described in [51] and attributed to Sobel. It is the smallest difference filter with an odd number of coefficients that averages the image in the direction perpendicular to the differentiation [36]. The corresponding filter kernels for x and y are:

$$\mathrm{SOBEL}_X = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \qquad (2.19)$$

$$\mathrm{SOBEL}_Y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} \qquad (2.20)$$


Figure 2.7: Comparison of different edge detectors. (a) Common LENA test image. (b) Gradient magnitude based on the Sobel operator. (c) Edges enhanced via the discrete Laplace operator. (d) Result of the Canny edge detector (hysteresis thresholds: 150, 100).


These operators compute the horizontal and vertical components of a smooth gradient [21], denoted as $g_x$ and $g_y$ in the following. The total gradient magnitude g at a pixel position p in an image can be computed by the following equation:

$$g(p) = \sqrt{g_x^2(p) + g_y^2(p)} \qquad (2.21)$$

An example of the gradient magnitude based on the Sobel operator can be found in Figure 2.7(b). The following approximations can be used in order to save computational costs:

$$g(p) \approx |g_x(p)| + |g_y(p)| \qquad (2.22)$$
$$g(p) \approx \max(|g_x(p)|, |g_y(p)|) \qquad (2.23)$$

These approximations yield equally accurate results on average [22]. Beside the gradient magnitude, it is possible to compute the angle of the gradient as:

$$\phi(p) = \arctan\left(\frac{g_y(p)}{g_x(p)}\right) \qquad (2.24)$$

Although there is a certain angular error with the Sobel gradient [36], it is used very often in practice, since it provides a good balance between computational load and orientation accuracy [16]. Equations 2.21–2.24 are defined not only for the Sobel operator, but for every other operator that computes the horizontal and vertical gradient components.
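A brief sketch of Equations 2.19–2.24, assuming SciPy's built-in Sobel filter as a stand-in implementation (its kernel matches Equations 2.19/2.20 up to sign conventions, which do not affect the magnitude):

```python
import numpy as np
from scipy.ndimage import sobel

img = np.random.rand(240, 320)        # stand-in for a gray level image

gx = sobel(img, axis=1)               # horizontal gradient component (cf. Eq. 2.19)
gy = sobel(img, axis=0)               # vertical gradient component   (cf. Eq. 2.20)

g = np.sqrt(gx**2 + gy**2)            # gradient magnitude            (Eq. 2.21)
g_fast = np.abs(gx) + np.abs(gy)      # cheaper approximation         (Eq. 2.22)
phi = np.arctan2(gy, gx)              # gradient angle                (cf. Eq. 2.24)
```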

Canny Edge Detector Today, the Canny edge detector [13] is probably the most used edge detector, and it is proven to be optimal in a precise, mathematical sense [65]. It is designed to detect noisy step edges of all orientations and consists of three steps:

1. Edge enhancement
2. Nonmaximum suppression
3. Hysteresis thresholding

The first step is based on a first-order Gaussian derivative as introduced before. For fast implementations, the separability of the filter kernel can be used to improve the performance. Gradient magnitude and orientation can be computed as in Equations 2.21 and 2.24, or using the approximations. The standard deviation parameter σ of the Gaussian function influences the scale of the detected edges. A lower σ preserves more details (high frequencies), but also noisy edges, while a larger σ leaves only the strongest edges. The appropriate σ depends on the image content and what kind of edges should be detected.

The goal of the nonmaximum suppression step is to thin out ridges around local maxima and return a number of one pixel wide edges [65]. The dominant direction of the gradient calculated in step one determines the considered neighbors of a pixel. The gradient magnitude at this position must be larger than at both neighbors; otherwise it is not a maximum and its position is set to zero (suppressed) in the edge image.


In the last stage of the Canny edge detector, edge tracking combined with hysteresis thresholding is applied. Starting at a local maximum that meets the upper threshold of the hysteresis function, the algorithm follows the contour of neighboring pixels that have not been visited before and meet the lower threshold. Due to step two, a set of one-pixel wide contours is the output of the edge detection (see Figure 2.7(d) for an example with an upper threshold of 150 and a lower threshold of 100).

As in most cases, thresholding is always a tradeoff between false positives (in this case edges due to noise) and false negatives (suppressed or fragmented edges of interest). As with the standard deviation of the Gaussian in step one, the hysteresis thresholds have to be adapted depending on the particular image content. Methods for estimating the threshold parameters dynamically from image statistics are reported for example in [68] or [29]. There are many variations and extensions of the Canny edge detector. One popular approach motivated by Canny's work is the edge detector of Deriche [19].
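For illustration, OpenCV ships a ready-made implementation of this detector; the sketch below reproduces the threshold choice of Figure 2.7(d). The file name is hypothetical, and the Gaussian smoothing of step one is applied explicitly, since cv2.Canny itself only performs the gradient, suppression, and hysteresis steps.

```python
import cv2

img = cv2.imread("lena.png", cv2.IMREAD_GRAYSCALE)  # hypothetical test image
img = cv2.GaussianBlur(img, (5, 5), 1.0)            # step 1: Gaussian smoothing
edges = cv2.Canny(img, 100, 150)     # hysteresis thresholds as in Fig. 2.7(d)
cv2.imwrite("lena_canny.png", edges)
```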

Laplace The Laplace edge detector is a common representative of second-order derivative edge detectors. Recalling that edges are localized at zero crossings in the second-order derivative of an image's two-dimensional intensity function, the goal is to find zero crossings that are surrounded by strong peaks.

The Laplacian of a function can be seen as a sensible analogue to the second derivative and is rotationally invariant [24]. It is defined as:

$$\nabla^2 f(x, y) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \qquad (2.25)$$

As with first-order derivative edge detectors, a smoothing operation to reduce noise is performed before applying the edge detector, usually with a Gaussian. Analogous to Equation 2.18, the two steps can be combined by applying the Laplacian to the Gaussian smoothing kernel before convolution. This leads to an edge detector denoted as Laplacian of Gaussian (LoG), proposed by Marr and Hildreth [45]. It is quite common to replace the LoG with a Difference of Gaussians (DoG) [24] to reduce the computational load.

A discrete Laplace operator can be derived directly from the operators $\Delta_x^2$ and $\Delta_y^2$ as:

$$L_{\nabla^2} = \Delta_x^2 \oplus \Delta_y^2 = [1\ {-2}\ 1] \oplus [1\ {-2}\ 1]^T = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix} \qquad (2.26)$$

where the ⊕ operator denotes the tensor product [10] in this context. The result of the discrete Laplace operator applied to the LENA test image can be found in Figure 2.7(c). Edge detectors based on the Laplacian are isotropic, meaning the response is equal for all orientations [36]. One drawback of this approach is that second-order derivative based methods are much more sensitive to noise than gradient-based methods.
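A minimal sketch of the LoG approach, assuming SciPy: smoothing and second-order differentiation are combined in one filter, and edge candidates correspond to zero crossings of the response.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

img = np.random.rand(240, 320)          # stand-in for a gray level image

log = gaussian_laplace(img, sigma=2.0)  # LoG: Gaussian smoothing + Laplacian

# Zero crossings: sign changes between horizontal or vertical neighbors
zc = ((np.sign(log[:, :-1]) != np.sign(log[:, 1:]))[:-1, :] |
      (np.sign(log[:-1, :]) != np.sign(log[1:, :]))[:, :-1])
```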


Figure 2.8: Orientation selective filters based on rotated versions of a first derivative Gaussian; panels (a)–(c) show filters oriented at 0°, 90°, and 30° respectively (images taken from [25]).

Orientation Selective Edge Detection Until now, all presented approaches for edge detection have been more or less isotropic, but there are also many approaches that consciously exploit anisotropy, leading to orientation selective edge detectors. A good overview of anisotropic filters can be found for example in [69]. These filters have many applications, for example in texture analysis or in the design of steerable filters that efficiently control the orientation and scale of filters to extract certain features in an adaptive way.

An orientation selective filter can be generated from a rotated version of an elongated Gaussian derivative. Figure 2.8 shows an example of different filters that are mostly sensitive to 0°, 90°, and 30° oriented edges respectively. If many different orientations should be detected independently in one image, common optimizations exploit the associativity of the convolution operation. Instead of convolving the image with a large number of different orientation specific filters, the image is convolved with a few basis filters only. Then, an anisotropic response of an arbitrary orientation can be estimated via a weighted sum of the basis filter responses. For more information on the technical background of this approach, the reader is referred to the original papers [25, 49].

2.3.4. Subpixel Edge Detection

At image acquisition (e.g. with CCD cameras), light intensity is integrated over a finite, discrete array of sensor elements. Following the Sampling Theorem [36], this sampling can be seen as a low-pass filter on the incoming signal, cutting off high frequencies. Hence, strong edges, which can be seen as high-frequency content, may not be imaged precisely by the discrete grid. On the other hand, edge detectors that work on pixel level can detect the real edge position only roughly. The average localization error is 0.5 pixels, since the center of the real edge could be anywhere within the pixel [65].

In many applications such as high precision measuring tasks, edges detected at pixel grid accuracy are often not accurate enough. Thus, subpixel techniques have been developed to overcome the limits of discrete images and to compute continuous values that lie in between the sampled grid.


Figure 2.9: (a) Subpixel accuracy using bilinear interpolation. Pixel position P is a local maximum if the gradient magnitude of gradient g at P is larger than at the positions A and B respectively. These positions can be computed using bilinear interpolation between the neighboring pixels 0, 7 and 3, 4 respectively. The gradient direction determines which neighbors contribute to the interpolation. The edge direction is perpendicular to the gradient vector. (b) The discrete first derivative of a noisy step edge is approximated using cubic spline interpolation. The subpixel tube edge location is assumed to be at the maximum of the continuous spline function, which can lie in between two discrete positions (here at x = 9.5).

Interpolation is the most common technique to compute values between pixels by consideration of the local neighborhood of a pixel. This includes for example bilinear, polynomial, or B-spline interpolation. In [21], a linear interpolation of the gradient values within a 3 × 3 neighborhood around a pixel is proposed. Here, the gradient direction determines which of the 8 neighbors are considered (see Figure 2.9(a)). Since the gradient does not have to fall exactly on pixel positions on the grid, the gradient value is interpolated using a weighted sum of the two pixel positions that are next to the position where the gradient intersects the pixel grid (denoted as A and B in the figure). In a nonmaximum suppression step, the center pixel is classified as an edge pixel only if the gradient magnitude at this position is larger than at the interpolated neighbors. If so, the corresponding edge is perpendicular to the gradient direction.

Since the center pixel P still lies on the discrete pixel grid, one has to perform a second interpolation step if higher precision is needed. The image gradient within a certain neighborhood along the gradient direction (e.g. A-P-B) can be approximated for example by a one-dimensional spline function [17, 66]. Figure 2.9(b) shows an example of a noisy step edge between the discrete pixel positions 9 and 10 in x-direction. The discrete first derivative of the intensity profile is approximated with cubic splines. The extremum of this continuous function can theoretically be detected with an arbitrary precision, representing the subpixel edge position. However, there are obviously limits to what is still meaningful with respect to the underlying input data. In this example, a resolution of 1/10 pixel was used. The maximum is found at 9.5, i.e. exactly in between the discrete positions.
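The spline step of Figure 2.9(b) can be sketched as follows, assuming SciPy; the discrete derivative profile below is a made-up stand-in for measured data, constructed symmetric about x = 9.5.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical discrete first derivative of a step edge (cf. Figure 2.9(b))
x = np.arange(20)
d = np.array([1, 1, 2, 2, 3, 4, 6, 15, 40, 90,
              90, 40, 15, 6, 4, 3, 2, 2, 1, 1], dtype=float)

spline = CubicSpline(x, d)                 # continuous approximation
xs = np.arange(0.0, 19.0 + 1e-9, 0.1)      # sample at 1/10 pixel resolution
subpixel = xs[np.argmax(spline(xs))]       # maximum of the continuous function
print(subpixel)                            # 9.5: in between the discrete grid
```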

Rockett [56] analyzes the subpixel accuracy of a Canny implementation that uses interpolation by least-squares fitting of a quadratic polynomial to the gradient normal to the detected edge. He found that for high-contrast edges the edge localization reaches an accuracy of 0.01 pixels, while the error increases to about 0.1 pixels for low-contrast edges. Lyvers et al. [41] proposed a subpixel edge detector based on spatial moments of a gray level edge with an accuracy of better than 0.05 pixels for real image data. Aström [6] analyzes subpixel edge detection using stochastic models. A survey on subpixel measurement techniques can be found in [71].

2.4. Template Matching

A common task in vision applications is to search whether a particular pattern is part of an image, and if so, where it is located [28]. Template matching is one method to tackle this problem. The search pattern or template can be represented as an image and is usually considerably smaller than the inspected input image. The template is shifted over the input image and compared with the underlying values, and a measure of similarity is computed at each position. Positions reaching a high score are likely to match the pattern; or, the other way around, if the template matches at a certain location, the score has a maximum at this location.

A technique denoted as cross-correlation is widely used as a measure of similarity between image patches [64]. It can be derived from the sum of squared differences (SSD):

$$c_{SSD}(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left(T(i, j) - I(x+i, y+j)\right)^2 \qquad (2.27)$$

where I is the discrete image function and T the discrete template function. W and H indicate the template width and height respectively. Expanding the squared quantity yields:

$$c_{SSD}(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T^2(i, j) - 2\,T(i, j)\,I(x+i, y+j) + I^2(x+i, y+j) \qquad (2.28)$$

Since the template is constant, the sum over the template patch $T^2(i, j)$ is constant as well and does not contain any information on similarity. The same holds approximately for the sum over the image patch $I^2(x+i, y+j)$ if there are no strong variances in image intensity. Hence, the term $T(i, j)\,I(x+i, y+j)$ remains the only real indicator of similarity that depends on both the image and the template. This leads to the cross-correlation equation:

$$c(x, y) = \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T(i, j)\,I(x+i, y+j) \qquad (2.29)$$

It turns out that the correlation looks very similar to the discrete convolution. Indeed, the only difference between correlation and convolution is the sign of the summation in the second term [28]. Thus, theoretically, a correlation can be replaced by a convolution with a flipped version of the template [64]. Like convolution, correlation is an expensive operation if applied to large images and templates. In some cases it is faster to convert the spatial images into the frequency domain using the (discrete) Fast Fourier Transformation (FFT), multiply the resulting transform of one image with the complex conjugate of the other, and finally reconvert the result to the spatial domain using the inverse FFT [62, 64, 53].
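A sketch of this frequency-domain route, assuming NumPy: the correlation theorem states that cross-correlation corresponds to multiplying the transform of the image with the complex conjugate of the (zero-padded) template transform.

```python
import numpy as np

def cross_correlate_fft(image, template):
    """(Circular) cross-correlation (cf. Eq. 2.29) via the FFT."""
    F_img = np.fft.rfft2(image)
    F_tpl = np.fft.rfft2(template, s=image.shape)   # zero-pad to image size
    return np.fft.irfft2(F_img * np.conj(F_tpl), s=image.shape)

img = np.random.rand(240, 320)
tpl = img[100:120, 150:180].copy()        # template cut from the image itself
score = cross_correlate_fft(img, tpl)
print(np.unravel_index(np.argmax(score), score.shape))   # (100, 150)
```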

Unfortunately, the assumption of image brightness constancy is weak. If there is, for example, a bright spot in the image, the cross-correlation results in much larger values at this position than at darker regions. This may lead to incorrect matches. To overcome this problem, several normalized correlation methods have been introduced. One common measure is denoted as the correlation coefficient. It can be computed as:

$$c_{coeff}(x, y) = \frac{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \left(T(i, j) - \bar{T}\right)\left(I(x+i, y+j) - \bar{I}(x, y)\right)}{W H \, \sigma_T \, \sigma_{I(x, y)}} \qquad (2.30)$$

where $\bar{T}$ represents the mean template brightness and $\bar{I}(x, y)$ the mean image brightness within the particular window at position $(x, y)$. $\sigma_T$ and $\sigma_{I(x,y)}$ indicate the standard deviation of the template and the image patch respectively. The resulting values lie in the range between −1 and 1. Obviously, the correlation coefficient is computationally more expensive. If the standard cross-correlation yields sufficiently accurate results in a certain application, it may be of interest to use a less expensive normalization that simply maps the results of the cross-correlation into the range of −1 to 1. This can be achieved with the following equation:

$$c'(x, y) = \frac{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T(i, j)\,I(x+i, y+j)}{\sqrt{\sum_{i=0}^{W-1} \sum_{j=0}^{H-1} T(i, j)^2 \;\; \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} I(x+i, y+j)^2}} \qquad (2.31)$$

The term cross-correlation is usually used if two different images are correlated. If an image is correlated with itself, i.e. I = T, the term autocorrelation is commonly used [28].
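In practice, Equation 2.30 is rarely implemented by hand; OpenCV, for example, exposes the correlation coefficient as the TM_CCOEFF_NORMED mode of its template matching routine. The file names below are hypothetical.

```python
import cv2

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)     # hypothetical input
tpl = cv2.imread("pattern.png", cv2.IMREAD_GRAYSCALE)   # hypothetical template

# Correlation coefficient (cf. Eq. 2.30), normalized to [-1, 1]
score = cv2.matchTemplate(img, tpl, cv2.TM_CCOEFF_NORMED)
min_v, max_v, min_loc, max_loc = cv2.minMaxLoc(score)
print("best match at", max_loc, "with score", max_v)
```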

In practical applications it is often necessary to adapt the template by changing the orientation or scale to reach maximum matching results [28]. This increases the number of correlation operations, and thus the computational load. Therefore, optimization strategies are used that try to exclude as many positions as possible that are very unlikely to match a template.
to match a template.


3. Hardware Configuration

This chapter introduces the physical design of the visual inspection prototype. This includes the conveyor, the camera setup, the choice of illumination, as well as the blow out mechanism. Figure 3.1 gives an overview of the hardware setup of the prototype.

3.1. Conveyor

For the prototype, a 200cm long and 10cm wide conveyor is used to simulate a production line. It can be manually fitted with several tube segments, where the exact number depends on the target length and the distance between two consecutive segments. The measuring is performed at a certain area of the conveyor, denoted as the measuring area in the following. The field of view of the camera is adjusted to this area, as well as the illumination, as will be introduced in Sections 3.2 and 3.3 respectively.

The dimension of the measuring area depends on the size of the tubes to be measured. Therefore, with respect to the range of tube sizes, the measuring area is designed to cover the maximum tube size of 100mm in length and about 12mm in diameter. It must be even larger to be able to capture several images of each tube while it passes the visual field of the camera.

Since in production the tubes are cut to length from a continuous tube using a rotating knife (flying knife), there would not be a notable spacing between two consecutive tube segments if they were transferred to the measuring area at the same speed as they enter the knife. Thus, it can be difficult, both for humans and artificial vision sensors, to determine by looking where one tube starts and ends in the continuous line. To overcome this problem, after cutting, the tube segments have to fall onto another conveyor with a higher velocity to separate them. The faster the second conveyor is compared to the first one, the larger the gap.

Since processing time is expensive, the goal is to simplify the measuring conditions as much as possible using an elaborate hardware setup. One easy but effective simplification is to mount two guide bars on the conveyor that guarantee almost horizontally oriented tube segments. The guide bars are arranged like a narrow ‘V’ (see Figure 3.1(b)). The tubes enter the guide bars at the wider end and are adjusted into horizontal position while moving. At the measuring area the guide bars are almost parallel and just slightly wider than the diameter of the tubes. The distance between the guide bars can be easily changed using adjusting screws if the tube type changes.

The color and structure of the conveyor belt are crucial to maximize the contrast between objects and background for the inspection task. Therefore, a white-colored belt is used. The advantage of this choice with respect to the range of tube types to be inspected, in combination with the illumination setup, will be discussed in more detail in Section 3.3.


Figure 3.1: Hardware setup of the prototype in the laboratory environment. (a) Total view. (b) View on the measuring area.


3.2. Camera setup

Machine vision applications have high demands on the imaging system, especially if high accuracy and precision are required. The camera and optical system, i.e. the lens, have to be selected with respect to the particular inspection task. This section gives an overview of the imaging system used in this application and how it was selected.

3.2.1. Camera Selection

The main criteria for camera selection with respect to the application in this thesis are:

- Image quality
- Speed
- Resolution

The image quality is essential to allow for precise measurements. This includes a high signal-to-noise ratio, little or no cross-talk between neighboring pixels, and square pixel elements. As introduced in Section 1.3, the system is intended to work in continuous mode. Therefore, the speed, i.e. the possible frame rate, of the camera determines how many images of a tube can be captured within a given time period. Of course, this number also depends on the velocity of the conveyor. Especially at higher velocities, a fast camera is important, since the idea of multi-image measurements fails if the camera is not able to capture more than one evaluable image of each tube. The final frame rate should depend purely on the per frame processing time. This means the camera must be able to capture at least as many frames as can be processed; otherwise the camera would be a bottleneck. The frame rate of a camera is closely related to the image resolution. Higher resolutions mean a larger amount of data to be transferred and processed. Thus, there is a tradeoff between resolution and speed. A higher resolution means smaller elements on the CCD sensor array; hence, an object can be imaged in more detail. With respect to length measurements, the effective pixel size decreases at a higher resolution, and a pixel represents a smaller unit in the real world.

Three cameras have been tested and compared:

- Sony DFW VL-500
- AVT Marlin F-033C
- AVT Marlin F-046B

These cameras are all IEEE 1394 (FireWire) progressive scan CCD cameras.

The Sony camera has a 1/3” image device (Sony Wfine CCD) and provides VGA (640 × 480) resolution color images at a frame rate of 30 frames per second (fps). It is equipped with an integrated 12× zoom lens which can be adjusted via a motor.

The Marlin F-033C is a color camera with a maximum resolution of 656 × 492 pixels in raw mode, while the F-046B is a gray scale camera with a resolution of 780 × 582 pixels in raw mode. Both cameras have a 1/2” image device (Sony IT CCD). The Marlin cameras reach much higher frame rates compared to the Sony: at full resolution, the F-033C features 74 fps and the F-046B 53 fps respectively. Since these cameras do not come with an integrated optical system, a particular lens (C-Mount) must be provided additionally. A more detailed specification of the Marlin cameras can be found in Appendix B.1.

Figure 3.2: The sensor elements of single chip color cameras like the Marlin F-033C are provided with color filters, so that each sensor element gathers light of a certain range of wavelengths only, corresponding to red, green, and blue respectively. The arrangement of the filters (R1 G1 R2 G2 in the first row, G3 B1 G4 B2 in the second, with virtual points P1, P2, P3 at the positions where four cells meet) is denoted as BAYER mosaic. Interpolation is needed to compute the missing two channels at each pixel. Image taken from [3].

It turned out that the Sony camera is not suited for this particular application. The main reason is the limited frame rate of 30fps; thus, a new image is captured approximately every 30ms. As mentioned before, the camera speed should not be the bottleneck of the application. However, as will be shown in Section 5.3.8, the processing time of one image is significantly less than 30ms, which excludes the Sony camera for this particular application.

The Marlin cameras reach much higher frame rates and come with another advantage. Since the tube orientation can be considered horizontal due to the guide bars, as introduced in the previous section, one does not need the whole image height that a camera can provide. It is possible to reduce the image size to user-defined proportions, also denoted as the area of interest (AOI). This function is used to decrease the number of image rows to be transferred over the FireWire connection, while keeping the full resolution in the horizontal direction. For example, in a typical setup an image height of 160 pixels is large enough to include the whole region between the guide bars. The reduced image size is about 1/3 of the original size. Combined with a short shutter time, the reduced number of image rows increases the effective frame rate significantly, so it is possible to reach frame rates of > 100fps.

The decision whether to use the Marlin F-033C or the F-046B depends mainly on the question of whether color is a useful feature in this particular application. In general, single chip color cameras like the F-033C map a scene less accurately compared to gray scale cameras if image brightness is considered.

This is due to how these cameras are designed. Each sensor cell of a single chip color camera is provided with a color filter for either red (R), green (G), or blue (B). Without these filters the sensor cells are equal to those in gray scale cameras. Usually, the filters are arranged in a pattern denoted as BAYER mosaic (see Figure 3.2). Within each 2 × 2 region there are two green, one red, and one blue filter. This distribution is motivated by human vision and leads to more natural looking images, since the human optical system is most sensitive to green light. The drawback of this approach is that the resolution of each color channel is reduced. To overcome this problem, one has to interpolate the two missing color channels at each pixel position. There are several interpolation approaches, also denoted as BAYER demosaicing. With respect to speed it is important to use a not too expensive computation. The F-033C computes R-G-B values at virtual points Pi at the center of each local 2 × 2 neighborhood as follows [3]:

= R1<br />

P 1red<br />

P 1green = 1<br />

P 1blue<br />

P 2red<br />

P 2green = 1<br />

P 2blue<br />

P 3red<br />

P 3green = 1<br />

P 3blue<br />

2 (G1+G3)<br />

= B1<br />

= R2<br />

2 (G1+G4)<br />

= B1<br />

= R2<br />

2 (G2+G4)<br />

= B2<br />

(3.1)<br />

where the location of the different points can be found in Figure 3.2. Obviously, this<br />

interpolation technique reduces the resolution of the sensor in all channels, since values<br />

can be computed only at positions where four pixels meet and not at the boundaries of<br />

the image. 1<br />
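A rough NumPy sketch of this 2 × 2 interpolation scheme, assuming the filter layout of Figure 3.2 (even rows R G R G ..., odd rows G B G B ...); real driver code works on the raw sensor stream and is certainly more optimized than this illustrative loop.

```python
import numpy as np

def demosaic(raw):
    """Interpolate R, G, B at every point where four sensor cells meet (Eq. 3.1).

    Assumes the layout of Figure 3.2. The result is one row and one column
    smaller than the raw image, as noted in the text.
    """
    h, w = raw.shape
    out = np.zeros((h - 1, w - 1, 3), dtype=float)
    for y in range(h - 1):
        for x in range(w - 1):
            win = raw[y:y+2, x:x+2]            # local 2 x 2 neighborhood
            if y % 2 == 0 and x % 2 == 0:      # window holds R,G / G,B
                r, g, b = win[0, 0], 0.5 * (win[0, 1] + win[1, 0]), win[1, 1]
            elif y % 2 == 0:                   # G,R / B,G
                r, g, b = win[0, 1], 0.5 * (win[0, 0] + win[1, 1]), win[1, 0]
            elif x % 2 == 0:                   # G,B / R,G
                r, g, b = win[1, 0], 0.5 * (win[0, 0] + win[1, 1]), win[0, 1]
            else:                              # B,G / G,R
                r, g, b = win[1, 1], 0.5 * (win[0, 1] + win[1, 0]), win[0, 0]
            out[y, x] = (r, g, b)
    return out

rgb = demosaic(np.random.rand(6, 8))           # stand-in for raw sensor data
```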

If the inspection task can be performed on gray scale images, gray scale cameras should be used instead of color cameras. Intuitively, the accuracy of a color camera cannot be the same as that of a gray scale camera, because it requires two interpolation steps: first, one interpolates the R-G-B color channels as introduced before, and then one has to estimate the image brightness from these interpolated values. A gray scale camera offers a more direct transformation between light intensity and image values, thus leading not only to more accurate images, but also to higher frame rates. This can be supported by the following experiment.

A test image of graph paper has been captured once with the F-033C and once with the F-046B. A 16mm fixed focal length lens was used in each case, and the distance between camera and graph paper as well as the viewing direction were the same. The focus of the optical system was adjusted to obtain a sharp image in both cases. The results can be found in Figure 3.3. The color image in (a) has been converted into gray level values using the following equation:

$$I(x, y) = 0.299\,R(x, y) + 0.587\,G(x, y) + 0.114\,B(x, y) \qquad (3.2)$$

where R, G, and B represent the three color channels for red, green, and blue respectively, and I is the resulting gray level image.
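Equation 3.2 is the common luminance weighting of the R-G-B channels; as a one-line NumPy sketch:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 R-G-B image to gray levels using Eq. 3.2."""
    return rgb @ np.array([0.299, 0.587, 0.114])
```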

The grid appears sharper in the image of the gray scale camera, although the color image was also in focus during acquisition. The profiles of two scan lines of equal length through an edge of the grid (visualized in (b) and (d)) can be found in Figure 3.3(e).

¹ There are also color cameras that are provided with three-chip sensors. The incoming light is split into different wavelength ranges via a prism. Thus, each sensor yields a full resolution image of one color channel and interpolation is not necessary. These cameras, however, are quite expensive and could not be tested.


Figure 3.3: Comparison of the F-033C color and F-046B gray level camera. The test images show graph paper captured from a distance of approximately 250mm using a 16mm fixed focal length lens. (a) Color image of the F-033C. (b) Zoom view showing the location of the scan line through a grid edge in the converted gray scale image of (a). (c) Gray scale image acquired with the F-046B. (d) Zoom view showing the location of the scan line through a grid edge in (c). (e) Profiles of the two scan lines. The F-046B acquires a significantly sharper edge compared to the color camera, which can be seen at the slope of the edge ramp.


Figure 3.4: Color information of transparent tubes in HSV color space. Rows include, from top to bottom: color input image, hue channel, saturation channel, value (brightness) channel, and in the bottom row the computed gray scale image using Eq. 3.2. Although all images are taken from the same sequence, tubes and background can have very different color.

The positions of the scan lines correspond to the same real world location. It can be seen that both edges are ramp edges (see Section 2.3.1). The slope of the edge profile, however, is larger for the gray level camera, i.e. the edge can be located more precisely. This is an important advantage with respect to accurate measuring. Therefore, if color has no other significant advantage over gray scale images, a gray scale camera should be preferred in this application.

One can think of using color information to segment the transparent tubes from the background, since they appear yellowish or reddish while the conveyor belt should be white. For black tubes, color obviously has no significant benefit; hence, it is adequate to concentrate on the transparent tubes in this context.

The idea is to use color as a measure to distinguish between transparent tubes and the background, since here the gray scale contrast is lower compared to black tubes. However, as can be seen in Figure 3.4, in real images of transparent heat shrink tubes on a conveyor, the color of the conveyor belt can appear quite different. The test images have been taken from a sequence of tubes on a moving conveyor. The images have been illuminated via a back light setup, which will be introduced in Section 3.3. It can be observed that some regions of the same conveyor belt look yellowish, while others appear blueish in the image (see left column in Figure 3.4).

There are several color models beside the R-G-B model. Humans intuitively perceive and describe color experiences in terms of hue (chromatic color), saturation (absence of white) and brightness [27]. A corresponding color model is the H-S-V model, where H stands for hue, S for saturation, and V for (brightness) value. More detailed information on color models can be found for example in [35].

In the hue domain, a yellowish transparent tube differs significantly from a blueish background (left column in Figure 3.4). If the background is also yellowish, the difference between tube and background decreases (center column). Strong discontinuities in background color (as in the right column) could be wrongly classified as a tube. The saturation domain is also a quite unstable feature. If the background contains a lot of white, it is more desaturated than the object (as in the center column) and yields a quite strong contrast. The example in the left column, however, shows that the difference in saturation does not always have to be that clear. The brightness channel (fourth row) is very close to the gray level image computed using Equation 3.2 (bottom row). Thus, it equals approximately what a gray level camera would see.

In this experiment it has been shown that color can be a very unstable feature. With respect to precise length measurements, there are clearly a lot of artifacts at the tube edges in the H and S color channels. In the brightness channel, edges appear much sharper. The small artifacts in this channel are due to camera noise, motion blur effects, or imperfectly adjusted camera focus. Since the brightness channel is closely related to the gray value image converted from R-G-B values using Equation 3.2, one could replace the brightness channel by this image. As can be seen in Figure 3.4, the bottom row yields even better contrast between object and background.
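As an illustration of this channel comparison, a minimal sketch in Python using OpenCV (not part of the original system; the file name is hypothetical, and Eq. 3.2 is assumed to be the usual luminance weighting 0.299 R + 0.587 G + 0.114 B implemented by cv2.cvtColor):

    import cv2
    import numpy as np

    # Compare the H, S, V channels of a tube image against the gray
    # scale conversion along a horizontal scan line.
    img = cv2.imread("tube.png")  # hypothetical test image (BGR)
    h, s, v = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # A simple contrast proxy: standard deviation of each channel
    # along a scan line through the tube edges.
    row = img.shape[0] // 2
    for name, chan in [("H", h), ("S", s), ("V", v), ("gray", gray)]:
        print(name, float(np.std(chan[row, :])))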

With the observations made above, one can conclude that a gray level camera is best suited for this particular application. It yields the best edge quality, which is important for precise measurements, and both black and transparent tubes are imaged with sufficient contrast between object and background, making it possible to locate a tube in the image without using color information. Hence, the Marlin F-046B camera has been selected for this prototype. It offers the best compromise between image quality, resolution, and speed.

3.2.2. Camera Positioning

The camera is placed at a fixed position and viewing angle above the measuring area of the conveyor (see Figure 3.1(b)). In a calibration step, it is adjusted to position the image plane parallel to the surface of the conveyor, with the optical center above the center of the measuring area, thus minimizing the perspective effects in this area. The exact calibration procedure will be explained in Section 4.3.2. The moving direction of the conveyor (and therefore of the tube segments) is horizontal in the image.

The distance between camera and conveyor depends on the optical system, i.e. the lens that is used, and on the tube size to be inspected. In Section 4.2.2, the basic assumptions and constraints regarding the image content with respect to the image processing are presented. This includes the assumption that only one tube can be seen completely in an image at one time. Correspondingly, the camera's field of view has to be adapted to satisfy this criterion for different tube lengths.

Placing the camera above the conveyor has the additional advantage of not extending the dimensions of the production line, since space in a production hall is limited and therefore expensive.

3.2.3. Lens Selection

Parameters such as object size, sensor size of the camera, camera distance, and accuracy requirements determine the right optical system (objective) for a particular application. In the following, the term lens will be used synonymously with the term optical system or objective, although an objective is actually more than just a single lens (iris, case, mount, adjusting screws, etc.). The lens, however, is the most important factor that determines the properties of the objective.

The most important parameters specifying a lens include the focal length, F-number, magnification, angle of view, depth of focus, minimum object distance, and finally the price. In addition, lenses can have a number of aberrations as introduced in Section 2.1.3. Lens manufacturers try to minimize, for example, chromatic or spherical aberrations, but it is not possible to produce a completely aberration-free lens in the general case (e.g. for all wavelengths of light or angles). In practice, lenses are composed of different layers of special glass. High precision is needed to produce high quality lenses, thus such lenses can be very expensive. There are different lens types available, including fix-focal and zoom lenses. While fix-focal length lenses, as the term indicates, have a fixed focal length, zoom lenses cover a range of focal lengths. The actual focal length can be adjusted manually or motorized. For machine vision applications, fix-focal length lenses are usually preferable [40]. If the conditions are highly constrained, the best suited lens can be selected a priori.

This section gives a brief overview of the most important lens parameters and motivates the selection of the lens used in this application.

Focal Length In the ideal thin lens camera model, the focal length is defined as the distance between the lens and the focal point, i.e. the point where parallel rays entering the lens intersect on the other side (see Figure 2.4). In practice, the focal length value specified by the manufacturer depends on the lens model used (which is usually unknown) and does not have to be accurate. In applications that require high accuracy, a camera calibration step is important to determine the intrinsic parameters of the camera, including the effective focal length with respect to the underlying camera model.

F-number The F-number describes the relation of the focal length to the relative aperture size [18]:

    F = f/d    (3.3)

where d is the diameter of the aperture. Thus, the F-number is an indicator of the light-gathering power of the lens. Typical values are 1.0, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, and 32, with a constant ratio of √2 between consecutive values. A smaller F-number indicates that more light can pass the lens, and vice versa. Camera lenses are often specified by their minimum and maximum F-number, also denoted as the iris range.

Magnification In the weak perspective camera model (see Section 2.1.3), the ratio between the focal length and the average scene depth Z0 can be seen as magnification, i.e. following Equation 2.3, the magnification m is expressed as [24]:

    m = f/Z0    (3.4)



Figure 3.5: (a) Standard perspective lens. Closer objects appear larger in the image than objects of equal size further away. (b) Telecentric lenses map objects of equal size to the same image size independent of depth, within a certain range of distances. Images are taken from Carl Zeiss AG (www.zeiss.de).

where Z0 can be seen as the lens-object distance, also denoted as the working distance in the following. This gives a good estimate of how large an object will appear on the image plane at a given distance Z0 to the camera with a lens of focal length f.
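As a quick plausibility check of Eq. 3.4, a tiny sketch in Python (the 16mm focal length and 250mm working distance are values used later in this thesis; 6.4mm is the horizontal width of a 1/2" CCD sensor):

    # Sketch: weak-perspective magnification m = f / Z0 (Eq. 3.4).
    f = 16.0        # focal length in mm
    Z0 = 250.0      # working distance (lens-object distance) in mm
    sensor_w = 6.4  # horizontal width of a 1/2" CCD sensor in mm

    m = f / Z0               # magnification, 0.064
    image_w = m * 100.0      # a 100mm tube projects to ~6.4mm
    print(m, image_w, image_w <= sensor_w)  # the tube just fits the sensor width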

Depth of Focus Following the thin lens camera model, only points at a defined distance to the camera are focused on the image plane. Points at shorter or larger distances appear blurred in the ideal model. In practice, however, points within some range of distances are in acceptable focus [24]. This range is denoted as depth of focus or depth of field. It is due to the finite size of each sensor element: there is no visible difference in the image between a point that is focused exactly on the image plane and one that is not, as long as its blur does not spread over several pixels [18]. The depth of focus increases with a larger F-number [18].

Minimum Object Distance (MOD) All real lenses have a certain distance below which points can no longer be focused. This has both mechanical and physical reasons. The MOD value is important, since it determines the minimum distance between the camera and the objects in an application.

Angle of View The angle of view is the maximum angle from which rays of light are imaged onto the camera sensor by the lens. Short focal length lenses usually have a wider angle of view and are therefore also denoted as wide-angle lenses, while lenses with a longer focal length have a narrower angle of view. The angle of view determines the field of view of the camera at a given distance and a certain sensor size, i.e. which part of the world is imaged onto the sensor array of the camera.

Commonly, short focal length lenses are used to capture images of a larger field of view, for example in video surveillance applications that have to cover a large area. With respect to machine vision applications, such lenses can also be used for close-up images at a short camera-object distance. The amount of radial distortion increases with a shorter focal length; the fish-eye lens is an extreme example of a very short focal length lens. Increasing the focal length increases the magnification. Thus, even smaller objects at a further distance can be imaged over the whole image size with such lenses. However, the minimum object distance is larger for long focal length lenses.

For two-dimensional measuring tasks, the most accurate and precise results can be achieved with telecentric lenses (see Figure 3.5). These special lenses are designed to map objects of the same size in the world to the same image size, even if the object-to-lens distance differs. It is important to note that the maximum object size cannot be larger than the diameter of the lens. This makes telecentric lenses useful only in connection with relatively small objects. In addition, such lenses reach a size of over 0.5m for objects of about 100mm and a mass of approximately 4kg [18]. Finally, telecentric lenses are very expensive.

Although a telecentric lens would be advantageous regarding its imaging properties, a less expensive solution had to be found for the prototype development in this application. The optical system must be able to map objects between 20 and 100mm onto a 1/2" CCD sensor at a relatively short camera-object distance, while not being affected too much by aberrations and radial distortion.

However, this is an optimization problem that has no universal solution for all tube lengths. Different tube lengths need different magnification factors and fields of view if the maximum possible resolution is to be exploited to reach the highest accuracy. Changing the magnification factor means changing either the focal length of the optical system or the distance between object and camera, or both. If the camera is moved toward the object, the minimum object distance of the lens has to be considered in order to yield sharp images. Zoom lenses could be used to change the focal length without changing the whole optical system. However, zoom lenses should be avoided in machine vision applications [40], since they have to make larger compromises than fix-focal lenses and usually have a minimum working distance of one meter or more. Hence, using a fix-focal lens implies changing the camera-object distance to adapt to different tube lengths, or physically exchanging the lens when the machine cuts a new length that cannot be covered by the current lens.

Several commercial lenses designed for machine vision applications have been compared to find the lens that is best suited to inspect different tube sizes (see Table 3.1). Figure 3.6 gives an overview of the parameters that influence a camera's field of view. The angle of view θ is specified by the lens manufacturer and depends on the focal length and the camera sensor size. All values in the following refer to a 1/2" CCD sensor, since this is the sensor size of the Marlin F-033C and F-046B. The working distance d is here defined as the distance between lens and conveyor. O represents the object size, and L indicates the size of the measuring area with respect to a certain tube size. L can be approximated as twice the object size O. The goal is to find a combination of a lens and a working distance that yields a visual field such that the size V of the imaged region of the conveyor equals the measuring area L. Note that in this context, size can be replaced by length in the horizontal direction, i.e. in the moving direction of the conveyor, since this is the measuring direction in this constrained application. Thus, in the following only this direction is considered.

The geometry in Figure 3.6 leads to the following relationship between θ, d and V:

    V = 2d · tan(θrad/2)    (3.5)



Figure 3.6: Parameters that influence the field of view (FoV) of a camera. θ indicates the angle of view of the optical system, d the distance between lens and conveyor, O the object size, V the size of the region on the conveyor that is imaged, and L the size of the measuring area depending on the current tube size. The goal is to find a lens that yields a field of view such that V ≈ L at a short distance.

Model            f      θ      dmin
Pentax H1214-M   12mm   28.91  250mm
Pentax C1614-M   16mm   22.72  250mm
Pentax C2514-M   25mm   14.60  250mm
Pentax C3516-M   35mm   10.76  400mm
Pentax C5028-M   50mm   7.32   900mm

Table 3.1: Different commercial machine vision lenses and their specifications, including focal length f, horizontal angle of view θ (in degrees) with respect to a 1/2" sensor, and minimum object distance dmin.

where θrad represents the angle of view θ in radians. Using this equation, one can compute the length of the conveyor that is imaged in the horizontal direction at the minimum object distance of a lens. The results can be found in Table 3.2.
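The following sketch (not from the thesis) evaluates Eq. 3.5 in Python for the lenses of Table 3.1 at their minimum object distances and reproduces the values of Table 3.2:

    import math

    # Field of view V = 2 * d * tan(theta/2)  (Eq. 3.5), evaluated at
    # each lens's minimum object distance from Table 3.1.
    lenses = [  # (model, focal length mm, horizontal angle of view deg, MOD mm)
        ("Pentax H1214-M", 12, 28.91, 250),
        ("Pentax C1614-M", 16, 22.72, 250),
        ("Pentax C2514-M", 25, 14.60, 250),
        ("Pentax C3516-M", 35, 10.76, 400),
        ("Pentax C5028-M", 50, 7.32, 900),
    ]

    for model, f, theta_deg, d_min in lenses:
        V = 2 * d_min * math.tan(math.radians(theta_deg) / 2)
        print(f"{model}: f={f}mm, V={V:.0f}mm at d={d_min}mm")
    # Output matches Table 3.2: 129, 100, 64, 75, 115 mm.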

This shows that none of the compared lenses is able to image small objects (< 30mm) in focus onto the camera sensor in such a way that the object covers about half the full image width. Thus, the minimum tube size that can be inspected at full resolution under this assumption is 30mm. However, if one shrinks the image width manually (for example using the AOI function of the camera), the constraints can be met even for tubes below 30mm.

The real world representation s of one pixel in the image plane can be approximated as follows:

    s = V/Wimg    (3.6)

where Wimg represents the image width in pixels. For example, for a 16mm focal length lens at a working distance of 250mm, one pixel represents about 0.12mm in the real world if the horizontal image resolution is 780 pixels. At the same distance, a 25mm focal length lens yields a pixel representation of about 0.08mm at the



f      V
12mm   129mm
16mm   100mm
25mm   64mm
35mm   75mm
50mm   115mm

Table 3.2: Field of view of different fix-focal length lenses at the specified minimum object distance.

V     f=12mm  16mm   25mm   35mm   50mm
40    •77     •99    •156   •212   •312
60    •116    •149   •234   •318   •469
100   •193    248    390    530    •782
200   381     497    780    1006   1563

Table 3.3: Working distances (in mm) needed to yield a certain field of view V (in mm) for different focal length lenses. Distances that fall significantly below the minimum working distance are marked with a •.

same resolution. Thus, smaller tubes can theoretically be measured at higher precision. The minimum object distance of the compared lenses, however, represents a certain limit in precision. Tubes below 30mm cannot be measured with higher precision than 30mm tubes, only with the same precision. Recalling the tolerances introduced in Section 1.3, smaller tubes have a smaller tolerance than larger tubes, and 20-30mm tubes share the same tolerance.
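A small sketch of the resolution estimate from Eq. 3.6 for the two example configurations above (values as given in the text):

    import math

    # Real-world size of one pixel, s = V / W_img (Eq. 3.6), with V
    # from Eq. 3.5 at a 250mm working distance and a horizontal
    # resolution of 780 pixels.
    W_img = 780
    for f, theta_deg in [(16, 22.72), (25, 14.60)]:
        V = 2 * 250.0 * math.tan(math.radians(theta_deg) / 2)
        print(f"f={f}mm: s = {V / W_img:.3f} mm/pixel")
    # ~0.129 mm/pixel for the 16mm lens, ~0.082 mm/pixel for the 25mm lens.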

At the upper bound, larger tubes need a wider field of view of the camera. Hence, a larger region is mapped onto the same image sensor, so one pixel represents more. For a 200mm measuring area, the pixel representation is about 0.25mm. The required field of view can be achieved by placing the camera further away from the object. The distance increases with the focal length of the lens. Table 3.3 shows the approximate working distances for the compared lenses that are needed to achieve a certain field of view. Distances that fall below the minimum object distance are marked with a '•'.

It turns out that a 16mm focal length lens is the best choice for tube lengths between 50 and 100mm, since this lens maps the required measuring areas onto the image plane at the smallest working distance. However, tubes below 50mm cannot be inspected with higher precision with this lens. In this case, a 25mm focal length lens has to be selected. This lens is the best compromise for small and large tube sizes. It has the drawback of a large working distance of up to 780mm for 100mm tubes. Both a 16mm (PENTAX C1614-M) and a 25mm (PENTAX C2514-M) fix-focal lens have been used in the experiments.

3.3. Illumination

As introduced in Section 2.2, the right choice of illumination is essential in machine vision applications. Accurate length measurement of heat shrink tubes requires a sharp contrast at the tube's outline, especially at the boundaries that serve as measuring points. Any shadows that would enlarge the tube's dimensions in the 2D image projection



Figure 3.7: Heat shrink tubes under different front lighting setups. (a) Illumination by two desktop halogen lamps. Specular reflections at the tube boundaries complicate an accurate detection. (b) Varying the angle and distance of the light sources as in (a) can reduce reflections. (c) Professional front lighting setup with two line lights at both tube ends. (d) Resulting image of the setup in (c). In both (b) and (d), shadows cannot be eliminated completely. (Images (c) and (d) by Polytec GmbH, Waldbronn, Germany)
completely. (Images (c) and (d) by Polytec GmbH, Waldbronn, Germany)



Figure 3.8: Back lighting through different types of conveyor belts. The structure of the belt determines the amount of light entering the camera, thus influencing the image quality significantly.

must be avoided. In addition, the illumination setup should cover both black and transparent tubes; the transparent tubes are translucent while the black ones are not. The surface of both materials appears matte under diffuse illumination, but shows specular reflections if illuminated directly with point light sources.

In a first experiment, a front lighting setup with standard desktop halogen lamps was tested. Two light sources were placed at a low angle to illuminate the tube boundaries from two sides at the measuring area inside the guide bars. The results are shown in Figures 3.7(a) and 3.7(b). This setup yielded good results with black heat shrink tubes, but it turned out to produce unacceptable reflections right at the measuring points with the transparent ones. Such reflections could be reduced by changing the angle of light incidence, but the results remained strongly non-uniform. Although the halogen lamps are operated at DC power, the AC/DC conversion of off-the-shelf desktop lamps is often not stabilized, leading to temporal and spatial variations in image intensity and color. This effect has been observed throughout the experiments with the desktop lamps at video frame rates of 50fps.

Using a professional, flicker-free front lighting system with two fiber optic line lights illuminating the tube ends (see Figure 3.7(c)), the image quality could be increased, as can be seen in Figure 3.7(d). However, a few shadows still remain.

Experiments with a back light setup have been carried out as well. A calibrated fiber optic area light is placed at a certain distance (about 1-2cm) below the conveyor belt. The light has to shine through the belt, thus it is important to use a material that is translucent. A typical belt core consists of a canvas (e.g. cotton) and a rubber coating, where the thickness, structure and density of the canvas as well as the color of the rubber determine how much light can enter the camera. In the optimal case, no light at all would be absorbed by the belt, which is technically hardly possible.

Five different belt types have been tested. Some of the results can be seen in Figure 3.8. Each sample in this experiment consists of a transparent rubber coating and a white canvas base. The structure of the belt canvas is visible in each image as a background pattern. Obviously, the background should not influence the detection of the tube's boundary. Thus, the goal is to find a belt type that allows for back lighting without adding too much unwanted information to the image that could complicate the measurements.

In Figure 3.8(a), the coarse texture of the background significantly affects the tube ends of the transparent tube at the bottom. A sharp boundary is missing, making accurate and reliable measurements impossible. The belt type in Figure 3.8(b) has a finer texture but transmits only a small amount of light. Figure 3.8(c) shows the belt type that yielded



Figure 3.9: Polarized back lighting. (a) Image of diffuse back light through polarized glasses used for viewing 3D stereo projections, with no filter in front of the camera. (b) Setup as in (a) with an opposing polarization filter in front of the camera. Almost no light enters the camera at the polarized area. (c) Transparent heat shrink tube under polarized back light. There is a strong contrast at the polarized area, while it is impossible to locate the tube's boundaries at the unpolarized area (bottom right). (d) Polarized back light through a conveyor belt. The polarization is changed both by the belt and the tube, leading to a poor contrast. (e) For comparison: back light setup without polarization.

best results in terms of both background texture and transmittance. As can be seen, there are no shadows at the tube boundaries.

Since the black tubes do not let any light pass, the contrast between background and tube is excellent with all belt types tested. One advantage of black tubes follows from this property: the printing on the tube's surface is not visible in the image. The transparent tubes, on the other hand, do transmit the light coming from below. Positions covered by the printing show a lower transmittance, hence the printing is visible in terms of darker intensity values in the image.

As introduced in Section 2.2.3, polarized back lighting can be used to emphasize transparent, translucent objects. In an experiment, shown in Figure 3.9, the integration of polarization filters has been tested. Two polarized glasses originally intended for viewing 3D stereo projections have been employed to polarize the light coming from the area back light. First, the principle is tested without a conveyor belt. Two opposing polarization filters are placed between light source and camera. As can be seen in Figure 3.9(b), the area covered by the two polarization filters at right angles appears black in the image, while the areas without polarization filters are ideally white. A transparent tube between the two filters changes the polarization, thus making it possible for light to enter the camera at locations that were black before. There is an almost binary contrast between object and background (see Figure 3.9(c)). At regions that are not affected by the filters, there is no contrast at all, making the tube invisible. Unfortunately, these good results have no practical relevance, since in the real application the light has to pass through the conveyor belt, too. If the belt is placed between the first polarization filter and the object, it also changes the polarization at regions that belong to the background (see Figure 3.9(d)). The binary segmentation is lost and the structure of the conveyor belt is visible again. Since it is not possible to install the first polarization filter between conveyor and tube, the polarized back light approach has no advantages over the unpolarized one in this



Figure 3.10: (a) Installation of the back light panel. The measuring area is illuminated from below through a translucent conveyor belt. A diffuser is used to yield more uniform light and to protect the fiber optic light panel. (b) SCHOTT PANELight Backlight A23000 used for illumination (Source: SCHOTT).

application. On the contrary, it has the effect of less light entering the camera, which yields darker images and increases the amount of sensor noise.

As a result of the experiments with different lighting techniques, the back lighting setup has been chosen for the prototype. It offers excellent properties for black tubes and also yielded very good results for the transparent tubes in connection with a finely structured, translucent conveyor belt. The incident lighting did not perform better in the experiments.

A light source (SCHOTT DCR III) with a DDL halogen lamp (150W, 20V) has been selected in combination with the fiber optic area light (SCHOTT PANELight Backlight A23000) (see Figure 3.10(b)). The panel size is 102 × 152mm. It is installed 20mm below a cut-out in the conveyor below the belt, as can be seen in Figure 3.10(a). A diffuser between light panel and conveyor belt provides uniform illumination and protects the light area against dirt. More details regarding the illumination hardware can be found in Appendix B.2.

Using a fiber optic area light below the conveyor belt has the advantage of very low heat development, since the light source can be placed outside at a certain distance. With respect to the characteristics of heat shrink tubes, the avoidance of heat is essential at this step to prevent deformations. The light is transmitted through a flexible tube of fibers. If the lamp fails, it can be exchanged easily without changing anything at the conveyor. The lifetime of one halogen lamp is about 500 hours at maximum brightness.

To eliminate the influence of light sources other than the back light, the whole measuring area including the camera is darkened. This guarantees constant illumination conditions. For the prototype, a wooden rack has been constructed that is placed around the measuring area on the conveyor. A thick black, non-translucent fabric can be spanned around the rack, leaving only two openings where the tubes enter and leave



Figure 3.11: Air pressure is used to sort out tubes that do not meet the tolerances. The blow out unit, consisting of an air blow nozzle, a light barrier and a controller (not visible in the image), is placed at a certain distance behind the measuring area.

the darkened area. For industrial use, this interim solution has to be replaced by a more robust and compact (metal) case that excludes environmental illumination and additionally protects the whole measuring system against other outside influences. A slight overpressure inside the closed case or an air filtering system could be integrated to prevent dust particles from entering the case through the required openings. Any accumulation of dust or other dirt on the lens is critical and must be prevented.

3.4. Blow Out Mechanism

After a tube has passed the measuring area, the measured length is evaluated with respect to the given target length and tolerance. The result is a binary good/bad decision for each particular tube. Good tubes are allowed to pass the blow out unit, which is placed at a certain distance behind the measuring area. Tubes that do not meet the tolerances, on the other hand, have to be sorted out. This is done by air pressure: an air blow nozzle is arranged to blow tubes off the conveyor. For this purpose, the guide bars have to end behind the measuring area. The whole blow out setup can be seen in Figure 3.11.

The visual inspection system sends the good/bad decision over an RS-232 connection (serial interface) to a controller unit, in the form of a certain character followed by a carriage return ('\r'). The protocol used can be seen in Table 3.4. Once the controller receives an A or B, this message is stored in a first-in-first-out (FIFO) buffer.

Message    Code
TUBE GOOD  'A\r'
TUBE BAD   'B\r'
RESET      'Z\r'

Table 3.4: Protocol used for communication between the inspection system and the blow out controller.



A light barrier is used to send a signal to the controller when a tube is placed in front of the air blow nozzle. If the first entry in the FIFO buffer contains a B, the tube has to be blown out and the air blow nozzle is activated. On the other hand, if the first entry contains an A, the tube can pass. In both cases, the first entry in the buffer is deleted. The advantage of this approach is that the current conveyor velocity does not have to be known to compute the time a tube needs to move from a point x in the measuring area to the position of the air blow nozzle. The light barrier guarantees that the blow out is activated when the tube is exactly at the intended position.
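The controller behavior described above can be summarized in a short simulation sketch (hypothetical host-side Python, not the actual controller firmware; the reset behavior on 'Z' is an assumption):

    from collections import deque

    # Decisions arrive over RS-232 as 'A\r' (good), 'B\r' (bad) or
    # 'Z\r' (reset), see Table 3.4, and are buffered in arrival order.
    fifo = deque()

    def on_serial_message(msg: str) -> None:
        code = msg.rstrip("\r")
        if code in ("A", "B"):
            fifo.append(code)
        elif code == "Z":
            fifo.clear()  # assumed: reset discards pending decisions

    def on_light_barrier() -> bool:
        """Called when a tube reaches the nozzle; True means blow out."""
        if not fifo:
            return False           # no pending decision for this tube
        return fifo.popleft() == "B"

    # Example: two tubes measured, the second one out of tolerance.
    on_serial_message("A\r")
    on_serial_message("B\r")
    print(on_light_barrier())  # False -> first tube passes
    print(on_light_barrier())  # True  -> second tube is blown out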




4. Length Measurement Approach

While the previous chapter focused on the hardware setup, this chapter presents the methodical part of the system. After a brief overview, the different steps are introduced, including camera calibration and the teach-in step as well as tube localization, measuring point detection, tube tracking, and the good/bad classification. All assumptions and the model knowledge used throughout these steps are presented first.

4.1. System Overview

The fundamental concept of the developed system is a so-called multi-image measuring strategy. This means the goal is to measure each tube not only once, but in as many images as possible while it is in the visual field of the camera. The advantage of this approach is that the decision whether a particular tube meets the length tolerances can be made based on a set of measurements. The total length is computed by averaging over these single measurements, leading to more robust results. Furthermore, the system is less sensitive to detection errors. Depending on the conveyor velocity and the tube length, between 2 and 10 measurements per tube can be reached.
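A minimal sketch of this averaging and decision rule in Python (function and variable names are hypothetical; the rest of the pipeline is omitted):

    from statistics import mean

    def classify_tube(measurements: list[float],
                      target: float, tolerance: float) -> bool:
        """Average all single-image measurements of one tube and
        compare the result against the target length and tolerance."""
        total_length = mean(measurements)
        return abs(total_length - target) <= tolerance

    # Example: five single-image measurements of one 50 mm tube.
    print(classify_tube([49.9, 50.1, 50.0, 49.8, 50.2], 50.0, 0.5))  # True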

The system is designed to work without any external trigger that prompts the camera to grab a frame on a certain event, e.g. a tube passing a light barrier. Instead, the camera is operated in continuous mode, i.e. images are captured at a constant frame rate using an internal trigger. The absence of an external trigger, however, requires fast algorithms to evaluate whether a frame is useful, i.e. whether a measurement is possible. In addition, the system must be able to track a tube while it is in the visual field of the camera in order to assign measurements to this particular tube. Accurate length measurements of tubes require very accurate detection of the tube edges. A template based tube edge localization method has been developed, allowing for reliable, subpixel accurate detection results even in the presence of tube-edge-like background clutter. Once there is evidence that a tube has left the visual field of the camera, all corresponding measurements have to be evaluated with respect to the given target length and tolerances. The resulting good/bad decision must be delegated to the external controller handling the air pressure based blow out mechanism. Model knowledge regarding the inspected tubes under the constrained conditions is exploited where possible to optimize the processing.

Before any measurements can be performed, the system has to be calibrated and trained for the particular target length. This includes camera positioning, radial distortion compensation, and an online teach-in step.

Figure 4.1 gives an overview of the different stages of the system. It can also be seen as an outline of this chapter. The underlying methods and concepts will be introduced in more detail in the following.

Throughout this chapter, all parameters are handled abstractly. The corresponding value assignments used in the experiments are given in Section 5.1.1.




[Figure 4.1 flowchart: Camera calibration → Teach-In → Next image → Tube localization → Measurement possible? (No → Next image) → Measuring point detection → Length measuring → Tube passed? (No → Next image) → Total length computation → Good/bad classification → Blow out control]

Figure 4.1: System overview. After camera calibration and a teach-in step, the system evaluates the acquired images continuously. If a tube is located and assessed as measurable, the exact measuring points on the tube edges are detected and the tube length is calculated. Once a tube has passed the visual field of the camera, the computed total length is compared to the allowed tolerances for a good/bad classification. Finally, the blow out controller is notified whether the current tube is allowed to pass.
notified whether the current tube is allowed to pass.



(a) ‘empty’ (b) ‘entering’ (c) ‘leaving’ (d) ‘centered’ (e) ‘entering + centered’ (f) ‘entering + centered + leaving’ (g) ‘centered + leaving’ (h) ‘entering + leaving’ (i) ‘full’

Figure 4.2: Potential image states. Each image can be categorized into one of these nine states. States that contain one tube completely, with a clear spacing to neighboring tubes, can be used for length measuring, i.e. states (d), (e), (f) and (g). The remaining states do not allow for a measurement and can thus be skipped. State (i) might be due to a too small field of view of the camera (i.e. tubes are too large) or to a failure in separation (i.e. the spacing between two or more tubes is missing). If this state is detected, a warning must be issued.

4.2. Model Knowledge and Assumptions

The visual length measurement of heat shrink tubes proposed throughout this chapter is based on several assumptions and on model knowledge regarding the inspected objects, which are introduced in the following.

4.2.1. Camera Orientation

As introduced in Section 3.2.2, the camera is placed above the conveyor. It must be adjusted to fulfill the following criteria:

- The optical ray is perpendicular to the conveyor
- The image plane is parallel to the conveyor

This camera view is commonly denoted as fronto-parallel view [30]. If the image plane is parallel to the conveyor, the average scene depth is quite small. Therefore, it is possible to approximate the perspective projection with a weak-perspective camera model. In this model (see Section 2.1.3), objects are projected onto the image plane up to a constant magnification factor. This means distances between two points lying in the same plane are preserved in the image plane up to a constant scale factor. This property is important to allow for affine distance measurements in a fronto-parallel image view.

4.2.2. Image Content

The following assumptions concern the image content and the capture properties of the camera:



(a) Ideal tube model (b) Perspective tube model

Figure 4.3: (a) In the ideal model, the (parallel) projection of a 3D tube corresponds to a rectangle in the image. The distance d between the left and right edge is equal at each height. Under a perspective camera, objects closer to the camera appear larger in the image. Hence, the distance d1, belonging to the points on the tube edges that are closest to the camera, is larger than d2, and d2 is larger than d3 (the distance of the edge points that are farthest away). Note that the dashed lines are not visible in the image under back light, and the tube edges appear convex.

- Only one tube is visible completely (with left and right end) in each image at one time
- There is a clear spacing between two consecutive tubes
- The guide bars cover the upper and lower border of each image
- The guide bars are parallel and in horizontal direction
- The moving direction is from left to right
- The mean intensity of the background (conveyor belt) is brighter than the foreground (heat shrink tubes)
- There is a sufficient contrast between background and objects
- The video capture rate is fast enough to take at least one usable image of each tube segment so that a length measurement can be performed (potentially the production speed has to be reduced to satisfy this constraint)
- The image is not distorted, i.e. straight lines in the world are imaged as straight lines and parallel lines are also parallel in the image

In this application, the variety of image situations to be observed is highly limited and constrained by the physical setup (see Chapter 3). Thus, it is possible to reduce the number of potential situations to nine defined states. Each image can be categorized into exactly one of these states, as shown in Figure 4.2 by means of synthetic representatives. Only four of the nine states are measurable, i.e. states (d), (e), (f) and (g). In these states, a tube is completely in the image.
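The nine states and the measurability rule could be encoded as follows (an illustrative Python sketch; the state names are taken from Figure 4.2, and the classification of a frame into a state is omitted):

    from enum import Enum

    class ImageState(Enum):
        EMPTY = "empty"
        ENTERING = "entering"
        LEAVING = "leaving"
        CENTERED = "centered"
        ENTERING_CENTERED = "entering + centered"
        ENTERING_CENTERED_LEAVING = "entering + centered + leaving"
        CENTERED_LEAVING = "centered + leaving"
        ENTERING_LEAVING = "entering + leaving"
        FULL = "full"

    # Only the four states that contain one complete tube allow a measurement.
    MEASURABLE = {
        ImageState.CENTERED,
        ImageState.ENTERING_CENTERED,
        ImageState.ENTERING_CENTERED_LEAVING,
        ImageState.CENTERED_LEAVING,
    }

    def process(state: ImageState) -> None:
        if state is ImageState.FULL:
            print("warning: tubes too large or separation failed")
        elif state in MEASURABLE:
            print("measure tube")
        else:
            print("skip frame")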

4.2.3. Tubes Under Perspective

Under ideal conditions, i.e. with a parallel projection, a tube on the conveyor is represented by a rectangle in the image plane with the camera setup used (see Figure 4.3(a)). Due



Figure 4.4: The plane parallel to the conveyor plane ΠC that goes through the measuring points PL and PR is denoted as the measuring plane ΠM. The red line in ΠM between PL and PR corresponds to the measured distance d1 in Figure 4.3(b), i.e. the distance between the outermost points of the projected tube edge in an image.

to the guide bars, this rectangle is oriented parallel to the x-axis in the horizontal direction and parallel to the y-axis in the vertical direction. The length can be measured between the left and right edge of the tube in the horizontal direction. The horizontal distance d between the left and right tube boundary is equal, independent of the height. This is an ideal property for length measurements.

However, if the camera is not equipped with a telecentric lens or is not placed at infinity, the tube's projection is influenced by perspective. In general, objects that are closer to the camera are imaged larger than objects further away. Thus, the left and right tube edges do not appear straight in the image, but curved in a convex fashion, due to the different distances between a point on the tube's surface and the camera. Figure 4.3(b) visualizes a synthetic tube under perspective. The distance d1 between the two edge points closest to the camera is larger than the distances between points farther away. Accordingly, d2 is larger than d3, although in the real world d1 = d2 = d3 (assuming the tube is not cut skew). The perspective curvature increases with the distance to the image center. Thus, the maximum curvature is reached at the image boundaries, while an edge that lies directly below the optical center of the camera (approximately the image center) appears straight.

With the constraints regarding the image content, it is not possible to look inside a tube from the camera view if the tube is completely in the image. Therefore, one can assume that the outermost point of the tube edge always corresponds to the point that is closest to the camera, i.e. measuring between these two points always corresponds to the same distance in the world.

In the following, PL and PR denote the points on the left and right side, respectively, that are closest to the camera. The tube length in the real world is defined as the length of the line connecting these two points (corresponding to d1 in Figure 4.3(b)). Assuming a tube has the same height on the left and right side, PL and PR lie in the same plane, denoted as the measuring plane ΠM. This plane is assumed to lie parallel to the image plane, as can be seen in Figure 4.4. The measuring points have two correspondences in the image, denoted as pL = (xpL, ypL)ᵀ and pR = (xpR, ypR)ᵀ. The distance between pL and pR in the image can be related to the real world length up to a certain scale factor.



However, this scale factor may differ depending on the image position. Due to perspective, it is expected that the distance between pL and pR will be slightly shorter at the image boundaries and maximal at the image center.

4.2.4. Edge Model

The tube edges are modeled as ramp edges as introduced in Section 2.3.1, since this model describes the real data most adequately, both for transparent and black tubes. The slope of the ramp determines the sharpness of an edge: the steeper the rise (or fall), the sharper the edge. Obviously, the edge position can be located much more precisely if the ramp has only a minimal spatial extension.

As mentioned in the technical background section, there are several factors that can cause ramp edges, including the discrete pixel grid, the camera focus, and motion blur. The first factor can be reduced by using a high-resolution camera (keeping in mind the trade-off between resolution and speed as discussed in Section 3.2.1). The camera focus depends mainly on the depth of an object. In this application, the depth of an object does not change over time, since all tubes in a row have the same diameter and are lying on the planar conveyor belt, which is parallel to the image plane. In the following, it is assumed that the camera and the optical system are adjusted such that a tube is imaged as sharply as possible. Motion is another common factor influencing the appearance of an edge. Since the tubes are inspected in motion (up to 40m/min), a short shutter time (exposure time) of the camera is required. If the shutter time is too long, light rays from one point on the tube contribute to the integrated intensity values of several sensor elements along the moving direction. Especially the left and right tube boundaries considered for measuring are affected by motion blur, as they lie in the moving direction.

Therefore, it is assumed that the shutter of the camera is set to a very short exposure time to suppress motion blur as much as possible. A short shutter time requires a large amount of light to enter the camera at one time. The iris of the optical system has to be wide open (corresponding to a small F-number) and the illumination must be sufficiently bright.
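A back-of-the-envelope sketch of why a short exposure is needed (conveyor speed and pixel scale are values given in this thesis; the exposure times are example values, not the system's settings):

    # Motion blur in pixels as a function of exposure time.
    v = 40_000 / 60   # conveyor speed: 40 m/min = 666.7 mm/s
    s = 0.12          # real-world size of one pixel in mm (16mm lens, 250mm)

    for exposure_ms in (1.0, 0.5, 0.1):
        blur_mm = v * exposure_ms / 1000.0  # distance traveled during exposure
        print(f"{exposure_ms} ms exposure -> {blur_mm / s:.1f} pixels of blur")
    # 1.0 ms -> 5.6 px, 0.5 ms -> 2.8 px, 0.1 ms -> 0.6 px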

4.2.5. Translucency

Translucency is the main property distinguishing transparent from black tubes. Black tubes do not transmit light, leading to one uniform black region with strong edges in the image under back light. In this case, the local edge contrast at a certain position depends only on the background. Transparent tubes, on the other hand, transmit light. However, some part of the light is also absorbed or reflected in directions that do not reach the camera. Therefore, a tube appears darker in the image than the background. It appears even darker at positions where the light has to pass through more material. This leads to two characteristic dark horizontal stripes at the top and bottom of a transparent tube, as can be seen in Figure 4.5. This model knowledge has been exploited to define a robust feature for edge localization which can still be detected in situations where the contrast at the center of the edge is poor.

The printing on the tubes also reduces the translucency and is therefore visible on transparent tubes in the image. On average, it covers about 8% of a tube's surface along the perimeter for 6, 8, and 12mm diameter tubes.



Figure 4.5: The image intensity of transparent tubes is not uniform as it is for black tubes. Depending on how much light can pass through a tube, regions appear darker or brighter. One characteristic of transparent tubes under back light are the two dark horizontal stripes at the top and the bottom of a tube, indicated by the arrows. The printing also reduces the translucency and thus appears darker in the image.

4.2.6. Tube Orientation

The tube orientation is highly constrained by the guide bars as introduced in Section 3.1. Thus, an approximately horizontal orientation can be assumed throughout the design of the inspection algorithms.

In practice, the distance between the guide bars is slightly larger than the outer diameter of a tube to prevent a blockage, since tubes may not be ideally round. This means the cross-section of a tube can be elliptical instead of circular. Let dGB denote the vertical distance between the guide bars, and hmax the maximum expected tube extension in the vertical direction with respect to the image projection. The remaining spacing can then be expressed as dspace = dGB − hmax, as can be seen in Figure 4.6(a).

The maximum possible rotation is reached when the tube touches both guide bars at two points (see Figure 4.6(b)). The maximum angle of rotation θmax can be defined as the angle between the longitudinal axis of the tube and the x-axis. One can define an unrotated version of the tube with the longitudinal axis parallel to the x-axis and shifted so that the two axes intersect at the center of gravity of the rotated tube. In Figure 4.6(b), this virtual tube is visualized as a dashed rectangle. The distances between the measuring points of the rotated and the ideal horizontal tube are also shown in the figure and are denoted as dL and dR for the left and right tube side respectively. Both dL and dR are ≤ dspace/2. If the tube is not bent, dL = dR. The maximum error between the ideal distance l and the rotated distance l′ can be estimated as follows:

    errθ = l′ − l = √(l² + dspace²) − l    (4.1)

For example, in a typical setup for 50mm tubes of 8mm diameter, one tube has a length of approximately 415 pixels and dspace = 15 pixels. This leads to an error of errθ = 0.27 pixels. Thus, with one pixel representing 0.12mm in the measuring plane, the maximum error due to orientation would be about 0.03mm, which is acceptable. On average this error will be even smaller. Based on this estimate, the orientation error is neglected in the following, i.e. all tubes are assumed to be oriented ideally horizontally.
are assumed to be oriented ideally horizontal.



(a) (b)

Figure 4.6: (a) The guide bar distance dGB and the maximum extension of a tube in the vertical direction hmax define the maximum space dspace between a tube and the guide bars at ideal horizontal orientation. (b) The maximum possible tube orientation is limited by the guide bars. The angle θ between the longitudinal axis of the tube and the ideal measuring distance parallel to the x-axis determines the maximum distance the measuring point can be displaced by rotation (dL = dR if the tube is not bent). This distance is ≤ dspace/2 and can be used to estimate the error due to rotation between the ideal tube length l and the rotated distance l′.



4.2.7. Background Pattern

As introduced in Section 3.3, the measuring area is illuminated by a back light setup below the conveyor belt. This setup emphasizes the structure of the belt, which appears as a characteristic pattern in the image. This pattern may differ between belt types. Depending on the light intensity, it is possible to eliminate the background completely. If the light source is bright enough, the background appears uniformly white even with a short shutter time. For black tubes, such an overexposed image would lead to an almost binary image. Transparent tubes, however, also disappear under too bright an illumination. Hence, in practice there will always be a certain amount of background structure visible in the image. The strength of the background pattern increases with lower light intensity.

In the following, it is generally assumed that the illumination is adjusted to allow for distinguishing between a tube edge and edges in the background. Larger amounts of dirt or particles other than heat shrink tubes on the conveyor must be prevented.

4.3. Camera Calibration

In the previous section, several assumptions regarding the camera position and the image content have been presented. With respect to accurate measurements, it is important that an object is imaged as faithfully as possible: straight lines should appear straight and not curved in the image, parallelism should be preserved, and objects of the same size should be mapped to the same size in the image. Unfortunately, the latter properties do not hold in the perspective camera model as introduced before. However, under certain constraints it is possible to minimize the perspective effects.

If the internal camera parameters, including the radial and tangential distortion coefficients, are known, it is possible to compute an undistorted version of an image. After undistorting, straight lines in the world will appear as straight lines in the image. Furthermore, if one can arrange the camera such that objects of equal size are projected onto the same size in the image within the camera's field of view at a constant depth, one can assume that the image plane is approximately parallel to the conveyor.

In the following, the calibration method used to obtain the intrinsic camera parameters is presented, as well as a method to arrange the camera such that perspective effects are minimized.

4.3.1. Compensating Radial Distortion

To compensate for the radial distortion of an optical system, one needs to compute the intrinsic camera parameters. Since the intrinsic parameters can be assumed constant if the focal length is not changed, the calibration procedure does not have to be repeated every time the system is started and can therefore be precomputed offline.

The widely used Camera Calibration Toolbox for Matlab by Jean-Yves Bouguet [9] is used for this purpose. It is closely related to the calibration methods proposed in [74] and [31]. The calibration pattern required by this method is a planar chessboard of known grid size. The calibration procedure has to be performed for each lens separately. The camera is placed at a working distance of approximately 250mm over the measuring area with a 16mm fix-focal lens. It is adjusted to bring tubes with a diameter of 8mm at this distance into focus (in the measuring plane ΠM).
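The thesis itself uses the Matlab toolbox; purely for illustration, an equivalent chessboard calibration can be sketched in Python with OpenCV (file names are hypothetical, and treating the 21 × 10 grid as inner corners is an assumption):

    import glob
    import cv2
    import numpy as np

    pattern = (21, 10)   # assumed: inner corners per row/column
    square = 2.5         # grid size in mm

    # 3D reference points of the chessboard corners in the board frame.
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_pts, img_pts = [], []
    for fname in glob.glob("calib_*.png"):  # hypothetical image names
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        ok, corners = cv2.findChessboardCorners(gray, pattern)
        if ok:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_pts.append(objp)
            img_pts.append(corners)

    # Intrinsic matrix K and distortion coefficients (radial + tangential).
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_pts, img_pts, gray.shape[::-1], None, None)
    undistorted = cv2.undistort(cv2.imread("tube.png"), K, dist)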



Figure 4.7: 16 sample images used for calibrating the intrinsic camera parameters.

16 images of a 21 × 10 chessboard with 2.5mm grid size, at different spatial orientations around the measuring plane ΠM, have been acquired. A selection of these images can be seen in Figure 4.7.

In each image, the outer grid corners have to be selected by hand. The remaining corners are then extracted automatically at subpixel accuracy, as can be seen in Figure 4.8. The coordinate axes of the world reference frame are also visualized. The Z axis is perpendicular to the chessboard plane, pointing toward the camera.

The result of this calibration procedure is the set of intrinsic camera parameters, including the radial distortion coefficients. The Camera Calibration Toolbox for Matlab also allows for the visualization of the extrinsic location of each of the 16 calibration patterns with respect to the camera, as shown in Figure 4.9. The actual working distance of approximately 250mm is reconstructed very well. The resulting radial distortion model can be found in Figure 4.10. In Section 3.2, the area of interest function of the camera has been introduced, since the whole image height is not needed. Obviously, the goal is to select the location of this area with respect to minimum distortion. The position of the AOI within a full size image is visualized by the red lines, i.e. only pixels between these lines are considered.

4.3.2. Fronto-Orthogonal View Generation

Once distortion effects have been compensated, the goal is to obtain a view of the measuring area in which the world plane, i.e. the conveyor belt, is parallel to the image plane. There are two main strategies that can be applied.

In the first strategy, the camera is positioned only roughly. Afterwards, the perspective image is warped to yield an optimal synthetic fronto-orthogonal view of the scene. In the second strategy, the camera is adjusted as precisely as possible so that the resulting image is approximately fronto-orthogonal and does not need any correction.



Figure 4.8: Extracted grid corners at subpixel accuracy. The upper right corner is defined as the origin O of the world reference frame. The directions of the X and Y axes are also visualized, while the Z axis is perpendicular to the chessboard plane, pointing toward the camera.

[Figure 4.9 plot: "Extrinsic parameters (camera centered)" — 3D visualization of the 16 calibration board poses in the camera frame Oc, Xc, Yc, Zc]

Figure 4.9: Reconstructed extrinsic location of each calibration pattern relative to the camera. The working distance of approximately 250mm is recovered very well.



Figure 4.10: Visualization of the resulting radial distortion model. The computed center of distortion, indicated by the '◦', is slightly displaced from the optical center ('×'). The image area of interest considered in this application lies in between the red lines.

Perspective Warping One possibility to compute a synthetic fronto-orthogonal view of an image is based on the extrinsic relationship between the camera plane and a particular world plane (e.g. the conveyor plane), which can be extracted in a calibration step. With the extrinsic parameters it is possible to describe the position and orientation of the world plane in the camera reference frame. Finally, one can compute a transformation that maps the world plane into a plane parallel to the image plane, or vice versa, and warp the image to a synthetic fronto-orthogonal view. This approach has significant drawbacks. First of all, the accuracy of the results is closely tied to the calibration accuracy. Furthermore, the extrinsic parameters of a camera change if the camera is moved even slightly, in contrast to the intrinsic parameters, which can be assumed constant as long as the focus is not changed. Thus, one has to recalibrate the extrinsic parameters as well as the transformation parameters every time the camera is moved, which is not practicable in this particular application.

There are other methods that can be used to compute a fronto-orthogonal view of a perspective image, which are based on characteristic image features such as parallel or orthogonal lines, angles, or point correspondences, and do not need any knowledge of the interior or exterior camera parameters [30]. One common approach is based on point correspondences of at least 4 points x_i and x'_i with x'_i = H x_i (1 ≤ i ≤ 4) and

H = \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix}    (4.2)

the projective transformation matrix representing the 2D homography.

The unknown parameters of H can be computed in terms of the vector cross product x'_i × H x_i = 0 using a Direct Linear Transformation (DLT) [30]. To correct the perspective of an image one has to find four points in the image that lie on the corners of a rectangle in the real world, but are perspectively distorted in the image. These points x_i have to be mapped to points x'_i that represent the corners of a rectangle in the image.


Then, after H is computed, each point in the image is transformed by H. Obviously, this is an expensive operation for larger images. Furthermore, in practice the question is where to place the calibration points. One possibility is to place them on top of the guide bars. The system could automatically detect the calibration points and check whether these points lie on a rectangle in the affine image space. This requires a very accurate positioning of the guide bars, and all marker points should be coplanar, i.e. lie in one plane. Assuming one can solve this mechanical problem, there is still another problem: depending on how the destination rectangle is defined, the warped image may be scaled. In any case, warping discrete image points requires interpolation, since transformed points may fall in between the discrete grid. Obviously, this can reduce the image quality.
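As an illustration, the DLT estimation of H from four point correspondences can be sketched as follows; this is a minimal Python sketch with assumed names, not the thesis implementation. Each correspondence x'_i = H x_i contributes two independent rows to a linear system A h = 0, which is solved via the singular value decomposition:

import numpy as np

def dlt_homography(src, dst):
    # src, dst: arrays of shape (4, 2) with corresponding image points
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the solution h is the right singular vector of A belonging to the
    # smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, p):
    # transform a single point with the estimated homography
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]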

Online Grid Calibration Although the previously described approach does not require an accurate positioning of the camera, it has several drawbacks, especially with respect to performance and image reliability. If there is a way to adjust the camera perfectly, warping and perspective correction are not needed at all. However, a human operator must be able to perform this positioning task within an acceptable amount of time.

Therefore, an interactive camera positioning method has been developed, denoted as Online Grid Calibration.

First, the distance of the parallel guide bars has to be adjusted to the current tube size. Then, a planar chessboard pattern of known size is placed between the guide bars on the conveyor, within the visual field of the camera. The horizontal lines on the chessboard must be parallel to the guide bars (see Figure 4.11). To simplify the adjustments, a mechanical device may be developed that can be placed in between the guide bars, combining the function of a spacer that brings the guide bars to the right distance with that of a calibration grid that perfectly fits into the space between the guide bars at the designated orientation.

The underlying idea is as follows: if the chessboard is imaged in such a way that vertical lines in the world are vertical in the image and horizontal lines appear horizontal, while each grid cell of the chessboard has the same size in the image, the camera is adjusted accurately enough to yield a fronto-orthogonal view.

The process of camera adjustment can be simplified if the operator gets real-time feedback on how close the current viewing position is to the optimal position. Therefore, the live images of the camera are overlaid with an optimal virtual grid of squares. This grid can be parametrized by two points, the upper left and the lower right corner, as well as the vertical and horizontal size of each grid cell. The operator can move the grid in horizontal and vertical direction and adjust its size. This is a good feature to initialize the grid or to perform the fine adjustments.

For each image, the correspondence between the overlaid virtual grid and the underlying image data is computed. A two-step method has been developed. First, the image gradient in both vertical and horizontal direction is extracted using the SOBELX and SOBELY operators. This information can be used to approximate the gradient magnitude and orientation (see Equations 2.21 and 2.24). Since there is a strong contrast between the black and white chessboard cells, the gradient magnitude at the edges is strong as well. If the virtual grid matches the current image data, the gradient orientation φ(p) on horizontal grid lines must ideally be π/2 or 3π/2, depending on whether an edge is a black-white or white-black transition. Recall that the gradient direction is always perpendicular to the edge.



Figure 4.11: Online Grid Calibration using a 5 × 5mm chessboard pattern. (a) Calibration image distorted by perspective. The goal in this calibration step is to adjust the camera in such a way that the chessboard pattern perfectly fits the overlaid grid as in (b).

Correspondingly, vertical grid lines have orientations of 0 or π. In practice, the gradient orientation is allowed to lie in a narrow range around the ideal orientation, since the computation of φ(p) is only an approximation that estimates the real orientation up to an epsilon (see Figure 4.12(b)). Thus, theoretically each position on the virtual grid must meet the orientation constraints. In addition, the gradient magnitude must exceed a certain threshold to prevent edges induced by noise from influencing the calibration procedure.

To reduce the computational load, only a selection of points on the grid, denoted as control points, is considered. The position of these points can be seen in Figure 4.12(a). The ratio of grid matches to the total number of control points can be seen as a score of correspondence. If the score exceeds a threshold, e.g. more than 95% of all checked positions on the virtual grid match the real image data, the second step of the calibration is started.
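A minimal sketch of this matching step (Python for illustration; function names, the tolerance and the magnitude threshold are assumptions, not values from the thesis):

import numpy as np
import cv2

def grid_match_score(img, control_pts, ideal_phi, tol=0.3, mag_thresh=50.0):
    # control_pts: (N, 2) integer pixel positions of the control points;
    # ideal_phi: (N,) ideal gradient orientations (0, pi/2, pi or 3*pi/2)
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0)   # SOBELX
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1)   # SOBELY
    mag = np.hypot(gx, gy)                  # gradient magnitude
    phi = np.arctan2(gy, gx) % (2 * np.pi)  # gradient orientation
    matches = 0
    for (x, y), p in zip(control_pts, ideal_phi):
        # angular difference folded into [0, pi]
        d = abs((phi[y, x] - p + np.pi) % (2 * np.pi) - np.pi)
        if mag[y, x] > mag_thresh and d < tol:
            matches += 1
    return matches / len(control_pts)       # score of correspondence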

The second step concentrates on the size of each grid cell. Assuming negligible perspective effects if the camera is perfectly positioned, all grid cells should have the same size in the image. To compute the size of each grid cell as accurately as possible, the real edge location of the grid is detected with subpixel precision within a local neighborhood of each control point on the virtual grid. To this end, the gradient magnitude of a 7 × 1 neighborhood perpendicular to the grid orientation at a given control point is interpolated using cubic splines. Then, the width and height of a grid cell can be determined via the affine distance between two opposed subpixel grid positions. Finally, the mean grid size and standard deviation can be computed for both width and height. The standard deviation is used as a measure of how closely the current camera viewing position matches a fronto-orthogonal view. Ideally, if all squares have equal size, the standard deviation is zero. In practice the standard deviation is always larger than zero, for example due to noise, edge localization errors, or a small remaining perspective error.



Figure 4.12: (a) Control points (marked as crosses) are used to adjust the virtual calibration grid of width w and height h to the underlying image data. (b) Gradient orientation φ at each control point. Since the computed values are only an approximation, a narrow range of orientations, indicated by the gray cones around the ideal orientation, is also counted as a match.

Experiments have shown that it is possible to adjust the camera within an acceptable time to yield 100% coverage in step one and a grid standard deviation of less than 0.3 pixels. In this case the camera is assumed to be adjusted well enough for accurate measurements.
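A minimal sketch of the subpixel edge localization used in the second step, assuming SciPy's cubic spline interpolation (the thesis does not specify the implementation, so this is an illustrative sketch):

import numpy as np
from scipy.interpolate import CubicSpline

def subpixel_edge(mag_profile):
    # mag_profile: 7 gradient magnitude samples across the edge,
    # taken perpendicular to the grid line at a control point.
    # Returns the edge offset in [-3, 3] relative to the center sample.
    x = np.arange(-3, 4, dtype=float)
    spline = CubicSpline(x, mag_profile)
    # evaluate densely and take the maximum of the interpolated magnitude
    xs = np.linspace(-3, 3, 601)
    return xs[np.argmax(spline(xs))]

The cell width or height then follows as the distance between two such opposed subpixel positions, from which mean and standard deviation are accumulated.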

4.4. Tube Localization

Since the system is intended to work without any external trigger (e.g. a light barrier) that gives a signal whenever a tube is completely in the visual field of the camera, the first step before further processing of a frame is to check whether there is a tube in the image that can be measured or not. If there is no tube in the image, or only part of one, this image can be neglected. This decision has to be very fast and reliable.

4.4.1. Gray Level Profile

To classify an image into one of the states proposed in Section 4.2.2, an analysis of the intensity profile along the x-axis is performed. Strong changes in intensity indicate potential boundaries between tubes and background.

In ideal images, as seen in Figure 4.2, the localization of object boundaries is almost trivial with standard edge detectors (see Section 2.3). In real image sequences, however, there are many changes in intensity of different origin that do not belong to the boundaries of a tube, e.g. caused by the background pattern (see Figure 3.8) or by dirt on the conveyor belt. Furthermore, the printing on transparent tubes, visible in the image under back light illumination, influences the intensity profile, as will be seen later on.

Figure 4.13: Sample images with 11 equally distributed vertical scan lines used for profile analysis within a certain region of interest: (a) transparent, 50mm length, ∅8mm; (b) black, 50mm length, ∅8mm. (c) and (d) show the resulting gray level profiles of images (a) and (b) respectively.

The intensity profile \hat{P}_y of an image row y can be formally defined as

\hat{P}_y(x) = I(x, y)    (4.3)

where I(x, y) denotes the gray level value of an image I at pixel position (x, y). Since a single scan line (e.g. \hat{P}_{h/2} with h the image height) is very sensitive to noise and local intensity variations, the localization of the tube boundaries based on the profile of a single row can be error-prone. Hence, a set of n parallel scan lines is considered. The mean profile P_n of all n lines is calculated by averaging the intensity values at each position:

P_n = \frac{1}{n} \sum_{i=1}^{n} \hat{P}_{y_i}    (4.4)

One property of the resulting profile P_n is the projection of a two-dimensional to a one-dimensional problem, which can be solved much faster (processing speed is a very important criterion at this step of the computation). Since the further processing steps with respect to P_n are independent of the number of scan lines n (n ≥ 1), P_n is simply denoted as P in the following. A more detailed view on the number of scan lines and the scan line distribution with respect to robustness and performance is given in Appendix A. In the following, N_scan denotes the number of scan lines used.
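As a minimal illustration of Equations 4.3 and 4.4 (Python used for illustration only; the names are assumptions):

import numpy as np

def mean_profile(img, scan_rows):
    # img: 2D gray level image; scan_rows: list of N_scan row indices y_i.
    # Each row img[y, :] is one profile P_hat_y; they are averaged
    # element-wise into the 1D mean profile P.
    return img[np.asarray(scan_rows), :].mean(axis=0)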

4.4.2. Profile Analysis

Step 1: The first step is smoothing the profile P by convolving it with a large 1D mean filter kernel of dimension K_smooth:

P_{smooth} = P \ast \underbrace{\left[ \tfrac{1}{K_{smooth}}\ \tfrac{1}{K_{smooth}}\ \ldots\ \tfrac{1}{K_{smooth}} \right]}_{K_{smooth}\ \text{times}}    (4.5)

The idea of this low pass filtering operation is to reduce the high-frequency components in the profile, in particular the structure of the background pattern.

Obviously, this step also blurs the tube edges, and therefore reduces the detection precision significantly. Bearing in mind the goal of the profile analysis, it is only intended to verify whether a measurement is possible in the current frame or not. In a subsequent step, the proper measurements have to be performed on the original image data, not on the profile. However, the knowledge gained in this first step does not have to be discarded and can instead be used to optimize the following steps. In other words, if it is possible to predict a tube's boundaries reliably, though not precisely, this information is then used to define a region of interest (ROI) as close as possible around the exact location.

Step 2: The next step is to detect strong changes in the profile. Large peaks in the first derivative of the profile indicate such changes and can be considered as candidates for tube boundaries. Therefore, a convolution with a symmetric 1D kernel approximating the first derivative of a Gaussian is performed:

P_{drv} = P_{smooth} \ast D_x    (4.6)

The odd symmetric 9 × 1 filter kernel D_x is given by the following filter taps, as proposed in [25] for the design of steerable filters:

tap    0     1       2      3      4
value  0.0   0.5806  0.302  0.048  0.0028

With this kernel a dark-bright edge results in a positive response, while a bright-dark edge leads to a negative response. The magnitude of the response is proportional to the contrast at the edge.

Assuming the potential tube boundaries have sufficient contrast, only the strongest peaks of P_drv are of interest for later processing. To simplify the task of peak detection, only the absolute values of the differentiated profile are taken into account. This is denoted as P^{+}_{drv}:

P^{+}_{drv} = |P_{drv}|    (4.7)

Note that the information on the sign of a peak in P_drv is still useful for later classification and does not have to be discarded.

Step 3: A thresholding is performed on P^{+}_{drv} to eliminate smaller peaks that correspond, for example, to changes in intensity due to the background pattern or dirt:

P_{thresh}(x) = \begin{cases} P^{+}_{drv}(x) & \text{if } P^{+}_{drv}(x) > \tau_{peak} \\ 0 & \text{otherwise} \end{cases}    (4.8)


The threshold τ_peak is calculated dynamically based on the mean of P^{+}_{drv}, denoted as \bar{P}^{+}_{drv}, with

\tau_{peak} = \alpha_{peak} \, \bar{P}^{+}_{drv}    (4.9)

The factor α_peak indirectly relates to the number of peaks left to be processed further. τ_peak is also denoted as the profile peak threshold. The goal is to remove as many peaks as possible that do not belong to a tube's boundary, without eliminating any relevant peak. If the images are almost uniform over larger regions, as for black tubes, there are only a few strong changes in intensity. Thus, \bar{P}^{+}_{drv} is expected to be quite low compared to max(P^{+}_{drv}), and the peaks belonging to the tube boundaries are preserved even for a larger α_peak. On the other hand, for transparent tubes the contrast between foreground and background is lower. Hence, the distance between intensity changes due to background clutter and those at the tube boundaries is much smaller. The choice of the right threshold is more critical in this situation, and α_peak has to be selected carefully. If it is too low, too many peaks will survive the thresholding. If it is too large, important peaks will be eliminated as well. The profile peak threshold is closely related to the detection sensitivity of the system, as will be discussed in more detail in later sections. More sophisticated calculations of τ_peak, considering the difference between maximum value and mean, or the median, did not perform better.

Step 4: The x-coordinates of the remaining peaks, defined as local maxima in P_thresh, are stored in a list denoted as candidate positions Ω, in ascending order. N_Ω indicates the number of elements in Ω, i.e. the number of potential tube boundaries in an image.
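The four steps can be summarized in a short sketch. The following Python fragment is a minimal illustration; K_smooth = 15 and α_peak = 3.0 are placeholder values, and the sign orientation of the derivative kernel (built from the taps above) is an assumption:

import numpy as np

# odd-symmetric 9-tap derivative kernel from taps [0.0, 0.5806, 0.302, 0.048, 0.0028]
TAPS = [0.0028, 0.048, 0.302, 0.5806, 0.0, -0.5806, -0.302, -0.048, -0.0028]

def candidate_positions(P, k_smooth=15, alpha_peak=3.0):
    # Step 1: mean filter of dimension k_smooth (Eq. 4.5)
    P_smooth = np.convolve(P, np.ones(k_smooth) / k_smooth, mode="same")
    # Step 2: first derivative of the smoothed profile (Eq. 4.6)
    P_drv = np.convolve(P_smooth, TAPS, mode="same")
    P_abs = np.abs(P_drv)                       # P+_drv (Eq. 4.7)
    # Step 3: dynamic profile peak threshold (Eqs. 4.8, 4.9)
    tau_peak = alpha_peak * P_abs.mean()
    P_thresh = np.where(P_abs > tau_peak, P_abs, 0.0)
    # Step 4: local maxima of the thresholded profile -> candidate list Omega
    omega = [x for x in range(1, len(P) - 1)
             if P_thresh[x] > 0
             and P_thresh[x] >= P_thresh[x - 1]
             and P_thresh[x] > P_thresh[x + 1]]
    return omega, P_smooth, P_drv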

4.4.3. Peak Evaluation

The process described in the previous section results in a number of candidate positions that have to be evaluated, since it is possible that there are more candidate positions than tube boundaries. This is due to the fact that the thresholding is parametrized to avoid the elimination of relevant positions. The actual number of tube boundaries, which indicates the current state as introduced in Section 4.2, is not known at this point and has to be extracted by applying model knowledge to the candidate positions.

Since only four of the nine possible states can be used for measuring, it is of interest to know whether the current image matches one of these four states. If this is the case, it is sufficient to localize the boundaries of the centered tube. Under the assumptions made in Section 4.2, only one tube can be completely in the visual field of the camera at a time. In the following, an approach is presented that reduces this problem to an iterative search for boundaries belonging to a single foreground object.

First, Ω is extended to Ω′ by two more x-positions: x = 0 at the front and x = x_max at the back of the list, where x_max is the largest possible x-coordinate in the profile. Then, any segment s(i), defined as the region between two consecutive positions Ω′(i) and Ω′(i + 1), can be assigned to one of the two classes in {BG, TUBE}, representing background and foreground respectively. In this way, the whole profile is partitioned into N_Ω + 1 segments if there are N_Ω peaks.



Global Threshold The classification into BG and TUBE is based on the general assumption that the mean intensity of objects is darker than that of the background. In more detail, taking the mean value \bar{P}_{smooth} of the smoothed profile as a global reference and calculating the local mean value for each segment s(i), the classification C_1 can be expressed as:

C_1(s) = \begin{cases} \text{TUBE} & \text{if } \operatorname{mean}(s(i)) \le \bar{P}_{smooth} \\ \text{BG} & \text{otherwise} \end{cases}    (4.10)

In image segmentation the mean value is widely used as an initial guess of a threshold separating two classes of data distinguishable via their gray level [48, 2]. There are many more sophisticated approaches for threshold selection, including histogram shape analysis [57, 63, 26], entropy [54], fuzzy sets [20, 14], and cluster-based approaches [55, 46]. The different techniques are summarized and compared in several surveys [59, 47, 60]. However, in this application the threshold is used for classification; it is not intended for the calculation of a binary image that segments the tubes from the background. Since processing time is strictly limited and critical in this application, it is essential to save computation time wherever possible. As introduced before, the actual segmentation is based on strong vertical edges in the profile, but does not include any semantic meaning of the segments. In the classification step, the mean turned out to be a reliable and fast choice to distinguish between foreground and background segments, both for black and transparent tubes, if there is a uniform and sufficient contrast between tubes and background over the whole image. In this case there is no need for a threshold other than the mean, saving additional operations.

Instead of comparing the global mean with the local mean, the local median can be used, which was observed to result in a more distinct measure for discrimination:

C_2(s) = \begin{cases} \text{TUBE} & \text{if } \operatorname{median}(s(i)) \le \bar{P}_{smooth} \\ \text{BG} & \text{otherwise} \end{cases}    (4.11)

The better performance of measure C_2 originates in the characteristic of the median of being less sensitive to outliers than the mean [32]. This is important since the input data can be very unsteady due to the background texture or printing visible on transparent tubes (independent of the additional camera noise level). As mentioned before, the smoothing of the profile in the first step also blurs the tube edges, causing the segment boundaries to be not totally precise. In this case, the local mean tends to move closer to the global mean, which does not necessarily imply a misclassification. The median, however, turned out to be more distinct in most cases. Figure 4.14 shows the smoothed profile of (a) a transparent and (b) a black tube. The examples represent the states entering + centered and entering + centered + leaving. The segment boundaries, which correspond to the locations of the strongest peaks in the first derivative of the profile, are visualized, as well as the global mean and the local median. Segments that have a median above the global mean are classified as background.

Regional Threshold One drawback of the global threshold approach is that different background segments are assumed to be almost equal in image brightness, i.e. the tube-background contrast is approximately uniform within one image. This assumption, however, does not hold if there are larger variations in background brightness (for example due to material properties or dirt on the belt). Such variations can occur between images,



Figure 4.14: Different steps and results of the profile analysis for (a) a transparent tube (state: entering + centered) and (b) a black tube (state: entering + centered + leaving). After smoothing the profile, strong peaks in the first derivative indicate potential tube boundaries. The segments between the strongest peaks are classified into foreground and background based on the difference between the local median of each segment and the global mean. The background is assumed to be brighter on average. Neighboring segments of the same class are merged. The crosses mark the correctly predicted boundaries of the centered tube. Note the stronger contrast of black tubes.



but also over the whole image width, or locally within a single image. The first case is uncritical as long as there is sufficient contrast between a tube and the background. The latter case, i.e. local variations in background brightness, can lead to failures of the global threshold. Figure 4.15(a) shows one characteristic situation which occurs quite often with transparent tubes. The background intensity on the left is much darker than on the right. The global threshold fails, since the much brighter background regions on the right increase the global mean. Thus, the local median of the leftmost segment falls below the threshold, and the segment is therefore classified as foreground. Due to this misclassification no measurement will be performed on this frame, although it would be possible.

A region based threshold can overcome this problem. The idea is to compute the classification threshold not globally, but based on regional image brightness. While the local median is computed for each segment, a good classification threshold must consider at least one transition between background and foreground. Following the assumptions made in Section 4.2, two tubes cannot be completely in the image at the same time. Furthermore, the number of connected background regions in the image cannot exceed two. If there are two connected background regions, one has to lie in the left half of the image while the other falls in the right half. Thus, one can define two regions, left and right of the image center, and compute the mean for each region analogously to the global mean. In the following, the means of the left and right side of the (smoothed) profile are denoted as \bar{P}_{left} and \bar{P}_{right} respectively.

If there is only one background region (states empty, entering, leaving, entering + leaving), splitting the image at the center has no negative effect. The left and right means are computed either over a tube and a background region, or over background only. In the very special case that the image width is exactly twice a tube's length and the tube enters (or leaves) the scene with its right (or left) boundary exactly on the image center, the regional threshold is computed only over the tube, and the classification may be either foreground or background. However, in both cases this situation can be detected as a state where a measurement is not possible, which is therefore a sufficient solution.

The region based classification of the segments can now be expressed as:

C_3(s) = \begin{cases} \text{TUBE} & \text{if } \operatorname{median}(s(i)) \le \tau_{region} \\ \text{BG} & \text{otherwise} \end{cases}    (4.12)

where τ_region is defined as follows:

\tau_{region} = \begin{cases} \bar{P}_{left} & \text{if } s(i) \text{ falls into the left region only} \\ \bar{P}_{right} & \text{if } s(i) \text{ falls into the right region only} \\ \max(\bar{P}_{left}, \bar{P}_{right}) & \text{if } s(i) \text{ falls into both regions} \end{cases}    (4.13)

In Figure 4.15(b) one can see the difference between the global and the regional classification threshold. The regional threshold of the left half is much lower than the global threshold. On the other hand, since the second segment, belonging to the tube, intersects the center, the maximum of both regional thresholds is taken into account, which lies significantly above the global threshold. Finally, all segments are classified correctly. With this threshold, the classification is less sensitive to darker background regions.
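As an illustration, the segment classification with the regional threshold (Equations 4.12 and 4.13) can be sketched as follows; the Python function below is a minimal sketch with assumed names, not the thesis implementation:

import numpy as np

def classify_segments(P_smooth, bounds):
    # bounds: sorted x-positions Omega' including 0 and x_max.
    # Returns one 'TUBE' / 'BG' label per segment.
    center = len(P_smooth) // 2
    mean_left = P_smooth[:center].mean()
    mean_right = P_smooth[center:].mean()
    labels = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = P_smooth[a:b + 1]
        if b <= center:                 # segment in the left region only
            tau = mean_left
        elif a >= center:               # segment in the right region only
            tau = mean_right
        else:                           # segment intersects the center
            tau = max(mean_left, mean_right)
        labels.append("TUBE" if np.median(seg) <= tau else "BG")
    return labels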

The two methods have been compared in the following experiment: a sequence of transparent tubes (50mm length, 8mm diameter) has been captured, comprising 467 frames that have been manually classified as measurable, i.e. a tube is completely in the image.



Figure 4.15: (a) The background intensity on the left is much darker than on the right. The global mean as threshold cannot compensate for such local variations, as can be seen in (b): the left background region is wrongly classified as foreground, since the global threshold is larger than the local median of the corresponding segment. A region based threshold that considers the left and right image halves independently can overcome this problem (see text).



               Global   Regional
Total number:  467      467
Measurable:    353      414
Average PTM:   5.98     7.01

Table 4.1: Comparison of the global and regional threshold used for classification in the profile analysis. The table shows the number of images that have been correctly detected as measurable compared to the total number, as well as the average number of per tube measurements (PTM). Using the regional threshold increases the number of measurements significantly.

Figure 4.16: Ghost effect: if the parameters of the profile analysis are too sensitive, darker parts on the conveyor (e.g. due to dirt or background structure) can be wrongly classified as a tube.

The sequence has been analyzed once with the global threshold and once with the regional threshold; all other parameters have been kept constant. The results can be found in Table 4.1. In this context it is important to understand that the term measurable relates to a single image. If the system fails to detect a tube in one image, this does not mean that the tube can pass the visual field of the camera undetected. This occurs only if the tube is missed in all images that include it, which is very unlikely. The experiment shows that the average number of measurements per tube can be increased by approximately one when using the regional instead of the global mean as threshold for the tube classification. In particular, situations as in Figure 4.15(a) can be prevented.

The reason why neither of the two methods has detected all measurable frames lies in other parameters, for example too little contrast between a tube and the background. The dynamic threshold τ_peak, as introduced before, defines the strongest peaks of the profile derivative. If it is too large, low contrast tube edges may not be detected. On the other hand, if it is too low, darker regions in the background may be wrongly classified as foreground. This leads to ghost effects, i.e. the system detects a tube where actually no tube is present, as can be seen in Figure 4.16. Therefore, the weighting factor α_peak of τ_peak (see Equation 4.9) must be adjusted in the teach-in step to the smallest value that does not produce ghost effects when inspecting an empty, moving conveyor belt. Obviously, this compromise becomes more difficult with an increasing amount of dirt.

Merging Segments In Figure 4.14(a), one can find two more segments than needed to represent the actual state entering + leaving. Two strong peaks on the right, which are due to a dark dirt spot on the conveyor belt, have not been eliminated by the thresholding. However, the corresponding segments are correctly classified as background, leading to three consecutive background segments which can be merged into one large segment.

In general, once all segments s(i) are classified, the goal is to iteratively merge neighboring segments of the same class and to eliminate foreground segments that do not qualify



Input:   coordinate list Ω′ with N = |Ω′|
         smoothed gray level profile (for the regional thresholds and local medians)
         minimum tube segment size MIN_SIZE

Step 1:  define segments: S = { s[i] = [ Ω′[i], Ω′[i+1] ] }
         classify each segment based on Eq. 4.12: s[i].label = C3(s[i])
         if s[i].label == TUBE for all i: return ERROR

Step 2:  /* remove foreground segments at the borders */
         let i1 be the index of the first and i2 the index of the last BG segment
         set s[j].label = BG for all j with 0 ≤ j < i1 or i2 < j < N
         /* merge neighboring segments of the same class, reclassify TUBE
            segments smaller than MIN_SIZE as BG, and repeat Step 2 if any
            segment was reclassified */

Listing 4.1: Classification and merging of profile segments.



for measuring. An overview of the algorithm is shown in Listing 4.1. A size filter operation, which can be parametrized with respect to the given target length, is used to remove too small foreground segments (e.g. caused by dirt on the conveyor belt).

The output of the algorithm is either one large background segment (i.e. all existing foreground segments have been removed since they did not fulfill the criteria) or three segments of the form BG-TUBE-BG. In the latter case, the peaks belonging to the left and right boundary of the remaining foreground segment are finally verified with respect to the sign of the derivative. With the derivative operator used, the position of the left boundary must result in a negative first-order derivative value (bright-dark edge) and the right boundary in a positive value (dark-bright edge). If the predicted tube boundaries are consistent with this last criterion, they are used to define two local ROIs of width W_ROI as a starting point for a more precise detection of the measuring points. The local ROI height is defined via the distance between the two guide bars.

The merging of the segments is a linear operation of complexity O(N_Ω). Since it is only allowed to reclassify a former foreground segment into background in this procedure, and never vice versa, Step 2 of the algorithm is repeated at most once. Hence, the algorithm is guaranteed to terminate.
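The merge and size filter step of Listing 4.1 can be sketched as follows; this is a minimal Python sketch under the assumption that segments are represented as (start, end, label) tuples, not the thesis implementation:

def merge_segments(segments, min_size):
    changed = True
    while changed:                       # repeats at most once (see text)
        changed = False
        merged = [segments[0]]
        for a, b, label in segments[1:]:
            pa, pb, plabel = merged[-1]
            if label == plabel:          # merge neighbors of the same class
                merged[-1] = (pa, b, plabel)
            else:
                merged.append((a, b, label))
        segments = merged
        # size filter: reclassify too small TUBE segments as background
        out = []
        for a, b, label in segments:
            if label == "TUBE" and (b - a) < min_size:
                label = "BG"
                changed = True
            out.append((a, b, label))
        segments = out
    return segments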

If all segments are classified as TUBE in the first step, an error is returned. This error indicates the presence of the state full (see Figure 4.2(i)). The reason can be a too small field of view of the camera or a missing spacing between consecutive tubes. In either case it is not possible to perform a measurement. Since this state is critical compared to other states that cannot be used for measuring, it is important to detect this situation. In practice, if this situation occurs an alert must be produced.

4.5. Measuring Point Detection

The previous sections described a fast method to decide whether a frame is useful or not. If a measurement is possible, two regions around the potential left and right boundary of the tube to be measured are the output of this first step. In the following, the exact tube boundaries have to be detected with subpixel accuracy.

4.5.1. Edge Enhancement

As introduced in Section 2.3, there is a large number of approaches for edge detection. Four common methods, namely the Sobel operator, the Laplace operator, the Canny edge detector [13] and a steerable filter edge detector based on the derivative of a parametrized Gaussian, have been applied to test images. The results can be found in Table 4.2. It includes experiments with two transparent tubes (left boundary) of the same sequence and one black tube boundary. All tubes have an inner diameter of 8mm. The difference in size between the transparent and black tubes is due to a different camera-object distance. As can be seen, the edges of the transparent tubes can differ in brightness, contrast and background pattern between frames.

The goal was to find an edge detection operation that adequately extracts the tube boundaries in the presence of background structure and noise, and which is in addition computationally inexpensive.



Table 4.2: Comparison of different edge detectors applied to the test images: the input images, SOBELX, SOBELY, Laplace, the Canny edge detector with thresholds (50/100), (90/230) and (185/210), and first-order Gaussian derivative filters of sizes 5×5, 7×7 and 11×11. The parameters of the Canny edge detector indicate the lower and upper threshold. The Gaussian derivative based edge detection results are all of first order. An (a) indicates that edges of all orientations (in discrete steps of 5°) are enhanced with a steerable filter approach, while (v) represents only vertical edges.



The results for the transparent tubes are crucial for the selection of an appropriate edge detection approach in this application, since, owing to the strong contrast, the detection of the black tube boundaries is uncritical with all tested methods. For both tube types the edge detection results differ in detected orientation, edge elongation (i.e. how precisely an edge can be localized), and edge representation (signed/unsigned values, floating point/binary, etc.).

Canny Edge Detector The Canny edge detector results in a skeletonized, one pixel wide response that precisely describes edges of arbitrary orientation. In this application the main drawback of Canny's approach is the importance of the threshold choice. As can be seen in Table 4.2, different parameter sets yield very different results. If the upper hysteresis threshold used as a starting point for edge linking is low (e.g. 100), combined with a lower second threshold (e.g. 50), too many background edges are detected as well. A larger upper threshold (e.g. > 200) reduces the number of detected edge pixels, but also eliminates parts of the tube edge, which may break up into pieces. If the distance between the upper and lower threshold is large, it is likely that background and tube edges are merged. In any case, a threshold set working fine for one image can lead to very poor results for another. The result of the Canny edge detector is a binary image where non-edge pixels have a value of zero and edge pixels a value of one (or 255 in 8bit gray level images). Binary contour algorithms can be applied to analyze chains of connected edge pixels. As can be seen in the test images, depending on how many edge pixels survive the thresholding, such an analysis can be very complex and time-consuming. Gaps within edges belonging to the tube boundary make this search even more complicated.

Sobel The Sobel operator approximates a Gaussian smoothing combined with differentiation. It can be applied with respect to the x- and y-direction. According to the filter direction, vertical or horizontal edges are enhanced. Since the tube boundaries have a vertical orientation, the SOBELX operator is an adequate choice in this application. Edges are located at local extrema, i.e. local minima at bright-dark edges and local maxima at dark-bright edges with respect to the gradient direction. A drawback is that the background pattern is also dominantly vertically oriented; thus, background edges are detected as well. The intensity of an edge is related to the image contrast. Assuming a certain contrast between tubes and background, a large amount of background clutter could be removed by thresholding, leaving only tube edges and edges due to high-contrast dirt particles. However, this would lead to an approach similar to the Canny edge detector, with the drawbacks stated before.

Laplace The implementation used to test the Laplacian calculates the second-order derivatives in x- and y-direction using the Sobel operator and sums the results. The output is an image of signed floating point values. Edges are located at the zero crossings between strong peaks. The Laplacian is an isotropic operator; thus, edges of all orientations are detected equally. One drawback of this method is its sensitivity to noise: the resulting response contains many zero crossings. Compared to first-order derivatives, the edge criterion is more complex. A pixel is an edge pixel if the closest neighbor in the direction of the gradient is a local maximum while the opposite neighbor is a local minimum, and both



neighbors must meet a certain threshold. However, the zero crossing can be computed with subpixel accuracy.

Steerable Filters The idea of filters that are steerable, for example in scale and orientation, is to design a filter that performs best for a particular edge detection task (see Section 2.3.3). In this application the goal is to find a filter that extracts the tube edges with maximum precision, while background edges and dirt are suppressed. The steerable filter approach allows for testing a large range of different edge detection kernels. Experiments with systematically varied parameter sets of first-derivative Gaussian filters, following the approach of Freeman and Adelson [25], have been applied to the test images. Some of the results are visualized in Table 4.2.

As can be seen, the background clutter cannot be eliminated even with larger kernel sizes, while the tube edges get blurred. No parameter setting for a Gaussian derivative kernel has been found that performs significantly better as a tube edge detector than the computationally less expensive Sobel operator.

All tested methods besides the Canny edge detector can be seen more as edge enhancers than as real edge detectors. This means the results do not fulfill the second and third criterion for good edge detection (see Section 2.3.1). Further processing of the edge responses, such as nonmaximum suppression, is necessary. An alternative is a template based edge localization step, which is introduced in the next section.

4.5.2. Template Based Edge Localization

It is important to state that even precisely detected edges (including those of Canny's approach) still have no semantic meaning. With all tested methods there have been false positives, i.e. edges belonging to the background, dirt, or noise. Hence, model knowledge has to be applied to the detected edges to verify whether an edge really corresponds to a tube's boundary or not.

In this application, the highly constrained conditions reduce the number of expected situations to a small, well defined minimum. The edges belonging to the tube boundaries of interest are always approximately vertical. Due to perspective, a tube boundary appears straight or slightly curved in a convex fashion under back light, depending on the position of the tube with respect to the optical ray of the camera. The more the tube boundary is displaced from the camera center, the larger the curvature.

At this stage it is of interest to locate a tube's boundaries within the two local ROIs (left and right respectively). Strong changes in image intensity in x-direction (vertical edges) have been enhanced using the SOBELX operator. The goal is not only to find the strongest peaks in the edge image, but also the strongest connected ridge along such peaks, which most likely corresponds to the tube boundary. This task can be performed by template matching (see Section 2.4).

If the feature to be detected can be modeled by a template, the response of the cross-correlation with this template computes a match probability within a given search region. The idea is to design a template that models the response of the edge enhancer and to correlate this template with the local ROI. The position where the correlation has its maximum provides close information on the tube boundary location.



Figure 4.17: Edge detection results of the SOBELX operator applied to different tubes (right boundary). The tube boundary corresponds to the strongest ridge in vertical direction in each plot. It can be seen that the edge response differs in curvature, intensity and background clutter. (a) Almost straight edge (close to the optical center of the camera) of a transparent tube, with a quite uniform region left of the ridge belonging to the tube and a more varying area on the right due to the background structure. (b) The tube boundary looks convex if further away from the camera center due to perspective. The edge response is much stronger at the ends of the ridge than at the center; this is due to the amount of light which is transmitted by the tube (see text). (c) Edge of a transparent tube with a printing close to the boundary, visible as a smaller ancillary ridge on the left. (d) Boundary of a black tube. The edge response is about three times stronger compared to transparent tubes due to the strong image contrast.



Therefore, it is important to have a closer look at the response of the edge detection with respect to the input data. Consistent characteristics can be used for the design of the right template.

Figure 4.17 shows examples of the SOBELX operator applied to test images. In this case, the responses correspond to the right ROI of three transparent tubes (Figure 4.17(a)-(c)) and one black tube (Figure 4.17(d)) at different positions in the image with respect to the x-axis. The tube boundary can be detected intuitively by humans even in the presence of background clutter. However, one can see that the edge response differs between the plots due to image contrast and perspective.

Figure 4.17(a) shows an almost straight edge (close to the optical center of the camera) with a quite uniform region left of the ridge belonging to the tube, and a more varying area on the right due to the background structure. It can be observed that the edge response is stronger at the ends of the ridge than in the center, which is due to the transmittance characteristic of transparent tubes (see Section 4.2). More light is transmitted at the center, leading to brighter intensity values and a poorer contrast, while the corners ('L'-corners between the horizontal and vertical boundary of a tube) are darker and yield a better contrast. This effect can also be seen very clearly in Figure 4.17(b). In addition, the tube boundary looks convex due to perspective, since it is further away from the camera center. Vertical edges of printings on a tube's surface are also extracted by the edge detection step, as can be seen in Figure 4.17(c). In this case, the straight line of an upside-down capital 'D' falls into the right local ROI, causing the smaller ancillary ridge to the left of the tube boundary. Figure 4.17(d) shows the boundary of a black tube. Due to the strong image contrast the edge response is about three times stronger compared to transparent tubes. The influence of the background clutter is reduced to a minimum, and since printings are not visible on black tubes under back light, this problem vanishes completely. The edge response does not differ in intensity at the ends as with transparent tubes.

4.5.3. Template Design

The goal is to design a universal, minimal set of templates that covers all potential edge responses of both transparent and black tube boundaries. The templates must model different curvatures to be able to handle perspective effects. Assuming a constant horizontal orientation and a constant size, the curvature is the only varying parameter between templates. The following two-dimensional function has been developed, which can be parametrized to approximate the expected edge responses:

T_\psi(x, y) = a \exp\left( b \left(\frac{y}{H_T}\right)^2 - \frac{(x - \psi y^2)^2}{2\sigma^2} \right)    (4.14)

It is based on a Gaussian with standard deviation σ in x-direction, extended with respect to y. The curvature is denoted by ψ. A value of ψ = 0 represents no curvature, while the curvature increases with increasing values of ψ (ψ ≤ 1). The first summand in the exponent of the exponential function can be used to emphasize the ends of the template in y-direction, which is motivated by the characteristic response of transparent tubes: the edge detector yields higher values at the ends than at the center. b controls the amount of height displacement; if b = 0, the template is equally weighted. H_T corresponds to the template height. a determines the sign of the template values. For bright-dark edges, as at the left boundary, the edge response is negative, thus a < 0. At the right



Figure 4.18: Different templates generated using Equation 4.14. (a) Straight edge: ψ = 0, b = 0. (b) Curved edge: ψ = 0.005, b = 0. (c) Curved edge: ψ = 0.02, b = 0. (d) Curved edge with emphasized ends: ψ = 0.002, b = 3. (σ = 0.8 and a = 1 have been used for all templates in this figure.) Note the differently scaled x and y axes.



        # templates   ψ_min    ψ_max   χ         a    b
Left:   30            0.0      0.02    0.00066   −1   3
Right:  30            −0.02    0.0     0.00066   1    3

Table 4.3: A set of 30 templates with curvatures equally distributed between ψ_min and ψ_max at a curvature resolution (step size) χ has been used to determine the occurrence of certain curvatures empirically.

side a > 0 is used to model the positive response of dark-bright edges. Figure 4.18 shows some examples that visualize Equation 4.14 and the effect of the different parameters.
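A minimal sketch of template generation according to Equation 4.14 (Python for illustration; the centered coordinate convention and the default parameter values are assumptions):

import numpy as np

def make_template(width=11, height=80, psi=0.005, a=1.0, b=3.0, sigma=0.8):
    # Returns a (height, width) template T_psi(x, y) for one curvature.
    T = np.empty((height, width))
    for yi in range(height):
        y = yi - height / 2.0        # y measured from the template center
        for xi in range(width):
            x = xi - width / 2.0     # x measured from the ridge center
            T[yi, xi] = a * np.exp(b * (y / height) ** 2
                                   - (x - psi * y ** 2) ** 2
                                   / (2 * sigma ** 2))
    return T

# one template per curvature step, e.g. ten curvatures between 0 and 0.005
templates = [make_template(psi=p) for p in np.linspace(0.0, 0.005, 10)]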

Template Dimension A constant template width of 11 pixels is used, which is large enough to represent both straight and maximally curved tube boundaries. The template height is derived from the global ROI height. Assuming the guide bars are always arranged such that the guide bar distance is only slightly larger than the tube's perimeter, the global ROI height is a good reference for the tube size. A good guess of the template height can be computed by the following equation:

H_T = \gamma \, H_{ROI_G}    (4.15)

where H_{ROI_G} is the global ROI height and γ a factor between 0 and 1.

Curvature The question is what range of curvatures occurs in practice and how many templates are needed to cover that range. Therefore, several test sequences with both black and transparent tubes of different diameters have been captured. 30 templates of different curvature have been generated for each tube side. The parameters can be found in Table 4.3.

For each measurable frame, the curvature of the template that reaches the maximum correlation value is recorded to build a histogram of curvature occurrence, for both the left and the right tube side. The normalized cross-correlation (see Equation 2.31) is used as the measure evaluating the match quality of each template at a certain location. The results can be found in Figure 4.19.

They show that the occurring curvatures are limited to a small range, denoted as R_{ψ,left} = [0, 0.005] for the left and R_{ψ,right} = [−0.005, 0] for the right side respectively. In order to reduce the number of templates, all curvatures outside these ranges can be ignored.

Another important criterion is the step size or curvature resolution χ, i.e. how many steps between ψ_min and ψ_max are taken into account. Theoretically one could quantize the curvature ranges into very small steps. However, since correlation is an expensive operation, a compromise between accuracy and performance has to be made. It was observed that if more than 15 templates have to be tested on each tube side per frame, the system starts to drop frames, i.e. this is a quantitative indicator that the overall processing time exceeds the current frame rate. Therefore the total number of templates is restricted to 10 in this application. The corresponding step size between two curvatures is 0.0005.
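A minimal sketch of evaluating such a template bank with the normalized cross-correlation (Python with OpenCV's matchTemplate as an assumed stand-in for Equation 2.31; names are illustrative):

import cv2
import numpy as np

def best_template_match(edge_roi, templates):
    # edge_roi: SOBELX response of the local ROI (float32 image).
    # Returns (best x position, index of the best curvature, best score).
    best = (None, None, -np.inf)
    for idx, T in enumerate(templates):
        scores = cv2.matchTemplate(edge_roi, T.astype(np.float32),
                                   cv2.TM_CCORR_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)
        if max_val > best[2]:
            best = (max_loc[0], idx, max_val)
    return best

Since each additional template costs one full correlation over the ROI, restricting the bank to 10 curvatures per side directly bounds the per-frame processing time.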



Figure 4.19: Histogram of template curvature occurrence for (a) the left and (b) the right tube side. It can be seen that only a small range of curvatures occurs, which reduces the number of templates that have to be tested each time.

[Figure 4.20: If the height weighting coefficient gets too large (here: b = 20), the center of the tube edge does not contribute to the matching score anymore.]

Template Weighting
The weighting coefficient b in Equation 4.14 is important for transparent tubes. Due to poor contrast, the overall edge response of a transparent tube might be low. Considering only the center region of an edge, the contrast might, in the worst case, be even lower than that of a background edge. The cross-correlation only computes the similarity of a template at a certain location in the image. The maximum response is taken as the match, since it is assumed that there must be a tube edge in the search region, even if the same or another template matches the real tube edge perfectly but with a lower score. This finally leads to a wrong measurement.

With model knowledge about the tube characteristics one can assume that the contrast at the edge ends is significantly stronger. If the template is weighted uniformly at the center and the ends, the correlation score depends on the whole edge equally. If, on the other hand, the ends of the template are weighted more strongly than the center, a template that perfectly fits the tube edge will yield a larger score, since background edges are usually uniform. Thus, the template is designed to prefer tube edges.

The weighting coefficient b has to be larger than one to yield the desired effect. On the other hand, b must not be too large either, since then the ends get too much influence. In the extreme case, the template equals two spots at a certain distance that do not represent a tube edge anymore (see Figure 4.20).
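Equation 4.14 itself is defined earlier in this chapter. As a rough illustration only, the following numpy sketch generates a bank of curved, end-weighted edge templates; both the parabolic ridge model and the particular weighting form are assumptions for illustration and not the exact Equation 4.14:

```python
import numpy as np

def make_edge_template(height, width, psi, a, b):
    """Illustrative curved edge template (not the exact Equation 4.14):
    a vertical dark-bright step edge whose x-position follows a parabola
    with curvature psi; rows near the template ends are weighted more
    strongly for b > 0."""
    t = np.zeros((height, width))
    yc = (height - 1) / 2.0
    for y in range(height):
        # parabolic ridge: horizontal offset of the edge in this row
        x_edge = (width - 1) / 2.0 + psi * (y - yc) ** 2
        # signed step edge; a = -1 (left tube side) or a = +1 (right side)
        row = a * np.sign(np.arange(width) - x_edge)
        # end weighting: weight 1 at the center row, larger at the ends
        w = abs(y - yc) / yc
        t[y, :] = row * (1.0 + w ** b)
    return t / np.linalg.norm(t)  # normalized for cross-correlation

# one bank entry per curvature step, cf. Table 4.3 and R_psi,left
left_bank = [make_edge_template(60, 11, psi, a=-1, b=3)
             for psi in np.arange(0.0, 0.005 + 1e-9, 0.0005)]
```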


[Figure 4.21: Effect of the weighting coefficient b in Equation 4.14. (a) Tube edge detection results with a uniformly weighted template (b = 0). (b) Results of a template with enhanced ends (b = 3). (c) Corresponding cross-correlation results of (a), and (d) the cross-correlation results of (b) respectively. The maximum in (c) and (d) corresponds to the pixel position where the template matches best. In this example, the ridge closer to the observer is due to a background edge while the ridge further away corresponds to the real tube edge.]

Figure 4.21 shows an example of how the weighting of the template ends improves the tube edge detection. In this example, the contrast of the right boundary of a transparent heat shrink tube is quite low. Using a uniformly weighted template, i.e. b = 0, the maximum correlation score is reached at a background edge (see Figure 4.21(a) and (c)). In this case, the tube would be measured as larger than it really is. With an enhancement of the template ends, on the other hand, the tube edge yields a larger score than the background edge, leading to a correct detection, as can be seen in Figure 4.21(b) and (d).

The enhancement of the template ends is motivated by the characteristics of transparent tubes. For black tubes, b = 0 describes the response of the SOBELX operator best. However, there is no disadvantage in using the same weighting coefficient as for transparent tubes. Due to the strong contrast of black tubes, the curvature and size of the template are the dominant factors influencing the matching results.

Template Rotation
The templates generated by Equation 4.14 are symmetric along the y-axis with respect to the template center. Thus, the ends of the template always lie on one line perpendicular to the x-axis. In the ideal case, the edge response of a heat shrink tube has the same characteristic. In practice, however, a tube can be slightly angular within the guide bars, or the tube edge might be cut skew. In both cases, the strong edge responses at the ends do not have to lie on one line perpendicular to the x-axis as in the template.


[Figure 4.22: (a) Edge response of an angularly oriented (transparent) tube edge. The characteristic peaks at the ends of transparent tube edges do not have to lie on one line perpendicular to the x-axis. The red line visualizes the slight angular orientation of the tube edge. (b) Example detection result with k = 1 orientations. (c) Corresponding result with k = 3 orientations.]

Figure 4.22(a) visualizes the edge response of a slightly angular tube edge of a transparent tube (left side). In such a situation, no template will fit the edge perfectly. This can be critical if the edge contrast is poor. In this case, as mentioned before, the stronger weighting of the template ends helps to support a match at the real tube boundary instead of at a background edge. With an angular tube edge, a symmetric template cannot be shifted over the image in a way that it matches both edge ends. Thus, the cross-correlation score is significantly smaller, and the probability increases that a background edge yields a larger score.

A small rotation of the template can overcome this problem. Therefore, the bank of templates is extended by k − 1 rotated versions of each template. It turned out to be sufficient to rotate each template by ±2 degrees to cover the range of expected deviations from the ideal symmetric model. Thus, k = 3 has been used throughout the experiments. It is assumed that larger angular deviations cannot occur due to the guide bars.
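A minimal sketch of how the template bank from the previous snippet could be extended with the ±2 degree rotations (k = 3), using scipy's image rotation; the function name and the boundary handling are illustrative choices:

```python
import numpy as np
from scipy.ndimage import rotate

def expand_with_rotations(bank, angles=(-2.0, 0.0, 2.0)):
    """Extend a template bank by rotated copies of each template
    (k = 3: the original plus -2 and +2 degree versions). The rotation
    is performed around the template center, so the center keeps its
    role as the measuring point."""
    expanded = []
    for t in bank:
        for angle in angles:
            r = rotate(t, angle, reshape=False, order=1, mode='nearest')
            expanded.append(r / np.linalg.norm(r))
    return expanded
```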

Model Knowledge Optimization
The number of templates to be checked each time on the left and right side increases with the number of rotations. Instead of 2 × 10 templates, one has to consider 2 × 10 × 3 templates if k = 3. Since correlation is an expensive operation, the processing time increases significantly even if the local ROIs are relatively small. It turned out that no more than 15 templates can be checked at each side without skipping frames at a frame rate of 50fps on an AMD Athlon 64 FX-55 processor with 2GB RAM.

[Figure 4.23: Curvature of the best matching template depending on the x-position of the match. (a) Left tube side. (b) Right tube side.]

One thinkable optimization is to reduce the curvature resolution, i.e. to quantize the same range of curvatures into ≤ 5 templates at each side. Obviously, this reduces the accuracy of the edge localization and is no satisfying solution in this application.

Instead, one can apply model knowledge to exclude several curvatures depending on the horizontal image position. It can be assumed that the curvature is maximal at the image boundaries and decreases toward the image center. Real sequences support this assumption. Figure 4.23 shows the occurrence of different curvatures with respect to x. The data was acquired over several sequences including transparent and black tubes. It turns out that the curvature decreases linearly within a certain band. The upper and lower boundary of this band determine which curvatures can be excluded at a given position. The range distance of curvatures dψ at a position x is defined as:

dψ(x) = ψmax(x) − ψmin(x)    (4.16)

where ψmax(x) and ψmin(x) are the maximum and minimum curvature occurring at this position. d̄ψ is the average range distance over all x. This range must be checked each time and is covered by n templates. In practice, n = 5 is used since, as mentioned before, the maximum number of templates that can be processed with the given hardware in real-time is 15 (in addition to all further processing that is needed), and 5 curvatures × 3 rotations = 15 templates to be checked each frame at one tube side. To yield the desired resolution over the whole range of curvatures, the total number of curvatures Nψ,total is computed as follows:

Nψ,total = n(ψmax − ψmin) / d̄ψ    (4.17)


where ψmax and ψmin indicate the overall maximum and minimum curvature a template can have. Hence, one has to compute Nψ,total × k templates for each side. This can be done in a preprocessing step to reduce the computational load. During inspection, one has to determine which templates have to be checked at a given position, defined by the center of the local ROI around a predicted tube edge. For an efficient implementation, a look-up table (LUT) is used for this task.
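A sketch of how such a LUT might be set up, assuming the band of admissible curvatures from Figure 4.23 has been fitted by two linear boundary functions psi_lower(x) and psi_upper(x); these fitted functions, like all names here, are illustrative:

```python
import numpy as np

def build_curvature_lut(image_width, psi_lower, psi_upper, curvatures, n=5):
    """For each x-position, precompute the indices of the (at most n)
    template curvatures that fall inside the linearly decreasing band
    [psi_lower(x), psi_upper(x)]; 'curvatures' is the full, finely
    quantized curvature set of one tube side."""
    lut = []
    for x in range(image_width):
        lo, hi = psi_lower(x), psi_upper(x)
        inside = np.where((curvatures >= lo) & (curvatures <= hi))[0]
        # keep at most n indices, spread evenly over the band
        take = np.linspace(0, len(inside) - 1, min(n, len(inside)))
        lut.append(inside[take.astype(int)])
    return lut

# during inspection: test only bank[i] for i in lut[local_roi_center_x],
# each in its k rotated versions (5 curvatures x 3 rotations = 15)
```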

4.5.4. Subpixel Accuracy

The maximum accuracy of the template-based edge localization so far is limited by the discrete pixel grid. The templates are shifted pixelwise within the local ROIs to find the position that reaches the maximum correlation score. Following the assumptions of tubes under perspective (see Section 4.2.3), the measuring is performed between the outermost points of the convex tube edges.

By the way the templates are defined, the template center always corresponds to the outermost point of the generated ridge. This is consistent with template rotation, since the rotation is performed around the template center. In the special case that the template is not curved, the template center is still the valid measuring point. With the knowledge of this point within the template and the position where this template matches best in the underlying image, the position of the measuring point in the image can easily be computed.

However, pixel grid resolution is not accurate enough in this application. For example, one pixel represents about 0.12mm in the measuring plane ΠM in a typical setup for 50mm tubes. The allowed tolerance for 50mm tubes is ±0.7mm. As a rule of thumb for reliable results, the measuring system should be as accurate as 1/10th of the tolerance, i.e. 0.07mm in this example. To reach that accuracy, one has to apply subpixel techniques to overcome the pixel limits.

Figure 4.24(a) visualizes the results of the cross-correlation of an image ROI around the right boundary of a transparent tube with the template that yields the maximum score. The maximum is located at position Mmax = (19, 5). These coordinates refer directly to the edge position in the image, since the template function is known and therefore also the exact location of the template ridge.

The real maximum that describes the tube edge location most accurately may lie between two grid positions. With respect to the measuring task, the edge has to be detected as accurately as possible. Interpolation methods have been introduced in Section 2.3.4 to overcome the pixel grid limits in edge detection. The same can be applied at this stage to the template matching results.

Cubic spline interpolation is used to compute the subpixel maximum within a certain neighborhood around the discrete maximum. Cubic splines approximate a function based on a set of sample points using piecewise third-order polynomials. They have the advantage of being smooth in the first derivative and continuous in the second derivative, both within an interval and at its boundaries [53].

The interpolation is performed only with respect to the x direction, since this is the measuring direction. A subpixel location with respect to y has only a marginal effect on the measurements. Ideally, the measuring points on the left and right side have the same y value.


[Figure 4.24: (a) Cross-correlation results of an image patch around the right boundary of a transparent tube and the best scoring template. The maximum is located at position (19, 5). (b) Cubic spline interpolation in a local neighborhood around the maximum. In this case, the interpolated maximum is equal to the discrete position. (c) Matching results of a different image. Here, the interpolated subpixel maximum differs from the discrete maximum and can be found at x = 12.2.]

Assuming the real maximum location is displaced by at most 0.5 pixels at each side of the tube, the worst-case displacement is 0.5 at one side and −0.5 at the other side, leading to a total displacement of 1. A straight line connecting the two measuring points in a Euclidean plane is slightly longer than the distance in x. Following Pythagoras' theorem, the maximum expectable error due to a vertical inaccuracy is:

erry = √(l² + 1) − l    (4.18)

where l is the pixel length between the left and right measuring point. With respect to the definition of the camera's field of view and the image resolution, the length of a tube is about 415 pixels in an image. In this case, the worst-case error is about 0.0012 pixel. Assuming one pixel represents 0.12mm (a typical value for 50mm tubes), this corresponds to an acceptable error of 0.14µm, which is far beyond the imaging capabilities of the camera used (each sensor element has a size of about 8.3 × 8.3µm).

Other than in the vertical direction, a subpixel shift of the best matching template position in the horizontal direction has a significant influence on the length measurement results. Again, assuming a maximum error of 0.5 pixels if discrete pixel grid resolution is used, the total error at both sides sums up to 1 in the worst case. If one pixel corresponds to 0.12mm as in the example above, this means the measuring system has an inaccuracy of the same length purely depending on the edge localization. Obviously, this error depends on the resolution of the camera and can become even worse if one pixel represents a larger distance.
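As a quick check of Equation 4.18 with the numbers above: for l = 415 pixels, erry = √(415² + 1) − 415 = √172226 − 415 ≈ 415.0012 − 415 ≈ 0.0012 pixel, and 0.0012 pixel × 0.12mm/pixel ≈ 0.14µm, matching the values stated.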

The interpolation considers five discrete points: the maximum matching position Mmax and the two nearest neighbors to the left and right of Mmax in x-direction respectively. In Figure 4.24(b), the interpolation results of the local neighborhood around the discrete maximum of Figure 4.24(a) are drawn into the plot of the match profile at y = 5. It shows that the interpolated values describe the sampled values quite well. In this example, the interpolated subpixel maximum equals the discrete maximum. This does not always have to be the case, as can be seen in Figure 4.24(c). Here, the discrete maximum is located at x = 12, whereas the subpixel maximum lies at x = 12.2. In the first case, the neighboring pixels of the maximum yield almost equal results on both sides. In the second example, on the other hand, the right neighbor of the maximum is significantly larger than the left one. This explains the shift of the subpixel maximum toward the right. The precision of the subpixel match localization is 1/10 pixel. Mathematically, much higher precision is possible, but the significance of such results is questionable with respect to the imaging system and noise, and it increases the computational costs unnecessarily.

4.6. Measuring

The result of the template matching is two subpixel positions indicating the left and right measuring point of a tube. This section introduces how a pixel distance is transformed into a real-world length and how the measurements of one tube are combined. For this, a tracking mechanism is required that assures the correct assignment of a measurement to a particular tube. This means one has to detect when a tube enters or leaves the visual field of the camera.
field of the camera.


[Figure 4.25: Perspective correction. (a) The measured length varies depending on the image position in terms of the left measuring point. Due to perspective, the length of one tube appears larger at the image center than at the image boundaries. The effect of perspective can be approximated by a 2nd-order polynomial. (b) The correction function computed from the polynomial coefficients. (c) The result of the perspective correction.]

4.6.1. Distance Measure

The distance between the two measuring points pL and pR (see Section 4.2) is computed as the Euclidean distance. Thus, the pixel length l of a tube is defined as follows:

l = √((pR − pL)²)    (4.19)

where l is expressed in terms of pixels. In the following, l(x) denotes the pixel length of a tube at position x, where x = xpL, i.e. the position of a measurement is defined by the x-coordinate of the left measuring point.

4.6.2. Perspective Correction

Figure 4.25(a) shows the measured pixel length l(x) of a metal reference tube (gage) at different image positions. The sequence was acquired at the slowest conveyor velocity. In the ideal case, l should be equal independent of the measuring position. However, the measured length is smaller at the boundaries and maximal at the image center due to perspective. This property is consistent between tubes. To approximate the ideal case, a perspective correction can be applied to the real measurements. Mathematically, this can be expressed as:

lcor(x) = l(x) + fcor(x)    (4.20)

where lcor is the perspective-corrected pixel length and fcor a correction function. The perspective variation in the measurements can be approximated by a 2nd-order polynomial of the form:

f(x) = c1x² + c2x + c3    (4.21)

where the coefficients ci of the polynomial have to be determined in the teach-in step by fitting the function f(x) to measured length values l(x) in a least-squares sense. Then, the correction function fcor can be computed as:


fcor(x) = −(c1x² + c2x) + c1s² + c2s    (4.22)

where s is the x-coordinate of the peak of f(x) with s = −c2/(2c1), i.e. the point where the first derivative of f(x) is zero. Thus, fcor is the 180° rotated version of f(x), shifted so that fcor(s) = 0, as can be seen in Figure 4.25(b).

Applied to the measurements, this function has the effect that all values are adjusted to approximately one length l(s). The corrected length values lcor(x) are shown in Figure 4.25(c). As one can see, the mean value over all measurements describes the data much better after perspective correction.

To reduce the computational load, the correction function is computed only once for each position at discrete steps and stored in a look-up table for fast access.

4.6.3. Tube Tracking

Assuming a sufficient frame rate, one tube is measured several times at different positions while moving through the visual field of the camera. One constraint in Section 4.2.2 regarding the image content states that only one tube is allowed to be measurable at one time. The question is whether the current measurement belongs to an already inspected tube or whether there is a new tube in the visual field of the camera. Since there is no external trigger, this task has to be solved by the software.

Consecutive tubes appear quite equal in shape, size, or texture (especially black tubes). It is difficult up to impossible to find reliable features in the form of a unique fingerprint that can be used to distinguish between tubes. In addition, the extraction and comparison of such fingerprints would be computationally expensive. Standard tracking approaches such as Kalman filtering [24] or condensation [8] are also not suited to this particular application, since such approaches are quite complex and are worthwhile only if an object is expected to be in the scene over a certain time period. At faster velocities, however, a tube is in the image for only about 4-7 frames.

Since processing time is highly limited, it is a better choice to develop fast heuristics based on model knowledge that replace the problem of tube tracking by detecting when a tube has left the visual field. Therefore, the following very fast heuristics have been defined:

1. Backward motion
2. Timeout

Backward motion
Since the conveyor always moves in one direction (e.g. from left to right in the image), it is impossible that a tube moves backward. Thus, if the horizontal image position of the tube at time t is smaller than at time t − 1 (i.e. the tube would have moved further to the left), this can be used as an indicator that the current measurement belongs to the next tube. The position of a tube can be defined as the x-coordinate of the left measuring point. Hence, with the image content assumption, the tube measured at time t − 1 has left the visual field if xpL(t) < xpL(t − 1).


Timeout
The backward motion heuristic assumes a tube has passed the visual field of the camera when the next tube is measured for the first time. This requires a successor for each tube within a certain time period. With respect to the blow-out mechanism, it is important that the good/bad decision is made quickly, since the controller (see Section 3.4) must receive the result before the tube has passed the light barrier. Thus, a timeout mechanism is integrated. If no new tube arrives for more than ∆t frames, it is assumed that the previously measured tube has passed the measuring area and the total length can be computed. In practice, ∆t should be based on the average number of per-tube measurements and the distance between the measuring area and the light barrier.
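The two heuristics can be sketched as a small per-frame state update (a minimal illustration; the state container and names are not from the thesis):

```python
def update_tracking(state, x_left, frame_no, dt_timeout=5):
    """Per-frame update implementing the two heuristics as a sketch.
    x_left is the left measuring point of the current frame, or None if
    no tube was measurable; returns True when the previously tracked
    tube can be finalized (its total length computed)."""
    finished = False
    if x_left is None:
        # timeout: no new measurement for more than dt frames means the
        # previous tube has passed the measuring area
        if state['active'] and frame_no - state['frame_last'] > dt_timeout:
            finished = True
            state['active'] = False
    else:
        # backward motion: the conveyor moves in one direction only, so
        # a smaller x-position than before must belong to the next tube
        if state['active'] and x_left < state['x_last']:
            finished = True  # close the previous tube, start the next
        state.update(active=True, x_last=x_left, frame_last=frame_no)
    return finished

state = {'active': False, 'x_last': None, 'frame_last': -1}
```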

4.6.4. Total Length Calculation

Once there is evidence that a tube has passed the visual field of the camera, the single measurements have to be combined into a total length. Let mi denote the number of measurements assigned to tube i, and lj(i) the pixel length of the jth measurement (0 < j ≤ mi) of that tube. The mean length l̄(i) of tube i can be computed as:

l̄(i) = (1/mi) · Σ_{j=1}^{mi} lj(i)    (4.23)

The mean has the significant drawback that it is quite sensitive to outliers. For example, assume that in one of five measurements a background edge is wrongly classified as a tube edge, and the resulting length is therefore larger than the actual length. This outlier would also enlarge the resulting mean length, even if the remaining measurements have approximately the same (correct) value. To reduce the influence of outliers, the k strongest outliers are excluded from the averaging. Therefore, the measurements are sorted in ascending order based on the squared distance dj(i) to the mean l̄(i) with

dj(i) = (lj(i) − l̄(i))²    (4.24)

In the following, only the first mi − k measurements in the sorted list are averaged to the total length ltotal(i) of tube i as:

ltotal(i) = (1/(mi − k)) · Σ_{j=1}^{mi−k} l′j(i)    (4.25)

where l′ indicates that the measurements are sorted based on Equation 4.24, i.e. dj(i) ≤ dj+1(i) for 0 < j < mi.


The calibration factor fpix2mm denotes the length in the measuring plane ΠM that can be represented by one pixel in the image plane. The total length in mm, Ltotal, of tube i can be computed as follows:

Ltotal(i) = ltotal(i) · fpix2mm    (4.27)

The length Ltotal is used for the good/bad classification of whether a tube meets the allowed tolerances. This can be formalized as:

class(i) = GOOD if |Ltotal(i) − Ltarget| ≤ tol, BAD otherwise    (4.28)

where Ltarget is the target length and tol the allowed tolerance.
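A compact sketch of Equations 4.23 to 4.28 combined; the choice k = round(αoutlier · m) follows the description in Section 5.1.1 and is an assumption here:

```python
import numpy as np

def total_length_mm(lengths_px, f_pix2mm, alpha_outlier=0.25):
    """Combine the single pixel measurements of one tube: discard the k
    strongest outliers (k = round(alpha_outlier * m), an assumption),
    average the rest, and convert to mm."""
    l = np.asarray(lengths_px, dtype=float)
    k = int(round(alpha_outlier * len(l)))
    d = (l - l.mean()) ** 2                # squared distance to the mean
    kept = l[np.argsort(d)][:len(l) - k]   # drop the k strongest outliers
    return kept.mean() * f_pix2mm

def classify(L_total, L_target, tol):
    """Good/bad decision of Equation 4.28."""
    return 'GOOD' if abs(L_total - L_target) <= tol else 'BAD'

# e.g. classify(total_length_mm(measurements, 0.12), 50.0, 0.7)
```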


4.7. Teach-In

4.7.2. Profile Peak Threshold

The profile peak threshold separates tube and background. It is computed dynamically based on the regional mean of the profile and a constant factor αpeak (see Equation 4.9). Although this parameter is assumed to be constant, it has to be trained once with respect to the conveyor belt used.

The teach-in of this parameter is very simple and intuitive. The visual system is set to inspection mode, i.e. it is started as for standard measuring. The conveyor is empty but moving. The operator can adjust αpeak online, starting at a quite low value. This value is slightly increased as long as the system detects tubes (ghosts) where there actually are none. Until now, this procedure has to be performed manually, but one could think of an automated version to reduce the influence of the human operator, who is always a source of errors.

To ensure the threshold has not become too large, several tubes are placed on the conveyor. If the system is able to successfully detect all tubes (detection does not mean the length has to be computed correctly in this context), the profile threshold factor is assumed to be trained sufficiently. If the conveyor belt is not uniformly translucent, i.e. the overall image brightness changes significantly over time, one has to assure that the system is able to detect a tube both at the brightest and at the darkest region of the belt.

4.7.3. Perspective Correction Parameters

As introduced in Section 4.6.2, perspective effects in the measuring data can be reduced using a perspective correction function fcor(x). This function has two parameters, c1 and c2, that have to be learned in the teach-in step from real data.

One intuitive method to do this is to measure a tube at a very slow conveyor velocity. The result is a set of pixel length measurements (see Figure 4.25(a)) at almost every position in the image. Then, the parameters of a second-order polynomial f(x) = c1x² + c2x + c3 can be computed using nonlinear least-squares (NLLS) methods. In this case, a standard Levenberg-Marquardt algorithm [53] is used. The resulting parameters c1 and c2 can be directly inserted into Equation 4.22 to compute fcor(x).

For robust results, this procedure can be repeated several times and the final parameter set averaged. Alternatively, one could first acquire measurements of several tubes and fit the correction function to the total data.
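A sketch of this teach-in fit; numpy's polyfit is used here in place of Levenberg-Marquardt, which gives the same least-squares solution for a polynomial model:

```python
import numpy as np

def teach_perspective_correction(xs, lengths, lut_size=768):
    """Fit l(x) with a 2nd-order polynomial and build the correction
    function f_cor of Equation 4.22 as a look-up table over all
    x-positions (lut_size is an illustrative image width)."""
    c1, c2, c3 = np.polyfit(xs, lengths, deg=2)
    s = -c2 / (2.0 * c1)                   # vertex of the parabola
    x = np.arange(lut_size, dtype=float)
    f_cor = -(c1 * x**2 + c2 * x) + c1 * s**2 + c2 * s   # Equation 4.22
    return f_cor                           # l_cor(x) = l(x) + f_cor[x]
```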

4.7.4. Calibration Factor

The most important parameter to be trained in the teach-in step is the calibration factor that relates a length in the image to a real-world length in the measuring plane ΠM. This factor has been introduced as fpix2mm. The idea is to learn the calibration factor based on correspondences between measurements and ground truth data.

In an interactive process, the operator places a tube of known length onto the moving conveyor. The velocity of the conveyor is set to production velocity, i.e. the velocity at which the tubes will be measured later. When the tube reaches the visual field of the camera, it is measured with the described approach, but at pixel level only. Once the tube has left the measuring area, the total pixel length is computed and the user is asked to enter the real-world length of this tube into a dialog box. Again, the input device is a standard keyboard in the prototype version of the system.


The pair of a pixel length l(i) and a real-world reference L(i) can be used to compute the ideal factor fpix2mm(i) that converts pixels into mm for a measurement i as follows:

fpix2mm(i) = L(i) / l(i)    (4.29)

This procedure has to be repeated several times for different reference tubes. Finally, the estimated calibration factor is computed analogously to Equation 4.25 using a k-outlier filter before averaging:

fpix2mm = (1/(N − k)) · Σ_{j=1}^{N−k} f′pix2mm(j)    (4.30)

where k is the number of outliers, N the number of iterations, and f′pix2mm indicates the single calibration factors sorted by the squared distance to the mean in ascending order. The median could also be used instead of averaging.

The root-mean-square error at iteration i between the known real-world lengths and the lengths computed based on the estimated calibration factor can be used as a measure of quality:

Err(i) = √( (1/i) · Σ_{j=1}^{i} (L(j) − l(j) · fpix2mm)² )    (4.31)

If the error is low, this can be used as an indicator that the learned calibration factor is a good approximation of the ideal magnification factor that relates a pixel length in the image to a real-world length in the measuring plane ΠM, without any knowledge of the distance between ΠM and the camera.

In practice, the learning of the calibration factor is an interactive process. One can define a minimum and maximum number of iterations, Nmin and Nmax respectively. Once Nmin correspondences have been acquired, fpix2mm and Err(i) are computed for the first time. The operator continues the procedure as long as the calibration at iteration i + 1 changes by more than a small epsilon compared to iteration i. This means the learning can be stopped if |Err(i + 1) − Err(i)| < ɛ, or once Nmax iterations are reached.
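The calibration teach-in can be sketched as follows (the outlier fraction mirrors αoutlier from Table 5.2; the loop condition is paraphrased from the stopping rule above):

```python
import numpy as np

def calibration_factor(pixel_lengths, real_lengths, alpha_outlier=0.25):
    """Outlier-filtered mean of the single factors L(i)/l(i),
    cf. Equations 4.29 and 4.30."""
    f = np.asarray(real_lengths, float) / np.asarray(pixel_lengths, float)
    k = int(round(alpha_outlier * len(f)))
    d = (f - f.mean()) ** 2              # squared distance to the mean
    return f[np.argsort(d)][:len(f) - k].mean()

def calibration_error(pixel_lengths, real_lengths, f_pix2mm):
    """Root-mean-square error of Equation 4.31."""
    L = np.asarray(real_lengths, float)
    l = np.asarray(pixel_lengths, float)
    return float(np.sqrt(np.mean((L - l * f_pix2mm) ** 2)))

# interactive loop (sketch): after each new correspondence i >= N_min,
# recompute f_pix2mm and Err(i); stop when |Err(i+1) - Err(i)| < eps
# or i == N_max.
```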




5. Results and Evaluation

5.1. Experimental Design

There are several parameters influencing the measuring results, both in the hardware setup and in the vision algorithms. To yield meaningful results, it is important to vary no more than one parameter within the same experiment. In the following, the parameters that are tested as well as the evaluation criteria and the strategies used are described.

5.1.1. Parameters

The different parameters of the system can be grouped into four main categories: tube, conveyor, camera, and software. Table 5.1 summarizes the most important representatives of each category.

Obviously, there are many more parameters, described in the previous chapter, that theoretically fall into the last category. However, most of these parameters do not have to be changed (e.g. the number of profile scanlines or the local ROI width). The corresponding value assignments have been determined empirically on representative sequences and are summarized in Table 5.2.

αpeak = 4.0 has been determined in a teach-in step as proposed in Section 4.7.2 and yields the best results for transparent tubes with the conveyor belt and the illumination used. This assignment also covers black tubes, although the threshold could be much larger in that case. As long as the conveyor belt is not changed and the amount of dirt on the conveyor does not change significantly, the detection sensitivity does not have to be re-initialized each time.

A timeout period of ∆t = 5 frames for the tube tracking (see Section 4.6.3) has been used throughout the experiments, which is a good compromise between the number of expected per-tube measurements and the distance to the light barrier.

Approximately 1/4 of all measurements (rounded to the next integer value) are not considered for the total length computation with αoutlier = 0.25, to eliminate outliers in the single measurements as introduced in Section 4.6.4. The same value is used for the outlier filter in the teach-in step (see Section 4.7.4).

The teach-in of the calibration factor fpix2mm (see Section 4.7.4) terminates if the root mean square error does not change by more than ɛ = 0.0001 between two iterations.

Since it is still very complex to test all permutations and assignments of the remaining parameters, one has to make compromises in the experimental design. Therefore, some of the parameters listed above have been adjusted before the experiments to meet the assumptions made in Section 4.2. This includes the guide bar distance as well as the illumination (fiber-optical backlight setup through the conveyor belt) and all camera parameters, i.e. lens, working distance, exposure time and F-number. For all experiments with 50mm tubes, a 16mm focal length lens at a working distance of approximately 250mm is used. The shutter time has been adjusted to 1.024ms, which is a good compromise between light efficiency and motion blur effects. This shutter time requires a small F-number of 1.4 to yield sufficiently bright images.
97


Category   Parameter
Tube       Color
           Length
           Diameter
Conveyor   Velocity
           Tube spacing
           Guide bar distance
Camera     Lens
           Working distance
           Exposure time
           F-number
Software   Profile peak threshold τpeak (sensitivity)
           Number of templates (scale, orientation, curvature)
           Perspective correction
           Calibration factor

Table 5.1: Overview of the different test parameters.

Parameter   Category              Description                      Value          Section
Nscan       Profile Analysis      Number of scanlines              11             4.4.1
Ksmooth     Profile Analysis      Smoothing kernel size            19             4.4.2
αpeak       Profile Analysis      Peak threshold factor            4.0            4.4.2
WROI        Edge Detection        Local ROI width                  15             4.4.3
γ           Template Generation   Template height ratio            0.95           4.5.3
Rψ,right    Template Generation   Curvature range right            [−0.005, 0]    4.5.3
Rψ,left     Template Generation   Curvature range left             [0, 0.005]     4.5.3
χ           Template Generation   Curvature resolution             0.0005         4.5.3
b           Template Generation   Height weighting coefficient     3              4.5.3
k           Template Generation   Number of rotations              3              4.5.3
∆t          Tube Tracking         Timeout period                   5              4.6.3
αoutlier    Total Length          Outlier factor                   0.25           4.6.4
ɛ           Teach-In              Allowed calibration error        0.0001         4.7.4

Table 5.2: Constant software parameter settings throughout the experiments.

In all experiments, it is assumed that the system is calibrated correctly, the radial distortion coefficients are known, and a teach-in step has been performed to learn fpix2mm. In addition, the perspective correction function has been determined before each experiment to compensate for perspective distortions.

5.1.2. Evaluation Criteria

There are several criteria that can be used to compare and evaluate the results of different experiments. These can be classified into quantitative and qualitative criteria.

Quantitative Criteria

Total Detection Ratio
The system must detect exactly the number of tubes that pass the visual field of the camera. Formally, this can be expressed in the following score Ωtotal:

Ωtotal = Ndetected / Ntotal    (5.1)

where Ndetected indicates the number of detected tubes and Ntotal the total number of tubes respectively. Ωtotal = 1 is a necessary but not sufficient criterion for a correctly working inspection system.

Per Tube Measurements
The average number of single measurements for each tube depends mainly on the velocity of the conveyor and the camera frame rate. If N tubes have been measured, the mean number of per-tube measurements can be computed as:

ΩPTM = (1/N) · Σ_{i=1}^{N} mi    (5.2)

where mi is the number of single measurements of the ith tube.

False Positives/False Negatives
Each tube T can be classified into one of the three groups G0 (good), G− (too short), and G+ (too long) if measured manually. G0 is defined by the target length and the allowed tolerance for this length. It contains all tubes that meet the tolerance in the real world. G− and G+ include all tubes of a real-world length that lies below the lower or above the upper tolerance threshold respectively.

In the same way, each tube can be categorized into one of the three groups G′0, G′−, or G′+ based on the length measured by the visual inspection system. In the ideal case, these three groups are equal to the corresponding ground truth classifications, i.e. G′0 = G0, G′− = G−, and G′+ = G+.¹

In practice, however, the measurements are biased by many factors like perspective errors, curved tubes, skew tube edges, noise, motion blur, or failures in measuring point detection. In addition, as will be introduced in Section 5.1.3, the manually acquired ground truth data also has a certain variance. Thus, the distributions measured by humans and by a machine vision system may differ. This gets critical if the two distributions intersect.

Tubes that are actually too short or too long but are measured to be within the tolerance are denoted as false positives (FP). On the other hand, tubes of an allowed length can be wrongly classified as outliers and are denoted as false negatives (FN). More mathematically, false positives and false negatives can be defined as follows:

FP = {T | T ∈ G′0 ∧ T ∉ G0}    (5.3)
FN = {T | T ∉ G′0 ∧ T ∈ G0}    (5.4)

In terms of system evaluation, the following measures can be used:

ΩFP = NFP / Ntotal    (5.5)
ΩFN = NFN / Ntotal    (5.6)

where NFP and NFN indicate the number of false positives and false negatives respectively. Both the false positive ratio ΩFP and the false negative ratio ΩFN should be zero in the optimal case. As already discussed in the introduction, ΩFP is more critical than ΩFN, since it is less bad to sort out a good tube than to deliver a failure to the customer.

¹ Theoretically, a fourth group U for unsure can be defined, including all tubes that could not be detected at all. These tubes have to be handled by different mechanisms, as will be discussed in later sections.

Performance
The performance of the system can be evaluated with respect to the average processing time that is needed to analyze a frame:

ΩTIME = (1/M) · Σ_{i=1}^{M} ti    (5.7)

where M is the number of frames considered and ti represents the processing time of frame i. ΩTIME is expressed in terms of ms/frame. This measure can be used to determine the maximum possible capture rate. Skipped frames indicate that the camera captures more frames than the system is able to process.

Qualitative Criteria

Standard Deviation Per Tube
The multi-image measuring approach is based on the idea that more robust measuring results can be reached if each tube is measured several times. In the ideal case, all measurements should yield an equal length value. In practice, however, the single measurements can differ. The standard deviation σtube(i) can be used as an indicator of how much these measurements vary. It is computed as:

σtube(i) = √( (1/(mi − 1)) · Σ_{j=1}^{mi} (lj(i) − l̄(i))² )    (5.8)

where lj(i) indicates the length of the jth single measurement of tube i, l̄(i) the mean over all single measurements of this tube, and mi the total number of single measurements of tube i. σtube is expressed in terms of pixels.

A large per-tube standard deviation represents an uncertainty in the results. In this case, the mean describes the data only roughly. If the uncertainty is too large, it may be better to blow out the particular tube, since the probability of a false positive decision increases proportionally with the standard deviation.

Sequence Standard Deviation
The standard deviation of a sequence, σseq, is computed analogously to σtube, but not with respect to the single measurements of one tube; instead it refers to the computed total lengths ltotal of N tubes:

σseq = √( (1/(N − 1)) · Σ_{i=1}^{N} (ltotal(i) − l̄total)² )    (5.9)

where l̄total is the mean over all total measurements. Finally, all measurements can be represented by a Gaussian distribution function G(x) as:

G(x) = (1/(σseq √(2π))) · exp(−(x − µseq)² / (2σ²seq))    (5.10)

where µseq = l̄total. The production is most accurate if the distance between the given target length and the mean of this distribution is small.

Ground Truth Distance
The difference between the vision-based length measurement results and the manually acquired ground truth data can be seen as a relative error, assuming the ground truth data is correct. Of interest are the minimum and maximum ground truth distance (GTD) of a sequence of tubes, defined as:

GTDmin = min {(ltotal(i) − lgt(i)) | 1 ≤ i ≤ N}    (5.11)
GTDmax = max {(ltotal(i) − lgt(i)) | 1 ≤ i ≤ N}    (5.12)

where ltotal(i) is the computed total length of tube i, lgt(i) the corresponding ground truth length, and N the number of tubes considered. If the mean ground truth distance GTD is approximately zero, the deviation is distributed equally. Otherwise, if GTD > 0, the measured length is predominantly larger than the ground truth measurement. Accordingly, if GTD < 0, the opposite holds. In both cases, the systematic error indicates that the system is probably not calibrated correctly.

Root Mean Square Error (RMSE)
The root mean square error measure is used to compare the measurements of the visual inspection system to manually acquired ground truth data over a sequence, as follows:

RMSE = √( (1/N) · Σ_{i=1}^{N} (ltotal(i) − lgt(i))² )    (5.13)

with ltotal(i), lgt(i) and N as defined before. A small root mean square error indicates that the measurements are close to the ground truth data.
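The quantitative criteria above can be collected in one small helper, a sketch assuming per-tube total lengths and ground truth lengths in mm:

```python
import numpy as np

def evaluate(measured, ground_truth, detected, total, target, tol):
    """Evaluation criteria of Section 5.1.2 for one sequence."""
    m, gt = np.asarray(measured, float), np.asarray(ground_truth, float)
    omega_total = detected / total                      # Equation 5.1
    good_meas = np.abs(m - target) <= tol               # G'_0
    good_true = np.abs(gt - target) <= tol              # G_0
    omega_fp = np.sum(good_meas & ~good_true) / total   # Equation 5.5
    omega_fn = np.sum(~good_meas & good_true) / total   # Equation 5.6
    rmse = np.sqrt(np.mean((m - gt) ** 2))              # Equation 5.13
    sigma_seq = np.std(m, ddof=1)                       # Equation 5.9
    return omega_total, omega_fp, omega_fn, rmse, sigma_seq
```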

5.1.3. Ground Truth Measurements

The acquisition of ground truth data is important for evaluating the vision-based inspection system with respect to human measurements. For this purpose, a special digital measuring slide, shown in Figure 5.1, has been used. The precision of this device is up to 1/100mm.

[Figure 5.1: Measuring slide used for acquiring ground truth measurements by hand.]

However, there is a significant deviation in human measurements, since heat shrink tubes are flexible. Depending on the force the human operator applies to the measuring slide, the measured length becomes smaller or larger. This variation has been investigated empirically.

12 sample tubes of different diameters (6, 8 and 12mm) were selected as the test set (see Table 5.3). One half of the samples are black, the other half transparent tubes. For each combination of color and diameter, one tube has a length of approximately 50mm and one was manipulated, i.e. made slightly longer or shorter than the tolerance allows for.

No.   Color         Diameter [mm]   Mean length [mm]
1     Transparent   8               49.95
2     Transparent   6               49.77
3     Transparent   12              49.82
4     Transparent   8               48.19
5     Transparent   6               51.33
6     Transparent   12              51.88
7     Black         8               50.98
8     Black         6               50.19
9     Black         12              50.00
10    Black         6               50.84
11    Black         8               49.66
12    Black         12              51.56

Table 5.3: Test set used to determine the human variance in measuring.

The results are shown in Figure 5.2. In a first experiment, the variance of a single person was investigated, denoted as the intra-human variance. Each tube in the test set was measured 10 times by the same person with the goal of being as precise as possible.


[Figure 5.2: Intra- and inter-human variance for the test set in Table 5.3 under ideal laboratory conditions. The error bars indicate the maximum and minimum length for each of the 12 tubes as well as the mean value of the measurements, once for one person (a) and once for 10 persons (b). The average inter-human variance is slightly larger compared to the intra-human variance.]

The error bars indicate the maximum and minimum length as well as the mean value of all measurements. The computed mean standard deviation is 0.078mm.

In a second experiment, the inter-human variance was determined. For this, 10 persons were asked to measure the same test set, again as precisely as possible. The inter-human variance is slightly larger than the intra-human variance (see Figure 5.2(b)). In this case, the mean standard deviation was observed to be 0.083mm.

Furthermore, it is important to state that the manual measurements for the ground truth data have been acquired very carefully, with elevated concentration, under laboratory conditions, and with the aim of being as precise as possible using the digital measuring slide (see Figure 5.1). Fewer than 5 tubes can be measured within one minute at this precision. At production, the sample measurements are performed with a standard sliding caliper and at a much higher rate. There is a definite tradeoff between accuracy and speed. The expected individual measuring error at production is much larger. Furthermore, factors like tiredness or distraction can significantly increase the inter- and intra-human measuring variance.

The accuracy and precision of the visual inspection system, however, should be evaluated with respect to the maximum possible accuracy humans can reach with the given measuring slide under ideal conditions. Throughout this thesis, manual ground truth measurements always refer to the ideal, laboratory-condition measurements. One has to keep in mind that there is still a certain uncertainty in these measurements. The real absolute length of a tube cannot be determined exactly.

For the following experiments, all tubes have been measured three times to reduce the influence of the human variance. The mean of the three measurements is taken as the ground truth reference. All measurements are stored in a database, and each measured tube is labeled by hand with a four-digit ID using a white touch-up pen.


[Figure 5.3: At velocities > 30m/min, larger sequences of tubes with a small spacing have to be placed on the conveyor using a special supply tube.]

5.1.4. Strategies

Online vs Offline Inspection
There are two main strategies for the evaluation of the inspection system. The first strategy analyzes the tubes online, i.e. in real-time on the conveyor. This includes tube localization, tracking, and measuring as well as the good/bad classification. The results are stored in a file and can be further processed or visualized afterward. This is closely related to the application at production. The drawback of this approach is that if some interesting or strange behavior is observed in the resulting data, it is difficult to localize its origin.

Therefore, the second evaluation strategy is based on an offline inspection. This means a sequence of tubes is first captured into a file at the maximum frame rate that can be processed online. Then, the sequence can be analyzed repeatedly with different sets of parameters or methods. This is a significant advantage if one wants to compare different techniques or parameter settings.

In the following experiments, both strategies will be applied.

Tube Placement
The prototype setup in the laboratory has one significant drawback. The tubes to be inspected have to be added to the conveyor manually, since there is no second conveyor from which the tubes fall onto it continuously, as in production. The size of the conveyor allows for about 21 tubes of 50mm length with a spacing of 10mm in between. If all tubes are placed on the inactive conveyor, it takes some time until the desired velocity is reached. Therefore, at faster velocities, the first tubes pass the measuring area at a slower velocity, leading to unequal conditions between measurements.

Hence, either fewer tubes have to be placed on the conveyor (starting further away from the measuring area) or the tubes have to be placed onto the conveyor while it is running at the desired velocity. The latter is hardly possible for a human without producing large spacings between two consecutive tubes. Instead, a supply tube of about 1.30m length, with a diameter slightly larger than the current tube diameter, can be used as a magazine for about 25 tubes of 50mm length (see Figure 5.3). The supply tube is placed at a steep angle at the front of the conveyor (in moving direction). If the conveyor is not moving, the tubes are blocked and cannot leave the supply tube. If, on the other hand, the conveyor is moving, the bottom tube is gripped by the belt and can leave the supply tube through a bevel opening in moving direction. If the velocity of the conveyor is fast enough, the time until the next tube in the supply tube is gripped by the belt is sufficient to produce a spacing. Experiments have shown that the supply tube works only for velocities > 30m/min. Otherwise, it is possible that two consecutive tubes are not separated.

Thus, one has two alternative methods to fill the conveyor with tubes: one works well for lower velocities, the other for faster ones. In both cases, the maximum number of tubes is limited. Therefore, larger experiments have to be partitioned over several sequences.

Test Data
Since it is not worthwhile to manually measure thousands of tubes as ground truth references, the number of tubes that can be compared to such reference lengths is limited. However, it is possible to increase the number of ground truth comparisons by repeating the automated visual measurement of a manually measured tube. For example, one can manually measure 20 tubes of each particular type (the number that can be placed onto the conveyor or into the supply tube at one time) and repeat the automated inspection several times. From the algorithmic perspective, the system is confronted with a new situation every time, regardless of whether there are 100 different tubes to be inspected or 5 × 20.

In the following, a distinction is made between tubes of a length that meets the given target length within the allowed tolerance and tubes of manipulated length falling outside this tolerance. The system must be able to separate the manipulated tubes from the proper ones.

5.2. Test Scenarios

Eight test scenarios have been developed to evaluate the system. In each scenario, only one parameter is varied while the others are kept constant. The different scenarios are introduced in the following.

Noise
Before the system is tested on real data, the accuracy and precision of the measuring approach are evaluated on synthetic images. A rectangle of known pixel size simulates the projection of an ideal tube that is not deformed by perspective. The 'tube edges' as well as the measuring points are detected with subpixel precision as in real images. The resulting length in pixels must equal the rectangle width. To evaluate the accuracy under the presence of noise, Gaussian noise of different standard deviations is added systematically to the sequences.

Minimum Tube Spacing
In this scenario, the minimum spacing between tubes is investigated on real images, both for black and transparent tubes. The test objects have a size of about 50mm within the allowed tolerance and a diameter of 8mm. The velocity of the belt is 30m/min. Starting with sequences that allow for only one tube in the visual field, i.e. the spacing is larger than the tube length, the spacing is decreased until the detection rate Ωtotal falls below 1, i.e. at least one tube could not be detected.

Conveyor Velocity
The goal of this scenario is to investigate how the accuracy and precision of the measurements depend on the velocity of the conveyor. The focus is on four different velocities: slow (10m/min), medium (20m/min), fast (30m/min), and very fast (40m/min), the maximum velocity that can be reached at production. Currently, the production line runs at approximately 20m/min. To test the limits of the system, even higher velocities of up to 55m/min are tested. For all velocities > 30m/min, the tubes have to be placed onto the conveyor using the supply tube.

Again, the inspected tube size is about 50mm in length within the allowed tolerance with a diameter of 8mm, both for black and transparent tubes. The spacing between the tubes must be large enough, following the results of the minimum tube spacing experiments.

In this scenario, all evaluation criteria introduced in Section 5.1.2 are considered, including a comparison to ground truth measurements. The evaluation is performed offline.

Tube Diameter If the distance between camera and conveyor belt does not change, the diameter of a tube influences the distance between the measuring plane ΠM (see Section 4.2) and the image plane. Tubes with a smaller diameter are further away and appear smaller in the image, while tubes with a larger diameter are magnified in the image. Thus, the calibration factor that relates a pixel length to a real-world length in mm has to be adapted, as sketched below.
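The following sketch illustrates how such an adaptation could look under simple pinhole reasoning: the scale grows linearly with the distance between camera and measuring plane. The thesis only states that the factor must be adapted, so the function and all numeric values here are illustrative assumptions.

```python
def adapted_fpix2mm(f_ref, z_ref_mm, z_belt_mm, diameter_mm):
    """Adapt the pixel-to-mm calibration factor to a tube diameter.

    f_ref is the factor calibrated for the reference measuring plane at
    distance z_ref_mm from the camera; the new plane sits on top of the
    tube, i.e. at the camera-belt distance minus the tube diameter.
    """
    z_new = z_belt_mm - diameter_mm       # measuring plane on top of the tube
    return f_ref * z_new / z_ref_mm

# Example: factor calibrated on 8mm tubes, adapted to 6mm and 12mm tubes
# (0.12 mm/pixel and a 300mm camera-belt distance are illustrative values).
f_8mm = 0.12
z_8mm = 300.0 - 8.0
print(adapted_fpix2mm(f_8mm, z_8mm, 300.0, 6.0))    # larger factor: plane farther away
print(adapted_fpix2mm(f_8mm, z_8mm, 300.0, 12.0))   # smaller factor: plane closer
```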

The test data includes transparent and black tubes with diameters of 6, 8, and 12mm and a length of 50mm that meet the allowed tolerances. The conveyor velocity is constant at 30m/min. Again, all evaluation criteria are considered and the evaluation is performed offline.

Repeatability In this scenario, a tube of known size is measured many times in a row at a constant velocity of 30m/min. Theoretically, the system should measure the same length each time, since one can assume the length of the tube does not change throughout the experiments. As mentioned before, there are several parameters that can influence the repeatability in practice, such as a varying background.

In the same experiment one can not only determine the repeatability, i.e. the precision of the system, but also the accuracy, if one uses an ideal tube gage instead of a heat shrink tube. Such a gage can be made from metal with much higher precision, overcoming the human variance in measuring deformable heat shrink tubes. For comparable results, the gage should have the same shape and dimensions as a heat shrink tube. Since it does not transmit light, a metallic gage can simulate black tubes only.

The real-world length of the gage is known very accurately and precisely. Thus, the RMSE of the measuring results becomes almost independent of errors in the ground truth data.

The measurements are best performed online, i.e. in real-time, due to the amount of accumulating data. The resulting lengths are stored in a file for later evaluation.

Outlier Detection Until now, all experiments have been based on test data that is known to meet the given tolerances. In this scenario, tubes of approximately 50mm length are mixed with tubes that are too long or too short, i.e. that differ from the target length by more than 0.7mm. The position and the number of the outliers in a sequence are known. The system must be able to detect the outliers correctly. Thus, the false positive and false negative rates are the main criteria of interest in this scenario; a sketch of the underlying tolerance test is given below.

The evaluation can be performed both offline and online.
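The decision itself is a simple tolerance test. The following minimal sketch shows the classification and the resulting error rates; the normalization over all inspected tubes matches how the rates are reported in Section 5.3.6, everything else (names, signatures) is illustrative.

```python
def meets_tolerance(length_mm, target_mm=50.0, tol_mm=0.7):
    """Accept a tube whose measured length lies within the tolerance band."""
    return abs(length_mm - target_mm) <= tol_mm

def error_rates(lengths_mm, is_outlier):
    """False positive rate (outliers accepted) and false negative rate
    (good tubes rejected), relative to the total number of inspected tubes,
    given ground truth outlier labels."""
    fp = sum(meets_tolerance(l) for l, o in zip(lengths_mm, is_outlier) if o)
    fn = sum(not meets_tolerance(l) for l, o in zip(lengths_mm, is_outlier) if not o)
    n = len(lengths_mm)
    return fp / n, fn / n
```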

Tube Length As mentioned before, the focus in this thesis is set on tubes of 50mm length. In addition, it is shown that the system is also able to measure tubes of different lengths, exemplarily for tubes of 30 and 70mm length.



The tolerances for these lengths differ, i.e. the 30mm tubes are allowed to deviate only up to 0.5mm around the target length, while 70mm tubes have a larger tolerance of 1mm. The measuring precision can be directly linked to these tolerances. Accordingly, the system must measure smaller tubes with a higher precision than larger ones.

In this scenario, the accuracy and precision are evaluated based on the mean and standard deviation of a sequence of tubes, measured online, that approximately meet the given target length. Corresponding ground truth data is available.

Performance Finally, it is of interest to determine the performance of the system in terms of the average per-frame processing time ΩTIME. It is investigated how the total processing time is distributed over the different stages of the inspection, including radial distortion compensation, profile analysis, edge detection and template matching, as well as the total length computation and tracking.

5.3. Experimental Results

In this section the experimental results of the different scenarios are presented and discussed. Further discussion as well as an outlook on future work is given in Section 5.4.

5.3.1. Noise

The influence of noise on the measuring accuracy is tested on synthetic sequences. Rectangles of 200 pixels width are placed on a uniform background with a contrast of 70 gray levels between the object and the brighter background. The image size is 780 × 160, and the sequence is analyzed like a real sequence, with two differences. First, the perspective correction function is disabled, since the synthetic 'tube' is not influenced by perspective, i.e. the width of the rectangle is constant, independent of the image position. Furthermore, the dynamic selection of template curvatures based on the image position is not applicable in this scenario either, since the model knowledge assumptions do not hold. Thus, in this experiment all templates are tested at each position (computation time is not critical here).

Gaussian noise of standard deviation σN has been added to the ideal images, with σN ∈ {5, 10, 25}. Sample images of each noise level are shown in Figure 5.4(a)-(d).

The measuring results are evaluated using the root mean square error (RMSE) between the ground truth length of 200 pixels and the results of the single measurements; a sketch of this evaluation follows below. The results show that in the ideal (noise-free) case, the pixel length is always measured correctly. In the presence of noise, the measured length varies at subpixel level. Figure 5.4(e) shows how the measurements differ in accuracy and precision in the presence of noise. The maximum deviation from the target length occurs at the largest standard deviation (σN = 25). The RMSE results can be found in Figure 5.4(f). For sequences with only a small amount of noise (σN = 5) the RMSE is acceptably low at 0.122 pixels. If one pixel represents 0.12mm in the measuring plane, the real-world error is about 1/100mm. Even under strong noise (σN = 25), which is far beyond the noise level of real images, the measuring error is 0.252 pixels, or 0.03mm in the example. This is still significantly below the human measuring variance.
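A minimal sketch of this evaluation, including the conversion of the pixel error into a real-world error with the scale quoted above (the sample measurement values are illustrative):

```python
import numpy as np

def rmse(measured_px, true_px=200.0):
    """Root mean square error of single measurements against the known length."""
    m = np.asarray(measured_px, dtype=float)
    return float(np.sqrt(np.mean((m - true_px) ** 2)))

print(rmse([200.1, 199.8, 200.3, 199.9]))   # RMSE in pixels (sample values)
print(0.252 * 0.12)                         # approx. 0.03mm at sigma_N = 25
```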



Figure 5.4: Accuracy evaluation of length measurements on synthetic sequences under the influence of noise. (a)-(d) Rectangles of known size (length = 200 pixels) simulate a tube on a uniform background without perspective effects; Gaussian noise of different standard deviations σN ∈ {5, 10, 25} has been added to the ideal images. (e) Gaussian distribution of the measurements for std = 0, 5, 10, 25 over the length range 199-201 pixels. (f) Root mean square error (RMSE) for each noise level: σN = 0: 0; σN = 5: 0.122; σN = 10: 0.158; σN = 25: 0.252.



Figure 5.5: Detection rate of black and transparent tubes depending on the spacing between consecutive tubes.

Thus, one can conclude that the system is able to detect the synthetic tube edges very accurately, even in the presence of noise, if there is a sufficient contrast between background and foreground.

5.3.2. Minimum Tube Spacing

10 black and 10 transparent tubes are used to investigate the influence of the spacing on the detection rate. The tubes have been placed on the conveyor at an approximately constant spacing. Five gap sizes are tested: 60, 30, 20, 10, and 5mm. Each load of tubes passes the measuring area five times for each gap size at a conveyor velocity of 30m/min. In this experiment only the total detection rate Ωtotal is considered, i.e. how many tubes are detected by the system at least once. The results are averaged over the 5 iterations.

As can be seen in Figure 5.5, the detection of black tubes is uncritical, indicated by Ωtotal = 1, until the tube spacing is less than 10mm. This means no black tube can pass the measuring area without being measured if the spacing is ≥ 10mm. The decrease at 5mm gaps to Ωtotal = 0.98 (i.e. 1 tube out of 50 is not detected) may be due to the fact that the manual tube placing cannot guarantee an exact spacing of 5mm. It is likely that the distance between two tubes has become even smaller, leading to the failure. Since the tests have been performed online, it is not possible to locate the origin of the outlier. Therefore, it has been investigated how small the gap between two black tubes must be until the profile analysis fails to locate the tube; a strongly simplified sketch of such a gap search is given after this paragraph. The results are shown in Figure 5.6. Even a spacing of about 2mm as in (a) is large enough to reliably detect the background regions between the tubes, as can also be seen in the corresponding profile analysis results in (c). A gap of about 1mm, however, is too small even for black tubes. Due to perspective, the points closer to the camera merge (see Figure 5.6(b) and (d)).
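The following sketch illustrates the basic idea of finding background gaps in a 1D column-intensity profile. It is a deliberate simplification: the actual profile analysis (Chapter 4) additionally uses local medians, regional means, and segment boundaries, and the window size and threshold fraction here are illustrative.

```python
import numpy as np

def background_runs(profile, win=5, frac=0.85):
    """Find candidate gaps between dark tubes on a 1D intensity profile.

    A column counts as background where the smoothed intensity exceeds a
    fraction of the profile mean (the belt shines through between tubes);
    contiguous background runs are gap candidates, returned as
    (start, end) index pairs.
    """
    p = np.convolve(np.asarray(profile, float), np.ones(win) / win, mode="same")
    bg = p > frac * p.mean()
    d = np.diff(bg.astype(int))
    starts = np.flatnonzero(d == 1) + 1
    ends = np.flatnonzero(d == -1) + 1
    if bg[0]:
        starts = np.r_[0, starts]
    if bg[-1]:
        ends = np.r_[ends, bg.size]
    return list(zip(starts.tolist(), ends.tolist()))
```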

The transparent tubes show a detection rate of < 1 even for the largest tested gap size of 60mm. This can be explained by the much lower contrast to the background. If the system must be able to overcome a strongly non-uniform background brightness, one has to make a larger compromise in terms of detection sensitivity. As it turns out, there is no parameter setting that can guarantee that all tubes are detected independent of the gap



Figure 5.6: Minimum tube spacing for black tubes. (a) A spacing of about 2mm is still sufficient to locate the measurable tube correctly. (b) The detection fails if the two tubes appear to touch under perspective, as on the left side. (c) Profile analysis of (a). (d) Profile analysis of (b). The profile plots show the smoothed profile together with segment boundaries, local median, global mean, regional mean, and predicted tube boundaries.

size. However, the results have shown that the detection rate decreases drastically below 10mm (see Figure 5.5).

As a result of these experiments, the minimum spacing used in the following experiments is 10mm for black tubes and 20mm for transparent tubes.

5.3.3. Conveyor Velocity

The test data in this scenario includes 17 transparent and 21 black tubes of 50mm length and 8mm diameter. Manual ground truth measurements of these tubes are available. The number of tubes of each color is geared to the number of tubes that can be placed on the conveyor with a sufficient spacing. To increase the probability of a 100% detection rate, the spacing between two transparent tubes has to be larger than for black tubes. Each charge of tubes is measured 5-6 times at each velocity of 10, 20, 30, and 40m/min to yield a total number of > 100 measurements (based on even more single measurements) in each experiment. Thus, all tubes have to pass the measuring area many times.

Before presenting the results in detail, Figure 5.7 shows an example of how the system has measured (a) the charge of black tubes and (b) the charge of transparent tubes at 20m/min. Both the single measurements per tube (indicated by the crosses) as well as the computed total length and the corresponding ground truth length are visualized. The lengths measured by the system are quite close to the ground truth data.

These results are just an example to show what kind of data is evaluated in the following. Since it is not possible to visualize longer sequences as detailed as in Figure 5.7 due to the



Figure 5.7: Measuring results at 20m/min for (a) 21 black and (b) 17 transparent tubes. The red crosses indicate single measurements, while the dashed vertical lines represent the boundaries between measurements belonging to the same tube. The averaged total length as well as the corresponding ground truth length are also shown in the plots, together with the upper and lower tolerance. All measured tubes of this sequence meet the tolerances. However, while the transparent tubes have approximately the target length of 50mm on average, the mean of the black tubes is slightly shifted, i.e. all tubes tend to be shorter than the target length.



v [m/min] | Ωtotal | ΩPTM | σtube | GTDmin | GTDmax | GTD (mean) | RMSE
10 | 1 | 11.4 | 0.05 | -0.12 | 0.14 | 0.01 | 0.07
20 | 1 | 6.9 | 0.04 | -0.16 | 0.11 | -0.02 | 0.07
30 | 1 | 4.6 | 0.05 | -0.19 | 0.19 | 0.0 | 0.07
40 | 1 | 3.2 | 0.07 | -0.21 | 0.17 | -0.01 | 0.09
55 | 1 | 2.3 | 0.07 | -0.16 | 0.16 | 0.01 | 0.08

Table 5.4: Evaluation results at different conveyor velocities v for black tubes (50mm length, ∅8mm). The accuracy of the measurements, indicated by the RMSE, does not decrease significantly with faster velocities nor with a decreasing number of per tube measurements ΩPTM. σtube is the per tube standard deviation and GTD stands for ground truth distance (see Section 5.1.2).

amount of data, more comprehensive representations will be used, based on the proposed evaluation criteria.

Black Tubes The results of the velocity experiments with black tubes are summarized in Table 5.4.

The black tubes show a detection rate Ωtotal of 1 for all velocities, i.e. no tube has passed the measuring area without being measured, independent of how fast the tubes are moved. The average number of per tube measurements ΩPTM decreases from 11.4 at the slowest velocity (10m/min) to 3.2 at the maximum possible production velocity. Even at 55m/min each tube is measured at least twice. The average standard deviation σtube of the measurements per tube ranges from 0.04 to 0.07mm; again, there is only a very small rise from the slower to the faster velocities. The absolute ground truth distance does not exceed 0.21mm, and measurements that are shorter or longer than the ground truth are equally distributed, as indicated by the mean ground truth distance GTD, which is approximately zero.

As an example, the ground truth distance at 30m/min is shown in Figure 5.8(a). If the distance is larger than 0, the manually measured length is shorter than the vision-based measurement, and vice versa. Due to the variance in the ground truth data it is not very likely that the distance is zero for all values. However, the distance should be as small as possible. If the ground truth distance is one-sided, i.e. all measurements of the system are longer or shorter than the corresponding ground truth measurement, this indicates an imprecise calibration factor. The conversion of the pixel length into a real-world length then results in a systematic error, which has to be compensated by adapting the calibration factor, as sketched below.
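A sketch of how such a one-sided bias could be detected and compensated. The correction rule is an illustrative assumption, not a procedure from the thesis:

```python
import numpy as np

def gtd_statistics(vision_mm, manual_mm):
    """Ground truth distances and their mean, minimum, and maximum; a
    clearly one-sided mean points to a systematic scale (calibration) error."""
    gtd = np.asarray(vision_mm, float) - np.asarray(manual_mm, float)
    return float(gtd.mean()), float(gtd.min()), float(gtd.max())

def rescaled_factor(f_old, vision_mm, manual_mm):
    """Illustrative correction: rescale the pixel-to-mm factor so the mean
    vision-based length matches the mean manual length."""
    return f_old * np.mean(manual_mm) / np.mean(vision_mm)
```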

The RMSE differs only marginally between the tested velocities. The largest RMSE is computed at 40m/min with 0.09. This value is only slightly larger than the deviation of human measurements. For lower velocities it is even better with 0.07. Another indicator of how the vision-based measurements converge to the ground truth data is the Gaussian distribution over the sequence of all measurements. This distribution is based on the mean µseq and standard deviation σseq (see Section 5.1.2). Figure 5.8(b) compares the vision-based distribution (solid line) at 30m/min with the corresponding ground truth distribution (dashed line). The mean is 49.66 in both cases. σseq is slightly larger with 0.1193, compared to the ground truth with 0.1027.

In terms of accuracy and precision, this means the vision-based measurements of black tubes are as accurate as human measurements (laboratory conditions) and



Figure 5.8: (a) Ground truth distance GTD in mm for black tubes (50mm length, ∅8mm) at 30m/min. (b) Gaussian distribution of all measurements compared to the ground truth distribution.

v [m/min] | Ωtotal | ΩPTM | σtube | GTDmin | GTDmax | GTD (mean) | RMSE
10 | 0.99 | 9.6 | 0.06 | -0.14 | 0.32 | 0.09 | 0.13
20 | 0.98 | 5.2 | 0.09 | -0.16 | 0.29 | 0.08 | 0.11
30 | 1 | 3.9 | 0.15 | -0.16 | 0.66 | 0.15 | 0.20
40 | 0.97 | 2.4 | 0.18 | -0.27 | 0.75 | 0.23 | 0.28

Table 5.5: Evaluation results at different conveyor velocities v for transparent tubes (50mm length, ∅8mm). The accuracy seems to decrease with faster velocities, as can be seen from the RMSE and the mean per tube standard deviation σtube. The number of per tube measurements ΩPTM is smaller for transparent tubes: due to the lower contrast it is more likely that a tube is not detected as measurable.

are only marginally less precise. Furthermore, as an additional benefit, it is possible to show that a sequence of tubes is systematically shorter than the target length (although still within the tolerances). This information could be used to adjust the cutting machine until µseq approximates the given target length.

Transparent Tubes The same experiments have been repeated with transparent tubes. The results are summarized in Table 5.5.

The detection rate Ωtotal tends to decrease with an increasing velocity, although all tubes have been detected at 30m/min in this experiment. 3% of the tubes have passed the visual field of the camera without being measured at 40m/min.

Due to the poorer contrast of transparent tubes, the probability increases that a tube cannot be located in the profile analysis step. This can be seen from the average number of per tube measurements ΩPTM. While black tubes are measured about 11.4 times at v = 10m/min, the transparent tubes reach only 9.6 measurements per tube at the same velocity. At 40m/min this number decreases to 2.4. At faster velocities, e.g. 55m/min, the number of per tube measurements falls short of 1. Reliable measurements are not



Figure 5.9: (a) Ground truth distance GTD in mm for transparent tubes (50mm length, ∅8mm) at 30m/min. The measurements marked by a '+' all belong to the same tube, which reached the maximum GTD at measurement 68. As one can see, it is not systematically measured wrongly. A poor-contrast region on the conveyor belt is rather the origin of the strong deviations from the ground truth. (b) Gaussian distribution of all measurements compared to the ground truth distribution.

possible at this velocity for transparent tubes so far and are therefore not considered in Table 5.5.

The standard deviation σtube of transparent tubes moved at 40m/min is three times larger than at 10m/min. This can be explained by the smaller number of per tube measurements. The ground truth distance also increases with the velocity. Especially GTD becomes conspicuously larger, i.e. the measured lengths are on average larger than the ground truth length. This trend can also be observed in the absolute values of GTDmax and GTDmin. At a velocity of 40m/min the maximum ground truth distance is 0.75mm, which is more than the allowed tolerance. In this context one has to keep in mind that these values are only the extrema and do not describe the average distribution. This makes the ground truth distance measure very sensitive to outliers. However, a large GTD value does not automatically imply poor accuracy. On the other hand, if the ground truth distance is low in the extrema, as with the black tubes in this experiment, this is an additional indicator of high accuracy. The ground truth distance of the transparent tubes at 30m/min is shown in Figure 5.9(a). The deviations are significantly larger compared to Figure 5.8(a).

Instead of being approximately equally distributed as for the black tubes, the error of transparent tubes seems to increase and decrease randomly, but always over a range of consecutive measurements. This observation can be explained by the varying background intensity under back light through the conveyor belt. The periodic intensity changes obviously influence the transparent tubes much more strongly than the black tubes, since the detection quality depends mostly on the image contrast. Figure 5.10 shows how the mean image intensity of a moving empty conveyor belt changes over time. If a tube is measured at a part of the conveyor belt that yields a poor contrast under back light, the GTD is likely to increase. Keeping in mind that each tube passes the measuring area 6 times in this experiment, the probability is small that it is always measured at the same position on the conveyor. The tube measured with the maximum GTD has been marked in the plot



Figure 5.10: Mean image intensity (gray level) of a moving empty conveyor belt over time. The deviation between the brightest and the darkest region on the conveyor exceeds 40 gray levels and originates in the non-uniform translucency characteristics of the belt. Example images showing this non-uniformity can be found in Figure 3.4.

as well as all other measurements belonging to this particular tube. It turns out that the average ground truth distance of this tube is 0.3mm, which is still larger than the RMSE of the whole sequence due to the outliers. However, it is shown that this tube is not measured wrongly in general. Furthermore, one can see that all neighboring tubes that lie in the same region on the conveyor are also measured inaccurately. It is assumed that with a more uniform conveyor belt such deviations could be avoided.

The mean over all measurements is 50.04 at 30m/min, compared to 49.96 in the ground truth. This is still very accurate. The precision of the vision-based measurements is 0.15, compared to 0.09 for human measurements under ideal laboratory conditions. The corresponding Gaussian distributions are plotted in Figure 5.9(b).

Finally, the RMSE increases with faster velocities, and the total error is larger compared to black tubes. The lowest error was measured at 20m/min (approximately the current production velocity) with 0.11. This error is still only slightly larger than the human variance.

One can conclude that the results of the black tubes are very accurate both for slow and fast conveyor velocities. The RMSE even falls below the standard deviation of human measurements. The accuracy of transparent tubes decreases with faster velocities, but is still in a range that allows for measurements with the given tolerance specifications. The best results have been achieved at a velocity of 20m/min. As it turns out, all tubes meeting the tolerances in the real world (based on manual ground truth data) have also been reliably measured to be within the tolerances by the system, i.e. ΩFN = 0. Thus, no tube would have been blown out wrongly at any velocity.



Diameter | Ωtotal | ΩPTM | σtube | GTDmin | GTDmax | GTD (mean) | RMSE
6mm (B) | 1 | 4.8 | 0.05 | -0.40 | 0.29 | -0.13 | 0.18
8mm (B) | 1 | 4.6 | 0.05 | -0.19 | 0.19 | 0.0 | 0.07
12mm (B) | 1 | 4.6 | 0.07 | -0.44 | 0.31 | -0.11 | 0.19
6mm (T) | 0.92 | 2.8 | 0.18 | -1.15 | 0.87 | 0.01 | 0.20
8mm (T) | 1 | 3.9 | 0.15 | -0.16 | 0.66 | 0.15 | 0.20
12mm (T) | 0.98 | 3.12 | 0.24 | -0.69 | 0.67 | 0.07 | 0.20

Table 5.6: Measuring results of 50mm length tubes with different diameters at a velocity of 30m/min. The first three rows show black tubes (B) and the last three rows transparent (T) ones.

Figure 5.11: The thin 6mm tubes are likely to be bent. The distance between the defined measuring points in the image does not represent the length of the straight tube correctly.

5.3.4. Tube Diameter

Besides the tubes of 8mm diameter investigated in the velocity experiments, there are also 6 and 12mm diameter tubes to be considered in the DERAY-SPLICEMELT series. Therefore, the test data in this scenario includes transparent and black tubes of 50mm length with these diameters. The velocity is constant at 30m/min. Again, more than 100 tubes are measured for each combination of color and diameter. The summarized evaluation results can be found in Table 5.6.

Black Tubes As for the 8mm tubes, 100% of the black tubes of both 6 and 12mm diameter are measured by the system, indicated by a score of Ωtotal = 1. The number of per tube measurements is also approximately equal, with 4.8 for 6mm diameter tubes and 4.6 for 12mm tubes. The per tube standard deviation σtube is slightly larger for 12mm with 0.07, compared to 0.05 for the 6 and 8mm tubes. One significant difference to the 8mm tubes are the larger extrema in the ground truth distances GTDmin and GTDmax and the definite shift in the average ground truth distance GTD. Values of −0.13 for 6mm and −0.11 for 12mm indicate that the vision-based lengths are mostly shorter than the manual measurements.

This has basically two different origins. Tubes with a diameter of 6mm are bent much more strongly than tubes of larger diameters, as can be seen for example in Figure 5.11. In this case, both manual as well as vision-based measurements are difficult. The length of a tube in the image is defined as the distance between the left and right end of the tube at the outermost points of the corresponding edges. If the tube is bent, however, the distance between the measuring points is obviously smaller than the real length. This can be seen in the ground truth distance as well as in the resulting RMSE, which is significantly larger with 0.18 compared to the 8mm tubes. Figure 5.12(a) visualizes the results of a sequence of 21 black tubes at 30m/min. The bent tube in Figure 5.11 corresponds to the 10th tube in this plot (located between measurement numbers 45 and 50) and is measured significantly



Figure 5.12: Length measurement results of black tubes with different diameters ((a) 6mm, (b) 12mm) at 30m/min. The plots show only a section of the total number of measured tubes. Although the RMSE is larger for both 6 and 12mm tubes compared to the 8mm results, the measurements are still accurate enough to correctly detect all tubes within the allowed tolerances.

shorter than the ground truth. The total results of the experiment with 6mm diameter black tubes are shown in terms of the ground truth distance in Figure 5.13(a).

Only a few tubes are measured too long, while most measurements are shorter than the ground truth, depending on how much a tube is bent, i.e. how much it differs from the assumed straight tube model. However, all 100 tubes are measured correctly to lie within the allowed tolerances, leading to a false negative rate of ΩFN = 0 (ΩFP = 0 is implicit, since there are no outliers in the test data).

While bending is no problem for black tubes with a diameter of 12mm, these tubes have another drawback. The larger diameter makes the tubes more susceptible to deformations of the circular cross-section shape. This means only a little pressure is needed to deform the cross-section of a tube into an ellipse. These deformations occur if the tubes are stored, for example, in a bag or box and many tubes lie on top of each other. The tubes used as the test set have been delivered in such a way. In addition, the effect is increased since most tubes are grabbed by hand several times, e.g. to measure the ground truth length or if experiments have been repeated with the same tubes. Each manual handling is a potential source of deformation. With respect to the vision-based measuring results, the elliptical cross-section of a tube leads to a significant problem. In the model assumptions, the measuring plane ΠM is defined at a certain distance above the conveyor belt. This distance is assumed to be exactly the outer diameter of an ideal circular tube (see Figure 5.14(a)). The magnification factor that relates a pixel length to a real-world length is valid only in the measuring plane. With a weak-perspective camera model it is assumed that this factor is also valid within a certain range of depth around this plane.

For a deformed tube, the measuring points in the image pL and pR do not originate from points that lie in the measuring plane. If the cross-section is elliptical, it is most likely that the tube will automatically roll onto its largest contact area. In this case, the points closest to the camera will be further away than the measuring plane. Under perspective, the resulting length in the image will be shorter. This is exactly what is observed in



Figure 5.13: Ground truth distance in mm of all measured black tubes with a diameter of (a) 6mm and (b) 12mm at 30m/min.

the experiments. Although it is less likely, it is also possible that a tube lies on the side with the smaller contact area. This happens, for example, if the tube is leaned against a guide bar. This results in measuring points above the measuring plane, leading to a larger length in the image; both effects can be quantified with the pinhole sketch below.
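The magnitude of these effects can be estimated with the pinhole relation behind the weak-perspective model; the working distance used here is an illustrative assumption.

```python
def apparent_length_mm(true_len_mm, z_plane_mm, delta_mm):
    """Pinhole estimate of the measured length when the tube's highest
    points lie delta_mm beyond the measuring plane (positive delta: tube
    flattened and farther from the camera, appears shorter; negative:
    tube on its narrow side, closer to the camera, appears longer)."""
    return true_len_mm * z_plane_mm / (z_plane_mm + delta_mm)

# A 50mm tube flattened by 1mm at an assumed 300mm working distance:
print(apparent_length_mm(50.0, 300.0, 1.0))    # ~49.83mm, i.e. a 0.17mm error
print(apparent_length_mm(50.0, 300.0, -1.0))   # ~50.17mm for the narrow side
```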

Figure 5.12(b) shows a section of 21 black tubes with a diameter of 12mm, measured at 30m/min. The larger distance to the ground truth data is clearly visible. However, the system is again able to reliably detect all tubes correctly within the tolerances, without any false negatives (ΩFN = 0). As an example of how the deformation of a tube influences the measuring results, images of the 7th and the 11th tube² of this sequence are shown in Figure 5.14(b) and (c) respectively. The vertical extension of tube No. 7 is definitely smaller than for the neighboring tubes. This is an indicator that the tube is deformed and lies on its smaller side; thus, it is measured longer than it actually is. On the other hand, tube No. 11 is larger in its vertical extension, indicating it is lying on its larger contact area. The result is a much shorter length measured by the vision-based system, which can also be seen in Figure 5.13(b). As for the 6mm tubes, the measurements are mostly shorter compared to the ground truth, although the origin is different, as introduced above.

These results show the accuracy limits of the weak-perspective model. If higher accuracy is needed, a telecentric lens could be used to overcome the perspective effects of different depths, or the height of a tube in the image could be exploited to adapt the calibration factor fpix2mm dynamically.

Transparent Tubes The experiments with different diameters have been repeated with transparent tubes. Only 92% of all transparent tubes with a diameter of 6mm are detected and measured by the system in this experiment. This is mainly due to the non-uniform translucency of the conveyor belt. Especially the thin 6mm tubes are very sensitive to

² Note: The tube number does not correspond to the (single) measurement number. The dashed lines indicate which measurements belong to the same tube.



Figure 5.14: (a) Idealized cross-section of deformed tubes (frontal view). The measuring plane ΠM is defined based on an ideal circular tube (center). Deviations denoted as ∆1 (left) and ∆2 (right) influence the length measurement in the image projection. (b) Example of a deformed tube (No. 7 in Figure 5.12(b)) lying on its smaller side. The measuring points are closer to the camera and, due to perspective, the tube appears measurably larger in the image. (c) The opposite effect occurs if a deformed tube (No. 11 in Figure 5.12(b)) lies on its larger contact area.

changes in brightness, since they are more translucent than the 8mm and 12mm tubes. At regions of the conveyor belt that transmit more light, the thin tubes almost disappear. Thus, one has to reduce the intensity of the light source. This is a tradeoff, because other regions that transmit less light become even darker, while the structure of the belt is emphasized. If the contrast is too low, the tube cannot be located in the profile. This problem could be prevented by using a more homogeneously translucent conveyor belt.

The 12mm diameter tubes generally yield a better contrast, which can be seen from the detection rate of 98%. The number of per tube measurements ΩPTM is 3.12, compared to 2.8 for the 6mm tubes. However, the average standard deviation is larger for the 12mm tubes with 0.24. An RMSE of 0.2 for both 6 and 12mm transparent tubes indicates that the measuring results are almost as accurate as for black tubes of the same diameter, although the extrema are significantly larger. As already mentioned, these values can be influenced by a few outliers. The values of GTD show a much more uniform distribution of the deviations compared to black tubes. This is due to the fact that transparent tubes are more sensitive to strong background edges, which can be wrongly detected as tube edges. Figure 5.17 gives an example of how the system can fail, leading to a larger measured length. The poor contrast at the tube boundary cannot be compensated by the stronger responses at the tube edge ends. The maximum correlation score is reached at the background edge. This problem does not occur with black tubes due to the stronger contrast.

Thus, in addition to the problems described for black tubes of 6 and 12mm diameter, transparent tubes may be measured longer than they really are. Figure 5.15 visualizes the experimental results with different diameters of transparent tubes. Again, this is only a section of the total number of measurements, which are summarized more comprehensively in Figure 5.16 based on the ground truth distance. Compared to the experiments with black tubes, there have been false negatives among the transparent tubes, i.e. tubes have



Figure 5.15: Experimental results of transparent tubes (50mm length) with a diameter of (a) 6mm and (b) 12mm at 30m/min. The plots show only a section of the total number of tubes.

Figure 5.16: Ground truth distance in mm of all measured transparent tubes with a diameter of (a) 6mm and (b) 12mm at 30m/min.



Figure 5.17: The tube edge detection can fail if the contrast between tube and background is poor. (a) Zoomed region of an input image. (b) Edge response of this image within the local ROI around the assumed edge location. Only the ends of the tube edge yield a significant response, which is of little account compared to the edge response of the background. (c) The maximum correlation score between a template and the image within the local ROI (blue bounding box) is reached at the background edge, indicated by the red dots. The resulting measured length is obviously wrong.

been wrongly classified as too long or too short. For 6mm tubes the false negative rate is ΩFN = 0.02 and for 12mm tubes ΩFN = 0.01. This means 1-2 tubes out of a hundred would have been sorted out wrongly by the system.

5.3.5. Repeatability

A transparent tube of 50.0mm and a black tube of 49.7mm (manual ground truth length) have each been measured 100 times (based on several single measurements in each case) by the system at a constant velocity of 30m/min. The tubes have a diameter of 8mm. The measuring results of the black tube are shown in Figure 5.18(a) and the results of the transparent tube in Figure 5.18(c). The corresponding Gaussian distribution functions, based on the mean and standard deviation over all measurements, can be found in Figure 5.18(b) and (d) respectively. The narrower the distribution, the better the repeatability of the measurements.

The mean of the 100 measurements of the black tube is 49.66, which is very close to the ground truth length. The standard deviation of the black tube is 0.0614mm. Thus, the deviation between measurements of the same tube is less than 1/10th of the tolerance and significantly smaller than the deviation between human measurements.

The measuring results of the transparent tube show a mean of 49.99 and a standard deviation of 0.051. Based on the results of the previous experiments, one could have expected the deviation of a transparent tube to be larger than for a black tube. In this experiment the transparent tube has been detected correctly 100 times in a row (like the black tube). The only difference between the two tubes is the shape of the cross-section. Neither tube is ideally circular, but the material of the black tubes is slightly softer, i.e. more susceptible to deformations, than that of the transparent tubes. In this experiment each tube is manually put onto the conveyor belt 100 times. Thus, even if the operator tries to grab the tubes as carefully as possible, deformations cannot be prevented for either tube type, leading to the observed deviations in the measurements. Obviously, the total deviation



Figure 5.18: Repeatability of the measurement of one tube. (a) 100 measurements of one black tube with a ground truth length of 49.7mm. (b) Corresponding Gaussian distribution of all measurements in (a) with µ = 49.66 and σ = 0.0614. (c) 100 measurements of one transparent tube with a ground truth length of 50.0mm. (d) Corresponding Gaussian distribution of all measurements in (c) with µ = 49.99 and σ = 0.051. The belt velocity is 30m/min in both experiments.



Figure 5.19: Repeatability results of a metallic cylinder simulating a tube of 49.99mm ground truth length. (a) 100 measurements of the gage at 30m/min. (b) Gaussian distribution of the results with µ = 49.94 and σ = 0.033.

is also influenced by other parameters, such as the tube orientation within the guide bars and the limits of the discrete input image (although subpixel techniques are applied). This experiment shows how accurately the vision-based system is able to measure even transparent tubes if the tube edge detection is successful.

The experiment has been repeated with a metallic cylinder of 49.99mm length simulating an ideal tube (gage). The cross-section of this gage is circular and cannot be deformed manually. The results of this experiment are shown in Figure 5.19(a) and (b). The mean over all 100 measurements is 49.94, with a standard deviation of 0.0331. This deviation is close to the error that has been estimated in Section 4.2.6 with respect to the maximum possible tube orientation within the guide bars.

One can conclude that, as long as the orientation within the guide bars is neglected, the maximum precision of the system is about 0.03mm for tubes that are ideally round and not bent. This is more than twice as precise as human measurements. It is assumed that this precision could be increased even further if the tubes were not only approximately but ideally horizontally oriented.
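In terms of bookkeeping, the repeatability experiments reduce to simple statistics over the stored lengths; a minimal sketch, assuming the lengths have been read back from the result file (function names are illustrative):

```python
import numpy as np

def repeatability(lengths_mm, true_len_mm):
    """Accuracy (mean offset from the known length) and precision (sample
    standard deviation) over repeated measurements of the same object."""
    x = np.asarray(lengths_mm, float)
    return float(x.mean() - true_len_mm), float(x.std(ddof=1))

# For the gage experiment the reported statistics correspond to a mean
# offset of about 49.94 - 49.99 = -0.05mm and a precision of ~0.033mm.
```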

5.3.6. Outlier

The system is evaluated with respect to outliers in two steps. First, more than 150 tubes (about 50mm, ∅8mm) are measured by the system at 30m/min. Approximately 1/3 of the tubes meet the tolerances, while the other 2/3 have a manipulated length. The ground truth length of the tubes is known, as well as the measuring order, i.e. each measurement can be assigned to a corresponding ground truth length. Based on the results of the previous experiments, one can assume that the results for the black tubes will be better than or equal to the transparent tube results.

The results of this experiment are visualized in Figure 5.20. All of the 150 tubes are classified correctly. There is not a single false positive or false negative in the data.

In the second stage of this experiment, 30 manipulated and 22 good tubes are randomly mixed. All tubes are measured online at 30m/min while the blow out mechanism is



Figure 5.20: 150 transparent tubes, both good and manipulated, have been measured by the system at 30m/min and compared to ground truth data. The system is able to reliably separate the tubes that meet the tolerances around the target length of 50mm from the manipulated tubes, without a single false positive or false negative.

activated. This means tubes that do not meet the tolerances should be sorted out. Once all tubes have passed the measuring area, it is checked how many of the manipulated tubes have also passed the blow out mechanism (false positives) and how many good tubes have been sorted out (false negatives). To simplify this task, the manipulated tubes have been marked beforehand. This experiment is repeated 22 times, leading to a total number of 1144 inspected tubes. The results can be found in Table 5.7. The total detection rate is Ωtotal = 0.99, i.e. 6 tubes out of 1144 passed the measuring area without being measured. Three tubes have been sorted out wrongly, representing a false negative rate of ΩFN = 0.0026, i.e. 2.6‰.

The false positives are more critical. 5 outliers have not been blown out correctly; thus, ΩFP = 0.0043. However, it turns out that 4 of the 5 false positives occur in sequences with at least one undetected tube. With the ratio of good to manipulated tubes of about 2:3, the probability is larger that the uninspected tube is a manipulated one. In this case the false positives are most likely not due to failures in measuring, but originate from the fact that these tubes have not been measured at all. In production, all uninspected tubes should be sorted out and revised to be sure that no outlier can pass.

5.3.7. Tube Length

Measuring tubes of a different length requires an adaptation of the visual field of the camera. For tubes < 50mm this means placing the camera closer to the conveyor. However, due to the minimum object distance (250mm) of the 16mm lens used in the previous experiments, and with the considerations made in Section 3.2.1, a lens with a longer focal length is needed to yield the desired field of view. In this case a 25mm focal length lens is used.
is needed to yield the desired field of view. In this case a 25mm focal length lens is used.


5.3. EXPERIMENTAL RESULTS 125<br />

Total | Detected | Missed | FN | FP
52 | 52 | 0 | 1 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 1 | 0
52 | 51 | 1 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 50 | 2 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 0 | 0
52 | 52 | 0 | 1 | 0
52 | 52 | 0 | 0 | 0
52 | 51 | 1 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 51 | 1 | 0 | 1
52 | 52 | 0 | 0 | 0
52 | 51 | 1 | 0 | 0
52 | 52 | 0 | 0 | 0
1144 | 1138 | 6 | 3 | 5

Table 5.7: Results of repeated blow out experiments. 22 × 52 transparent tubes have been measured at 30m/min. Each run included 22 tubes within the allowed tolerances and 30 outliers. Detected outliers should have been sorted out by the blow out mechanism. 3 tubes have been sorted out wrongly (false negatives) and 5 outliers have passed (false positives). Conspicuously, 4 of the 5 false positives occur when at least one tube has not been detected at all by the system.
by the system.


126 CHAPTER 5. RESULTS AND EVALUATION<br />

Larger tubes can be covered by the 16mm focal length lens like 50mm tubes, but the<br />

camera has to be placed further away from the conveyor to yield a larger field of view. The<br />

resulting pixel representation, i.e. the length a pixel represents in the measuring plane,<br />

increases as mentioned before. Hence, the precision decreases.<br />

In each experiment a charge of 50 tubes (transparent and black) of 30mm and 70mm<br />

length and 8mm diameter is used as test data. Each charge has been measured by hand and<br />

is evaluated with respect to mean and standard deviation. Each tube passes the measuring<br />

area once in this experiment and is measured as often as possible (single measurements)<br />

while it is in the visual field of the camera. The mean over the computed total lengths<br />

as well as the standard deviation are determined and compared to the ground truth data.<br />

The results are summarized in Table 5.8 and visualized in Figure 5.21 in terms of Gaussian<br />

distributions.<br />

The number of per tube measurements ΩPTM of 30mm tubes is slightly smaller compared<br />

to experiments with 50mm tubes at the same velocity. This is due to the smaller<br />

field of view of the camera. Obviously the tubes leave the measuring area faster. However,<br />

there are still more than 3 single measurements of each tube both for black and transparent<br />

tubes on average. The larger 70mm tubes have been measured even more often<br />

than 50mm tubes with 6.12 single measurements for black and 4.85 for transparent tubes<br />

respectively. This can be explained by a larger field of view.<br />

The mean value over a sequence of tubes µseq equals the expectation µGT in almost<br />

all experiments. Only the 30mm transparent tubes differ from the ground truth of about<br />

0.01mm which is acceptable small. This indicates the calibration factor between pixels<br />

and mm has been trained perfectly in all experiments.<br />

The standard deviation is much smaller for 30mm tubes both in the manual and automated<br />

measurements compared to 70mm tubes. In general black tubes are measured with<br />

higher precision than transparent tubes by the system according to the observations in<br />

previous experiments. The higher precision for 30mm tubes is important with respect to<br />

the specified tolerances (see Table 1.2). In all experiments beside the 70mm black tubes<br />

the manual precision is only slightly better than the precision of the visual inspection<br />

system. However, the results of the system have been always precise enough to allow for<br />

reliable measurements in terms of the allowed tolerances. At 70mm black tubes the system<br />

performed even better than humans with a standard deviation of 0.14 compared to 0.16<br />

measured by hand.<br />

It is important to state that the precision in these experiments depends both on the<br />

measuring variance of the system and the real variance of the tubes. Accordingly one<br />

can not compare the results directly with those in Section 5.3.5 where only one tube was<br />

measured several times in one experiment.<br />

One can conclude the visual inspection system is able to measure also tubes of different<br />

lengths as accurate as humans on average.<br />

5.3.8. Performance

Finally, the performance of the system is evaluated on an Athlon64 FX-55 (2.6GHz, 2GB RAM) platform.

The total processing time can be divided into five main groups, comprising profile analysis, compensation for radial distortion, edge detection and template matching, as well as length
compensation for radial distortion, edge detection and template matching, as well as length


5.3. EXPERIMENTAL RESULTS 127<br />

Color Ltarget ΩPTM µseq µGT σseq σGT<br />

(a) Black 30 3.43 30.06 30.06 0.09 0.08<br />

(b) Transparent 30 3.18 30.07 30.06 0.12 0.08<br />

(c) Black 70 6.12 69.76 69.76 0.14 0.16<br />

(d) Transparent 70 4.85 70.21 70.21 0.27 0.20<br />

Table 5.8: Results of 30mm and 70mm tubes at 30m/min. Ltarget represents the target<br />

length and ΩPTM the average number of per tube measurements. The mean and standard<br />

deviation of the length measuring distributions are denoted as µseq and σseq for the automated,<br />

and µGT and σGT for the human measurements respectively. The results are also visualized<br />

in Figure 5.21.<br />

[Figure 5.21, four panels: (a) 30mm black, (b) 30mm transparent, (c) 70mm black, (d) 70mm transparent. Each panel plots the measurement distribution and the ground truth distribution over the tube length in mm.]

Figure 5.21: Length distribution of 30mm and 70mm tubes at 30m/min for automated (solid line) and manual measurements (dashed line). All experiments show a very good accuracy, i.e. the vision system measures the same length on average. Black tubes are generally measured slightly more precisely than transparent tubes. For 70mm black tubes the vision system is even more precise than the human measurements.



Many thousands of frames have been timed, with and without tubes in the visual field of the camera. The results of the performance evaluation can be found in Figure 5.22. It turns out that the processing of a measurable frame requires 17.8ms on average. Thus, all images at a capture rate of 50fps (i.e. a new image is acquired every 20ms) can be processed.

The dominant part of the processing time is consumed by edge detection and template matching, where the latter is by far the most expensive. On average, 82% of the total processing time is needed for this step, although the number of pixels considered is highly restricted by the local ROIs. The undistortion operation is the second most expensive operation with 10%, followed by the length computation and tracking with 4%. The profile analysis, intended as a fast heuristic to locate a tube roughly, proves to be very fast indeed with only 0.29ms/frame. The remaining 3% represent operations such as image conversions, copying, or drawing functions to visualize the detection results. The latter can be omitted in production if visualization is not required.

If the profile analysis detects a non-measurable frame, the template matching is not performed. Thus, the remaining time could be used for side operations in the future, e.g. to save logging information or to run certain self-monitoring mechanisms. Such mechanisms could check, for example, whether the illumination is still bright enough or if the camera position has changed.



Task                                Ωtime [ms/frame]
Profile Analysis                          0.29
Undistortion                              1.79
Edge Detection/Template Matching         14.57
Length Computation/Tracking               0.69
Other                                     0.48
Total                                    17.82

[Pie chart: Edge detection/Template matching 82%, Undistortion 10%, Length computation/Tracking 4%, Other 3%, Profile Analysis 2%.]

Figure 5.22: (a) Average processing time per frame divided into the different steps of the visual inspection. (b) Corresponding pie chart. As one can see, the edge detection and template matching is the dominant operation throughout inspection.



5.4. Discussion and Future Work

The main difficulties with transparent tubes stem from the nonuniform brightness and the texture of the background. A conveyor belt which is equally translucent over its whole length could prevent many problems. The parameters controlling the detection sensitivity must cover both the brightest and the darkest region of the conveyor belt. This is always a compromise, leading to poorer results on average. However, if the contrast between tubes and the background does not depend on where the tube is located on the conveyor belt, the parameters can be adjusted much more specifically.

The background texture of the conveyor belt used for the prototype has the drawback of regular vertical structures. If the tube edge contrast is poor, the edge response of the background may be stronger than that of the tube edge. Model knowledge can be used to improve the tube edge localization even in the presence of strong vertical background edges. However, there remains a certain error probability, which can be drastically reduced if vertical background edges are suppressed. The best solution would be to use a conveyor belt with a canvas of horizontal structure. This would simplify the detection task without requiring any computation time.

If no conveyor belt can be found that provides the desired horizontal structure in combination with good translucency characteristics, one can think of suppressing the background pattern within the local ROI around a tube edge algorithmically, by exploiting the regularity of the background pattern. One idea is to transform the spatial image into the frequency domain using the Fourier transform. For more information on the Fourier transform and the frequency domain, the reader is referred to [64]. If it is possible to find characteristic frequencies belonging to the background pattern, one can remove these frequencies in the frequency domain and apply the inverse Fourier transform to the filtered spectrum. The result is a filtered spatial image with reduced background structure. The filter must be designed carefully to preserve the tube edges.
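
This idea can be prototyped in a few lines. The following is a minimal sketch, assuming Python with NumPy (it is not the thesis implementation); the boolean mask marking the spectral peaks of the background pattern is assumed to be provided, e.g. located by hand as in Figure 5.23(c):

    import numpy as np

    def suppress_background(image, peak_mask):
        # image:     2D float array (local ROI around a tube edge)
        # peak_mask: boolean array of the same shape, True at the
        #            spectral peaks of the regular background pattern
        spectrum = np.fft.fftshift(np.fft.fft2(image))
        spectrum[peak_mask] = 0.0   # corresponds to the black regions in (c)
        filtered = np.fft.ifft2(np.fft.ifftshift(spectrum))
        return np.real(filtered)    # imaginary part is numerical noise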

In a first experiment, test images of the conveyor both with and without a tube have been acquired and transformed into the frequency domain. Figure 5.23(a) and (b) show an example of the spectrum of an image with background only and with transparent tubes in the image respectively. The spatial domain of (b) can be seen in (d). One eye-catching feature in both spectra is a set of bright spots. If one removes these spots in the spectrum of an image, indicated by the black regions in (c), and applies the inverse Fourier transform to this filtered spectrum, the result is an image with a significantly reduced background pattern. The actual tube edges, however, are quite well preserved. In this case the spectrum has been filtered by hand and only coarsely. Much more work has to be invested in designing more sophisticated and reliable filters that perform well for a large number of images without removing or blurring any relevant edges. Removing a frequency from the spectrum always influences the whole image. The filter in the example produces new structure at the tube regions, especially around the printing. In addition, the darker stripe in the background on the right of the input image is still present in the filtered version, since it does not belong to the regular pattern of the background. Although in this example the dark stripe is not critical, it might be in other situations. This shows the limits of this approach. Any deviations from the regular background pattern are difficult to suppress in the frequency domain. If the conveyor belt is changed, the texture of the belt might be completely different. In this case the filter has to be adapted. An automated filter adaptation and background learning is non-trivial.



[Figure 5.23, five panels: (a) Background only, (b) Background + Tubes, (c) Masked spectrum, (d) Source Image, (e) Filtered Image.]

Figure 5.23: Background suppression in the frequency domain. (a) Fourier transform of an image of an empty conveyor. (b) Fourier transform of (d). (c) Certain frequencies have been removed by hand, indicated by the black regions. (e) Inverse Fourier transform of the filtered spectrum. The characteristic vertical background pattern could be reduced quite well while the tube edges are preserved.

The experiments have shown that tubes of 8mm diameter are most robust against deformations. While thinner tubes of 6mm diameter tend to be bent, 12mm tubes may have an elliptical cross-section. In both cases the accuracy and precision decrease. The question is whether such deformations are only caused by the way the tubes have been stored, transported and handled throughout the experiments in the laboratory, or if they also occur at production. The latter can be assumed, at least to a certain extent. A telecentric lens could overcome the perspective problem occurring with deformed 12mm tubes.

A less expensive improvement would be to measure not only the length, but also the height of a tube in the image. A larger height indicates the tube is closer to the camera and vice versa. The calibration factor relating pixels to mm could then be defined as a function of the tube height. Obviously, this requires a more complex teach-in step.
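
For illustration only, such a height-dependent calibration could be realized by interpolating between factors taught at known reference heights; the following Python/NumPy sketch uses hypothetical teach-in values, not data from the actual system:

    import numpy as np

    # Hypothetical teach-in data: pixel-to-mm factors measured for
    # reference tubes whose image heights (in pixels) are known.
    ref_heights_px = np.array([52.0, 60.0, 68.0])
    ref_factors_mm = np.array([0.108, 0.104, 0.100])   # mm per pixel

    def calibration_factor(tube_height_px):
        # A taller tube image means the tube lies closer to the camera,
        # so one pixel corresponds to fewer millimeters.
        return np.interp(tube_height_px, ref_heights_px, ref_factors_mm)

    length_mm = 288.5 * calibration_factor(61.0)   # length in pixels * factor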

Another potential source of deviations in the measurements is the tube orientation. The guide bars already reduce the maximum tube rotation to a minimum. The remaining error has been approximated. Although it is very small, it could be reduced even further by tilting the whole conveyor slightly around its longitudinal axis. The tilt guarantees that all tubes roll to the lower guide bar. If the guide bar is horizontal in the image, so are the tubes. Accordingly, the camera position has to be adapted to reestablish the fronto-orthogonal view. The proposed camera positioning method is independent of the orientation of the conveyor and the camera in 3D space.



The blow out mechanism was tested successfully in the prototype setup. The advantage of this mechanism is that it works almost independently of the conveyor velocity and of the position of the light barrier relative to the measuring area. One only has to ensure that no tube passes the light barrier before the good/bad decision of the measuring system reaches the blow out controller.

One drawback of the current strategy is the sensitivity to ghosts. If the system detects a tube where actually no tube is, the resulting classification of the ghost is sent to the controller anyway and stored in the FIFO memory. Since a ghost is never detected by the light barrier, the good/bad decision of the ghost is still in the memory when the next tube passes the light barrier. Instead of considering the decision belonging to this tube (appended to the FIFO memory), the decision of the ghost is evaluated. This leads to a loss of synchronization, i.e. a tube T is related to the decision of tube T − 1. Over time this effect can accumulate, and the reliability of the system is obviously compromised.

A potential solution to this problem is to replace the FIFO memory with a single register that stores only the latest decision. Without loss of generality, a 0 in this register might correspond to blowing out the next tube, while a 1 indicates the next tube can pass. The register is set to 0 by default. Each time the inspection system measures a tube to be within the allowed tolerances, a signal is sent to the controller that sets the bit in the register to 1. As soon as the tube has passed the light barrier, the register is reset to 0. This has to be done before the next tube is measured. Therefore the light barrier has to be placed quite close to the measuring area. The advantage of this approach is that the memory always contains the current decision belonging to the tube that passes the light barrier next. A timer can be used to reset the register if no tube intersects the light barrier within the expected time. Thus, ghosts become uncritical.
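
A minimal Python sketch of this single-register strategy, including the timer-based reset, might look as follows (class and method names are illustrative and not taken from the actual controller):

    import threading

    class BlowOutRegister:
        # 0 = blow out the next tube (default), 1 = let it pass.
        def __init__(self, timeout_s):
            self._bit = 0
            self._timer = None
            self._timeout = timeout_s  # expected max. time to the light barrier

        def tube_measured_good(self):
            # Called by the inspection system for an in-tolerance tube.
            self._bit = 1
            # Reset automatically if no tube reaches the light barrier in
            # time, so a ghost decision cannot let a later tube pass.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._timeout, self._reset)
            self._timer.start()

        def light_barrier_triggered(self):
            # Called when a tube passes the barrier; returns the decision.
            decision = self._bit
            self._reset()
            return decision  # 0 -> activate the blow out nozzle

        def _reset(self):
            self._bit = 0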

Furthermore, since the register is reset each time, this also helps to prevent the problem of non-detected tubes, i.e. tubes that have passed the visual field of the camera without being measured. In the outlier experiment (see Section 5.3.6), the false positive rate increased drastically if tubes could not be detected. In this case the system does not send a good/bad decision for the missed tube to the controller. The light barrier, however, detects every tube independently of whether it has been measured or not. With the single register strategy these tubes are blown out by default. Thus, only tubes that have been measured by the system and meet the allowed tolerances are able to pass the blow out nozzle.

If tubes are not detected at all, or measurements do not result in a meaningful length value (e.g. the standard deviation of the single measurements is too large), the corresponding tubes define another group U including all unsure measurements that cannot definitely be assigned to G′₀, G′₋, or G′₊. All tubes of this class should be blown out by default to ensure no outlier can pass the quality control. These tubes do not have to be considered as rejections, but could be measured by hand afterward or recirculated to be inspected again by the vision-based measuring system, depending on the frequency of occurrence.

The experiments have shown that more than 80% of the total processing time is needed for the template-based edge localization. In the current implementation the left and right ROI are processed sequentially. One possible optimization could be to parallelize this problem. This means the computation within the left and right ROI could be performed in separate threads to exploit the power of current dual-core architectures. This is possible, since the processing in the two ROIs is independent of each other.
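
A minimal Python sketch of this idea is given below; locate_edge is a hypothetical stand-in for the sequential per-ROI routine. (In CPython, true parallelism requires that the underlying routine releases the GIL, as native image processing code typically does; the principle carries over directly to a threaded C++ implementation.)

    from concurrent.futures import ThreadPoolExecutor

    def locate_edge(image, roi):
        # Hypothetical stand-in for the template-based edge localization
        # within one ROI; returns a subpixel edge position.
        raise NotImplementedError

    def locate_both_edges(image, left_roi, right_roi):
        # The two calls touch disjoint image regions, so no
        # synchronization between the threads is required.
        with ThreadPoolExecutor(max_workers=2) as pool:
            left = pool.submit(locate_edge, image, left_roi)
            right = pool.submit(locate_edge, image, right_roi)
            return left.result(), right.result()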


6. Conclusion

In this thesis a functioning prototype for a vision-based heat shrink tube measuring system has been presented, allowing for a 100% online inspection in real-time. Extensive experiments have shown the accuracy and precision of the developed system, which reaches the quality of accurate human measurements under ideal laboratory conditions. The advantage of the developed system is that this accuracy can be achieved even at conveyor velocities of up to 40m/min.

A multi-measurement approach has been investigated in which each decision whether a tube has to be sorted out is based on 2-11 single measurements, depending on the tube type and conveyor velocity. This requires video frame rates of ≥ 50fps to be processed in real-time. Fast algorithms, heuristics and model knowledge are used to improve the performance in this constrained application. Tube-edge-specific templates have been defined that are able to locate a tube edge with subpixel accuracy even in low contrast images in the presence of background clutter. In the prototype setup, the tube edge detection has been complicated by the strong vertical structure of the conveyor belt and an inhomogeneous translucency leading to nonuniformly bright background regions. The consequences for transparent tubes have been discussed, including the possibility of tubes passing the visual field of the camera without being detected.

Since black tubes are not translucent, they yield an optimal contrast to the background with a back-lighting setup. Transparent tubes, on the other hand, are much more sensitive to the structure of the background and the local tube edge contrast. All parameters adjusted for transparent tubes turned out to have no disadvantage for black ones. Thus, the parameters for transparent tubes are used in general, leading to a more uniform solution in the system design.

Besides the algorithmic part of the work, the engineering of the whole system, including the proper selection of a camera, optical system, and illumination, has been solved. The integration of the microcontroller and the air blow nozzle completes the prototype, allowing for concrete demonstrations of how tubes that do not meet the tolerances are blown out.

A simple and intuitive initialization of the system has been developed. Most parameters can be trained interactively and automatically without complicated user interaction. Even an unskilled worker should be able to perform the teach-in step after a few instructions. The only critical part of the teach-in is the camera positioning. To exclude as many sources of error as possible, the camera should be mounted as stably as possible at a fixed orientation (which has to be calibrated only once). The required height adjustments to cover the range of tube lengths should be automated if possible.

The maximum measuring precision of 0.03mm was reached for a metallic tube model simulating an ideal tube (at a conveyor velocity of 30m/min). During the experiments it has been observed that deformations of real heat shrink tubes (elliptical cross-section or bending) have a certain influence on the measuring precision. However, the average precision is still < 0.1mm for real tubes. In general, tubes of 8mm diameter have been measured more precisely than 6mm or 12mm tubes.

The average accuracy (root mean square error) of the automated measurements, i.e. the distance to some ground truth reference, is about 0.1mm for black tubes and about 0.2mm for transparent tubes at velocities of 30m/min. The ground truth has been acquired manually under ideal laboratory conditions and also has a certain inter- and intra-human deviation of about 0.1mm. While the velocity has only a minor influence on the accuracy of black tubes, the accuracy of transparent tubes decreases significantly at higher velocities. The main reason for this observation is the decreasing number of per-tube measurements, since averaging over the single measurements becomes more sensitive to outliers. In addition, the probability increases that a transparent tube is not detected at all if the background contrast is poor. However, in general, the accuracy and precision have been good enough in all experiments to reliably measure both black and transparent tubes of different lengths and diameters with respect to the specified tolerances.

and diameter with respect to the specified tolerances. Experiments with transparent tubes<br />

of manipulated lengths have shown the system is able to separate the good ones from the<br />

tubes that do not meet the tolerances successfully. The false negative rate, i.e. the number<br />

of tubes that have been sorted out wrongly, is 2.6 .Lessthan4.3 of failures could<br />

pass the measuring area. However, 80% of the false positives have not been detected at all<br />

by the system. With the adaptation of the blow out strategy as suggested in Section 5.4<br />

these tubes would have been blown out, too. Hence, the theoretically remaining false<br />

positive rate is 0.87 for transparent tubes. Following the experimental results one can<br />

assume that the false positive rate for black tubes will be less or equal.<br />

The measuring results have a positive side effect, since it is possible to compute the moving average over the last N measurements. An operator can compare the current mean length to the given target length. This can be especially useful during the teach-in of the machine. At production, deviations can be corrected before the tolerances are exceeded. In a more sophisticated solution the adjustment could be automated. If one can ensure that the current mean length measured by the vision system equals the target length, the blow out mechanism may never need to be activated and the probability of false positives can be decreased further.
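
Such a moving average is straightforward to implement; a minimal Python sketch (not part of the actual system) could be:

    from collections import deque

    class MovingAverage:
        # Running mean over the last n accepted length measurements.
        def __init__(self, n):
            self._window = deque(maxlen=n)

        def add(self, length_mm):
            self._window.append(length_mm)
            return sum(self._window) / len(self._window)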

In addition, the system is able to store the inspection results in a file or database. Such statistics can also be useful for management or controlling, since they include not only the length distribution of the production, but also information about the total number of tubes produced, the time of production, as well as the number of defectives.

The good results of the prototype support the use of an optical inspection system for length measurements of heat shrink tubes. Manual sample inspections, as currently used in production, are influenced by many factors such as concentration, speed, motivation, or tiredness of the individual operator. In general, less precision can be assumed for measurements in production compared to the ideal laboratory measurements used for evaluating the system. The advantage of the automated vision-based system is the ability to inspect each tube at laboratory precision without getting tired.


Appendix



A. Profile Analysis Implementation Details

Details regarding the implementation of the profile analysis, with a focus on performance aspects, are introduced in the following.

A.1. Global ROI

A simple but very effective way to decrease the computational load is to restrict the image processing to a certain region of interest (ROI). Following the assumption that parts of the guide bars are visible at the top and the bottom of the images without containing any information, the guide bars can be excluded from further processing; thus, the ROI lies in between these guide bars. The height of the ROI is given by the guide bar distance, which should be almost constant over the whole image since the bars are adjusted to be parallel to the x-axis in the image. The ROI extends in horizontal direction over the whole image width minus a certain offset at both sides. This offset is due to the fact that the image distortion is maximal at the boundaries. The actual value of the offset depends on the ability to overcome the distortion at measuring. If the measurements are accurate even at the image boundaries, the offset tends toward zero. In the following, the ROI between the guide bars is also referred to as global ROI.

Section 3.2 states it is possible to adapt the camera resolution to a user-defined size. The reason why the image size is not adjusted to cover the global ROI exactly (which would make it redundant) is a very practical one. First of all, the guide bars provide a valuable clue for adjusting the field of view of the camera. In addition, smaller images mean less data has to be transferred and consequently a larger number of images can be transferred in the same time. If the image size is too small, however, the actual frame rate exceeds the number of frames that can be processed, so that frames would have to be skipped, which should be avoided.

The extraction of the global ROI can be automated using a similar profile analysis approach as used for tube localization, but in vertical direction. Again, several vertical scan lines are used to build the profile. If there is no tube in the image (empty scene), the guide bars can be detected clearly, since the contrast between the bright conveyor belt and the black guide bars is very strong. A smoothing step, as used in horizontal direction to overcome the background clutter, is not necessary. This has the benefit that the two strongest peaks in the profile describe the guide bar location quite accurately. The detection of the global ROI has to be performed only once during initialization, assuming a static setup of camera and conveyor that does not change over time. In the future, it is conceivable that every time the state 'empty' is detected, the ROI is reinitialized and compared with the previous location. A difference indicates that something changed with the setup and may induce an alert or some specific reaction.
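
A minimal Python/NumPy sketch of this initialization step could look as follows (names and the peak selection are illustrative; a robust implementation would additionally verify that the two strongest peaks belong to two distinct guide bars):

    import numpy as np

    def detect_global_roi(empty_image, n_scan_columns=11):
        # Build a vertical gray-level profile from a few equally spaced
        # columns of an image of the empty conveyor.
        h, w = empty_image.shape
        cols = np.linspace(0, w - 1, n_scan_columns).astype(int)
        profile = empty_image[:, cols].mean(axis=1)
        # The two strongest gradient magnitudes mark the transitions
        # between the bright belt and the dark guide bars.
        gradient = np.abs(np.diff(profile))
        top, bottom = sorted(np.argsort(gradient)[-2:])
        return top + 1, bottom   # rows enclosed by the guide bars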




A.2. Profile Subsampling

In many computer vision tasks it is common to perform a specific operation on lower resolution images than the input to increase computation speed. For example, one could simply discard every second row or column to obtain an image of half the size of the original image. However, to avoid a violation of the sampling theorem, it is important to apply a low-pass filter operation to the data before. This mechanism can be used to generate pyramids of images at different resolutions or scales. Each layer in the pyramid has half the size of the layer above, with the top layer corresponding to the original size. Before subsampling the data, a Gaussian smoothing operation is performed to suppress higher frequencies. Thus, such pyramids are called Gaussian Pyramids in the literature [24].

The same can be applied to one-dimensional signals such as gray level profiles. In this application, experiments have shown that the information about the tube boundaries is conserved at a coarser scale. Thus, a subsampled version two levels down the pyramid is used in practice instead of the original profile. The data to be processed after this step is only a quarter of the input. Obviously, the profile analysis can be accelerated by this step. Experiments investigating whether the profile subsampling could replace step one of the profile analysis, i.e. the smoothing with a large mean kernel, came to the conclusion that in connection with transparent tubes and dark printing, the strong contrast of the letters could be misclassified as tube boundary. The system then tries to detect the real tube location in a certain region around the wrong position and is likely to fail. The mean filter, in contrast, is able to reduce the influence of the lettering and therefore must not be replaced.
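
A minimal Python/NumPy sketch of this two-level subsampling, assuming a simple sampled Gaussian kernel (the kernel width is illustrative):

    import numpy as np

    def pyramid_down(profile, sigma=1.0):
        # One pyramid step: Gaussian low-pass, then drop every second sample.
        radius = int(3 * sigma)
        x = np.arange(-radius, radius + 1)
        kernel = np.exp(-x**2 / (2 * sigma**2))
        kernel /= kernel.sum()
        smoothed = np.convolve(profile, kernel, mode='same')
        return smoothed[::2]

    def subsample_two_levels(profile):
        # Two levels down the pyramid: a quarter of the input samples.
        return pyramid_down(pyramid_down(profile))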

A.3. Scan Lines

As mentioned in Section 4.4.2, the profile to be evaluated is based on the normalized sum of N_scan scan lines equally distributed over the global ROI. The reason why a single scan line is not sufficient is shown in Figure A.1(b). Three sample profiles at different heights (61, 80 and 100) are selected to visualize the influence of the printing. One can see the strong contrast at the letters as well as a poor contrast at the right tube boundary. Since it is not deterministic whether the printing of a particular tube is visible in an image, one has to consider the worst case. This is a scan line passing through the printing at as many positions as possible. The global mean of the resulting profile is much lower in this case, and it is possible that the intensity of the tube at regions outside the printing is wrongly classified as background. The result of this effect is shown in Figure A.1(d). On the other hand, using several scan lines decreases the influence of the printing significantly. The probability that more than a few scan lines will pass through the printing is low. For example, among the sample tubes used for testing the prototype, the coverage of the printing is about 16% with respect to the diameter. Thus, it is very likely to have more than one scan line passing through tube regions without printing. In total, the influence of the printing decreases with the number of scan lines. However, Figure A.1(c) shows that 11 scan lines equally distributed over the global ROI in y-direction are sufficient to yield almost the same results as considering all rows of the ROI. Here, the profile consisting of 11 scan lines is shifted, i.e. the intensity values are lower compared to the profile calculated from all ROI rows (90 in this example). This is due to the location of



[Figure A.1, panels (a)-(e): gray-value profiles (0-400) over the image x-coordinate; (b) single scan lines at y = 61, 80 and 100; (c) normalized sum of 11 scan lines vs. normalized sum of all rows.]

Figure A.1: Comparison of the single and multi scan line approach. (a) Input gray scale image. (b) Profiles of three selected scan lines at heights 61, 80 and 100 respectively. The first two scan lines pass through the printing, leading to strong variations in the profile. Compared to these variations, the poor contrast of the right tube border makes a correct detection difficult. (c) The normalized sum of several scan lines reduces the effect of the printing, bringing out the location of the tube much more clearly. It can be seen that 11 scan lines equally distributed over the global ROI are sufficient to yield almost equivalent results as considering every row. (Note: The profile of the 11 scan lines is shifted since the global ROI included parts of the guide bars at the upper and bottom row. Since these pixels have a value near zero, they do not contribute much to the profile sum but are considered in the normalization. The scale, however, does not affect the actual tube location.) (d) Wrong detection of the tube boundaries when using a single scan line. (e) Result of the multi scan line approach.



the global ROI. As can be seen in Figure A.1(e), the global ROI is a bit too large; thus, the upper and bottom rows hit the border of the guide bars. Scan lines through these rows do not contribute much to the overall profile, but have an effect in the normalization. This shift, however, does not affect the actual tube location. With respect to performance, rows that have no influence should be ignored.

Obviously, the problem with the printing on a tube's surface arises only with transparent tubes, since the printing is not visible on the black tubes under back light. If black tubes are inspected, a single scan line in the image center is sufficient to localize the tube correctly, but more scan lines do not impair the results. To have a more universal solution, the multi scan line approach is used for all tube types and no distinction is made at this part of the system, to keep it simple.
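
For illustration, a minimal Python/NumPy sketch of the multi scan line profile (the names are hypothetical, not taken from the actual implementation):

    import numpy as np

    def tube_profile(image, roi_top, roi_bottom, n_scan=11):
        # Normalized sum of n_scan scan lines equally distributed over
        # the global ROI; averaging reduces the influence of the printing
        # on transparent tubes (cf. Figure A.1).
        rows = np.linspace(roi_top, roi_bottom, n_scan).astype(int)
        return image[rows, :].astype(float).sum(axis=0) / n_scan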

A.4. Notes on Convolution

At several steps of the profile analysis a convolution operation is performed. With respect to the derivation of the profile by convolving with a first derivative Gaussian kernel in step two, it is important to note which boundary condition is used, since in discrete convolution there are positions at the image boundaries that are undefined. There are many different strategies to address this problem, including padding the image with constant values (e.g. zero), reflecting the image boundaries periodically, or simply ignoring the boundaries [24]. Here, a symmetric reflection strategy is used:

    P(−i) = P(i − 1)                (A.1)
    P(N_P + i) = P(N_P + 1 − i)     (A.2)

where the first equation is used for the left and the second equation for the right boundary respectively. N_P indicates the length of P, and P(x) the intensity value of the profile at position x. The advantage of this strategy compared to padding with zeros, for example, is that no artificial edges are introduced.
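
A minimal Python/NumPy sketch of a 1D convolution with this boundary condition; NumPy's 'symmetric' padding mode realizes exactly the reflection of equations (A.1) and (A.2), and an odd kernel length is assumed:

    import numpy as np

    def convolve_reflect(profile, kernel):
        # Pad by symmetric reflection so that no artificial edges are
        # introduced at the profile boundaries, then convolve and crop
        # back to the original length.
        radius = len(kernel) // 2
        padded = np.pad(profile, radius, mode='symmetric')
        result = np.convolve(padded, kernel, mode='same')
        return result[radius:radius + len(profile)]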


B. Hardware Components

B.1. Camera

Specification: MF-033C | MF-046B

Image Device: 1/2" (diag. 8 mm) type progressive scan SONY IT CCD | same
Effective Picture Elements: 656 (H) × 492 (V) | 780 (H) × 580 (V)
Lens Mount: C-mount: 17.526 mm (in air); ∅ 25.4 mm (32 T.P.I.); Mechanical Flange Back to filter distance: 8.2 mm | same
Picture Sizes: 640 × 480 pixels (Format 0); 656 × 492 pixels (Format 7; Mode 0) | 640 × 480 pixels (Format 0; Mode 5); 780 × 580 pixels (Format 7; Mode 0); 388 × 580 pixels (Format 7; Mode 1); 780 × 288 pixels (Format 7; Mode 2); 388 × 288 pixels (Format 7; Mode 3)
Cell Size: 9.9 µm × 9.9 µm | 8.3 µm × 8.3 µm
ADC: 10 Bit | 10 Bit
Color Modes: Raw 8, YUV 4:2:2, YUV 4:1:1 | -
Data Path: 8 Bit | 8 Bit
Frame Rates: 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 74 Hz in Format 7 (RAW); 68 Hz (YUV 4:1:1); up to 51 Hz in YUV 4:2:2 | 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 53 Hz in Format 7
Gain Control: Manual: 0-16 dB (0.035 dB/step); Auto gain (select. AOI) | Manual: 0-24 dB (0.035 dB/step); Auto gain (select. AOI)
White Balance: Manual (U/V); One Push; Auto (select. AOI) | -
Shutter Speed: 20 ... 67,108,864 µs (∼ 67 s); Auto shutter (select. AOI) | same
External Trigger Shutter: Trigger Mode 0, Trigger Mode 1; Advanced feature: Trigger Mode 15 (bulk); image transfer by command; Trigger delay | same
Internal FIFO Memory: up to 17 frames | up to 13 frames
Look-Up Tables: One, user programmable (10 Bit → 8 Bit); Gamma (0.45) | same
Smart Functions: Real-time shading correction, image sequencing, two configurable inputs, two configurable outputs, image mirror (L-R ↔ R-L), serial port (IIDC v. 1.31) | same, plus binning
Transfer Rate: 100 Mb/s, 200 Mb/s, 400 Mb/s | same
Digital Interface: IEEE 1394 IIDC v. 1.3 | same
Power Requirements: DC 8 V - 36 V via IEEE 1394 cable or 12-pin HIROSE | same
Power Consumption: less than 3 Watts (@ 12 V DC) | same
Dimensions: 58 mm × 44 mm × 29 mm (L × W × H); without tripod and lens | same
Mass: < 120 g (without lens) | same
Operating Temperature: +5 to +45 °C | same
Storage Temperature: −10 to +60 °C | same
Regulations: EN 55022, EN 61000, EN 55024, FCC Class A, DIN ISO 9022 | same
Options: Host adapter card, locking IEEE 1394 cable, API (FirePackage), TWAIN (WIA)- and WDM stream driver | Removable IR-cut filter, host adapter card, locking IEEE 1394 cable, API (FirePackage), TWAIN (WIA)- and WDM stream driver

Table B.1: Camera specifications for the AVT Marlin F-033C and F-046B.




B.2. Illumination Hardware

Description                                   Value
Rated Power Output                            200 Watts
Output Voltage                                0.0, 0.5 to 20.5 VDC
Input Voltage Rating, 50/60 Hz                90 to 265 VAC
Power Factor Correction @ 230 VAC, 50 Hz      > 0.99, < 4°
Hold-up Time, Nominal AC Input, Full Load     8.3 ms
Line Regulation, Over Entire Input Range      ±0.5%
Current Limit Set Point                       8.5 Amps
Temperature Range: Operating                  0° to 45° C
                   Storage                    −25° to 85° C
Relative Humidity, Non-condensing             5% to 95%

Table B.2: Light Source (A20800.2) with DDL Lamp.

Description          Value
Calibrated Area      3" × 5" (76 × 127mm)
Panel Size           4" × 6" (102 × 152mm)
Overall Thickness    .05" (1.3mm)

Table B.3: SCHOTT PANELite Backlight (A23000) (flexible fiber optical area light).
Table B.3: SCHOTT PANELite Backlight (A23000) (flexible fiber optical area light).



Description               Value
Bulb Type                 DDL
Voltage                   20
Wattage                   150
Lamp Base                 GX5.3
Bulb Finish               Clear
Burn Position             Base/Down Horz.
Shape                     MR-16
Color Temp.               3150
Filament                  CC-6
Lamp Fill                 Halogen
Lamp Life                 500 Hrs.
Overall Length [mm]       44.5
Reflector Design          Dichroic
Reflector Size [mm]       50.7
Working Distance [mm]     194.5

Table B.4: Lamp specifications.




Bibliography

[1] Y. I. Abdel-Aziz and H. M. Karara. Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Proc. of the Symposium on Close-Range Photogrammetry, pages 1–18, 1971.

[2] M. B. Ahmad and T. S. Choi. Local threshold and boolean function based edge detection. IEEE Trans. on Consumer Electronics, 45(3):674–679, August 1999.

[3] Allied Vision Technologies GmbH, Taschenweg 2a, D-07646 Stadtroda, Germany. AVT Marlin – Technical Manual, July 2004.

[4] A. Alper. An inside look at machine vision. Managing Automation, 2005.

[5] American Society for Photogrammetry and Remote Sensing (ASPRS). Manual of Photogrammetry. ASPRS Pubns, 4th edition, 1980.

[6] K. Astrom and A. Heyden. Stochastic modelling and analysis of sub-pixel edge detection. In International Conference on Pattern Recognition (ICPR), pages 86–90, 1996.

[7] B. Batchelor and F. Waltz. Intelligent Machine Vision. Springer, 2001.

[8] A. Blake. Active Contours. Springer, 1999.

[9] J. Y. Bouguet. Camera calibration toolbox for matlab.

[10] I. N. Bronstein, G. Musiol, H. Mühlig, and K. A. Semendjajew. Taschenbuch der Mathematik. Harri Deutsch, 2001.

[11] D. C. Brown. Decentering distortion of lenses. Photometric Engineering, 32(3):444–462, 1966.

[12] D. C. Brown. Lens distortion for close-range photogrammetry. Photometric Engineering, 37(8):855–866, 1971.

[13] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 8:679–698, 1986.

[14] T. Chaira and A. K. Ray. Threshold selection using fuzzy set theory. Pattern Recognition Letters (PRL), 25(8):865–874, June 2004.

[15] R. W. Conners, D. E. Kline, P. A. Araman, and T. H. Drayer. Machine vision technology for the forest products industry. Computer, 30(7):43–48, 1997.

[16] E. R. Davies. Machine Vision – Theory, Algorithms, Practicalities. Elsevier, 2005.

[17] C. de Boor. A Practical Guide to Splines. Springer, 1978.


[18] C. Demant, B. Streicher-Abel, and P. Waszkewitz. Industrial Image Processing – Visual Quality Control in Manufacturing. Springer, 1999.

[19] R. Deriche. Using Canny's criteria to derive a recursively implemented optimal edge detector. International Journal of Computer Vision (IJCV), 1(2):167–187, 1987.

[20] S. di Zenzo, L. Cinque, and S. Levialdi. Image thresholding using fuzzy entropies. IEEE Transactions on Systems, Man, and Cybernetics (SMC-B), 28(1):15–23, February 1998.

[21] O. Faugeras. Three-Dimensional Computer Vision. A Geometric Viewpoint. MIT Press, Cambridge, 1993.

[22] J. Föglein. On edge gradient approximations. Pattern Recognition Letters (PRL), 1:429–434, 1983.

[23] P. J. Flynn and A. K. Jain. CAD-based computer vision: From CAD models to relational graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(2):114–132, 1991.

[24] D. A. Forsyth and J. Ponce. Computer Vision – A Modern Approach. Pearson Education International, 2003.

[25] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(9):891–906, 1991.

[26] C. A. Glasbey. An analysis of histogram-based thresholding algorithms. Graphical Models and Image Processing, 55(6):532–537, November 1993.

[27] E. B. Goldstein. Sensation and Perception. California: Brooks/Cole Publishing Co., 1996.

[28] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 2nd edition, 2002.

[29] E. R. Hancock and J. V. Kittler. Adaptive estimation of hysteresis thresholds. In Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 196–201, 1991.

[30] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2nd edition, 2003.

[31] J. Heikkila and O. Silven. A four-step camera calibration procedure with implicit image correction. In Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 1106–1112, 1997.

[32] R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics. Prentice Hall, 5th edition, 1994.

[33] D. H. Hubel. Exploration of the primary visual cortex, 1955–1978. Nature, 299:515–524, 1982.


[34] R. J. Hunsicker, J. Patten, A. Ledford, C. Ferman, et al. Automatic vision inspection and measurement system for external screw threads. Journal of Manufacturing Systems, 1994.

[35] R. W. Hunt. Measuring Colour. Ellis Horwood Ltd. Publishers, 2nd edition, 1991.

[36] B. Jähne. Digital Image Processing. Springer, 6th edition, 2005.

[37] B. Julesz. A method of coding TV signals based on edge detection. Bell System Tech., 38(4):1001–1020, July 1959.

[38] R. King. Brunelleschi's Dome: How a Renaissance Genius Reinvented Architecture. Penguin Books, 2001.

[39] R. K. Lenz and R. Y. Tsai. Calibrating a cartesian robot with eye-on-hand configuration independent of eye-to-hand relationship. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(9):916–928, September 1989.

[40] J. Linkemann. Optics recommendation guide. http://www.baslerweb.com/.

[41] E. P. Lyvers, O. R. Mitchell, M. L. Akey, and A. P. Reeves. Subpixel measurements using a moment-based edge operator. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 11(12):1293–1309, December 1989.

[42] E. N. Malamas, E. G. M. Petrakis, M. E. Zervakis, L. Petit, and J. D. Legat. A survey on industrial vision systems, applications and tools. Image and Vision Computing (IVC), 21(2):171–188, February 2003.

[43] M. Malassiotis and G. Strintzis. Stereo vision system for precision dimensional inspection of 3D holes. Machine Vision and Applications, 15(2):101–113, December 2003.

[44] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanisms. Journal of the Optical Society of America, 7(5):923–932, May 1990.

[45] D. Marr and E. C. Hildreth. Theory of edge detection. Proc. Royal Soc. London, B207:187–217, 1980.

[46] N. Otsu. A threshold selection method from grey-level histograms. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 9(1):62–66, January 1979.

[47] N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern Recognition, 26(9):1277–1294, September 1993.

[48] J. R. Parker. Algorithms for Image Processing and Computer Vision. John Wiley & Sons, Inc., 1997.

[49] P. Perona. Deformable kernels for early vision. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 17(5):488–499, May 1995.

[50] D. T. Pham and R. J. Alcock. Automated visual inspection of wood boards: Selection of features for defect classification by a neural network. In Proc. of the IMechE Part E: Journal of Process Mechanical Engineering, volume 213, pages 231–245. Professional Engineering Publishing, 1999.


[51] K. K. Pingle. Visual perception by a computer. In Proc. of Analogical and Inductive Inference (AII), pages 277–284, 1969.

[52] W. J. Plut and G. M. Bone. Grasping of 3-D sheet metal parts for robotic fixtureless assembly. In Proc. of the CSME Forum – Engineering Applications of Mechanics, pages 221–228, Hamilton, Ont., 1996.

[53] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 2nd edition, 1993.

[54] T. Pun. Entropic thresholding: A new approach. Computer Graphics and Image Processing (CGIP), 16(3):210–239, July 1981.

[55] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 8(8):629–632, August 1978.

[56] P. Rockett. The accuracy of sub-pixel localisation in the Canny edge detector. In Proc. of the British Machine Vision Conference (BMVC), 1999.

[57] A. Rosenfeld and P. de la Torre. Histogram concavity analysis as an aid in threshold selection. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 13(3):231–235, March 1983.

[58] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3D model acquisition. ACM Transactions on Graphics, 21(3):438–446, July 2002.

[59] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. C. Chen. A survey of thresholding techniques. Computer Vision, Graphics, and Image Processing (CVGIP), 41(2):233–260, February 1988.

[60] B. Sankur and M. Sezgin. A survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1):146–165, 2004.

[61] J. L. Sanz and D. Petkovic. Machine vision algorithms for automated inspection of thin-film disk heads. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 10(6), 1988.

[62] M. Seul, L. O'Gorman, and M. J. Sammon. Practical Algorithms For Image Analysis. Cambridge University Press, 2000.

[63] M. I. Sezan. A peak detection algorithm and its application to histogram-based image data reduction. Computer Vision, Graphics, and Image Processing (CVGIP), 49(1):36–51, January 1990.

[64] S. W. Smith. The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing, 1997.

[65] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall PTR, 1998.


[66] F. Truchetet, F. Nicolier, and O. Laligant. Subpixel edge detection for dimensional control by artificial vision. Journal of Electronic Imaging, 10(1):234–239, January 2001.

[67] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. Robotics and Automation, IEEE Journal, 3(4):323–344, 1987.

[68] H. Voorhees and T. Poggio. Detecting textons and texture boundaries in natural images. In Proc. of the International Conference on Computer Vision (ICCV), pages 250–258, 1987.

[69] J. Weickert. Anisotropic Diffusion in Image Processing. ECMI. Teubner, Stuttgart, 1998.

[70] J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 14(10):965–980, October 1992.

[71] G. A. W. West and T. A. Clarke. A survey and examination of subpixel measurement techniques. ISPRS Int. Conf. on Close Range Photogrammetry and Machine Vision, 1395:456–463, 1990.

[72] P. C. West. High speed, real-time machine vision. Technical report, Imagenation and Automated Vision Systems, 2001.

[73] M. Young. The pinhole camera: Imaging without lenses or mirrors. The Physics Teacher, pages 648–655, December 1989.

[74] Z. Y. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 22(11):1330–1334, November 2000.
