Master Thesis - Fachbereich Informatik
Fachbereich Informatik<br />
Department of Computer Science<br />
Master Thesis<br />
Visual Inspection of Fast Moving Heat<br />
Shrink Tubes in Real-Time<br />
Alexander Barth<br />
A thesis submitted to the Bonn-Rhein-Sieg University of Applied Sciences<br />
in partial fulfillment of the requirements for the degree of<br />
Master of Science in Computer Science<br />
Date of submission: December 16, 2005<br />
Examination Committee: Prof. Dr.-Ing. Rainer Herpers (Supervisor)<br />
Prof. Dr. Dietmar Reinert
Declaration<br />
I hereby declare that the work presented in this thesis is solely my work and<br />
that to the best of my knowledge this work is original, except where indicated<br />
by references to other authors.<br />
This thesis has neither been submitted to another committee, nor has it been<br />
published before.<br />
St. Augustin, December 16, 2005<br />
Alexander Barth<br />
Acknowledgments<br />
First of all, I would like to thank my thesis advisor Prof. Dr.-Ing. Rainer Herpers and Prof.<br />
Dr. Dietmar Reinert for guiding this work and for their helpful input and discussions.<br />
Many thanks to the company DSG-Canusa for funding this work and for supporting me<br />
in designing the hardware setup. A special thanks to Thomas Schminke, Markus Greßnich,<br />
Manfred Hirn, Andreas Dederichs and Klaus Lanzerath.<br />
During my thesis work, several people, fellow students and friends working in the Computer<br />
Vision Lab at Bonn-Rhein-Sieg University of Applied Sciences provided me with<br />
useful comments and ideas. Particularly, I would like to thank Stefan Hahne, Axel Hau,<br />
Bernd Göbel, Ingmar Burak, Christian Becker and Nils Neumaier, who also modeled the<br />
nice 3D figures. I also thank Philipp Wegner and Patrick Schmitz for assisting me during<br />
the experiments and for measuring several hundreds of heat shrink tubes by hand.<br />
Furthermore, I appreciate the time I was able to spend at York University in Toronto<br />
and at the Centre for Vision Research, Toronto. Thanks to all professors and students<br />
who contributed to my interest in Computer Vision. A special thanks goes to Markus<br />
Enzweiler. We had a great time during our stay in Canada and have always<br />
kept in touch for productive discussions.<br />
Many thanks to Gemma Adcock from New Zealand, who gave me great native support<br />
in writing my thesis in English.<br />
Finally, I would like to thank Steffi for being so understanding during stressful times.<br />
Abstract<br />
Heat shrink tubing is widely used in electrical and mechanical applications for insulating<br />
and protecting cable splices. Especially in the automotive supply industry accuracy<br />
demands are very high and quality assurance is an important factor in establishing and<br />
maintaining customer relationships. In production, the heat shrink tubes are cut into<br />
lengths (between 20 and 100mm) from a continuous tube. During this process, however,<br />
deviations from the target length can occur.<br />
In this thesis, a prototype of a vision-based length measuring sensor for a range of heat<br />
shrink tubes is presented. The measuring is performed on a conveyor belt in real-time at<br />
velocities of up to 40m/min. The tubes can differ in color, diameter and length.<br />
In a multi-measurement strategy, the total length of each tube is computed based on up<br />
to 11 single measurements while the tube is in the visual field of the camera. Tubes that<br />
do not meet the allowed tolerances between ±0.5mm and ±1mm depending on the target<br />
length are sorted out by air-pressure. Both the engineering and the software development<br />
are part of this thesis work.<br />
About 70% of all manufactured tubes are transparent, i.e. they show poor contrast<br />
against the background. Thus, sophisticated yet fast algorithms are developed which reliably<br />
detect even low contrast tube edges under the presence of background clutter (e.g. belt<br />
texture or dirt) with subpixel accuracy. For this purpose, special tube edge templates are<br />
defined and combined with model knowledge about the inspected objects. In addition,<br />
perspective and lens specific distortions have to be compensated.<br />
An easy-to-operate calibration and teach-in step has been investigated, which is important<br />
in order to be able to produce different tube types at the same production line at short<br />
intervals.<br />
The prototype system has been tested in extensive experiments at varying velocities<br />
and for different tube diameters and lengths. The measuring precision for non-deformed<br />
tubes can reach 0.03mm at a conveyor velocity of 30m/min. Even with elliptical deformations<br />
of the cross-section or deflections it is still possible to achieve an average precision<br />
of < 0.1mm. The results have been compared to manually acquired ground truth measurements,<br />
which also show a standard deviation of about 0.1mm under ideal laboratory<br />
conditions. Finally, a 100% control during production is possible with this system - reaching<br />
the same accuracy and precision as humans without getting tired.<br />
Contents<br />
Acknowledgments iii<br />
Abstract v<br />
List of Tables xi<br />
List of Figures xiii<br />
1. Introduction 1<br />
1.1. Machine Vision - State of the Art ..... 1<br />
1.2. Problem Statement ..... 3<br />
1.3. Requirements ..... 4<br />
1.4. Related Work ..... 5<br />
1.5. Thesis Outline ..... 6<br />
2. Technical Background 9<br />
2.1. Visual Measurements ..... 9<br />
2.1.1. Accuracy and Precision ..... 9<br />
2.1.2. Inverse Projection Problem ..... 10<br />
2.1.3. Camera Models ..... 10<br />
2.1.4. Camera Calibration ..... 13<br />
2.2. Illumination ..... 16<br />
2.2.1. Light Sources ..... 16<br />
2.2.2. Incident Lighting ..... 18<br />
2.2.3. Backlighting ..... 19<br />
2.3. Edge Detection ..... 20<br />
2.3.1. Edge Models ..... 20<br />
2.3.2. Derivative Based Edge Detection ..... 21<br />
2.3.3. Common Edge Detectors ..... 23<br />
2.3.4. Subpixel Edge Detection ..... 27<br />
2.4. Template Matching ..... 29<br />
3. Hardware Configuration 31<br />
3.1. Conveyor ..... 31<br />
3.2. Camera Setup ..... 33<br />
3.2.1. Camera Selection ..... 33<br />
3.2.2. Camera Positioning ..... 38<br />
3.2.3. Lens Selection ..... 38<br />
3.3. Illumination ..... 43<br />
3.4. Blow Out Mechanism ..... 48<br />
4. Length Measurement Approach 51<br />
4.1. System Overview ..... 51<br />
4.2. Model Knowledge and Assumptions ..... 53<br />
4.2.1. Camera Orientation ..... 53<br />
4.2.2. Image Content ..... 53<br />
4.2.3. Tubes Under Perspective ..... 54<br />
4.2.4. Edge Model ..... 56<br />
4.2.5. Translucency ..... 56<br />
4.2.6. Tube Orientation ..... 57<br />
4.2.7. Background Pattern ..... 59<br />
4.3. Camera Calibration ..... 59<br />
4.3.1. Compensating Radial Distortion ..... 59<br />
4.3.2. Fronto-Orthogonal View Generation ..... 60<br />
4.4. Tube Localization ..... 65<br />
4.4.1. Gray Level Profile ..... 65<br />
4.4.2. Profile Analysis ..... 66<br />
4.4.3. Peak Evaluation ..... 68<br />
4.5. Measuring Point Detection ..... 75<br />
4.5.1. Edge Enhancement ..... 75<br />
4.5.2. Template Based Edge Localization ..... 78<br />
4.5.3. Template Design ..... 80<br />
4.5.4. Subpixel Accuracy ..... 87<br />
4.6. Measuring ..... 89<br />
4.6.1. Distance Measure ..... 90<br />
4.6.2. Perspective Correction ..... 90<br />
4.6.3. Tube Tracking ..... 91<br />
4.6.4. Total Length Calculation ..... 92<br />
4.7. Teach-In ..... 93<br />
4.7.1. Required Input ..... 93<br />
4.7.2. Detection Sensitivity ..... 93<br />
4.7.3. Perspective Correction Parameters ..... 94<br />
4.7.4. Calibration Factor ..... 94<br />
5. Results and Evaluation 97<br />
5.1. Experimental Design ..... 97<br />
5.1.1. Parameters ..... 97<br />
5.1.2. Evaluation Criteria ..... 99<br />
5.1.3. Ground Truth Measurements ..... 102<br />
5.1.4. Strategies ..... 104<br />
5.2. Test Scenarios ..... 105<br />
5.3. Experimental Results ..... 107<br />
5.3.1. Noise ..... 107<br />
5.3.2. Minimum Tube Spacing ..... 109<br />
5.3.3. Conveyor Velocity ..... 110<br />
5.3.4. Tube Diameter ..... 116<br />
5.3.5. Repeatability ..... 121<br />
5.3.6. Outlier ..... 123<br />
5.3.7. Tube Length ..... 124<br />
5.3.8. Performance ..... 126<br />
5.4. Discussion and Future Work ..... 130<br />
6. Conclusion 133<br />
Appendix 135<br />
A. Profile Analysis Implementation Details 137<br />
A.1. Global ROI ..... 137<br />
A.2. Profile Subsampling ..... 138<br />
A.3. Scan Lines ..... 138<br />
A.4. Notes on Convolution ..... 140<br />
B. Hardware Components 141<br />
B.1. Camera ..... 141<br />
B.2. Illumination Hardware ..... 142<br />
Bibliography 145
List of Tables<br />
1.1. Range of tube types considered in this thesis ..... 4<br />
1.2. Tolerance specifications ..... 5<br />
3.1. Lens selection - Overview ..... 42<br />
3.2. Lens selection - Field of View at minimum object distance ..... 43<br />
3.3. Lens selection - Working distances ..... 43<br />
3.4. Blow out control protocol ..... 48<br />
4.1. Threshold comparison of profile analysis ..... 73<br />
4.2. Comparison of different edge detectors ..... 76<br />
4.3. Template curvature test set parameters ..... 82<br />
5.1. Overview on different test parameters ..... 98<br />
5.2. Constant software parameter settings throughout the experiments ..... 98<br />
5.3. Test set used to determine the human variance in measuring ..... 102<br />
5.4. Results of 50mm tubes at different velocities (black) ..... 112<br />
5.5. Results of 50mm tubes at different velocities (transparent) ..... 113<br />
5.6. Results of 50mm tubes with different diameter at 30m/min ..... 116<br />
5.7. Results of blow out experiment ..... 125<br />
5.8. Results of 30mm and 70mm tubes at 30m/min ..... 127<br />
B.1. Camera specifications for the AVT Marlin F-033C and F-046B ..... 141<br />
B.2. Light Source (A20800.2) with DDL Lamp ..... 142<br />
B.3. Backlight specifications ..... 142<br />
B.4. Lamp specifications ..... 143<br />
List of Figures<br />
2.1. Accuracy and Precision ..... 10<br />
2.2. Parallel lines at perspective ..... 11<br />
2.3. Pinhole geometry ..... 12<br />
2.4. Thin lens model ..... 13<br />
2.5. Incident lighting setups ..... 18<br />
2.6. Edge models ..... 21<br />
2.7. Comparison of different edge detectors ..... 24<br />
2.8. Orientation selective filters ..... 27<br />
2.9. Subpixel accuracy using interpolation techniques ..... 28<br />
3.1. Hardware setup of the prototype ..... 32<br />
3.2. BAYER mosaic ..... 34<br />
3.3. Comparison of color and gray level camera ..... 36<br />
3.4. Color information of transparent tubes ..... 37<br />
3.5. Telecentric lens ..... 40<br />
3.6. Field of View geometry ..... 42<br />
3.7. Tubes at different front lighting setups ..... 44<br />
3.8. Backlighting through a conveyor belt ..... 45<br />
3.9. Polarized backlighting ..... 46<br />
3.10. Backlight panel ..... 47<br />
3.11. Blow out setup ..... 48<br />
4.1. System overview ..... 52<br />
4.2. Potential image states ..... 53<br />
4.3. Tube models ..... 54<br />
4.4. Measuring plane definition ..... 55<br />
4.5. Characteristic intensity distribution of transparent tubes ..... 57<br />
4.6. Tube orientation error ..... 58<br />
4.7. Camera calibration - Calibration images ..... 60<br />
4.8. Camera calibration - Subpixel corner extraction ..... 61<br />
4.9. Camera calibration - Extrinsic parameters ..... 61<br />
4.10. Camera calibration - Radial distortion model ..... 62<br />
4.11. Camera positioning - Online Grid Calibration ..... 64<br />
4.12. Camera positioning - Control points ..... 65<br />
4.13. Scan lines for profile analysis ..... 66<br />
4.14. Profile analysis ..... 70<br />
4.15. Motivation for a region-based profile threshold ..... 72<br />
4.16. Ghost effect ..... 73<br />
4.17. Characteristic tube edge responses ..... 79<br />
4.18. Template Design ..... 81<br />
4.19. Template Occurrence ..... 83<br />
4.20. Template with extreme height weighting coefficient ..... 83<br />
4.21. Template Weighting ..... 84<br />
4.22. Template rotation - Motivation ..... 85<br />
4.23. Template curvature occurrence ..... 86<br />
4.24. Subpixel accurate template matching ..... 88<br />
4.25. Perspective correction function ..... 90<br />
5.1. Measuring slide used for acquiring ground truth measurements by hand ..... 102<br />
5.2. Intra and inter human measuring variance ..... 103<br />
5.3. Supply tube ..... 104<br />
5.4. Accuracy evaluation of length measurements at synthetic sequences ..... 108<br />
5.5. Results of minimum spacing experiment ..... 109<br />
5.6. Minimum tube spacing for black tubes ..... 110<br />
5.7. Measuring results at 20m/min ..... 111<br />
5.8. Results of 8mm black tubes at 30m/min ..... 113<br />
5.9. Results of 8mm transparent tubes at 30m/min ..... 114<br />
5.10. Brightness variance of an empty conveyor belt at backlight ..... 115<br />
5.11. Bent 6mm tube ..... 116<br />
5.12. Experimental results of black tubes with 6 and 12mm diameter ..... 117<br />
5.13. Ground truth distance of black tubes with 6 and 12mm diameter ..... 118<br />
5.14. Influence of cross-section deformations at 12mm diameter tubes ..... 119<br />
5.15. Experimental results of transparent tubes with 6 and 12mm diameter ..... 120<br />
5.16. Ground truth distance of transparent tubes with 6 and 12mm diameter ..... 120<br />
5.17. Failure of tube edge detection due to poor contrast ..... 121<br />
5.18. Repeatability of the measurement of one tube ..... 122<br />
5.19. Repeatability of the measurement of a metallic cylinder ..... 123<br />
5.20. Results of outlier experiment ..... 124<br />
5.21. Results of 30mm and 70mm tubes at 30m/min ..... 127<br />
5.22. Performance evaluation results ..... 129<br />
5.23. Background suppression in the frequency domain ..... 131<br />
A.1. Comparison of different scan lines ..... 139
1. Introduction<br />
Heat shrinkable tubing is widely used for electrical and mechanical insulation, sealing,<br />
identification and connection solutions. Customers are mainly from the automotive, electronics,<br />
military or aerospace sector. In terms of competition in world markets, high<br />
quality assurance standards are essential in establishing and maintaining customer relationships.<br />
Especially in the automotive supply industry, accuracy demands are very high,<br />
and tolerated outliers are specified in only a few parts-per-million.<br />
In this master thesis, a prototype of a vision-based sensor for real-time length measurement<br />
of heat shrink tubes in line production is presented. The main objectives are<br />
accuracy, reliability and meeting time constraints.<br />
The thesis work has been accomplished in cooperation with the company DSG-Canusa,<br />
Meckenheim, Germany.<br />
1.1. Machine Vision - State of the Art<br />
This section gives an overview of the term Machine Vision (MV), the use of vision systems<br />
in industrial applications, and a brief historical review. In addition, the advantages and<br />
drawbacks of MV are discussed and related applications are presented. The term Machine<br />
Vision is defined by Davies [16] as follows:<br />
“Machine Vision is the study of methods and techniques whereby artificial vision systems<br />
can be constructed and usefully employed in practical applications. As such, it<br />
embraces both the science and engineering of vision.”<br />
Researchers and engineers argue whether the terms Machine Vision and Computer<br />
Vision can be used synonymously [7]. Both terms are part of a larger field called Artificial<br />
Vision and have many things in common. The main objective is to make artificial systems<br />
‘see’. However, the priorities of the two subjects differ.<br />
Computer Vision has arisen in the academic field and concentrates mainly on theoretical<br />
problems with a strong mathematical background. Usually, as the term Computer Vision<br />
indicates, a computer processes an input image or a sequence of images. Nevertheless,<br />
many methods and algorithms developed in Computer Vision can be adapted to practical<br />
applications.<br />
Machine Vision, on the other hand, implies practical solutions for many applications,<br />
and covers not only the image processing itself, but also the engineering that makes a<br />
system work [16]. This includes the right choice of the sensor, optics, illumination, etc.<br />
MV systems are often used in industrial environments making robustness, reliability and<br />
cost-effectiveness very important. If an application is highly time-constrained or computationally<br />
expensive, specific hardware (e.g. DSPs, ASICs, or FPGAs) is used instead of<br />
an off-the-shelf computer [42]. A current trend is to develop imaging sensors that have<br />
on-chip capabilities for image processing algorithms. Thus, the image processing moves<br />
from the computer into the camera, removing the data transfer bottleneck.<br />
During the 1970s and 1980s, Western companies faced a new challenge from the Asian<br />
market [7]. Countries like Japan in particular established new production methods, leading<br />
to an increased significance of quality in manufacturing on international markets.<br />
Many western companies proved unable to meet the challenge and failed to survive, while<br />
others realized the importance of quality assurance and started to investigate the use of<br />
new technologies like Machine Vision. MV has many advantages and is able to improve<br />
product quality, to enhance processing efficiency, and to increase operational safety.<br />
In the early 1980s, development in the field of Artificial Vision was slow and mainly<br />
academic, and industrial interest remained low until the late 1980s and early 1990s [7].<br />
Significant progress in computer hardware now allows for real-time implementations, on<br />
standard platforms, of image processing algorithms developed over the past 25 years. The<br />
decreasing cost of computational power made MV systems more and more attractive,<br />
leading to a growth of MV applications and of companies developing such systems. Today,<br />
the field of MV has become a thriving multi-million dollar industry [7].<br />
The objectives of MV systems include position recognition, identification, shape and<br />
dimension check, completeness check, image and object comparison, and surface inspection<br />
[18]. Usually, the goal is to detect and sort out production errors or to guide a robot arm<br />
(or other devices) in a particular task [42].<br />
MV systems can be found in all industrial sectors and cover a huge range of inspected<br />
objects. Dimensional measuring tasks can be found for example in the inspection of<br />
bottles on assembly lines [72], wood [15, 50], screw threads [34], or thin-film disk heads<br />
[61]. Measuring objects is often related to 3D CAD models [23, 43]. An example for<br />
guiding a robot arm in grasping 3D sheet metal parts is given in [52]. Giving a detailed<br />
overview on all potential applications is beyond the scope of this thesis.<br />
Guaranteed product quality can help to establish and maintain customer relationships,<br />
enhancing the competitive position of a company. The main advantage of visual inspection<br />
in quality control is, besides its versatile range of applications, that it is non-contact, clean,<br />
and fast [7].<br />
Although the interpretative capability of today's vision systems cannot match the<br />
ability of the human visual system in the general case, it is possible to develop systems that<br />
perform better than people at some quantitative task. However, this assumes controlled<br />
and circumscribable conditions, reducing the problem to a defined and repetitive task.<br />
Usually, such conditions can be established at manufacturing lines.<br />
A human operator can be expected to be only 70-80% efficient, even under ideal conditions<br />
[7]. In practice, there are many factors that can reduce this productive efficiency<br />
of humans, such as tiredness, sickness, boredom, alcohol or drugs. For example, if a human<br />
is instructed to observe objects on a conveyor, this task is tiring and it is not unlikely<br />
that the operator is distracted after a while. On the other hand, a MV system could,<br />
theoretically, perform the same task 24 hours a day and 365 days a year without getting<br />
tired.<br />
If the inspection is performed in surroundings where working can be unpleasant, intolerable,<br />
dangerous or harmful to health for a human being, MV is a welcome option. This<br />
includes working under high (or low) temperatures, chemical exhalation, smoke, biological
1.2. PROBLEM STATEMENT 3<br />
hazards, risk of explosion, x-rays, radioactive material, loud noise levels, etc. [7]. On the<br />
other hand, in applications that require aseptic conditions as in the food or pharmaceutical<br />
industry, a human operator can be a ‘polluter’, a source of dirt particles (hair, dander,<br />
bacteria, etc.). In this case, a MV system is a clean alternative.<br />
Machines usually exceed humans in all kinds of accurate vision-based measurements.<br />
Human vision performs well in comparing objects and in detecting differences, for example<br />
in shape, color or texture [27]. Large deviations can be detected quickly. As the difference<br />
gets smaller, however, the inspection time increases, or the deviation cannot be<br />
detected at all without technical tools. With respect to the task considered in this thesis,<br />
a human is not able to determine the length of an object at sub-millimeter precision just<br />
by looking at it. Manual measurements are slow and not practicable in line production if<br />
100% control is desired, and can thus be used only for random inspection of a few objects.<br />
MV systems on the other hand can measure the length (or other features) of an object<br />
without contact up to nm precision - depending on the optical system and the size of the<br />
object [16]. Furthermore, humans soon reach their limits if the number of objects to inspect per<br />
minute increases significantly. Many manufacturing processes are so fast that the human<br />
eye can barely even perceive the objects, not to mention accomplish<br />
any inspection task. MV systems, however, can handle several hundred objects per minute<br />
with high accuracy.<br />
Although MV systems have many advantages for manufacturers, there are also drawbacks.<br />
Usually, a MV system is designed and optimized for a specific task in a constrained<br />
environment. If the requirements of the application change, the system has to be adapted,<br />
which can be difficult and expensive [7]. Furthermore, the system can be sensitive to a<br />
lot of influences of the (industrial) environment like heat, humidity, dirt, dust, or ambient<br />
lighting. Respective precautions have to be taken to protect the system. Finally, as in<br />
automation in general, vision systems that exceed human performance at some specific<br />
task replace human operators and will therefore supersede mostly low-skilled jobs in the<br />
future. Addressing this problem in more detail is outside the scope of this thesis.<br />
1.2. Problem Statement<br />
A large variety of heat shrink tubes of different sizes, material and shrinking properties is<br />
available on the market. The focus in this thesis will be on the DSG-Canusa<br />
DERAY-SPLICEMELT series. These tubes are commonly used for insulation of cable splices in<br />
the automotive industry (see Figure 1.1 for an example). A film of hotmelt adhesive inside<br />
the heat shrink tubes provides a waterproof sealing around the splice after shrinking. In<br />
addition, the DERAY-SPLICEMELT series shows a strong resistance against thermal,<br />
chemical and mechanical strains. The easy and fast handling allows for an application in<br />
series production. Accordingly, if the heat shrinking is performed in an automated fashion,<br />
the accuracy demands increase.<br />
In production, the heat shrink tubes are cut into lengths from a continuous tube. During<br />
this process, however, deviations from a specific target length can occur. In terms of quality<br />
assurance, any deviations above a tolerable level must be detected so that failings can be<br />
sorted out.
Figure 1.1: Application of a transparent heat shrink tube of type DERAY-SPLICEMELT.<br />
After shrinking, the heat shrink tube provides a robust, waterproof insulation of the cable<br />
splice. (Source: DSG-Canusa)<br />
Property Attributes<br />
Color transparent, black<br />
Length 20-100mm<br />
Diameter 6, 8, 12mm<br />
Table 1.1: Range of tube types considered in this thesis.<br />
Delivering defective parts must be avoided with highest priority to satisfy the customer and to<br />
retain a good reputation. In this context, tolerable failure rates are specified in parts per<br />
million. Rejected goods can be very expensive.<br />
Up to now, length measurements have been performed manually by a human operator.<br />
This has several drawbacks. First, only random samples can be checked by hand, since<br />
rates of 10 parts per second and more considerably exceed human capabilities. Furthermore,<br />
one operator is occupied with the monotonous measuring task at one machine and cannot be<br />
deployed to other tasks. This leads to a low effective productivity. In practice, more than<br />
one production line that cuts the heat shrink tubes into lengths is running in parallel,<br />
requiring even more human resources, which is very expensive. In addition, there is always<br />
a non-negligible possibility of subjective errors when human operators carry out the<br />
inspections - they also show symptoms of fatigue over time in this highly repetitive task.<br />
The measuring quality varies detectably between the morning and the late shift.<br />
In this thesis work, a machine vision inspection system is developed that is able to replace<br />
the human operator at this particular measuring task, allowing for reliable 100% control.<br />
1.3. Requirements<br />
The system must cover a range of tube types, differing in diameter, length or material<br />
properties. An overview of the variety of tube types can be found in Table 1.1.<br />
The two main classes of DERAY-SPLICEMELT heat shrink tubes considered in this<br />
thesis are black or transparent in color - transparent tubes account for about 70% of the production.<br />
Unlike black tubes, the transparent ones are translucent and appear slightly<br />
yellowish or reddish due to a film of hotmelt adhesive inside the tube.<br />
Most tubes have printing on their surface that can consist of both letters and numbers<br />
(e.g. DSG2). Since this printing is applied to the continuous tube before it is cut into<br />
Length [mm] Tolerance [mm]<br />
20 − 30 ±0.5<br />
31 − 50 ±0.7<br />
51 − 100 ±1.0<br />
Table 1.2: Tolerance specifications of different tube lengths<br />
lengths, the position of the printing is not consistent among the tubes and must not affect<br />
the measuring results.<br />
The tube length ranges from 20mm to 100mm. In this thesis, however, the focus will<br />
be on 50mm tubes since this is the dominant length in production. The outer diameter<br />
varies between 6mm and 12mm.<br />
The tolerances range from ±0.5 to ±1.0mm depending on the tube length, as can be<br />
seen in Table 1.2. This table lists the tolerable deviations from a given target length<br />
in mm.<br />
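The tolerance bands of Table 1.2 translate directly into a lookup combined with a good/bad check. The following is a minimal illustrative sketch, not the actual implementation; the function names are hypothetical:

```python
def tolerance_mm(target_length_mm: float) -> float:
    """Return the allowed deviation in mm for a given target length (Table 1.2)."""
    if 20 <= target_length_mm <= 30:
        return 0.5
    if 31 <= target_length_mm <= 50:
        return 0.7
    if 51 <= target_length_mm <= 100:
        return 1.0
    raise ValueError("target length outside the supported 20-100mm range")

def within_tolerance(measured_mm: float, target_mm: float) -> bool:
    """Good/bad decision: accept a tube if |measured - target| <= tolerance."""
    return abs(measured_mm - target_mm) <= tolerance_mm(target_mm)
```

For the dominant 50mm tubes, a measurement of 50.6mm would still be accepted (deviation 0.6mm, tolerance ±0.7mm), whereas 51.2mm would be rejected.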
The measurements have to be accomplished in line production on a conveyor in real<br />
time. The system is intended to reach 100% control without reducing production velocity.<br />
Currently the conveyor runs at approximately 20m/min, i.e. 3-17 tubes per second are<br />
cut depending on the segment size. Theoretically, the cutting machine is able to run at<br />
up to 40m/min. A higher velocity results in less processing time per tube segment. The<br />
system design must be robust with respect to industrial use. Theoretically, it must be<br />
able to run stably 24 hours/day, 7 days/week and 365 days/year.<br />
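As a quick sanity check of the throughput figures quoted above, the tube rate follows directly from the belt speed and the segment length. This is a back-of-the-envelope sketch that idealizes the tubes as traveling end to end with negligible spacing, so real rates are somewhat lower:

```python
def tubes_per_second(belt_speed_m_per_min: float, segment_length_mm: float) -> float:
    """Idealized tube rate: belt speed converted to mm/s divided by segment length."""
    belt_speed_mm_per_s = belt_speed_m_per_min * 1000.0 / 60.0
    return belt_speed_mm_per_s / segment_length_mm
```

At 20m/min this yields about 16.7 tubes/s for 20mm segments and about 3.3 tubes/s for 100mm segments, consistent with the stated range of 3-17 tubes per second.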
Although there are many different tube types, only one kind of tube is processed at<br />
one production line over a certain period of time. This means that the tube segments to be<br />
inspected on the conveyor are all of the same kind. However, to be flexible to customer<br />
demands, a production line must be able to be rearranged to a different kind of tube<br />
several times a day. This emphasizes the importance of an easy to operate calibration and<br />
teach-in step of the inspection system for practical application.<br />
The goal of the visual inspection is a reliable good/bad decision for each tube segment,<br />
i.e. whether it has to be sorted out or not. In the following, tube segments wrongly classified<br />
as proper, but nevertheless deviating from the given target length beyond the allowed<br />
tolerances (see Table 1.2), are denoted as false positives. On the other hand, false negatives<br />
are tube segments that are classified for sorting out, although their actual length meets the<br />
tolerances. To reach optimal product quality, the number of false positives must be reduced<br />
to zero. Large numbers of false negatives indicate that the system is not adjusted properly<br />
and has to be reconfigured.<br />
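The good/bad decision with the tolerance bands of Table 1.2 can be sketched in a few lines of Python. This is purely illustrative; only the table data comes from the thesis, all function and variable names are hypothetical:

```python
# Tolerance bands from Table 1.2: (minimum length, maximum length, tolerance), all in mm.
TOLERANCES = [
    (20, 30, 0.5),
    (31, 50, 0.7),
    (51, 100, 1.0),
]

def tolerance_for(target_length):
    """Return the tolerable deviation in mm for a given target length."""
    for low, high, tol in TOLERANCES:
        if low <= target_length <= high:
            return tol
    raise ValueError("unsupported target length: %s mm" % target_length)

def is_good(measured_length, target_length):
    """Good/bad decision: True if the measured length lies within tolerance."""
    return abs(measured_length - target_length) <= tolerance_for(target_length)
```

For a 50mm target, a segment measuring 50.4mm would pass (deviation 0.4mm ≤ 0.7mm), while a segment measuring 51.2mm would be sorted out.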
1.4. Related Work<br />
In Section 1.1 several examples of vision-based measuring systems in industrial applications<br />
have been presented. Much more work in this area has been done over the past 20 years<br />
[4]. However, MV-related publications of academic interest often consider only specific<br />
subproblems, but do not present detailed insight into the whole system. On the other<br />
hand, commercial manufacturers of MV systems hide the technical details in order to keep<br />
their competitive advantage [18].
There are several useful books addressing the fundamental methods, techniques and<br />
algorithms used to develop machine vision applications in a comprehensive fashion [7, 16,<br />
18, 62].<br />
Dimensional measuring of objects requires knowledge of an object’s boundaries. A<br />
common indicator for object boundaries, both in human and artificial vision, are edges.<br />
Edge detection is a widely investigated area of vision research, dating back to 1959 in the<br />
field of TV signal processing [37] and continuing to the present. The edge detection methods considered<br />
in this thesis are related to the work of Sobel [36, 51], Marr and Hildreth [45] and Canny<br />
[13].<br />
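As a minimal illustration of gradient-based edge detection, a Sobel gradient magnitude can be computed with plain NumPy. This is a generic textbook sketch, not the implementation used in this thesis:

```python
import numpy as np

# Sobel kernels for the horizontal and vertical gradient.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
KY = KX.T

def convolve2d(img, kernel):
    """Naive 'valid' 2D filtering (correlation; the Sobel kernels are applied directly)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def sobel_magnitude(img):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    gx = convolve2d(img, KX)
    gy = convolve2d(img, KY)
    return np.hypot(gx, gy)
```

Applied to a synthetic vertical step edge, the magnitude peaks along the step and is zero in flat regions.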
In addition, anisotropic approaches have been proposed [69], i.e. orientation selective<br />
edge detectors. These filters have many applications for example in texture analysis or<br />
in the design of steerable filters that efficiently control the orientation and scale of filters<br />
to extract certain features in an adaptive way [25, 49]. Many of these approaches are<br />
motivated by early human vision. In their investigation of the visual cortex, Hubel and<br />
Wiesel discovered orientation selective cells in the striate cortex V1 [33]. Several theories<br />
assume that humans perceive low-level features such as edges or lines through combinations<br />
of the responses of these cells [27]. Many computer vision researchers have adapted<br />
the idea of orientation selective cells or filters which can be combined to produce a certain<br />
response. Such sets of filters are often called filter banks. Malik and Perona [44] used a<br />
filter bank based on even symmetric difference of offset Gaussians (DOOG) for texture<br />
discrimination.<br />
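A small oriented filter bank can be sketched as follows. Note this uses rotated second-derivative-of-Gaussian kernels as a simple even-symmetric stand-in, not Malik and Perona's exact DOOG construction; all parameter values are arbitrary:

```python
import numpy as np

def oriented_kernel(size, sigma, theta):
    """Even-symmetric kernel: second Gaussian derivative across orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the derivative is taken perpendicular to theta.
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2))
    kernel = (u ** 2 / sigma ** 4 - 1 / sigma ** 2) * g  # d^2/du^2 of the Gaussian
    return kernel - kernel.mean()  # remove DC so flat regions give zero response

def filter_bank(size=9, sigma=1.5, n_orientations=4):
    """Kernels at n_orientations evenly spaced angles in [0, pi)."""
    return [oriented_kernel(size, sigma, k * np.pi / n_orientations)
            for k in range(n_orientations)]
```

Convolving an image with each kernel of the bank yields one response per orientation, which can then be combined, e.g. for texture discrimination.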
The discrete pixel grid resolution of CCD camera images limits the measuring accuracy.<br />
Thus, several techniques have been proposed that compute subpixel edge positions [6, 41,<br />
66, 56, 71].<br />
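One widely used subpixel scheme fits a parabola through the gradient magnitude at the strongest edge pixel and its two neighbours; the parabola vertex gives the subpixel offset. This is a generic sketch, not necessarily one of the cited methods:

```python
def subpixel_peak(m_minus, m_peak, m_plus):
    """Subpixel offset of the peak, given the gradient magnitude at the
    strongest pixel (m_peak) and its left/right neighbours.
    Returns an offset in (-0.5, 0.5) relative to the peak pixel."""
    denom = m_minus - 2.0 * m_peak + m_plus
    if denom == 0:
        return 0.0  # degenerate (flat) profile: keep the integer position
    return 0.5 * (m_minus - m_plus) / denom
```

For a symmetric profile the offset is zero; for samples of a parabola whose true vertex lies at +0.25 pixels, the formula recovers exactly that offset.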
A common task in vision applications is to search whether a particular pattern is part<br />
of an image, and if so, where it is located [28]. Template matching is one method to<br />
tackle this problem. Cross-correlation techniques are widely used as a measure of similarity<br />
[64, 18, 62]. In stereo vision, correlation is used to solve the correspondence problem<br />
between the left and right view [21, 65]. Other practical applications can be found in<br />
feature trackers, pattern recognition, or registration of e.g. medical image data.<br />
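Template matching with normalized cross-correlation (NCC) can be sketched as an exhaustive search over all image positions. This is illustrative only; practical systems use optimized implementations:

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equally sized arrays.
    Values near 1 indicate a good match."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    if denom == 0:
        return 0.0  # constant patch or template: correlation undefined
    return float((p * t).sum() / denom)

def match_template(image, template):
    """Exhaustive search: return (row, col) of the best NCC score."""
    th, tw = template.shape
    h, w = image.shape
    best, best_pos = -2.0, (0, 0)
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos
```

Embedding the template into an otherwise empty image, the search recovers exactly the embedding position.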
Accurate visual measurements often require a camera calibration step to relate 3D points<br />
in the real world to image coordinates and to compensate for lens distortions. One early<br />
approach was presented by Tsai [39, 67]. An extensive introduction to calibration is<br />
given by Faugeras [21] and by Hartley and Zisserman [30]. The calibration approach in this<br />
thesis work is closely related to the work of Zhang [74] and Heikkilä and Silvén [31].<br />
1.5. <strong>Thesis</strong> Outline<br />
The remainder of this thesis is organized as follows: Chapter 2 provides the theoretical<br />
background on models and techniques used in later sections with regard to measuring with<br />
video cameras. This chapter also gives an overview on different illumination techniques<br />
used for machine vision applications.<br />
In Chapter 3, the physical design of the system is introduced. In particular, the camera<br />
and lens selection as well as the illumination setup are discussed in detail in this chapter.
The vision part of the system is presented in Chapter 4. After describing assumptions<br />
and model knowledge used throughout the inspection, the different steps of the length<br />
measuring are proposed. This chapter also contains the calibration and teach-in of the<br />
system as well as the algorithms and techniques used to perform the measuring task with<br />
respect to real-time demands.<br />
The system is systematically evaluated in Chapter 5. To this end, several quantitative<br />
and qualitative evaluation criteria as well as different test scenarios are introduced. The<br />
automated measurements are compared to human measurements in terms of accuracy and<br />
precision. Finally, the results are discussed, and ideas for future work are given. The<br />
thesis concludes with a summary on the presented work in Chapter 6.
2. Technical Background<br />
2.1. Visual Measurements<br />
This section introduces the basic concepts and techniques that make visual measurements<br />
possible. A solid understanding of the fundamental process of image acquisition,<br />
as well as of the underlying camera models and geometry, is essential to see which<br />
parameters influence the measurement of real world objects in video images. Based on<br />
these concepts one can determine the factors that influence accuracy and precision.<br />
Extracting information about real world objects from images in machine vision applications<br />
is closely related to the area of photogrammetry. In [5], photogrammetry is defined<br />
as the art, science, and technology of obtaining reliable information about physical objects<br />
and the environment through the processes of recording, measuring, and interpreting photographic<br />
images and patterns of electromagnetic radiant energy and other phenomena.<br />
There are many traditional applications of photogrammetry in geography, remote sensing,<br />
medicine, archaeology, or crime detection. In machine vision applications, there is a<br />
wide range of measuring tasks, including dimensional measurements (size, distance, diameter,<br />
etc.) and angles. Although sophisticated algorithms can increase accuracy, the quality and<br />
repeatability of measurements is always related to the hardware used (e.g. camera sensor,<br />
optical system, digitizer) as well as the environmental conditions (e.g. illumination).<br />
2.1.1. Accuracy and Precision<br />
Throughout this thesis, the terms accuracy and precision are used frequently, mostly<br />
in relation to measuring quality. Although these terms may be used synonymously in<br />
other contexts, with respect to measurements they have very distinct meanings.<br />
Accuracy relates a measured length to a known reference truth or ground truth. The<br />
closer a measurement approximates the ground truth, the more accurate the measuring<br />
system is. Precision represents the repeatability of measurements, i.e. how much repeated<br />
measurements of the same object vary. The more precise a measuring system is, the closer<br />
the measured values lie together.<br />
Figure 2.1 visualizes the definition of accuracy and precision in a mathematical sense.<br />
The distribution of a set of measurements can be expressed in terms of a Gaussian probability<br />
density function. The peak of this distribution corresponds to the mean value of the<br />
measurements. The distance between the mean value and the reference ground truth value<br />
determines the accuracy of this measurement. The standard deviation of the distribution<br />
can be used as measure of precision.<br />
It is important to note that accuracy does not imply precision and vice versa.<br />
For example, the measuring result for a tube of 50mm length could be 50 ± 20mm. This<br />
statement is very accurate but not very precise. On the other hand, a measuring system can<br />
be very precise, but not accurate if it is not calibrated correctly. Thus, good measurements<br />
for industrial inspection tasks have to be both accurate and precise.<br />
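In practice, accuracy and precision of repeated measurements can be estimated as the bias of the mean and the standard deviation. A small sketch with made-up example values in mm (not real thesis data):

```python
import statistics

# Made-up repeated measurements of a 50 mm tube (illustrative data only).
measurements = [49.98, 50.02, 50.01, 49.99, 50.00]
ground_truth = 50.0

bias = statistics.mean(measurements) - ground_truth   # accuracy: closer to 0 is better
spread = statistics.stdev(measurements)               # precision: smaller is better
```

Here the mean coincides with the ground truth (zero bias), and the spread of roughly 0.016 mm describes the repeatability.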
Figure 2.1: Visualization of the difference between accuracy and precision in terms of measurements.<br />
A good measuring system must be both accurate and precise.<br />
2.1.2. Inverse Projection Problem<br />
A general problem of human vision denoted as the inverse projection problem [27] also<br />
applies to artificial systems. It states that the (perspective) projection of three-dimensional<br />
world objects onto a two-dimensional image plane can not be inverted in a well-defined way.<br />
The loss of a dimension implies a loss of information, which can not be compensated<br />
in general, since different origins can produce the same stimulus on the human retina or<br />
the camera sensor. Thus, several objects of different size or shape can look<br />
identical in an image. One important property to consider in this context is the influence<br />
of perspective. The term perspective is discussed further in Section 2.1.3.<br />
Humans can compensate for the inverse projection problem in many situations by certain<br />
heuristics and model knowledge of the scene. Similar techniques can be adapted<br />
to artificial systems. Especially in machine vision applications, where conditions are well-defined<br />
and known, model knowledge of the inspection task can be derived and integrated.<br />
2.1.3. Camera Models<br />
There are several approaches to model the geometry of a camera. Addressing all these<br />
models is outside the scope of this thesis. In the following only the most common camera<br />
models are introduced that provide a theoretic basis for visual measurements with CCD<br />
cameras.<br />
Pin Hole Camera The simplest form of a camera, known as the camera obscura, was invented<br />
in the 16th century. The underlying principle of this camera was already known long before<br />
by Aristotle (384-322 BC): Light enters an image plane through an (ideally) infinitely small<br />
hole, so only one ray of light from the world passes through the hole for each point in<br />
the 2D image plane, leading to a one-to-one correspondence. Objects at a wide range<br />
of distances from the camera can be imaged sharply and without distortion [65, 73]. The camera<br />
obscura is formally named pinhole camera. In the non-ideal case the pinhole has a finite<br />
size, thus, each image point collects light from a cone of rays.<br />
Figure 2.2: Parallel lines intersect at the horizon under perspective. Image taken by F. Wagenfeld<br />
at the Alaska Highway between Watson Lake and Whitehorse, Canada.<br />
In the 15th century, Filippo Brunelleschi used the pinhole camera model to demonstrate<br />
the laws of perspective discovered earlier [24, 38]. Two main effects characterize the<br />
pinhole perspective or central perspective:<br />
Close objects appear larger than distant ones<br />
Parallel lines intersect at the horizon<br />
Figure 2.2 visualizes these effects of perspective with an example.<br />
A drawback of the pinhole camera with respect to practical use in combination with a<br />
photosensitive device is its long exposure time, since only a small amount of light enters<br />
the image plane at a time [65]. However, the pinhole model can be used to derive<br />
fundamental properties in a mathematical sense that describe the imaging process. These<br />
properties can be extended by more realistic models to describe real imaging devices.<br />
Figure 2.3(a) gives an overview of the pinhole geometry. The camera center O, also<br />
denoted as the optical center or center of projection, is the origin of a 3D coordinate system<br />
with the axes X, Y and Z. This 3D coordinate system is denoted as the camera reference<br />
frame or simply camera frame. The image plane ΠI is defined to be parallel to the XY<br />
plane, i.e. perpendicular to the Z axis. The point o where the Z axis intersects the image<br />
plane is referred to as the image center. The Z axis, i.e. the line through O and o, is denoted<br />
as the optical axis.<br />
The fundamental equations of a perspective camera describe the relationship between<br />
a point P = (X, Y, Z)^T in the camera frame and a point p = (x, y)^T in the image plane:<br />
x = f X / Z (2.1)<br />
y = f Y / Z (2.2)<br />
where f is the focal length of the camera. p can be seen as the point of intersection of a<br />
line through P and the center of projection with the image plane ΠI [30]. This relationship<br />
can be easily derived from Figure 2.3(b). In the following, lower-case letters will always
Figure 2.3: (a) Pinhole geometry. (b) Projection of a point P in the camera frame onto the<br />
image plane ΠI (here with regard to Y ).<br />
indicate image coordinates, while upper-case letters refer to 3D coordinates outside the<br />
image plane.<br />
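Equations 2.1 and 2.2 translate directly into code. A trivial sketch, assuming f is given in the same units as the camera-frame coordinates:

```python
# Perspective projection of a camera-frame point P = (X, Y, Z) onto the
# image plane (Equations 2.1 and 2.2): x = f*X/Z, y = f*Y/Z.
def project(point, f):
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * X / Z, f * Y / Z)
```

For example, with f = 8 the point (2, 1, 4) projects to (4, 2); doubling Z halves the image coordinates, reflecting that closer objects appear larger.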
Weak-Perspective Camera If the relative distance between points in the camera frame<br />
with respect to the Z axis (scene depth) is small compared to the average distance from<br />
the camera, these points are projected onto the image plane approximately as if they all<br />
lay on one Z-plane Z0. Thus, the Z coordinate of each point can be approximated by Z0 as:<br />
x ≈ f X / Z0 (2.3)<br />
y ≈ f Y / Z0<br />
This has the effect of all points being projected with a constant magnification [24]. If<br />
the distance between the camera and the plane Z0 increases to infinity, there is a direct<br />
mapping between 3D points in the camera frame and points in the image plane:<br />
x = X (2.4)<br />
y = Y<br />
This projection is denoted as orthographic projection [65]. To overcome the described<br />
problems of pinhole cameras, real imaging systems are usually equipped with a lens which<br />
collects rays of light and brings them into focus on the image plane.<br />
Thin Lens Camera The simplest optical system can be modeled by a thin lens. The<br />
main characteristics of a thin lens are [65]:
Figure 2.4: Thin lens camera model.<br />
Any ray entering the lens parallel to the axis on one side goes through the focus on<br />
the other side<br />
Any ray entering the lens from the focus on one side emerges parallel to the axis on<br />
the other side<br />
The geometry of a thin lens imaging system is shown in Figure 2.4. F and F̂ are the<br />
focus points in front of and behind the lens. From this model one can derive the fundamental<br />
equation of thin lenses [65]:<br />
1/Z + 1/z = 1/f (2.5)<br />
where Z is the distance or depth of a point to the lens and z the distance between the<br />
lens and the image plane. The focal length f, i.e. the distance between the focus point<br />
and the lens, is equal on both sides of the thin lens in the ideal model.<br />
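Equation 2.5 can be solved for the image distance z. A tiny illustrative helper (units arbitrary but consistent):

```python
# Solve the thin-lens equation 1/Z + 1/z = 1/f for z, the distance between
# the lens and the image plane, given object depth Z and focal length f.
def image_distance(Z, f):
    if Z <= f:
        raise ValueError("no real image for objects at or inside the focal length")
    return 1.0 / (1.0 / f - 1.0 / Z)
```

An object at Z = 2f is imaged at z = 2f, the classic 1:1 configuration; as Z grows toward infinity, z approaches f.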
Thick Lens Camera Real lenses are represented much better by a thick lens model. The<br />
thin lens model does not consider several aberrations that come with real lenses. These<br />
include defocusing of rays that are neither parallel nor go through the focus (spherical<br />
aberration), different refraction depending on the wavelength or color of light rays entering<br />
the lens (chromatic aberration), and focusing of objects at different depths. Another factor<br />
that is important with real lenses with respect to accurate measuring applications is lens<br />
distortion. Ideally, a world point, its image point and the optical center are collinear, and<br />
world lines are imaged as lines [30]. For real cameras this model does not hold. Especially<br />
at the image boundaries, straight lines appear curved (radial distortion). The effect of<br />
distortion will be addressed again in the following sections.<br />
2.1.4. Camera Calibration<br />
Until now, all relationships between 3D points and image coordinates have been defined<br />
with respect to a common (camera) reference frame. Usually, the location of a point<br />
in the world is not known in camera coordinates. Thus, if one wants to relate world<br />
coordinates to image coordinates, or vice versa, one has to consider geometric models and
physical parameters of the camera. At this stage, one can distinguish between intrinsic<br />
and extrinsic parameters [24].<br />
Intrinsic Parameters The intrinsic parameters describe the projection of a point in the<br />
camera frame onto the image plane, i.e. the transformation of camera coordinates into<br />
image coordinates. This transformation extends the ideal perspective camera model introduced<br />
in the previous section with respect to properties of real CCD cameras. One can<br />
derive the following projection matrix Mi:<br />
Mi = ( −f/sx k ox )<br />
( 0 −f/sy oy ) (2.6)<br />
( 0 0 1 )<br />
where f represents the focal length, sx and sy the effective pixel size in x and y direction<br />
respectively, k the skew coefficient, and (ox, oy) the coordinates of the image center. α = sy/sx<br />
is the aspect ratio of the camera. If α = 1, the sensors of the CCD array are ideally square.<br />
The skew coefficient k determines the angle between the pixel axes and is usually zero, i.e.<br />
the x and y axes are perpendicular. (ox, oy) can be seen as an offset that translates the<br />
projection of the camera origin onto the image origin in pixel dimensions. If sx = sy = 1<br />
and ox = oy = k = 0, Mi represents an ideal pinhole perspective camera.<br />
Extrinsic Parameters The extrinsic parameters take the transformation between a fixed<br />
world coordinate system (or object coordinate system) and the camera coordinate system<br />
into account. This includes the translation and rotation of the coordinate axes [65], i.e. a<br />
translation vector T = (Tx, Ty, Tz)^T and a 3 × 3 rotation matrix R such that:<br />
Me = ( r11 r12 r13 −R1^T T )<br />
( r21 r22 r23 −R2^T T ) (2.7)<br />
( r31 r32 r33 −R3^T T )<br />
where rij (i, j ∈ {1, 2, 3}) are the matrix elements of R at (i, j) and Ri indicates the<br />
ith row of R.<br />
Thus, the relationship between world and image coordinates can be written in terms of<br />
two matrix multiplications [65]:<br />
(x1, x2, x3)^T = Mi Me (X, Y, Z, 1)^T (2.8)<br />
with (X, Y, Z, 1)^T representing a 3D world point in homogeneous coordinates; the image<br />
coordinates can be computed as x = x1/x3 and y = x2/x3 respectively. M = Mi Me is<br />
denoted as the projection matrix in the following.<br />
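Equation 2.8 can be sketched with NumPy; the helpers below build Mi as in Equation 2.6 and Me as in Equation 2.7. All parameter values in the usage example are made up for illustration:

```python
import numpy as np

def intrinsic_matrix(f, sx, sy, k, ox, oy):
    """Mi from Equation 2.6."""
    return np.array([[-f / sx, k, ox],
                     [0.0, -f / sy, oy],
                     [0.0, 0.0, 1.0]])

def extrinsic_matrix(R, T):
    """Me from Equation 2.7: the 3x4 matrix [R | -R T]."""
    R = np.asarray(R, dtype=float)
    T = np.asarray(T, dtype=float)
    return np.hstack([R, (-R @ T).reshape(3, 1)])

def project_world_point(P, Mi, Me):
    """Equation 2.8 followed by the homogeneous division."""
    x1, x2, x3 = Mi @ Me @ np.append(P, 1.0)
    return (x1 / x3, x2 / x3)
```

With identity rotation, zero translation, f = 8 and sx = sy = 1, the world point (2, 1, 4) maps to (−4, −2); the sign flip comes from the −f/sx and −f/sy entries of Mi.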
Image Distortion The resulting image coordinates may be distorted by the lens, i.e.<br />
a linear projection is not guaranteed. If high accuracy and precision are required, the simple<br />
mathematical relationships introduced before are not sufficient.<br />
To overcome this effect, a model of the distortion has to be defined. A common radial<br />
distortion model [30] can be written as:
(xd, yd)^T = L(r̃) (x̃, ỹ)^T (2.9)<br />
where (x̃, ỹ)^T is the undistorted and (xd, yd)^T the corresponding distorted image position.<br />
The function L(r̃) determines the amount of distortion depending on the radial distance<br />
r̃ = √(x̃² + ỹ²) from the center of radial distortion.<br />
The correction of the distortion at a measured position p = (x, y)^T can be computed as:<br />
x̂ = xc + L(r)(x − xc) (2.10)<br />
ŷ = yc + L(r)(y − yc) (2.11)<br />
where (x̂, ŷ)^T is the undistorted (corrected) position, (xc, yc)^T the center of the radial<br />
distortion, and r = √((x − xc)² + (y − yc)²) the radial distance between p and the center<br />
of distortion.<br />
An arbitrary distortion factor L(r) can be approximated by the following equation [30]:<br />
L(r) = 1 + Σ_{i=1}^{m} κi r^i (2.12)<br />
which is defined for r > 0 with L(0) = 1. The distortion coefficients κi as well as<br />
the center of radial distortion (xc, yc)^T can be seen as additional intrinsic parameters of<br />
the camera model. The number of coefficients m depends on the required accuracy and<br />
the available computation time. Usually no more than the first three or four coefficients are<br />
considered. In common calibration procedures, such as the calibration method proposed<br />
by Tsai [67], only the even coefficients (i.e. κ2, κ4, ...) are taken into account, while odd<br />
coefficients are set to zero. In this case, one or two coefficients are sufficient to compensate<br />
for the distortion in most cases [31].<br />
Besides the radial distortion model, there are several other models, including tangential,<br />
linear, and thin prism distortion [31]. Usually a radial distortion model is combined with<br />
a tangential model as proposed in [11, 12].<br />
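The correction of Equations 2.10 to 2.12 can be sketched as follows, here supporting arbitrary coefficients (including the even-only case of Tsai-style calibration). A generic sketch; the coefficient values in the example are made up:

```python
import math

def undistort_point(x, y, xc, yc, kappas):
    """Apply Equations 2.10/2.11 with L(r) = 1 + sum(kappa_i * r**i).

    `kappas` maps the exponent i to the coefficient kappa_i,
    e.g. {2: 1e-7, 4: 1e-13} for even coefficients only.
    """
    r = math.hypot(x - xc, y - yc)
    L = 1.0 + sum(k * r ** i for i, k in kappas.items())
    return (xc + L * (x - xc), yc + L * (y - yc))
```

With no coefficients the point is unchanged (L = 1); with a single even coefficient the correction grows with the distance from the distortion center.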
There are several approaches to compute the unknown intrinsic and extrinsic parameters<br />
of a camera. The most common methods are based on known correspondences between<br />
real world points and image coordinates. A chessboard-like calibration grid has become<br />
quite common as a calibration pattern. The corners of the grid provide a set of coplanar<br />
points.<br />
The world coordinates can be easily determined if one defines a coordinate system with<br />
the X and Y axes lying orthogonally in the chessboard plane and Z = 0 for all points. A<br />
corner not too close to the center represents the world origin. Based on these definitions,<br />
each corner of the calibration pattern can be described in the form (X, Y, Z)^T. Three-dimensional<br />
calibration rigs composed of orthogonal chessboard planes are also used quite<br />
often.<br />
In a captured image of the calibration pattern, the corners can be extracted at pixel<br />
(or subpixel) level, and mapped to world coordinates. If there is a sufficient number of<br />
correspondences, one can try to solve a homogeneous linear system of equations based on
the projection matrix M. The solution is also denoted as implicit camera calibration, since<br />
the resulting parameters do not have any physical meaning [31]. In the next stage, the<br />
intrinsic and extrinsic camera parameters can be extracted from the computed solution of<br />
M [21].<br />
There are linear and nonlinear methods to solve for the projection matrix. Linear methods<br />
assume an ideal pinhole camera and ignore distortion effects. Thus, these methods<br />
can be solved in closed form. Abdel-Aziz and Karara [1] introduced a direct linear transform<br />
(DLT) to compute the parameters in a noniterative algorithm. If higher accuracy is<br />
needed, nonlinear optimization techniques have been investigated to account for distortion<br />
models. Usually, the parameters are estimated by minimizing, in a least-squares sense, the<br />
pixel error between a measured point correspondence and the reprojected position of the<br />
world point using the projection matrix. This is an iterative process that may end<br />
up with a bad solution unless a good initial guess is available [70]. Therefore, linear and<br />
nonlinear methods are combined, as the DLT can be used to initialize the nonlinear<br />
optimization. One well-known two-step calibration method was proposed by Tsai [67, 39].
2.2. Illumination<br />
In machine vision applications, the right choice of illumination can simplify the further<br />
image processing considerably [16]. It can preprocess the input signal (e.g. enhance<br />
contrast or object boundaries, eliminate background, diminish unwanted features etc.)<br />
without consuming any computational power. On the other hand, even the best imaging<br />
sensor can not compensate for the loss in image quality induced by poor illumination.<br />
There are several different approaches to illumination in MV. Depending on the application,<br />
one has to consider what has to be inspected (e.g. boundaries, surface patterns<br />
or color), what the material properties of the objects to be inspected are (e.g. light<br />
reflection characteristics or translucency), and what the environmental conditions are (e.g.<br />
background characteristics, object dimension, camera position or the available space to<br />
install light sources). In the following, different types of light sources and lighting setups<br />
used in MV are introduced.<br />
2.2.1. Light Sources<br />
The light sources commonly used in machine vision include high-frequency fluorescent<br />
tubes, halogen lamps, xenon light bulbs, laser diodes and light emitting diodes (LED)<br />
[18].<br />
High-frequency fluorescent lights High-frequency fluorescent light sources are widely<br />
used in machine vision applications, since they produce a homogeneous, uniform, and<br />
very bright illumination. They provide white or ultraviolet light with little heat development,<br />
so there is no need for fan cooling.<br />
Standard fluorescent light tubes are not suitable for vision applications since they flicker<br />
cyclically with the power supply frequency. This yields unwanted changes in intensity or<br />
color in the video image, and the effect increases if the capturing rate of the camera is<br />
close to the power supply frequency (e.g. 50Hz in Germany). High-frequency fluorescent<br />
tubes alternate at about 25kHz, which is far beyond what can be captured by a video<br />
camera.<br />
Fluorescent lights are available in different sizes, shapes, and setups. Beside the common<br />
light tube, there are also fluorescent ring lights and rectangular area lights. Low cost and a<br />
long lifetime make fluorescent lights even more attractive.<br />
Light Emitting Diodes An LED is a semiconductor device that emits incoherent, monochromatic<br />
light with the wavelength depending on the chemical composition of the semiconductor.<br />
Today, wavelengths across the spectrum visible to humans, ranging from<br />
about 400 to 780nm, as well as ultraviolet or infrared wavelengths, can be covered by<br />
LEDs. The emitted visible light appears for example red, green, blue or yellow. Furthermore,<br />
it is possible to produce LEDs that appear “white” by combining a blue LED with<br />
a yellowish phosphor coating.<br />
LEDs have many advantages compared to other light sources. Due to their small size, they<br />
can be used for a variety of lighting geometries [18]. This includes ring lights, dome lights,<br />
area or line lights, spot lights, dark-field lights and backlights. Theoretically, each single<br />
LED in a cluster can be controlled independently. Thus, it is possible to generate different<br />
illumination conditions (for example different lighting angles or intensities) with a single<br />
setup by enabling and disabling certain LEDs, e.g. automated or software-controlled. It<br />
is also possible to use LEDs in strobe light mode.<br />
Another advantage of LEDs is their energy efficiency and long lifetime with only little<br />
loss in intensity over time. Thus, LEDs have low maintenance costs. Operated at DC<br />
power, LEDs do not produce any flickering visible as intensity changes in the video image.<br />
Halogen lights Halogen lamps are an extension of light bulbs, filled with a halogen<br />
gas (e.g. bromine or iodine). With respect to machine vision applications, halogen lamps<br />
are often used in combination with fiber optic light guides [18]. The emitted light of the<br />
light source is transferred through these fiber optic light guides, allowing for very flexible<br />
illumination setups and geometries. This includes ring lights, dome lights, area or line<br />
lights, spot lights, dark-field lights and backlights, as for LEDs. Furthermore, there is a<br />
range of fiber optic bundles in different sizes to route and position the light for user-defined<br />
lighting.<br />
One disadvantage of halogen lamps is their large heat development. Thus, active<br />
cooling is usually required. Nevertheless, due to the bright “white” light emitted by halogen<br />
lights (color temperature of about 6000K), they are also called cold light sources in the<br />
literature. If heat development of the light source can be harmful to heat-sensitive objects,<br />
fiber optics can be used to keep the light source away from the point of inspection. Like<br />
LEDs, halogen lamps do not produce flickering effects if the light source is DC-regulated.<br />
Thus, halogen lamps qualify for high accuracy inspection tasks.<br />
Xenon lights, often used for strobe light mode, are quite similar to halogen lamps. These<br />
lights allow for very short and bright light pulses, which are used to reduce the effect of<br />
motion blur.<br />
Beside the different ways of light generation, there are multiple possible setups of how<br />
light sources are arranged. Especially LED lights and fiber optics are very flexible, as<br />
introduced before. They can be adapted to a wide range of machine vision tasks at almost<br />
any size and geometry.<br />
Figure 2.5: Incident lighting setups. (a) Indirect diffuse illumination over a hemisphere. (b)<br />
Diffuse ring- or area light setup. (c) Darkfield illumination. (d) Coaxial illumination.<br />
2.2.2. Incident Lighting<br />
Incident lighting or front lighting is characterized by one or more light sources illuminating<br />
the object of interest from the camera's viewing direction. This includes diffuse front<br />
lighting, directional front lighting, polarized light, axial/in-line illumination and structured<br />
lighting [18]. Figure 2.5 gives an overview of different incident lighting setups.<br />
Diffuse Lighting Diffuse lighting reduces specular surface reflections and can be seen as a<br />
uniform, undirected illumination. It is usually generated by one or more light sources that<br />
are placed behind a diffuser at a certain distance. This yields the effect of one uniform<br />
area of light. A diffuser can be a pane of white translucent (acrylic) glass, mylar or other<br />
synthetic material. Instead of using a diffuser in front of a light source, indirect lighting<br />
can also result in diffuse illumination. A simple, but effective method reported in the<br />
literature [7] converts a Chinese wok into a hemisphere for diffuse lighting. The inner<br />
side of the wok is painted white. The camera can be placed at a hole on the top of the<br />
hemisphere (the bottom of the wok). The light sources are arranged in a way that they cannot<br />
directly illuminate the object; instead, the emitted light is reflected at the white screen inside<br />
the hemisphere (see Figure 2.5(a)). A diffuse illumination can also be achieved using a<br />
ring or area light source as in Figure 2.5(b).<br />
Directional Lighting Directional lighting is achieved by one or more directed light sources<br />
at a very low angle of incidence. The main characteristic of this type of illumination is the
effect of completely smooth objects appearing dark in the image, since the light rays are<br />
not reflected toward the camera, while unevenness leads to brighter image intensities. Due<br />
to this effect, directional lighting is also denoted as dark field illumination in the literature<br />
[18] (See Figure 2.5(c)). Directional lighting mostly qualifies for surface inspection tasks<br />
that examine the surface structure to reveal irregularities or bumpiness.<br />
Polarized light In combination with a polarizing filter in front of the camera lens, incident<br />
lighting with polarized light can be used to avoid specular reflections. Such reflections<br />
preserve the polarization of a light ray, thus, with the right choice of filter, only scattered<br />
light rays can pass the filter and reach the camera. A maximal filter effect can be reached if<br />
the polarization of the light source and the filter are perpendicular to each other. Polarized<br />
light is often combined with a ring light setup to avoid both shadows and reflections.<br />
Structured lighting Structured lighting is used to obtain three-dimensional information<br />
of objects. A certain pattern of light (e.g. crisp lines, grids or circles [18]) is projected<br />
onto the object. Based on deflections of this known pattern in the image, one can infer<br />
the object’s three-dimensional characteristics. For example, in [58], a 3D scanner using<br />
structured lighting is presented that integrates a real-time range scanning pipeline. In<br />
machine vision applications, structured lighting can be used for dimensional measuring<br />
tasks where the contrast between object and background is poor.<br />
Axial illumination In this type of illumination setup (see Figure 2.5(d)), also denoted as<br />
coaxial illumination in the literature, the light rays are directed to run along the optical<br />
axis of the camera [18]. This is achieved using an angled beam splitter or half-silvered<br />
mirror in combination with a diffuse light source. The beam of light usually has the same<br />
size as the camera’s field of view. The main application of axial illumination systems<br />
is to illuminate highly reflective, shiny materials such as plastic, metal or other specular<br />
materials, or for example to inspect the inside of bore holes. Axial illumination is typically<br />
used for inspection of small objects such as electrical connectors or coins.<br />
One potential problem with most incident lighting methods is shadows. Although the<br />
shadow contrast can be lowered using several light sources at different positions around<br />
the object (e.g. ring lights) or axial illumination setups, objects with sharp corners or<br />
concavities might have regions that cannot be illuminated, and therefore especially regions<br />
close to the object's boundaries appear darker in the image. Thus, dark objects on a bright<br />
background may appear enlarged [16]. The effect of shadows is less significant for bright<br />
objects on dark background. In applications that require totally shadow-free conditions<br />
for highly accurate measurements of object contours, another lighting setup called back<br />
lighting can be used, as introduced in the following.<br />
2.2.3. Back lighting<br />
The setup where the object is placed between the light source and the camera, in contrast<br />
to incident lighting, is denoted as back light illumination. In this arrangement, the light<br />
enters the camera directly leading to bright intensity values at non-occluded regions. The<br />
object, on the other hand, casts a shadow on the image plane, thus, leading to darker<br />
intensity values. Non-translucent materials result in a very strong, shadow-free contrast,
which makes back lighting interesting for dimensional measuring tasks. Furthermore,<br />
surface structures or textures can be suppressed. If the only light source is placed below<br />
the object, there will be no shadows around it. Back lighting can also be used for localization<br />
of holes and cracks, or for measuring translucency.<br />
In combination with polarized light, back lighting can also be adapted to enhance the<br />
contrast of transparent materials, which are difficult to detect in an image under other<br />
lighting setups. In a typical scenario, polarized light entering the camera directly is filtered out by<br />
an adequate polarization filter in front of the camera lens, while the polarization of the<br />
light is changed when passing through the object. Thus, in contrast to back lighting<br />
without polarization, background regions appear dark in the image while (translucent)<br />
objects result in brighter intensities. Figure 3.9 in Section 3.3 visualizes the effect of back<br />
lighting in combination with a polarization filter.<br />
2.3. Edge Detection<br />
An edge can be defined as a particularly sharp change in (image) brightness [24], or, more<br />
mathematically speaking, a strong discontinuity of the spatial image gray level function<br />
[36].<br />
Beside edges due to object boundaries, there are many more causes for edges in images<br />
such as shadows, reflectance, texture or depth. Thus, simply extracting edges in images<br />
is not a general indicator for object boundaries. To yield a semantic meaning, edge information<br />
can be combined with other features including shape, color, texture, or motion.<br />
Model knowledge about expected properties can be useful to group these low-level features<br />
into objects.<br />
In real images there are many changes in brightness (or color), but with respect to a<br />
certain application it may be of interest to extract only the strongest edges or edges of a<br />
certain orientation. Thus, information such as edge strength and orientation has to be<br />
taken into account to link the results of the filter response. Furthermore, in real images<br />
there is also a certain amount of noise in the data which has to be handled carefully.<br />
2.3.1. Edge Models<br />
Edges can be modeled according to their intensity profiles [65]. The two edge models<br />
considered in this thesis are shown in Figure 2.6.<br />
The ideal step edge is the basis for most theoretic approaches. It can be defined as:<br />
E_ideal(x) = i1 for x &lt; x_e, and i2 for x ≥ x_e<br />
where x_e denotes the edge position and i1, i2 the two intensity levels (cf. Figure 2.6(a)).<br />
Figure 2.6: (a) Ideal step edge model. (b) Ramp edge model.<br />
The ramp edge, in contrast, models a gradual transition between two intensity levels; interpolating along this transition, subpixel techniques can<br />
reach a higher precision than the discrete pixel grid (See Section 2.3.4). Ramp edges also<br />
appear if an object is not in focus, or if imaged in motion (motion blur).<br />
There are three common criteria for optimal edge detectors proposed by Canny:<br />
Good detection<br />
Good localization<br />
Uniqueness of response<br />
The first criterion states that an optimal edge detector must not be affected by noise, i.e. it<br />
must be robust against false positives (edges due to noise). On the other hand, edges of<br />
interest have to be conserved.<br />
The good localization criterion takes into account the precision of the detected edge<br />
position. The distance between the real edge location and the detected position should be<br />
as small as possible.<br />
The last criterion requires distinct and unique results, where only the local<br />
maximum of an edge is relevant. Responses of more than one pixel describe an edge location<br />
only poorly and should be suppressed.<br />
The Canny edge detector [13] is designed to optimize all three criteria (See Section 2.3.3<br />
for more details). However, there is a definite tradeoff between the detection and localization<br />
criteria, since it is not possible to improve both simultaneously [65].<br />
2.3.2. Derivative Based Edge Detection<br />
A common way to localize strong discontinuities in a mathematical function is to search for<br />
local extrema in the function’s first-order derivative or for zero crossings in the second-order<br />
derivative. This principle can be easily adapted to images, thus replacing the<br />
problem of edge detection by a search for extrema or zero crossings.<br />
In the discrete case, differentiation of the image gray level function f(x, y) can be approximated<br />
by finite differences. Since an image can be seen as a two-dimensional function,<br />
it can be differentiated in both the horizontal and vertical direction, i.e. with respect to<br />
the x- and y-axis respectively. Following the notation of [21], the partial derivatives ∂f/∂x<br />
and ∂f/∂y can be calculated as:<br />
∂f/∂x (x, y) ≅ ∆x f(x, y) = f(x + 1, y) − f(x, y) (2.14)<br />
∂f/∂y (x, y) ≅ ∆y f(x, y) = f(x, y + 1) − f(x, y) (2.15)<br />
The partial derivative operators ∆x and ∆y can be expressed by a discrete convolution<br />
of the image with the filter kernel [1 −1] and [1 −1]^T for the x- and y-direction respectively<br />
(the ‘center’ elements of the asymmetric kernel are printed in bold). There are other approximations<br />
possible, including the mirrored versions of the kernels above or a symmetric<br />
kernel 1/2 [1 0 −1] [36].<br />
Accordingly, the second-order derivative can be approximated by the discrete operators<br />
∆²x = [1 −2 1] and ∆²y = [1 −2 1]^T.<br />
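These finite differences can be reproduced in a few lines. The following NumPy sketch (the helper names dx and dy are illustrative, not taken from the cited literature) applies the first- and second-order operators to a small image containing a vertical step edge:

```python
import numpy as np

def dx(f):
    """Forward difference along x (columns): f(x+1, y) - f(x, y)."""
    return f[:, 1:] - f[:, :-1]

def dy(f):
    """Forward difference along y (rows): f(x, y+1) - f(x, y)."""
    return f[1:, :] - f[:-1, :]

# Small test image (rows = y, columns = x) with a vertical step edge.
f = np.array([[10, 10, 50, 50],
              [10, 10, 50, 50],
              [10, 10, 50, 50]])

print(dx(f))       # response 40 exactly at the step, 0 elsewhere
print(dy(f))       # 0 everywhere: no intensity change along y
print(dx(dx(f)))   # [1 -2 1] behavior: a +40/-40 pair around the edge
```

Applying the first difference twice reproduces the second-order kernel [1 −2 1]: the +40/−40 pair brackets the zero crossing at the edge position.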
In the presence of noise (as usual in real images), edge detectors using the approximations<br />
introduced before work only poorly. This is due to the fact that noise is mostly<br />
uncorrelated and is characterized by local changes in intensity. Assuming a uniform region,<br />
a good edge detector should result in a value of zero at this region. With noise the local<br />
intensity variations lead to noticeable responses (and local extrema) if using estimates of<br />
partial derivatives. Therefore, all common edge detectors include a certain smoothing step<br />
to reduce the influence of noise. The selection of the smoothing function, however, can<br />
differ between approaches. The most common smoothing function is a Gaussian.<br />
The Gaussian function is a widespread choice, since it comes with several advantages.<br />
This includes the property that convolving a Gaussian with a Gaussian<br />
results in another Gaussian. Assume a Gaussian function G1 with standard deviation σ1<br />
and G2 with standard deviation σ2. The result of convolving G1 and G2 is a Gaussian<br />
with standard deviation σG1∗G2 :<br />
σ_{G1∗G2} = √(σ1² + σ2²) (2.16)<br />
Thus, instead of resmoothing a smoothed image to get a stronger smoothing, it is<br />
possible to use a single convolution with a Gaussian with larger standard deviation. This<br />
obviously saves computational costs, which is important since convolution is an expensive<br />
operation.<br />
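The cascade property of Equation 2.16 can be checked numerically. A small sketch (the grid spacing and the σ values are chosen arbitrarily for illustration):

```python
import numpy as np

def gauss(x, sigma):
    """Normalized 1D Gaussian."""
    return np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

x = np.linspace(-10.0, 10.0, 2001)   # odd sample count keeps the center at x = 0
step = x[1] - x[0]

g1 = gauss(x, 1.0)
g2 = gauss(x, 2.0)
cascade = np.convolve(g1, g2, mode="same") * step   # G1 * G2 (discretized integral)
direct = gauss(x, np.sqrt(1.0**2 + 2.0**2))         # single Gaussian per Eq. 2.16

print(bool(np.max(np.abs(cascade - direct)) < 1e-4))  # True
```

Smoothing with σ1 and then σ2 is thus equivalent to one pass with σ = √(σ1² + σ2²), which saves one full convolution.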
Another advantage of a Gaussian kernel is its separability. This means, a two-dimensional,<br />
circularly symmetric Gaussian function Gσ(x, y) can be factored into two one-dimensional<br />
Gaussians (see [24]) as:<br />
Gσ(x, y) = 1/(2πσ²) · exp( −(x² + y²)/(2σ²) )<br />
= [ 1/(√(2π)σ) · exp( −x²/(2σ²) ) ] · [ 1/(√(2π)σ) · exp( −y²/(2σ²) ) ] (2.17)<br />
Since convolution is an associative operation, the same results can be achieved by convolving<br />
an image with a two-dimensional kernel, or by applying a convolution once with<br />
the separated version in x-direction and convolving the result with the y-version. In practice,<br />
a convolution with a discrete N × N kernel can be replaced by two convolutions with a
N × 1 kernel. This increases the performance significantly for large images and N. More<br />
information about convolution and filter separation can be found for example in [64].<br />
The general procedure of edge enhancement in common derivative-based edge detectors<br />
can be summarized into two steps:<br />
1. Smoothing of the image by convolving with a smoothing function<br />
2. Differentiation of the smoothed image<br />
Mathematically, this can be expressed as follows (here with respect to x):<br />
Iedge(x, y) = K_{∂/∂x} ∗ (S ∗ I(x, y)) (2.18)<br />
= (K_{∂/∂x} ∗ S) ∗ I(x, y)<br />
= (∂S/∂x) ∗ I(x, y)<br />
where K∂/∂x indicates the filter kernel approximating the partial derivative with respect<br />
to x. S represents the kernel of the smoothing function. Again, the associativity of the<br />
convolution can be used to optimize processing. Thus, instead of first smoothing the<br />
image with kernel S and then calculating the partial derivative, it is possible to reduce<br />
the problem to a single convolution with the partial derivative of the smoothing kernel<br />
∂S/∂x. Hence, the first-order derivative of a Gaussian is suited as an edge detector which is<br />
less sensitive to noise compared to finite difference filters [24]. The response of the edge<br />
detector can be parametrized by the standard deviation of the Gaussian to control the<br />
scale of detected edges, i.e. the level of detail. A larger σ suppresses high-frequency edges<br />
for example.<br />
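A one-dimensional sketch of this combined smoothing-and-differentiation step, assuming a hand-sampled Gaussian derivative kernel (gauss_deriv is an illustrative helper):

```python
import numpy as np

def gauss_deriv(sigma, radius):
    """Sampled first-order derivative of a Gaussian."""
    x = np.arange(-radius, radius + 1)
    return -x * np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma**3)

rng = np.random.default_rng(1)
signal = np.concatenate([np.full(20, 10.0), np.full(20, 50.0)])
signal += rng.normal(0.0, 2.0, signal.size)   # noisy step edge between x = 19 and 20

kernel = gauss_deriv(2.0, 6)                  # 13-tap smoothed derivative
response = np.convolve(signal, kernel, mode="valid")
edge = int(np.argmax(np.abs(response))) + 6   # re-add the kernel radius
print(edge)                                   # ~20: edge localized despite the noise
```

A plain [1 −1] difference on the same noisy signal would produce responses of comparable size at many noise positions; the built-in Gaussian smoothing suppresses them.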
2.3.3. Common Edge Detectors<br />
Due to the large number of approaches in this section only a selection of common edge<br />
detectors can be presented. Figure 2.7 visualizes the edge responses of different edge<br />
detectors that will be introduced in the following in more detail.<br />
Sobel Edge Detector A very early edge detector that is still in frequent use today<br />
is the Sobel operator. It was first described in [51] and attributed to Sobel. It is<br />
the smallest difference filter with an odd number of coefficients that averages the image in<br />
the direction perpendicular to the differentiation [36]. The corresponding filter kernels for<br />
x and y are:<br />
SOBEL_X = ⎡ 1 0 −1 ⎤<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ⎢ 2 0 −2 ⎥ (2.19)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ⎣ 1 0 −1 ⎦<br />
SOBEL_Y = ⎡ 1 2 1 ⎤<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ⎢ 0 0 0 ⎥ (2.20)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ⎣ −1 −2 −1 ⎦<br />
Figure 2.7: Comparison of different edge detectors. (a) Common LENA test image. (b)<br />
Gradient magnitude based on Sobel operator. (c) Edges enhanced via the discrete Laplace<br />
operator. (d) Result of Canny edge detector (Hysteresis thresholds: 150, 100).
These operators compute the horizontal and vertical components of a smooth gradient<br />
[21], denoted as gx and gy in the following. The total gradient magnitude g at a pixel<br />
position p in an image can be computed by the following equation:<br />
g(p) = √( gx²(p) + gy²(p) ) (2.21)<br />
An example of the gradient magnitude based on the Sobel operator can be found in<br />
Figure 2.7(b). The following approximations can be used in order to save computational<br />
costs:<br />
g(p) ≈ |gx(p)| + |gy(p)| (2.22)<br />
g(p) ≈ max(|gx(p)|, |gy(p)|) (2.23)<br />
These approximations yield equally accurate results on average [22]. Beside the gradient<br />
magnitude it is possible to compute the angle of the gradient as:<br />
φ(p) = arctan( gy(p) / gx(p) ) (2.24)<br />
Although there is a certain angular error with the Sobel gradient [36], it is used very<br />
often in practice, since it provides a good balance between the computational load and<br />
orientation accuracy [16].<br />
Equations 2.21-2.24 apply not only to the Sobel operator, but to every other<br />
operator that computes the horizontal and vertical gradient components.<br />
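A sketch of Equations 2.19-2.24 in NumPy (the helper correlate3 is illustrative; np.arctan2 is used in place of a plain arctan so the quadrant of the orientation is preserved):

```python
import numpy as np

SOBEL_X = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]])
SOBEL_Y = SOBEL_X.T   # = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def correlate3(img, K):
    """Apply a 3x3 kernel to every interior pixel (no flip, as in Eq. 2.19/2.20)."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            out[y, x] = np.sum(img[y:y + 3, x:x + 3] * K)
    return out

# Vertical step edge: dark left half, bright right half.
img = np.zeros((8, 8))
img[:, 4:] = 100.0

gx = correlate3(img, SOBEL_X)
gy = correlate3(img, SOBEL_Y)

g = np.sqrt(gx**2 + gy**2)    # Eq. 2.21, gradient magnitude
phi = np.arctan2(gy, gx)      # Eq. 2.24 with quadrant handling

print(g[3])    # strongest response in the two columns flanking the step
print(phi[3])  # 0 where there is no gradient, +/- pi at the horizontal gradient
```

On this synthetic image, gy vanishes everywhere (the rows are identical), so the magnitude is carried entirely by gx and the orientation is horizontal, as expected for a vertical edge.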
Canny Edge Detector Today, the Canny edge detector [13] is probably the most widely used<br />
edge detector, and is proven to be optimal in a precise, mathematical sense [65]. It is<br />
designed to detect noisy step edges of all orientations and consists of three steps:<br />
1. Edge enhancement<br />
2. Nonmaximum suppression<br />
3. Hysteresis thresholding<br />
The first step is based on a first-order Gaussian derivative as introduced before. For<br />
fast implementations, the separability of the filter kernel can be used to improve the performance.<br />
Gradient magnitude and orientation can be computed as in Equations 2.21 and<br />
2.24, or using the approximations. The standard deviation parameter σ of the Gaussian<br />
function influences the scale of the detected edges. A lower σ preserves more details (high<br />
frequencies), but also noisy edges, while a larger σ leaves only the strongest edges. The<br />
appropriate σ depends on the image content and what kind of edges should be detected.<br />
The goal of the nonmaximum suppression step is to thin out ridges around local maxima<br />
and return a number of one pixel wide edges [65]. The dominant direction of the gradient<br />
calculated in step one determines the considered neighbors of a pixel. The gradient magnitude<br />
at this position must be larger than at both neighbors, otherwise it is not a maximum<br />
and its position is set to zero (suppressed) in the edge image.<br />
In the last stage of the Canny edge detector, an edge tracking combined with hysteresis<br />
thresholding is applied. Starting at a local maximum that meets the upper threshold of the<br />
hysteresis function, the algorithm follows the contour of neighboring pixels that have not<br />
been visited before and meet the lower threshold. Due to step two, a set of one-pixel wide<br />
contours is the output of the edge detection (see Figure 2.7(d) for an example with an<br />
upper threshold of 150 and a lower threshold of 100).<br />
As in most cases, thresholding is a tradeoff between false positives (in this<br />
case edges due to noise) and false negatives (suppressed or fragmented edges of interest).<br />
As with the standard deviation of the Gaussian in step one, the hysteresis thresholds<br />
have to be adapted depending on the particular image content. Methods for estimating<br />
the threshold parameters dynamically from image-statistics are reported for example in<br />
[68] or [29]. There are many variations and extensions of the Canny edge detector. One<br />
popular approach motivated by Canny's work is the edge detector of Deriche [19].<br />
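The edge-tracking stage can be sketched as a breadth-first search seeded at the strong pixels; this is an illustrative implementation of hysteresis thresholding, not the one used in this thesis:

```python
import numpy as np
from collections import deque

def hysteresis(mag, low, high):
    """Keep pixels above `high`, then grow along 8-connected neighbors above `low`."""
    strong = mag >= high
    keep = strong.copy()
    q = deque(zip(*np.nonzero(strong)))
    h, w = mag.shape
    while q:
        y, x = q.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not keep[ny, nx] and mag[ny, nx] >= low:
                    keep[ny, nx] = True
                    q.append((ny, nx))
    return keep

# A faint contour (120) connected to one strong seed pixel (200),
# plus an isolated weak blob (120) that should be discarded.
mag = np.zeros((5, 7))
mag[2, 1:4] = 120.0
mag[2, 0] = 200.0
mag[4, 6] = 120.0

edges = hysteresis(mag, low=100, high=150)
print(edges.astype(int))  # row 2, columns 0-3 survive; the blob at (4, 6) does not
```

The two thresholds thus cooperate: the upper one controls which edges are detected at all, the lower one controls how far each detected contour is allowed to continue.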
Laplace The Laplace edge detector is a common representative for second-order derivative<br />
edge detectors. Recalling that edges are localized at zero crossings in the second-order<br />
derivative of an image’s two-dimensional intensity function, the goal is to find zero crossings<br />
that are surrounded by strong peaks.<br />
The Laplacian of a function can be seen as a sensible analogue to the second derivative<br />
and is rotationally invariant [24]. It is defined as<br />
∇²f(x, y) = ∂²f/∂x² + ∂²f/∂y² (2.25)<br />
As with first-order derivative edge detectors, a smoothing operation to reduce noise<br />
is performed before applying the edge detector, usually with a Gaussian. Analogous to<br />
Equation 2.18, the two steps can be combined by applying the Laplacian function to<br />
the Gaussian smoothing kernel before convolution. This leads to an edge detector denoted<br />
as Laplacian of Gaussian (LoG) proposed by Marr and Hildreth [45]. It is quite common<br />
to replace the LoG with a Difference of Gaussians (DoG) [24] to reduce the computational<br />
load.<br />
A discrete Laplace operator can be derived directly from the second-order operators ∆²x<br />
and ∆²y as<br />
L∇² = ∆²x ⊕ ∆²y (2.26)<br />
&nbsp;&nbsp;&nbsp;&nbsp; = [1 −2 1] ⊕ [1 −2 1]^T<br />
&nbsp;&nbsp;&nbsp;&nbsp; = ⎡ 0 1 0 ⎤<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ⎢ 1 −4 1 ⎥<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ⎣ 0 1 0 ⎦<br />
where the ⊕ operator denotes the tensor product [10] in this context. The result of the<br />
discrete Laplace operator applied to the LENA test image can be found in Figure 2.7(c).<br />
Edge detectors based on the Laplacian are isotropic, meaning the response is equal<br />
over all orientations [36]. One drawback of this approach is that second-order derivative<br />
based methods are much more sensitive to noise than gradient-based methods.
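The stencil of Equation 2.26 can be reproduced numerically by embedding the 1D operator [1 −2 1] along each axis and summing the two contributions; a small sketch:

```python
import numpy as np

d2 = np.array([1.0, -2.0, 1.0])   # 1D second-difference kernel

# Embed [1 -2 1] along each axis and sum the two contributions:
L = np.zeros((3, 3))
L[1, :] += d2                     # second difference in x
L[:, 1] += d2                     # second difference in y
print(L)                          # [[0,1,0],[1,-4,1],[0,1,0]], as in Eq. 2.26

# Along a 1D step edge the operator produces a +/- pair around a zero crossing:
row = np.array([10.0, 10, 10, 50, 50, 50])
print(np.convolve(row, d2, mode="valid"))   # [0, 40, -40, 0]
```

The +40/−40 pair brackets the zero crossing between the two response samples, which is exactly where the edge is localized by second-order methods.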
Figure 2.8: Orientation selective filters based on rotated versions of a first derivative Gaussian<br />
(Images taken from [25]).<br />
Orientation Selective Edge Detection Until now all presented approaches for edge detection<br />
have been more or less isotropic, but there are also many approaches that consciously<br />
exploit anisotropy leading to orientation selective edge detectors. A good overview<br />
on anisotropic filters can be found for example in [69]. These filters have many applications<br />
for example in texture analysis or in the design of steerable filters that efficiently<br />
control the orientation and scale of filters to extract certain features in an adaptive way.<br />
An orientation selective filter can be generated from a rotated version of an elongated<br />
Gaussian derivative. Figure 2.8 shows an example of different filters that are mostly sensitive<br />
to 0◦, 90◦, and 30◦ oriented edges respectively. If many different orientations should<br />
be detected independently in one image, common optimizations exploit the associativity<br />
of the convolution operation. Instead of convolving the image with a large number of different<br />
orientation-specific filters, the image is convolved with a few basis filters only. Then,<br />
an anisotropic response of an arbitrary orientation can be estimated over a weighted sum<br />
of the basis filter responses. For more information on the technical background of this<br />
approach, the reader is referred to the original papers [25, 49].<br />
2.3.4. Subpixel Edge Detection<br />
At image acquisition (e.g. with CCD cameras) light intensity is integrated over a finite,<br />
discrete array of sensor elements. Following the Sampling Theorem [36] this sampling can<br />
be seen as a low-pass filter on the incoming signal, cutting off high frequencies. Hence,<br />
strong edges, which can be seen as high-frequency content, may not be imaged precisely by the<br />
discrete grid. On the other hand, edge detectors that work at pixel level can detect the<br />
real edge position only roughly. The average localization error is 0.5 pixel since the center<br />
of the real edge could be anywhere within the pixel [65].<br />
In many applications such as high precision measuring tasks, detected edges at pixel grid<br />
accuracy are often not accurate enough. Thus, subpixel techniques have been developed<br />
to overcome the limits of discrete images and to compute continuous values that lie in<br />
between the sampled grid.<br />
Figure 2.9: (a) Subpixel accuracy using bilinear interpolation. Pixel position P is a local<br />
maximum if the gradient magnitude of gradient g at P is larger than at the positions A<br />
and B respectively. These positions can be computed using bilinear interpolation between<br />
the neighboring pixels 0, 7 and 3, 4 respectively. The gradient direction determines which<br />
neighbors contribute to the interpolation. The edge direction is perpendicular to the gradient<br />
vector. (b) The discrete first derivative of a noisy step edge is approximated using cubic<br />
spline interpolation. The subpixel tube edge location is assumed to be at the maximum of the<br />
continuous spline function, which can lie in between two discrete positions (here at x = 9.5).<br />
Interpolation is the most common technique to compute values between pixels by consideration<br />
of the local neighborhood of a pixel. This includes for example bilinear, polynomial,<br />
or B-spline interpolation. In [21], a linear interpolation of the gradient values within<br />
a 3 × 3 neighborhood around a pixel is proposed. Here, the gradient direction determines<br />
which of the 8 neighbors are considered (see Figure 2.9(a)). Since the gradient does not<br />
have to fall exactly on pixel positions on the grid, the gradient value is interpolated using<br />
a weighted sum of the two pixel positions respectively that are next to the position where<br />
the gradient intersects the pixel grid (denoted as A and B in the figure). In a nonmaximum<br />
suppression step, the center pixel is classified as edge pixel only if the gradient magnitude<br />
at this position is larger than at the interpolated neighbors. If so, the corresponding edge<br />
is perpendicular to the gradient direction.<br />
Since the center pixel P still lies on the discrete pixel grid, one has to perform a second<br />
interpolation step, if higher precision is needed. The image gradient within a certain<br />
neighborhood along the gradient direction (e.g. A-P -B) can be approximated for example<br />
by a one-dimensional spline function [17, 66]. Figure 2.9(b) shows an example of a noisy<br />
step edge between the discrete pixel positions 9 and 10 in x-direction. The discrete first<br />
derivative of the intensity profile is approximated with cubic splines. The extremum of this<br />
continuous function can be theoretically detected with an arbitrary precision representing<br />
the subpixel edge position. However, there are obviously limits of what is still meaningful<br />
with respect to the underlying input data. In this example, a resolution of 1/10 pixel was<br />
used. The maximum is found at 9.5, i.e. exactly in between the discrete positions.<br />
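A lightweight variant of this idea fits a parabola, rather than a cubic spline, through the discrete maximum and its two neighbors; subpixel_peak below is an illustrative helper, not the implementation used in this thesis:

```python
import numpy as np

def subpixel_peak(d):
    """Refine the maximum of a sampled 1D profile by fitting a parabola
    through the discrete maximum and its two neighbors."""
    i = int(np.argmax(d))
    y0, y1, y2 = d[i - 1], d[i], d[i + 1]
    offset = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)   # vertex of the parabola
    return i + offset

# Sampled derivative profile that is symmetric around x = 5.5,
# i.e. the true maximum lies exactly between two grid positions:
profile = np.array([0.0, 1, 3, 8, 20, 40, 40, 20, 8, 3, 1, 0])
print(subpixel_peak(profile))  # 5.5
```

The closed-form vertex offset plays the same role as evaluating the spline on a fine sub-grid, but needs only three samples and no explicit resolution parameter.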
Rockett [56] analyzes the subpixel accuracy of a Canny implementation that uses interpolation<br />
by least-square fitting of a quadratic polynomial to the gradient normal to the<br />
detected edge. He found that for high-contrast edges the edge localization reaches an<br />
accuracy of 0.01 pixels, while the error increases to about 0.1 pixels for low-contrast edges.<br />
Lyvers et al. [41] proposed a subpixel edge detector based on spatial moments of a gray<br />
level edge with an accuracy of better than 0.05 pixels for real image data. Aström [6] analyzes<br />
subpixel edge detection by stochastic models. A survey on subpixel measurement<br />
techniques can be found in [71].<br />
2.4. Template Matching<br />
A common task in vision applications is to search whether a particular pattern is part<br />
of an image, and if so, where it is located [28]. Template matching is one method to<br />
tackle this problem. The search pattern or template can be represented as an image and is<br />
usually considerably smaller than the inspected input image. Then, the template is shifted<br />
over the input image and compared with the underlying values. A measure of similarity is<br />
computed at each position. Positions reaching a high score are likely to match the pattern,<br />
or the other way around, if the template matches at a certain location, the score has a<br />
maximum at this location.<br />
A technique denoted as cross-correlation is widely used as measure of similarity between<br />
image patches [64]. It can be derived from the sum of squared differences (SSD):<br />
cSSD(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} ( T(i, j) − I(x + i, y + j) )² (2.27)<br />
where I is the discrete image function and T the discrete template function. W and<br />
H indicate the template width and height respectively. Expanding the squared quantity<br />
yields:<br />
cSSD(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} [ T²(i, j) − 2T(i, j)I(x + i, y + j) + I²(x + i, y + j) ] (2.28)<br />
Since the template is constant, the sum over the template patch T 2 (i, j) is constant as<br />
well and does not contain any information on similarity. The same holds approximately<br />
for the sum over the image patch I 2 (x + i, y + j) if there are no strong variances in image<br />
intensity. Hence, the term T (i, j)I(x+i, y +j) remains the only real indicator of similarity<br />
that depends on both the image and the template. This leads to the cross-correlation<br />
equation:<br />
c(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} T(i, j)I(x + i, y + j) (2.29)<br />
It turns out that the correlation looks very similar to the discrete convolution. Indeed,<br />
the only difference between correlation and convolution is the sign of the summation in the<br />
second term [28]. Thus, theoretically a correlation can be replaced by a convolution with a<br />
flipped version of the template [64]. Like convolution, correlation is an expensive operation<br />
if applied to large images and templates. In some cases it is faster to convert the spatial<br />
images into the frequency domain using the (discrete) Fast Fourier Transformation (FFT),
multiply the resulting transform of one image with the complex conjugate of the other,<br />
and finally reconvert the result to the spatial domain using the inverse FFT [62, 64, 53].<br />
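The equivalence of direct and frequency-domain correlation can be sketched as follows (the sizes and the rfft2-based formulation are illustrative; the template is zero-padded to the image size before transforming):

```python
import numpy as np

rng = np.random.default_rng(2)
I = rng.random((16, 16))
T = rng.random((4, 4))
H, W = T.shape

# Direct cross-correlation (Equation 2.29) at every valid position.
direct = np.array([[np.sum(T * I[y:y + H, x:x + W])
                    for x in range(I.shape[1] - W + 1)]
                   for y in range(I.shape[0] - H + 1)])

# Same result via the frequency domain: multiply the transform of the image
# with the complex conjugate of the zero-padded template's transform.
F = np.fft.rfft2(I)
G = np.fft.rfft2(T, s=I.shape)          # zero-pad template to image size
fft_corr = np.fft.irfft2(F * np.conj(G), s=I.shape)

print(bool(np.allclose(direct, fft_corr[:direct.shape[0], :direct.shape[1]])))  # True
```

Note that the FFT variant computes a circular correlation; only the region without wrap-around (the valid positions compared above) matches the direct sum.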
Unfortunately, the assumption of image brightness constancy is weak. If there is, for<br />
example, a bright spot in the image, the cross-correlation results in much larger values at<br />
this position than at darker regions. This may lead to incorrect matches. To overcome<br />
this problem, several normalized correlation methods have been introduced. One common<br />
measure is denoted as correlation coefficient. It can be computed as:<br />
ccoeff(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} ( T(i, j) − T̄ )( I(x + i, y + j) − Ī(x, y) ) / ( W H σT σI(x,y) ) (2.30)<br />
WHσT σI(x,y) where T represents the mean template brightness and I(x, y) the mean image brightness<br />
within the particular window at position (x, y). σT and σI(x,y) indicate the standard<br />
deviation of the template and the image patch respectively. The resulting values lie in<br />
the range between −1 and 1. Obviously, the correlation coefficient is computationally more<br />
expensive. If the standard cross-correlation yields sufficiently accurate results in a certain<br />
application, it may be of interest to use a less expensive normalization that simply maps<br />
the results of the cross-correlation into the range of −1 to 1. This can be achieved via<br />
the following equation:<br />
c′(x, y) = Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} T(i, j)I(x + i, y + j) / √( Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} T(i, j)² · Σ_{i=0}^{W−1} Σ_{j=0}^{H−1} I(x + i, y + j)² ) (2.31)<br />
The term cross-correlation is usually used if two different images are correlated. If one<br />
image is correlated with itself, i.e. I = T, the term autocorrelation is commonly used [28].<br />
In practical applications it is often necessary to adapt the template by changing the<br />
orientation or scale to reach maximum matching results [28]. This increases the number<br />
of correlation operations, and thus, the computational load. Therefore, optimization<br />
strategies are used that try to exclude as many positions as possible that are very unlikely<br />
to match a template.
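The correlation coefficient of Equation 2.30 can be sketched as below; note that the denominator W·H·σT·σI equals the square root of the product of the centered sums of squares, which is the algebraically equivalent form used here (corr_coeff is an illustrative helper):

```python
import numpy as np

def corr_coeff(I, T):
    """Correlation coefficient (Equation 2.30) of template T at every valid
    position of image I; values lie in [-1, 1]."""
    H, W = T.shape
    Tn = T - T.mean()
    out = np.zeros((I.shape[0] - H + 1, I.shape[1] - W + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = I[y:y + H, x:x + W]
            Pn = patch - patch.mean()
            denom = np.sqrt(np.sum(Tn**2) * np.sum(Pn**2))
            out[y, x] = np.sum(Tn * Pn) / denom if denom > 0 else 0.0
    return out

rng = np.random.default_rng(3)
T = rng.random((5, 5))
I = rng.random((20, 20))
I[8:13, 4:9] = 0.3 * T + 0.5      # embed a darkened, brightness-shifted copy of T

score = corr_coeff(I, T)
y, x = np.unravel_index(np.argmax(score), score.shape)
print(y, x)                       # 8 4: match found despite the brightness change
print(round(score[y, x], 6))      # 1.0 up to floating point error
```

Because the embedded copy differs from the template only by an affine brightness change, the plain cross-correlation of Equation 2.29 could easily prefer a brighter region, while the normalized coefficient still scores the true position with the maximum value of 1.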
3. Hardware Configuration<br />
This chapter introduces the physical design of the visual inspection prototype. This includes<br />
the conveyor, the camera setup, the choice of illumination as well as the blow out<br />
mechanism. Figure 3.1 gives an overview of the hardware setup of the prototype.<br />
3.1. Conveyor<br />
For the prototype, a 200cm long and 10cm wide conveyor is used to simulate a production<br />
line. It can be manually fitted with several tube segments, where the exact number depends<br />
on the target length and the distance between two consecutive segments. The measurement<br />
is performed in a certain area of the conveyor, denoted as the measuring area in the following.<br />
The field of view of the camera is adjusted to this area, as well as the illumination, as will<br />
be introduced in Sections 3.2 and 3.3 respectively.<br />
The dimension of the measuring area depends on the size of the tubes to be measured.<br />
Therefore, with respect to the range of tube sizes, the measuring area is designed to cover<br />
the maximum tube size of 100mm in length and about 12mm in diameter. It must be<br />
even larger to be able to capture several images of each tube while it passes the visual field<br />
of the camera.<br />
Since in production the tubes are cut to length from a continuous tube using a rotating<br />
knife (flying knife), there would not be a notable spacing between two consecutive tube<br />
segments if they were transferred to the measuring area at the same speed as they enter the knife.<br />
Thus, it would be difficult for both humans and artificial vision sensors to determine where<br />
one tube starts and ends in the continuous line. To overcome this problem, after cutting,<br />
the tube segments have to fall onto another conveyor with a higher velocity<br />
to separate them. The faster the second conveyor is compared to the first one, the larger<br />
the gap.<br />
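The resulting separation can be quantified with a simple model (an illustrative sketch, not a measured property of the machine): while a segment of length L still feeds in at velocity v1, the preceding segment already travels at v2, so the spacing grows by L(v2/v1 − 1):<br />

```python
def separation_gap(tube_length_mm: float, v1: float, v2: float) -> float:
    """Approximate gap between consecutive tube segments after the
    transfer from a belt running at v1 to a faster belt running at v2.
    Illustrative model: a segment of length L needs L/v1 to feed in,
    during which the previous segment travels v2 * L/v1, so the spacing
    grows by L * (v2/v1 - 1)."""
    return tube_length_mm * (v2 / v1 - 1.0)

# Example: 100 mm tubes, second belt 50% faster -> 50 mm gap
print(separation_gap(100.0, 1.0, 1.5))
```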
Since processing time is expensive, the goal is to simplify the measuring conditions as<br />
much as possible using an elaborate hardware setup. One easy but effective simplification<br />
is to mount two guide bars to the conveyor that guarantee almost horizontally oriented tube<br />
segments. The guide bars are arranged like a narrow ‘V’ (see Figure 3.1(b)). The tubes<br />
enter the guide bars at the wider end and are adjusted into horizontal position while<br />
moving. At the measuring area the guide bars are almost parallel and just slightly wider<br />
than the diameter of the tubes. The distance of the guide bars can be easily changed using<br />
adjusting screws if the tube type changes.<br />
The color and structure of the conveyor belt are crucial to maximize the contrast between<br />
objects and background for the inspection task. Therefore, a white-colored belt is used.<br />
The advantage of this choice with respect to the range of tube types to be inspected in<br />
combination with the illumination setup will be discussed in more detail in Section 3.3.<br />
Figure 3.1: Hardware setup of the prototype in the laboratory environment. (a) Total view.<br />
(b) View on the measuring area.
3.2. Camera setup<br />
Machine vision applications have high demands on the imaging system, especially if high<br />
accuracy and precision are required. The camera and optical system, i.e. the lens, have to<br />
be selected with respect to the particular inspection task. This section gives an overview<br />
of the imaging system used in this application and how it was selected.<br />
3.2.1. Camera Selection<br />
The main criteria for camera selection with respect to the application in this thesis are:<br />
Image quality<br />
Speed<br />
Resolution<br />
The image quality is essential to allow for precise measurements. This includes a high<br />
signal-to-noise ratio, no or only little cross-talk between neighboring pixels, and square<br />
pixel elements. As introduced in Section 1.3 the system is intended to work in continuous<br />
mode. Therefore, the speed, i.e. the possible frame rate, of the camera determines how<br />
many images of a tube can be captured within a given time period. Of course, this number<br />
also depends on the velocity of the conveyor. Especially at higher velocities, a fast<br />
camera is important, since the idea of multi-image measurements fails if the camera is<br />
not able to capture more than one evaluable image of each tube. The<br />
final frame rate should depend purely on the per-frame processing time. This means the<br />
camera must be able to capture at least as many frames as can be processed. Otherwise<br />
the camera would be a bottleneck. The frame rate of a camera is closely related to the<br />
image resolution. Higher resolutions mean a larger amount of data to be transferred and<br />
processed. Thus, there is a tradeoff between resolution and speed. A higher resolution<br />
means smaller elements on the CCD sensor array, hence, an object can be imaged in more<br />
detail. With respect to length measurements the effective pixel size decreases at a higher<br />
resolution, and a pixel represents a smaller unit in the real world.<br />
Three cameras have been tested and compared:<br />
Sony DFW VL-500<br />
AVT Marlin F-033C<br />
AVT Marlin F-046B<br />
These cameras are all IEEE 1394 (Firewire) progressive scan CCD cameras.<br />
The Sony camera has a 1/3” image device (Sony Wfine CCD) and provides VGA (640×<br />
480) resolution color images at a frame rate of 30 frames per second (fps). It is equipped<br />
with an integrated 12× zoom lens which can be adjusted over a motor.<br />
The Marlin F-033C is a color camera with a maximum resolution of 656 × 492 pixels in<br />
raw mode, while the F-046B is a gray scale camera with a resolution of 780 × 582 pixels in<br />
raw mode. Both cameras have a 1/2” image device (SONY IT CCD). The Marlin cameras<br />
reach much higher frame rates compared to the Sony. At full resolution, the F-033C
R1 G1 R2 G2<br />
G3 B1 G4 B2<br />
P1 P2 P3<br />
Figure 3.2: The sensor elements of single chip color cameras like the Marlin F-033C are<br />
provided with color filters, so that each sensor element gathers light of a certain range of<br />
wavelengths only, corresponding to red, green, and blue respectively. The arrangement of<br />
the filters is denoted as BAYER mosaic. Interpolation is needed to compute the missing two<br />
channels at each pixel. Image taken from [3].<br />
features 74fps and the F-046B 53fps respectively. Since these cameras do not come with<br />
an integrated optical system, a particular lens (C-Mount) must be provided additionally.<br />
A more detailed specification of the Marlin cameras can be found in Appendix B.1.<br />
It turned out that the Sony camera is not suited for this particular application. The<br />
main reason is the limited frame rate of 30fps, i.e. a new image is captured approximately<br />
every 30ms. As mentioned before, the camera speed should not be the bottleneck of the<br />
application. As will be shown in Section 5.3.8, the processing time of one image is<br />
significantly less than 30ms, which excludes the Sony camera for this particular application.<br />
The Marlin cameras reach much higher frame rates and come with another advantage.<br />
Since the tube orientation can be considered as horizontal due to the guide bars as introduced<br />
in the previous section, one does not need the whole image height that a camera can<br />
provide. It is possible to reduce the image size to user-defined proportions also denoted<br />
as area of interest (AOI). This function is used to decrease the number of image rows to<br />
be transferred over the firewire connection while keeping the full resolution in horizontal<br />
direction. For example, in a typical setup an image height of 160 pixels is large enough<br />
to include the whole region between the guide bars. The reduced image size is about 1/3<br />
of the original size. Combined with a short shutter time, the reduced number of image<br />
rows increases the effective frame rate significantly, so it is possible to reach frame rates<br />
of > 100fps.<br />
The decision whether to use the Marlin F-033C or the F-046B depends mainly on the<br />
question whether color is a useful feature in this particular application. In general, single chip<br />
color cameras like the F-033C map a scene less accurately than gray scale cameras<br />
if image brightness is considered.<br />
This is due to how these cameras are designed. Each sensor cell of a single chip color<br />
camera is provided with a color filter for either red (R), green (G), or blue (B) respectively.<br />
Without these filters the sensor cells are equal to those in gray scale cameras. Usually, the<br />
filters are arranged in a pattern denoted as BAYER mosaic (see Figure 3.2). Within each<br />
2×2 region there are two green, one red, and one blue filter. This distribution originates<br />
in human vision and leads to more natural looking images, since the human optical system<br />
is most sensitive to green light. The drawback of this approach is that the resolution of
each color channel is reduced. To overcome this problem, one has to interpolate the two<br />
missing color channels at each pixel position. There are several interpolation approaches,<br />
also denoted as BAYER demosaicing. With respect to speed, it is important that this<br />
computation is not too expensive. The F-033C computes R-G-B values at virtual points P_i at<br />
the center of each local 2 × 2 neighborhood as follows [3]:<br />
\[ \begin{aligned} P1_{red} &= R1, & P1_{green} &= \tfrac{1}{2}(G1 + G3), & P1_{blue} &= B1 \\ P2_{red} &= R2, & P2_{green} &= \tfrac{1}{2}(G1 + G4), & P2_{blue} &= B1 \\ P3_{red} &= R2, & P3_{green} &= \tfrac{1}{2}(G2 + G4), & P3_{blue} &= B2 \end{aligned} \qquad (3.1) \]<br />
where the location of the different points can be found in Figure 3.2. Obviously, this<br />
interpolation technique reduces the resolution of the sensor in all channels, since values<br />
can be computed only at positions where four pixels meet and not at the boundaries of<br />
the image. 1<br />
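The interpolation scheme of Eq. 3.1 generalizes to a whole raw mosaic. The following sketch assumes a raw layout as in Figure 3.2 (R/G alternating in even rows, G/B in odd rows) and computes the virtual R-G-B points at every position where four sensor cells meet:<br />

```python
import numpy as np

def demosaic_2x2_centers(raw: np.ndarray) -> np.ndarray:
    """Sketch of the 2x2-neighborhood BAYER interpolation of Eq. 3.1.
    Assumes the mosaic layout of Figure 3.2 (R G R G ... / G B G B ...).
    Returns an (H-1) x (W-1) x 3 RGB image of virtual points, one per
    position where four sensor cells meet."""
    raw = raw.astype(float)
    h, w = raw.shape
    out = np.zeros((h - 1, w - 1, 3))
    for y in range(h - 1):
        for x in range(w - 1):
            win = raw[y:y + 2, x:x + 2]
            if (x + y) % 2 == 0:
                # R and B sit on the main diagonal, greens on the other
                green = 0.5 * (win[0, 1] + win[1, 0])
                a, b = win[0, 0], win[1, 1]
            else:
                # R and B sit on the anti-diagonal
                green = 0.5 * (win[0, 0] + win[1, 1])
                a, b = win[0, 1], win[1, 0]
            # rows starting with R (even y) put red in the top row
            red, blue = (a, b) if y % 2 == 0 else (b, a)
            out[y, x] = (red, green, blue)
    return out
```

Applied to the 2 × 4 region of Figure 3.2, this reproduces exactly the three points P1, P2, P3 of Eq. 3.1.<br />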
If the inspection task can be performed on gray scale images, gray scale cameras should<br />
be used instead of color cameras. Intuitively, the accuracy of a color camera can not be<br />
the same as that of a gray scale camera, because obtaining gray values requires two interpolation<br />
steps: first, the R-G-B color channels are interpolated as introduced before, and then the<br />
image brightness has to be estimated from these interpolated values. A gray scale camera offers<br />
a more direct transformation between light intensity and image values, thus leading not<br />
only to more accurate images, but also to higher frame rates. This can be supported by<br />
the following experiment.<br />
A test image of graph paper has been captured once with the F-033C and once with the<br />
F-046B. A 16mm fix-focal length lens has been used in both cases, and the distance between<br />
camera and graph paper as well as the viewing direction were the same. The focus of<br />
the optical system was adjusted to obtain a sharp image in both cases. The results can<br />
be found in Figure 3.3. The color image in (a) has been converted into gray level values<br />
using the following equation:<br />
I(x, y) =0.299R(x, y)+0.587G(x, y)+0.114B(x, y) (3.2)<br />
where R, G, B represent the three color channels for red, green, and blue respectively<br />
and I is the resulting gray level image.<br />
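Eq. 3.2 amounts to a weighted sum over the three color channels; as a minimal sketch:<br />

```python
import numpy as np

def rgb_to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an RGB image (H x W x 3) to gray levels using the
    luminance weights of Eq. 3.2."""
    weights = np.array([0.299, 0.587, 0.114])  # weights sum to 1.0
    return rgb.astype(float) @ weights

# Pure white stays at the maximum gray level; pure green maps to 0.587 * 255
print(rgb_to_gray(np.array([[[255, 255, 255]]])))
```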
The grid appears sharper in the image of the gray scale camera, although the<br />
color image was also in focus during acquisition. The profiles of two scan lines of equal<br />
length through an edge of the grid (visualized in (b) and (d)) can be found in Figure 3.3(e).<br />
1 There are also color cameras that are provided with three chip sensors. The incoming light is split into<br />
different wavelength ranges via a prism. Thus, each sensor yields a full resolution image of one color<br />
channel and interpolation is not necessary. These cameras, however, are quite expensive and could not<br />
be tested.
Figure 3.3: Comparison of the F-033C color and F-046B gray level camera. The test images<br />
show a graph paper captured from a distance of approximately 250mm using a 16mm fix-focal<br />
length lens. (a) Color image of the F-033C. (b) Zoom view showing the location of the<br />
scan line through a grid edge in the converted gray scale image of (a). (c) Gray scale image<br />
acquired with the F-046B. (d) Zoom view showing the location of the scan line through a grid<br />
edge in (c). (e) Profiles of the two scan lines. The F-046B acquires a significantly sharper edge<br />
than the color camera, as can be seen from the slope of the edge ramp.
Figure 3.4: Color information of transparent tubes in HSV color space. Rows include from<br />
top to bottom: Color input image, hue channel, saturation channel, value (brightness) channel,<br />
and in the bottom row the computed gray scale image using Eq. 3.2. Although all images are<br />
taken from the same sequence, tubes and background can have very different color.<br />
The position of the scan lines corresponds to the same real world location. It can be seen<br />
that both edges are ramp edges (see Section 2.3.1). The slope of the edge profile, however,<br />
is larger for the gray level camera, i.e. the edge can be located more precisely. This is an<br />
important advantage with respect to accurate measuring. Therefore, if color has no other<br />
significant advantage over gray scale images, a gray scale camera should be preferred in<br />
this application.<br />
One can think of using color information to segment the transparent tubes from the<br />
background, since they appear yellowish or reddish while the conveyor belt should be<br />
white. For black tubes, color obviously has no significant benefit, hence, it is adequate to<br />
concentrate on the transparent tubes in this context.<br />
The idea is to use color as a measure to distinguish between transparent tubes and the<br />
background, since here the gray scale contrast is lower compared to black tubes. However,<br />
as can be seen in Figure 3.4, in real images of transparent heat shrink tubes on a conveyor,<br />
the color of the conveyor belt can appear quite different. The test images have been taken<br />
from a sequence of tubes on a moving conveyor. The images have been illuminated via a<br />
back light setup, which will be introduced in Section 3.3. It can be observed that some<br />
regions of the same conveyor belt look yellowish, while others appear blueish in the image<br />
(see left column in Figure 3.4).<br />
There are several color models beside the R-G-B model. Humans intuitively perceive<br />
and describe color experiences in terms of hue (chromatic color), saturation (absence of<br />
white) and brightness [27]. A corresponding color model is the H-S-V model, where H<br />
stands for hue, S for saturation, and V for (brightness) value respectively. More detailed<br />
information on color models can be found for example in [35].<br />
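For illustration, the H-S-V decomposition can be computed with Python's standard colorsys module. The RGB triples below are hypothetical values for a yellowish tube pixel and a blueish background pixel, not taken from the real images:<br />

```python
import colorsys

# RGB triples normalized to [0, 1] (illustrative, hypothetical values)
tube = (0.85, 0.75, 0.35)        # yellowish transparent tube pixel
background = (0.60, 0.70, 0.90)  # blueish conveyor belt pixel

for name, rgb in [("tube", tube), ("background", background)]:
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    # hue ~48 deg (yellow) vs. ~220 deg (blue): a large hue difference
    print(f"{name}: hue={h * 360:.0f} deg, sat={s:.2f}, value={v:.2f}")
```

The hue difference between such pixels is large, which is exactly the separation the hue channel in Figure 3.4 exploits when it works; the experiment below shows why this separation is nonetheless unstable in practice.<br />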
In the hue domain, a yellowish transparent tube differs significantly from a blueish background<br />
(left column in Figure 3.4). If the background is also yellowish, the difference
between tube and background decreases (center column). Strong discontinuities in background<br />
color (as in the right column) could be wrongly classified as a tube. The saturation<br />
domain is also a quite unstable feature. If the background contains a lot of white, it is<br />
more desaturated than the object (like in the center column) and yields a quite strong<br />
contrast. The example in the left column, however, shows that the difference in saturation<br />
does not always have to be that clear. The brightness channel (fourth row) is very<br />
close to the computed gray level image using Equation 3.2 (bottom row). Thus, it equals<br />
approximately what a gray level camera would see.<br />
This experiment has shown that color can be a very unstable feature. With respect<br />
to precise length measurements, there are clearly many artifacts at<br />
the tube edges in the H and S color channels respectively. In the brightness channel, edges<br />
appear much sharper. The small artifacts in this channel are due to camera noise,<br />
motion blur effects, or imperfectly adjusted camera focus. Since the brightness channel<br />
is closely related to the gray value image converted from R-G-B values using Equation 3.2,<br />
one could replace the brightness channel by this image. As can be seen in Figure 3.4, the<br />
bottom row yields an even better contrast between object and background.
With the observations made before, one can conclude that a gray level camera is best<br />
suited in this particular application. It yields the best edge quality, which is important for<br />
precise measurements, and both black and transparent tubes are imaged with a sufficient<br />
contrast between object and background making it possible to locate a tube in the image<br />
without using color information. Hence, the Marlin F-046B camera has been selected for<br />
this prototype. It yields the best compromise in image quality, resolution, and speed.<br />
3.2.2. Camera Positioning<br />
The camera is placed at a fixed position and viewing angle above the measuring area of the<br />
conveyor (see Figure 3.1(b)). In a calibration step, it is adjusted such that the image<br />
plane is parallel to the surface of the conveyor with the optical center above the center of the<br />
measuring area, thus minimizing the perspective effects in this area. The exact calibration<br />
procedure will be explained in Section 4.3.2. The moving direction of the conveyor (and<br />
therefore of the tube segments) is horizontal in the image.<br />
The distance between camera and conveyor depends on the optical system, i.e. the lens,<br />
that is used and on the tube size to be inspected. In Section 4.2.2, the basic assumptions<br />
and constraints regarding the image content with respect to the image processing are<br />
presented. This includes the assumption that only one tube can be seen in its entirety in an<br />
image at a time. Correspondingly, the camera's field of view has to be adapted to satisfy<br />
this criterion for different tube lengths.<br />
Placing the camera above the conveyor has the additional advantage of not extending<br />
the dimensions of the production line, since space in a production hall is limited and<br />
therefore expensive.<br />
3.2.3. Lens Selection<br />
Parameters such as object size, sensor size of the camera, camera distance, and accuracy<br />
requirements determine the right optical system (objective) for a particular application.<br />
In the following, the term lens will be used synonymously with the terms optical system and<br />
objective, although an objective is actually more than just a single lens (iris, case, mount,
adjusting screws, etc.). The lens, however, is the most important factor that determines<br />
the properties of the objective.<br />
The most important parameters to specify a lens include the focal length, F-number,<br />
magnification, angle of view, depth of focus, minimum object distance, and finally the<br />
price. In addition, lenses can have a number of aberrations as introduced before in Section<br />
2.1.3. Lens manufacturers try to minimize for example chromatic or spherical aberrations,<br />
but it is not possible to produce a completely aberration-free lens in the general<br />
case (e.g. for all wavelengths of light or angles). In practice, lenses are composed of<br />
different layers of special glass. High precision is needed to produce high quality lenses,<br />
thus, such lenses can be very expensive. There are different lens types available including<br />
fix-focal and zoom lenses. While fix-focal length lenses, as the term indicates, have a fixed<br />
focal length, zoom lenses cover a range of different focal lengths. The actual focal length<br />
can be adjusted manually or motorized. For machine vision applications fix-focal length<br />
lenses are usually preferable [40]. If the conditions are highly constrained, the best suited<br />
lens can be selected a priori.<br />
This section gives a brief overview of the most important lens parameters and<br />
motivates the selection of the lens used in this application.<br />
Focal Length In the ideal thin lens camera model, the focal length is defined as the<br />
distance between the lens and the focal point, i.e. the point where parallel rays entering<br />
the lens intersect at the other side (see Figure 2.4). In practice, the focal length value<br />
specified by the manufacturer depends on the lens model used (which is usually unknown)<br />
and does not have to be accurate. In applications that require high accuracy, a camera<br />
calibration step is important to determine the intrinsic parameters of the camera including<br />
the effective focal length with respect to the underlying camera model.<br />
F-number The F-number describes the ratio of the focal length to the aperture size [18]:<br />
\[ F = \frac{f}{d} \qquad (3.3) \]<br />
where d is the diameter of the aperture. Thus, the F-number is an indicator of the light-gathering<br />
power of the lens. Typical values are 1.0, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, and<br />
32 with a constant ratio of √2 between consecutive values. A smaller F-number indicates that<br />
more light can pass the lens and vice versa. Camera lenses are often specified by the<br />
minimum and maximum F-number, also denoted as iris range.<br />
Magnification In the weak perspective camera model (see Section 2.1.3), the ratio between<br />
focal length and the average scene depth Z_0 can be seen as magnification, i.e.<br />
following Equation 2.3 the magnification m is expressed as [24]:<br />
\[ m = \frac{f}{Z_0} \qquad (3.4) \]
Figure 3.5: (a) Standard perspective lens. Closer objects appear larger in the image than<br />
objects of equal size further away. (b) Telecentric lenses map objects of equal size to the same<br />
image size independent of the depth within a certain range of distances. Images are taken<br />
from Carl Zeiss AG (www.zeiss.de)<br />
where Z_0 can be seen as the lens-object distance, also denoted as working distance in<br />
the following. This gives a good estimate of how large an object will appear on the image<br />
plane at a given distance Z_0 to the camera with a lens of focal length f.<br />
Depth of Focus Following the thin lens camera model, only points at a defined distance<br />
to the camera will be focused on the image plane. Points at shorter or further distance<br />
appear blurred in the ideal model. In practice, however, points within some range of<br />
distances are in acceptable focus [24]. This range is denoted as depth of focus or depth<br />
of field. This is due to the finite size of each sensor element, since there is no difference<br />
visible in the image whether a point is focused exactly on the image plane or not, as long as it does not<br />
spread over several pixels [18]. The depth of focus increases with a larger F-number [18].<br />
Minimum Object Distance (MOD) All real lenses have a certain minimum distance; points<br />
that lie closer to the camera can not be focused anymore. This has both mechanical<br />
and physical reasons. The MOD value is important, since it determines the minimum<br />
distance of the camera to the objects in an application.<br />
Angle of View The angle of view is the maximum angle from which rays of light are<br />
imaged onto the camera sensor by the lens. Short focal length lenses usually have a wider<br />
angle of view and therefore are also denoted as wide-angle lenses, while lenses with a larger<br />
focal length have a narrower angle of view. The angle of view determines the field of view<br />
of the camera at a given distance and a certain sensor size, i.e. what part of the<br />
world is imaged onto the sensor array of the camera.<br />
Commonly, short focal length lenses are used to capture images of a larger field of view, for<br />
example in video surveillance applications that have to cover a larger area. With respect<br />
to machine vision applications, such lenses can also be used for close-up images at a<br />
short camera-object distance. The amount of radial distortion increases with a shorter<br />
focal length. The fish-eye lens is an extreme example of a very short focal length lens.<br />
Increasing the focal length increases the magnification. Thus, even smaller objects at
further distance can be imaged over the whole image size with such lenses. However, the<br />
minimum object distance is larger for long focal length lenses.<br />
For two-dimensional measuring tasks, the most accurate and precise results can be achieved<br />
with telecentric lenses (see Figure 3.5). These special lenses are designed to map objects of<br />
the same size in the world to the same image size, even if the object-to-lens distance differs.<br />
It is important to note that the maximum object size can not be larger than the diameter<br />
of the lens. This makes telecentric lenses useful only in connection with relatively small<br />
objects. In addition, such lenses reach a size of over 0.5m for objects of about 100mm and<br />
a mass of approximately 4kg [18]. Finally, telecentric lenses are very expensive.<br />
Although a telecentric lens would be advantageous regarding the imaging properties, a less<br />
expensive solution had to be found for the prototype development in this application. The<br />
optical system must be able to map objects between 20 and 100mm to a 1/2” CCD<br />
sensor at a relatively short camera-object distance, while not being affected<br />
too much by aberrations and radial distortion.<br />
However, this is an optimization problem that has no universal solution for all tube<br />
lengths. Different tube lengths need different magnification factors and fields of view if the<br />
maximum possible resolution is to be exploited to reach the highest accuracy. Changing<br />
the magnification factor means changing either the focal length of the optical system or<br />
the distance between object and camera, or both. When moving the camera toward the object,<br />
the minimum object distance of the lens has to be considered to be able to yield sharp<br />
images. Zoom lenses could be used to change the focal length without changing the whole<br />
optical system. However, zoom lenses should be avoided in machine vision applications<br />
[40], since they have to make larger compromises than fix-focal lenses and usually have<br />
a minimum working distance of one meter and more. Hence, using a fix-focal lens<br />
implies changing the camera-object distance to adapt to different tube lengths, or<br />
physically exchanging the lens when the machine cuts a new length that can not be<br />
covered by the current lens.<br />
Several commercial lenses designed for machine vision applications have been compared<br />
to find the lens that is best suited to inspect different tube sizes (see Table 3.1). Figure 3.6<br />
gives an overview of the parameters that influence a camera’s field of view. The angle<br />
of view θ is specified by the lens manufacturer and depends on the focal length and<br />
the camera sensor size. All values in the following refer to a 1/2” CCD sensor,<br />
since this is the sensor size of the Marlin F-033C and F-046B. The working distance d<br />
is here defined as the distance between lens and conveyor. O represents the object size,<br />
and L indicates the size of the measuring area with respect to a certain tube size. L<br />
can be approximated as twice the object size O. The goal is to find a combination of a<br />
lens and a working distance that yields a visual field so that the size V of the imaged<br />
region of the conveyor equals the measuring area L. Note that in this context size can be<br />
replaced by length in horizontal direction, i.e. the moving direction of the conveyor, since this<br />
is the measuring direction in this constrained application. Thus, in the following only this<br />
direction is considered.<br />
The geometry in Figure 3.6 leads to the following relationship between θ, d and V :<br />
\[ V = 2 d \tan\left( \frac{\theta_{rad}}{2} \right) \qquad (3.5) \]
Figure 3.6: Parameters that influence the field of view (FoV) of a camera. θ indicates the<br />
angle of view of the optical system, d the distance between lens and conveyor, O the object<br />
size, V is the size of the region on the conveyor that is imaged, and L representing the size of<br />
the measuring area depending on the current tube size. The goal is to find a lens that yields<br />
a field of view such that V ≈ L at a short distance.<br />
Model f θ dmin<br />
Pentax H1214-M 12mm 28.91 250mm<br />
Pentax C1614-M 16mm 22.72 250mm<br />
Pentax C2514-M 25mm 14.60 250mm<br />
Pentax C3516-M 35mm 10.76 400mm<br />
Pentax C5028-M 50mm 7.32 900mm<br />
Table 3.1: Different commercial machine vision lenses and their specifications including focal<br />
length f, horizontal angle of view θ (in degrees) with respect to a 1/2” sensor, and minimum<br />
object distance dmin respectively.<br />
where θrad represents the angle of view θ in radians. Using this equation one can<br />
compute the length of the conveyor that is imaged in horizontal direction at the minimum<br />
object distance of a lens. The results can be found in Table 3.2.<br />
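The values in Table 3.2 follow directly from Eq. 3.5; a small sketch to verify them:<br />

```python
import math

def field_of_view(d_mm: float, theta_deg: float) -> float:
    """Horizontal field of view V = 2 d tan(theta/2) (Eq. 3.5) at working
    distance d for a lens with horizontal angle of view theta (degrees)."""
    return 2.0 * d_mm * math.tan(math.radians(theta_deg) / 2.0)

# Reproduce Table 3.2: field of view at the minimum object distance
# (focal length in mm, angle of view in degrees, d_min in mm)
lenses = [(12, 28.91, 250), (16, 22.72, 250), (25, 14.60, 250),
          (35, 10.76, 400), (50, 7.32, 900)]
for f, theta, d_min in lenses:
    print(f"f = {f}mm: V = {field_of_view(d_min, theta):.0f}mm")
```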
This shows that none of the compared lenses is able to image small objects (< 30mm) in<br />
focus onto the camera sensor in a way that the object covers about half the full image<br />
width. Thus, the minimum tube size that can be inspected at full resolution under this<br />
assumption is 30mm. However, if one shrinks the image width manually (for example<br />
using the AOI function of the camera), the constraints can be reached even for tubes<br />
below 30mm.<br />
The real-world representation s of one pixel in the image plane can be approximated as<br />
follows:<br />
\[ s = \frac{V}{W_{img}} \qquad (3.6) \]<br />
where W_{img} represents the image width in pixels. For example, for a 16mm focal<br />
length lens and a working distance of 250mm, one pixel represents about 0.12mm at this<br />
distance in the real world if the image resolution is 780 in horizontal direction. At the same<br />
distance, a 25mm focal length lens yields a pixel representation of about 0.08mm at the
f V<br />
12mm 129mm<br />
16mm 100mm<br />
25mm 64mm<br />
35mm 75mm<br />
50mm 115mm<br />
Table 3.2: Field of view of different fix-focal length lenses at the specified minimum object<br />
distance.<br />
V f=12mm 16mm 25mm 35mm 50mm<br />
40 •77 •99 •156 •212 •312<br />
60 •116 •149 •234 •318 •469<br />
100 •193 248 390 530 •782<br />
200 381 497 780 1006 1563<br />
Table 3.3: Working distances to yield a certain field of view for different focal length lenses.<br />
Distances that fall significantly below the minimum working distance are marked with a •.<br />
same resolution. Thus, smaller tubes can theoretically be measured at higher precision. The minimum object distance of the compared lenses, however, imposes a certain limit on precision. Tubes below 30mm cannot be measured with higher precision, but only with the same precision as 30mm tubes. Recalling the tolerances introduced in Section 1.3, smaller tubes have a smaller tolerance than larger tubes, while 20−30mm tubes share the same tolerance.
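The relation between focal length, working distance, and pixel size can be sketched numerically. The following is a minimal illustration assuming a simple pinhole model and a sensor width of 6.4mm for the 1/2” sensor; both the helper names and the sensor width are assumptions, not values from the lens data sheets:

```python
import math

def field_of_view(f_mm, sensor_width_mm, distance_mm):
    # Horizontal field of view V at a given working distance, derived from
    # the horizontal angle of view of a pinhole camera.
    theta = 2 * math.atan(sensor_width_mm / (2 * f_mm))
    return 2 * distance_mm * math.tan(theta / 2)

def pixel_size(fov_mm, image_width_px):
    # Real world representation s of one pixel (Equation 3.6): s = V / Wimg.
    return fov_mm / image_width_px

# A 16mm lens at 250mm working distance with 780 pixels image width yields
# roughly the 0.12mm per pixel quoted in the text (small deviations stem
# from the assumed sensor width).
s = pixel_size(field_of_view(16.0, 6.4, 250.0), 780)
```

Inverting `field_of_view` for the distance reproduces the order of magnitude of Table 3.3, e.g. about 500mm for a 16mm lens and a 200mm field of view.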
At the upper bound, larger tubes need a wider field of view of the camera. Hence, a larger region is mapped onto the same image sensor, so one pixel represents a larger real-world distance. For a 200mm measuring area the pixel representation is about 0.25mm. The wider field of view can be achieved by placing the camera further away from the object. The required distance increases with the focal length of the lens. Table 3.3 shows the approximated working distances for the compared lenses that are needed to achieve a certain field of view. Distances that fall below the minimum object distance are marked with a ‘•’.
It turns out that a 16mm focal length lens is the best choice for tube lengths between 50 and 100mm, since this lens maps the required measuring areas onto the image plane at the smallest working distance. However, tubes below 50mm cannot be inspected with higher precision with this lens. In this case, a 25mm focal length lens has to be selected. This lens is the best compromise between small and large tube sizes, at the drawback of a large working distance of up to 780mm for 100mm tubes. Both a 16mm (PENTAX C1614-M) and a 25mm (PENTAX C2514-M) lens have been used in the experiments.
3.3. Illumination<br />
As introduced in Section 2.2, the right choice of illumination is essential in machine vision applications. Accurate length measurement of heat shrink tubes requires a sharp contrast at the tube’s outline, especially at the boundaries that serve as measuring points. Any shadows that would enlarge the tube’s dimensions in the 2D image projection
Figure 3.7: Heat shrink tubes at different front lighting setups. (a) Illumination by two<br />
desktop halogen lamps. Specular reflections at the tube boundaries complicate an accurate<br />
detection. (b) Varying the angle and distance of the light sources as in (a) can reduce reflections.<br />
(c) Professional front lighting setup with two line lights at both tube ends. (d)<br />
Resulting image of the setup in (c). Both in (b) and (d) shadows can not be eliminated<br />
completely. (Images (c) and (d) by Polytec GmbH, Waldbronn, Germany)
Figure 3.8: Back lighting through different types of conveyor belts. The structure of the<br />
belt determines the amount of light entering the camera, thus, influencing the image quality<br />
significantly.<br />
must be avoided. In addition, the illumination setup should cover both black and transparent tubes, where the transparent tubes are translucent while the black ones are not. The surface of both materials appears matte under diffuse illumination, but shows specular reflections if illuminated directly with point light sources.
In a first experiment, a front lighting setup with standard desktop halogen lamps was tested. Two light sources were placed at a low angle to illuminate the tube boundaries from two sides at the measuring area inside the guide bars. The results are shown in Figure 3.7(a) and 3.7(b). This setup yielded good results with black heat shrink tubes, but it turned out to produce unacceptable reflections right at the measuring points with the transparent ones. Such reflections could be reduced by changing the angle of light incidence, but the results remained strongly non-uniform. Although the halogen lamps are operated at DC power, the AC/DC conversion of off-the-shelf desktop lamps is often not stabilized, leading to temporal and spatial variations in image intensity and color. This effect was observed throughout the experiments with the desktop lamps at video frame rates of 50fps.
Using a professional, flicker free, front lighting system with two fiber optical line lights<br />
illuminating the tube ends (see Figure 3.7(c)), the image quality could be increased as can<br />
be seen in Figure 3.7(d). However, there are still a few shadows left.<br />
Experiments with a back light setup have been conducted, too. A calibrated fiber optical area light is placed at a certain distance (about 1-2cm) below the conveyor belt. The light has to shine through the belt, thus it is important to use a material that is translucent. A typical belt core consists of a canvas (e.g. cotton) and a rubber coating, where thickness, structure, and density of the canvas as well as the color of the rubber determine how much light can enter the camera. In the optimal case, no light at all would be absorbed by the belt, which is technically hardly achievable.
Five different belt types have been tested. Some of the results can be seen in Figure 3.8.<br />
Each sample in this experiment consists of a transparent rubber coating and a white canvas<br />
as base. The structure of the belt canvas is visible in each image as background pattern.<br />
Obviously, the background should not influence the detection of the tube’s boundary.<br />
Thus, the goal is to find a belt type that allows for back lighting without adding too much unwanted information to the image that could complicate the measurements.
In Figure 3.8(a), the coarse texture of the background significantly affects the tube ends<br />
of the transparent tube at the bottom. A sharp boundary is missing, making accurate<br />
and reliable measurements impossible. The belt type in Figure 3.8(b) has a finer texture,<br />
but transmits only a small amount of light. Figure 3.8(c) shows the belt type that yielded
Figure 3.9: Polarized back lighting. (a) Image of diffuse back light through polarized glasses<br />
used for viewing 3D stereo projections with no filter in front of the camera. (b) Setup as in (a)<br />
with an opposing polarization filter in front of the camera. Almost no light enters the camera<br />
at the polarized area. (c) Transparent heat shrink tube at polarized back light. There is a<br />
strong contrast at the polarized area, while it is impossible to locate the tube’s boundaries at<br />
the unpolarized area (bottom right). (d) Polarized back light through a conveyor belt. The<br />
polarization is changed both by the belt and the tube, thus, leading to a poor contrast. (e)<br />
For comparison: Back light setup without polarization.<br />
best results both in background texture and transmittance. As can be seen, there are no<br />
shadows at the tube boundaries.<br />
Since the black tubes do not transmit any light, the contrast between background and tube is excellent with all belt types tested. One advantage of black tubes follows from this property: the printing on the tube’s surface is not visible in the image. On the other hand, the transparent tubes do transmit the light coming from below. Positions covered by the printing show a lower transmittance; hence, the printing is visible as darker intensity values in the image.
As introduced in Section 2.2.3, polarized back lighting can be used to emphasize transparent,<br />
translucent objects. In an experiment, shown in Figure 3.9, the integration of<br />
polarization filters has been tested. Two polarized glasses originally used for viewing 3D<br />
stereo projections have been employed to polarize the light coming from the area back<br />
light. First, the principle is tested without a conveyor belt. Two opposite polarization filters<br />
are placed between light source and camera. As can be seen in Figure 3.9(b), the area<br />
covered by the two polarization filters at right angle appears black in the image while the<br />
areas without polarization filters are ideally white. A transparent tube between the two filters changes the polarization, hence making it possible for light to enter the camera at locations that were black before. There is an almost binary contrast between
object and background (see Figure 3.9(c)). At regions that are not affected by the filters,<br />
there is no contrast at all, making the tube invisible. Unfortunately, these good results have no practical relevance, since in the real application the light has to pass through the conveyor belt, too. If the belt is placed between the first polarization filter and the object, it also changes the polarization at regions that belong to the background (see Figure 3.9(d)). The binary segmentation is lost and the structure of the conveyor belt is visible again. Since it is not possible to install the first polarization filter between conveyor and tube, the polarized back light approach has no advantages over the unpolarized one in this
Figure 3.10: (a) Installation of the back light panel. The measuring area is illuminated<br />
from below through a translucent conveyor belt. A diffuser is used to yield a more uniform<br />
light and to protect the fiber optical light panel. (b) SCHOTT PANELight Backlight A23000<br />
used for illumination (Source: SCHOTT).<br />
application. On the contrary, it has the effect that less light enters the camera, which yields darker images and increases the amount of sensor noise.
As result of the experiments with different lighting techniques, the back lighting setup<br />
has been chosen for the prototype. It offers excellent properties for black tubes and also yielded very good results for the transparent tubes in combination with a finely structured, translucent conveyor belt. The incident lighting did not perform better in the experiments.
A light source (SCHOTT DCR III) with a DDL halogen lamp (150W, 20V) has been<br />
selected in combination with the fiber optic area light (SCHOTT PANELight Backlight<br />
A23000) (see Figure 3.10(b)). The panel size is 102 × 152mm. It is installed 20mm below<br />
a cut-out in the conveyor below the belt as can be seen in Figure 3.10(a). A diffuser<br />
between light panel and conveyor belt provides a uniform illumination and protects the<br />
light area against dirt. More details regarding the illumination hardware can be found in<br />
Appendix B.2.<br />
The usage of a fiber optic area light below the conveyor belt has the advantage of very low heat development, since the light source can be placed outside at a certain distance.
With respect to the characteristics of heat shrink tubes, the avoidance of heat is essential<br />
at this step to prevent deformations. The light is transmitted through a flexible tube of<br />
fibers. If the lamp is out of order it can be exchanged easily without changing anything at<br />
the conveyor. The lifetime of one halogen lamp is about 500 hours at maximum brightness.<br />
To eliminate the influence of illumination from other light sources than the back light,<br />
the whole measuring area including the camera is darkened. This guarantees constant<br />
illumination conditions. For the prototype, a wooden rack has been constructed that is<br />
placed around the measuring area on the conveyor. A thick, black, non-translucent fabric can be spanned around the rack, leaving only two openings where the tubes enter and leave
Figure 3.11: Air pressure is used to sort out tubes that do not meet the tolerances. The<br />
blow out unit consisting of an air blow nozzle, light barrier and a controller (not visible in the<br />
image) is placed at a certain distance behind the measuring area.<br />
the darkened area. For industrial use, this interim solution has to be replaced
by a more robust and compact (metal) case that excludes environmental illumination and<br />
additionally protects the whole measuring system against other outside influences. A slight overpressure inside the closed case or an air filtering system could be integrated to prevent dust particles from entering the case through the required openings. Any accumulation of
dust or other dirt on the lens is critical and must be prevented.<br />
3.4. Blow Out Mechanism<br />
After a tube has passed the measuring area the measured length is evaluated with respect<br />
to the given target length and tolerance. The result is a binary good/bad decision for<br />
each particular tube. Good tubes are allowed to pass the blow out unit, which is placed<br />
behind the measuring area at a certain distance. On the other hand, tubes that do not meet the tolerances have to be sorted out. This is done by air pressure. An air blow nozzle is arranged to blow tubes off the conveyor. For this purpose, the guide bars have to end
behind the measuring area. The whole blow out setup can be seen in Figure 3.11.<br />
The visual inspection system sends the good/bad decision over a RS-232 connection<br />
(serial interface) to a controller unit in terms of a certain character followed by a carriage<br />
return (‘\r’). The used protocol can be seen in Table 3.4. Once the controller receives an<br />
A or B, this message is stored in a first-in-first-out (FIFO) buffer.
Message Code<br />
TUBE GOOD ‘A\r’<br />
TUBE BAD ‘B\r’<br />
RESET ‘Z\r’<br />
Table 3.4: Protocol used for communication between the inspection system and the blow<br />
out controller.
A light barrier is used to send a signal to the controller when a tube is placed in front<br />
of the air blow nozzle. If the first entry in the FIFO buffer contains a B, the tube has to
be blown out and the air blow nozzle is activated. On the other hand, if the first entry<br />
contains an A, the tube can pass. In both cases the first entry in the buffer is deleted.<br />
The advantage of this approach is that the current conveyor velocity does not have to<br />
be known to compute the time a tube needs to move from a point x on the measuring<br />
area to the position of the air blow nozzle. The light barrier guarantees that the blow out is activated when the tube is exactly at the intended position.
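The buffer logic described above can be sketched as follows. The class and method names are illustrative; the real controller is an external hardware unit fed with the messages of Table 3.4 over RS-232:

```python
from collections import deque

class BlowOutController:
    # Minimal sketch (assumed names) of the FIFO logic of the blow out
    # controller described in Section 3.4.
    def __init__(self):
        self.fifo = deque()

    def receive(self, message: bytes):
        # Each decision arrives as a single character plus carriage return.
        if message == b"A\r":      # TUBE GOOD
            self.fifo.append("GOOD")
        elif message == b"B\r":    # TUBE BAD
            self.fifo.append("BAD")
        elif message == b"Z\r":    # RESET clears the buffer
            self.fifo.clear()

    def light_barrier_triggered(self) -> bool:
        # Called when a tube reaches the nozzle; returns True if the nozzle
        # should fire. The first FIFO entry is consumed in both cases.
        if not self.fifo:
            return False
        return self.fifo.popleft() == "BAD"
```

Because decisions and tubes arrive in the same order, the FIFO pairing makes knowledge of the conveyor velocity unnecessary.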
4. Length Measurement Approach<br />
While the previous chapter focused on the hardware setup, this chapter will present the<br />
methodical part of the system. After a brief overview, the different steps including the<br />
camera calibration and teach-in step as well as the tube localization, measuring point<br />
detection, tube tracking and the good/bad classification are introduced. All assumptions<br />
and the model knowledge used throughout these steps are presented before.<br />
4.1. System Overview<br />
The fundamental concept of the developed system is a so-called multi-image measuring strategy. This means that the goal is to measure each tube not only once, but in as many
images as possible while it is in the visual field of the camera. The advantage of this<br />
approach is that the decision whether a particular tube meets the length tolerances can<br />
be made based on a set of measurements. The total length is computed by averaging over<br />
these single measurements leading to more robust results. Furthermore, the system is less<br />
sensitive to detection errors. Depending on the conveyor velocity and the tube length one<br />
can reach between 2 and 10 measurements per tube.<br />
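The multi-image decision rule can be summarized in a few lines; the following function names are illustrative, not the system’s actual implementation:

```python
def total_length(measurements):
    # Average the single-frame length measurements collected for one tube
    # (the multi-image measuring strategy).
    return sum(measurements) / len(measurements)

def is_good(length_mm, target_mm, tolerance_mm):
    # Binary good/bad decision against the target length and tolerance.
    return abs(length_mm - target_mm) <= tolerance_mm
```

Averaging over 2 to 10 single measurements reduces the influence of an individual detection error on the final decision.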
The system is designed to work without any external trigger that provokes the camera<br />
to grab a frame depending on a certain event, e.g. a tube passing a light barrier. Instead,<br />
the camera is operated in continuous mode, i.e. images are captured at a constant frame<br />
rate using an internal trigger. The absence of an external trigger, however, requires fast<br />
algorithms to evaluate whether a frame is useful, i.e. whether a measurement is possible.<br />
In addition, the system must be able to track a tube while it is in the visual field of the<br />
camera to assign measurements to this particular tube. Accurate length measurements of tubes require the very accurate detection of the tube edges. A template-based tube edge localization method has been developed, allowing for reliable, subpixel-accurate detection results even in the presence of tube-edge-like background clutter. Once there is evidence
that a tube has left the visual field of the camera, all corresponding measurements have<br />
to be evaluated with respect to the given target length and tolerances. The resulting<br />
good/bad decision must be delegated to the external controller handling the air pressure<br />
based blow out mechanism. Model knowledge regarding the inspected tubes under the<br />
constrained conditions is exploited if possible to optimize the processing.<br />
Before any measurements can be performed, the system has to be calibrated and trained<br />
to the particular target length. This includes camera positioning, radial distortion compensation<br />
and an online teach-in step.<br />
Figure 4.1 gives an overview of the different stages of the system. It can also be seen as an outline of this chapter. The underlying methods and concepts will be introduced in more detail in the following.
Throughout this chapter, all parameters are treated abstractly. Corresponding value assignments used in the experiments are given in Section 5.1.1.
[Figure 4.1 flow chart: Camera calibration → Teach-in → Next image → Tube localization → Measurement possible? (No: next image) → Measuring point detection → Length measuring → Tube passed? (No: next image) → Total length computation → Good/bad classification → Blow out control]
Figure 4.1: System overview. After camera calibration and a teach-in step the system<br />
evaluates the acquired images continuously. If a tube is located and assigned as measurable,<br />
the exact measuring points on the tube edges are detected and the tube length is calculated.<br />
Once a tube has passed the visual field of the camera, the computed total length is compared<br />
to the allowed tolerances for a good/bad classification. Finally, the blow out controller is<br />
notified whether the current tube is allowed to pass.
(a) ‘empty’ (b) ‘entering’ (c) ‘leaving’<br />
(d) ‘centered’ (e) ‘entering + centered’ (f) ‘ent. + centered + leav.’<br />
(g) ‘centered + leaving’ (h) ‘entering + leaving’ (i) ‘full’<br />
Figure 4.2: Potential image states. Each image can be categorized into one of these nine<br />
states. States that contain one tube completely with a clear spacing to neighboring tubes can<br />
be used for length measuring, i.e. state (d), (e), (f) and (g) respectively. The remaining states<br />
do not allow for a measurement and, thus, can be skipped. State (i) might be due to a too small field of view of the camera (i.e. the tubes are too large), or to a failure in separation (i.e. the spacing between two or more tubes is missing). If this state is detected, a warning must be issued.
4.2. Model Knowledge and Assumptions<br />
The visual length measurement of heat shrink tubes to be proposed throughout this chapter<br />
is based on several assumptions and model knowledge regarding the inspected objects,<br />
which are introduced in the following.
4.2.1. Camera Orientation<br />
As introduced in Section 3.2.2, the camera is placed above the conveyor. It must be<br />
adjusted to fulfill the following criteria:<br />
- The optical ray is perpendicular to the conveyor
- The image plane is parallel to the conveyor
This camera view is commonly denoted as fronto-parallel view [30]. If the image plane is parallel to the conveyor, the variation in scene depth is quite small. Therefore, it is possible to approximate the perspective projection with a weak-perspective camera model. In this model (see Section 2.1.3), objects are projected onto the image plane up to a constant magnification factor. This means distances between two points lying in the same plane are preserved in the image plane up to a constant scale factor. This property is important to allow for affine distance measurements in a fronto-parallel image view.
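The weak-perspective approximation can be sketched in a few lines; the helper name and interface are hypothetical:

```python
def weak_perspective(points_xyz, f, z_avg):
    # Weak-perspective projection: every point is assumed to lie at the
    # average scene depth z_avg, so projection reduces to a uniform scaling
    # of the X/Y coordinates by s = f / z_avg.
    s = f / z_avg
    return [(s * x, s * y) for (x, y, z) in points_xyz]
```

Because the scale factor s is the same for all points in the measuring plane, real-world distances in that plane map to image distances multiplied by s, which is exactly what affine length measurement requires.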
4.2.2. Image Content<br />
The following assumptions regard the image content and capture properties of the camera:
(a) Ideal tube model (b) Perspective tube model<br />
Figure 4.3: (a) In the ideal model, the (parallel) projection of a 3D tube corresponds to a<br />
rectangle in the image. The distance d between the left and right edge is equal at each height.<br />
Under a perspective camera, objects closer to the camera appear larger in the image. Hence,<br />
the distance d1, belonging to the points on the tube edges that are closest to the camera,<br />
is larger than d2, and d2 is larger than d3 (the distance of the edge points that are farthest<br />
away). Note, the dashed lines are not visible in the image under back light, and the tube<br />
edges appear convex.<br />
- Only one tube is visible completely (with left and right end) in each image at a time
- There is a clear spacing between two consecutive tubes
- The guide bars cover the upper and lower border of each image
- The guide bars are parallel and oriented horizontally
- The moving direction is from left to right
- The mean intensity of the background (conveyor belt) is brighter than that of the foreground (heat shrink tubes)
- There is sufficient contrast between background and objects
- The video capture rate is fast enough to take at least one usable image of each tube segment so that a length measurement can be performed (potentially the production speed has to be reduced to satisfy this constraint)
- The image is not distorted, i.e. straight lines in the world are imaged as straight lines and parallel lines remain parallel in the image
In this application, the variety of image situations to be observed is highly limited and constrained by the physical setup (see Chapter 3). Thus, it is possible to reduce the number of potential situations to nine defined states. Each image can be categorized into exactly one of these states, as shown in Figure 4.2 by means of synthetic representatives. Only four of the nine states are measurable, namely states (d), (e), (f) and (g). In these states, a tube is completely contained in the image.
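Assuming the tubes in a frame are detected as horizontal pixel intervals (an assumed representation, not the thesis’ actual data structure), the measurability test implied by Figure 4.2 can be sketched as:

```python
def is_measurable(tube_spans, image_width):
    # tube_spans: list of (x_left, x_right) pixel intervals of detected tubes.
    # A frame is measurable iff at least one tube touches neither the left
    # nor the right image border, i.e. states (d)-(g).
    return any(l > 0 and r < image_width - 1 for (l, r) in tube_spans)

def is_full(tube_spans, image_width):
    # State (i): one dark region spans the whole image width, meaning the
    # tube is too large for the field of view or the spacing is missing;
    # this case should raise a warning.
    return any(l == 0 and r == image_width - 1 for (l, r) in tube_spans)
```

All other states (empty, entering, leaving, and their combinations without a fully visible tube) are simply skipped.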
4.2.3. Tubes Under Perspective<br />
Under ideal conditions, i.e. with a parallel projection, a tube on the conveyor is represented<br />
by a rectangle in the image plane with the camera setup used (see Figure 4.3(a)). Due
Figure 4.4: The plane parallel to the conveyor plane ΠC that goes through the measuring points PL and PR is denoted as the measuring plane ΠM. The red line in ΠM between PL and PR corresponds to the measured distance d1 in Figure 4.3(b), i.e. the distance between the outermost points of the projected tube edges in an image.
to the guide bars this rectangle is oriented parallel to the x-axis in horizontal direction<br />
and parallel to the y-axis in vertical direction respectively. The length can be measured<br />
between the left and right edge of the tube in horizontal direction. The horizontal distance<br />
d is equal between the left and right tube boundary independent of the height. This is an<br />
ideal property for length measurements.<br />
However, if the camera is not provided with a telecentric lens or the camera is not placed<br />
at infinity, the tube’s projection is influenced by perspective. In general, objects that are<br />
closer to the camera are imaged larger than objects further away. Thus, the left and right<br />
tube edge do not appear straight in the image, but curved in a convex fashion due to the<br />
different distances between a point on the tube’s surface and the camera. Figure 4.3(b)<br />
visualizes a synthetic tube under perspective. The distance d1 between the two edge<br />
points closest to the camera is larger than the distances between points farther away.<br />
Accordingly, d2 is larger than d3, although in the real world d1 =d2 =d3 (assuming the<br />
tube is not cut skew). The perspective curvature increases with the distance to the image<br />
center. Thus, the maximum curvature is reached at the image boundaries, while an edge<br />
that lies directly below the optical center of the camera (approximately the image center)<br />
appears straight.<br />
With the constraints regarding the image content, it is not possible to look inside a tube from the camera view if the tube is completely in the image. Therefore, one can assume that the outermost point of each tube edge always corresponds to the point that is closest to the camera, i.e. measuring between these two points always corresponds to the same distance in the world.
In the following PL and PR will denote the points on the left and right side respectively<br />
that are closest to the camera. The tube length in the real world is defined as the length<br />
of the line connecting these two points (corresponding to d1 in Figure 4.3(b)). Assuming<br />
a tube has the same height at the left and right side, PL and PR lie in the same plane<br />
denoted as measuring plane ΠM. This plane is assumed to lie parallel to the image plane<br />
as can be seen in Figure 4.4. The measuring points have two correspondences in the image, denoted as pL = (xpL, ypL)^T and pR = (xpR, ypR)^T respectively. The distance between pL and pR in the image can be related to the real world length up to a certain scale factor.
However, this scale factor may differ depending on the image position. It is expected<br />
that the distance between pL and pR will be slightly shorter at the image boundaries and<br />
maximal at the image center due to perspective.<br />
4.2.4. Edge Model<br />
The tube edges are modeled as ramp edges, as introduced in Section 2.3.1, since this model describes the real data most adequately both for transparent and black tubes. The slope of the ramp determines the sharpness of an edge: the steeper the rise (or fall), the sharper the edge. Obviously, the edge position can be located much more precisely if the ramp has only a minimal spatial extension.
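A ramp edge can be illustrated with a simple 1-D intensity profile; this is an illustrative sketch with hypothetical parameter names, not the thesis’ detection code:

```python
def ramp_edge(x, center, width, low, high):
    # 1-D ramp edge model: intensity rises linearly from `low` to `high`
    # over a transition region of `width` pixels centered at `center`.
    # A smaller width corresponds to a sharper edge.
    t = (x - center) / width + 0.5
    t = max(0.0, min(1.0, t))          # clamp to the plateau values
    return low + t * (high - low)
```

An ideal step edge is the limit of this model for width approaching zero; pixel discretization, defocus, and motion blur all widen the ramp.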
As mentioned before in the technical background section, there are several factors that can cause ramp edges, including the discrete pixel grid, the camera focus, and motion blur. The first factor can be reduced by using a high-resolution camera (recalling the trade-off between resolution and speed discussed in Section 3.2.1). The camera focus depends mainly on the depth of an object. In this application, the depth of an object does not change over time, since all tubes in a row have the same diameter and lie on the planar conveyor belt, which is parallel to the image plane. In the following, it is assumed that the camera and the optical system are adjusted in a way that a tube is imaged as sharply as possible. Motion is another common factor influencing the appearance of an edge. Since the tubes are inspected in motion (up to 40m/min), a short shutter time (exposure time) of the camera is required. If the shutter time is too long, light rays from one point on the tube contribute to the integrated intensity values of several sensor elements along the moving direction. Especially the left and right tube boundaries, which are considered for measuring, are affected by motion blur as they lie in the moving direction.
Therefore, it is assumed that the shutter of the camera is adjusted to a very short exposure time to suppress motion blur as much as possible. A short shutter time requires a large amount of light to enter the camera at once. The iris of the optical system has to be wide open (corresponding to a small F-number) and the illumination must be sufficiently bright.
4.2.5. Translucency<br />
Translucency is the main property to distinguish between transparent and black tubes.<br />
Black tubes do not transmit light leading to one uniform black region with strong edges<br />
in the image under back light. In this case, the local edge contrast at a certain position<br />
depends on the background only. On the other hand, transparent tubes transmit light.<br />
However, some part of the light is also absorbed or reflected in directions that do not reach the camera. Therefore, a tube will appear darker in the image compared to the background. It will be even darker at positions where the light has to pass through more material. This leads to two characteristic dark horizontal stripes at the top and bottom of
a transparent tube as can be seen in Figure 4.5. This model knowledge has been exploited<br />
to define a robust feature for edge localization which can still be detected in situations<br />
where the contrast at the center of the edge is poor.<br />
The printing on the tubes also reduces the translucency and is therefore visible on<br />
transparent tubes in the image. On average it covers about 8% of a tube’s surface along<br />
the perimeter for 6, 8, and 12mm diameter tubes.
Figure 4.5: The image intensity of transparent tubes is not uniform as for black tubes.<br />
Depending on how much light can pass through a tube, regions appear darker or brighter.<br />
One characteristic of transparent tubes under back light are two dark horizontal stripes at<br />
the top and the bottom of a tube indicated by the arrows. The printing also reduces the<br />
translucency and thus appears darker in the image.<br />
4.2.6. Tube Orientation<br />
The tube orientation is highly constrained by the guide bars as introduced in Section 3.1.<br />
Thus, an approximately horizontal orientation can be assumed throughout the design of<br />
the inspection algorithms.<br />
In practice, the distance between the guide bars is slightly larger than the outer diameter of a tube to prevent a blockage, since tubes may not be ideally round. This means the cross-section of a tube can be elliptical instead of circular. Let dGB denote the vertical distance between the guide bars, and hmax the maximum expected tube extension in the vertical direction with respect to the image projection. The remaining spacing distance can then be expressed as dspace = dGB − hmax, as can be seen in Figure 4.6(a).
The maximum possible rotation is reached if the tube hits both guide bars at two points<br />
(see Figure 4.6(b)). The maximum angle of rotation θmax can be defined as the angle<br />
between the longitudinal axis of the tube and the x-axis. One can define an unrotated<br />
version of the tube with the longitudinal axis parallel to the x-axis and shifted so that the<br />
two axis intersect at the center of gravity of the rotated tube. In Figure 4.6(b) this virtual<br />
tube is visualized as a dashed rectangle. The distances between the measuring points of the<br />
rotated and the ideal horizontal tube can also be seen in the figure and are denoted as<br />
dL and dR for the left and right tube side respectively. Both dL and dR are ≤ dspace/2. If<br />
the tube is not bent, dL = dR. The maximum error between the ideal distance l and the<br />
rotated distance l′ can be estimated as follows:<br />
errθ = l′ − l = √(l² + dspace²) − l (4.1)<br />
For example, in a typical setup for 50mm tubes of 8mm diameter, one tube has a length of<br />
approximately 415 pixels and dspace = 15 pixels. This leads to an error of errθ = 0.27 pixels. Thus,<br />
with one pixel representing 0.12mm in the measuring plane, the maximum error<br />
due to orientation would be about 0.03mm. On average this error will be even smaller.<br />
Based on these estimations, the orientation error is neglected in the following, i.e. all tubes<br />
are assumed to be oriented ideally horizontal.<br />
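The error estimate of Equation 4.1 can be checked numerically; a minimal sketch in Python (the function name is illustrative, values taken from the example above):

```python
import math

def orientation_error_px(l, d_space):
    """Maximum length error due to rotation (Eq. 4.1):
    err = sqrt(l^2 + d_space^2) - l."""
    return math.sqrt(l ** 2 + d_space ** 2) - l

# Typical setup from the text: ~415 pixels tube length, d_space = 15 pixels,
# and 0.12 mm per pixel in the measuring plane.
err_px = orientation_error_px(415, 15)   # about 0.27 pixels
err_mm = err_px * 0.12                   # about 0.03 mm
```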
(a)<br />
(b)<br />
Figure 4.6: (a) The guide bar distance dGB and the maximum extension of a tube in vertical<br />
direction hmax define the maximum space between a tube and the guide bars dspace at ideal<br />
horizontal orientation. (b) The maximum possible tube orientation is limited by the guide<br />
bars. The angle θ between the longitudinal axis of the tube and the ideal measuring distance<br />
parallel to the x-axis determines the maximum distance the measuring point can be displaced<br />
by rotation (dL = dR if the tube is not bent). This distance is ≤ dspace/2 and can be used to<br />
estimate the error due to rotation between the ideal tube length l and the rotated distance l′.
4.2.7. Background Pattern<br />
As introduced in Section 3.3, the measuring area is illuminated by a back light setup below<br />
the conveyor belt. This setup emphasizes the structure of the belt which can be seen as a<br />
characteristic pattern in the image. This pattern may differ between different belt types.<br />
Depending on the light intensity it is possible to eliminate the background completely. If<br />
the light source is bright enough, the background appears uniform white even with a short<br />
shutter. For black tubes such an overexposed image would lead to an almost binary image.<br />
Transparent tubes, however, also disappear under overly bright illumination. Hence, in practice<br />
there will always be a certain amount of background structure visible in the image.<br />
The strength of the background pattern increases with lower light intensity.<br />
In the following, it is generally assumed that the illumination is adjusted to allow for<br />
distinguishing between a tube edge and edges in the background. Larger amounts of dirt<br />
or objects other than heat shrink tubes on the conveyor belt must be prevented.<br />
4.3. Camera Calibration<br />
In the previous section several assumptions regarding the camera position and the image<br />
content have been presented. With respect to accurate measurements it is important that<br />
an object is imaged as reliably as possible; this means that straight lines should appear straight<br />
and not curved in the image, parallelism should be preserved, and objects of the same size<br />
should be mapped to the same size in the image. Unfortunately, the latter properties do<br />
not hold in the perspective camera model as introduced before. However, under certain<br />
constraints it is possible to minimize the perspective effects.<br />
If the internal camera parameters are known including the radial and tangential distortion<br />
coefficients, it is possible to compute an undistorted version of an image. After<br />
undistorting, straight lines in the world will appear as straight lines in the image. Furthermore,<br />
if one can arrange the camera in a way that objects of equal size are projected<br />
onto the same size in the image within the camera’s field of view at a constant depth, one<br />
can assume that the image plane is approximately parallel to the conveyor.<br />
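As a sketch of the underlying idea (not the calibration toolbox's actual implementation), compensating a radial distortion model of the form xd = xu(1 + k1·r² + k2·r⁴) for a single normalized point can be done with a simple fixed-point iteration; the coefficient values below are made up for illustration:

```python
def distort(xn, yn, k1, k2):
    """Apply the radial distortion model to normalized coordinates."""
    r2 = xn * xn + yn * yn
    f = 1.0 + k1 * r2 + k2 * r2 * r2
    return xn * f, yn * f

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the distortion by fixed-point iteration: start at the
    distorted point and repeatedly divide out the distortion factor
    evaluated at the current estimate."""
    xn, yn = xd, yd
    for _ in range(iterations):
        r2 = xn * xn + yn * yn
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        xn, yn = xd / f, yd / f
    return xn, yn

# Illustrative coefficients: distorting and then undistorting a point
# should recover the original coordinates.
xd, yd = distort(0.3, -0.2, -0.25, 0.05)
xu, yu = undistort(xd, yd, -0.25, 0.05)
```

For mild distortion the iteration converges quickly; real toolboxes additionally handle tangential distortion and work on whole images rather than single points.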
In the following, the calibration method used to obtain the intrinsic camera parameters<br />
is presented, as well as a method to arrange the camera in a way that perspective effects<br />
are minimized.<br />
4.3.1. Compensating Radial Distortion<br />
To compensate for the radial distortion of an optical system, one needs to compute the<br />
intrinsic camera parameters. Since the intrinsic parameters can be assumed to be constant<br />
if the focal length is not changed, the calibration procedure does not have to be repeated<br />
every time the system is started and therefore can be precomputed offline.<br />
The common Camera Calibration Toolbox for Matlab of Jean-Yves Bouguet [9] is used<br />
for this purpose. It is closely related to the calibration method proposed in [74] and [31].<br />
The calibration pattern required in this method is a planar chessboard of known grid size.<br />
The calibration procedure has to be performed for each lens separately. The camera is<br />
placed at a working distance of approximately 250mm over the measuring area with a<br />
16mm fixed-focal lens. It is adjusted to bring tubes with a diameter of 8mm at this distance<br />
into focus (in the measuring plane ΠM).
Figure 4.7: 16 sample images used for calibrating the intrinsic camera parameters.<br />
16 images of a 21 × 10 chessboard of 2.5mm grid size at different spatial orientations<br />
around the measuring plane ΠM have been acquired. A selection of these images can be<br />
found in Figure 4.7.<br />
In each image the outer grid corners have to be selected by hand. The remaining corners<br />
are then extracted automatically at subpixel accuracy as can be seen in Figure 4.8. The<br />
coordinate axis of the world reference frame are also visualized. The Z axis is perpendicular<br />
to the chessboard plane in direction to the camera.<br />
The results of this calibration procedure are the intrinsic camera parameters including<br />
the radial distortion coefficients. The Camera Calibration Toolbox for Matlab also allows<br />
for visualization of the extrinsic location of each of the 16 calibration patterns with respect<br />
to the camera, as shown in Figure 4.9. The actual working distance of approximately<br />
250mm is reconstructed very well. The resulting radial distortion model can be found in<br />
Figure 4.10. In Section 3.2 the area of interest function of the camera has been introduced<br />
since the whole image height is not needed. Obviously, the goal is to select the location of<br />
this area such that distortions are minimal. The position of the AOI within a full size<br />
image is visualized by the red lines, i.e. only pixels between these lines are considered.<br />
4.3.2. Fronto-Orthogonal View Generation<br />
Once distortion effects have been compensated, the goal is to yield a view of the measuring<br />
area in which the world plane, e.g. the conveyor belt, is parallel to the image plane. There<br />
are two main strategies that can be applied.<br />
In the first strategy the camera is positioned only roughly. Afterward the perspective<br />
image is warped to yield an optimal synthetic fronto-orthogonal view of the scene. In the<br />
second strategy the camera is adjusted as precisely as possible so that the resulting image<br />
is approximately fronto-orthogonal and does not need any correction.
Figure 4.8: Extracted grid corners at subpixel accuracy. The upper right corner is defined as<br />
origin O of the world reference frame. The directions of the X and Y axes are also visualized<br />
while the Z axis is perpendicular to the chessboard plane in direction to the camera.<br />
Figure 4.9: Reconstructed extrinsic location of each calibration pattern relative to the camera.<br />
The working distance of approximately 250mm is detected very well.<br />
Figure 4.10: Visualization of the resulting radial distortion model. The computed center of<br />
distortion indicated by the ‘◦’ is slightly displaced from the optical center (‘×’). The image<br />
area of interest considered in this application lies in between the red lines.<br />
Perspective Warping One possibility to compute a synthetic fronto-orthogonal view of<br />
an image is based on the extrinsic relationship of the camera plane and a particular<br />
world plane (e.g. conveyor plane) that can be extracted in a calibration step. With the<br />
extrinsic parameters it is possible to describe the position and orientation of the world<br />
plane in the camera reference frame. Finally, one can compute a transformation that<br />
maps the world plane into a plane parallel to the image plane or vice versa, and warp the<br />
image to a synthetic fronto-orthogonal view. This approach has significant drawbacks.<br />
First of all, the accuracy of the results is closely related to the calibration accuracy.<br />
Furthermore, in contrast to the intrinsic parameters, which can be assumed constant as<br />
long as the focus is not changed, the extrinsic parameters of a camera change if the camera<br />
is moved even slightly. Thus, one has to recalibrate the extrinsic parameters as well as<br />
the transformation parameters every time the camera is moved, which is not<br />
practicable in this particular application.<br />
There are other methods that can be used to compute a fronto-orthogonal view of a<br />
perspective image, which are based on characteristic image features such as parallel or<br />
orthogonal lines, angles, or point correspondences and do not need any knowledge on the<br />
interior or exterior camera parameters [30]. One common approach is based on point<br />
correspondences of at least 4 points xi and x ′ i with x′ i = Hxi (1 ≤ i ≤ 4) and<br />
H = ⎡ h1 h2 h3 ⎤<br />
    ⎢ h4 h5 h6 ⎥<br />
    ⎣ h7 h8 h9 ⎦ (4.2)<br />
the projective transformation matrix representing the 2D homography.<br />
The unknown parameters of H can be computed in terms of the vector cross product<br />
x ′ i × Hxi = 0 using a Direct Linear Transformation (DLT) [30]. To correct the perspective<br />
of an image one has to find four points in the image that lie on the corners of a rectangle<br />
in the real world, but are perspectively distorted in the image. These points xi have to be<br />
mapped to points x ′ i that represent the corners of a rectangle in the image. Then, after H
is computed, each point in the image is transformed by H. Obviously, this is an expensive<br />
operation for larger images. Furthermore, in practice the question is where to place the<br />
calibration points. One possibility is to place them on top of the guide bars. The system<br />
could automatically detect the calibration points and check whether these points lie on a<br />
rectangle in the affine image space. This requires a very accurate positioning of the guide<br />
bars, and all marker points should be coplanar, i.e. lie in one plane. Even assuming one can<br />
solve this mechanical problem, another problem remains, since - depending on how the<br />
destination rectangle is defined - the warped image may be scaled. In any case, warping<br />
discrete image points requires interpolation since transformed points may fall in between<br />
the discrete grid. Obviously, this can reduce the image quality.<br />
Online Grid Calibration Although the previously described approach does not require an<br />
accurate positioning of the camera, it has several drawbacks, especially with respect to<br />
performance and image reliability. If there is a way to adjust the camera perfectly one<br />
does not need warping and perspective correction. However, a human operator must be<br />
able to perform this positioning task in an appropriate time.<br />
Therefore, an interactive camera positioning method has been developed denoted as<br />
Online Grid Calibration.<br />
First, the distance of the parallel guide bars has to be adjusted to the current tube size.<br />
Then, a planar chessboard pattern of known size is placed between the guide bars on the<br />
conveyor within the visual field of the camera. The horizontal lines on the chessboard must<br />
be parallel to the guide bars (see Figure 4.11). To simplify the adjustments, a mechanical<br />
device may be developed that can be placed in between the guide bars, combining the<br />
function of a spacer that brings the guide bars to the right distance with that of a calibration<br />
grid that perfectly fits into the space between the guide bars at the designated orientation.<br />
The underlying idea is as follows: If the chessboard is imaged in a way that vertical lines<br />
in the world are vertical in the image and horizontal lines appear horizontal respectively,<br />
while each grid cell of the chessboard results in the same size in the image, the camera is<br />
adjusted accurately enough to yield a fronto-orthogonal view.<br />
The process of camera adjustment can be simplified if the operator gets real-time feedback<br />
on how close the current viewing position is to the optimal position. Therefore,<br />
the live images of the camera are overlaid with an optimal visual grid of squares. This grid<br />
can be parametrized by two points, i.e. the upper left corner and the lower right corner<br />
respectively as well as the vertical and horizontal size of each grid cell. The operator can<br />
move the grid in horizontal and vertical direction and adjust the size. This is useful<br />
for initializing the grid and for performing fine adjustments.<br />
For each image, the correspondence between the overlaid virtual grid and the underlying<br />
image data is computed. A two-step method has been developed. At first, the image<br />
gradient in both the vertical and horizontal direction is extracted using the SOBELX and<br />
SOBELY operators. This information can be used to approximate the gradient magnitude<br />
and orientation (see Equations 2.21 and 2.24). Since there is a strong contrast between<br />
the black and white chessboard cells, the gradient magnitude at the edges is strong as<br />
well. If the virtual grid matches the current image data, the gradient orientation φ(p)<br />
on horizontal grid lines must ideally be π/2 or 3π/2 respectively, depending on whether<br />
an edge is a black-white or white-black transition. Recall that the gradient direction is<br />
always perpendicular to the edge. Correspondingly, vertical grid lines have orientations of<br />
(a)<br />
(b)<br />
Figure 4.11: Online Grid Calibration using a 5 × 5mm chessboard pattern. (a) Calibration<br />
image distorted by perspective. The goal in this calibration step is to adjust the camera in a<br />
way that the chessboard pattern perfectly fits the overlaid grid as in (b).<br />
0 or π. In practice, the gradient orientation is allowed to be in a narrow range around the<br />
ideal orientation, since the computation of φ(p) is only an approximation that estimates<br />
the real orientation up to an epsilon (see Figure 4.12(b)). Thus, theoretically each position<br />
on the virtual grid must meet the orientation constraints. In addition, the gradient<br />
magnitude must exceed a certain threshold to prevent edges induced by noise from influencing<br />
the calibration procedure.<br />
To reduce the computational load only a selection of points on the grid denoted as<br />
control points is considered. The position of these points can be seen in Figure 4.12(a).<br />
The ratio of grid matches to the total number of control points can be seen as a score<br />
of correspondence. If the score reaches a threshold, e.g. more than 95% of all checked<br />
positions on the virtual grid match the real image data, the second step of the calibration<br />
is started.<br />
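The first step, checking gradient orientation and magnitude at the control points, could look roughly as follows; central differences are used here as a stand-in for the SOBELX/SOBELY operators, and the threshold values are illustrative assumptions:

```python
import numpy as np

def gradient(img):
    """Central-difference gradients as a stand-in for the Sobel operators."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    phi = np.arctan2(gy, gx)  # gradient direction, perpendicular to the edge
    return mag, phi

def grid_score(img, control_points, targets, tau_mag=10.0, tol=0.3):
    """Fraction of control points whose gradient orientation lies within
    +/- tol radians of the expected orientation and whose magnitude
    exceeds tau_mag (both thresholds are illustrative)."""
    mag, phi = gradient(img)
    hits = 0
    for (y, x), target in zip(control_points, targets):
        # Wrap the angular difference into (-pi, pi].
        diff = np.arctan2(np.sin(phi[y, x] - target), np.cos(phi[y, x] - target))
        if mag[y, x] > tau_mag and abs(diff) < tol:
            hits += 1
    return hits / len(control_points)

# A synthetic horizontal black-to-white edge: gradient points in +y (pi/2).
img = np.zeros((20, 20))
img[10:, :] = 255.0
score = grid_score(img, [(9, 5), (10, 5)], [np.pi / 2, np.pi / 2])
```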
The second step concentrates on the size of each grid cell. Assuming negligible perspective<br />
effects if the camera is perfectly positioned, all grid cells should have the same<br />
size in the image. To compute the size of each grid cell as accurately as possible, the real<br />
edge location of the grid is detected with subpixel precision within a local neighborhood<br />
of each control point on the virtual grid. Therefore, the gradient magnitude of a 7 × 1<br />
neighborhood perpendicular to the grid orientation at a given control point is interpolated<br />
using cubic splines. Then, the width and height of a grid cell can be determined from<br />
the affine distance between two opposing subpixel grid positions. Finally, the mean grid<br />
size and standard deviation can be computed both for width and height. The standard<br />
deviation is used as measure of how close the current camera viewing position equals a<br />
fronto-orthogonal view. Ideally, if all squares have equal size, the standard deviation is<br />
zero. In practice the standard deviation is always larger than zero for example due to<br />
noise, edge localization errors, or a remaining small error of perspective. Experiments
(a) (b)<br />
Figure 4.12: (a) Control points (marked as crosses) are used to adjust the virtual calibration<br />
grid of width w and height h to the underlying image data. (b) Gradient orientation φ at<br />
each control point. Since the computed values are only an approximation, a narrow range of<br />
orientations indicated by the gray cones around the ideal orientation is also seen as match.<br />
have shown that it is possible to adjust the camera within an acceptable time to yield a<br />
100% coverage in step one and a grid standard deviation of less than 0.3 pixels. In this<br />
case the camera is assumed to be adjusted well enough for accurate measurements.<br />
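The subpixel peak search of the second step can be approximated by a three-point parabolic fit through the discrete maximum of the 7-sample gradient-magnitude neighborhood; this is a simplified stand-in for the cubic-spline interpolation described above, not the thesis implementation:

```python
import numpy as np

def subpixel_peak(samples):
    """Locate the peak of a 1D gradient-magnitude neighborhood with
    subpixel precision: fit a parabola through the discrete maximum
    and its two neighbors and return the vertex position."""
    g = np.asarray(samples, dtype=float)
    i = int(np.argmax(g[1:-1])) + 1          # keep both neighbors in range
    denom = g[i - 1] - 2.0 * g[i] + g[i + 1]
    if denom == 0.0:
        return float(i)                      # flat neighborhood: no refinement
    return i + 0.5 * (g[i - 1] - g[i + 1]) / denom

# Gradient-magnitude samples of an edge whose true peak lies at x = 3.3.
samples = [np.exp(-((k - 3.3) ** 2) / 4.0) for k in range(7)]
x_sub = subpixel_peak(samples)
```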
4.4. Tube Localization<br />
Since the system is intended to work without any external trigger (e.g. a light barrier)<br />
that gives a signal whenever a tube is completely in the visual field of the camera, the first<br />
step before further processing of a frame is to check whether there is a tube in the image<br />
that can be measured or not. If there is no tube in the image, or only part of one, the image<br />
can be discarded. This decision has to be very fast and reliable.<br />
4.4.1. Gray Level Profile<br />
To classify an image into one of the states proposed in Section 4.2.2, an analysis of the intensity<br />
profile along the x-axis is performed. Strong changes in intensity indicate potential<br />
boundaries between tubes and background.<br />
In ideal images, as can be seen in Figure 4.2, the localization of object boundaries is almost<br />
trivial with standard edge detectors (see Section 2.3). In real image sequences, however,<br />
there are many changes in intensity of different origin that do not belong to the boundaries<br />
of a tube, e.g. caused by the background pattern (see Figure 3.8) or by dirt on the conveyor<br />
belt. Furthermore, the printing on transparent tubes, visible in the image using back light<br />
illumination, influences the intensity profile as will be seen later on.<br />
The intensity profile P̂y of an image row y can be formally defined as
(a) transparent, 50mm length, ∅8mm (b) black, 50mm length, ∅8mm<br />
(c) gray level profile of (a) (d) gray level profile of (b)<br />
Figure 4.13: Sample images with 11 equally distributed vertical scan lines used for profile<br />
analysis within a certain region of interest. (c) and (d) show the resulting profiles of image<br />
(a) and (b) respectively.<br />
P̂y(x) = I(x, y) (4.3)<br />
where I(x, y) indicates the gray level value of an image I at pixel position (x, y). Since<br />
a single scan line (e.g. P̂h/2 with h the image height) is very sensitive to noise and local<br />
intensity variations, the localization of the tube boundaries based on the profile of a single<br />
row can be error-prone. Hence, a set of n parallel scan lines is considered. The mean<br />
profile Pn of all n lines is calculated by averaging the intensity values at each position:<br />
Pn = (1/n) · Σi=1..n P̂yi (4.4)<br />
One property of the resulting profile Pn is the reduction of a two-dimensional to a one-dimensional<br />
problem, which can be solved much faster (processing speed is a very important<br />
criterion at this step of the computation). Since further processing steps with respect to Pn<br />
are independent of the number of scan lines n (n ≥ 1), Pn is denoted simply as P in the<br />
following. A more detailed view on the number of scan lines and the scan line distribution<br />
with respect to robustness and performance is given in Appendix A. In the following Nscan<br />
denotes the number of scan lines used.<br />
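Equations 4.3 and 4.4 amount to averaging the selected image rows position-wise; a minimal numpy sketch (the row indices are chosen by the caller, e.g. Nscan equally distributed rows within the region of interest):

```python
import numpy as np

def mean_profile(image, scan_rows):
    """Mean gray level profile P_n over the given scan line rows
    (Eqs. 4.3/4.4): each row is a profile P^_y, and the selected rows
    are averaged at each x position."""
    return image[np.asarray(scan_rows), :].astype(float).mean(axis=0)

# Tiny synthetic example: averaging rows 0 and 2 of a 3x4 image.
img = np.arange(12).reshape(3, 4)
p = mean_profile(img, [0, 2])
```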
4.4.2. Profile Analysis<br />
Step 1: The first step is smoothing the profile P by convolving with a large 1D mean<br />
filter kernel of dimension Ksmooth:
Psmooth = P ∗ [ 1/Ksmooth 1/Ksmooth ... 1/Ksmooth ] (Ksmooth times) (4.5)<br />
The idea of this low pass filtering operation is to reduce the high-frequency components<br />
in the profile, especially the structure of the background pattern.<br />
Obviously, this step also blurs the tube edges, and therefore reduces the detection precision<br />
significantly. However, the goal of the profile analysis is only to verify<br />
whether a measurement is possible in the current frame or not. In a next step, the proper<br />
measurements have to be performed on the original image data and not on the profile.<br />
Nevertheless, knowledge from this first step does not have to be discarded and can instead<br />
be used to optimize the following steps. In other words, if it is possible to predict a tube's boundaries<br />
reliably, but not precisely, this information is used to define a region of interest (ROI)<br />
as close as possible around the exact location.<br />
Step 2: The next step is to detect strong changes in the profile. Large peaks in the first<br />
derivative of the profile indicate such changes and can be considered as candidates for<br />
tube boundaries. Therefore, a convolution with a symmetric 1D kernel approximating the<br />
first derivative of a Gaussian is performed:<br />
Pdrv = Psmooth ∗ Dx (4.6)<br />
The odd symmetric 9 × 1 filter kernel Dx is given by the following filter taps as proposed<br />
in [25] for the design of steerable filters:<br />
tab 0 1 2 3 4<br />
value 0.0 0.5806 0.302 0.048 0.0028<br />
With this kernel a dark-bright edge results in a negative response while a bright-dark<br />
edge leads to a positive response. The intensity of the response is proportional to the<br />
contrast at the edge.<br />
Assuming the potential tube boundaries have a sufficient contrast, only the strongest<br />
peaks of Pdrv are of interest for later processing. To simplify the task of peak detection,<br />
only the absolute values of the differentiated profile are taken into account. This is denoted<br />
as P+drv, defined as follows:<br />
P+drv = |Pdrv| (4.7)<br />
Note that the sign of a peak in Pdrv is still useful for later classification<br />
and does not have to be discarded.<br />
Step 3: A thresholding is performed on P+drv to eliminate smaller peaks that correspond,<br />
for example, to changes in intensity due to the background pattern or dirt:<br />
Pthresh(x) = { P+drv(x) , if P+drv(x) > τpeak ; 0 , otherwise } (4.8)<br />
The threshold τpeak is calculated dynamically based on the mean of P+drv, denoted as P̄+drv, with<br />
τpeak = αpeak · P̄+drv (4.9)<br />
The factor αpeak indirectly relates to the number of peaks left to be further processed.<br />
τpeak is also denoted as profile peak threshold. The goal is to remove as many peaks as<br />
possible that do not belong to a tube's boundary without eliminating any relevant peak.<br />
If the images are almost uniform over larger regions as for black tubes, there are only<br />
a few strong changes in intensity. Thus, P̄+drv is expected to be quite low compared to<br />
max(P+drv), and the peaks belonging to the tube boundaries are preserved even for a larger<br />
αpeak. On the other hand, for transparent tubes the contrast between foreground and<br />
background is lower. Hence, the distance between intensity changes due to background<br />
clutter and those at the tube boundaries is much smaller. The choice of the right threshold<br />
is more critical in this situation and αpeak has to be selected carefully. If it is too low,<br />
too many peaks will survive the thresholding. If it is too large, important<br />
peaks will be eliminated as well. The profile peak threshold is closely related to the<br />
detection sensitivity of the system as will be discussed in more detail in later sections.<br />
More sophisticated calculations of τpeak considering the difference between maximum value<br />
and mean or the median did not perform better.<br />
Step 4: The x-coordinates of the remaining peaks, defined as local maxima in Pthresh,<br />
are stored in ascending order in a list denoted as candidate positions Ω. NΩ indicates the<br />
number of elements in Ω, i.e. the number of potential tube boundaries in an image.<br />
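Steps 1 to 4 can be summarized in a short sketch; the kernel size and αpeak below are illustrative, and a central-difference derivative stands in for the Gaussian-derivative kernel from [25]:

```python
import numpy as np

def candidate_positions(profile, k_smooth=9, alpha_peak=4.0):
    """Steps 1-4 of the profile analysis: smooth, differentiate,
    take absolute values, threshold dynamically, keep local maxima."""
    p = np.asarray(profile, dtype=float)
    # Step 1: mean filter of size k_smooth (Eq. 4.5); edge padding
    # avoids artificial edges at the array boundaries.
    pad = k_smooth // 2
    kernel = np.ones(k_smooth) / k_smooth
    p_smooth = np.convolve(np.pad(p, pad, mode="edge"), kernel, mode="valid")
    # Step 2: first derivative (central differences as a stand-in
    # for the Gaussian-derivative kernel D_x, Eq. 4.6).
    p_drv = np.gradient(p_smooth)
    # Step 3: absolute response and dynamic threshold (Eqs. 4.7-4.9).
    p_abs = np.abs(p_drv)
    tau_peak = alpha_peak * p_abs.mean()
    p_thresh = np.where(p_abs > tau_peak, p_abs, 0.0)
    # Step 4: local maxima of the thresholded response, ascending order.
    return [x for x in range(1, len(p) - 1)
            if p_thresh[x] > 0.0
            and p_thresh[x] >= p_thresh[x - 1]
            and p_thresh[x] > p_thresh[x + 1]]

# A bright background with one dark "tube" between x = 60 and x = 140.
profile = np.full(200, 200.0)
profile[60:140] = 50.0
omega = candidate_positions(profile)
```

The returned candidates cluster around the two intensity transitions; smoothing shifts them by a few pixels, which is acceptable here since they only seed a region of interest.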
4.4.3. Peak Evaluation<br />
The process described in the previous section results in a number of candidate positions<br />
that have to be evaluated since it is possible that there are more candidate positions<br />
than the number of tube boundaries. This is due to the fact that the thresholding is<br />
parametrized to avoid the elimination of relevant positions. The actual number of tube<br />
boundaries, indicating the current state as introduced in Section 4.2, is not yet known<br />
and has to be extracted by applying model knowledge to the candidate positions.<br />
Since only four of the nine possible states can be used for measuring, it is of interest to<br />
know whether the current image matches one of these four states. If this is the case, it is<br />
sufficient to localize the boundaries of the centered tube. Under the assumptions made in<br />
Section 4.2 only one tube can be in the visual field of the camera completely at one time.<br />
In the following, an approach reducing this problem to an iterative search for boundaries<br />
that belong to a single foreground object is presented.<br />
First, Ω is extended to Ω ′ by two more x-positions: x = 0 at the front and x = xmax<br />
at the back of the list, where xmax is the largest possible x-coordinate in the profile.<br />
Then, any segment s(i), defined as the region between two consecutive positions Ω(i) and<br />
Ω(i + 1) , can be assigned to one of two classes in {BG, TUBE} representing background<br />
and foreground respectively. In this way, the whole profile is partitioned into NΩ +1<br />
segments if there are NΩ peaks.
Global Threshold The classification into BG and TUBE is based on the general assumption<br />
that objects are on average darker than the background. In more detail,<br />
taking the mean value of the smoothed profile Psmooth as a global reference and calculating<br />
the local mean value for each segment s(i), the classification C1 can be expressed as:<br />
C1(s(i)) = { TUBE , if mean(s(i)) ≤ mean(Psmooth) ; BG , otherwise } (4.10)<br />
In image segmentation the mean value is widely used as an initial guess of a threshold<br />
separating two classes of data distinguishable via the gray level [48, 2]. There are many<br />
more sophisticated approaches for threshold selection including histogram shape analysis<br />
[57, 63, 26], entropy [54], fuzzy sets [20, 14] or cluster-based approaches [55, 46]. The different<br />
techniques are summarized and compared in several surveys [59, 47, 60]. However,<br />
in this application the threshold is used for classification and it is not intended for calculation<br />
of a binary image that segments the tubes from the background. Since processing<br />
time is strictly limited and critical in this application, it is essential to save computation<br />
time if possible. As introduced before, the actual segmentation is based on strong vertical<br />
edges in the profile, but does not include any semantic meaning of the segments. In the<br />
classification step, the mean turned out to be a reliable and fast choice to distinguish between<br />
foreground and background segments both for black and transparent tubes if there<br />
is a uniform and sufficient contrast between tubes and the background over the whole image.<br />
In this case there is no need for another threshold than the mean - saving additional<br />
operations.<br />
Instead of comparing the global mean with the local mean, using the local median was<br />
observed to result in a more distinct measure for discrimination:<br />
C2(s(i)) = { TUBE , if median(s(i)) ≤ mean(Psmooth) ; BG , otherwise } (4.11)<br />
The better performance of measure C2 originates in the characteristic of the median<br />
to be less sensitive to outliers compared to the mean [32]. This is important since the<br />
input data can be very unsteady due to the background texture or printing visible on<br />
transparent tubes (independent of the additional camera noise level). As mentioned before,<br />
the smoothing of the profile in the first step also blurs the tube edges, causing the segment<br />
boundaries not to be totally precise. In this case, the local mean tends to move closer<br />
to the global mean, which does not necessarily imply a misclassification. The median,<br />
however, turned out to be more distinct in most cases. Figure 4.14 shows the smoothed<br />
profile of (a) a transparent and (b) a black tube respectively. The examples represent the<br />
states entering + centered and entering + centered + leaving. The segment boundaries,<br />
which correspond to the locations of the strongest peaks in the first derivative of the<br />
profile, are visualized as well as the global mean and the local median. Segments that<br />
have a median above the global mean are classified as background.<br />
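The segment classification of Equations 4.10 and 4.11 can be sketched as follows; the segment handling is simplified for illustration:

```python
import numpy as np

def classify_segments(p_smooth, boundaries):
    """Classify each segment between consecutive boundary positions as
    TUBE or BG by comparing its local median against the global mean
    of the smoothed profile (Eq. 4.11)."""
    p = np.asarray(p_smooth, dtype=float)
    # Extended candidate list Omega': add x = 0 and x = x_max.
    positions = [0] + sorted(boundaries) + [len(p)]
    global_mean = p.mean()
    labels = []
    for a, b in zip(positions[:-1], positions[1:]):
        seg = p[a:b]
        labels.append("TUBE" if np.median(seg) <= global_mean else "BG")
    return labels

# Bright background segments around one dark tube segment.
p = np.concatenate([np.full(50, 200.0), np.full(60, 50.0), np.full(50, 200.0)])
labels = classify_segments(p, [50, 110])
```

Merging neighboring segments of the same class, as done in the figure above, would be a straightforward post-processing pass over the returned labels.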
Regional Threshold One drawback of the global threshold approach is that different<br />
background segments are assumed to be almost equal in image brightness, i.e. the tube-background<br />
contrast is approximately uniform within one image. This assumption, however,<br />
does not hold if there are larger variations in background brightness (for example<br />
due to material properties or dirt on the belt). Such variations can occur between images,
(a) Transparent / State: entering + centered<br />
(b) Black / State: entering + centered + leaving<br />
(Both plots show the smoothed profile, segment boundaries, predicted tube boundaries,<br />
local median, and global mean over x; segments are labeled Background and Tube 1, Tube 2, ...)<br />
Figure 4.14: Different steps and results of the profile analysis. After smoothing the profile,<br />
strong peaks in the first derivative indicate potential tube boundaries. The segments between<br />
the strongest peaks are classified into foreground and background based on the difference<br />
between the local median of each segment and the global mean. The background is assumed<br />
to be brighter on average. Neighboring segments of the same class are merged. The crosses<br />
mark the correctly predicted boundaries of the centered tube. Note the stronger contrast of<br />
black tubes.
but also over the whole image width or locally within a single image. The first case is<br />
uncritical as long as there is sufficient contrast between a tube and the background. The<br />
latter case, i.e. local variations in background brightness, can lead to failures of the global<br />
threshold. Figure 4.15(a) shows one characteristic situation which occurs quite often with<br />
transparent tubes. The background intensity on the left is much darker compared to the<br />
right. The global threshold fails, since the much brighter background regions on the right<br />
increase the global mean. Thus, the local median of the leftmost segment falls below<br />
the threshold, and the segment is wrongly classified as foreground. Due to this misclassification,<br />
no measurement will be performed on this frame, although it would be possible.<br />
A region based threshold can overcome this problem. The idea is to compute the<br />
classification threshold not globally, but based on regional image brightness. While the local<br />
median is still computed for each segment, a good classification threshold must consider at<br />
least one transition between background and foreground. Following the assumptions made<br />
in Section 4.2, two tubes can not both be completely in the image at one time. Furthermore,<br />
the number of connected background regions in the image can not exceed two. If there<br />
are two connected background regions, one has to lie in the left half of the image while<br />
the other falls in the right half. Thus, one can define two regions, left and right of the<br />
image center respectively, and compute the mean for each region analogously to the global<br />
mean. In the following, the means of the left and right side of the (smoothed) profile are<br />
denoted as Pleft and Pright respectively.<br />
If there is only one background region (states empty, entering, leaving, entering +<br />
leaving), splitting the image at the center has no negative effect. The left and right mean<br />
is computed either over a tube and background region, or over background only. In the<br />
very special case that the image width is exactly twice a tube’s length and the tube enters<br />
(or leaves) the scene with its right (or left) boundary exactly on the image center, the<br />
regional threshold is computed only over the tube and the classification may be either<br />
foreground or background. However, in both cases this situation can be detected as a<br />
state where a measurement is not possible, which is a sufficient solution.<br />
The region based classification of the segments can now be expressed as:<br />
C_3(s(i)) = \begin{cases} \text{TUBE} & , \; \mathrm{median}(s(i)) \leq \tau_{region} \\ \text{BG} & , \; \text{otherwise} \end{cases} \quad (4.12)<br />
where \tau_{region} is defined as follows:<br />
\tau_{region} = \begin{cases} P_{left} & , \; s(i) \text{ falls into the left region only} \\ P_{right} & , \; s(i) \text{ falls into the right region only} \\ \max(P_{left}, P_{right}) & , \; s(i) \text{ falls into both regions} \end{cases} \quad (4.13)<br />
In Figure 4.15(b) one can see the difference between the global and the regional classification<br />
threshold. The regional threshold of the left half is much lower than the global<br />
threshold. On the other hand, since the second segment belonging to the tube intersects<br />
the center, the maximum of both regional thresholds is taken into account, which lies<br />
significantly above the global threshold. Finally, all segments are classified correctly.<br />
With this threshold, the classification is less sensitive to darker background regions.<br />
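A minimal sketch of the regional rule of Equations 4.12 and 4.13 (function names and the toy profile are illustrative assumptions; the real system works on the smoothed profile and its detected segments):

```python
from statistics import mean, median

def regional_threshold(profile, segment):
    """Pick the classification threshold for one segment (Eq. 4.13):
    the mean of the left or right image half, or the maximum of both
    if the segment crosses the image center."""
    center = len(profile) // 2
    p_left = mean(profile[:center])
    p_right = mean(profile[center:])
    left, right = segment
    if right <= center:             # segment lies in the left region only
        return p_left
    if left >= center:              # segment lies in the right region only
        return p_right
    return max(p_left, p_right)     # segment spans both regions

def classify_c3(profile, segment):
    """Measure C3: tube if the local median falls below the regional threshold."""
    left, right = segment
    threshold = regional_threshold(profile, segment)
    return "TUBE" if median(profile[left:right]) <= threshold else "BG"

# Figure-4.15-like situation: darker background on the left
profile = [100] * 30 + [40] * 90 + [220] * 60
for seg in [(0, 30), (30, 120), (120, 180)]:
    print(seg, classify_c3(profile, seg))
# (0, 30) BG   -- the global mean (110) would misclassify this as TUBE
# (30, 120) TUBE
# (120, 180) BG
```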
The two methods have been compared in the following experiment:<br />
A sequence of transparent tubes (50mm length, 8mm diameter) has been captured,<br />
including 467 frames that have been manually classified as measurable, i.e. a tube is
[Figure 4.15: (a) camera image; (b) smoothed graylevel profile with regional mean, global mean, peak candidates, filtered peaks, and local median.]<br />
Figure 4.15: (a) The background intensity at the left is much darker than at the right.<br />
The global mean as threshold can not compensate for such local variations as can be seen<br />
in (b). In this case, the left background region is wrongly classified as foreground, since the<br />
global threshold is larger than the local median of the corresponding segment. A region based<br />
threshold that considers the left and right image side independently can overcome this problem<br />
(see text).
Global Regional<br />
Total number: 467 467<br />
Measurable: 353 414<br />
Average PTM: 5.98 7.01<br />
Table 4.1: Comparison of the global and regional threshold used for classification in the<br />
profile analysis. The table shows the number of images that have been correctly detected<br />
as measurable compared to the total number, as well as the average number of per-tube<br />
measurements (PTM). Using the regional threshold increases the number of measurements<br />
significantly.<br />
Figure 4.16: Ghost effect: If the parameters of the profile analysis are too sensitive, darker<br />
parts on the conveyor (e.g. due to dirt or background structure) can be wrongly classified as<br />
a tube.<br />
completely in the image. The sequence has been analyzed once with the global threshold<br />
and once with the regional threshold. All other parameters have been kept constant. The<br />
results can be found in Table 4.1. In this context it is important to understand that the<br />
term measurable relates to a single image. If the system fails to detect a tube in one image,<br />
this does not mean that the tube can pass the visual field of the camera undetected. This<br />
happens only if the tube is not measured in any of the images that include it, which is very unlikely.<br />
The experiment shows that the average number of measurements per tube can be increased<br />
by approximately one when using the regional instead of the global mean as threshold for the<br />
tube classification. Particularly situations as in Figure 4.15(a) can be prevented.<br />
The reason why neither of the two methods detected all measurable frames lies in<br />
other parameters, for example a too low contrast between a tube and the background.<br />
The dynamic threshold τpeak as introduced before defines the strongest peaks of the profile<br />
derivative. If it is too large, low contrast tube edges may not be detected. On the other<br />
hand, if it is too low, darker regions in the background may be wrongly classified as<br />
foreground. This leads to ghost effects, i.e. the system detects a tube where actually<br />
no tube is, as can be seen in Figure 4.16. Therefore, the weighting factor αpeak of τpeak<br />
(see Equation 4.9) must be adjusted in the teach-in step to the smallest value that does<br />
not produce ghost effects when inspecting an empty, moving conveyor belt. Obviously,<br />
the necessary compromise grows with an increasing amount of dirt.<br />
Merging Segments In Figure 4.14(a), one can find two more segments than needed to<br />
represent the actual state entering + centered. Two strong peaks on the right, caused by<br />
a dark dirt spot on the conveyor belt, have not been eliminated by the thresholding.<br />
However, the corresponding segments are correctly classified as background, leading to<br />
three consecutive background segments which can be merged into one large segment.<br />
In general, once all segments s(i) are classified, the goal is to iteratively merge neighboring<br />
segments of the same class and to eliminate foreground segments that do not qualify
Listing 4.1:<br />
Input:  coordinate list Ω′ with N = |Ω′|<br />
        global median of profile<br />
        minimum tube segment size MIN_SIZE<br />
Step 1:<br />
    define segments: S = { s[i] = [Ω′[i], Ω′[i+1]] }<br />
    classify each segment based on Eq. 4.12:<br />
        s[i].label = C3(s[i])<br />
    if s[i].label == TUBE for all i: return ERROR<br />
Step 2:<br />
    /* remove foreground segments at the borders */<br />
    let i1 be the index of the first<br />
    and i2 the index of the last BG segment<br />
    set s[j].label = BG for all j, 0 ≤ j &lt; i1 and i2 &lt; j ≤ N
for measuring. An overview of the algorithm is shown in Listing 4.1. A size filter operation,<br />
which can be parametrized with respect to the given target length, is used to remove<br />
foreground segments that are too small (e.g. caused by dirt on the conveyor belt).<br />
The output of the algorithm is either one large background segment (i.e. all foreground<br />
segments, if any existed, have been removed since they did not fulfill the criteria) or three<br />
segments in the form BG-TUBE-BG. In the latter case, the peaks belonging to the left<br />
and right boundary of the remaining foreground segment are finally verified with respect<br />
to the sign of the derivative. With the derivative operator used, the position of the left<br />
boundary must yield a negative first-order derivative value (bright-dark edge) and the<br />
right boundary a positive value (dark-bright edge). If the predicted tube boundaries<br />
are consistent with this last criterion, they are used to define two local ROIs of width<br />
WROI as starting point for a more precise detection of the measuring points. The local<br />
ROI height is defined over the distance between the two guide bars.<br />
The merging of the segments is a linear operation with complexity O(NΩ). Since this<br />
procedure only allows reclassifying a former foreground segment into background and<br />
never vice versa, Step 2 of the algorithm is repeated at most once. Hence, the<br />
algorithm is guaranteed to terminate.<br />
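The merging and size filtering described here and in Listing 4.1 can be sketched as a single linear pass; the triple-based segment representation and the function name are illustrative assumptions:

```python
def merge_segments(segments, min_size):
    """Merge neighboring segments of the same class and reclassify
    foreground segments that are too small to be a tube (single pass,
    linear in the number of segments). A segment is a
    (start, end, label) triple with label 'TUBE' or 'BG'."""
    # size filter: implausibly small tube segments become background
    filtered = [(s, e, "BG" if lbl == "TUBE" and e - s < min_size else lbl)
                for s, e, lbl in segments]
    merged = []
    for s, e, lbl in filtered:
        if merged and merged[-1][2] == lbl:
            # same class as the previous segment -> extend it
            merged[-1] = (merged[-1][0], e, lbl)
        else:
            merged.append((s, e, lbl))
    return merged

# a 5-pixel 'tube' caused by dirt is absorbed into the background
segs = [(0, 20, "BG"), (20, 25, "TUBE"), (25, 60, "BG"),
        (60, 140, "TUBE"), (140, 180, "BG")]
print(merge_segments(segs, min_size=30))
# -> [(0, 60, 'BG'), (60, 140, 'TUBE'), (140, 180, 'BG')]
```

The result is the desired BG-TUBE-BG form (or one background segment if no tube survives the filter), matching the output described above.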
If all segments are classified as TUBE in the first step, an error is returned. This error<br />
indicates the presence of the state full (see Figure 4.2(i)). The reason can be a too<br />
small field of view of the camera or a missing spacing between consecutive tubes. In<br />
either case it is not possible to perform a measurement. Since this state is critical compared to<br />
other states that can not be used for measuring, it is important to detect this situation.<br />
In practice, an alert must be raised if this situation occurs.<br />
4.5. Measuring Point Detection<br />
The previous sections described a fast method to decide whether a frame is useful or<br />
not. If a measurement is possible, the output of this first step consists of two regions<br />
around the potential left and right boundary of the tube to be measured. In the following, the exact tube<br />
boundaries have to be detected with subpixel accuracy.<br />
4.5.1. Edge Enhancement<br />
As introduced in Section 2.3, there is a large number of approaches for edge detection. Four<br />
common methods, namely the Sobel operator, the Laplace operator, the Canny edge detector [13],<br />
and a steerable filter edge detector based on the derivative of a parametrized Gaussian,<br />
have been applied to test images. The results can be found in Table 4.2. It includes<br />
experiments with two transparent tubes (left boundary) of the same sequence and one<br />
black tube boundary. All tubes have an inner diameter of 8mm. The difference in size<br />
between the transparent and black tubes is due to a different camera-object distance.<br />
As can be seen, the edge of the transparent tubes can differ in brightness, contrast and<br />
background pattern between frames.<br />
The goal was to find an edge detection operation that adequately extracts the tube<br />
boundaries under the presence of background structure and noise, and which is<br />
computationally inexpensive in addition.
[Table 4.2 result images: input images; SOBELX; SOBELY; Laplace; Canny (50/100), (90/230), (185/210); first-order Gaussian 5x5, 7x7, 11x11, each in variants (a) and (v).]<br />
Table 4.2: Comparison of different edge detectors. The parameters of the Canny edge<br />
detector indicate the lower and upper threshold. The Gaussian derivative based edge detection<br />
results are all of first order. An (a) indicates that edges of all orientations (in discrete steps of 5°)<br />
are enhanced with a steerable filter approach, while (v) represents only vertical edges.
The results of the transparent tubes are crucial for the selection of an appropriate<br />
edge detection approach for this application, since, due to the strong contrast, the<br />
detection of the black tube boundaries is uncritical with all tested methods. For both<br />
tube types the edge detection results differ in detected orientation, edge elongation (i.e.<br />
how precisely an edge can be localized), or edge representation (signed/unsigned values,<br />
floating point/binary, etc.).<br />
Canny Edge Detector The Canny edge detector yields a skeletonized, one pixel wide<br />
response that precisely describes edges of arbitrary orientation. In this application the<br />
main drawback of Canny’s approach is the importance of the threshold choice. As can<br />
be seen in Table 4.2, different parameter sets yield very different results. If the upper<br />
hysteresis threshold used as starting point for edge linking is low (e.g. 100) and combined with<br />
a lower second threshold (e.g. 50), too many background edges are detected as well. A<br />
larger upper threshold (e.g. > 200) reduces the number of detected edge pixels, but also<br />
eliminates parts of the tube edge, which may break up into several parts. If the distance<br />
between upper and lower threshold is large, it is likely that background and tube edges<br />
are merged. In any case, a threshold set working fine with one image can lead to very<br />
poor results in another. The result of the Canny edge detector is a binary image where<br />
non-edge pixels have a value of zero and edge pixels a value of one (or 255 in 8 bit gray level<br />
images). Binary contour algorithms can be applied to analyze chains of connected edge<br />
pixels. As can be seen in the test images, depending on how many edge pixels survived<br />
the thresholding, such an analysis can be very complex and time-consuming. Gaps within<br />
edges belonging to the tube boundary make this search even more complicated.<br />
Sobel The Sobel operator approximates a Gaussian smoothing combined with differentiation.<br />
It can be applied with respect to the x- and y-direction. According to the filter<br />
direction, vertical or horizontal edges are enhanced. Since the tube boundaries have a vertical<br />
orientation, the SOBELX operator is an adequate choice in this application. Edges<br />
are located at local extrema, i.e. local minima at bright-dark edges and local maxima<br />
at dark-bright edges with respect to the gradient direction. A drawback is that the<br />
background pattern is also dominantly vertically oriented, thus background edges are<br />
detected as well. The intensity of an edge is related to the image contrast. Assuming a certain<br />
contrast between tubes and background, a large amount of background clutter could be<br />
removed by thresholding, leaving only tube edges and edges due to high-contrast dirt particles.<br />
However, this would lead to an approach similar to the Canny edge detector, with<br />
the drawbacks stated before.<br />
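For illustration, a minimal pure-Python version of the horizontal Sobel operator (a real system would use an optimized library routine; the toy image is an assumption):

```python
def sobel_x(img):
    """Apply the 3x3 horizontal Sobel kernel to a 2D image given as a
    list of rows. Vertical edges are enhanced; border pixels are left
    at zero for brevity."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    # kernel [[-1,0,1],[-2,0,2],[-1,0,1]]: smoothing in y, derivative in x
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (
                -img[y - 1][x - 1] + img[y - 1][x + 1]
                - 2 * img[y][x - 1] + 2 * img[y][x + 1]
                - img[y + 1][x - 1] + img[y + 1][x + 1]
            )
    return out

# a bright-dark step edge yields a strongly negative response at the edge,
# matching the local-minimum criterion described above
img = [[200, 200, 50, 50]] * 3
print(sobel_x(img)[1])  # -> [0, -600, -600, 0]
```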
Laplace The implementation used to test the Laplacian calculates the second-order derivative<br />
in x- and y-direction using the Sobel operator and sums the results. The output is<br />
an image of signed floating point values. Edges are located at the zero crossings between<br />
strong peaks. The Laplacian is an isotropic operator, thus edges of all orientations are<br />
detected equally. One drawback of this method is its sensitivity to noise. In the resulting<br />
response there are many zero crossings. Compared to first-order derivatives, the edge<br />
criterion is more complex: a pixel is an edge pixel if the closest neighbor in the direction of<br />
the gradient is a local maximum while the opposite neighbor is a local minimum, and both
neighbors must meet a certain threshold. However, the zero crossing can be computed<br />
with subpixel accuracy.<br />
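The subpixel zero-crossing localization mentioned here amounts to a linear interpolation between the two neighboring response values; the following one-dimensional sketch (function name and sample values are illustrative) shows the idea:

```python
def subpixel_zero_crossing(response, x):
    """Estimate the subpixel position of the zero crossing between
    pixels x and x+1 of a 1D second-derivative response by linear
    interpolation between the two sample values."""
    a, b = response[x], response[x + 1]
    if a == b or a * b > 0:
        raise ValueError("no zero crossing between x and x+1")
    # fraction of the way from x at which the line from a to b hits zero
    return x + a / (a - b)

# crossing between the positive peak at x=1 and the negative value at x=2
print(subpixel_zero_crossing([0.0, 30.0, -10.0, 0.0], 1))  # -> 1.75
```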
Steerable Filters The idea behind filters that are steerable, for example in scale and orientation,<br />
is to design a filter that performs best for a particular edge detection task (see<br />
Section 2.3.3). In this application the goal is to find a filter that extracts the tube edges<br />
with maximum precision, while background edges and dirt are suppressed. The steerable<br />
filter approach allows for testing a large range of different edge detection kernels.<br />
Experiments with systematically varied parameter sets of first-derivative Gaussian filters<br />
following the approach of Freeman and Adelson [25] have been applied to the test images. Some<br />
of the results are visualized in Table 4.2.<br />
As can be seen, the background clutter can not be eliminated even with larger kernel<br />
sizes, while the tube edges get blurred. No parameter setting for a Gaussian derivative<br />
kernel has been found that performs significantly better as a tube edge detector than the<br />
computationally less expensive Sobel operator.<br />
All tested methods besides the Canny edge detector can be seen more as edge enhancers<br />
than as real edge detectors. This means the results do not fulfill the second and third<br />
criterion for good edge detection (see Section 2.3.1). Further processing of the edge<br />
responses, such as nonmaximum suppression, is necessary. An alternative is a template<br />
based edge localization step, which is introduced in the next section.<br />
4.5.2. Template Based Edge Localization<br />
It is important to state that even precisely detected edges (including those of Canny’s approach)<br />
still have no semantic meaning. In all tested methods there have been false positives,<br />
i.e. edges belonging to the background, dirt, or noise. Hence, model knowledge has to be<br />
applied to the detected edges to verify whether an edge really corresponds to a tube’s<br />
boundary or not.<br />
In this application, the highly constrained conditions reduce the number of expected<br />
situations to a small, well defined minimum. The edges belonging to the tube boundaries<br />
of interest are always approximately vertical. Due to perspective, the tube boundary<br />
appears straight or slightly curved in a convex fashion under back light, depending on the<br />
position of the tube with respect to the optical ray of the camera. The further the tube<br />
boundary is displaced from the camera center, the larger is the curvature.<br />
At this stage it is of interest to locate a tube’s boundaries within the two local ROIs<br />
(left and right respectively). Strong changes in image intensity in x-direction (vertical<br />
edges) have been enhanced using the SOBELX operator. The goal is not only to find<br />
the strongest peaks in the edge image, but also the strongest connected ridge along such<br />
peaks that most likely corresponds to the tube boundary. This task can be performed by<br />
template matching (see Section 2.4).<br />
If the feature to be detected can be modeled by a template, the response of the<br />
cross-correlation with this template yields a match probability within a given search region.<br />
The idea is to design a template that models the response of the edge enhancer and<br />
to correlate this template with the local ROI. The position where the correlation has its<br />
maximum provides a close estimate of the tube boundary location. Therefore, it is
Figure 4.17: Edge detection results of the SOBELX operator applied to different tubes<br />
(right boundary). The tube boundary corresponds to the strongest ridge in vertical direction in<br />
each plot. It can be seen that the edge response differs in curvature, intensity and background<br />
clutter. (a) Almost straight edge (close to the optical center of the camera) of a transparent<br />
tube with a quite uniform region left of the ridge belonging to the tube, and a more varying<br />
area on the right due to the background structure. (b) The tube boundary looks convex if<br />
further away from the camera center due to perspective. The edge response is much stronger<br />
at the ends of the ridge than at the center. This is due to the amount of light which is<br />
transmitted by the tube (see text). (c) Edge of a transparent tube with a printing close to<br />
the boundary visible as smaller ancillary ridge on the left. (d) Boundary of a black tube. The<br />
edge response is about three times stronger compared to transparent tubes due to the strong<br />
image contrast.<br />
important to have a closer look at the response of the edge detection results with respect<br />
to the input data. Consistent characteristics can be used for the design of the right template.<br />
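A minimal sketch of this matching step, assuming a pure-Python normalized cross-correlation over flattened patches (function names and the toy ridge image are illustrative, not the thesis implementation):

```python
from math import sqrt

def ncc(patch, template):
    """Normalized cross-correlation of two equally sized 2D patches,
    flattened to 1D for brevity (cf. Equation 2.31)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0  # constant patch: no correlation

def best_match_x(edge_img, template):
    """Slide the template horizontally over the edge response and return
    the column offset with the maximum correlation score."""
    h, w = len(template), len(template[0])
    scores = []
    for x in range(len(edge_img[0]) - w + 1):
        patch = [row[x:x + w] for row in edge_img[:h]]
        scores.append(ncc(patch, template))
    return max(range(len(scores)), key=scores.__getitem__)

# toy edge response: a vertical ridge at column 3, template ridge centered
template = [[0, 1, 0]] * 4
edge_img = [[0, 0, 0, 5, 0, 0]] * 4
print(best_match_x(edge_img, template))  # -> 2 (template center aligns with column 3)
```

Because the score is normalized, a template whose shape fits the ridge scores high even where the absolute edge response is weak, which is the property exploited for low-contrast transparent tubes.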
Figure 4.17 shows examples of the SOBELX operator applied to test images. In this<br />
case, the responses correspond to the right ROI of three transparent tubes (Figure 4.17(a)-<br />
(c)) and one black tube (Figure 4.17(d)) at different positions in the image with respect<br />
to the x-axis. The tube boundary can be detected intuitively by humans even under the<br />
presence of background clutter. However, one can see that the edge response differs<br />
between the different plots due to image contrast or perspective.<br />
Figure 4.17(a) shows an almost straight edge (close to the optical center of the camera)<br />
with a quite uniform region left of the ridge belonging to the tube, and a more varying area<br />
on the right due to the background structure. It can be observed that the edge response<br />
is stronger at the ends of the ridge than in the center, which is due to the transmittance<br />
characteristic of transparent tubes (see Section 4.2). More light is transmitted at the<br />
center, leading to brighter intensity values and a poorer contrast, while the corners (‘L’-corners<br />
between the horizontal and vertical boundary of a tube) are darker and yield a better<br />
contrast. This effect can also be seen very clearly in Figure 4.17(b). In addition, the tube<br />
boundary looks convex due to perspective, since it is further away from the camera center.<br />
Vertical edges of printings on a tube’s surface are also extracted by the edge detection step,<br />
as can be seen in Figure 4.17(c). In this case, the straight line of an upside-down capital<br />
‘D’ falls into the right local ROI, causing the smaller ancillary ridge on the left of the tube<br />
boundary. Figure 4.17(d) includes the boundary of a black tube. Due to the strong image<br />
contrast the edge response is about three times stronger compared to transparent tubes.<br />
The influence of the background clutter reduces to a minimum, and since printings are not<br />
visible on black tubes under back light, this problem vanishes completely. The edge response<br />
does not differ in intensity at the ends as with transparent tubes.<br />
4.5.3. Template Design<br />
The goal is to design a universal, minimal set of templates that covers all potential edge<br />
responses of both transparent and black tube boundaries. The templates must model<br />
different curvatures to be able to handle perspective effects. Assuming a constant horizontal<br />
orientation and a constant size, the curvature is the only varying parameter between<br />
templates. The following two-dimensional function has been developed that can be parametrized<br />
to approximate the expected edge responses:<br />
T_\psi(x, y) = a \exp\!\left( b \left( \frac{y}{H_T} \right)^{2} - \frac{(x - \psi y^2)^2}{2\sigma^2} \right) \quad (4.14)<br />
It is based on a Gaussian with standard deviation σ in x-direction, extended with respect<br />
to y. The curvature is denoted by ψ. A value of ψ = 0 represents no curvature, while<br />
the curvature increases with increasing values of ψ (ψ ≤ 1). The first summand in the<br />
exponent of the exponential function can be used to emphasize the ends of the template<br />
in y-direction, which is motivated by the characteristic response of transparent tubes: the<br />
edge detector yields higher values at the ends than at the center. b controls the amount<br />
of height displacement. If b = 0, the template is equally weighted. H_T corresponds to the<br />
template height. a determines the sign of the template values. For bright-dark edges like at<br />
the left boundary the edge response is negative, thus a &lt; 0; at the right
Figure 4.18: Different templates generated using Equation 4.14. (a) Straight edge: ψ = 0,<br />
b = 0. (b) Curved edge: ψ = 0.005, b = 0. (c) Curved edge: ψ = 0.02, b = 0. (d) Curved edge<br />
with emphasized ends: ψ = 0.002, b = 3. (σ = 0.8 and a = 1 have been used for all templates<br />
in this figure.) Note the differently scaled axes x and y.<br />
# templates ψmin ψmax χ a b<br />
Left: 30 0.0 0.02 0.00066 −1 3<br />
Right: 30 −0.02 0.0 0.00066 1 3<br />
Table 4.3: A set of 30 templates with curvatures equally distributed between ψmin and ψmax<br />
at a curvature resolution (step size) χ has been used to determine the occurrence of certain<br />
curvatures empirically.<br />
side a>0 is used to model the positive response of dark-bright edges. Figure 4.18 shows<br />
some examples that visualize Equation 4.14 and the effect of the different parameters.<br />
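Equation 4.14 can be sampled directly to generate such templates. In the sketch below the ridge is centered horizontally and y runs symmetrically around the template center, which the figure axes suggest but the equation does not fix; the function name and default values are assumptions:

```python
from math import exp

def make_template(width, height, psi, a=1.0, b=0.0, sigma=0.8):
    """Sample the template function of Equation 4.14 on a width x height
    grid. psi controls the curvature, b emphasizes the template ends in
    y-direction, a sets the sign (a < 0 for bright-dark, a > 0 for
    dark-bright edges)."""
    cx = width / 2.0
    template = []
    for y in range(height):
        # center y so the parabolic curvature bends symmetrically
        yc = y - height / 2.0
        row = [a * exp(b * (yc / height) ** 2
                       - (x - cx - psi * yc ** 2) ** 2 / (2 * sigma ** 2))
               for x in range(width)]
        template.append(row)
    return template
```

With psi = 0 and b = 0 every row is the same Gaussian ridge (Figure 4.18(a)); increasing psi bends the ridge parabolically, and b > 0 raises the values toward the template ends as in Figure 4.18(d).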
Template Dimension A constant template width of 11 pixels is used, which is large<br />
enough to represent both straight and maximally curved tube boundaries. The template<br />
height is defined over the global ROI height. Assuming the guide bars are always arranged<br />
so that the guide bar distance is only slightly larger than the tube’s perimeter, the global<br />
ROI height is a good reference for the tube size. A good estimate of the template height<br />
can be computed by the following equation:<br />
H_T = \gamma H_{ROI_G} \quad (4.15)<br />
where H_{ROI_G} is the global ROI height and γ a factor between 0 and 1.<br />
Curvature The question is what range of curvatures occurs in practice and how many<br />
templates are needed to cover that range. Therefore, several test sequences with both<br />
black and transparent tubes of different diameters have been captured, and 30 templates of<br />
different curvature have been generated for each tube side. The parameters can be found<br />
in Table 4.3.<br />
For each measurable frame, the curvature of the template that reaches the maximum<br />
correlation value is taken into account to build a histogram of curvature occurrence for both<br />
the left and right tube side. The normalized cross-correlation (see Equation 2.31) is<br />
used as a measure for evaluating the match quality of each template at a certain location. The<br />
results can be found in Figure 4.19.<br />
They show that the occurring curvatures are limited to a small range, denoted as Rψ,left and<br />
Rψ,right, with Rψ,left = [0, 0.005] for the left and Rψ,right = [−0.005, 0] for the right<br />
side respectively. In order to reduce the number of templates, all curvatures outside this<br />
range can be ignored.<br />
Another important criterion is the step size or curvature resolution χ, i.e. how many<br />
steps between ψmin and ψmax are taken into account. Theoretically, one could quantize<br />
the curvature ranges into very small steps. However, since correlation is an expensive<br />
operation, one has to make a compromise between accuracy and performance. It was<br />
observed that if more than 15 templates have to be tested at each tube side per frame, the<br />
system starts to drop frames, i.e. this is a quantitative indicator that the overall processing<br />
time exceeds the current frame rate. Therefore, the total number of templates is restricted<br />
to 10 in this application. The corresponding step size between two curvatures is 0.0005.
Figure 4.19: Histogram of template occurrence for (a) left and (b) right tube side. It can<br />
be seen that only a small range of curvatures can be observed. This reduces the number of<br />
templates that have to be tested each time.<br />
Figure 4.20: If the height weighting coefficient gets too large (here: b = 20), the center of the tube edge does not contribute to the matching score anymore.
Template Weighting The weighting coefficient b in Equation 4.14 is important for transparent tubes. Due to poor contrast, the overall edge response of a transparent tube might be low. Considering only the center region of an edge, the contrast might in the worst case be even lower than that of a background edge. The cross-correlation only computes the similarity of a template at a certain location in the image, and the maximum response is taken as the match, since it is assumed that there must be a tube edge in the search region. If a background edge yields the maximum score while the same or another template matches the real tube edge with a lower score, this leads to a wrong measurement.
With model knowledge about the tube characteristics, one can assume that the contrast at the edge ends is significantly stronger. If the template is weighted uniformly at the center and the ends, the correlation score depends on the whole edge equally. If, on the other hand, the ends of the template are weighted more strongly than the center, a template that perfectly fits the tube edge will yield a larger score, since background edges are usually uniform. Thus, the template is designed to prefer tube edges.
The weighting coefficient b has to be larger than one to yield the desired effect. On the other hand, b must not be too large either, since then the ends get too much influence. In the extreme case, the template equals two spots at a certain distance that do not represent a tube edge anymore (see Figure 4.20).
84 CHAPTER 4. LENGTH MEASUREMENT APPROACH<br />
Figure 4.21: Effect of the weighting coefficient b in Equation 4.14. (a) Tube edge detection results with a uniformly weighted template (b = 0). (b) Results of a template with enhanced ends (b = 3). (c) Corresponding cross-correlation results of (a), and (d) the cross-correlation results of (b), respectively. The maximum in (c) and (d) corresponds to the pixel position where the template matches best. In this example, the ridge closer to the observer is due to a background edge, while the ridge further away corresponds to the real tube edge.
Figure 4.21 shows an example of how the weighting of the template ends improves the tube edge detection. In this example, the right boundary contrast of a transparent heat shrink tube is quite low. Using a uniformly weighted template, i.e. b = 0, the maximum correlation score is reached at a background edge (see Figure 4.21(a) and (c)). In this case, the tube would be measured larger than it really is. With an enhancement of the template ends, on the other hand, the tube edge yields a larger score than the background edge, leading to a correct detection, as can be seen in Figure 4.21(b) and (d).
The enhancement of the template ends is motivated by the characteristics of transparent tubes. For black tubes, b = 0 describes the response of the SOBELX operator best. However, there is no disadvantage in using the same weighting coefficient as for transparent tubes. Due to the strong contrast of black tubes, the curvature and size of the template are the dominant factors influencing the matching results.
Template Rotation The templates generated by Equation 4.14 are symmetric along the y-axis with respect to the template center. Thus, the ends of the template always lie on one line perpendicular to the x-axis. In the ideal case, the edge response of a heat shrink tube has the same characteristic. In practice, however, a tube can be slightly angled within the guide bars, or the tube edge might be cut skew. In both cases, the strong edge responses at the ends do not have to lie on one line perpendicular to the x-axis as in the
Figure 4.22: (a) Edge response of an angularly oriented (transparent) tube edge. The characteristic peaks at the ends of transparent tube edges do not have to lie on one line perpendicular to the x-axis. The red line visualizes the slight angular orientation of the tube edge. (b) Example detection result with k = 1 orientations. (c) Corresponding result with k = 3 orientations.
template. Figure 4.22(a) visualizes the edge response of a slightly angled tube edge of a transparent tube (left side). In such a situation, no template will fit the edge perfectly. This can be critical if the edge contrast is poor. In this case, as mentioned before, the stronger weighting of the template ends helps to support a match at the real tube boundary instead of at a background edge. With an angled tube edge, a symmetric template cannot be shifted over the image in a way that it matches both edge ends. Thus, the cross-correlation score is significantly smaller, and the probability increases that a background edge yields a larger score.
A slight rotation of the template can overcome this problem. Therefore, the bank of templates is extended by k − 1 rotated versions of each template. It turned out to be sufficient to rotate each template by ±2 degrees to cover the range of expected deviations from the ideal symmetric model. Thus, k = 3 has been used throughout the experiments. It is assumed that larger angular deviations cannot occur due to the guide bars.
Model Knowledge Optimization The number of templates to be checked each time on<br />
the left and right side increases with the number of rotations. Instead of 2 × 10 templates
Figure 4.23: Curvature of the best matching template depending on the x-position of the match, for (a) the left and (b) the right tube side.
one has to consider 2 × 10 × 3 templates if k = 3. Since correlation is an expensive operation, the processing time increases significantly even if the local ROIs are relatively small. It turned out that no more than 15 templates can be checked at each side without skipping frames at a frame rate of 50 fps on an AMD Athlon 64 FX-55 processor with 2 GB RAM.
One conceivable optimization is to reduce the curvature resolution, i.e. to quantize the same range of curvatures into ≤ 5 templates at each side. Obviously, this reduces the accuracy of the edge localization and is not a satisfying solution in this application.
Instead one can apply model knowledge to exclude several curvatures depending on<br />
the horizontal image position. It can be assumed that the curvature is maximal at the<br />
image boundaries and decreases toward the image center. Real sequences support this<br />
assumption. Figure 4.23 shows the occurrence of different curvatures with respect to x.<br />
The data was acquired over several sequences including transparent and black tubes. It<br />
turns out that the curvature decreases linearly within a certain band. The upper and lower<br />
boundary of this band determine which curvatures can be excluded at a given position.<br />
The range distance of curvatures dψ at a position x is defined as:

dψ(x) = ψmax(x) − ψmin(x)   (4.16)

where ψmax(x) and ψmin(x) are the maximum and minimum curvature occurring at this position, and d̄ψ is the average range distance over all x. This range must be checked each time and is covered by n templates. In practice, n = 5 is used since, as mentioned before, the maximum number of templates that can be processed with the given hardware in real-time is 15 (in addition to all further processing that is needed), and 5 curvatures × 3 rotations = 15 templates have to be checked each frame at one tube side. To yield the desired resolution over the whole range of curvatures, the total number of curvatures Nψ,total is computed as follows:

Nψ,total = n(ψmax − ψmin) / d̄ψ   (4.17)
where ψmax and ψmin indicate the overall maximum and minimum curvature a template can have. Hence, one has to compute Nψ,total × k templates for each side. This can be done in a preprocessing step to reduce the computational load. During inspection, one has to determine which templates have to be checked at a given position, defined by the center of the local ROI around a predicted tube edge. For an efficient implementation, a look-up table (LUT) is used for this task.
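The position-dependent template selection can be sketched as follows. This is a hypothetical illustration only: the band model `left_band`, the bucket width `x_step`, and all numeric values are assumptions chosen to mimic the linear band of Figure 4.23, not taken from the thesis implementation.

```python
def build_template_lut(width, band, n=5, x_step=10):
    """Precompute, for each horizontal band of the image, the n curvatures
    that have to be tested there.  `band` maps an x-position to the
    (psi_min, psi_max) curvature band observed in training data."""
    lut = {}
    for x in range(0, width, x_step):
        lo, hi = band(x)
        step = (hi - lo) / (n - 1)
        lut[x // x_step] = [lo + i * step for i in range(n)]
    return lut

def templates_at(lut, x, x_step=10):
    """Look up the curvature set for the local ROI centered at x."""
    return lut[min(x // x_step, max(lut))]

# Illustrative band model for the left tube side: the curvature band
# decreases roughly linearly from the image border toward the center.
def left_band(x, width=400):
    t = max(0.0, 1.0 - x / width)   # 1 at the border, 0 at the center
    return (0.004 * t, 0.005 * t + 0.0005)

lut = build_template_lut(400, left_band)
```

Each of the resulting curvature sets would additionally be expanded by the k rotated template versions described above.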
4.5.4. Subpixel Accuracy<br />
The maximum accuracy of the template-based edge localization so far is limited by the discrete pixel grid. The templates are shifted pixelwise within the local ROIs to find the position that reaches the maximum correlation score. Following the assumptions on tubes under perspective (see Section 4.2.3), the measuring is performed between the outermost points of the convex tube edges.
By the way the templates are defined, the template center always corresponds to the outermost point of the generated ridge. This is consistent with template rotation, since the rotation is performed around the template center. In the special case that the template is not curved, the template center is still the valid measuring point. With the knowledge of this point within the template and the position where this template matches best in the underlying image, the position of the measuring point in the image can easily be computed.
However, pixel grid resolution is not accurate enough in this application. For example, one pixel represents about 0.12 mm in the measuring plane ΠM in a typical setup for 50 mm tubes. The allowed tolerance for 50 mm tubes is ±0.7 mm. As a rule of thumb for reliable results, the measuring system should be accurate to 1/10th of the tolerance, i.e. 0.07 mm in this example. To reach that accuracy, one has to apply subpixel techniques to overcome the pixel limits.
Figure 4.24(a) visualizes the results of the cross-correlation of an image ROI around the right boundary of a transparent tube with the template that yields the maximum score. The maximum is located at position Mmax = (19, 5). These coordinates refer directly to the edge position in the image, since the template function, and therefore the exact location of the template ridge, is known.
The real maximum that describes the tube edge location most accurately may lie in between two grid positions. With respect to the measuring task, the edge has to be detected as accurately as possible. Interpolation methods have been introduced in Section 2.3.4 to overcome the pixel grid limits in edge detection. The same can be applied at this stage to the template matching results.
Cubic spline interpolation is used to compute the subpixel maximum within a certain neighborhood around the discrete maximum. Cubic splines approximate a function based on a set of sample points using piecewise third-order polynomials. They have the advantage of being smooth in the first derivative and continuous in the second derivative, both within an interval and at its boundaries [53].
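The spline-based subpixel search can be sketched as follows: a minimal pure-Python natural cubic spline on a unit-spaced grid, evaluated at 1/10-pixel steps. Function names and sample values are illustrative; the thesis implementation is not shown here.

```python
def subpixel_maximum(samples, step=0.1):
    """Locate the maximum of the correlation samples (e.g. the five points
    around the discrete maximum) with a natural cubic spline."""
    y = list(samples)
    n = len(y)
    # Second derivatives M of the natural spline: M[0] = M[n-1] = 0 and a
    # tridiagonal system [1, 4, 1] for the interior points (spacing h = 1).
    rhs = [6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1]) for i in range(1, n - 1)]
    m = n - 2
    diag = [4.0] * m
    for i in range(1, m):                  # forward elimination (Thomas)
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    inner = [0.0] * m
    inner[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):         # back substitution
        inner[i] = (rhs[i] - inner[i + 1]) / diag[i]
    M = [0.0] + inner + [0.0]

    def spline(x):
        i = min(int(x), n - 2)
        a, b = x - i, (i + 1) - x
        return ((M[i] * b ** 3 + M[i + 1] * a ** 3) / 6.0
                + (y[i] - M[i] / 6.0) * b + (y[i + 1] - M[i + 1] / 6.0) * a)

    # Evaluate on the 1/10-pixel grid and keep the largest value.
    best_x, best_v = 0.0, spline(0.0)
    for j in range(1, int(round((n - 1) / step)) + 1):
        x = j * step
        v = spline(x)
        if v > best_v:
            best_x, best_v = x, v
    return best_x
```

For symmetric samples the returned position coincides with the discrete maximum; for asymmetric samples it shifts toward the larger neighbor, as discussed below for Figure 4.24(c).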
The interpolation is performed only with respect to the x direction, since this is the measuring direction. A subpixel location with respect to y has only a marginal effect on the measurements. Ideally, the measuring points on the left and right side have the same y value. Assuming the real maximum location is displaced by at most 0.5 pixels at each
Figure 4.24: (a) Cross-correlation results of an image patch around the right boundary of a transparent tube and the best scoring template. The maximum is located at position (19, 5). (b) Cubic spline interpolation in a local neighborhood around the maximum. In this case, the interpolated maximum is equal to the discrete position. (c) Matching results of a different image. Here, the interpolated subpixel maximum differs from the discrete maximum and can be found at x = 12.2.
side of the tube, the worst-case displacement is 0.5 at one side and −0.5 at the other side, leading to a total displacement of 1. A straight line connecting the two measuring points in a Euclidean plane is slightly longer than the distance in x. Following Pythagoras' theorem, the maximum expectable error due to a vertical inaccuracy is:

errory = √(l² + 1) − l   (4.18)

where l is the pixel length between the left and right measuring point. With respect to the definition of the camera's field of view and the image resolution, the length of a tube is about 415 pixels in an image. In this case, the worst-case error is about 0.0012 pixels. Assuming one pixel represents 0.12 mm (a typical value for 50 mm tubes), this corresponds to an acceptable error of 0.14 µm, which is far beyond the imaging capabilities of the camera used (each sensor element has a size of about 8.3 × 8.3 µm).
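A quick numeric check of Equation 4.18, using the tube length of about 415 pixels and the 0.12 mm/pixel scale quoted above:

```python
import math

def vertical_error_pixels(l):
    """Worst-case length error (in pixels) caused by a total vertical
    displacement of one pixel between the two measuring points
    (Equation 4.18): sqrt(l^2 + 1) - l."""
    return math.sqrt(l * l + 1.0) - l

# For a typical 50 mm tube spanning about 415 pixels:
err_px = vertical_error_pixels(415)   # about 0.0012 pixels
err_mm = err_px * 0.12                # about 0.00014 mm, i.e. 0.14 microns
```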
In contrast to the vertical direction, a subpixel shift of the best matching template position in the horizontal direction has a significant influence on the length measurement results. Again, assuming a maximum error of 0.5 pixels if discrete pixel grid resolution is used, the total error at both sides sums up to 1 in the worst case. If one pixel corresponds to 0.12 mm as in the example above, this means the measuring system has an inaccuracy of the same length purely depending on the edge localization. Obviously, this error depends on the resolution of the camera and can become even worse if one pixel represents a larger distance.
The interpolation considers five discrete points: the maximum matching position Mmax and the two nearest neighbors to the left and to the right of Mmax in x-direction, respectively. In Figure 4.24(b), the interpolation results of the local neighborhood around the discrete maximum of Figure 4.24(a) are drawn into the plot of the match profile at y = 5. It shows that the interpolated values describe the sampled values quite well. In this example, the interpolated subpixel maximum equals the discrete maximum. This does not always have to be the case, as can be seen in Figure 4.24(c). Here, the discrete maximum is located at x = 12, whereas the subpixel maximum lies at x = 12.2. In the first case, the neighbor pixels of the maximum yield almost equal results on both sides. In the second example, on the other hand, the right neighbor of the maximum is significantly larger than the left one. This explains the shift of the subpixel maximum toward the right. The precision of the subpixel match localization is 1/10 pixel. Mathematically, a much higher precision is possible, but the significance of such results is questionable with respect to the imaging system and noise, and it would increase the computational costs unnecessarily.
4.6. Measuring<br />
The result of the template matching is two subpixel positions indicating the left and right measuring point of a tube. This section introduces how a pixel distance is transformed into a real-world length and how the measurements of one tube are combined. For this, a tracking mechanism is required that assures the correct assignment of a measurement to a particular tube. This means one has to detect when a tube enters or leaves the visual field of the camera.
Figure 4.25: Perspective correction. (a) The measured length varies depending on the image position in terms of the left measuring point. Due to perspective, the length of one tube appears larger at the image center than at the image boundaries. The effect of perspective can be approximated by a 2nd-order polynomial. (b) The correction function computed from the polynomial coefficients. (c) The result of the perspective correction.
4.6.1. Distance Measure<br />
The distance between the two measuring points pL and pR (see Section 4.2) is computed as the Euclidean distance. Thus, the pixel length l of a tube is defined as follows:

l = √((pR − pL)²)   (4.19)

where l is expressed in terms of pixels. In the following, l(x) denotes the pixel length of a tube at position x, where x = xpL, i.e. the position of a measurement is defined by the x-coordinate of the left measuring point.
4.6.2. Perspective Correction<br />
Figure 4.25(a) shows the measured pixel length l(x) of a metal reference tube (gage) at different image positions. The sequence was acquired at the slowest conveyor velocity. In the ideal case, l should be constant, independent of the measuring position. However, the measured length is smaller at the boundaries and maximal at the image center due to perspective. This property is consistent between tubes. To approximate the ideal case, a perspective correction can be applied to the real measurements. Mathematically, this can be expressed as:
lcor(x) = l(x) + fcor(x)   (4.20)

where lcor is the perspective-corrected pixel length, and fcor a correction function. The perspective variation in the measurements can be approximated by a 2nd-order polynomial of the form:

f(x) = c1x² + c2x + c3   (4.21)

where the coefficients ci of the polynomial have to be determined in the teach-in step by fitting the function f(x) to measured length values l(x) in the least-squares sense. Then, the correction function fcor can be computed as:
fcor(x) = −(c1x² + c2x) + c1s² + c2s   (4.22)

where s is the x-coordinate of the peak of f(x), with s = −c2/(2c1), i.e. the point where the first derivative of f(x) is zero. Thus, fcor is the 180° rotated version of f(x), shifted so that fcor(s) = 0, as can be seen in Figure 4.25(b).
This function applied to the measurements has the effect that all values are adjusted to approximately one length l(s). The corrected length values lcor(x) are shown in Figure 4.25(c). As one can see, the mean value over all measurements describes the data much better after perspective correction.
To reduce the computational load, the correction function is computed only once for each position at discrete steps and stored in a look-up table for fast access.
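Equations 4.20-4.22 and the correction LUT can be sketched as follows. The coefficients below are hypothetical values chosen only to mimic the shape of Figure 4.25 (a peak near the image center at x = 200); they are not the values used in the thesis.

```python
def correction_lut(c1, c2, width):
    """Precompute fcor(x) (Equation 4.22) for every integer x.  The
    coefficients c1, c2 come from the teach-in polynomial fit; s is the
    peak of f(x), where the correction is zero."""
    s = -c2 / (2.0 * c1)                 # peak of f(x), f'(s) = 0
    offset = c1 * s * s + c2 * s
    return [-(c1 * x * x + c2 * x) + offset for x in range(width)]

def corrected_length(l, x, lut):
    """Equation 4.20: lcor(x) = l(x) + fcor(x)."""
    return l + lut[x]

# Illustrative coefficients (hypothetical): peak at s = 200, corrections
# of about 0.8 pixels at the image borders.
lut = correction_lut(-2.0e-5, 8.0e-3, 400)
```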
4.6.3. Tube Tracking<br />
Assuming a sufficient frame rate, one tube is measured several times at different positions while moving through the visual field of the camera. One constraint in Section 4.2.2 regarding the image content states that only one tube is allowed to be measurable at a time. The question is whether the current measurement belongs to an already inspected tube or whether there is a new tube in the visual field of the camera. Since there is no external trigger, this task has to be solved by the software.
Consecutive tubes appear quite similar in shape, size, and texture (especially black tubes). It is difficult up to impossible to find reliable features in the form of a unique fingerprint that can be used to distinguish between tubes. In addition, the extraction and comparison of such fingerprints would be computationally expensive. Standard tracking approaches such as Kalman filtering [24] or condensation [8] are also not suited to this particular application, since such approaches are quite complex and are worthwhile only if an object is expected to be in the scene over a certain time period. At faster velocities, however, a tube is in the image for only about 4-7 frames.
Since processing time is highly limited, it is a better choice to develop fast heuristics based on model knowledge that replace the problem of tube tracking by detecting when a tube has left the visual field. Therefore, the following very fast heuristics have been defined:
1. Backward motion
2. Timeout
Backward motion Since the conveyor always moves in one direction (e.g. from left to right in the image), it is impossible for a tube to move backward. Thus, if the horizontal image position of the tube at time t is smaller than at time t − 1 (i.e. the tube would have moved further to the left), this can be used as an indicator that the current measurement belongs to the next tube. The position of a tube can be defined as the x-coordinate of the left measuring point. Hence, with the image content assumption, the tube measured at time t − 1 has left the visual field if xpL(t) < xpL(t − 1).
Timeout The backward motion heuristic assumes a tube has passed the visual field of the camera when the next tube is measured for the first time. This requires a successor for each tube within a certain time period. With respect to the blow-out mechanism, it is important that the good/bad decision is made quickly, since the controller (see Section 3.4) must receive the result before the tube has passed the light barrier. Thus, a timeout mechanism is integrated. If no new tube arrives for more than ∆t frames, it is assumed that the previously measured tube has passed the measuring area, and the total length can be computed. In practice, ∆t should be chosen based on the average number of measurements per tube and the distance between the measuring area and the light barrier.
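Both heuristics can be sketched together as a small per-frame state machine. Function, state, and return-value names are illustrative assumptions, not taken from the thesis code.

```python
def assign_measurement(x_left, state, timeout=5):
    """Decide whether the current measurement belongs to a new tube.
    `x_left` is the x-coordinate of the left measuring point, or None if
    no tube is measurable in this frame; `state` holds the last measured
    position ('last_x') and the idle frame counter ('idle')."""
    if x_left is None:                    # no tube measurable in this frame
        state['idle'] += 1
        if state['idle'] > timeout and state['last_x'] is not None:
            state['last_x'] = None        # timeout: previous tube has passed
            return 'TIMEOUT'
        return 'IDLE'
    state['idle'] = 0
    # Backward motion: the conveyor only moves forward, so a smaller
    # x-position than in the previous frame indicates the next tube.
    is_new = state['last_x'] is None or x_left < state['last_x']
    state['last_x'] = x_left
    return 'NEW' if is_new else 'SAME'

state = {'last_x': None, 'idle': 0}
```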
4.6.4. Total Length Calculation<br />
Once there is evidence that a tube has passed the visual field of the camera, the single measurements have to be combined into a total length. Let mi denote the number of measurements assigned to tube i, and lj(i) the pixel length of the jth measurement (0 < j ≤ mi) of that tube. The mean length l̄(i) of tube i can be computed as:

l̄(i) = (1/mi) Σ_{j=1}^{mi} lj(i)   (4.23)

The mean has a significant drawback, since it is quite sensitive to outliers. For example, assume that in one of five measurements a background edge is wrongly classified as tube edge and the resulting length is therefore larger than the actual length. This outlier would also enlarge the resulting mean length, even if the remaining measurements have approximately the same (correct) value. To reduce the influence of outliers, the k strongest outliers are excluded from the averaging. Therefore, the measurements are sorted in ascending order based on the squared distance dj(i) to the mean l̄(i), with

dj(i) = (lj(i) − l̄(i))²   (4.24)

In the following, only the first mi − k measurements in the sorted list are averaged into the total length ltotal(i) of tube i as:

ltotal(i) = (1/(mi − k)) Σ_{j=1}^{mi−k} l′j(i)   (4.25)

where l′ indicates that the measurements are sorted based on Equation 4.24, i.e. dj(i) < dj+1(i) for 0 < j < mi − k.
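The outlier-filtered averaging can be sketched as follows; deriving k from the ratio αoutlier mentioned in Section 5.1 is an assumption made for this illustration.

```python
def total_length(measurements, alpha_outlier=0.25):
    """Combine the single pixel-length measurements of one tube
    (Equations 4.23-4.25): sort by squared distance to the mean, drop
    the k strongest outliers, and average the rest."""
    m = len(measurements)
    k = round(alpha_outlier * m)          # number of excluded outliers
    mean = sum(measurements) / m
    # Equation 4.24: sort by squared distance to the mean, keep m - k.
    kept = sorted(measurements, key=lambda l: (l - mean) ** 2)[:m - k]
    return sum(kept) / len(kept)
```

With five measurements of about 415 pixels and one outlier at 420 pixels, the outlier is dropped and the result stays near 415.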
4.7. TEACH-IN 93<br />
The calibration factor fpix2mm relates a pixel length in the image plane to the length in the measuring plane ΠM that is represented by one pixel. The total length in mm, Ltotal, of tube i can be computed as follows:

Ltotal(i) = ltotal(i) · fpix2mm   (4.27)

The length Ltotal is used for the good/bad classification, i.e. to decide whether a tube meets the allowed tolerances. This can be formalized as follows: tube i is classified GOOD if |Ltotal(i) − Ltarget| lies within the allowed tolerance, and BAD otherwise (4.28).
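A minimal sketch of the conversion and classification (Equations 4.27/4.28); the numeric values in the usage are illustrative (a 50 mm tube, the ±0.7 mm tolerance and 0.12 mm/pixel scale quoted earlier).

```python
def classify(l_total_px, f_pix2mm, target_mm, tol_mm):
    """Convert the averaged pixel length to mm (Equation 4.27) and check
    it against the target length and tolerance (Equation 4.28)."""
    L_total = l_total_px * f_pix2mm
    return 'GOOD' if abs(L_total - target_mm) <= tol_mm else 'BAD'

# e.g. classify(416.7, 0.12, 50.0, 0.7) for a 50 mm tube
```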
and background afterward. It is computed dynamically based on the regional mean of the profile and a constant factor αpeak (see Equation 4.9). Although this parameter is assumed to be constant, it has to be trained once with respect to the conveyor belt used. The teach-in of this parameter is very simple and intuitive. The vision system is set to inspection mode, i.e. it is started as for standard measuring. The conveyor is empty but moving. The operator can adjust αpeak online, starting at a quite low value. This value is slightly increased as long as the system detects tubes (ghosts) where actually no tubes are. Until now, this procedure has to be performed manually, but one could think of an automated version to reduce the influence of a human operator, which is always a source of errors.
To ensure the threshold has not become too large, several tubes are placed on the conveyor. If the system is able to successfully detect all tubes (in this context, detection does not mean the length has to be computed correctly), the profile threshold factor is assumed to be trained sufficiently. If the conveyor belt is not uniformly translucent, i.e. the overall image brightness changes significantly over time, one has to ensure that the system is able to detect a tube both at the brightest and at the darkest region of the belt.
4.7.3. Perspective Correction Parameters<br />
As introduced in Section 4.6.2, perspective effects in the measuring data can be reduced using a perspective correction function fcor(x). This function has two parameters, c1 and c2, that have to be learned from real data in the teach-in step.
One intuitive method to do this is to measure a tube at a very slow conveyor velocity. The result is a set of pixel length measurements (see Figure 4.25(a)) at almost every position in the image. Then, the parameters of a second-order polynomial f(x) = c1x² + c2x + c3 can be computed using nonlinear least-squares (NLLS) methods. In this case, a standard Levenberg-Marquardt algorithm [53] is used. The resulting parameters c1 and c2 can be directly inserted into Equation 4.22 to compute fcor(x).
For robust results, this procedure can be repeated several times and the final parameter set is averaged. Alternatively, one could first acquire measurements of several tubes and fit the correction function to the complete data.
4.7.4. Calibration Factor<br />
The most important parameter to be trained in the teach-in step is the calibration factor that relates a length in the image to a real-world length in the measuring plane ΠM. This factor has been introduced as fpix2mm. The idea is to learn the calibration factor based on correspondences between measurements and ground truth data.
In an interactive process, the operator places a tube of known length onto the moving conveyor. The velocity of the conveyor is set to production velocity, i.e. the velocity at which the tubes will be measured later. When the tube reaches the visual field of the camera, it is measured with the described approach, but at pixel level only. Once the tube has left the measuring area, the total pixel length is computed and the user is asked to enter the real-world length of this tube into a dialog box. Again, the input device is a standard keyboard in the prototype version of the system.
The pair of a pixel length l(i) and a real-world reference L(i) can be used to compute the ideal factor fpix2mm(i) that converts pixels into mm for a measurement i as follows:

fpix2mm(i) = L(i) / l(i)   (4.29)
This procedure has to be repeated several times for different reference tubes. Finally, the estimated calibration factor is computed analogously to Equation 4.25, using a k-outlier filter before averaging:

fpix2mm = (1/(N − k)) Σ_{j=1}^{N−k} f′pix2mm(j)   (4.30)

where k is the number of outliers, N the number of iterations, and f′pix2mm indicates the single calibration factors sorted by the squared distance to the mean in ascending order. The median could also be used instead of averaging.
The root-mean-square error at iteration i between the known real-world lengths and the lengths computed based on the estimated calibration factor can be used as a measure of quality:

Err(i) = √( (1/i) Σ_{j=1}^{i} (L(j) − l(j)·fpix2mm)² )   (4.31)
If the error is low, this can be used as an indicator that the learned calibration factor is a good approximation of the ideal magnification factor that relates a pixel length in the image to a real-world length in the measuring plane ΠM, without any knowledge of the distance between ΠM and the camera.
In practice, the learning of the calibration factor is an interactive process. One can define a minimum and maximum number of iterations, Nmin and Nmax, respectively. Once Nmin correspondences have been acquired, fpix2mm and Err(i) are computed for the first time. The operator continues the procedure as long as the calibration at iteration i + 1 changes by more than a small epsilon compared to iteration i. This means the learning can be stopped if |Err(i + 1) − Err(i)| < ɛ.
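The calibration step can be sketched as follows in a simplified, non-interactive form. The 1/N normalization inside the square root is an assumption based on the root-mean-square naming of Equation 4.31, and the sample correspondences are illustrative.

```python
import math

def calibrate(pairs, k=1):
    """Teach-in sketch for fpix2mm (Equations 4.29-4.31).  `pairs` holds
    (pixel length, known real-world length in mm) correspondences."""
    factors = [L / l for (l, L) in pairs]                  # Equation 4.29
    mean = sum(factors) / len(factors)
    # k-outlier filter: keep the factors closest to the mean, then average.
    kept = sorted(factors, key=lambda f: (f - mean) ** 2)[:len(factors) - k]
    f = sum(kept) / len(kept)                              # Equation 4.30
    # RMS error between known lengths and lengths predicted with f.
    err = math.sqrt(sum((L - l * f) ** 2 for (l, L) in pairs) / len(pairs))
    return f, err
```

In an interactive version, `calibrate` would be re-run after each new correspondence until the change in the error falls below ɛ.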
5. Results and Evaluation<br />
5.1. Experimental Design<br />
There are several parameters influencing the measuring results, both in the hardware setup and in the vision algorithms. To yield meaningful results, it is important not to vary more than one parameter within the same experiment. In the following, the parameters that are tested, as well as the evaluation criteria and the strategies used, are presented.
5.1.1. Parameters<br />
The different parameters of the system can be grouped into four main categories including<br />
tube, conveyor, camera and software respectively. Table 5.1 summarizes the most<br />
important representatives of each category.<br />
Obviously, there are many more parameters, described in the previous chapter, that theoretically fall into the last category. However, most of these parameters do not have to be changed (e.g. the number of profile scanlines or the local ROI width). The corresponding value assignments have been determined empirically on representative sequences and are summarized in Table 5.2.
αpeak = 4.0 has been determined in a teach-in step as proposed in Section 4.7.2 and yields the best results for transparent tubes with the conveyor belt and illumination used. This assignment also covers black tubes, although the threshold could be much larger in that case. As long as the conveyor belt is not changed and the amount of dirt on the conveyor does not change significantly, the detection sensitivity does not have to be re-initialized each time.
A timeout period of ∆t = 5 frames for the tube tracking (see Section 4.6.3) has been used throughout the experiments; this is a good compromise between the number of expected per-tube measurements and the distance to the light barrier.
Approximately 1/4 of all measurements (rounded to the next integer value) are not considered for the total length computation (αoutlier = 0.25), to eliminate outliers in the single measurements as introduced in Section 4.6.4. The same value is used for the outlier filter in the teach-in step (see Section 4.7.4).
The teach-in of the calibration factor fpix2mm (see Section 4.7.4) terminates if the root-mean-square error does not change by more than ɛ = 0.0001 between two iterations.
Since it is still too complex to test all permutations and assignments of the remaining parameters, compromises have to be made in the experimental design. Therefore, some of the parameters listed above have been fixed before the experiments to meet the assumptions made in Section 4.2. This includes the guide bar distance as well as the illumination (fiber-optical backlight setup through the conveyor belt) and all camera parameters, i.e. lens, working distance, exposure time and F-number. For all experiments with 50mm tubes, a 16mm focal length lens at a working distance of approximately 250mm is used. The shutter time has been adjusted to 1.024ms, which is a good compromise
Category   Parameter
Tube       Color
           Length
           Diameter
Conveyor   Velocity
           Tube spacing
           Guide bar distance
Camera     Lens
           Working distance
           Exposure time
           F-number
Software   Profile peak threshold τpeak (sensitivity)
           Number of templates (scale, orientation, curvature)
           Perspective correction
           Calibration factor

Table 5.1: Overview of the different test parameters
Parameter   Category             Description                 Value         Section
Nscan       Profile Analysis     Number of scanlines         11            4.4.1
Ksmooth     Profile Analysis     Smoothing kernel size       19            4.4.2
αpeak       Profile Analysis     Peak threshold factor       4.0           4.4.2
WROI        Edge Detection       Local ROI width             15            4.4.3
γ           Template Generation  Template height ratio       0.95          4.5.3
Rψ,right    Template Generation  Curvature range right       [-0.005, 0]   4.5.3
Rψ,left     Template Generation  Curvature range left        [0, 0.005]    4.5.3
χ           Template Generation  Curvature resolution        0.0005        4.5.3
b           Template Generation  Height weighting coeff.     3             4.5.3
k           Template Generation  Number of rotations         3             4.5.3
∆t          Tube Tracking        Timeout period              5             4.6.3
αoutlier    Total Length         Outlier factor              0.25          4.6.4
ɛ           Teach-In             Allowed calibration error   0.0001        4.7.4

Table 5.2: Constant software parameter settings throughout the experiments.
between light efficiency and motion blur effects. This shutter time requires a small F-number of 1.4 to yield sufficiently bright images.
In all experiments it is assumed that the system is calibrated correctly, the radial distortion coefficients are known, and a teach-in step has been performed to learn fpix2mm. In addition, the perspective correction function has been determined before each experiment to compensate for perspective distortions.
5.1.2. Evaluation Criteria<br />
There are several criteria that can be used to compare and evaluate the results of different<br />
experiments. These can be classified into quantitative and qualitative criteria.<br />
Quantitative Criteria<br />
Total Detection Ratio The system must detect exactly the number of tubes that pass the visual field of the camera. Formally, this can be expressed by the following score Ωtotal:

Ωtotal = Ndetected / Ntotal    (5.1)

where Ndetected indicates the number of detected tubes and Ntotal the total number of tubes, respectively. Ωtotal = 1 is a necessary but not sufficient criterion for a correctly working inspection system.
Per Tube Measurements The average number of single measurements for each tube depends mainly on the velocity of the conveyor and the camera frame rate. If N tubes have been measured, the mean number of per-tube measurements can be computed as:

ΩPTM = (1/N) Σ_{i=1}^{N} mi    (5.2)

where mi is the number of single measurements of the i-th tube.
False Positives / False Negatives Each tube T can be classified into one of the three groups G0 (good), G− (too short), and G+ (too long) if measured manually. G0 is defined by the target length and the allowed tolerance for this length. It contains all tubes that meet the tolerance in the real world. G− and G+ include all tubes of a real-world length that lies below the lower or above the upper tolerance threshold, respectively.

In the same way, each tube can be categorized into one of the three groups G′0, G′−, or G′+ based on the length measured by the visual inspection system. In the ideal case, these three groups are equal to the corresponding ground truth classifications, i.e. G′0 = G0, G′− = G−, and G′+ = G+.¹
In practice, however, the measurements are biased by many factors like perspective errors, curved tubes, skewed tube edges, noise, motion blur, or failures in measuring point detection. In addition, as will be introduced in Section 5.1.3, the manually acquired ground
1 Theoretically, a fourth group U for unsure can be defined, including all tubes that could not be detected at all. These tubes have to be handled by different mechanisms, as will be discussed in later sections.
truth data also has a certain variance. Thus, the distributions measured by humans and by a machine vision system may differ. This becomes critical if the two distributions intersect.
Tubes that are actually too short or too long but are measured to be within the tolerance are denoted as false positives (FP). On the other hand, tubes of an allowed length can be wrongly classified as outliers; these are denoted as false negatives (FN). More formally, false positives and false negatives can be defined as follows:

FP = {T | T ∈ G′0 ∧ T ∉ G0}    (5.3)
FN = {T | T ∉ G′0 ∧ T ∈ G0}    (5.4)
In terms of system evaluation, the following measures can be used:

ΩFP = NFP / Ntotal    (5.5)
ΩFN = NFN / Ntotal    (5.6)

where NFP and NFN indicate the number of false positives and false negatives, respectively.
Both the false positive ratio ΩFP and the false negative ratio ΩFN should be zero in the optimal case. As already discussed in the introduction, ΩFP is more critical than ΩFN, since it is less harmful to sort out a good tube than to deliver a defective one to the customer.
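The classification into the three groups and the resulting ratios (Eqs. 5.3-5.6) can be sketched in a few lines of Python. This is an illustrative sketch; the 50mm target and 0.7mm tolerance used as defaults are taken from the later experiments, the function names are my own:

```python
def classify(length, target, tol):
    """Assign a tube length to G- (too short), G0 (good) or G+ (too long)."""
    if length < target - tol:
        return "G-"
    if length > target + tol:
        return "G+"
    return "G0"

def fp_fn_ratios(measured, ground_truth, target=50.0, tol=0.7):
    """Compute Omega_FP and Omega_FN (Eqs. 5.5/5.6) from paired
    measured and ground-truth lengths."""
    fp = fn = 0
    for l_meas, l_gt in zip(measured, ground_truth):
        in_g0_meas = classify(l_meas, target, tol) == "G0"
        in_g0_gt = classify(l_gt, target, tol) == "G0"
        if in_g0_meas and not in_g0_gt:
            fp += 1  # bad tube accepted (false positive)
        elif not in_g0_meas and in_g0_gt:
            fn += 1  # good tube rejected (false negative)
    n = len(measured)
    return fp / n, fn / n
```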
Performance The performance of the system can be evaluated with respect to the average processing time that is needed to analyze a frame:

ΩTIME = (1/M) Σ_{i=1}^{M} ti    (5.7)

where M is the number of frames considered and ti represents the processing time of frame i. ΩTIME is expressed in ms/frame. This measure can be used to determine the maximum possible capture rate. Skipped frames indicate that the camera captures more frames than the system is able to process.
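A minimal sketch of how ΩTIME and the resulting maximum capture rate could be computed, assuming a list of per-frame processing times in milliseconds (the helper names are my own):

```python
def omega_time(frame_times_ms):
    """Average per-frame processing time Omega_TIME (Eq. 5.7) in ms/frame."""
    return sum(frame_times_ms) / len(frame_times_ms)

def max_capture_rate(frame_times_ms):
    """Maximum frame rate [frames/s] the system can process without
    skipping frames, derived from Omega_TIME."""
    return 1000.0 / omega_time(frame_times_ms)
```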
Qualitative Criteria<br />
Standard Deviation Per Tube The multi-image measuring approach is based on the idea that more robust measuring results can be reached if each tube is measured several times. In the ideal case, all measurements should yield the same length value. In practice, however, the single measurements can differ. The standard deviation σtube(i) can be used as an indicator of how much these measurements vary. It is computed as:

σtube(i) = √( (1/(mi − 1)) Σ_{j=1}^{mi} (lj(i) − l̄(i))² )    (5.8)
where lj(i) indicates the length of the j-th single measurement of tube i, l̄(i) the mean over all single measurements of this tube, and mi the total number of single measurements of tube i. σtube is expressed in pixels.

A large per-tube standard deviation represents an uncertainty in the results. In this case, the mean describes the data only roughly. If the uncertainty is too large, it may be better to blow out the particular tube, since the probability of a false positive decision increases with the standard deviation.
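A minimal sketch of σtube and the blow-out decision it could drive; the threshold max_sigma is a hypothetical parameter, not a value from the thesis:

```python
import statistics

def sigma_tube(single_measurements):
    """Per-tube standard deviation sigma_tube (Eq. 5.8) of the single
    length measurements of one tube, in pixels (1/(m-1) normalisation)."""
    return statistics.stdev(single_measurements)

def should_blow_out(single_measurements, max_sigma):
    """Illustrative decision rule: reject the tube if its single
    measurements vary too much to trust the mean."""
    return sigma_tube(single_measurements) > max_sigma
```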
Sequence Standard Deviation The standard deviation of a sequence, σseq, is computed analogously to σtube, but with respect not to the single measurements of one tube but to the computed total lengths ltotal of N tubes:

σseq = √( (1/(N − 1)) Σ_{i=1}^{N} (ltotal(i) − l̄total)² )    (5.9)
where l̄total is the mean over all total measurements. Finally, all measurements can be represented by a Gaussian distribution function G(x) as:

G(x) = (1/(σseq √(2π))) exp( −(x − µseq)² / (2σ²seq) )    (5.10)

where µseq = l̄total. The production is most accurate if the distance between the given target length and the mean of this distribution is small.
Ground Truth Distance The difference between the vision-based length measurement results and the manually acquired ground truth data can be seen as a relative error, assuming the ground truth data is correct. Of interest are the minimum and maximum ground truth distance (GTD) of a sequence of tubes, defined as:

GTDmin = min {(ltotal(i) − lgt(i)) | 1 ≤ i ≤ N}    (5.11)
GTDmax = max {(ltotal(i) − lgt(i)) | 1 ≤ i ≤ N}    (5.12)

where ltotal(i) is the computed total length of tube i, lgt(i) the corresponding ground truth length, and N the number of tubes considered. If the mean ground truth distance GTD is approximately zero, the deviation is distributed evenly. Otherwise, if GTD > 0, the measured length is predominantly larger than the ground truth measurement; accordingly, if GTD < 0, the opposite holds. In both cases, the systematic error indicates that the system is probably not calibrated correctly.
Root Mean Square Error (RMSE) The root-mean-square error measure is used to compare the measurements of the visual inspection system to manually acquired ground truth data over a sequence as follows:

RMSE = √( (1/N) Σ_{i=1}^{N} (ltotal(i) − lgt(i))² )    (5.13)
Figure 5.1: Measuring slide used for acquiring ground truth measurements by hand.<br />
with ltotal(i), lgt(i) and N as defined before. A small root-mean-square error indicates that the measurements are close to the ground truth data.
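The sequence-level measures above can be collected in one helper. This is an illustrative Python sketch under the definitions of this section; the function and dictionary key names are my own:

```python
import statistics

def sequence_stats(l_total, l_gt):
    """Sequence-level evaluation measures from Section 5.1.2:
    sigma_seq (Eq. 5.9), min/max/mean ground truth distance
    (Eqs. 5.11/5.12) and the RMSE against ground truth (Eq. 5.13)."""
    n = len(l_total)
    sigma_seq = statistics.stdev(l_total)             # 1/(N-1) normalisation
    gtd = [lt - lg for lt, lg in zip(l_total, l_gt)]  # per-tube distances
    rmse = (sum(d * d for d in gtd) / n) ** 0.5
    return {
        "sigma_seq": sigma_seq,
        "gtd_min": min(gtd),
        "gtd_max": max(gtd),
        "gtd_mean": sum(gtd) / n,
        "rmse": rmse,
    }
```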
5.1.3. Ground Truth Measurements
The acquisition of ground truth data is important for evaluating the vision-based inspection system with respect to human measurements. For this purpose, a special digital measuring slide, as shown in Figure 5.1, has been used. The precision of this device is up to 1/100mm.
However, there is a significant deviation in human measurements, since heat shrink tubes are flexible. Depending on the force the human operator applies to the measuring slide, the measured length becomes smaller or larger. This variation has been investigated empirically.
12 sample tubes of different diameters (6, 8 and 12mm) are selected as test set (see Table 5.3). One half of the samples are black, the other half transparent tubes. For each combination of color and diameter, one tube has a length of approximately 50mm and one was manipulated, i.e. made slightly longer or shorter than the tolerance allows for.
No.  Color  Diameter [mm]  Mean length [mm]
1 Transparent 8 49.95<br />
2 Transparent 6 49.77<br />
3 Transparent 12 49.82<br />
4 Transparent 8 48.19<br />
5 Transparent 6 51.33<br />
6 Transparent 12 51.88<br />
7 Black 8 50.98<br />
8 Black 6 50.19<br />
9 Black 12 50.00<br />
10 Black 6 50.84<br />
11 Black 8 49.66<br />
12 Black 12 51.56<br />
Table 5.3: Test set used to determine the human variance in measuring.<br />
The results are shown in Figure 5.2. In a first experiment, the variance of a single person, denoted as intra human variance, is investigated. Each tube in the test set has been measured 10 times by the same person with the goal of being as precise as possible.
Figure 5.2: Intra and inter human variance for the test set in Table 5.3 under ideal laboratory<br />
conditions. The error bars indicate the maximum and minimum length for each of the 12 tubes<br />
as well as the mean value of the measurements once for one person (a) and once for 10 persons<br />
(b). The average inter human variance is slightly larger compared to the intra human variance.<br />
The error bars indicate the maximum and minimum length as well as the mean value of all measurements. The computed mean standard deviation is 0.078mm.
In a second experiment, the inter human variance is determined. For this, 10 persons have been asked to measure the same test set, again as precisely as possible. The inter human variance is slightly larger than the intra human variance (see Figure 5.2(b)). In this case, the mean standard deviation was observed to be 0.083mm.
Furthermore, it is important to state that the manual measurements for the ground truth data have been acquired very carefully, with elevated concentration, under laboratory conditions and with the aim of being as precise as possible using the digital measuring slide (see Figure 5.1). Fewer than 5 tubes can be measured within one minute at this precision. In production, the sample measurements are performed with a standard sliding caliper and at a much higher rate. There is definitely a tradeoff between accuracy and speed. The expected individual measuring error in production is much larger. Furthermore, factors like tiredness or distraction can significantly increase the inter and intra human measuring variance.
The accuracy and precision of the visual inspection system, however, should be evaluated with respect to the maximum possible accuracy humans can reach with the given measuring slide under ideal conditions. Throughout this thesis, manual ground truth measurements always refer to these ideal, laboratory-condition measurements. One has to keep in mind that there is still a certain uncertainty in these measurements. The real absolute length of a tube cannot be determined exactly.
For the following experiments, all tubes have been measured three times to reduce the<br />
influence of the human variance. The mean of the three measurements is taken as ground<br />
truth reference. All measurements are stored in a database and each measured tube is<br />
labeled by hand with a four digit ID using a white touch-up pen.<br />
Figure 5.3: At velocities > 30m/min larger sequences of tubes with a small spacing have to<br />
be placed on the conveyor using a special supply tube.<br />
5.1.4. Strategies<br />
Online vs Offline Inspection There are two main strategies for evaluating the inspection system. The first strategy analyzes the tubes online, i.e. in real-time on the conveyor. This includes the tube localization, tracking and measuring as well as the good/bad classification. The results are stored in a file and can be further processed or visualized afterwards. This is closely related to the application in production. The drawback of this approach is that if some interesting or strange behavior is observed in the resulting data, it is difficult to localize its origin.

Therefore, the second evaluation strategy is based on an offline inspection. This means a sequence of tubes is first captured into a file at the maximum frame rate that can be processed online. Then, the sequence can be analyzed repeatedly with different sets of parameters or methods. This is a significant advantage if one wants to compare different techniques or parameter settings.

In the following experiments, both strategies will be applied.
Tube Placement The prototype setup in the laboratory has one significant drawback. The tubes to be inspected have to be added manually to the conveyor, since there is no second conveyor from which the tubes fall continuously, as in production. The size of the conveyor allows for about 21 tubes of 50mm length with a spacing of 10mm in between. If all tubes are placed on the inactive conveyor, it takes some time until the desired velocity is reached. Therefore, at faster velocities, the first tubes pass the measuring area at a slower velocity, leading to unequal conditions between measurements.

Hence, either fewer tubes have to be placed on the conveyor (starting further away from the measuring area) or the tubes have to be placed onto the conveyor while it is running at the desired velocity. The latter is hardly possible for a human without producing large spacings between two consecutive tubes. Instead, a supply tube of about 1.30m length, with a diameter slightly larger than the current tube diameter, can be used as a magazine for about 25 tubes of 50mm length (see Figure 5.3). The supply tube is placed at a steep angle at the front of the conveyor (in moving direction). If the conveyor is not moving, the tubes are blocked and cannot leave the supply tube. If the conveyor is moving, the bottom tube is gripped by the belt and can leave the supply tube through a beveled opening in moving direction. If the velocity of the conveyor is fast
enough, the time until the next tube in the supply tube is gripped by the belt is sufficient to produce a spacing. Experiments have shown that the supply tube works only for velocities > 30m/min. Otherwise, it is possible that two consecutive tubes are not separated.

Thus, there are two disjoint methods to load the conveyor with tubes. One works well for lower velocities, the other for faster ones. In both cases, the maximum number of tubes is limited. Therefore, larger experiments have to be partitioned over several sequences.
Test Data Since it is not feasible to manually measure thousands of tubes as ground truth reference, the number of tubes that can be compared to such reference lengths is limited. However, it is possible to increase the number of ground truth comparisons by repeating the automated visual measurement of a manually measured tube. For example, one can manually measure 20 tubes of each particular type (the number that can be placed onto the conveyor or into the supply tube at one time) and repeat the automated inspection several times. From the algorithmic perspective, the system is confronted with a new situation every time, independent of whether there are 100 different tubes to be inspected or 5×20.

In the following, it is distinguished between tubes of a length that meets the given target length within the allowed tolerance and tubes of manipulated length falling outside this tolerance. The system must be able to separate the manipulated tubes from the proper ones.
5.2. Test Scenarios<br />
Eight test scenarios have been developed to evaluate the system. In each scenario only<br />
one parameter is varied, while the others are kept constant. The different scenarios are<br />
introduced in the following.<br />
Noise Before the system is tested on real data, the accuracy and precision of the measuring approach are evaluated on synthetic images. A rectangle of known pixel size simulates the projection of an ideal tube that is not deformed by perspective. The 'tube edges' as well as the measuring points are detected with subpixel precision, as in real images. The resulting length in pixels must equal the rectangle width. To evaluate the accuracy under the presence of noise, Gaussian noise of different standard deviations is added systematically to the sequences.
Minimum Tube Spacing In this scenario, the minimum spacing between tubes is investigated for both black and transparent tubes on real images. The test objects have a length of about 50mm within the allowed tolerance and a diameter of 8mm. The velocity of the belt is 30m/min. Starting with sequences that allow for only one tube in the visual field, i.e. the spacing is larger than the tube length, the spacing is decreased until the detection rate Ωtotal falls below 1, i.e. at least one tube could not be detected.
Conveyor Velocity The goal of this scenario is to investigate how the accuracy and precision of the measurements depend on the velocity of the conveyor. The focus is on four different velocities: slow (10m/min), medium (20m/min), fast (30m/min), and very fast (40m/min). The latter is the maximum velocity that can be reached in production. Currently, the production line runs at approximately 20m/min. To test the limits of the system, even higher velocities
up to 55m/min are tested. For all velocities > 30m/min, the tubes have to be placed onto the conveyor using the supply tube.

Again, the inspected tubes are about 50mm in length within the allowed tolerance and have a diameter of 8mm, both for black and transparent tubes. The spacing between the tubes must be large enough, following the results of the minimum tube spacing experiments.

In this scenario, all evaluation criteria introduced in Section 5.1.2 are considered, including a comparison to ground truth measurements. The evaluation is performed offline.
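A rough back-of-the-envelope sketch of how the expected number of per-tube measurements shrinks with the conveyor velocity: a tube can only be measured while it is fully inside the field of view. The field-of-view width used in the example is a hypothetical value, not taken from the setup:

```python
def expected_per_tube_measurements(fov_mm, tube_mm, velocity_m_per_min, fps):
    """Rough upper bound on the number of frames in which a tube is fully
    visible: the time the tube spends completely inside the field of view
    times the camera frame rate."""
    v_mm_s = velocity_m_per_min * 1000.0 / 60.0   # convert m/min -> mm/s
    visible_mm = max(fov_mm - tube_mm, 0.0)       # travel while fully visible
    return visible_mm / v_mm_s * fps
```

For an assumed 150mm field of view, a 50mm tube, 30m/min and 50 frames/s this gives 10 frames; doubling the velocity halves the count.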
Tube Diameter If the distance between camera and conveyor belt does not change,<br />
the diameter of a tube influences the distance between the measuring plane ΠM (see<br />
Section 4.2) and the image plane. Tubes with a smaller diameter are further away and<br />
appear smaller in the image, while tubes with a larger diameter are magnified in the image.<br />
Thus, the calibration factor that relates a pixel length to a real world length in mm has<br />
to be adapted.<br />
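As an illustration of how such an adaptation could look, the following assumes a simple pinhole model in which the mm-per-pixel factor scales linearly with the camera-to-plane distance. This is a sketch under my own assumption, not the adaptation procedure of the thesis; all numeric values in the usage example are hypothetical:

```python
def adapt_pix2mm(f_ref, z_ref, d_ref, d_new):
    """Pinhole-model sketch: the measuring plane Pi_M lies on top of the
    tube, so switching from a tube of diameter d_ref to d_new moves the
    plane by (d_ref - d_new) away from the camera. The mm-per-pixel
    factor f then scales linearly with the camera-to-plane distance z."""
    z_new = z_ref + (d_ref - d_new)
    return f_ref * z_new / z_ref
```

For example, with an assumed f_ref = 0.12mm/pixel at z_ref = 250mm for 8mm tubes, a 6mm tube would need a slightly larger factor, since it lies farther from the camera and appears smaller.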
The test data includes transparent and black tubes with a diameter of 6, 8 and 12mm<br />
and a length of 50mm that meet the allowed tolerances. The conveyor velocity is constant<br />
at 30m/min. Again all evaluation criteria are considered and the evaluation is performed<br />
offline.<br />
Repeatability In this scenario, a tube of known size is measured many times in a row at a constant velocity of 30m/min. Theoretically, the system should measure the same length each time, since one can assume the length of the tube does not change throughout the experiments. As mentioned before, there are several parameters that can influence the repeatability in practice, such as a varying background.

In the same experiment, one can determine not only the repeatability, i.e. the precision of the system, but also the accuracy, if one uses not a heat shrink tube but an ideal tube gage. Such a gage can be made from metal with much higher precision, overcoming the human variance in measuring deformable heat shrink tubes. For comparable results, the gage should have the same shape and dimensions as a heat shrink tube. Since it does not transmit light, a metallic gage can simulate black tubes only.

The real-world length of the gage is known very accurately and precisely. Thus, the RMSE of the measuring results becomes almost independent of errors in the ground truth data.

The measurements are best performed online, i.e. in real-time, due to the amount of accumulating data. The resulting lengths are stored in a file for later evaluation.
Outlier Detection Until now, all experiments are based on test data that is known to meet the given tolerances. In this scenario, tubes of approximately 50mm length are mixed with tubes that are too long or too short, i.e. differ from the target length by more than 0.7mm. The position and the number of the outliers in a sequence are known. The system must be able to detect the outliers correctly. Thus, the false positive and false negative rates are the main criteria of interest in this scenario.

The evaluation can be performed either offline or online.
Tube Length As mentioned before, the focus of this thesis is on tubes of 50mm length. In addition, it is shown that the system is also able to measure tubes of different lengths, exemplarily for tubes of 30 and 70mm length.
The tolerances for these lengths differ: the 30mm tubes are allowed to deviate only up to 0.5mm around the target length, while 70mm tubes have a larger tolerance of 1mm. The measuring precision can be directly linked to these tolerances. Accordingly, the system must measure smaller tubes with a higher precision than larger ones.

In this scenario, the accuracy and precision are evaluated based on the mean and standard deviation of a sequence of tubes, measured online, that approximately meet the given target length. Corresponding ground truth data is available.
Performance Finally, it is of interest to determine the performance of the system in<br />
terms of the average per frame processing time ΩTIME. It is investigated how the total<br />
processing time is distributed over the different stages of the inspection including radial<br />
distortion compensation, profile analysis, edge detection and template matching, as well<br />
as the total length computation and tracking.<br />
5.3. Experimental Results<br />
In this section the experimental results of the different scenarios are presented and discussed.<br />
Further discussion as well as an outlook on future work is given in Section 5.4.<br />
5.3.1. Noise<br />
The influence of noise on the measuring accuracy is tested on synthetic sequences. Rectangles of 200 pixels width are placed on a uniform background with a contrast of 70 gray levels between the object and the brighter background. The image size is 780 × 160 pixels, and the sequence is analyzed like a real sequence, with two differences. First, the perspective correction function is disabled, since the synthetic 'tube' is not influenced by perspective, i.e. the width of the rectangle is constant, independent of the image position. Second, the dynamic selection of template curvatures based on the image position does not work in this scenario, since the model knowledge assumptions do not hold. Thus, in this experiment all templates are tested at each position (computation time is not critical here).
Gaussian noise of standard deviation σN has been added to the ideal images, with<br />
σN ∈{5, 10, 25}. Sample images of each noise level are shown in Figure 5.4(a)-(d).<br />
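The generation of such synthetic test images can be sketched as follows. This is a minimal Python/NumPy sketch; only the 70-gray-level contrast and the 780 × 160 image size come from the text, the background gray level of 200 and the rectangle placement are my own choices:

```python
import numpy as np

def synthetic_tube_image(width=780, height=160, rect_len=200,
                         contrast=70, sigma_n=0.0, seed=0):
    """A dark rectangle of known pixel length on a brighter uniform
    background simulates an ideal tube; Gaussian noise of standard
    deviation sigma_n is then added."""
    rng = np.random.default_rng(seed)
    background = 200.0                      # assumed background gray level
    foreground = background - contrast      # 70 gray levels darker
    img = np.full((height, width), background)
    y0 = height // 2 - 20                   # center the 'tube' vertically
    x0 = (width - rect_len) // 2            # and horizontally
    img[y0:y0 + 40, x0:x0 + rect_len] = foreground
    if sigma_n > 0:
        img = img + rng.normal(0.0, sigma_n, img.shape)
    return np.clip(img, 0, 255).astype(np.uint8)
```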
The measuring results are evaluated using the root-mean-square error between the ground truth length of 200 pixels and the results of the single measurements. In the ideal (noise-free) case, the pixel length is always measured correctly. Under the presence of noise, the measured length varies at subpixel level. Figure 5.4(e) shows how the measurements differ in accuracy and precision under the presence of noise. The maximum deviation from the target length occurs at the largest standard deviation (σN = 25). The RMSE results can be found in Figure 5.4(f). For sequences with only a small amount of noise (σN = 5), the RMSE is acceptably low at 0.122 pixels. If one pixel represents 0.12mm in the measuring plane, the real-world error is about 1/100mm. Even under strong noise (σN = 25), which is far beyond the noise level of real images, the measuring error is 0.252 pixels, or 0.03mm in this example. This is still significantly below the human measuring variance.
(a) σN = 0   (b) σN = 5   (c) σN = 10   (d) σN = 25

σN   RMSE [pixels]
0    0
5    0.122
10   0.158
25   0.252
Figure 5.4: Accuracy evaluation of length measurements at synthetic sequences under the<br />
influence of noise. (a)-(d) Rectangles of known size (length = 200 pixels) simulate a tube<br />
on a uniform background without perspective effects. Gaussian noise of different standard<br />
deviation σN ∈{5, 10, 25} has been added to the ideal images. (e) Gaussian distribution of<br />
the measurements. (f) Root mean square error (RMSE) for each noise level.
Figure 5.5: Detection rate of black and transparent tubes depending on the spacing between<br />
consecutive tubes.<br />
Thus, one can conclude that the system is able to detect the synthetic tube edges very accurately, even under the presence of noise, if there is sufficient contrast between background and foreground.
5.3.2. Minimum Tube Spacing<br />
10 black and 10 transparent tubes are used to investigate the influence of the spacing<br />
on the detection rate. The tubes have been placed on the conveyor at an approximately<br />
constant spacing. Five gap sizes are tested: 60, 30, 20, 10, and 5mm. Each<br />
load of tubes passes the measuring area five times for each gap size at a conveyor velocity<br />
of 30m/min. In this experiment only the total detection rate Ωtotal is considered, i.e. how<br />
many tubes are detected by the system at least once. The results are averaged over the 5<br />
iterations.<br />
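In code, the total detection rate reduces to counting the tubes with at least one measurement. A minimal sketch (the data layout is an assumption, not taken from the thesis):

```python
def total_detection_rate(measurements_per_tube):
    """Omega_total: fraction of tubes measured at least once in a pass."""
    hits = sum(1 for n in measurements_per_tube if n >= 1)
    return hits / len(measurements_per_tube)

def averaged_detection_rate(passes):
    """Average Omega_total over repeated passes of the same tube load."""
    return sum(total_detection_rate(p) for p in passes) / len(passes)
```

For example, 49 detected tubes out of 50 yield Ωtotal = 0.98, as in the 5mm gap experiment below.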
As can be seen in Figure 5.5, the detection of black tubes is uncritical, as indicated by<br />
Ωtotal = 1, until the tube spacing falls below 10mm. This means no black tube can pass<br />
the measuring area without being measured if the spacing is ≥ 10mm. The decrease at<br />
5mm gaps to Ωtotal = 0.98 (i.e. 1 tube out of 50 is not detected) may be due to the fact<br />
that the manual tube placing cannot guarantee an exact spacing of 5mm. It is likely<br />
that the distance between two tubes has become even smaller, leading to the failure. Since<br />
the tests have been performed online, it is not possible to locate the origin of the outlier.<br />
Therefore, it has been investigated how small the gap between two black tubes must be<br />
until the profile analysis fails to locate the tube. The results are shown in Figure 5.6. Even<br />
a spacing of about 2mm as in (a) is large enough to reliably detect the background regions<br />
between the tubes, as can also be seen in the corresponding profile analysis results in (c).<br />
A gap of about 1mm, however, is too small even for black tubes. Due to perspective, the<br />
points closer to the camera merge (see Figure 5.6(b) and (d)).<br />
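The gap-detection behavior can be illustrated with a strongly simplified 1-D profile segmentation. The thesis's profile analysis combines a smoothed profile with local medians and global and regional means (see Figure 5.6); the sketch below is a much simpler stand-in that only thresholds a smoothed back-light profile against its global mean, with illustrative parameter values:

```python
import numpy as np

def find_tube_segments(profile, win=5, margin=30):
    """Return (start, end) index pairs of dark runs in a bright back-lit
    column profile; each run is a tube candidate, runs separated by bright
    background. Window size and margin are illustrative choices."""
    smooth = np.convolve(profile, np.ones(win) / win, "same")
    dark = smooth < smooth.mean() - margin
    segments, start = [], None
    for i, is_dark in enumerate(dark):
        if is_dark and start is None:
            start = i                        # dark run begins
        elif not is_dark and start is not None:
            segments.append((start, i - 1))  # dark run ends
            start = None
    if start is not None:
        segments.append((start, len(dark) - 1))
    return segments
```

Two tubes separated by a gap of several samples are resolved as two segments; once the gap shrinks below the smoothing window, the runs merge, mirroring the 1mm failure case.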
The transparent tubes show a detection rate of < 1 even for the largest tested gap<br />
size of 60mm. This can be explained by the much lower contrast to the background. If the<br />
system must be able to overcome a strong non-uniform background brightness, one has to<br />
make a larger compromise in terms of detection sensitivity. As it turns out, there is no<br />
parameter setting that can guarantee that all tubes are detected independent of the gap
Figure 5.6: Minimum tube spacing for black tubes. (a) A spacing of about 2mm is still<br />
sufficient to locate the measurable tube correctly. (b) The detection fails if the two tubes<br />
appear to touch under perspective as on the left side. (c) Profile analysis of (a). (d) Profile<br />
analysis of (b). The profile plots show the smoothed profile, segment boundaries, local median,<br />
global mean, regional mean, and predicted tube boundaries.<br />
size. However, the results have shown that the detection rate decreases drastically below<br />
10mm (see Figure 5.5).<br />
As a result of these experiments, the minimum spacing used in the following experiments<br />
is 10mm for black tubes and 20mm for transparent tubes.<br />
5.3.3. Conveyor Velocity<br />
The test data in this scenario includes 17 transparent and 21 black tubes of 50mm length<br />
and 8mm diameter. Manual ground truth measurements of these tubes are available. The<br />
number of tubes of each color is geared to the number of tubes that can be placed on the<br />
conveyor with a sufficient spacing. To increase the probability of a 100% detection rate,<br />
the spacing between two transparent tubes has to be larger than for black tubes. Each<br />
charge of tubes is measured 5 − 6 times at each velocity of 10, 20, 30, and 40m/min to<br />
yield a total number of > 100 measurements (based on even more single measurements)<br />
in each experiment. Thus, all tubes have to pass the measuring area many times.<br />
Before presenting the results in detail, Figure 5.7 shows an example of how the system<br />
has measured (a) the charge of black tubes and (b) the charge of transparent tubes at<br />
20m/min respectively. Both the single measurements per tube (indicated by the crosses) as<br />
well as the computed total length and the corresponding ground truth length are visualized.<br />
The lengths measured by the system are quite close to the ground truth data.<br />
These results are just an example to show what kind of data is evaluated in the following.<br />
Since it is not possible to visualize longer sequences as detailed as in Figure 5.7 due to the<br />
Figure 5.7: Measuring results at 20m/min for (a) 21 black and (b) 17 transparent tubes. The red<br />
crosses indicate single measurements, while the dashed vertical lines represent the boundaries<br />
between measurements belonging to the same tube. The averaged total length as well as the<br />
corresponding ground truth length are also shown in the plots. All measured tubes of this<br />
sequence meet the tolerances. However, while the transparent tubes have approximately the<br />
target length of 50mm on average, the mean of the black tubes is slightly shifted, i.e. all tubes<br />
tend to be shorter than the target length.
v [m/min]  Ωtotal  ΩPTM  σtube  GTDmin  GTDmax  GTD    RMSE<br />
10         1       11.4  0.05   -0.12   0.14    0.01   0.07<br />
20         1       6.9   0.04   -0.16   0.11    -0.02  0.07<br />
30         1       4.6   0.05   -0.19   0.19    0.0    0.07<br />
40         1       3.2   0.07   -0.21   0.17    -0.01  0.09<br />
55         1       2.3   0.07   -0.16   0.16    0.01   0.08<br />
Table 5.4: Evaluation results at different conveyor velocities v for black tubes (50mm length,<br />
∅8mm). The accuracy of the measurements does not decrease significantly with faster velocities<br />
nor with a decreasing number of per tube measurements ΩPTM, as indicated by the RMSE.<br />
σtube is the per tube standard deviation and GTD stands for ground truth distance (see<br />
Section 5.1.2).<br />
amount of data, more comprehensive representations will be used based on the proposed<br />
evaluation criteria.<br />
Black Tubes The results of the velocity experiments with black tubes are summarized<br />
in Table 5.4.<br />
The black tubes show a detection rate Ωtotal of 1 for all velocities, i.e. no tube has<br />
passed the measuring area without being measured, independent of how fast the tubes are<br />
moved. The average number of per tube measurements ΩPTM decreases from 11.4 at the<br />
slowest velocity (10m/min) to 3.2 at the maximum possible production velocity. Even at<br />
55m/min each tube is measured at least twice. The average standard deviation σtube of<br />
the measurements per tube ranges from 0.04 to 0.07mm; again, there is only a very slight<br />
rise from the slower to the faster velocities. The absolute ground truth distance does not<br />
exceed 0.21, and measurements that are shorter or longer than the ground truth are equally<br />
distributed, as indicated by the mean ground truth distance GTD, which is approximately zero.<br />
As an example, the ground truth distance at 30m/min is shown in Figure 5.8(a). If the<br />
distance is larger than 0, the manually measured length is shorter than the vision-based<br />
measurement and vice versa. Due to the variance in the ground truth data it is not very<br />
likely that the distance is zero for all values. However, the distance should be as small as<br />
possible. If the ground truth distance is one-sided, i.e. all measurements of the system<br />
are larger or shorter than the corresponding ground truth measurement, this indicates an<br />
imprecise calibration factor. The conversion of the pixel length into a real-world length<br />
results in a systematic error which has to be compensated by adapting the calibration<br />
factor.<br />
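These evaluation criteria are easy to state in code. The sketch below computes the per-sequence GTD statistics and, for a one-sided GTD, a rescaled calibration factor; the function names and the simple mean-ratio correction are assumptions for illustration:

```python
import numpy as np

def gtd_stats(measured_mm, ground_truth_mm):
    """GTD = vision-based length minus manual length, per tube, plus the
    summary values used in Table 5.4 (min, max, mean, RMSE)."""
    gtd = np.asarray(measured_mm) - np.asarray(ground_truth_mm)
    return {
        "GTD_min": float(gtd.min()),
        "GTD_max": float(gtd.max()),
        "GTD_mean": float(gtd.mean()),
        "RMSE": float(np.sqrt(np.mean(gtd ** 2))),
    }

def corrected_pix2mm(f_pix2mm, measured_mm, ground_truth_mm):
    """Rescale the calibration factor when the GTD is one-sided, so the
    systematic conversion error vanishes on average."""
    return f_pix2mm * np.mean(ground_truth_mm) / np.mean(measured_mm)
```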
The RMSE differs only marginally between the tested velocities. The largest RMSE<br />
is computed at 40m/min with 0.09. This value is only slightly larger than the deviation of<br />
human measurements. For lower velocities it is even better with 0.07. Another indicator<br />
of how the vision-based measurements converge to the ground truth data is the Gaussian<br />
distribution over the sequence of all measurements. This distribution is based on the<br />
mean µseq and standard deviation σseq (see Section 5.1.2). Figure 5.8(b) compares the<br />
vision-based distribution (solid line) at 30m/min and the corresponding ground truth<br />
distribution (dashed line). The mean is 49.66 in both cases. σseq is slightly larger with<br />
0.1193 compared to the ground truth with 0.1027.<br />
In terms of accuracy and precision this means the vision-based measurements of black<br />
tubes are equally accurate compared to human measurements (laboratory conditions) and
Figure 5.8: (a) Ground truth distance GTD in mm for black tubes (50mm length, ∅8mm)<br />
at 30m/min, plotted per tube number. (b) Gaussian distribution of all measurements compared<br />
to the ground truth distribution.<br />
v [m/min]  Ωtotal  ΩPTM  σtube  GTDmin  GTDmax  GTD    RMSE<br />
10         0.99    9.6   0.06   -0.14   0.32    0.09   0.13<br />
20         0.98    5.2   0.09   -0.16   0.29    0.08   0.11<br />
30         1       3.9   0.15   -0.16   0.66    0.15   0.20<br />
40         0.97    2.4   0.18   -0.27   0.75    0.23   0.28<br />
Table 5.5: Evaluation results at different conveyor velocities v for transparent tubes (50mm<br />
length, ∅8mm). The accuracy seems to decrease with faster velocities, as can be seen at the<br />
RMSE and the mean per tube standard deviation σtube. The number of per tube measurements<br />
ΩPTM is smaller for transparent tubes. Due to the lower contrast it is more likely that<br />
a tube is not detected as measurable.<br />
are only marginally less precise. Furthermore, as an additional benefit, it is possible to<br />
show that a sequence of tubes is systematically shorter than the target length (although<br />
still in the tolerances). This information could be used to adjust the cutting machine until<br />
µseq approximates the given target length.<br />
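Such a feedback to the cutting machine could be sketched as follows; this controller is hypothetical, since the thesis only proposes the idea. A deadband keeps the machine from chasing measurement noise:

```python
def cut_length_correction(mean_measured_mm, target_mm, deadband_mm=0.05):
    """Offset to add to the cutting length so that the sequence mean
    approaches the target; zero while the error is within the deadband."""
    error = target_mm - mean_measured_mm
    return error if abs(error) > deadband_mm else 0.0
```

With μseq = 49.66mm and a 50mm target, this would suggest lengthening the cut by 0.34mm.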
Transparent Tubes The same experiments have been repeated with transparent tubes.<br />
The results are summarized in Table 5.5.<br />
The detection rate Ωtotal tends to decrease with an increasing velocity, although all<br />
tubes have been detected at 30m/min in this experiment. 3% of the tubes have passed<br />
the visual field of the camera without being measured at 40m/min.<br />
Due to the poorer contrast of transparent tubes, the probability increases that a tube<br />
cannot be located in the profile analysis step. This can be seen in the average number<br />
of per tube measurements ΩPTM. While black tubes are measured about 11.4 times at<br />
v = 10m/min, the transparent tubes reach only 9.6 measurements per tube at the same<br />
velocity. At 40m/min this number decreases to 2.4. At faster velocities, e.g. 55m/min,<br />
the number of per tube measurements falls short of 1. Reliable measurements are not<br />
Figure 5.9: (a) Ground truth distance GTD in mm for transparent tubes (50mm length,<br />
∅8mm) at 30m/min, plotted per tube number. The measurements marked by a ‘+’ all belong<br />
to the same tube, which reached the maximum GTD at measurement 68. As one can see, it is<br />
not systematically measured wrong; a poor contrast region on the conveyor belt is rather the<br />
origin of the strong deviations from the ground truth. (b) Gaussian distribution of all<br />
measurements compared to the ground truth distribution.<br />
possible at this velocity for transparent tubes so far and are therefore not considered in<br />
Table 5.5.<br />
The standard deviation σtube of transparent tubes moved at 40m/min is three times<br />
larger than at 10m/min. This can be explained by the smaller number of per tube measurements.<br />
The ground truth distance also increases with the velocity. Especially GTD<br />
becomes conspicuously larger, i.e. the measured lengths are larger than the ground truth<br />
length on average. This trend can also be observed in the absolute values of GTDmax and<br />
GTDmin. At a velocity of 40m/min the maximum ground truth distance is 0.75, which<br />
is more than the allowed tolerance. In this context one has to keep in mind that these<br />
values are only the extrema and do not describe the average distribution. This makes the<br />
ground truth distance measure very sensitive to outliers. However, a large GTD value<br />
does not automatically imply poor accuracy. On the other hand, if the ground<br />
truth distance is low in the extrema, as with the black tubes in this experiment, this is an<br />
additional indicator of high accuracy. The ground truth distance of the transparent tubes<br />
at 30m/min is shown in Figure 5.9(a). The deviations are significantly larger compared<br />
to Figure 5.8(a).<br />
Instead of being approximately equally distributed as for the black tubes, the error of<br />
transparent tubes seems to increase and decrease randomly, but always over a range of<br />
consecutive measurements. This observation can be explained by the varying background<br />
intensity under back light through the conveyor belt. The periodic intensity changes obviously<br />
influence the transparent tubes much more strongly than the black tubes, since the detection<br />
quality depends mostly on the image contrast. Figure 5.10 shows how the mean image<br />
intensity of a moving empty conveyor belt changes over time. If a tube is measured at<br />
a part of the conveyor belt that yields a poor contrast under back light, the GTD is<br />
likely to increase. Keeping in mind that each tube passes the measuring area 6 times in this<br />
experiment, the probability is small that it is always measured at the same position on<br />
the conveyor. The tube measured with the maximum GTD has been marked in the plot<br />
Figure 5.10: Mean image intensity (gray level) of a moving empty conveyor belt over time.<br />
The deviation between the brightest and the darkest region on the conveyor exceeds 40 gray<br />
levels and originates in the non-uniform translucency characteristics of the belt. Example<br />
images showing this non-uniformity can be found in Figure 3.4.<br />
as well as all other measurements belonging to this particular tube. It turns out that<br />
the average ground truth distance of this tube is 0.3mm, which is still larger than the<br />
RMSE of the whole sequence due to the outliers. However, this shows that the tube is<br />
not measured wrongly in general. Furthermore, one can see that all neighboring tubes<br />
lying in the same region on the conveyor are also measured inaccurately. It is assumed that<br />
such deviations could be avoided with a more uniform conveyor belt.<br />
The mean over all measurements is 50.04 at 30m/min compared to 49.96 in the ground<br />
truth. This is still very accurate. The precision of the vision-based measurements is<br />
0.15 compared to 0.09 of human measurements under ideal laboratory conditions. The<br />
corresponding Gaussian distributions are plotted in Figure 5.9(b).<br />
Finally, the RMSE increases with faster velocities, and the total error is larger compared<br />
to black tubes. The lowest error was measured at 20m/min (approximately the current<br />
production velocity) with 0.11. This error is still only slightly larger than the human<br />
variance.<br />
One can conclude that the results for the black tubes are very accurate both for slow and<br />
fast conveyor velocities. The RMSE even falls below the standard deviation of human<br />
measurements. The accuracy of transparent tubes decreases with faster velocities, but is<br />
still in a range that allows for measurements with the given tolerance specifications. The best<br />
results have been achieved at a velocity of 20m/min. As it turns out, all tubes meeting the<br />
tolerances in the real world (based on manual ground truth data) have also been measured<br />
reliably to be within the tolerances by the system, i.e. ΩFN = 0. Thus, no tube would<br />
have been blown out wrongly at any velocity.
Diameter  Ωtotal  ΩPTM  σtube  GTDmin  GTDmax  GTD    RMSE<br />
6mm (B)   1       4.8   0.05   -0.40   0.29    -0.13  0.18<br />
8mm (B)   1       4.6   0.05   -0.19   0.19    0.0    0.07<br />
12mm (B)  1       4.6   0.07   -0.44   0.31    -0.11  0.19<br />
6mm (T)   0.92    2.8   0.18   -1.15   0.87    0.01   0.20<br />
8mm (T)   1       3.9   0.15   -0.16   0.66    0.15   0.20<br />
12mm (T)  0.98    3.12  0.24   -0.69   0.67    0.07   0.20<br />
Table 5.6: Measuring results of 50mm length tubes with different diameters at a velocity of<br />
30m/min. The first three rows correspond to black tubes (B) and the last three rows to<br />
transparent (T) ones.<br />
Figure 5.11: The thin 6mm tubes are likely to be bent. The distance between the defined<br />
measuring points in the image does not represent the length of the straight tube correctly.<br />
5.3.4. Tube Diameter<br />
Besides the tubes of 8mm diameter investigated in the velocity experiments, there are also 6<br />
and 12mm diameter tubes to be considered in the DERAY-SPLICEMELT series. Therefore,<br />
the test data in this scenario includes transparent and black tubes of 50mm length<br />
with these diameters. The velocity is constant at 30m/min. Again, more than 100 tubes<br />
are measured for each combination of color and diameter. The summarized evaluation<br />
results can be found in Table 5.6.<br />
Black Tubes As for the 8mm tubes, 100% of the black tubes, both for 6 and 12mm diameter,<br />
are measured by the system, as indicated by a score of Ωtotal = 1. The number of per tube<br />
measurements is also approximately equal, with 4.8 for 6mm diameter tubes and 4.6 for<br />
12mm tubes. The per tube standard deviation σtube is slightly larger for 12mm, with 0.07<br />
compared to 0.05 at 6 and 8mm. One significant difference from the 8mm tubes is the<br />
larger extrema in the ground truth distance, GTDmin and GTDmax, and the definite shift<br />
in the average ground truth distance GTD. Values of −0.13 for 6mm and −0.11 for 12mm<br />
indicate that the vision-based lengths are mostly shorter than the manual measurements.<br />
This has basically two different origins: tubes with a diameter of 6mm are bent much<br />
more strongly than tubes of larger diameters, as can be seen for example in Figure 5.11. In this<br />
case, both manual as well as vision-based measurements are difficult. The length of a tube<br />
in the image is defined as the distance between the left and right end of the tube at the<br />
outermost points of the corresponding edges. If the tube is bent, however, the distance<br />
between the measuring points is obviously smaller than the real length. This can be seen<br />
in the ground truth distance as well as in the resulting RMSE, which is significantly larger<br />
with 0.18 compared to 8mm tubes. Figure 5.12(a) visualizes the results of a sequence of<br />
21 black tubes at 30m/min. The bent tube in Figure 5.11 corresponds to the 10th tube in<br />
this plot (located between measurement number 45 and 50) and is measured significantly
Figure 5.12: Length measurement results of black tubes with (a) 6mm and (b) 12mm diameter<br />
at 30m/min. The plots show only a section of the total number of measured tubes. Although<br />
the RMSE is larger both for 6 and 12mm tubes compared to the 8mm results, the measurements<br />
are still accurate enough to correctly detect all tubes within the allowed tolerances.<br />
shorter than the ground truth. The total results of the experiment with 6mm diameter<br />
black tubes are shown in terms of the ground truth distance in Figure 5.13(a).<br />
Only a few tubes are measured too long, while most measurements are shorter than the<br />
ground truth, depending on how much a tube is bent, i.e. how much it differs from<br />
the assumed straight tube model. However, all 100 tubes are measured correctly to<br />
lie within the allowed tolerances, leading to a false negative rate of ΩFN = 0 (ΩFP = 0 is<br />
implicit since there are no outliers in the test data).<br />
While bending is no problem for black tubes with a diameter of 12mm, these tubes have<br />
another drawback. The larger diameter makes the tubes more susceptible to deformations<br />
of the circular cross-section shape. This means only a little pressure is needed to deform<br />
the cross-section of a tube into an ellipse. These deformations occur if the tubes are stored,<br />
for example, in a bag or box where many tubes lie on top of each other. The tubes used as the<br />
test set have been delivered in such a way. In addition, the effect is increased since most<br />
tubes are grabbed by hand several times, e.g. to measure the ground truth distance or if<br />
experiments have been repeated with the same tubes. Each manual handling is a potential<br />
source for a deformation. With respect to the vision-based measuring results the elliptical<br />
cross-section of a tube leads to a significant problem. In the model assumptions the<br />
measuring plane ΠM is defined at a certain distance above the conveyor belt. This distance<br />
is assumed to be exactly the outer diameter of an ideal circular tube (see Figure 5.14(a)).<br />
The magnification factor that relates a pixel length into a real world length is valid only<br />
in the measuring plane. With a weak-perspective camera model it is assumed that this<br />
factor is also valid within a certain range of depth around this plane.<br />
For a deformed tube the measuring points in the image pL and pR do not originate in<br />
points that lie in the measuring plane. If the cross-section is elliptical it is most likely<br />
that the tube will automatically roll to the largest contact area. In this case the points<br />
closest to the camera will be further away than the measuring plane. Under perspective<br />
the resulting length in the image will be shorter. This is exactly what is observed in
Figure 5.13: Ground truth distance in mm of all measured black tubes with a diameter of<br />
(a) 6mm and (b) 12mm at 30m/min.<br />
the experiments. Although it is less likely, it is also possible that a tube lies on the side<br />
with the smaller contact area. This happens if the tube leans against a guide bar, for<br />
example. The result is measuring points above the measuring plane, leading to a larger<br />
length in the image.<br />
Figure 5.12(b) shows a section of 21 black tubes with a diameter of 12mm measured at<br />
30m/min. The larger distance to the ground truth data is clearly visible. However, the<br />
system is again able to reliably detect all tubes correctly within the tolerances without<br />
any false negatives (ΩFN = 0). As an example of how the deformation of a tube influences<br />
the measuring results, images of the 7th and the 11th tube 2 of this sequence are shown<br />
in Figure 5.14(b) and (c) respectively. The extension in vertical direction of tube No. 7<br />
is definitely smaller than for the neighboring tubes. This is an indicator that the tube is<br />
deformed and lies on the smaller side, thus, it is measured larger than it actually is. On<br />
the other hand, tube No. 11 is larger in the vertical extension indicating it is lying on<br />
the larger contact area. The result is a much shorter length measured by the vision-based<br />
system which can be also seen in Figure 5.13(b). Like for 6mm tubes, the measurements are<br />
mostly shorter compared to the ground truth, although the origin is different as introduced<br />
above.<br />
These results show the accuracy limits of the weak-perspective model. If higher accuracy<br />
is needed, a telecentric lens could be used to overcome the perspective effects of different<br />
depths, or the height of a tube in the image could be exploited to adapt the calibration<br />
factor fpix2mm dynamically.<br />
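A dynamic adaptation of fpix2mm could be sketched as follows, using a pinhole-style 1/Z magnification model. The variable names, the distance value, and the simple height-based depth estimate are all assumptions for illustration; the thesis only suggests such an adaptation.

```python
def adapted_pix2mm(f_pix2mm, tube_height_px, nominal_diameter_mm, z_plane_mm):
    """Rescale the calibration factor for a deformed tube.

    f_pix2mm is valid only in the measuring plane at distance z_plane_mm
    from the camera. If the tube's apparent height falls short of the
    nominal diameter, its upper edge lies roughly dz further away, and
    the magnification drops by z_plane / (z_plane + dz)."""
    apparent_height_mm = tube_height_px * f_pix2mm
    dz = nominal_diameter_mm - apparent_height_mm   # > 0: flattened tube
    return f_pix2mm * (z_plane_mm + dz) / z_plane_mm
```

A flattened tube whose image height corresponds to less than the nominal diameter gets an enlarged factor, compensating the apparently shorter length.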
Transparent Tubes The experiments with different diameters have been repeated with<br />
transparent tubes. Only 92% of all transparent tubes with a diameter of 6mm are detected<br />
and measured by the system in this experiment. This is mainly due to the non-uniform<br />
translucency of the conveyor belt. Especially the thin 6mm tubes are very sensitive to<br />
2 Note: The tube number does not correspond to the (single) measurement number. The dashed lines<br />
indicate which measurements belong to the same tube.
Figure 5.14: (a) Idealized cross-section of deformed tubes (frontal view). The measuring<br />
plane ΠM is defined based on an ideal circular tube (center). Deviations denoted as ∆1 (left)<br />
and ∆2 (right) influence the length measurement in the image projection. (b) Example of a<br />
deformed tube (No. 7 in Figure 5.12(b)) lying on the smaller side. The measuring points are<br />
closer to the camera and, due to perspective, the tube appears measurably larger in the image.<br />
(c) The opposite effect occurs if a deformed tube (No. 11 in Figure 5.12(b)) lies on the larger<br />
contact area.<br />
changes in brightness, since they are more translucent than 8mm and 12mm tubes. At<br />
regions on the conveyor belt that transmit more light, the thin tubes almost disappear.<br />
Thus, one has to reduce the intensity of the light source. This is a tradeoff, because<br />
other regions that transmit less light get even darker while the structure of the belt is<br />
emphasized. If the contrast is too low, the tube cannot be located in the profile. This<br />
problem could be prevented by using a more homogeneously translucent conveyor<br />
belt.<br />
The 12mm diameter tubes generally yield a better contrast, which can be seen in the<br />
detection rate of 98%. The number of per tube measurements ΩPTM is 3.12 compared to<br />
2.8 for 6mm tubes. However, the average standard deviation is larger for the 12mm tubes<br />
with 0.24. A RMSE of 0.2 for both 6 and 12mm transparent tubes indicates that the measuring<br />
results are almost as accurate as for black tubes of the same diameter, although the<br />
extrema are significantly larger. As already mentioned, these values can be influenced by a<br />
few outliers. The values of GTD show a much more uniform distribution of the deviations<br />
compared to black tubes. This is due to the fact that transparent tubes are more sensitive<br />
to strong background edges, which can be wrongly detected as a tube edge. Figure 5.17<br />
gives an example of how the system can fail, leading to a larger measured length. The<br />
poor contrast at the tube boundary cannot be compensated by the stronger responses at<br />
the tube edge ends. The maximum correlation score is reached at the background edge.<br />
This problem does not occur with black tubes due to the stronger contrast.<br />
Thus, in addition to the problems described for black tubes of 6 and 12mm diameter,<br />
transparent tubes may be measured longer than they really are. Figure 5.15 visualizes the<br />
experimental results with different diameters of transparent tubes. Again, this is only a<br />
section of the total number of measurements, which are summarized more comprehensively<br />
in Figure 5.16 based on the ground truth distance. Compared to the experiments with<br />
black tubes, there have been false negatives among the transparent tubes, i.e. tubes have
Figure 5.15: Experimental results of transparent tubes (50mm length) with a diameter of<br />
(a) 6mm and (b) 12mm at 30m/min. The plots show a section of the total number of tubes only.<br />
Figure 5.16: Ground truth distance in mm of all measured transparent tubes with a diameter<br />
of (a) 6mm and (b) 12mm at 30m/min.
Figure 5.17: The tube edge detection can fail if the contrast between tube and background<br />
is poor. (a) Zoomed region of an input image. (b) Edge response of this image within the local<br />
ROI around the assumed edge location. Only the ends of the tube edge yield a significant<br />
response, which is weak compared to the edge response of the background. (c) The<br />
maximum correlation score between a template and the image within the local ROI (blue<br />
bounding box) is reached at the background edge, indicated by the red dots. The resulting<br />
measured length is obviously wrong.<br />
been wrongly classified as too long or too short. For 6mm tubes the false negative rate<br />
is ΩFN = 0.02 and for 12mm tubes ΩFN = 0.01. This means one to two tubes out of a hundred<br />
would have been sorted out wrongly by the system.<br />
5.3.5. Repeatability<br />
A transparent tube of 50.0mm and a black tube of 49.7mm (manual ground truth length)<br />
have been measured 100 times (based on several single measurements in each case) by<br />
the system at a constant velocity of 30m/min. The tubes have a diameter of 8mm. The<br />
measuring results of the black tube are shown in Figure 5.18(a) and the results of the transparent<br />
tube in Figure 5.18(c). The corresponding Gaussian distribution functions based<br />
on the mean and standard deviation over all measurements can be found in Figure 5.18(b)<br />
and (d) respectively. The narrower the distribution, the better the repeatability of the<br />
measurements.<br />
The mean of the 100 measurements of the black tube is 49.66mm, which is very close to<br />
the ground truth length. The standard deviation of the black tube is 0.0614mm. Thus,<br />
the deviation between measurements of the same tube is less than a tenth of the tolerance<br />
and significantly smaller than the deviation between human measurements.<br />
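The repeatability figures above are simply the sample mean and standard deviation of the repeated measurements, with the plotted curves being the corresponding normal densities. A minimal sketch (the measurement values below are illustrative, not the thesis data; whether the sample or population estimator was used is not stated, though for 100 samples the difference is negligible):

```python
import math
import statistics

def repeatability_stats(measurements):
    """Sample mean and standard deviation of repeated length measurements."""
    mu = statistics.mean(measurements)
    sigma = statistics.stdev(measurements)  # sample std (n-1 denominator)
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Normal density used to visualize the measurement distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Illustrative repeated measurements of one tube (mm)
lengths = [49.6, 49.7, 49.7, 49.8]
mu, sigma = repeatability_stats(lengths)
```

The peak height of the plotted Gaussian is 1/(σ√2π), which is why the narrower distributions in Figure 5.18 appear taller.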
The measuring results of the transparent tube show a mean of 49.99mm and a standard<br />
deviation of 0.051mm. Given the results of the previous experiments, one might have expected<br />
the deviation of a transparent tube to be larger than for a black tube. In this experiment<br />
the transparent tube has been detected correctly 100 times in a row (as has the black<br />
tube). The only difference between the two tubes is the shape of the cross-section. Both<br />
tubes are not ideally circular, but the material of the black tubes is slightly softer, i.e.<br />
more susceptible to deformations than that of the transparent tubes. In this experiment each tube<br />
is manually put onto the conveyor belt 100 times. Thus, even if the operator tries to grab<br />
the tubes as carefully as possible, deformations cannot be prevented for either tube type,<br />
leading to the observed deviations in the measurements. Obviously the total deviation
[Plot residue of Figure 5.18: length in mm vs. measurement number N and the corresponding measurement distributions for (a)/(b) the black tube (49.7mm) and (c)/(d) the transparent tube (50.0mm).]<br />
Figure 5.18: Repeatability of the measurement of one tube. (a) 100 measurements of one<br />
black tube with the ground truth length of 49.7mm. (b) Corresponding Gaussian distribution<br />
of all measurements in (a) with µ =49.66 and σ =0.0614. (c) 100 measurements of<br />
one transparent tube with the ground truth length of 50.0mm. (d) Corresponding Gaussian<br />
distribution of all measurements in (c) with µ =49.99 and σ =0.051. The belt velocity is<br />
30m/min in both experiments.
[Plot residue of Figure 5.19: (a) length measurements vs. N and (b) measurement distribution of the metallic gage.]<br />
Figure 5.19: Repeatability results of a metallic cylinder simulating a tube of 49.99mm ground<br />
truth length. (a) 100 measurements of the gage at 30m/min. (b) Gaussian distribution of the<br />
results with µ =49.94 and σ =0.033.<br />
is also influenced by other parameters such as the tube orientation within the guide bars<br />
and the limits of the discrete input image (although subpixel techniques are applied).<br />
This experiment shows how accurately the vision-based system is able to measure even<br />
transparent tubes if the tube edge detection is successful.<br />
The experiment has been repeated with a metallic cylinder of 49.99mm length simulating<br />
an ideal tube (gage). The cross-section of this gage is circular and not deformable manually.<br />
The results of this experiment are shown in Figure 5.19(a) and (b). The mean over all<br />
100 measurements is 49.94 with a standard deviation of 0.0331. This deviation is close to<br />
the error that has been estimated in Section 4.2.6 with respect to the maximum possible<br />
tube orientation within the guide bars.<br />
One can conclude that, as long as the orientation within the guide bars is neglected, the<br />
maximum precision of the system is about 0.03mm for tubes that are ideally round and<br />
not bent. This is well over twice as precise as human measurements. It is assumed<br />
that this precision could be increased even further if the tubes were not only approximately but<br />
ideally horizontally oriented.<br />
5.3.6. Outliers<br />
The system is evaluated with respect to outliers in two steps. First, more than 150 tubes<br />
(about 50mm, ∅8mm) are measured by the system at 30m/min. Approximately 1/3 of<br />
the tubes meet the tolerances while the other 2/3 have a manipulated length. The ground<br />
truth length of the tubes is known, as is the measuring order, i.e. each measurement<br />
can be assigned to a corresponding ground truth length. From the results of the previous<br />
experiments one can assume that the results of the black tubes will be better than or equal to<br />
the transparent tube results.<br />
The results of this experiment are visualized in Figure 5.20. All of the 150 tubes are<br />
classified correctly. There is not a single false positive or false negative in the data.<br />
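The good/bad decision and the error rates in this experiment reduce to a tolerance-band check per tube, compared against the known ground truth. A hedged sketch (the ±1mm tolerance and the sample lengths are assumptions for illustration, not the actual specification):

```python
def within_tolerance(length_mm, target_mm=50.0, tol_mm=1.0):
    """Good/bad decision against a symmetric tolerance band (values assumed)."""
    return abs(length_mm - target_mm) <= tol_mm

def error_rates(measured, ground_truth, target_mm=50.0, tol_mm=1.0):
    """False negative / false positive rates given known ground-truth lengths."""
    fn = fp = 0
    for m, gt in zip(measured, ground_truth):
        good_gt = within_tolerance(gt, target_mm, tol_mm)
        good_sys = within_tolerance(m, target_mm, tol_mm)
        if good_gt and not good_sys:
            fn += 1  # good tube would be sorted out
        elif not good_gt and good_sys:
            fp += 1  # outlier would pass
    n = len(measured)
    return fn / n, fp / n

# Illustrative data: one good tube measured correctly, one outlier caught,
# one good tube measured far too short (-> false negative)
fn_rate, fp_rate = error_rates([50.1, 51.8, 48.7], [50.0, 51.7, 50.0])
```

In the first experiment above both rates came out as zero; the second (online) stage reported small but nonzero rates.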
In the second stage of this experiment 30 manipulated and 22 good tubes are randomly<br />
mixed. All tubes are measured online at 30m/min while the blow out mechanism is<br />
[Plot residue of Figure 5.20: measured length in mm vs. tube number, showing the upper/lower tolerance, resulting mean length, and ground truth.]<br />
Figure 5.20: A mix of 150 good and manipulated transparent tubes has been measured<br />
by the system at 30m/min and compared to ground truth data. The system is able to reliably<br />
separate the tubes that meet the tolerances around the target length of 50mm from the<br />
manipulated tubes without any false positives or false negatives.<br />
activated. This means tubes that do not meet the tolerances should be sorted out. Once<br />
all tubes have passed the measuring area, it is checked how many of the manipulated tubes<br />
have also passed the blow out mechanism (false positives) and how many good tubes<br />
have been sorted out (false negatives). To simplify this task, the manipulated tubes have<br />
been marked beforehand. This experiment is repeated 22 times, leading to a total number of<br />
1144 inspected tubes. The results can be found in Table 5.7. The total detection rate<br />
is Ωtotal = 0.99, i.e. 6 tubes out of 1144 could pass the measuring area without being<br />
measured at all. Three tubes have been sorted out wrongly, representing a false negative rate<br />
of ΩFN = 0.0026, i.e. 2.6‰.<br />
The false positives are more critical. 5 outliers have not been blown out correctly, thus<br />
ΩFP = 0.0043. However, it turns out that 4 of the 5 false positives occur in sequences<br />
with at least one non-detected tube. Given the ratio of good to manipulated<br />
tubes of about 2:3, it is more likely that the non-inspected tube was a manipulated<br />
one. In this case the false positives are most likely not due to failures in measuring, but<br />
originate from the fact that these tubes have not been measured at all. In production, all<br />
non-inspected tubes should be sorted out and re-inspected to be sure that no outlier can pass.<br />
5.3.7. Tube Length<br />
Measuring tubes of a different length requires the adaptation of the visual field of the<br />
camera. For tubes < 50mm this means placing the camera closer to the conveyor. However,<br />
due to the minimum object distance (250mm) of the 16mm lens used in the previous experiments,<br />
and following the considerations made in Section 3.2.1, a lens with a longer focal length<br />
is needed to yield the desired field of view. In this case a 25mm focal length lens is used.
Total Detected Missed FN FP<br />
52 52 0 1 0<br />
52 52 0 0 0<br />
52 52 0 0 0<br />
52 52 0 0 0<br />
52 52 0 0 0<br />
52 52 0 1 0<br />
52 51 1 0 1<br />
52 52 0 0 0<br />
52 50 2 0 1<br />
52 52 0 0 0<br />
52 52 0 0 1<br />
52 52 0 0 0<br />
52 52 0 0 0<br />
52 52 0 0 0<br />
52 52 0 1 0<br />
52 52 0 0 0<br />
52 51 1 0 1<br />
52 52 0 0 0<br />
52 51 1 0 1<br />
52 52 0 0 0<br />
52 51 1 0 0<br />
52 52 0 0 0<br />
1144 1138 6 3 5<br />
Table 5.7: Results of repeated blow out experiments. 22 × 52 transparent tubes have been<br />
measured at 30m/min. Each test batch included 22 tubes within the allowed tolerances and 30<br />
outliers. Detected outliers should have been sorted out by the blow out mechanism. 3 tubes<br />
have been sorted out wrongly (false negatives) and 5 outliers have passed (false positives).<br />
Conspicuously, 4 of the 5 false positives occur if at least one tube has not been detected at all<br />
by the system.
Larger tubes can be covered by the same 16mm focal length lens as the 50mm tubes, but the<br />
camera has to be placed further away from the conveyor to yield a larger field of view. The<br />
resulting pixel representation, i.e. the length a pixel represents in the measuring plane,<br />
increases as mentioned before. Hence, the precision decreases.<br />
In each experiment a batch of 50 tubes (transparent and black) of 30mm and 70mm<br />
length and 8mm diameter is used as test data. Each batch has been measured by hand and<br />
is evaluated with respect to mean and standard deviation. Each tube passes the measuring<br />
area once in this experiment and is measured as often as possible (single measurements)<br />
while it is in the visual field of the camera. The mean over the computed total lengths<br />
as well as the standard deviation are determined and compared to the ground truth data.<br />
The results are summarized in Table 5.8 and visualized in Figure 5.21 in terms of Gaussian<br />
distributions.<br />
The number of per tube measurements ΩPTM of 30mm tubes is slightly smaller compared<br />
to experiments with 50mm tubes at the same velocity. This is due to the smaller<br />
field of view of the camera; obviously the tubes leave the measuring area faster. However,<br />
there are still more than 3 single measurements of each tube on average, both for black<br />
and transparent tubes. The larger 70mm tubes have been measured even more often<br />
than 50mm tubes, with 6.12 single measurements for black and 4.85 for transparent tubes<br />
respectively. This can be explained by the larger field of view.<br />
The mean value over a sequence of tubes µseq equals the expectation µGT in almost<br />
all experiments. Only the 30mm transparent tubes differ from the ground truth, by about<br />
0.01mm, which is acceptably small. This indicates that the calibration factor between pixels<br />
and mm has been trained accurately in all experiments.<br />
The standard deviation is much smaller for 30mm tubes, both in the manual and automated<br />
measurements, compared to 70mm tubes. In general, black tubes are measured with<br />
higher precision than transparent tubes by the system, in line with the observations in<br />
previous experiments. The higher precision for 30mm tubes is important with respect to<br />
the specified tolerances (see Table 1.2). In all experiments besides the 70mm black tubes,<br />
the manual precision is only slightly better than the precision of the visual inspection<br />
system. However, the results of the system have always been precise enough to allow for<br />
reliable measurements in terms of the allowed tolerances. For 70mm black tubes the system<br />
performed even better than humans, with a standard deviation of 0.14 compared to 0.16<br />
measured by hand.<br />
It is important to state that the precision in these experiments depends both on the<br />
measuring variance of the system and the real variance of the tubes. Accordingly, one<br />
cannot compare the results directly with those in Section 5.3.5, where only one tube was<br />
measured several times in one experiment.<br />
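This remark can be made quantitative: if measurement noise and the real length variation of the tubes are independent, their variances add, so the tube-population spread can be estimated by subtracting the repeatability variance of Section 5.3.5 in quadrature. A sketch under that independence assumption (the numeric values are only illustrative):

```python
import math

def tube_population_std(sigma_observed, sigma_system):
    """Estimate the real tube-length spread, assuming independent noise:
    sigma_obs^2 = sigma_sys^2 + sigma_tubes^2."""
    diff = sigma_observed ** 2 - sigma_system ** 2
    # If the observed spread is within measurement noise, the tube
    # variation is indistinguishable from zero.
    return math.sqrt(diff) if diff > 0.0 else 0.0

# E.g. 0.14mm observed over a tube sequence, 0.06mm system repeatability
sigma_tubes = tube_population_std(0.14, 0.06)
```

With these illustrative numbers most of the observed spread would be attributable to real length variation between tubes rather than to the measuring system.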
One can conclude that the visual inspection system is able to measure tubes of different<br />
lengths as accurately as humans on average.<br />
5.3.8. Performance<br />
Finally, the performance of the system is evaluated on an Athlon64 FX-55 (2.6GHz, 2GB<br />
RAM) platform.<br />
The total processing time can be divided into five main groups including profile analysis,<br />
compensation for radial distortion, edge detection and template matching, as well as length
Color Ltarget ΩPTM µseq µGT σseq σGT<br />
(a) Black 30 3.43 30.06 30.06 0.09 0.08<br />
(b) Transparent 30 3.18 30.07 30.06 0.12 0.08<br />
(c) Black 70 6.12 69.76 69.76 0.14 0.16<br />
(d) Transparent 70 4.85 70.21 70.21 0.27 0.20<br />
Table 5.8: Results of 30mm and 70mm tubes at 30m/min. Ltarget represents the target<br />
length and ΩPTM the average number of per tube measurements. The mean and standard<br />
deviation of the length measuring distributions are denoted as µseq and σseq for the automated,<br />
and µGT and σGT for the human measurements respectively. The results are also visualized<br />
in Figure 5.21.<br />
[Plot residue of Figure 5.21: measurement and ground truth length distributions for (a) 30mm black, (b) 30mm transparent, (c) 70mm black, and (d) 70mm transparent tubes.]<br />
Figure 5.21: Length distribution of 30mm and 70mm tubes at 30m/min for automated (solid<br />
line) and manual measurements (dashed line). All experiments show a very good accuracy, i.e.<br />
the vision system measures the same length on average. Black tubes are generally measured<br />
slightly more precisely than transparent tubes. For 70mm black tubes the vision system is even<br />
more precise than the human measurements.
computation and tracking. The last group contains all remaining operations that are not<br />
covered by any of the groups before.<br />
Many thousands of frames have been timed, with and without tubes in the visual field<br />
of the camera. The results of the performance evaluation can be found in Figure 5.22. It<br />
turns out that the processing of a measurable frame requires 17.8ms on average. Thus,<br />
all images at a capture rate of 50fps (i.e. a new image is acquired every 20ms) can be<br />
processed.<br />
The dominant part of the processing time is consumed by edge detection and template matching,<br />
where the latter is by far the most expensive. 82% of the total processing time is needed for<br />
this step on average, although the number of pixels considered is highly restricted by the<br />
local ROIs. The undistortion operation is the second most expensive operation with 10%,<br />
followed by the length computation and tracking with 4%. The profile analysis, thought of as a<br />
fast heuristic to locate a tube roughly, proves to be very fast with only 0.29ms/frame.<br />
The remaining 3% represent operations such as image conversions, copying or drawing<br />
functions to visualize the detection results. The latter could be omitted in production if<br />
visualization is not required.<br />
If the profile analysis detects a non-measurable frame, the template matching is not<br />
performed. Thus, the remaining time could be used for side operations in the future,<br />
e.g. to save logging information or to run certain self-check mechanisms. Such mechanisms<br />
could check, for example, whether the illumination is still bright enough or if the camera position<br />
has changed.
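The real-time budget can be checked directly from the per-frame figures of Figure 5.22: the group times must sum to less than the 20ms frame interval available at 50fps.

```python
FRAME_BUDGET_MS = 20.0  # 50 fps capture rate -> one new frame every 20 ms

# Average per-frame processing times (ms) as reported in Figure 5.22(a)
timings_ms = {
    "profile analysis": 0.29,
    "undistortion": 1.79,
    "edge detection / template matching": 14.57,
    "length computation / tracking": 0.69,
    "other": 0.48,
}

total_ms = sum(timings_ms.values())
shares = {task: t / total_ms for task, t in timings_ms.items()}
is_real_time = total_ms <= FRAME_BUDGET_MS  # 17.82 ms < 20 ms
```

The computed shares reproduce the pie chart of Figure 5.22(b): template matching at about 82%, undistortion at about 10%.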
Task Ωtime [ms/frame]<br />
Profile Analysis 0.29<br />
Undistortion 1.79<br />
Edge Detection / Template Matching 14.57<br />
Length Computation / Tracking 0.69<br />
Other 0.48<br />
Total 17.82<br />
(a)<br />
[Pie chart residue of Figure 5.22(b): Edge detection / Template matching 82%, Undistortion 10%, Length computation / Tracking 4%, Other 3%, Profile Analysis 2%.]<br />
(b)<br />
Figure 5.22: (a) Average processing time per frame divided into different steps of the visual<br />
inspection. (b) Corresponding pie chart. As one can see, the edge detection and template<br />
matching is the dominant operation throughout inspection.
5.4. Discussion and Future Work<br />
The main difficulties with transparent tubes come along with the nonuniform brightness<br />
and the texture of the background. A conveyor belt which is equally translucent over<br />
its whole length could prevent many problems. The parameters controlling the detection<br />
sensitivity must cover both the brightest and the darkest region of the conveyor belt. This<br />
is always a compromise, leading to poorer results on average. However, if the contrast<br />
between tubes and the background does not depend on where the tube is located on the<br />
conveyor belt, the parameters can be adjusted much more specifically.<br />
The background texture of the conveyor belt used for the prototype has the drawback<br />
of regular vertical structures. If the tube edge contrast is poor, the edge response of<br />
the background may be stronger than that of the tube edge. Model knowledge can be used to<br />
improve the tube edge localization even in the presence of strong vertical background<br />
edges. However, there is still a certain error probability, which can be drastically reduced<br />
if vertical background edges are suppressed. The best solution would be to use a conveyor<br />
belt with a canvas of horizontal structure. This would simplify the detection<br />
task without requiring any additional computation time.<br />
If no conveyor belt can be found that provides the desired horizontal structure in combination<br />
with good translucency characteristics, one can think of suppressing the background<br />
pattern within the local ROI around a tube edge algorithmically, by exploiting the regularity<br />
of the background pattern. One idea is to transform the spatial image into the<br />
frequency domain using the Fourier transform. For more information on the Fourier transform<br />
and the frequency domain the reader is referred to [64]. If it is possible to find characteristic<br />
frequencies belonging to the background pattern, one can remove these frequencies in the<br />
frequency domain and apply the inverse Fourier transform to the filtered spectrum. The<br />
result is a filtered spatial image with reduced background structure. The filter must be<br />
designed carefully to preserve the tube edges.<br />
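The idea can be illustrated in one dimension: zero the spectral bins of a known regular background pattern and invert the transform. A minimal numpy sketch (the scanline, bin index, and amplitudes are synthetic, not taken from the prototype):

```python
import numpy as np

def notch_filter(signal, kill_bins):
    """Zero selected frequency bins, then transform back to the spatial domain."""
    spectrum = np.fft.rfft(signal)
    spectrum[list(kill_bins)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

n = 256
x = np.arange(n)
edge = (x >= n // 2).astype(float)             # step edge (the "tube edge") to preserve
pattern = 0.3 * np.sin(2 * np.pi * 8 * x / n)  # regular background, lives in bin 8
scanline = edge + pattern

filtered = notch_filter(scanline, kill_bins=[8])
# The pure sinusoid is removed entirely, while the step edge survives
# (a centered step has no energy at nonzero even bins).
```

In 2D the same notching is applied to the bright spots of the spectrum, as in Figure 5.23(c); a real belt pattern spreads over several bins, so a practical filter needs wider, carefully shaped notches.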
In a first experiment, test images of a conveyor both with and without a tube have been<br />
acquired and transformed into the frequency domain. Figure 5.23(a) and (b) show an<br />
example of the spectrum of an image with background only and with transparent tubes<br />
in the image respectively. The spatial domain of (b) can be seen in (d). One eye-catching<br />
consistency across the spectra are the bright spots. If one removes these spots in the spectrum<br />
of an image, as indicated by the black regions in (c), and applies the inverse Fourier transform<br />
to this filtered spectrum, the result is an image with a significantly reduced background<br />
pattern. The actual tube edges, however, are quite well preserved. In this case the<br />
spectrum has been filtered by hand and only coarsely. Much more work has to be spent<br />
on designing more sophisticated and reliable filters that perform well for a large number<br />
of images without removing or blurring any relevant edges. Removing a frequency from<br />
the spectrum always influences the whole image. The filter in the example produces<br />
new structure in the tube regions, especially around the printings. In addition, the darker<br />
stripe in the background on the right of the input image is still present in the filtered<br />
version, since it does not belong to the regular pattern of the background. Although in<br />
this example the dark stripe is not critical, it might be in other situations. This shows the<br />
limits of this approach. Any deviations from the regular background pattern are difficult<br />
to suppress in the frequency domain. If the conveyor belt is changed, the texture of the<br />
belt might be completely different. In this case the filter has to be adapted. An automated<br />
filter adaptation and background learning is nontrivial.
5.4. DISCUSSION AND FUTURE WORK 131<br />
(a) Background only (b) Background + Tubes (c) Masked spectrum<br />
(d) Source Image (e) Filtered Image<br />
Figure 5.23: Background suppression in the frequency domain. (a) Fourier transform of an<br />
image of an empty conveyor. (b) Fourier transform of (d). (c) Certain frequencies have been<br />
removed by hand indicated by the black regions. (e) Inverse Fourier transform of the filtered<br />
spectrum. The characteristic vertical background pattern could be reduced quite well while<br />
the tube edges are preserved.<br />
The experiments have shown that tubes of 8mm diameter are most robust against<br />
deformations. While thinner tubes of 6mm diameter tend to be bent, tubes of 12mm may<br />
be elliptical in cross-section. In both cases the accuracy and precision decrease. The<br />
question is whether such deformations are only caused by the way the tubes have been<br />
stored, transported and handled throughout the experiments in the laboratory, or if they<br />
also occur in production. The latter can be assumed, at least to a certain extent. A<br />
telecentric lens could overcome the problem of perspective occurring with deformed 12mm<br />
tubes.<br />
A less expensive improvement would be to measure not only the length, but also<br />
the height of a tube in the image. A larger height indicates the tube is closer to the camera,<br />
and vice versa. The calibration factor relating pixels to mm could be defined as a function<br />
of the tube height. Obviously, this requires a more complex teach-in step.<br />
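Under a pinhole model this idea reads: apparent size scales inversely with distance, so the mm-per-pixel factor can be rescaled by the ratio of a taught-in reference height to the observed height. A sketch with made-up teach-in values (the reference height of 80 px and factor of 0.10 mm/px are assumptions for illustration):

```python
def mm_per_pixel(tube_height_px, h_ref_px, f_ref_mm_per_px):
    """Rescale the reference calibration factor by the apparent tube height.
    A taller apparent tube is closer to the camera, so one pixel covers
    fewer millimetres (pinhole model: image size ~ 1/distance)."""
    return f_ref_mm_per_px * h_ref_px / tube_height_px

def tube_length_mm(length_px, tube_height_px,
                   h_ref_px=80.0, f_ref_mm_per_px=0.10):
    """Length measurement corrected for the tube's distance to the camera."""
    return length_px * mm_per_pixel(tube_height_px, h_ref_px, f_ref_mm_per_px)
```

A tube measured at 500 px length would thus be read as 50mm at the reference height, but only 40mm if it appears 100 px tall, i.e. closer to the camera.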
Another potential source of deviations in the measurements is the tube orientation.<br />
The guide bars keep the maximum tube rotation to a minimum. The remaining error<br />
has been approximated. Although it is very small, it could be reduced even further by<br />
tilting the whole conveyor slightly around its longitudinal axis. The tilt<br />
guarantees that all tubes will roll to the lower guide bar. If the guide bar is horizontal<br />
in the image, so will be the tubes. Accordingly, the camera position has to be adapted<br />
to reestablish the fronto-orthogonal view. The proposed camera positioning method is<br />
independent of the orientation of the conveyor and the camera in 3D space.
The blow out mechanism was tested successfully in the prototype setup. The advantage<br />
of this mechanism is that it works almost independently of the conveyor velocity and the<br />
position of the light barrier relative to the measuring area. One only has to ensure that<br />
no tube passes the light barrier before the good/bad decision of the measuring system<br />
reaches the blow out controller.<br />
One drawback of the current strategy is its sensitivity to ghosts. If the system detects<br />
a tube where there actually is none, the resulting classification of the ghost is sent to the<br />
controller anyway and stored in the FIFO memory. Since a ghost is never detected by<br />
the light barrier, the good/bad decision of the ghost is still in the memory when the next<br />
tube passes the light barrier. Instead of considering the decision belonging to this tube<br />
(appended to the FIFO memory), the decision of the ghost is evaluated. This leads to a<br />
loss of synchronization, i.e. a tube T is related to the decision of tube T − 1. Over time<br />
this effect can accumulate, and the reliability of the system is obviously compromised.<br />
A potential solution to this problem is to replace the FIFO memory with<br />
a single register that stores only the latest decision. Without loss of generality,<br />
a 0 in this register might correspond to blowing out the next tube, while a 1 indicates the<br />
next tube can pass. The register is set to 0 by default. Each time the inspection system<br />
measures a tube to be within the allowed tolerances, a signal is sent to the controller that<br />
sets the bit in the register to 1. As soon as the tube has passed the light barrier, the<br />
register is reset to 0. This has to be done before the next tube is measured. Therefore<br />
the light barrier has to be placed quite close to the measuring area. The advantage of this<br />
approach is that the memory always contains the current decision belonging to the tube<br />
that passes the light barrier next. A timer can be used to reset the register if no tube<br />
intersects the light barrier within the expected time. Thus, ghosts become uncritical.<br />
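The single-register strategy can be simulated; the event names below are hypothetical, but the logic follows the text: default 0 (blow out), set to 1 on a good measurement, reset on every light-barrier event, and cleared by a timer so that stale ghost decisions cannot survive.

```python
class BlowOutController:
    """Single-register good/bad latch replacing the FIFO memory.
    register == 0: blow out the next tube (safe default)
    register == 1: let the next tube pass"""

    def __init__(self):
        self.register = 0
        self.blown_out = []

    def on_measurement(self, tube_is_good):
        # Called by the vision system; a ghost may trigger this as well.
        self.register = 1 if tube_is_good else 0

    def on_light_barrier(self, tube_id):
        # A real tube reached the barrier: act on the latest decision only.
        if self.register == 0:
            self.blown_out.append(tube_id)
        self.register = 0  # reset to the safe default

    def on_timer_expired(self):
        # No tube intersected the barrier in time: the last decision
        # belonged to a ghost, so discard it.
        self.register = 0

ctrl = BlowOutController()
ctrl.on_measurement(True)
ctrl.on_light_barrier("A")   # good tube A passes
ctrl.on_measurement(True)    # ghost "measured" good ...
ctrl.on_timer_expired()      # ... but never crosses the barrier: discarded
ctrl.on_light_barrier("B")   # unmeasured tube B: blown out by default
```

Unlike the FIFO, a ghost here costs at most one overwritten decision and never shifts all subsequent decisions by one tube.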
Furthermore, since the register is reset each time, this also helps to prevent the problem<br />
of non-detected tubes, i.e. tubes that have passed the visual field of the camera<br />
without being measured. In the outlier experiment (see Section 5.3.6) the false positive<br />
rate increased drastically if tubes could not be detected. In this case the system does not<br />
send a good/bad decision for the missed tube to the controller. The light barrier, however,<br />
detects every tube independent of whether it has been measured or not. With the single register strategy<br />
these tubes are blown out by default. Thus, only tubes that have been measured by the<br />
system and meet the allowed tolerances are able to pass the blow out nozzle.<br />
If tubes are not detected at all, or measurements do not result in a meaningful length<br />
value (e.g. the standard deviation of the single measurements is too large), the corresponding<br />
tubes define another group U including all unsure measurements that cannot<br />
definitely be assigned to G′0, G′−, or G′+. All tubes of this class should be blown out by<br />
default to ensure no outlier can pass the quality control. These tubes do not have to<br />
be considered rejections, but could be measured by hand afterward or recirculated to<br />
be inspected again by the vision-based measuring system, depending on the frequency of<br />
occurrence.<br />
The experiments have shown that more than 80% of the total processing time is needed<br />
for the template based edge localization. In the current implementation the left and<br />
right ROI are processed sequentially. One possible optimization would be to parallelize this<br />
step: the computation within the left and right ROI could be performed<br />
in separate threads to exploit the power of current dual core architectures. This is possible<br />
since the processing in the two ROIs is independent of each other.
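This parallelization is straightforward precisely because the two ROIs share no state. A sketch with a stand-in matcher (in CPython, a true speed-up for CPU-bound matching would additionally require processes or a C extension that releases the GIL; the structure is the point here):

```python
from concurrent.futures import ThreadPoolExecutor

def locate_edge(roi_scores):
    """Stand-in for the template matching: index of the maximum response
    in a 1-D correlation score profile."""
    return max(range(len(roi_scores)), key=roi_scores.__getitem__)

def locate_both_edges(left_roi, right_roi):
    """Process the left and right ROI concurrently; they are independent,
    so no synchronization beyond joining the two results is needed."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(locate_edge, left_roi)
        right = pool.submit(locate_edge, right_roi)
        return left.result(), right.result()
```

Since template matching dominates the 17.8ms frame time, halving that step alone would bring the average well under the 20ms budget.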
6. Conclusion<br />
In this thesis a functioning prototype for a vision-based heat shrink tube measuring system<br />
has been presented, allowing for a 100% online inspection in real-time. Extensive experiments<br />
have shown the accuracy and precision of the developed system, which reaches<br />
the quality of accurate human measurements under ideal laboratory conditions. The advantage<br />
of the developed system is that this accuracy can be achieved even at conveyor<br />
velocities of up to 40m/min.<br />
A multi-measurement approach has been investigated in which each decision on whether<br />
a tube has to be sorted out is based on 2-11 single measurements, depending on the tube<br />
type and conveyor velocity. This requires video frame rates of ≥ 50fps to be processed<br />
in real-time. Fast algorithms, heuristics and model knowledge are used to improve the<br />
performance in this constrained application. Tube edge specific templates have been defined<br />
that are able to locate a tube edge with subpixel accuracy, even in low contrast<br />
images in the presence of background clutter. In the prototype setup, the tube edge<br />
detection has been complicated by the strong vertical structure of the conveyor belt and<br />
an inhomogeneous translucency leading to nonuniformly bright background regions. The<br />
consequences for transparent tubes have been discussed, including the possibility of tubes<br />
passing the visual field of the camera without being detected.<br />
Since black tubes are not translucent, they yield an optimal contrast against the background<br />
in a back lighting setup. Transparent tubes, on the other hand, are much more sensitive<br />
to the structure of the background and the local tube edge contrast. All parameters<br />
adjusted for transparent tubes turned out to have no disadvantage for black ones. Thus,<br />
the parameters for transparent tubes are used in general, leading to a more uniform<br />
solution in the system design.<br />
Besides the algorithmic part of the work, the engineering of the whole system, including<br />
the proper selection of a camera, optical system, and illumination, has been solved. The<br />
integration of the micro controller and the air blow nozzle completes the prototype, allowing<br />
for concrete demonstrations of how tubes that do not meet the tolerances are blown<br />
out.<br />
A simple and intuitive initialization of the system has been developed. Most parameters<br />
can be trained interactively and automated without complicated user interactions. Even<br />
an unskilled worker should be able to perform the teach-in step after a few instructions.<br />
The only critical part of the teach-in is the camera positioning. To exclude as many sources<br />
of error as possible, the camera should be mounted as stably as possible at a fixed orientation<br />
(which has to be calibrated only once). The required height adjustments to cover the range of tube<br />
lengths should be automated if possible.<br />
The maximum measuring precision of 0.03mm was reached for a metallic tube model<br />
simulating an ideal tube (at a conveyor velocity of 30m/min). During the experiments<br />
it has been observed that deformations of real heat shrink tubes (elliptical cross-section<br />
134 CHAPTER 6. CONCLUSION<br />
or bending) have a certain influence on the measuring precision. However, the average<br />
precision is still < 0.1mm for real tubes. In general, tubes of 8mm diameter have been<br />
measured more precisely than 6mm or 12mm tubes.<br />
The average accuracy (root mean square error) of the automated measurements, i.e.<br />
the distance to some ground truth reference, is about 0.1mm for black tubes and about<br />
0.2mm for transparent tubes at velocities of 30m/min. The ground truth has been acquired<br />
manually under ideal laboratory conditions and itself exhibits a certain inter- and intra-observer<br />
deviation of about 0.1mm. While the velocity has only a minor influence on the accuracy of<br />
black tubes, the accuracy of transparent tubes decreases significantly with higher velocities.<br />
The main reason for this observation is the decreasing number of measurements per tube,<br />
since averaging over the single measurements becomes more sensitive to outliers. In addition,<br />
the probability increases that a transparent tube is not detected at all if the background<br />
contrast is poor. However, in general, the accuracy and precision have been good enough in<br />
all experiments to reliably detect both black and transparent tubes of different lengths<br />
and diameters with respect to the specified tolerances. Experiments with transparent tubes<br />
of manipulated lengths have shown that the system is able to successfully separate the<br />
good tubes from those that do not meet the tolerances. The false negative rate, i.e. the number<br />
of tubes that have been sorted out wrongly, is 2.6%. Less than 4.3% of the failures could<br />
pass the measuring area. However, 80% of the false positives have not been detected at all<br />
by the system. With the adaptation of the blow out strategy as suggested in Section 5.4<br />
these tubes would have been blown out, too. Hence, the theoretically remaining false<br />
positive rate is 0.87% for transparent tubes. Following the experimental results, one can<br />
assume that the false positive rate for black tubes will be less than or equal to this.<br />
The measuring results have a positive side effect, since it is possible to compute the<br />
moving average over the last N measurements. An operator can compare the current<br />
mean length to the given target length. This can be useful especially during the teach-in<br />
of the machine. During production, deviations can be corrected before the tolerances are<br />
exceeded. In a more sophisticated solution the adjustment could be automated. If one can<br />
assure the current mean length measured by the vision system equals the target length,<br />
the blow out mechanism may never need to be activated and the probability for false<br />
positives can be further decreased.<br />
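The moving-average monitoring described above can be sketched as follows. This is only an illustrative sketch: the class name, the window size, and the warning margin are hypothetical choices and not part of the thesis prototype.

```python
from collections import deque

class LengthTrendMonitor:
    """Running mean of the last N length measurements.

    Illustrative sketch: window size and warning margin are hypothetical
    parameters; the thesis only describes comparing the current mean
    length to the given target length.
    """

    def __init__(self, target_mm, window=50, warn_margin_mm=0.1):
        self.target_mm = target_mm
        self.warn_margin_mm = warn_margin_mm
        # deque with maxlen keeps exactly the last `window` measurements.
        self.values = deque(maxlen=window)

    def add(self, length_mm):
        self.values.append(length_mm)

    def mean(self):
        return sum(self.values) / len(self.values)

    def drift(self):
        """Signed deviation of the current mean from the target length."""
        return self.mean() - self.target_mm

    def needs_adjustment(self):
        """True if the mean length has drifted beyond the warning margin."""
        return abs(self.drift()) > self.warn_margin_mm
```

An operator (or an automated controller) would feed each per-tube length into `add` and react once `needs_adjustment` becomes true, before the hard tolerances are exceeded.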
In addition, the system is able to store the inspection results in a file or database. Such<br />
statistics can be also useful for the management or controlling since they include not only<br />
the length distribution of the production, but also information about the total number of<br />
tubes produced, the time of production, as well as the number of defectives.<br />
The good results of the prototype support the use of an optical inspection system for<br />
length measurements of heat shrink tubes. Manual sample inspections as used currently at<br />
production are influenced by many factors like concentration, speed, motivation, or tiredness<br />
of the individual operator. In general, less precision can be assumed for measurements<br />
at production compared to ideal laboratory measurements as used for evaluating the system.<br />
The advantage of the automated vision-based system is the ability to inspect each<br />
tube at laboratory precision without getting tired.
Appendix<br />
A. Profile Analysis Implementation Details<br />
Details regarding the implementation of the profile analysis with a focus on performance<br />
aspects are introduced in the following.<br />
A.1. Global ROI<br />
A simple, but very effective way to decrease the computational load is to restrict the image<br />
processing to a certain region of interest (ROI). Following the assumption that parts of<br />
the guide bars are visible in the images at the top and the bottom without containing<br />
any information, the guide bars can be excluded from further processing and, thus, the<br />
ROI lies in between these guide bars. The height of the ROI is given by the guide bar<br />
distance which should be almost constant over the whole image since they are adjusted to<br />
be parallel to the x-axis in the image. The ROI extends in horizontal direction over the<br />
whole image width minus a certain offset at both sides. This offset is due to the fact that<br />
the image distortion is maximal at the boundaries. The actual value of the offset depends<br />
on the ability to overcome the distortion at measuring. If the measurements are accurate<br />
even at the image boundaries, the offset tends toward zero. In the following, the ROI<br />
between the guide bars is also referred to as global ROI.<br />
Section 3.2 states it is possible to adapt the camera resolution to a user-defined size.<br />
The reason why the image size is not adjusted to cover the global ROI exactly (whereby the<br />
ROI extraction would become redundant) is a very practical one. First of all, the guide bars provide a valuable<br />
clue in adjusting the field of view of the camera. In addition, smaller images mean less data<br />
has to be transferred and consequentially a larger number of images can be transferred in<br />
the same time. If the image size is too small, however, the actual frame rate exceeds the number<br />
of frames that can be processed, so that frames have to be skipped, which should be avoided.<br />
The extraction of the global ROI can be automated using a similar profile analysis<br />
approach as used for tube localization but in vertical direction. Again several vertical<br />
scan lines are used to build the profile. If there is no tube in the image (empty scene),<br />
the guide bars can be detected clearly since the contrast between the bright conveyor belt<br />
and the black guide bars is very strong. A smoothing step as used in horizontal direction<br />
to overcome the background clutter is not necessary. This has the benefit that the two<br />
strongest peaks in the profile describe the guide bar locations quite accurately. The detection<br />
of the global ROI has to be performed only once, at initialization, assuming a static<br />
setup of camera and conveyor that does not change over time. In the future, it is conceivable<br />
that every time the state ’empty’ is detected, the ROI is reinitialized and compared with<br />
the previous location. A difference indicates something changed with the setup and may<br />
induce an alert or some specific reaction.<br />
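The automated ROI extraction described above can be sketched roughly as follows, assuming an empty scene with a bright conveyor belt and black guide bars at the top and bottom of the image. The function name and the parameters `n_cols` and `min_separation` are hypothetical; the prototype's actual peak detection may differ.

```python
import numpy as np

def detect_global_roi(image, n_cols=5, min_separation=20):
    """Locate the guide bars in an empty-scene gray-level image and
    return the inclusive row range (top, bottom) of the global ROI.

    Sketch under assumed conditions: dark guide bars on a bright belt,
    strong enough contrast that no smoothing is needed.
    """
    h, w = image.shape
    # Vertical profile: average a few equally spaced columns (vertical
    # scan lines) taken from the central part of the image.
    cols = np.linspace(w // 4, 3 * w // 4, n_cols).astype(int)
    profile = image[:, cols].mean(axis=1)
    # Strong dark/bright transitions mark the guide bar edges; the two
    # strongest gradient peaks bound the ROI.
    grad = np.abs(np.diff(profile))
    first = int(np.argmax(grad))
    # Suppress a neighborhood around the first peak, then take the second.
    grad[max(0, first - min_separation):first + min_separation] = 0
    second = int(np.argmax(grad))
    top, bottom = sorted((first, second))
    # Transition at index i lies between rows i and i+1.
    return top + 1, bottom
```

Rerunning this check whenever the state 'empty' is detected, and comparing the result with the stored ROI, would realize the setup-change alert suggested above.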
A.2. Profile Subsampling<br />
In many computer vision tasks it is common to perform a specific operation on lower<br />
resolution images than the input to increase computation speed. For example, one could<br />
simply discard every second row or column to obtain an image of half the size of the original<br />
image. However, to avoid a violation of the sampling theorem, it is important to apply<br />
a low-pass filter operation to the data beforehand. This mechanism can be used to generate<br />
pyramids of images at different resolutions or scales. Each layer in the pyramid has half<br />
the size of the layer above with the top layer corresponding to the original size. Before<br />
subsampling the data a Gaussian smoothing operation is performed to suppress higher<br />
frequencies. Thus, such pyramids are called Gaussian Pyramids in the literature [24].<br />
The same can be applied to one-dimensional signals such as gray level profiles. In<br />
this application, experiments have shown that the information about the tube boundaries is<br />
conserved at a coarser scale. Thus, a subsampled version two levels down the pyramid is used<br />
in practice instead of the original profile. The data to be processed after this step is only<br />
a fourth of the input. Obviously, the profile analysis can be accelerated by this step.<br />
Experiments investigating whether the profile subsampling could replace step one in the<br />
profile analysis, i.e. the smoothing with a large mean kernel, came to the conclusion that<br />
in connection with transparent tubes and dark printing, the strong contrast of the letters<br />
could be misclassified as tube boundary. The system tries to detect the real tube location<br />
in a certain region around the wrong position and is likely to fail. The mean filter, in<br />
contrast, is able to reduce the influence of the lettering and must not be replaced.<br />
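The two-level profile subsampling can be sketched as below. The binomial kernel is a common small approximation of a Gaussian and stands in for whatever smoothing kernel the prototype actually used; the function names are illustrative.

```python
import numpy as np

def gaussian_smooth_1d(profile, kernel=None):
    """Low-pass filter a 1-D profile before subsampling (anti-aliasing)."""
    if kernel is None:
        # Binomial [1 4 6 4 1]/16 as a small-Gaussian approximation.
        kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    return np.convolve(profile, kernel, mode="same")

def pyramid_down(profile):
    """One Gaussian pyramid step: smooth, then drop every second sample."""
    return gaussian_smooth_1d(profile)[::2]

def subsample_profile(profile, levels=2):
    """Profile two levels down the pyramid, i.e. a fourth of the input
    length, as used for the tube localization profile."""
    for _ in range(levels):
        profile = pyramid_down(profile)
    return profile
```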
A.3. Scan Lines<br />
As mentioned in Section 4.4.2, the profile to be evaluated is based on the normalized sum<br />
of Nscan scan lines equally distributed over the global ROI. The reason why a single scan<br />
line is not sufficient is shown in Figure A.1(b). Three sample profiles at different heights<br />
(61, 80 and 100) are selected to visualize the influence of the printing. One can see the<br />
strong contrast at the letters as well as a poor contrast at the right tube boundary. Since<br />
it is not deterministic whether the printing of a particular tube is visible in an image,<br />
one has to consider the worst case: a scan line passing through the printing at as<br />
many positions as possible. The global mean of the resulting profile is much lower in this case<br />
and it is possible that the intensity of the tube at regions outside the printing is wrongly<br />
classified as background. The result of this effect is shown in Figure A.1(d). On the other<br />
hand, the usage of several scan lines decreases the influence of the printing significantly.<br />
The probability that more than a few scan lines will pass through the printing is low.<br />
For example, among the sample tubes used for testing of the prototype, the coverage of<br />
the printing is about 16% with respect to the diameter. Thus, it is very likely to have<br />
more than one scan line passing through tube regions without printing. In total, the<br />
influence of the printing decreases with the number of scan lines. However, Figure A.1(c)<br />
shows that 11 scan lines equally distributed over the global ROI in y-direction are sufficient<br />
to yield almost the same results as considering all rows of the ROI. Here, the profile<br />
consisting of 11 scan lines is shifted, i.e. the intensity values are lower compared to the<br />
profile calculated from all ROI rows (90 in this example). This is due to the location of
[Figure A.1: gray-value profiles over x, shown for single scan lines at y = 61, 80, and 100, and for the normalized sum of 11 scan lines compared to the normalized sum of all ROI rows; plot data omitted, see caption.]<br />
Figure A.1: Comparison of a single and multi scan line approach. (a) Input gray scale image.<br />
(b) Profiles of three selected scan lines at height 61, 80 and 100 respectively. The first two<br />
scan lines pass through the printing leading to strong variations in the profile. Compared to<br />
these variations the poor contrast of the right tube border makes a correct detection difficult.<br />
(c) The normalized sum of several scan lines reduces the effect of the printing bringing out the<br />
location of the tube much more clearly. It can be seen that 11 scan lines equally distributed<br />
over the global ROI are sufficient to yield almost equivalent results as if considering every row.<br />
(Note: The profile of the 11 scan lines is shifted since the global ROI included parts of the<br />
guide bars at the upper and bottom row. Since these pixels have a value near zero, they do not<br />
contribute much to the profile sum but are considered in normalization. The scale, however,<br />
does not affect the actual tube location.) (d) Wrong detection of the tube boundaries if using<br />
a single scan line. (e) Result of the multi scan line approach.<br />
the global ROI. As can be seen in Figure A.1(e), the global ROI is a bit too large; thus, the<br />
top and bottom rows hit the border of the guide bars. Scan lines through these rows do<br />
not contribute much to the overall profile, but have an effect in normalization. This shift,<br />
however, does not affect the actual tube location. With respect to performance, rows that<br />
have no influence should be ignored.<br />
Obviously, the problem with the printing on a tube’s surface occurs only with transparent<br />
tubes, since the printing is not visible on black tubes under back lighting. If black tubes are<br />
inspected, a single scan line in the image center is sufficient to localize the tube correctly,<br />
but more scan lines do not impair the results. To have a more universal solution and to<br />
keep the system simple, the multi scan line approach is used for all tube types without<br />
distinguishing between them at this part of the system.<br />
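The profile construction from several scan lines can be sketched as follows; the function name is hypothetical, and `n_scan = 11` is the value found sufficient in the experiments above.

```python
import numpy as np

def multi_scanline_profile(roi, n_scan=11):
    """Normalized sum of `n_scan` horizontal scan lines equally
    distributed over the global ROI in y-direction.

    `roi` is the gray-level image restricted to the global ROI; the
    result has one value per column, in the gray-value range of a
    single scan line.
    """
    h, w = roi.shape
    # Row indices of the scan lines, equally spaced over the ROI height.
    rows = np.linspace(0, h - 1, n_scan).astype(int)
    # Sum the selected rows and normalize by the number of scan lines.
    return roi[rows, :].astype(float).sum(axis=0) / n_scan
```

With respect to the normalization remark above, rows known to hit the guide bars could simply be excluded from `rows` before summing.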
A.4. Notes on Convolution<br />
At several steps in the profile analysis a convolution operation is performed. With respect<br />
to the derivation of the profile by convolving with a first derivative Gaussian kernel in step<br />
two, it is important to note what boundary condition is used, since in discrete convolution<br />
there are positions at the image boundaries that are undefined. There are many different<br />
strategies to address this problem, including padding the image with constant values (e.g.<br />
zero), reflecting the image boundaries periodically or simply ignoring the boundaries [24].<br />
Here, a symmetric reflection strategy is used:<br />
P(−i) = P(i − 1), (A.1)<br />
P(NP + i) = P(NP + 1 − i), (A.2)<br />
where the first equation is used for the left and the second equation for the right boundary<br />
respectively. NP indicates the length of P and P (x) the intensity value in the profile at<br />
position x. The advantage of this strategy compared to a padding with zeros for example<br />
is that no artificial edges are introduced.
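The symmetric reflection of Eqs. (A.1) and (A.2) corresponds to mirroring the profile at both ends, which NumPy's `'symmetric'` padding mode realizes. A minimal sketch, assuming an odd-length kernel (as for the first-derivative Gaussian used in step two):

```python
import numpy as np

def convolve_reflect(profile, kernel):
    """Convolve a 1-D profile with an odd-length kernel using the
    symmetric reflection boundary condition of Eqs. (A.1)/(A.2).

    The profile is mirrored at both ends before the convolution, so no
    artificial edges are introduced at the boundaries and the result
    has the same length as the input.
    """
    profile = np.asarray(profile, dtype=float)
    pad = len(kernel) // 2
    # 'symmetric' padding mirrors including the border sample itself.
    padded = np.pad(profile, pad, mode="symmetric")
    # 'valid' convolution on the padded signal restores the original length.
    return np.convolve(padded, kernel, mode="valid")
```

As a sanity check, a derivative kernel applied to a constant profile yields zero everywhere, confirming that the padding creates no spurious boundary edges.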
B. Hardware Components<br />
B.1. Camera<br />
Specification | MF-033C | MF-046B<br />
Image Device | 1/2” (diag. 8 mm) type progressive scan SONY IT CCD | 1/2” (diag. 8 mm) type progressive scan SONY IT CCD<br />
Effective Picture Elements | 656 (H) × 492 (V) | 780 (H) × 580 (V)<br />
Lens Mount | C-mount: 17.526 mm (in air); ∅ 25.4 mm (32 T.P.I.); mechanical flange back to filter distance: 8.2 mm | same<br />
Picture Sizes | 640 × 480 pixels (Format 0); 656 × 492 pixels (Format 7; Mode 0) | 640 × 480 pixels (Format 0; Mode 5); 780 × 580 pixels (Format 7; Mode 0); 388 × 580 pixels (Format 7; Mode 1); 780 × 288 pixels (Format 7; Mode 2); 388 × 288 pixels (Format 7; Mode 3)<br />
Cell Size | 9.9 µm × 9.9 µm | 8.3 µm × 8.3 µm<br />
ADC | 10 Bit | 10 Bit<br />
Color Modes | Raw 8, YUV 4:2:2, YUV 4:1:1 | –<br />
Data Path | 8 Bit | 8 Bit<br />
Frame Rates | 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 74 Hz in Format 7 (RAW); 68 Hz (YUV 4:1:1); up to 51 Hz in YUV 4:2:2 | 3.75 Hz; 7.5 Hz; 15 Hz; 30 Hz; up to 53 Hz in Format 7<br />
Gain Control | Manual: 0-16 dB (0.035 dB/step); auto gain (select. AOI) | Manual: 0-24 dB (0.035 dB/step); auto gain (select. AOI)<br />
White Balance | Manual (U/V); One Push; Auto (select. AOI) | –<br />
Shutter Speed | 20 … 67,108,864 µs (∼ 67 s); auto shutter (select. AOI) | same<br />
External Trigger Shutter | Trigger Mode 0, Trigger Mode 1; advanced feature: Trigger Mode 15 (bulk); image transfer by command; trigger delay | same<br />
Internal FIFO Memory | up to 17 frames | up to 13 frames<br />
Look-Up Tables | one, user programmable (10 Bit → 8 Bit); Gamma (0.45) | same<br />
Smart Functions | real-time shading correction, image sequencing, two configurable inputs, two configurable outputs, image mirror (L-R ↔ R-L), serial port (IIDC v. 1.31) | same, plus binning<br />
Transfer Rate | 100 Mb/s, 200 Mb/s, 400 Mb/s | same<br />
Digital Interface | IEEE 1394 IIDC v. 1.3 | same<br />
Power Requirements | DC 8 V - 36 V via IEEE 1394 cable or 12-pin HIROSE | same<br />
Power Consumption | less than 3 W (@ 12 V DC) | same<br />
Dimensions | 58 mm × 44 mm × 29 mm (L × W × H), without tripod and lens | same<br />
Mass | < 120 g (without lens) | same<br />
Operating Temperature | +5 to +45 °C | same<br />
Storage Temperature | −10 to +60 °C | same<br />
Regulations | EN 55022, EN 61000, EN 55024, FCC Class A, DIN ISO 9022 | same<br />
Options | host adapter card; locking IEEE 1394 cable; API (FirePackage); TWAIN (WIA)- and WDM stream driver | additionally: removable IR-cut filter<br />
Table B.1: Camera specifications for the AVT Marlin F-033C and F-046B.<br />
B.2. Illumination Hardware<br />
Description Value<br />
Rated Power Output 200 Watts<br />
Output Voltage 0.0, 0.5 to 20.5 VDC<br />
Input Voltage Rating, 50/60 Hz 90 to 265 VAC<br />
Power Factor Correction @ 230 VAC, 50 Hz > 0.99, < 4°<br />
Hold-up Time, Nominal AC Input, Full Load 8.3 ms<br />
Line Regulation, Over Entire Input Range ±0.5%<br />
Current Limit Set Point 8.5 Amps<br />
Temperature Range: Operating 0 °C to 45 °C<br />
Temperature Range: Storage −25 °C to 85 °C<br />
Relative Humidity, Non-condensing 5% to 95%<br />
Table B.2: Light Source (A20800.2) with DDL Lamp<br />
Description Value<br />
Calibrated Area 3” × 5” (76 × 127mm)<br />
Panel Size 4” × 6” (102 × 152mm)<br />
Overall Thickness .05” (1.3mm)<br />
Table B.3: SCHOTT PANELite Backlight (A23000) (flexible fiber optical area light).
Description Value<br />
Bulb Type DDL<br />
Voltage 20<br />
Wattage 150<br />
Lamp Base GX5.3<br />
Bulb Finish Clear<br />
Burn Position Base/Down Horz.<br />
Shape MR-16<br />
Color Temp. 3150<br />
Filament CC-6<br />
Lamp Fill Halogen<br />
Lamp Life 500 Hrs.<br />
Overall Length [mm] 44.5<br />
Reflector Design Dichroic<br />
Reflector Size [mm] 50.7<br />
Working Distance [mm] 194.5<br />
Table B.4: Lamp specifications
Bibliography<br />
[1] Y.I. Abdel-Aziz and H.M. Karara. Direct linear transformation from comparator<br />
coordinates into object space coordinates in close-range photogrammetry. Proc. of<br />
the Symposium on Close-Range Photogrammetry, pages 1–18, 1971.<br />
[2] M. B. Ahmad and T. S. Choi. Local threshold and boolean function based edge<br />
detection. IEEE Trans. on Consumer Electronics, 45(3):674–679, August 1999.<br />
[3] Allied Vision Technologies GmbH, Taschenweg 2a, D-07646 Stadtroda, Germany.<br />
AVT Marlin - Technical Manual, 7 2004.<br />
[4] A. Alper. An inside look at machine vision. Managing Automation, 2005.<br />
[5] American Society for Photogrammetry and Remote Sensing (ASPRS). Manual of<br />
Photogrammetry. Asprs Pubns, 4th edition, 1980.<br />
[6] K. Astrom and A. Heyden. Stochastic modelling and analysis of sub-pixel edge detection.<br />
In International Conference on Pattern Recognition (ICPR), pages 86–90,<br />
1996.<br />
[7] B. Batchelor and F. Waltz. Intelligent Machine Vision. Springer, 2001.<br />
[8] A. Blake. Active Contours. Springer, 1999.<br />
[9] J. Y. Bouguet. Camera calibration toolbox for matlab.<br />
[10] I. N. Bronstein, G. Musiol, H. Mühlig, and K. A. Semendjajew. Taschenbuch der<br />
Mathematik. Harri Deutsch, 2001.<br />
[11] D. C. Brown. Decentering distortion of lenses. Photometric Engineering, 32(3):444–<br />
462, 1966.<br />
[12] D. C. Brown. Lens distortion for close-range photogrammetry. Photometric Engineering,<br />
37(8):855–866, 1971.<br />
[13] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern<br />
Analysis and Machine Intelligence (PAMI), 8:679–698, 1986.<br />
[14] T. Chaira and A. K. Ray. Threshold selection using fuzzy set theory. Pattern Recognition<br />
Letters (PRL), 25(8):865–874, June 2004.<br />
[15] R. W. Conners, D. E. Kline, P. A. Araman, and T.H. Drayer. Machine vision technology<br />
for the forest products industry. Computer, 30(7):43–48, 1997.<br />
[16] E. R. Davies. Machine Vision- Theory, Algorithms, Practicalities. Elsevier, 2005.<br />
[17] C. de Boor. A practical guide to splines. Springer, 1978.<br />
[18] C. Demant, B. Streicher-Abel, and P. Waszkewitz. Industrial Image Processing -<br />
Visual Quality Control in Manufacturing. Springer, 1999.<br />
[19] R. Deriche. Using canny’s criteria to derive a recursively implemented optimal edge<br />
detector. International Journal of Computer Vision (IJCV), 1(2):167–187, 1987.<br />
[20] S. di Zenzo, L. Cinque, and S. Levialdi. Image thresholding using fuzzy entropies.<br />
IEEE Transactions on Systems, Man, and Cybernetics (SMC-B), 28(1):15–23, February<br />
1998.<br />
[21] O. Faugeras. Three-Dimensional Computer Vision. A Geometric Viewpoint. MIT<br />
Press, Cambridge, 1993.<br />
[22] J. Föglein. On edge gradient approximations. Pattern Recognition Letters (PRL),<br />
1:429–434, 1983.<br />
[23] P. J. Flynn and A. K. Jain. Cad-based computer vision: From cad models to relational<br />
graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI),<br />
13(2):114–132, 1991.<br />
[24] D. A. Forsyth and J. Ponce. Computer Vision - A modern approach. Pearson Education<br />
International, 2003.<br />
[25] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE<br />
Transactions on Pattern Analysis and Machine Intelligence (PAMI), 13(9):891–906,<br />
1991.<br />
[26] C. A Glasbey. An analysis of histogram-based thresholding algorithm. Graphical<br />
Models and Image Processing, 55(6):532–537, November 1993.<br />
[27] E. B. Goldstein. Sensation and Perception. California: Brooks/Cole Publishing Co.,<br />
1996.<br />
[28] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 2nd<br />
edition, 2002.<br />
[29] E. R. Hancock and J. V. Kittler. Adaptive estimation of hysteresis thresholds. In<br />
Proc. of the IEEE Computer Vision and Pattern Recognition (CVPR), pages 196–201,<br />
1991.<br />
[30] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge<br />
University Press, 2nd edition, 2003.<br />
[31] J. Heikkila and O. Silven. A four-step camera calibration procedure with implicit<br />
image correction. In Proc. of the IEEE Computer Vision and Pattern Recognition<br />
(CVPR), pages 1106–1112, 1997.<br />
[32] R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics. Prentice Hall,<br />
5th edition, 1994.<br />
[33] D. H. Hubel. Exploration of the primary visual cortex, 1955-1978. Nature, 299:515–<br />
524, 1982.
[34] R. J. Hunsicker, J. Patten, A. Ledford, C Ferman, et al. Automatic vision inspection<br />
and measurement system for external screw threads. Journal of Manufacturing<br />
Systems, 1994.<br />
[35] R. W. Hunt. Measuring Colour. Ellis Horwood Ltd. Publishers, 2nd edition, 1991.<br />
[36] B. Jähne. Digital Image Processing. Springer, 6th edition, 2005.<br />
[37] B. Julesz. A method of coding TV signals based on edge detection. Bell System Tech.,<br />
38(4):1001–1020, July 1959.<br />
[38] R. King. Brunelleschi’s Dome: How a Renaissance Genius Reinvented Architecture.<br />
Penguin Books, 2001.<br />
[39] R. K. Lenz and R. Y. Tsai. Calibrating a cartesian robot with eye-on-hand configuration<br />
independent of eye-to-hand relationship. IEEE Transactions on Pattern Analysis<br />
and Machine Intelligence (PAMI), 11(9):916–928, September 1989.<br />
[40] J. Linkemann. Optics recommendation guide. http://www.baslerweb.com/.<br />
[41] E. P. Lyvers, O. R. Mitchell, M. L. Akey, and A. P. Reeves. Subpixel measurements<br />
using a moment-based edge operator. IEEE Transactions on Pattern Analysis and<br />
Machine Intelligence (PAMI), 11(12):1293–1309, December 1989.<br />
[42] E. N. Malamas, E. G. M. Petrakis, M. E. Zervakis, L. Petit, and J. D. Legat. A<br />
survey on industrial vision systems, applications and tools. Image and Vision Computing<br />
(IVC), 21(2):171–188, February 2003.<br />
[43] M. Malassiotis and G. Strintzis. Stereo vision system for precision dimensional inspection<br />
of 3d holes. Machine Vision and Applications, 15(2):101–113, December<br />
2003.<br />
[44] J. Malik and P. Perona. Preattentive texture discrimination with early vision mechanism.<br />
Journal of the Optical Society of America, 7(5):923–932, May 1990.<br />
[45] D. Marr and E. C. Hildreth. Theory of edge detection. Proc. Royal Soc. London,<br />
B207:187–217, 1980.<br />
[46] N. Otsu. A threshold selection method from grey-level histograms. IEEE Transactions<br />
on Systems, Man, and Cybernetics (SMC), 9(1):62–66, January 1979.<br />
[47] N. R. Pal and S. K. Pal. A review on image segmentation techniques. Pattern<br />
Recognition, 26(9):1277–1294, September 1993.<br />
[48] J.R. Parker. Algorithms for image processing and computer vision. John Wiley &<br />
Sons, Inc., 1997.<br />
[49] P. Perona. Deformable kernels for early vision. IEEE Transactions on Pattern Analysis<br />
and Machine Intelligence, 17(5):488–499, May 1995.<br />
[50] D. T. Pham and R. J. Alcock. Automated visual inspection of wood boards: Selection<br />
of features for defect classification by a neural network. In Proc. of the IMechE,<br />
Part E: Journal of Process Mechanical Engineering, volume 213, pages 231–245.<br />
Professional Engineering Publishing, 1999.
[51] K.K. Pingle. Visual perception by a computer. In Proc. of Analogical and Inductive<br />
Inference (AII), pages 277–284, 1969.<br />
[52] W. J. Plut and G. M. Bone. Grasping of 3-d sheet metal parts for robotic fixtureless<br />
assembly. In Proc. of the CSME Forum - Engineering Applications of Mechanics,<br />
pages 221–228, Hamilton, Ont., 1996.<br />
[53] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery.<br />
Numerical Recipes in C: The Art of Scientific Computing. Cambridge University<br />
Press, Cambridge, UK, 2nd edition, 1993.<br />
[54] T. Pun. Entropic thresholding: A new approach. Computer Graphics and Image<br />
Processing (CGIP), 16(3):210–239, July 1981.<br />
[55] T. W. Ridler and S. Calvard. Picture thresholding using an iterative selection method.<br />
IEEE Transactions on Systems, Man, and Cybernetics (SMC), 8(8):629–632, August<br />
1978.<br />
[56] P. Rockett. The accuracy of sub-pixel localisation in the canny edge detector. In<br />
Proc. of the British Machine Vision Conference (BMVC), 1999.<br />
[57] A. Rosenfeld and P. de la Torre. Histogram concavity analysis as an aid in threshold<br />
selection. IEEE Transactions on Systems, Man, and Cybernetics (SMC), 13(3):231–<br />
235, March 1983.<br />
[58] S. Rusinkiewicz, O. Hall-Holt, and M. Levoy. Real-time 3d model acquisition. ACM<br />
Transactions on Graphics, 21(3):438–446, July 2002.<br />
[59] P.K. Sahoo, S. Soltani, A. K. C. Wong, and Y.C. Chen. A survey of thresholding<br />
techniques. Computer Vision, Graphics, and Image Processing (CVGIP), 41(2):233–<br />
260, February 1988.<br />
[60] B. Sankur and M. Sezgin. A survey over image thresholding techniques and quantitative<br />
performance evaluation. Journal of Electronic Imaging, 13(1):146–165, 2004.<br />
[61] J. L. Sanz and D. Petkovic. Machine vision algorithms for automated inspection of<br />
thin-film disk heads. IEEE Transactions on Pattern Analysis and Machine Intelligence<br />
(PAMI), 10(6), 1988.<br />
[62] M. Seul, L. O’Gorman, and M. J. Sammon. Practical Algorithms For Image Analysis.<br />
Cambridge University Press, 2000.<br />
[63] M.I. Sezan. A peak detection algorithm and its application to histogram-based image<br />
data reduction. Computer Vision, Graphics, and Image Processing (CVGIP),<br />
49(1):36–51, January 1990.<br />
[64] S. W. Smith. The Scientist and Engineer’s Guide to Digital Signal Processing. California<br />
Technical Publishing, 1997.<br />
[65] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice<br />
Hall PTR, 1998.
[66] F. Truchetet, F. Nicolier, and O. Laligant. Subpixel edge detection for dimensional<br />
control by artificial vision. Journal of Electronic Imaging, 10(1):234–239, January<br />
2001.<br />
[67] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine<br />
vision metrology using off-the-shelf tv cameras and lenses. Robotics and Automation,<br />
IEEE Journal, 3(4):323–344, 1987.<br />
[68] H. Voorhees and T. Poggio. Detecting textons and texture boundaries in natural<br />
images. In Proc. of the International Conference on Computer Vision (ICCV), pages<br />
250–258, 1987.<br />
[69] J. Weickert. Anisotropic Diffusion in Image Processing. ECMI. Teubner, Stuttgart,<br />
1998.<br />
[70] J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and<br />
accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence<br />
(PAMI), 14(10):965–980, October 1992.<br />
[71] G. A. W. West and T. A Clarke. A survey and examination of subpixel measurement<br />
techniques. ISPRS Int. Conf. on Close Range Photogrammetry and Machine Vision,<br />
1395:456 – 463, 1990.<br />
[72] P. C. West. High speed, real-time machine vision. Technical report, Imagenation and<br />
Automated Vision Systems, 2001.<br />
[73] M. Young. The pinhole camera, imaging without lenses or mirrors. The Physics<br />
Teacher, pages 648–655, December 1989.<br />
[74] Z. Y. Zhang. A flexible new technique for camera calibration. IEEE Transactions<br />
on Pattern Analysis and Machine Intelligence (PAMI), 22(11):1330–1334, November<br />
2000.