research activities in 2007 - CSEM
Generic Framework for Feature Extraction in Vision
T. Zamofing, P. Seitz
A vision system that classifies objects in complex, natural scenes has been realized. This software project attempts to map structures of the cerebral cortex onto a hierarchical binary matched filter [1] implemented on a computer.
One of the most ambitious goals in digital image processing is the development of universal classification algorithms capable of "understanding" natural scenes with robustness and reliability similar to those demonstrated by natural vision systems, especially the human visual system. In particular, the functionality of such natural vision systems under very adverse conditions is a highly desirable property for practical applications in machine vision. This robustness can encompass translation invariance, rotation invariance, and a high tolerance to distortion (e.g. perspective), partial occlusion, reflections and shadows, and unsharp images (defocus, motion blur). It should moreover be independent of local contrast and illumination variations, of the background, and of object texture (surface texture, dirt, etc.). All classification algorithms face, from the outset, the problem that the given continuous-tone images contain a vast amount of information that must be substantially reduced in order to label the different image areas ("the objects") correctly, according to the class to which they belong.
This feature-matching process is realized as a binary matched filter following three assumptions (see Figure 1). (1) Local orientation is a central source of relevant image information. This assertion is corroborated by neurobiologists' findings on how natural vision systems work, using directionally selective filter banks. The process employs local orientations as the fundamental picture primitives, rather than the more usual edge locations. (2) The procedures are based on retaining and exploiting the local arrangement of features of different complexity in an image. The technique is based on the accumulation of evidence in binary channels, followed by a weighted, non-linear sum of the evidence accumulators. (3) The algorithm proceeds in a hierarchical fashion, starting at low feature complexity and raising the level of abstraction at each successive processing step.
Figure 1: The feature-matching process is based on a successive hierarchical approach
This algorithm can be implemented very easily in a computer program. The essence of the algorithm (i.e. without graphics and file handling) can be written in 60–70 lines of a high-level language (e.g. Pascal, Fortran or C). Because of the homogeneity and the simple, reusable characteristics of the feature matcher, it should be possible, with reasonable effort, to develop a hardware implementation running at low power in real time.
The current implementation uses a black-and-white FireWire camera with a resolution of 640×480 pixels and runs on a PC, using SSE (Streaming SIMD Extensions) for speedup. With this setup a frame rate of 5 to 25 frames per second (depending on the complexity of the templates) could be achieved. Figure 2 shows an example, with its templates, that detects traffic signs and runs at about 20 fps.
Figure 2: Application example: Identification of traffic signs<br />
Future work could use the ViSe (www.csem-devise.com) sensor, whose advantage is that it directly delivers orientation images with a high dynamic range. Furthermore, an FPGA implementation would lead to a smart, low-power, portable vision system.
[1] G. Lang, P. Seitz, "Robust classification of arbitrary object classes based on hierarchical spatial feature-matching", MVA, 1997