research activities in 2007 - CSEM

Generic Framework for Feature Extraction in Vision

T. Zamofing, P. Seitz

A vision system that classifies objects in complex, natural scenes has been realized. This software project tries to map structures of the cerebral cortex into a hierarchical binary matched filter [1] implemented on a computer.

One of the most ambitious goals in digital image processing is the development of universal classification algorithms, capable of "understanding" natural scenes with robustness and reliability similar to those demonstrated by natural vision systems, especially by the human visual system. In particular, the functionality of such natural vision systems under very adverse conditions is a highly desirable property for practical applications in machine vision. This robustness can encompass translation invariance and rotation invariance, as well as a high tolerance to distortion (e.g. perspective), partial occlusion, reflections and shadows, and blurred images (defocus, motion). It should moreover be independent of local contrast and illumination variations, independent of the background, and independent of object texture (surface texture, dirt, etc.). All classification algorithms are faced, from the outset, with the problem that the given continuous-tone images contain a vast amount of information that must be substantially reduced in order to label the different image areas ("the objects") correctly, according to the class to which they belong.

This feature-matching process is realized as a binary matched filter following three assumptions (see Figure 1). (1) Local orientation is a central source of relevant image information. This assertion is corroborated by neurobiologists' findings on how natural vision systems work, using directionally selective filter banks. The process employs local orientations as the fundamental picture primitives, rather than the more usual edge locations. (2) The procedures are based on retaining and exploiting the local arrangement of features of different complexity in an image. The technique is based on the accumulation of evidence in binary channels, followed by a weighted, non-linear sum of the evidence accumulators. (3) The algorithm proceeds in a hierarchical fashion, starting at low feature complexity and raising the level of abstraction at each successive processing step.

Figure 1: The feature-matching process is based on a successive hierarchical approach

This algorithm can be implemented very easily in a computer program. The essence of the algorithm (i.e. without graphics and file handling) can be written in 60–70 lines of a high-level language (e.g. Pascal, Fortran or C). Because of the homogeneity and the simple, reusable characteristics of the feature matcher, it should be possible, with reasonable effort, to develop a hardware implementation running at low power in real time.

The current implementation uses a black-and-white FireWire camera with a resolution of 640×480 pixels and runs on a PC, using SSE (single instruction, multiple data) instructions for speedup. With this setup, a frame rate of 5 to 25 frames per second (depending on the complexity of the templates) could be achieved. Figure 2 shows an example with its templates for detecting traffic signs; it runs at about 20 fps.

Figure 2: Application example: Identification of traffic signs<br />
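The SIMD speedup exploits the fact that the channels are binary. The same data-parallel principle can be sketched portably (an illustration of the idea only, not the authors' SSE code) by packing 64 binary feature responses into one machine word, so that a single AND plus a population count accumulates evidence for 64 pixels at once:

```c
#include <stdint.h>

/* Count the set bits in a 64-bit word (Kernighan's method:
 * each iteration clears the lowest set bit). */
static int popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Data-parallel evidence accumulation over bit-packed binary channels:
 * one AND + popcount scores 64 pixel positions per iteration, the same
 * principle an SSE implementation applies to 128-bit registers. */
static int packed_evidence(const uint64_t *chan, const uint64_t *tmpl,
                           int nwords)
{
    int score = 0;
    for (int i = 0; i < nwords; i++)
        score += popcount64(chan[i] & tmpl[i]);
    return score;
}
```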

Future work could use the ViSe sensor (www.csem-devise.com), whose advantage is that it directly delivers orientation images with a high dynamic range. Furthermore, an FPGA implementation would lead to a smart, low-power, portable vision system.

[1] G. Lang, P. Seitz, "Robust classification of arbitrary object classes based on hierarchical spatial feature-matching", MVA, 1997.
