Gesture-Based Interaction with Time-of-Flight Cameras
Gesture-Based Interaction with Time-of-Flight Cameras
Gesture-Based Interaction with Time-of-Flight Cameras
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
CHAPTER 1. INTRODUCTION<br />
recognition systems and touch screens in recent years. In the former case, however,<br />
it has proven difficult to devise systems that perform <strong>with</strong> sufficient reliability and ro-<br />
bustness, i.e. apart from correctly recognizing speech it is difficult to decide whether<br />
an incoming signal is intended for the system or whether it belongs to an arbitrary<br />
conversation. So far, this limited the widespread use <strong>of</strong> speech recognition systems.<br />
Touch screens, on the other hand, have experienced a broad acceptance in infor-<br />
mation terminals and hand-held devices, such as Apple’s iPhone. Especially, in the<br />
latter case one can observe a new trend in human-machine interaction – the use <strong>of</strong><br />
gestures.<br />
The advantage <strong>of</strong> touch sensitive surfaces in combination <strong>with</strong> a gesture recog-<br />
nition system is that a touch onto the surface can have more meaning than a simple<br />
pointing action to select a certain item on the screen. For example, the iPhone can<br />
recognize a swiping gesture to navigate between so-called views, which are used to<br />
present different content to the user, and two finger gestures can be used to rotate<br />
and resize images.<br />
Taking the concept <strong>of</strong> gestures one step further is to recognize and interpret hu-<br />
man gestures independently <strong>of</strong> a touch sensitive surface. This has two major advan-<br />
tages: (i) It comes closer to our natural use <strong>of</strong> gestures as a body language and (ii) it<br />
opens a wider range <strong>of</strong> applications where users do not need to be in physical contact<br />
<strong>with</strong> the input device.<br />
The success <strong>of</strong> such systems has first been shown in the gaming market by Sony’s<br />
EyeToy. In case <strong>of</strong> the EyeToy, a camera is placed on top <strong>of</strong> the video display, which<br />
monitors the user. Movement in front <strong>of</strong> the camera triggers certain actions and,<br />
thus, allows the gamer to control the system.<br />
A similar, yet more realistic, approach was taken by Nintendo to integrate ges-<br />
ture recognition into their gaming console Wii. Hand-held wireless motion sensors<br />
capture the movement <strong>of</strong> the hands and, thus, provide natural motion patterns to the<br />
system, which can be used to perform virtual actions, such as the swing <strong>of</strong> a virtual<br />
tennis racket.<br />
The next consequent step is to perform complex action and gesture recognition<br />
in an entirely contact-less fashion - for example <strong>with</strong> a camera based system. This<br />
goal seems to be <strong>with</strong>in reach since the appearance <strong>of</strong> low-cost 3D sensors, such as<br />
the time-<strong>of</strong>-flight (TOF) camera. A TOF camera is a new type <strong>of</strong> image sensor which<br />
provides both an intensity image and a range map, i.e. it can measure the distance to<br />
the object in front <strong>of</strong> the camera at each pixel <strong>of</strong> the image at a high temporal resolu-<br />
tion <strong>of</strong> 30 or more frames per second. TOF cameras operate by actively illuminating<br />
4