Gesture-Based Interaction with Time-of-Flight Cameras

The accuracy of the tracking is around 5–6 cm root mean square (RMS) for the head and shoulders and around 2 cm RMS for the head. The implementation of the procedure is straightforward and real-time capable.

Features

The discussion of TOF image features in Chapter 7 is divided into four individual parts. The first part, in Section 7.1, will discuss the so-called generalized eccentricities, a kind of feature that can be used to distinguish between different surface types, i.e. one can for example distinguish between planar surface regions, edges, and corners in 3D. These features were employed for detecting the nose in frontal face images, and we obtained an equal error rate of 3.0%.
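To give a concrete flavor of the planar/edge/corner distinction mentioned above, the following sketch uses a standard alternative to the generalized eccentricities, namely classifying a local 3D neighborhood by the eigenvalue spectrum of its point covariance (so-called dimensionality features). It is a minimal illustration only; the function name, thresholds, and the covariance-based criterion are not taken from the thesis.

import numpy as np

def surface_type(points):
    """Classify a local neighborhood of 3D points as plane-, edge-, or
    corner-like. points: (N, 3) array of surface samples."""
    c = points - points.mean(axis=0)
    # Eigenvalues of the 3x3 covariance; eigvalsh returns them ascending.
    e3, e2, e1 = np.linalg.eigvalsh(c.T @ c / len(points))
    e1 = max(e1, 1e-12)  # guard against a degenerate neighborhood
    scores = {
        "edge":   (e1 - e2) / e1,  # spread along one dominant direction
        "plane":  (e2 - e3) / e1,  # two extended directions, one flat one
        "corner": e3 / e1,         # significant spread in all directions
    }
    return max(scores, key=scores.get)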

Section 7.2 will focus on a reformulation of the generalized eccentricities such that the resulting features become invariant towards scale. This is achieved by computing the features not on the image grid but on the sampled surface of the object in 3D, which becomes possible by using the range map to invert the perspective camera projection of the TOF camera. As a result, one obtains irregularly sampled data. Here, we propose the use of the Nonequispaced Fast Fourier Transform (NFFT) to compute the features. As a result, one can observe a significantly improved robustness of the nose detection when the person is moving towards and away from the camera. An error rate of zero is achieved on the test data.
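As a minimal sketch of these two steps, assume a pinhole camera with placeholder intrinsics fx, fy, cx, cy (the values below are hypothetical, not calibration data from the thesis). A TOF camera measures the radial distance along each pixel's viewing ray, so inverting the projection amounts to scaling the unit ray direction by the measured range. The Fourier coefficients of the resulting irregularly sampled points are then what the NFFT computes; the direct summation below is the slow O(N·K) definition that the NFFT accelerates.

import numpy as np

def backproject(range_map, fx=570.0, fy=570.0, cx=88.0, cy=72.0):
    """Invert the perspective projection: map each pixel's measured radial
    distance to a 3D point on the object surface."""
    h, w = range_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    rays = np.dstack(((u - cx) / fx, (v - cy) / fy, np.ones((h, w))))
    rays /= np.linalg.norm(rays, axis=2, keepdims=True)  # unit ray directions
    return rays * range_map[..., None]                   # (h, w, 3) points

def ndft(values, positions, freqs):
    """Direct nonequispaced DFT: F[k] = sum_j values[j] exp(-2*pi*i <freqs[k], positions[j]>).
    The NFFT evaluates these sums approximately in O(K log K) time."""
    phase = np.exp(-2j * np.pi * (freqs @ positions.T))  # (K, N) matrix
    return phase @ values                                # K Fourier coefficients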

The third category of image features is computed using the sparse coding principle, i.e. we learn an image basis for the simultaneous representation of TOF range and intensity data. We show in Section 7.3 that the resulting features outperform features obtained using Principal Component Analysis in the same nose detection task that was evaluated for the geometric features. In comparison to the generalized eccentricities we achieve a slightly reduced performance. On the other hand, in this scenario the features were simply obtained under the sparse coding principle without incorporating prior knowledge of the data or properties of the object to be detected.
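As one way to realize such a joint basis, the following sketch uses scikit-learn's dictionary learner as a stand-in for the learning procedure of Section 7.3; the atom count, sparsity weight, and patch dimensions are illustrative, not the thesis's settings. Stacking the two modalities into one vector per patch couples the range and intensity parts of every learned atom.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

def learn_joint_basis(range_patches, intensity_patches, n_atoms=64, alpha=1.0):
    """Each *_patches array has shape (n_patches, patch_dim). Returns the
    learned atoms and the sparse coefficients, which serve as features."""
    X = np.hstack([range_patches, intensity_patches])
    dico = MiniBatchDictionaryLearning(n_components=n_atoms, alpha=alpha,
                                       batch_size=256, random_state=0)
    codes = dico.fit(X).transform(X)  # sparse codes, one row per patch
    return dico.components_, codes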

The fourth type of features, presented in Section 7.4, aims at the extraction of the 3D motion of objects in the scene. To this end, we rely on the computation of range flow. The goal is the recognition of human gestures. We propose to combine the computation of range flow with the previously discussed estimation of human pose, i.e. we explicitly compute the 3D motion vectors for the hands of the person performing a gesture. These motion vectors are accumulated in 3D motion histograms. We then apply a learned decision rule to assign a gesture to each frame of a video sequence. Here, we specifically focus on the problem of detecting that no gesture was performed, i.e. each frame is either assigned to one of the predefined gestures or to a null class indicating the absence of a gesture.
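The accumulation step can be sketched as follows; the binning scheme (azimuth and elevation bins weighted by speed) and the nearest-centroid rule with a rejection threshold are illustrative stand-ins for the histograms and the learned decision rule described above, and all names are hypothetical.

import numpy as np

def motion_histogram(vectors, n_az=8, n_el=4):
    """Accumulate 3D motion vectors (e.g., range-flow vectors at the hand
    positions) into a histogram over motion directions, weighted by speed."""
    speed = np.linalg.norm(vectors, axis=1)
    az = np.arctan2(vectors[:, 1], vectors[:, 0])            # in [-pi, pi]
    el = np.arcsin(np.clip(vectors[:, 2] / np.maximum(speed, 1e-9), -1, 1))
    i = ((az + np.pi) / (2 * np.pi) * n_az).astype(int) % n_az
    j = ((el + np.pi / 2) / np.pi * n_el).clip(0, n_el - 1).astype(int)
    hist = np.zeros((n_az, n_el))
    np.add.at(hist, (i, j), speed)                           # weight by speed
    return (hist / max(hist.sum(), 1e-9)).ravel()

def classify_frame(hist, centroids, labels, reject_dist=0.5):
    """Nearest class centroid with a distance threshold: frames that match
    no gesture well enough are assigned to the null class."""
    d = np.linalg.norm(centroids - hist, axis=1)
    return labels[d.argmin()] if d.min() < reject_dist else "no_gesture"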
