Gesture-Based Interaction with Time-of-Flight Cameras
accuracy of the tracking is around 5–6 cm root mean square (RMS) for the head and
shoulders and around 2 cm RMS for the head. The implementation of the procedure
is straightforward and real-time capable.

Features
The discussion of TOF image features in Chapter 7 is divided into four individual
parts. The first part, in Section 7.1, will discuss the so-called generalized eccentricities,
a type of feature that can be used to distinguish between different surface types, e.g.
between planar surface regions, edges, and corners in 3D. These features were
employed for detecting the nose in frontal face images, and we obtained an equal
error rate of 3.0%. Section 7.2 will focus on a reformulation of the generalized
eccentricities such that the resulting features become invariant to scale. This is
achieved by computing the features not on the image grid but on the sampled surface
of the object in 3D, which becomes possible by using the range map to invert the
perspective camera projection of the TOF camera. One consequently obtains
irregularly sampled data, for which we propose to compute the features with the
Nonequispaced Fast Fourier Transform. As a result, the nose detection becomes
significantly more robust when the person moves towards and away from the camera;
an error rate of zero is achieved on the test data.
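The inversion of the camera projection mentioned above can be sketched as follows: each pixel of the range map is turned into a 3D point by scaling its viewing ray by the measured radial distance. The intrinsic parameters (`fx`, `fy`, `cx`, `cy`) below are hypothetical calibration values, not values from the thesis.

```python
import numpy as np

def backproject_range_map(r, fx, fy, cx, cy):
    """Invert the pinhole projection of a TOF camera.

    r        : (H, W) array of radial distances from the optical center
    fx, fy   : focal lengths in pixels (hypothetical calibration values)
    cx, cy   : principal point in pixels

    Returns an (H, W, 3) array of 3D points. Note that the resulting
    surface samples are irregularly spaced in 3D even though the
    pixel grid is regular.
    """
    h, w = r.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Unit viewing ray through each pixel.
    d = np.stack(((u - cx) / fx, (v - cy) / fy, np.ones_like(r)), axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    # A TOF camera measures radial distance, so scale the unit ray by r.
    return d * r[..., None]

# Example: all pixels of a 4x4 toy sensor report a radial distance of 2 m.
pts = backproject_range_map(np.full((4, 4), 2.0), fx=2.0, fy=2.0, cx=1.5, cy=1.5)
```

The irregular sampling of the returned points is what motivates the use of the Nonequispaced Fast Fourier Transform in Section 7.2.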
The third category of image features is computed using the sparse coding principle,
i.e. we learn an image basis for the simultaneous representation of TOF range and
intensity data. We show in Section 7.3 that the resulting features outperform features
obtained using Principal Component Analysis in the same nose detection task that
was evaluated for the geometric features. Compared to the generalized eccentricities
we achieve a slightly reduced performance; on the other hand, in this scenario the
features were obtained purely under the sparse coding principle, without
incorporating prior knowledge of the data or properties of the object to be detected.
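The sparse coding idea can be illustrated with a minimal sketch. This is not the learning procedure from Section 7.3; it merely shows the encoding step — iterative soft thresholding (ISTA) — for a single vector in which a range patch and an intensity patch would be stacked together so that both modalities share one sparse code. The dictionary here is random, purely for illustration.

```python
import numpy as np

def sparse_code_ista(x, D, lam=0.1, n_iter=200):
    """Sparse-code x w.r.t. dictionary D via iterative soft thresholding.

    x : (m,) signal, e.g. a range patch and an intensity patch stacked
        into one vector so that both modalities share a single code.
    D : (m, k) dictionary with unit-norm columns.
    Approximately minimizes 0.5*||x - D a||^2 + lam*||a||_1.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ a - x)              # gradient of the quadratic term
        a = a - g / L                      # gradient step
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
x = 1.5 * D[:, 3] - 0.8 * D[:, 10]         # a signal built from two atoms
a = sparse_code_ista(x, D)                 # a sparse code concentrated on few atoms
```

In the thesis the dictionary itself is learned from TOF data; the sketch above only demonstrates why such a code is sparse: the L1 penalty drives most coefficients exactly to zero.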
The fourth type of features, presented in Section 7.4, aims at extracting the 3D
motion of objects in the scene. To this end, we rely on the computation of range
flow; the goal is the recognition of human gestures. We propose to combine the
computation of range flow with the previously discussed estimation of human pose,
i.e. we explicitly compute the 3D motion vectors for the hands of the person
performing a gesture. These motion vectors are accumulated in 3D motion
histograms, and a learned decision rule then assigns a gesture to each frame of a
video sequence. Here, we specifically focus on the problem of detecting that no
gesture was performed, i.e. each frame is either assigned to one of the predefined
gestures or to a rejection class indicating that no gesture was performed.
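The histogram-and-decision-rule pipeline can be sketched as follows. This is a simplified stand-in, not the thesis's classifier: motion vectors for the hands are binned by direction and weighted by magnitude, and a nearest-centroid rule with a distance threshold stands in for the learned decision rule, with the threshold providing the rejection ("no gesture") class. All bin counts and thresholds are illustrative.

```python
import numpy as np

def motion_histogram(vectors, n_bins=4):
    """Accumulate 3D hand-motion vectors into a direction histogram.

    vectors : (N, 3) array of per-frame 3D motion vectors (e.g. range
              flow evaluated at the tracked hand positions).
    Each component of the unit direction is quantized into n_bins bins,
    weighted by motion magnitude; the histogram is L1-normalized.
    """
    mag = np.linalg.norm(vectors, axis=1)
    keep = mag > 1e-9                      # ignore near-zero motion
    dirs = vectors[keep] / mag[keep, None]
    hist, _ = np.histogramdd(dirs, bins=n_bins,
                             range=[(-1, 1)] * 3, weights=mag[keep])
    hist = hist.ravel()
    s = hist.sum()
    return hist / s if s > 0 else hist

def classify(hist, centroids, reject_thresh=0.5):
    """Nearest-centroid rule with a rejection class for 'no gesture'.

    centroids : dict mapping gesture name -> reference histogram.
    Returns the best-matching gesture, or None when even the best
    match is farther than reject_thresh (i.e. no gesture detected).
    """
    name, dist = min(((g, np.linalg.norm(hist - c))
                      for g, c in centroids.items()), key=lambda t: t[1])
    return name if dist <= reject_thresh else None

# Toy example: two reference gestures and one motionless frame.
up = motion_histogram(np.tile([0.0, 1.0, 0.0], (10, 1)))
right = motion_histogram(np.tile([1.0, 0.0, 0.0], (10, 1)))
centroids = {"up": up, "right": right}
```

A frame with no hand motion produces an all-zero histogram, which lies far from every gesture centroid and is therefore rejected — the behaviour the thesis targets with its explicit no-gesture detection.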