
2.2 Localization from point features

recovered. Typically, a pose is estimated from 10 to 40 3D-2D correspondences. The visual pose is then combined with measurements from wheel odometry within a probabilistic SLAM framework [78]; during frames without detected visual landmarks, navigation continues based on wheel odometry alone.
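To make this step concrete, the following is a minimal sketch of pose estimation from 3D-2D correspondences using a RANSAC-based PnP solver (here OpenCV's solvePnPRansac). The intrinsics and the placeholder data are illustrative assumptions, not details from [78].

import numpy as np
import cv2

# 3D map points (N x 3) and their 2D detections in the current image (N x 2);
# typically N is between 10 and 40. Placeholder data for illustration only.
object_points = np.random.rand(20, 3).astype(np.float32)
image_points = np.random.rand(20, 2).astype(np.float32) * 640

# Assumed pinhole intrinsics; focal length and principal point are made up here.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points,
                                             K, distCoeffs=None)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the camera pose
    # (R, tvec) would then be fused with the wheel odometry in the SLAM filter.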

Map building is also vision-based. The robot starts by driving around in the initially unknown environment, building a world map. The 3D points in the map are associated with SIFT features and with a view of the landmark cropped from the original image. Each landmark can have a set of associated SIFT features, describing the landmark for various viewpoints. A 3D landmark is reconstructed from three images, taken in sequence 20 cm apart. Interest points are detected and matched between the three images using SIFT feature matching; a structure-from-motion approach then reconstructs the 3D landmarks and computes the camera positions (i.e. the robot positions).
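As an illustration of this reconstruction step, the sketch below matches SIFT features between two of the three views and triangulates the matches. The file names, the projection matrices (assumed known from the 20 cm odometry baseline, with intrinsics omitted) and the ratio-test threshold are assumptions for the example, not details from the paper.

import numpy as np
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Ratio-test matching, as commonly used with SIFT descriptors.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.7 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good]).T  # 2 x N
pts2 = np.float32([kp2[m.trainIdx].pt for m in good]).T

# 3 x 4 projection matrices: first camera at the origin, second camera moved
# 20 cm forward along the optical axis (t = -R @ C, intrinsics omitted).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[0.0], [0.0], [-0.2]])])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous, 4 x N
landmarks = (X_h[:3] / X_h[3]).T                 # local 3D coordinates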

The landmarks are reconstructed in a local coordinate frame; by adding this local position to the current position of the robot, the landmarks are transformed into the global coordinate system, and the result is stored in the map database.
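A minimal sketch of this frame change, assuming a planar robot pose (position plus heading); note that the composition also involves the heading rotation, which the word "adding" glosses over.

import numpy as np

def local_to_global(p_local, robot_xy, robot_theta):
    """Rotate a local 3D landmark by the robot heading (about the vertical
    axis, assumed to be z here) and translate by the robot's global position."""
    c, s = np.cos(robot_theta), np.sin(robot_theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([robot_xy[0], robot_xy[1], 0.0])
    return R @ p_local + t

p_global = local_to_global(np.array([0.5, 0.1, 1.2]), (3.0, 4.0), np.pi / 2)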

Map building continues until the whole environment has been traversed and no new landmarks are found. The authors describe experiments in a two-bedroom apartment: map building lasted 32 minutes and the robot created a map containing 82 landmarks. During operation, map updates are possible; updates of the landmark positions are maintained by a Kalman filter [54].
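A minimal sketch of such a per-landmark Kalman update, assuming the landmark position is observed directly; the noise covariances are illustrative, not the values used in [54].

import numpy as np

def kf_update(x, P, z, R_meas):
    """x: 3D landmark estimate, P: 3x3 covariance,
    z: new 3D observation, R_meas: 3x3 observation noise."""
    H = np.eye(3)                      # landmark is observed directly
    S = H @ P @ H.T + R_meas           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ (z - H @ x)        # corrected estimate
    P_new = (np.eye(3) - K @ H) @ P    # corrected covariance
    return x_new, P_new

x, P = np.array([1.0, 2.0, 0.5]), np.eye(3) * 0.04   # ~20 cm std. dev.
x, P = kf_update(x, P, np.array([1.1, 1.9, 0.5]), np.eye(3) * 0.01)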

The average localization error measured in the experiments is about 20 cm to 25 cm, which is quite high. However, it should be stressed that rather simple methods are used in this approach so that the software runs in real time on low-cost computers. The approach of Karlsson et al. is especially interesting as it is available as the commercial localization software vSlam¹ for the robots sold by Evolution Robotics². vSlam achieves map building and navigation with a single low-cost camera. The most limiting factor of the approach, according to the authors, is the size of the landmark database: each landmark needs about 40 kB to 500 kB of memory, which restricts the method to small indoor environments. Another critical issue is the reconstruction of the landmarks during map building. A landmark is reconstructed from three images at different positions; however, as the camera usually faces forward, the three views contain only translational forward motion, which imposes very bad conditions for 3D reconstruction. In practice, reconstruction of a plane (e.g. a wall) shows depth estimation errors of about 10 cm. Such uncertainties are, however, handled within the SLAM framework.

A different approach has been presented by Se, Lowe and Little [96], who actually propose three different localization methods. The robot movement is, however, assumed to be restricted to a plane, so the pose estimate contains only 3 DOF, while the map itself contains full 3D coordinates of the landmarks. All three methods basically work by computing the pose from 3D-3D landmark matches: the robot is equipped with a trinocular stereo head (Triclops³) which produces 3D coordinates for each landmark in the current view. The first localization approach is based on the Hough transform [48]. A discretized 3D Hough space representing the robot poses with three parameters (X, Z, θ) is constructed; each landmark match votes for possible poses in the Hough space, and the maximum vote then determines the parameters (X, Z, θ) of the robot pose.
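The voting step can be sketched as follows, assuming each match pairs a landmark's 2D map position with its position in the robot frame (model: map = R(θ)·obs + (X, Z)); the bin counts and ranges are illustrative assumptions.

import numpy as np

def hough_pose(matches, x_bins=100, z_bins=100, t_bins=72,
               x_range=(-5.0, 5.0), z_range=(-5.0, 5.0)):
    """matches: list of (map_xz, obs_xz) pairs, both 2D points in metres."""
    acc = np.zeros((x_bins, z_bins, t_bins))
    thetas = np.linspace(0, 2 * np.pi, t_bins, endpoint=False)
    for m_xz, o_xz in matches:
        # For each heading bin, the (X, Z) that aligns this match gets a vote.
        for k, th in enumerate(thetas):
            c, s = np.cos(th), np.sin(th)
            x = m_xz[0] - (c * o_xz[0] - s * o_xz[1])
            z = m_xz[1] - (s * o_xz[0] + c * o_xz[1])
            i = int((x - x_range[0]) / (x_range[1] - x_range[0]) * x_bins)
            j = int((z - z_range[0]) / (z_range[1] - z_range[0]) * z_bins)
            if 0 <= i < x_bins and 0 <= j < z_bins:
                acc[i, j, k] += 1
    i, j, k = np.unravel_index(acc.argmax(), acc.shape)  # peak = pose estimate
    x = x_range[0] + (i + 0.5) * (x_range[1] - x_range[0]) / x_bins
    z = z_range[0] + (j + 0.5) * (z_range[1] - z_range[0]) / z_bins
    return x, z, thetas[k]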

The second proposed method is a RANSAC scheme [28]. From two landmark matches, the translation and rotation needed for alignment, and thus the robot pose, can be computed. This is repeated for a number of randomly chosen landmark samples within a RANSAC scheme, and for each sample the pose hypothesis is verified by checking how many of the remaining landmark matches are consistent with it.
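A minimal sketch of this two-match scheme in the plane: a minimal sample of two landmark matches fixes the rotation and translation, and the hypothesis is scored by counting consistent matches. The inlier threshold and iteration count are illustrative assumptions.

import numpy as np

def rot2d(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s], [s, c]])

def ransac_pose(map_pts, obs_pts, iters=200, thresh=0.1):
    """map_pts, obs_pts: N x 2 arrays of matched landmark positions
    (global map frame vs. current robot frame)."""
    rng = np.random.default_rng(0)
    best = (None, None, -1)
    for _ in range(iters):
        i, j = rng.choice(len(map_pts), size=2, replace=False)
        # The angle between the two difference vectors fixes the rotation;
        # one match then fixes the translation.
        dm, do = map_pts[j] - map_pts[i], obs_pts[j] - obs_pts[i]
        th = np.arctan2(dm[1], dm[0]) - np.arctan2(do[1], do[0])
        R = rot2d(th)
        t = map_pts[i] - R @ obs_pts[i]
        # Verify: how many matches agree with this (R, t) hypothesis?
        err = np.linalg.norm(map_pts - (obs_pts @ R.T + t), axis=1)
        inliers = int((err < thresh).sum())
        if inliers > best[2]:
            best = (R, t, inliers)
    return best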

¹ http://www.evolution.com/core/navigation/vslam.masn
² http://www.evolution.com
³ http://www.ptgrey.com
