
2.2 Localization from point features

recovered. Typically, a pose is estimated from 10 to 40 3D-2D correspondences. The visual pose is then combined with measurements from wheel odometry within a probabilistic SLAM framework [78]; during frames without detected visual landmarks, navigation continues based on wheel odometry alone.
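To make this step concrete, the following is a minimal sketch of pose estimation from 3D-2D correspondences using a RANSAC-based PnP solver (here OpenCV's solvePnPRansac). The intrinsics and the placeholder data are illustrative assumptions, not details from [78].

import numpy as np
import cv2

# 3D map points (N x 3) and their 2D detections in the current image (N x 2);
# typically N is between 10 and 40. Placeholder data for illustration only.
object_points = np.random.rand(20, 3).astype(np.float32)
image_points = np.random.rand(20, 2).astype(np.float32) * 640

# Assumed pinhole intrinsics; focal length and principal point are made up here.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points,
                                             K, distCoeffs=None)
if ok:
    R, _ = cv2.Rodrigues(rvec)  # rotation matrix of the camera pose
    # (R, tvec) would then be fused with the wheel odometry in the SLAM filter.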

Map building is also vision-based. The robot starts by driving around in the initially unknown environment, building a world map. The 3D points in the map are associated with SIFT features and with a view of the landmark cropped from the original image. Each landmark can have a set of associated SIFT features, describing the landmark for various viewpoints. A 3D landmark is reconstructed from three images, taken in sequence 20 cm apart. Interest points are detected and matched between the three images using SIFT feature matching; a structure-from-motion approach then reconstructs the 3D landmarks and computes the camera positions (i.e. the robot positions).
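As an illustration of this reconstruction step, the sketch below matches SIFT features between two of the three views and triangulates the matches. The file names, the projection matrices (assumed known from the 20 cm odometry baseline, with intrinsics omitted) and the ratio-test threshold are assumptions for the example, not details from the paper.

import numpy as np
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)  # hypothetical files
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Ratio-test matching, as commonly used with SIFT descriptors.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.7 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good]).T  # 2 x N
pts2 = np.float32([kp2[m.trainIdx].pt for m in good]).T

# 3 x 4 projection matrices: first camera at the origin, second camera moved
# 20 cm forward along the optical axis (t = -R @ C, intrinsics omitted).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[0.0], [0.0], [-0.2]])])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # homogeneous, 4 x N
landmarks = (X_h[:3] / X_h[3]).T                 # local 3D coordinates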

The landmarks are reconstructed in a local coordinate frame; by adding this local position to the current position of the robot, the landmarks are transformed into the global coordinate system, and the result is stored in the map database.
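A minimal sketch of this frame change, assuming a planar robot pose (position plus heading); note that the composition also involves the heading rotation, which the word "adding" glosses over.

import numpy as np

def local_to_global(p_local, robot_xy, robot_theta):
    """Rotate a local 3D landmark by the robot heading (about the vertical
    axis, assumed to be z here) and translate by the robot's global position."""
    c, s = np.cos(robot_theta), np.sin(robot_theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([robot_xy[0], robot_xy[1], 0.0])
    return R @ p_local + t

p_global = local_to_global(np.array([0.5, 0.1, 1.2]), (3.0, 4.0), np.pi / 2)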

Map building continues until the whole environment has been traversed and no new landmarks are found. The authors describe experiments in a two-bedroom apartment: map building lasted 32 minutes and the robot created a map containing 82 landmarks. During operation, map updates are possible; updates of the landmark positions are maintained by a Kalman filter [54].
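A minimal sketch of such a per-landmark Kalman update, assuming the landmark position is observed directly; the noise covariances are illustrative, not the values used in [54].

import numpy as np

def kf_update(x, P, z, R_meas):
    """x: 3D landmark estimate, P: 3x3 covariance,
    z: new 3D observation, R_meas: 3x3 observation noise."""
    H = np.eye(3)                      # landmark is observed directly
    S = H @ P @ H.T + R_meas           # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ (z - H @ x)        # corrected estimate
    P_new = (np.eye(3) - K @ H) @ P    # corrected covariance
    return x_new, P_new

x, P = np.array([1.0, 2.0, 0.5]), np.eye(3) * 0.04   # ~20 cm std. dev.
x, P = kf_update(x, P, np.array([1.1, 1.9, 0.5]), np.eye(3) * 0.01)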

The average localization error measured in the experiments is about 20 cm to 25 cm, which is quite high. However, it should be stressed that rather simple methods are used in this approach so that the software runs in real time on low-cost computers. The approach of Karlsson et al. is especially interesting as it is available as the commercial localization software vSlam¹ for the robots sold by Evolution Robotics². vSlam achieves map building and navigation with a single low-cost camera. The most limiting factor of the approach, according to the authors, is the size of the landmark database: each landmark needs about 40 kB to 500 kB of memory, which restricts the method to small indoor environments. Another critical issue is the reconstruction of the landmarks during map building. A landmark is reconstructed from three images at different positions; however, as the camera usually faces forward, the three views contain only translational forward motion, which imposes very bad conditions for 3D reconstruction. In practice, reconstruction of a plane (e.g. a wall) shows depth estimation errors of about 10 cm. Such uncertainties are, however, handled within the SLAM framework.

A different approach has been presented by Se, Lowe and Little [96], who actually propose three different localization methods. The robot movement is, however, assumed to be restricted to a plane, so the pose estimate contains only 3 DOF, while the map itself contains full 3D coordinates of the landmarks. All three methods basically work by computing the pose from 3D-3D landmark matches: the robot is equipped with a trinocular stereo head (Triclops³) which produces 3D coordinates for each landmark in the current view. The first localization approach is based on the Hough transform [48]. A discretized 3D Hough space representing the robot poses with three parameters (X, Z, θ) is constructed; each landmark match votes for possible poses in the Hough space, and the maximum vote then determines the parameters (X, Z, θ) of the robot pose.
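The voting step can be sketched as follows, assuming each match pairs a landmark's 2D map position with its position in the robot frame (model: map = R(θ)·obs + (X, Z)); the bin counts and ranges are illustrative assumptions.

import numpy as np

def hough_pose(matches, x_bins=100, z_bins=100, t_bins=72,
               x_range=(-5.0, 5.0), z_range=(-5.0, 5.0)):
    """matches: list of (map_xz, obs_xz) pairs, both 2D points in metres."""
    acc = np.zeros((x_bins, z_bins, t_bins))
    thetas = np.linspace(0, 2 * np.pi, t_bins, endpoint=False)
    for m_xz, o_xz in matches:
        # For each heading bin, the (X, Z) that aligns this match gets a vote.
        for k, th in enumerate(thetas):
            c, s = np.cos(th), np.sin(th)
            x = m_xz[0] - (c * o_xz[0] - s * o_xz[1])
            z = m_xz[1] - (s * o_xz[0] + c * o_xz[1])
            i = int((x - x_range[0]) / (x_range[1] - x_range[0]) * x_bins)
            j = int((z - z_range[0]) / (z_range[1] - z_range[0]) * z_bins)
            if 0 <= i < x_bins and 0 <= j < z_bins:
                acc[i, j, k] += 1
    i, j, k = np.unravel_index(acc.argmax(), acc.shape)  # peak = pose estimate
    x = x_range[0] + (i + 0.5) * (x_range[1] - x_range[0]) / x_bins
    z = z_range[0] + (j + 0.5) * (z_range[1] - z_range[0]) / z_bins
    return x, z, thetas[k]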

The second proposed method is a RANSAC scheme [28]. From two landmark matches, the translation and rotation needed for alignment, and thus the robot pose, can be computed. This is repeated for a number of randomly chosen landmark samples within a RANSAC scheme, and for each sample the pose hypothesis is verified by checking how many of the remaining landmark matches are consistent with it.
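A minimal sketch of this two-match scheme in the plane: a minimal sample of two landmark matches fixes the rotation and translation, and the hypothesis is scored by counting consistent matches. The inlier threshold and iteration count are illustrative assumptions.

import numpy as np

def rot2d(th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s], [s, c]])

def ransac_pose(map_pts, obs_pts, iters=200, thresh=0.1):
    """map_pts, obs_pts: N x 2 arrays of matched landmark positions
    (global map frame vs. current robot frame)."""
    rng = np.random.default_rng(0)
    best = (None, None, -1)
    for _ in range(iters):
        i, j = rng.choice(len(map_pts), size=2, replace=False)
        # The angle between the two difference vectors fixes the rotation;
        # one match then fixes the translation.
        dm, do = map_pts[j] - map_pts[i], obs_pts[j] - obs_pts[i]
        th = np.arctan2(dm[1], dm[0]) - np.arctan2(do[1], do[0])
        R = rot2d(th)
        t = map_pts[i] - R @ obs_pts[i]
        # Verify: how many matches agree with this (R, t) hypothesis?
        err = np.linalg.norm(map_pts - (obs_pts @ R.T + t), axis=1)
        inliers = int((err < thresh).sum())
        if inliers > best[2]:
            best = (R, t, inliers)
    return best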

¹ http://www.evolution.com/core/navigation/vslam.masn
² http://www.evolution.com
³ http://www.ptgrey.com
