Fast Robust Large-scale Mapping from Video and Internet Photo ...
For every frame of video, a depthmap is generated using a real-time GPU-based
multi-view planesweep stereo algorithm. The planesweep algorithm,
which consists primarily of image warping operations, is highly efficient on the
GPU [62]. Because it is multi-view, it is quite robust and can efficiently
handle occlusions, a major problem in stereo. The stereo depthmaps are
then fused using a visibility-based depthmap fusion method, which also runs
on the GPU. The fusion method removes outliers from the depthmaps and
also combines depth estimates to improve their accuracy. Because of the redundancy
in video (each surface is imaged multiple times), fused depthmaps
only need to be produced for a subset of the video frames. This reduces
processing time as well as the 3D model size, as discussed in Section 6.
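
The two roles of fusion named above, rejecting outliers and combining the surviving depth estimates, can be illustrated with a small sketch. This is not the visibility-based method used in this work; it swaps in a much simpler per-pixel median consensus, and it assumes the depthmaps have already been rendered into a common reference view. The function name `fuse_depths` and the parameters `rel_tol` and `min_support` are hypothetical and chosen only for illustration.

```python
from statistics import median


def fuse_depths(depthmaps, rel_tol=0.05, min_support=2):
    """Fuse aligned depthmaps per pixel (simplified median consensus).

    depthmaps: list of 2D lists of floats, all the same size and assumed
    to be expressed in the same reference view. None marks a missing depth.
    Returns a 2D list holding the fused depth, or None where fewer than
    min_support estimates agree with the per-pixel median.
    """
    height = len(depthmaps[0])
    width = len(depthmaps[0][0])
    fused = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather every available estimate for this pixel.
            samples = [d[y][x] for d in depthmaps if d[y][x] is not None]
            if not samples:
                continue
            center = median(samples)
            # Outlier removal: keep estimates within a relative tolerance
            # of the median (hypothetical threshold, not from the source).
            inliers = [s for s in samples if abs(s - center) <= rel_tol * center]
            # Combination: average the agreeing estimates.
            if len(inliers) >= min_support:
                fused[y][x] = sum(inliers) / len(inliers)
    return fused
```

A pixel whose estimates disagree is left empty rather than averaged, which mirrors the idea that fusion should discard inconsistent depths instead of blending them. Because neighboring video frames see the same surfaces, such a fusion step would only need to run for a subset of the frames.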

The planesweep stereo algorithm [36] computes a depthmap by testing
a family of plane hypotheses and, for each depthmap pixel, recording the
distance to the plane with the best photoconsistency score. One view is
designated as the reference, and a depthmap is computed for that view. The
remaining views are designated matching views. For each plane, all matching
views are back-projected onto the plane and then projected into the
reference view. This mapping, known as a plane homography [84], can be
performed very efficiently on the GPU. Once the matching views are warped,
the photoconsistency score is computed. We use a sum of absolute differences
(SAD) photoconsistency measure. To handle occlusions, the matching views
are divided into a left and right subset, and the best matching score of the two
subsets is kept [85]. Before the difference is computed, the image intensities
are multiplied by the relative exposure (gain) computed during KLT tracking
(Section 4.1.1). The stereo algorithm can produce a 512 × 384 depthmap