Fast Robust Large-scale Mapping from Video and Internet Photo ...

More documents

Recommendations

Info

For every frame of video, a depthmap is generated using a real-time GPUbased multi-view planesweep stereo algorithm. The planesweep algorithm, comprised primarily of image warping operations, is highly efficient on the GPU [62]. Because it is multi-view, it is quite robust and can efficiently handle occlusions, a major problem in stereo. The stereo depthmaps are then fused using a visibility-based depthmap fusion method, which also runs on the GPU. The fusion method removes outliers from the depthmaps, and also combines depth estimates to enhance their accuracy. Because of the redundancy in video (each surface is imaged multiple times), fused depthmaps only need to be produced for a subset of the video frames. This reduces processing time as well as the 3D model size as discussed in Section 6. The planesweep stereo algorithm [36] computes a depthmap by testing a family of plane hypotheses, and for each depthmap pixel recording the distance to the plane with the best photoconsistency score. One view is designated as reference, and a depthmap is computed for that view. The remaining views are designated matching views. For each plane, all matching views are back-projected onto the plane, and then projected into the reference view. This mapping, known as a plane homography [84], can be performed very efficiently on the GPU. Once the matching views are warped, the photoconsistency score is computed. We use a sum of absolute differences (SAD) photoconsistency measure. To handle occlusions, the matching views are divided into a left and right subset, and the best matching score of the two subsets is kept [85]. Before the difference is computed, the image intensities are multiplied by the relative exposure (gain) computed during KLT tracking (Section 4.1.1). The stereo algorithm can produce a 512 × 384 depthmap 30
from 11 images (10 matching, 1 reference) and 48 plane hypotheses at a rate of 42 Hz on an Nvidia GTX 280. After depthmaps have been computed, a fused depthmap is computed for a reference view using the surrounding stereo depthmaps [86]. All points from the depthmaps are projected into the reference view, and for each pixel the point with the highest stereo confidence is chosen. Stereo confidence is computed based on the matching scores from the planesweep: C(x) = ∑ e −(SAD(x,π)−SAD(x,π 0)) 2 (1) π≠π 0 where x is the pixel in question, π is a plane from the planesweep, and π 0 is the best plane chosen by the planesweep. This confidence measure prefers depth estimates with low uncertainty, where the matching score is much lower than scores for all other planes. For each pixel, after the most confident point is chosen, it is scored according to mutual support, free-space violations, and occlusions. Supporting points are those that fall within 5% of the chosen point’s depth value. Occlusions are caused by points that would make the chosen point invisible in the reference view, and free-space violations are points that are made invisible in some other view by the chosen point. If the number of occlusions and free-space violations exceeds the number of support points, the chosen point is discarded and the depth value is marked as invalid. If the point survives, it is replaced with a new point computed as the average of the chosen point and support points. See Figure 6. Currently our dense geometry method is used only for video. The temporal sequence of the video makes it easy to select nearby views for stereo matching and depthmap fusion. For photo collections, a view selection strat- 31
Page 1 and 2: Fast Robust Large-scale Mapping fro
Page 3 and 4: Figure 1: The left shows an overvie
Page 5 and 6: use statistical models to combine d
Page 7 and 8: a probabilistic way and the final s
Page 9 and 10: streams of multiple cameras mounted
Page 11 and 12: In the case of available GPS data o
Page 13 and 14: 4.1. Camera Pose from Video Our sys
Page 15 and 16: complexity. Our recently proposed A
Page 17 and 18: surements we must normalize all of
Page 19 and 20: VIP-features [68]. To avoid a compu
Page 21 and 22: an active research topic in the com
Page 23 and 24: achieve high computational performa
Page 25 and 26: for each image in the cluster to re
Page 27 and 28: can be added to the 3D model. This
Page 29: Figure 5: Left: 3D reconstruction o
Page 33 and 34: Original Mesh Simplified Mesh Textu
Page 35 and 36: 7. Conclusions In this paper we pre
Page 37 and 38: [12] T. Berg, D. Forsyth, Animals o
Page 39 and 40: [30] Y. Jing, S. Baluja, H. Rowley,
Page 41 and 42: objects from multiple range images,
Page 43 and 44: [63] M. Fischler, R. Bolles, Random
Page 45 and 46: Marquardt algorithm, Tech. Rep. 340

Fast Robust Large-scale Mapping from Video and Internet Photo ...

Create successful ePaper yourself

Delete template?

Save as template?