
ratio of about 10:1). Consequently, view pairs with a large overlap and appropriate triangulation angles are preferable. Hence, we rank the views with sufficient overlap according to the following score:

    (A_H / A) · median(ψ(α_i)),    (7)

where α_i is the angle between corresponding rays and ψ(·) is a unimodal weighting function with its maximum at α_0. We choose ψ as

    ψ(α) = α² e^(−2α/α_0).    (8)

The two views with the highest scores are taken as sensor views.
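To make the ranking concrete, the following is a minimal Python sketch of Equations (7) and (8). All names are illustrative: α_0 is not specified in this excerpt (10° below is a placeholder), and A_H and A are read as the overlap area and the full image area, as the surrounding text suggests.

    import numpy as np

    ALPHA_0 = np.deg2rad(10.0)  # assumed peak triangulation angle; not given in this excerpt

    def psi(alpha):
        # Unimodal weighting of Eq. (8); its derivative vanishes at alpha = ALPHA_0.
        return alpha**2 * np.exp(-2.0 * alpha / ALPHA_0)

    def pair_score(overlap_area, image_area, ray_angles):
        # Eq. (7): overlap fraction times the median angle weight.
        return (overlap_area / image_area) * np.median(psi(np.asarray(ray_angles)))

    def pick_sensor_views(candidates):
        # candidates: iterable of (view_id, overlap_area, image_area, ray_angles).
        ranked = sorted(candidates,
                        key=lambda c: pair_score(c[1], c[2], c[3]),
                        reverse=True)
        return [v[0] for v in ranked[:2]]  # the two best views become the sensor views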

Dense depth estimation depends heavily on the quality of the provided epipolar geometry. In order to reduce the influence of inaccuracies in the estimated poses on the depth maps, and to improve performance, multi-view stereo is applied to downsampled images (512 × 384 pixels for 4:3 format digital images). Matching cost computation and scanline optimization for view triples (one key view and two sensor views) take about 3 s on a GeForce 7800GS.
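As a rough illustration of this step, the sketch below runs a stereo matcher on images downsampled to 512 × 384. OpenCV's StereoSGBM is only a stand-in for the GPU-based matching cost computation and scanline optimization used here; it assumes a rectified key/sensor pair, and focal_px and baseline_m are hypothetical calibration inputs (focal_px must refer to the downsampled resolution).

    import cv2
    import numpy as np

    def coarse_depth(key_img, sensor_img, focal_px, baseline_m):
        # Downsample to 512 x 384 to damp the effect of small pose inaccuracies.
        k = cv2.resize(cv2.cvtColor(key_img, cv2.COLOR_BGR2GRAY), (512, 384))
        s = cv2.resize(cv2.cvtColor(sensor_img, cv2.COLOR_BGR2GRAY), (512, 384))
        # Stand-in matcher; the paper optimizes scanlines over view triples on the GPU.
        matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
        disp = matcher.compute(k, s).astype(np.float32) / 16.0  # fixed-point -> pixels
        disp[disp <= 0] = np.nan                                # mark invalid matches
        return focal_px * baseline_m / disp                     # depth = f * B / d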

The set of depth maps provides 2.5D geometry for each view. These depth maps are subsequently fused into a common 3D model using a robust depth image integration approach [25]. This method yields globally optimal 3D models with respect to the employed energy functional, but it operates in batch mode. This step is the main reason why dense geometry generation remains an offline process. The result obtained by depth map fusion for the dataset depicted in Figure 2 (augmented with per-vertex coloring) is shown in Figure 3.
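The integration method of [25] minimizes a global energy over all depth maps simultaneously, which is why it must run in batch. As a much-simplified stand-in (not the algorithm of [25]), the sketch below fuses depth hypotheses falling into the same quantized 3D cell with a robust median; it still needs every depth map up front, mirroring the offline constraint.

    import numpy as np

    def fuse_depth_hypotheses(samples, rel_tol=0.05, min_support=2):
        # samples: dict mapping a quantized 3D cell to a list of depth hypotheses
        # gathered from all per-view depth maps (hence batch mode).
        fused = {}
        for cell, depths in samples.items():
            d = np.asarray(depths, dtype=np.float64)
            med = np.median(d)
            inliers = d[np.abs(d - med) < rel_tol * med]  # reject outlier hypotheses
            if inliers.size >= min_support:               # require multi-view support
                fused[cell] = inliers.mean()
        return fused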

5. Results<br />

We tested our system on different image collections, ranging from a few hundred to thousands of pictures. All images are calibrated with the method described in Section 2. The calibration precision (i.e. the final mean reprojection error, computed as in the sketch below) ranges from 1/20 to 1/7 pixel. Our largest test set is a database containing about 7000 street-side images taken with four compact digital cameras from different manufacturers. The images were captured on different days under varying illumination conditions. Furthermore, the images are only partially ordered. The size of the source images varies from two to seven megapixels. In order to remove compression artefacts, the supplied images are resampled to half resolution for further processing. Adding an image to the view network takes approximately half a minute; most of the time is spent on exhaustive matching and bundle adjustment. Average run-times required for each step are listed in Table 1.
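For reference, the mean reprojection error quoted above averages the image-space distance between each observed feature and its reprojected 3D point; a minimal sketch with hypothetical input conventions:

    import numpy as np

    def mean_reprojection_error(cameras, points, observations):
        # cameras: list of 3x4 projection matrices; points: Nx4 homogeneous 3D points;
        # observations: per-camera dict {point_index: observed 2D location}.
        errors = []
        for P, obs in zip(cameras, observations):
            for i, x_obs in obs.items():
                x = P @ points[i]
                errors.append(np.linalg.norm(x[:2] / x[2] - x_obs))
        return float(np.mean(errors))  # in pixels; here between 1/20 and 1/7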

In all our tests we use a pre-trained generic vocabulary tree with about 7 × 10⁵ leaf nodes for searching the image database.

    Operation                                  time [s]
    Undistortion (2272 × 1704 pixels)          1.0
    Feature extraction (2272 × 1704 pixels)    1.6
    Vocabulary scoring                         0.2
    Brute-force NN-matching (1500 × 1500)      k × 0.35
    RANSAC (3000 samples, five-point)          k × 0.4
    Structure and motion                       0.5
    Bundle adjustment (100 views)              60
    Dense stereo                               3.0

Table 1. Typical average timings of our system on an AMD Athlon 64 X2 Dual Core Processor 4400+. The parameter k specifies the number of images taken for post-verification. Typically, we set k to 1% of the current database size.

Figure 4. Vocabulary tree performance for image retrieval depending on the database size. The y-axis shows the probability of finding an epipolar neighbor among the first k-ranked images reported by the vocabulary tree scoring.

Our experiments suggest that the vocabulary tree approach to image retrieval is very efficient for large-scale structure and motion computation. Figure 4 shows how the retrieval performance changes with the database size. On average, an image overlaps with about eight other images in the database. However, there are also images with no geometric relation to existing database entries. The detailed distribution of epipolar neighbors is shown in Figure 5. The retrieval performance is measured by the rank of the first image that passes the geometric verification procedure. Note that even for a relatively large database of about 7000 images, the first-ranked image reported by the vocabulary tree is a valid epipolar neighbor of the query image with a probability of more than 90%. We also observe that the scoring saturates rapidly; taking the top 1% of the vocabulary tree ranking is therefore a good trade-off between speed and retrieval quality, as the sketch below illustrates.
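In code, the retrieval front end thus reduces to scoring every database image against the query and passing only the top 1% on to geometric verification. A minimal sketch, assuming the vocabulary tree has already produced L2-normalized tf-idf signatures (the verification itself, feature matching plus five-point RANSAC as in Table 1, is omitted):

    import numpy as np

    def candidates_for_verification(query_vec, db_vecs, fraction=0.01):
        # db_vecs: (N, D) matrix of L2-normalized tf-idf image signatures.
        k = max(1, int(fraction * len(db_vecs)))  # k = 1% of the database size
        scores = db_vecs @ query_vec              # cosine similarity for unit vectors
        return np.argsort(-scores)[:k]            # indices of the k best candidates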

Finally, two captured sites and the resulting sparse and dense models are shown in Figures 6 and 7. The final reprojec-
