
GPU Acceleration of Structure-from-Motion Pipeline


Presenter: Liu Xin (刘鑫)


The SfM system and analysis of its time complexity
A GPU-accelerated solution
Feature detection
Feature matching
Bundler
Completed work and open problems
Future work


Bundler
• Feature detection
• Feature matching
• Bundle Adjustment
• SBA

PMVS
• Matching
• Expansion
• Filtering

PSR and texture generation
• Poisson Surface Reconstruction
• Texture generation


Input: an image sequence
Output: recovered camera parameters (f, k1, k2), estimated camera poses (R, t), and a sparse set of 3D scene points (position, color, and visibility in each image)

Feature detection
• SIFT feature detection
• SIFT descriptors
Feature matching
• ANN (approximate nearest neighbors)
• Pairwise matching
Bundle Adjustment
• SBA (Sparse Bundle Adjustment)

Photo tourism: Exploring photo collections in 3D (ACM Transactions on Graphics 2006)
Noah Snavely, Steven M. Seitz, Richard Szeliski.


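Dataset | Luohou Temple (罗睺寺) | Luoyang Sports Center (洛阳体育中心) | Dailuoding (黛螺顶) | Fayu Temple (法雨寺) | Yingxian Wooden Pagoda (应县木塔)
Number of pictures | 105 | 237 | 240 | 290 | 349
Total running time | 5.5 | 16 | 35 | 38 | 3.2 days

Chart series: Bundler, Detecting (chart not preserved)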


Feature Detecting
Detect the feature points of every image: $O(n)$

Feature Matching
Match every pair of images, about $n^2/2$ matches
Time complexity: $O(n^2)$

BA
$O(mn(m+2n)^2)$ ($m$: features, $n$: images)
$O(n^7)$

Efficient Bundle Adjustment with Virtual Key Frames: A Hierarchical Approach to Multi-Frame Structure from Motion (CVPR 1999)
H.-Y. Shum, Q. Ke…
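To make the quadratic term concrete, the following Python sketch, assuming every unordered image pair is matched exactly once as in exhaustive pairwise matching, enumerates the pairs and evaluates the closed form $n(n-1)/2$ for the five datasets in the table above. The function names are illustrative only.

```python
from itertools import combinations


def image_pairs(n):
    """All unordered image pairs (i, j) with i < j, as exhaustive matching visits them."""
    return list(combinations(range(n), 2))


def pair_count(n):
    """Closed form n(n-1)/2, i.e. roughly n^2/2 for large n."""
    return n * (n - 1) // 2


# Image counts of the five datasets from the table above.
for n in (105, 237, 240, 290, 349):
    assert len(image_pairs(n)) == pair_count(n)
    print(n, pair_count(n))
```

For these datasets the count grows from 5,460 pairs (105 images) to 60,726 pairs (349 images): about 3.3 times as many images but about 11 times as many pairs to match.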


SBA
Observation: (two screenshots, not preserved)
Time complexity: $O(n^4)$
Expected to be the most time-consuming part

PMVS
Scene Reconstruction and Visualization from Internet Photo Collections
Keith N. Snavely


Conclusion
The most time-consuming parts are Match and BA.

Goals
Accelerate the SfM system:
Use the GPU to accelerate the Match and BA modules
Preserve the reconstruction results
How to evaluate? Accuracy, Completeness (Coverage), Run Time (Size, Compactness) (Snavely)
Experimental evaluation? (accuracy of camera localization)


CPU: 8
Memory: 32GB
GPU
Tesla C1060: 240 cores, 4GB
Quadro FX 570: 30 cores, 512MB

Compared with Building Rome on a Cloudless Day:
4 CPUs, 4 GPUs, 48GB memory
10 threads, one thread for each CPU and GPU

Building Rome on a Cloudless Day
Jan-Michael Frahm…


Goals
Accelerate the Matching and BA modules by parallelizing them

Approach
Upper level: reduce the time complexity of the modules.
Skeletal Graphs (Canonical View)
Out of Core BA
Lower level:
Speed up pairwise feature matching (GPU).
Speed up SBA (GPU).


Detecting (down/all features) → Matching (down features) → Skeletal Graph → Detailed Matching (all features) → Out of Core SBA → SBA


Step 1: Detecting
GPU SIFT detection
All features: about 30,000
Down features: about 5,000

Step 2: Matching
GPU matching.
Pairwise matching of all images, $O(n^2)$.
Uses the down features.

Step 3: Skeletal Graph
Canonical View


Step 4: Detailed Matching
GPU matching.
Pairwise matching along the edges of the Skeletal Graph.
Uses all features.

Step 5: Out of Core BA
GPU SBA.
Out of core: SBA is applied at two levels.
Iterate.

Step 6: SBA for the whole graph
GPU SBA


Shape Descriptor
HOG?
Gist features (Cloudless Day)
Appearance Descriptor
Feature Descriptors (SIFT)

Combining efficient object localization and image classification (ICCV 09)
Hedi Harzallah, Frederic Jurie, Cordelia Schmid


Other representations
Feature Descriptors + Visual Words + Verification (Building Rome in a Day)
Gist Features (small code) + Visual Words (Building Rome on a Cloudless Day)

SfM system analysis
Not millions of pictures, but thousands of pictures.
High-resolution images.
Down Features
Down Sample/Random/Pyramid …
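The slide lists down-sampling, random selection and pyramids as ways to obtain the smaller feature set. A minimal Python sketch of two selection strategies follows, assuming each feature is a hypothetical (x, y, scale, response) tuple: uniform random sampling (from the slide) and keeping the strongest detector responses (an assumption, not listed on the slide). Image down-sampling and pyramids act on the image before detection and are not shown.

```python
import random


def select_down_features(features, k, strategy="random", seed=0):
    """Reduce a full feature list (about 30,000 per image) to at most k down features.

    features: list of (x, y, scale, response) tuples (hypothetical layout).
    strategy: "random" keeps a uniform random subset;
              "response" keeps the k features with the largest response (assumption).
    """
    if len(features) <= k:
        return list(features)
    if strategy == "random":
        rng = random.Random(seed)  # fixed seed so repeated runs select the same subset
        return rng.sample(features, k)
    if strategy == "response":
        return sorted(features, key=lambda f: f[3], reverse=True)[:k]
    raise ValueError(f"unknown strategy: {strategy}")
```

With k = 5000 this matches the "Down Features: about 5,000" budget used for the first, all-pairs matching pass.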


Feature detection on high-resolution images
SiftGPU (CUDA)
Efficiency

Detection method | Average detection time
GPU Tesla C1060 | 1.795 s
GPU Quadro FM580 | 2.806 s
Tesla + Quadro | 1.4-1.6 s
1 CPU | 80-100 s
8 CPU | 8-9 s
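Read as speedups over the CPU baselines, the table gives roughly the following (endpoints of the reported ranges divided by the Tesla C1060 time):

$$
\frac{80\ \text{s}}{1.795\ \text{s}} \approx 44.6,\qquad \frac{100\ \text{s}}{1.795\ \text{s}} \approx 55.7 \qquad \text{(1 CPU)}
$$

$$
\frac{8\ \text{s}}{1.795\ \text{s}} \approx 4.5,\qquad \frac{9\ \text{s}}{1.795\ \text{s}} \approx 5.0 \qquad \text{(8 CPU)}
$$

Adding the Quadro to the Tesla brings the time from 1.795 s to 1.4-1.6 s, a further factor of only about 1.1-1.3.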


GPU SURF descriptor (64)
Matching time: 4?
SBA time: >2
Feature Descriptor
All Features (30,000)
Down Features (5,000)
Down Sample/Random/Constraints/Pyramid…
Thousands
Total Time


Using Down Features
Pairwise matching
Time complexity: $O(n^2)$
GPU matching time
4000 × 4000: 0.07 s
SURF descriptor
Multi-GPU
Total time: 1.5 h (400)
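Pairwise matching compares every descriptor of one image with every descriptor of the other, which is the all-pairs distance computation timed above (4000 × 4000). The Python sketch below is a plain CPU reference for that computation with a nearest/second-nearest ratio test. The ratio value 0.8 and the tuple descriptor layout are assumptions; the slides do not state how the GPU matcher filters matches.

```python
import math


def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.

    For each descriptor in desc_a, find its two nearest neighbours in desc_b
    (Euclidean distance) and keep the match only if d1 < ratio * d2.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, a in enumerate(desc_a):
        best, second, best_j = math.inf, math.inf, -1
        for j, b in enumerate(desc_b):
            d = math.dist(a, b)
            if d < best:
                best, second, best_j = d, best, j  # old best becomes second best
            elif d < second:
                second = d
        # Reject ambiguous matches whose best and second-best distances are close.
        if best_j >= 0 and best < ratio * second:
            matches.append((i, best_j))
    return matches
```

A GPU version would compute the same distance matrix in parallel and reduce each row to its two smallest entries; the ratio test itself is unchanged. Real inputs would be 128-D SIFT or 64-D SURF vectors rather than the short tuples used here.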


Goals:
Canonical View Set
Submap Sets
Input
Undirected graph
Nodes are images; edges are the matching results


Top-down
Connected Dominating Set
Greedy (maximum spanning tree)
Graph Cut
K-means
Bottom-up
Clustering (CMVS)
Agglomerative Clustering
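The "Greedy (maximum spanning tree)" option can be sketched with Kruskal's algorithm: sort the match-graph edges by weight, heaviest first, and keep an edge only when it joins two previously separate components. Using the number of verified matches as the edge weight is an assumption. This is the spanning-tree baseline named on the slide, not the skeletal-graph construction of Snavely et al.

```python
def maximum_spanning_tree(num_images, edges):
    """Kruskal's algorithm on the match graph, keeping the heaviest edges.

    edges: list of (weight, i, j); weight could be the number of verified
    matches between images i and j (hypothetical choice of weight).
    Returns the kept edges as (weight, i, j); if the graph is disconnected,
    the result is a maximum spanning forest.
    """
    parent = list(range(num_images))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, i, j in sorted(edges, reverse=True):  # heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((w, i, j))
    return tree
```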


Skeletal Graphs for Efficient Structure from Motion (CVPR 08). Snavely.
Spectral Partitioning for Structure from Motion (ICCV 03). Drew Steedly.
Towards Internet-scale Multi-view Stereo (CVPR 10). Yasutaka Furukawa.
Structure and Motion Pipeline on a Hierarchical Cluster Tree. Michela Farenzena.
……


Algorithm
Step 1: Select the Canonical View Set
Step 2: Build the Submap Sets
Step 3: Repeat Steps 1 and 2

Time complexity
Minutes
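Steps 1 and 2 can be illustrated with a greedy dominating set followed by submap assignment. This Python sketch is a simplified stand-in under stated assumptions: it picks canonical views greedily by coverage but, unlike a Connected Dominating Set, does not enforce connectivity among them, and the edge-weight dict layout is hypothetical. Step 3 (repeating on the graph of canonical views to build a hierarchy) is not shown.

```python
def canonical_views_and_submaps(num_images, edges):
    """Greedy dominating set as a stand-in for canonical-view selection.

    edges: dict mapping (i, j) with i < j to a match weight (hypothetical layout).
    Step 1: repeatedly pick the image that covers the most not-yet-covered
            images (itself plus its neighbours) until every image is covered.
    Step 2: attach each non-canonical image to the canonical neighbour with
            the strongest edge, giving one submap per canonical view.
    The chosen set is dominating but not guaranteed to be connected.
    """
    neighbours = {v: {} for v in range(num_images)}
    for (i, j), w in edges.items():
        neighbours[i][j] = w
        neighbours[j][i] = w

    uncovered = set(range(num_images))
    canonical = []
    while uncovered:
        # Ties are broken in favour of the smallest image index.
        best = max(range(num_images),
                   key=lambda v: (len(({v} | set(neighbours[v])) & uncovered), -v))
        canonical.append(best)
        uncovered -= {best} | set(neighbours[best])

    submaps = {c: [c] for c in canonical}
    for v in range(num_images):
        if v in submaps:
            continue
        # v was covered by at least one canonical neighbour, so candidates exist.
        host = max((c for c in canonical if c in neighbours[v]),
                   key=lambda c: neighbours[v][c])
        submaps[host].append(v)
    return canonical, submaps
```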


Using All Features.
Pairwise matching along the edges of the Skeletal Graph.
Verification: iterate together with the previous step.
Time complexity: $O(n)$
GPU matching time
GPU KD-Tree
KD-Tree
Total time: 2-3 h
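Detailed matching relies on KD-tree nearest-neighbour queries. The following Python sketch builds a KD-tree and answers one exact nearest-neighbour query with backtracking. Because each query is independent, a single query is the natural unit of parallel work on a GPU (for example one block per query point); a CUDA version would typically replace the recursion with an explicit stack. The tuple-based tree layout is an illustrative choice, not the layout used in the project.

```python
import math


def build_kdtree(points, depth=0):
    """Build a KD-tree as nested tuples (point, axis, left, right); None is an empty tree."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinate axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def nearest(node, query, best=(math.inf, None)):
    """Exact nearest-neighbour search with backtracking; returns (distance, point)."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, query)
    if d < best[0]:
        best = (d, point)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

Usage: `tree = build_kdtree(descriptors_b)` once per image, then `[nearest(tree, q) for q in descriptors_a]`, where each list element is an independent query that could run on its own GPU block.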


Step 1: SBA on the Canonical View Set
How are images added?
Step 2: SBA on the Submap Sets
Fix the Canonical View parameters, then run SBA;
Run SBA on each entire Set;
Step 3:
Repeat Steps 1 and 2 until convergence

Out-of-Core Bundle Adjustment for Large-Scale 3D Reconstruction (ICCV 07)
Kai Ni, Drew Steedly, and Frank Dellaert
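The alternation in Steps 1 to 3 is a form of block coordinate descent. The Python sketch below is a deliberately simplified 1-D analogue, not bundle adjustment: each "pose" is a scalar, each edge measures a relative offset, and x[0] is pinned to fix the gauge. With the canonical variables held fixed, the submap variables move to their least-squares optimum; then the canonical variables are updated; the loop stops when no variable moves more than a tolerance. All names and the data layout are hypothetical, and the sketch assumes every variable other than x[0] has at least one edge.

```python
def alternate_submaps_and_canonical(n, edges, canonical, tol=1e-10, max_iters=1000):
    """Toy 1-D analogue of the out-of-core alternation (block coordinate descent).

    Unknowns x[0..n-1] are scalars; each edge (i, j, d) measures x[j] - x[i] = d.
    x[0] is held at 0 to fix the gauge. Each sweep:
      1) submap step: with canonical variables fixed, update every other variable;
      2) canonical step: with the others fixed, update the canonical variables.
    Stops when no variable changes by more than tol during a sweep.
    """
    incident = {v: [] for v in range(n)}
    for i, j, d in edges:
        incident[j].append((i, d))   # x[j] should equal x[i] + d
        incident[i].append((j, -d))  # x[i] should equal x[j] - d
    x = [0.0] * n
    canon = set(canonical)
    groups = [[v for v in range(1, n) if v not in canon],  # submap variables
              [v for v in range(1, n) if v in canon]]      # canonical variables
    for _ in range(max_iters):
        change = 0.0
        for group in groups:
            for v in group:
                # The minimiser of sum (x[v] - (x[u] + d))^2 over incident edges is the mean.
                new = sum(x[u] + d for u, d in incident[v]) / len(incident[v])
                change = max(change, abs(new - x[v]))
                x[v] = new
        if change < tol:
            break
    return x
```

In real out-of-core BA each step is itself a sparse nonlinear least-squares solve over camera and point parameters; only the fix-then-optimize loop and the convergence test carry over from this toy.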


GPU SBA
ECCV 2010, no code released;
SBA on the whole graph
Time complexity
SBA can be run on each Set in parallel
$O(n^3)$??
Estimated time: 4-5 h??

Practical Time Bundle Adjustment for 3D Reconstruction on the GPU (ECCV 2010)
Siddharth Choudhary, Shubham Gupta, and P J Narayanan


SIFT detector: $O(n)$


SIFT Feature Detector on GPU
Adapted from SiftGPU to handle high-resolution images.
Feature Match
Adapted from SiftGPU to handle high-resolution images.
GPU KD-Tree Traversal (needs improvement)
Pthread + GPU
Multi-GPU controller class
GPU SURF Descriptor (being rewritten)
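The "Pthread + GPU" and multi-GPU controller items suggest one host thread per device pulling work from a shared queue. The Python `threading` sketch below shows only that control structure; the class name and `match_fn` are hypothetical, with `match_fn` standing in for a GPU matching call. In a C/C++ Pthread implementation each worker would typically call `cudaSetDevice` once to bind itself to its GPU before processing jobs.

```python
import queue
import threading


class MultiGpuMatcher:
    """One worker thread per device pulls image pairs from a shared queue.

    match_fn(device_id, i, j) is a hypothetical callback standing in for a
    GPU matching call bound to that device.
    """

    def __init__(self, device_ids, match_fn):
        self.device_ids = list(device_ids)
        self.match_fn = match_fn

    def run(self, pairs):
        jobs = queue.Queue()
        for p in pairs:
            jobs.put(p)
        results = {}
        lock = threading.Lock()

        def worker(device_id):
            while True:
                try:
                    i, j = jobs.get_nowait()
                except queue.Empty:
                    return  # no work left for this device
                r = self.match_fn(device_id, i, j)
                with lock:  # protect the shared results dict
                    results[(i, j)] = r

        threads = [threading.Thread(target=worker, args=(d,)) for d in self.device_ids]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return results
```

The shared queue gives dynamic load balancing: a faster device (the Tesla versus the Quadro in the timing table) simply takes more pairs.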


SBA on GPU
ECCV 2010, no code released;
Float or Double
Iterate in float first, then iterate in double
Fermi GPU
GPU acceleration of the SBA internal functions
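The "float first, then double" strategy can be shown on a scalar problem. The Python sketch below runs Newton's iteration for $\sqrt{a}$ with every intermediate rounded to IEEE single precision (via `struct`), then continues from that estimate in double precision. It is a toy stand-in for the SBA iterations, chosen because the precision hand-off is easy to check; the iteration counts are arbitrary assumptions. The motivation on GPUs of this generation is that double-precision throughput is lower, often much lower, than single-precision throughput.

```python
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def newton_sqrt_mixed(a, f32_iters=6, f64_iters=2):
    """Newton iteration x <- x - (x*x - a) / (2x) for sqrt(a), assuming a > 0.

    Phase 1 rounds every intermediate to float32 (cheap on the GPU);
    phase 2 continues from that estimate in float64 to reach full precision.
    Returns (float32 estimate, refined float64 result).
    """
    x = to_f32(a if a > 1 else 1.0)  # starting guess at or above sqrt(a)
    for _ in range(f32_iters):
        residual = to_f32(to_f32(x * x) - to_f32(a))
        x = to_f32(x - to_f32(residual / to_f32(2 * x)))
    x_f32 = x
    for _ in range(f64_iters):
        x = x - (x * x - a) / (2 * x)
    return x_f32, x
```

The float phase stalls at an error near 1e-8, the limit of single precision; one or two double-precision steps then recover full double accuracy because Newton's method converges quadratically.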


KD-Tree Traversal on GPU
Each block searches for one point
SURF Descriptor
Experimental evaluation
Accuracy of camera localization?
CPU and GPU parallelism
Feature Detecting
Visual Words


Skeletal Graph
Connectivity of the Canonical View Set
Size of the Submap Sets
Meaning and evaluation of Canonical Views

Scene Summarization for Online Image Collections.
Ian Simon, Noah Snavely…


Further parallelization
Connect three GPU-equipped machines and distribute the tasks among them
MPI + Pthread + CUDA

Open problem
GPU acceleration of PMVS
Once Bundler has been parallelized, PMVS may become the speed bottleneck.
