GPU Acceleration of Structure-from-Motion Pipeline
Presenter: Liu Xin
The SFM system and its time-complexity analysis<br />
GPU-accelerated solution<br />
Feature detection<br />
Feature matching<br />
Bundler<br />
Completed work and open problems<br />
Further work
Bundler<br />
• Feature detection<br />
• Feature matching<br />
• Bundle Adjustment<br />
• SBA<br />
PMVS<br />
• Matching<br />
• Expansion<br />
• Filtering<br />
PSR and texture generation<br />
• Poisson Surface Reconstruction<br />
• Texture generation
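Input: image sequence<br />
Output: recovered camera intrinsics (f, k1, k2), camera pose estimates (R, t), and a sparse set of 3D scene points (position, color, and the images in which each point is visible)<br />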
Feature detection<br />
• SIFT feature detection<br />
• SIFT descriptor<br />
Feature matching<br />
• ANN<br />
• Pairwise matching<br />
Bundle Adjustment<br />
• SBA (Sparse Bundle Adjustment)<br />
Photo tourism: Exploring photo collections in 3D (ACM Transactions on Graphics 2006)<br />
Noah Snavely, Steven M. Seitz, Richard Szeliski.
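The output parameters (f, k1, k2) and (R, t) can be read as a pinhole camera with two radial distortion coefficients. The sketch below follows the projection convention described in the Bundler documentation (camera looking down the negative z-axis); the function name and the sample values are illustrative assumptions, not taken from the slides.

```python
def project(X, R, t, f, k1, k2):
    """Project a 3D world point X to image coordinates (Bundler-style convention)."""
    # World to camera coordinates: P = R * X + t
    P = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division; the camera looks down the negative z-axis
    px, py = -P[0] / P[2], -P[1] / P[2]
    # Radial distortion factor r(p) = 1 + k1*|p|^2 + k2*|p|^4
    n2 = px * px + py * py
    r = 1.0 + k1 * n2 + k2 * n2 * n2
    return (f * r * px, f * r * py)

# Hypothetical example: identity rotation, no translation, no distortion
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point = project([1.0, 2.0, -10.0], R_identity, [0.0, 0.0, 0.0], f=500.0, k1=0.0, k2=0.0)
```

Bundle adjustment minimizes the distance between such projections and the detected feature positions over all cameras and points.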
Data | Luohou Temple (罗睺寺) | Luoyang Sports Center (洛阳体育中心) | Dailuoding (黛螺顶) | Fayu Temple (法雨寺) | Yingxian Wooden Pagoda (应县木塔)<br />
Number of pictures | 105 | 237 | 240 | 290 | 349<br />
Total running time | 5.5 | 16 | 35 | 38 | 3.2 days<br />
Bundler Detecting
Feature Detecting<br />
Find the feature points of every image: O(n)<br />
Feature Matching<br />
Match every pair of images, about n^2/2 matches<br />
Time complexity: O(n^2)<br />
BA<br />
O(mn(m+2n)^2) (m: features, n: images)<br />
O(n^7)<br />
Efficient Bundle Adjustment with Virtual Key Frames: A Hierarchical Approach to Multi-Frame Structure from Motion (CVPR 1999)<br />
H.-Y. Shum, Q. Ke, …
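To make the O(n^2) matching cost concrete, the short sketch below counts the exact number of image pairs, n(n-1)/2, for the five data set sizes listed earlier. The helper name is an assumption for illustration.

```python
def num_pairs(n):
    """Exact number of unordered image pairs among n images."""
    return n * (n - 1) // 2

# Image counts of the five data sets from the running-time table
image_counts = [105, 237, 240, 290, 349]
pairs = {n: num_pairs(n) for n in image_counts}

for n in image_counts:
    print(f"{n} images -> {pairs[n]} pairwise matches")
```

Going from 105 to 349 images multiplies the pair count by roughly 11, which is why matching dominates long before the image count reaches the thousands.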
SBA<br />
Observation: pic\捕获1.PNG pic\捕获2.PNG<br />
Time complexity: O(n^4)<br />
Expected to be the most time-consuming part<br />
PMVS<br />
Scene Reconstruction and Visualization from Internet Photo Collections<br />
Keith N. Snavely
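Conclusion<br />
The most time-consuming parts are Match and BA.<br />
Goals<br />
Accelerate the SFM system<br />
Use the GPU to accelerate the Match and BA modules<br />
Preserve the reconstruction results<br />
How to evaluate? Accuracy, Completeness (Coverage), Run Time (Size, Compactness) (Snavely)<br />
Experimental evaluation? (accuracy of camera localization)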
CPU: 8<br />
Memory: 32GB<br />
GPU<br />
C1060: 240 cores, 4GB<br />
Quadro FX 570: 30 cores, 512MB<br />
Compare to (Cloudless Day)<br />
4 CPU, 4 GPU, 48GB memory<br />
10 threads, one thread for each CPU and GPU<br />
Building Rome on a Cloudless Day<br />
Jan-Michael Frahm…
Goal<br />
Accelerate the Matching and BA modules by parallelizing them<br />
Approach<br />
Upward (algorithm level): reduce the time complexity of each module.<br />
Skeletal Graphs (Canonical View)<br />
Out of Core BA<br />
Downward (implementation level):<br />
Speed up pairwise feature matching. (GPU)<br />
Speed up SBA. (GPU)
Detecting (down/all features) → Matching (down features) → Skeletal Graph → Detailed Matching (all features) → Out of Core SBA → SBA
Step 1 : Detecting<br />
GPU Sift Detecting<br />
All Features: about 30,000<br />
Down Features: about 5,000<br />
Step 2 : Matching<br />
GPU Matching.<br />
Pairwise matching over all images, O(n^2).<br />
Uses down features.<br />
Step 3 : Skeletal Graph<br />
Canonical View
Step 4 : Detailed Matching<br />
GPU Matching.<br />
Pairwise matching along the edges of the Skeletal Graph.<br />
Uses all features.<br />
Step 5 : Out of Core BA<br />
GPU SBA.<br />
Out of core: SBA is applied at two levels.<br />
Iterate.<br />
Step 6 : SBA for whole Graph<br />
GPU SBA
Shape Descriptor<br />
HOG ?<br />
Gist features (cloudless day)<br />
Appearance Descriptor<br />
Feature Descriptors (SIFT)<br />
Combining efficient object localization and image classification (ICCV 09)<br />
Hedi Harzallah, Frederic Jurie, Cordelia Schmid
Other representations<br />
Feature Descriptors + Visual Words + Verification (in a day)<br />
Gist Features (small code) + Visual Words (cloudless day)<br />
SFM system analysis<br />
Not millions of pictures, but thousands of pictures.<br />
High-resolution images.<br />
Down Features<br />
Down Sample/Random/Pyramid …
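The slide does not say which of the listed strategies is used to obtain the down features. As a minimal sketch, the code below reduces about 30,000 keypoints to 5,000 either at random or by keeping the largest-scale keypoints (a rough stand-in for the pyramid option, since coarse pyramid levels yield large-scale features). The `(x, y, scale)` tuple layout and the function name are assumptions.

```python
import random

def down_features(features, k, mode="scale", seed=0):
    """Select k features from a list of (x, y, scale) tuples (hypothetical layout)."""
    if len(features) <= k:
        return list(features)
    if mode == "random":
        # Uniform random subset, reproducible through the seed
        return random.Random(seed).sample(features, k)
    # Default: keep the k keypoints with the largest scale
    return sorted(features, key=lambda ft: ft[2], reverse=True)[:k]

# Synthetic stand-in for the ~30,000 keypoints of one high-resolution image
rng = random.Random(1)
all_feats = [(rng.random(), rng.random(), rng.uniform(1.0, 50.0)) for _ in range(30000)]
down = down_features(all_feats, 5000)
down_random = down_features(all_feats, 5000, mode="random")
```

Matching 5,000 against 5,000 descriptors costs about 1/36 of a 30,000 against 30,000 comparison, which is what makes the all-pairs pass in Step 2 affordable.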
Feature detection on high-resolution images<br />
SiftGPU (cuda)<br />
Efficiency<br />
Detection method | Average detection time<br />
GPU Tesla C1060 | 1.795 s<br />
GPU Quadro FM580 | 2.806 s<br />
Tesla + Quadro | 1.4~1.6 s<br />
1 CPU | 80~100 s<br />
8 CPU | 8~9 s
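The timings above translate into the following speedups for the Tesla C1060 relative to the CPU runs. This is plain arithmetic on the numbers in the table; the variable names are illustrative.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds

tesla_seconds = 1.795  # average detection time on the Tesla C1060

# CPU timings are given as ranges, so the speedups are ranges too
one_cpu = (speedup(80.0, tesla_seconds), speedup(100.0, tesla_seconds))
eight_cpu = (speedup(8.0, tesla_seconds), speedup(9.0, tesla_seconds))

print(f"vs 1 CPU: {one_cpu[0]:.1f}x to {one_cpu[1]:.1f}x")
print(f"vs 8 CPU: {eight_cpu[0]:.1f}x to {eight_cpu[1]:.1f}x")
```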
GPU Surf Descriptor (64)<br />
Matching Time: 4?<br />
SBA Time: >2<br />
Feature Descriptor<br />
All Features (30000)<br />
Down Features (5000)<br />
Down Sample/Random/Constraints/Pyramid….<br />
Thousands<br />
Total Time
Using Down Features<br />
Pairwise matching<br />
Time complexity: O(n^2)<br />
GPU Matching Time<br />
4000*4000 0.07s<br />
Surf Descriptor<br />
Multi GPU<br />
Total Time 1.5h (400)
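The total time quoted above is consistent with a simple estimate, assuming "(400)" means 400 images, 0.07 s is the cost of matching one image pair (4000 against 4000 descriptors) on a single GPU, and I/O is ignored. These readings are assumptions; the slide does not spell them out.

```python
def pairwise_match_hours(num_images, seconds_per_pair):
    """Estimated wall time of all-pairs matching on one GPU, in hours."""
    pairs = num_images * (num_images - 1) // 2
    return pairs * seconds_per_pair / 3600.0

# 400 images give 79,800 pairs; at 0.07 s each that is about 1.55 hours
hours = pairwise_match_hours(400, 0.07)
print(f"estimated matching time: {hours:.2f} h")
```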
Goal:<br />
Canonical View Set<br />
Submap Sets<br />
Input<br />
Undirected graph<br />
Nodes are images; edges are matching results
Top-down<br />
Connected Dominating Set<br />
Greedy (maximum spanning tree)<br />
Graph Cut<br />
K-means<br />
Bottom-up<br />
Clustering (CMVS)<br />
Agglomerative Cluster
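One way to read the greedy spanning-tree option: build a maximum spanning tree of the match graph and keep its internal (non-leaf) nodes, which form a connected dominating set whenever the graph is connected and has at least three nodes. The sketch below does this with Prim's algorithm. It is not the skeletal-graph algorithm of the cited papers, and the edge weights (numbers of matches) are made up.

```python
import heapq

def max_spanning_tree(graph, root):
    """Prim's algorithm on {node: {neighbor: weight}}; returns tree edges (u, v)."""
    visited = {root}
    heap = [(-w, root, v) for v, w in graph[root].items()]
    heapq.heapify(heap)
    tree = []
    while heap:
        _, u, v = heapq.heappop(heap)  # heaviest remaining edge first
        if v in visited:
            continue
        visited.add(v)
        tree.append((u, v))
        for nxt, w in graph[v].items():
            if nxt not in visited:
                heapq.heappush(heap, (-w, v, nxt))
    return tree

def canonical_views(graph, root):
    """Internal nodes of the maximum spanning tree (a connected dominating set)."""
    degree = {}
    for u, v in max_spanning_tree(graph, root):
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return {node for node, d in degree.items() if d > 1}

# Hypothetical match graph: nodes are images, weights are match counts
graph = {
    "A": {"B": 120, "C": 80},
    "B": {"A": 120, "C": 200, "D": 150},
    "C": {"A": 80, "B": 200, "E": 90},
    "D": {"B": 150},
    "E": {"C": 90},
}
canonical = canonical_views(graph, "A")
```

In this example images B and C are kept as canonical views, and every other image has a direct match edge to at least one of them.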
Skeletal graphs for efficient structure from motion (CVPR 08). Snavely.<br />
Spectral Partitioning for Structure from Motion (ICCV 03). Drew Steedly.<br />
Towards Internet-scale Multi-view Stereo (CVPR 10). Yasutaka Furukawa.<br />
Structure and Motion Pipeline on a hierarchical Cluster tree (). Michela Farenzena.<br />
……
Algorithm<br />
Step 1: Selecting Canonical View Set<br />
Step 2: Submap Set<br />
Step 3: Repeat Steps 1 and 2<br />
Time complexity analysis<br />
Minutes
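The slide does not state how the submap sets are formed once the canonical views are chosen. One plausible rule, shown here purely as an assumption, attaches each remaining image to the canonical view it shares the most matches with. The graph and canonical set are hypothetical inputs.

```python
def assign_submaps(graph, canonical):
    """Attach each non-canonical image to its best-connected canonical neighbor."""
    submaps = {c: [] for c in canonical}
    unassigned = []
    for node in sorted(graph):
        if node in canonical:
            continue
        # Candidate canonical views are those with a direct match edge
        links = {v: w for v, w in graph[node].items() if v in canonical}
        if links:
            submaps[max(links, key=links.get)].append(node)
        else:
            unassigned.append(node)  # would need another round of Steps 1 and 2
    return submaps, unassigned

# Hypothetical match graph: weights are numbers of feature matches
graph = {
    "A": {"B": 120, "C": 80},
    "B": {"A": 120, "C": 200, "D": 150},
    "C": {"A": 80, "B": 200, "E": 90},
    "D": {"B": 150},
    "E": {"C": 90},
}
submaps, unassigned = assign_submaps(graph, {"B", "C"})
```

Images left unassigned are the ones that a repeat of Steps 1 and 2 would have to cover.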
Using All Features.<br />
Pairwise matching along the edges of the Skeletal Graph.<br />
Verification: loop back to the previous step.<br />
Time complexity: O(n)<br />
GPU Matching Time<br />
GPU KD-Tree<br />
KD Tree<br />
Total Time 2-3h
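As a CPU reference for what KD-tree matching computes, the sketch below builds a KD-tree over the descriptors of one image and, for each descriptor of the other image, finds the two nearest neighbors and applies a distance-ratio test. It is an exact search on toy 2-D vectors; real SIFT or SURF descriptors have 128 or 64 dimensions and are usually searched approximately. The ratio of 0.6 and all names are assumptions.

```python
def build_kdtree(points, depth=0):
    """points: list of (vector, index). Returns (point, axis, left, right) or None."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def two_nearest(node, query, best=None):
    """Return up to two (squared_distance, index) pairs, nearest first."""
    if best is None:
        best = []
    if node is None:
        return best
    (vec, idx), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(vec, query))
    best.append((d2, idx))
    best.sort()
    del best[2:]  # keep only the two closest candidates
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    two_nearest(near, query, best)
    # The far side can only help if the splitting plane is closer than the 2nd best
    if len(best) < 2 or diff * diff < best[-1][0]:
        two_nearest(far, query, best)
    return best

def match(desc_a, desc_b, ratio=0.6):
    """Return (i, j) pairs where desc_b[j] passes the ratio test for desc_a[i]."""
    tree = build_kdtree([(v, i) for i, v in enumerate(desc_b)])
    matches = []
    for i, q in enumerate(desc_a):
        nn = two_nearest(tree, q)
        # Compare squared distances, so the ratio is squared as well
        if len(nn) == 2 and nn[0][0] < ratio * ratio * nn[1][0]:
            matches.append((i, nn[0][1]))
    return matches

desc_a = [(0.0, 0.0), (5.0, 5.0)]
desc_b = [(0.1, 0.0), (4.0, 4.0), (9.0, 9.0), (5.0, 5.1)]
matches = match(desc_a, desc_b)
```

Each query is an independent tree traversal, so the queries can be processed in parallel.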
Step 1: Canonical View Set SBA<br />
How to add an image?<br />
Step 2: Submap Sets SBA<br />
Fix the Canonical View parameters and run SBA;<br />
Run SBA on the whole set;<br />
Step 3:<br />
Repeat Steps 1 and 2 until convergence<br />
Out-of-Core Bundle Adjustment for Large-Scale 3D Reconstruction (ICCV 07)<br />
Kai Ni, Drew Steedly, and Frank Dellaert
GPU SBA<br />
ECCV2010, no code;<br />
Run SBA on the whole graph<br />
Time complexity analysis<br />
SBA can be run on each set in parallel<br />
O(n^3)??<br />
Time estimate: 4-5h??<br />
Practical Time Bundle Adjustment for 3D Reconstruction on the GPU (ECCV 2010)<br />
Siddharth Choudhary, Shubham Gupta, and P J Narayanan
Sift Detector O(n)<br />
Sift Feature Detector on GPU<br />
Adapted from SIFTGPU to handle high-resolution images.<br />
Feature Match<br />
Adapted from SIFTGPU to handle high-resolution images.<br />
GPU KD-Tree Traversal (to be improved)<br />
Pthread + GPU<br />
Multi-GPU control class<br />
GPU Surf Descriptor (rewrite in progress)
SBA on GPU<br />
ECCV2010, no code;<br />
Float or Double<br />
Iterate in Float first, then iterate in Double<br />
Fermi GPU<br />
GPU acceleration<br />
SBA internal functions
KD-Tree Traversal on GPU<br />
Each block searches for one point<br />
Surf Descriptor<br />
Experimental evaluation<br />
Accuracy of camera localization?<br />
CPU and GPU parallelism<br />
Feature Detecting<br />
Visual Words
Skeletal Graph<br />
Connectivity of the Canonical View set<br />
Size of the Submap Set<br />
Meaning and evaluation of the Canonical View<br />
Scene Summarization for Online Image Collections.<br />
Ian Simon, Noah Snavely…
Further parallelization<br />
Connect three GPU-equipped machines and distribute the tasks among them<br />
MPI + Pthread + CUDA<br />
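How the tasks would be split across machines is not specified. As one simple possibility, the sketch below deals the image pairs out round-robin to a fixed number of workers; in an MPI setting each worker index would correspond to one rank. The function name and the choice of round-robin are assumptions.

```python
from itertools import combinations

def partition_pairs(num_images, num_workers):
    """Deal all unordered image pairs out to the workers in round-robin order."""
    buckets = [[] for _ in range(num_workers)]
    for k, pair in enumerate(combinations(range(num_images), 2)):
        buckets[k % num_workers].append(pair)
    return buckets

# Example: 10 images give 45 pairs, split across 3 machines
buckets = partition_pairs(10, 3)
for worker, work in enumerate(buckets):
    print(f"worker {worker}: {len(work)} pairs")
```

Round-robin balances the number of pairs per worker but not the number of distinct images each worker has to load, so a block-wise split may be preferable when I/O dominates.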
Open problem<br />
GPU acceleration of PMVS<br />
Once Bundler has been parallelized, PMVS may become the speed bottleneck.