
Discriminative Learning of Local Image Descriptors



Discriminative Learning of Local Image Descriptors

Authors: Matthew Brown, Gang Hua, Simon Winder

Presenter: 樊彬


Outline

Author Information
Paper Information
Problem Addressed and Approach
The Proposed Method
Experiments
Conclusions


About the Authors

Matthew Brown
Postdoctoral Fellow at the Ecole Polytechnique Fédérale de Lausanne (EPFL)
PhD in Computer Science (UBC, 2005)
MEng in Electrical and Information Sciences (Cambridge, 2000)
Known for his work on automatic 2D image stitching
http://cvlab.epfl.ch/~brown/research/research.html

Gang Hua
Senior Researcher at Nokia Research Center, Hollywood
Scientist at Microsoft Live Labs Research from 2006 to 2009
PhD in Electrical and Computer Engineering (Northwestern University, 2006)
M.S. and B.S. in Electrical Engineering (Xi'an Jiaotong University, 2002 and 1999)
http://www.eecs.northwestern.edu/~ganghua/

Simon Winder
Senior Developer at Microsoft Research
http://research.microsoft.com/en-us/people/swinder/




Paper Information

Venue: PAMI 2010, to appear

Related work:
Learning Local Image Descriptors. S. Winder and M. Brown. (CVPR 2007)
Discriminant Embedding for Local Image Descriptors. G. Hua, M. Brown and S. Winder. (ICCV 2007)
Picking the Best DAISY. S. Winder, G. Hua and M. Brown. (CVPR 2009)


Abstract

A realistic ground-truth dataset of matched patches based on multi-view stereo data
A set of building blocks for constructing descriptors
Parametric learning for local image descriptors
Non-parametric learning for local image descriptors
Dimensionality reduction
Descriptors that exceed state-of-the-art performance at lower dimensionality




Problem Addressed and Approach

On one hand, although local descriptors are highly valued and widely used in computer vision, most existing local descriptors are hand-designed feature transforms.

On the other hand, while learning-based methods are widely used in high-level vision tasks, low-level vision processing rarely makes use of them.

This paper proposes an automatic, learning-based approach to local descriptor design: given training samples, optimal non-parametric and parametric local descriptors are learned via linear discriminant analysis and Powell minimization, respectively.




The Proposed Method

The Framework

The descriptor pipeline consists of four stages: G-block → T-block → S-block/E-block → N-block.

G-block: Gaussian smoothing.

T-block: a non-linear transformation applied to each sample grid position in the smoothed patch (the "simple-cell" stage).

S-block/E-block: spatial pooling of the T-block responses (the "complex-cell" operations). The S-block uses parameterized pooling regions; the E-block is non-parametric.

N-block: SIFT-style normalization.


The T-block maps the smoothed input patch, sampled as a grid of intensities,

    I11 I12 I13 I14
    I21 I22 I23 I24
    I31 I32 I33 I34
    I41 I42 I43 I44

to an output grid with one length-k vector per sample:

    f11 f12 f13 f14
    f21 f22 f23 f24
    f31 f32 f33 f34
    f41 f42 f43 f44


T-block variants (1)

T1: Angle-quantized gradients. The gradient magnitude is linearly assigned to the two adjacent orientation bins.
T1a: 4 quantized directions; T1b: 8 quantized directions.

T2: Rectified gradients. The gradient vector is split into positive and negative rectified components.
T2a: {|∇x| − ∇x; |∇x| + ∇x; |∇y| − ∇y; |∇y| + ∇y}
T2b: the same four components plus their 45°-rotated counterparts, giving 8 components.


T-block variants (2)

T3: Steerable filters.
T3g: 2nd order, 4 orientations; T3h: 4th order, 4 orientations;
T3i: 2nd order, 8 orientations; T3j: 4th order, 8 orientations.

T4: Difference-of-Gaussians (DoG) responses.
D1 = I(σ1) − I(σ2),  D2 = I(σ3) − I(σ2)
T4: {|D1| − D1; |D1| + D1; |D2| − D2; |D2| + D2}
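Taken together, the blocks compose a full descriptor. The following is a minimal Python sketch, assuming a T1a-style transform (4 orientation bins, linear assignment to adjacent bins), a square 4×4 pooling grid standing in for the S-block, and SIFT-style clipping in the N-block; all parameter values are illustrative, not the authors' learned settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def describe(patch, sigma=1.0, n_bins=4, grid=4, clip=0.2):
    # G-block: Gaussian smoothing of the input patch.
    p = gaussian_filter(patch.astype(float), sigma)

    # T-block (T1a-style): angle-quantized gradients; the magnitude is
    # linearly assigned to the two adjacent orientation bins.
    gy, gx = np.gradient(p)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi) * n_bins / (2 * np.pi)
    lo = np.floor(ang).astype(int) % n_bins
    hi = (lo + 1) % n_bins
    frac = ang - np.floor(ang)
    resp = np.zeros(p.shape + (n_bins,))
    ii, jj = np.indices(p.shape)
    resp[ii, jj, lo] = mag * (1 - frac)
    resp[ii, jj, hi] += mag * frac

    # S-block stand-in: sum T-block responses over a grid x grid layout of cells.
    cells = []
    for rows in np.array_split(np.arange(p.shape[0]), grid):
        for cols in np.array_split(np.arange(p.shape[1]), grid):
            cells.append(resp[np.ix_(rows, cols)].sum(axis=(0, 1)))
    d = np.concatenate(cells)

    # N-block: SIFT-style normalize, clip, renormalize.
    d /= np.linalg.norm(d) + 1e-12
    d = np.minimum(d, clip)
    return d / (np.linalg.norm(d) + 1e-12)

desc = describe(np.random.default_rng(0).random((32, 32)))
print(desc.shape)  # 4x4 cells x 4 orientation bins = 64 dimensions
```

With a 4×4 grid and 4 orientation bins this yields a 64-dimensional descriptor; swapping in other T- and S-block choices changes only the middle stages.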


S-block: parameterized spatial pooling regions (figure: pooling-region layouts).


E-block

E1: PCA (Principal Component Analysis)
E2: LPP (Locality Preserving Projections)
E4: LDE (Local Discriminative Embedding)
E6: GLDE (Generalized Local Discriminative Embedding)
E3, E5, E7: orthogonal versions of E2, E4, E6


Learning Parametric Descriptors

Parameters: the parameters of the G-, T-, S-, and N-blocks.
Objective: maximize the area under the ROC curve.
ROC: true positive rate vs. false positive rate.
Optimization method: Powell's multidimensional direction-set method.
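As a sketch of this learning loop (my construction on toy data, not the authors' code), the following tunes two hypothetical descriptor parameters, a G-block smoothing width and an N-block clip threshold, by maximizing ROC AUC with SciPy's Powell optimizer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.optimize import minimize

# Hypothetical two-parameter descriptor: sigma (G-block) and clip (N-block).
def descriptor(patch, sigma, clip):
    g = gaussian_filter(patch, abs(sigma))
    gy, gx = np.gradient(g)
    d = np.concatenate([np.abs(gx).ravel(), np.abs(gy).ravel()])
    d /= np.linalg.norm(d) + 1e-12
    d = np.minimum(d, abs(clip))
    return d / (np.linalg.norm(d) + 1e-12)

def auc(pos_scores, neg_scores):
    # P(random match scores above random non-match) = area under the ROC.
    wins = (pos_scores[:, None] > neg_scores[None, :]).mean()
    ties = (pos_scores[:, None] == neg_scores[None, :]).mean()
    return wins + 0.5 * ties

def neg_auc(params, pairs, labels):
    dists = np.array([np.linalg.norm(descriptor(a, *params) - descriptor(b, *params))
                      for a, b in pairs])
    scores = -dists  # smaller distance = better match score
    return -auc(scores[labels == 1], scores[labels == 0])

# Toy data: matches are noisy copies, non-matches are independent patches.
rng = np.random.default_rng(0)
pairs, labels = [], []
for _ in range(30):
    p = rng.random((16, 16))
    pairs.append((p, p + 0.05 * rng.standard_normal(p.shape))); labels.append(1)
    pairs.append((rng.random((16, 16)), rng.random((16, 16)))); labels.append(0)
labels = np.array(labels)

res = minimize(neg_auc, x0=[1.0, 0.2], args=(pairs, labels), method="Powell")
print("learned (sigma, clip):", res.x, " AUC:", -res.fun)
```

Powell's method needs no gradients, which is why it suits a black-box objective like AUC over a descriptor pipeline.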


Learning Non-Parametric Descriptors (E-block)

Input: S = {⟨x_i = T(p_i), x_j = T(p_j), l_ij⟩}, where l_ij = 1 marks a match pair and l_ij = 0 a non-match pair.
Output: the optimized projections w.

E2: minimize the distance between match pairs while keeping the overall variance of the vectors in the match-pair set as large as possible in the projected space; equivalently, maximize

    J1(w) = Σ_i (wᵀ x_i)² / Σ_{l_ij=1} (wᵀ(x_i − x_j))²


Learning Non-Parametric Descriptors (E-block)

E4: seek the embedding space in which the distances between match pairs are minimized and the distances between non-match pairs are maximized:

    J2(w) = Σ_{l_ij=0} (wᵀ(x_i − x_j))² / Σ_{l_ij=1} (wᵀ(x_i − x_j))²

E6: find projections that minimize the ratio of the in-class variance of the match pairs to the total data variance; equivalently, maximize

    J3(w) = Σ_{x_i∈S} (wᵀ x_i)² / Σ_{l_ij=1} (wᵀ(x_i − x_j))²


Learning Non-Parametric Descriptors (E-block)

All three objectives are generalized Rayleigh quotients of the form

    J_i(w) = (wᵀ A_i w) / (wᵀ B w),  i = 1, 2, 3,

where

    A1 = Σ_i x_i x_iᵀ
    A2 = Σ_ij (1 − l_ij)(x_i − x_j)(x_i − x_j)ᵀ
    A3 = Σ_{x∈S} x xᵀ
    B  = Σ_{l_ij=1} (x_i − x_j)(x_i − x_j)ᵀ

The optimal projections are the leading solutions of the generalized eigenvalue problem

    A_i w = λ B w
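As a toy illustration (synthetic data of my own construction, not the authors' experiments), an LDE-style objective can be maximized with SciPy's symmetric-definite generalized eigensolver:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
d, n = 8, 200
xi = rng.standard_normal((n, d))
xj_match = xi + 0.1 * rng.standard_normal((n, d))  # match pairs (l_ij = 1)
xj_non = rng.standard_normal((n, d))               # non-match pairs (l_ij = 0)

# B: scatter of match-pair differences; A2: scatter of non-match differences.
dm = xi - xj_match
dn = xi - xj_non
B = dm.T @ dm + 1e-6 * np.eye(d)  # small ridge keeps B positive definite
A2 = dn.T @ dn

# eigh solves A2 w = lambda B w; the top eigenvectors maximize J2.
vals, vecs = eigh(A2, B)
W = vecs[:, ::-1][:, :4]  # the 4 most discriminative projections
print("largest generalized eigenvalue:", vals[-1])
```

Projecting descriptors with W shrinks match-pair distances relative to non-match distances, which is exactly what the ratio J2 rewards.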


Learning Non-Parametric Descriptors (E-block)

Orthogonality constraint on the projections: given the previously found projections w_1, w_2, …, w_{k−1}, the next projection solves

    w_k = arg max_w (wᵀ A_i w) / (wᵀ B w)
    s.t. wᵀ w_j = 0,  j = 1, 2, …, k − 1
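One way to realize this constraint (my sketch, not necessarily the authors' procedure) is to solve each successive Rayleigh quotient inside the orthogonal complement of the projections already found:

```python
import numpy as np
from scipy.linalg import eigh, null_space

def next_projection(A, B, W_prev):
    # Columns of Z span the subspace orthogonal to all previous projections.
    Z = null_space(W_prev.T) if W_prev.size else np.eye(A.shape[0])
    # Solve the reduced generalized eigenproblem in that subspace.
    vals, vecs = eigh(Z.T @ A @ Z, Z.T @ B @ Z)
    return Z @ vecs[:, -1]  # back-project the top eigenvector

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6)); A = M @ M.T          # symmetric PSD numerator
N = rng.standard_normal((6, 6)); B = N @ N.T + 6 * np.eye(6)  # PD denominator
w1 = next_projection(A, B, np.empty((6, 0)))
w2 = next_projection(A, B, w1[:, None])
print("orthogonality check:", abs(w1 @ w2))
```

Because w2 is built from a basis of the null space of w1ᵀ, the constraint wᵀ w_j = 0 holds by construction rather than via penalties.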




Experiments

Data: roughly 2.5 million labeled match and non-match pairs, taken from 3D reconstructions of three natural scenes: Yosemite, Notre Dame, and Liberty.

Evaluation
ROC curves: correct match fraction vs. incorrect match fraction.
95% error rate: the fraction of incorrect matches accepted at the threshold where 95% of correct matches are found.
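The 95% error rate can be computed directly from the two distance distributions; here is a small sketch with synthetic distances (toy numbers, not the paper's data):

```python
import numpy as np

def error_rate_at_95(dist_match, dist_non):
    # Threshold at which 95% of the true match pairs are accepted...
    thr = np.quantile(dist_match, 0.95)
    # ...and the fraction of non-match pairs that slip under that threshold.
    return float(np.mean(dist_non <= thr))

rng = np.random.default_rng(1)
dist_match = rng.normal(1.0, 0.2, 1000)  # distances for true matches
dist_non = rng.normal(2.0, 0.4, 1000)    # distances for non-matches
print(f"95% error rate: {error_rate_at_95(dist_match, dist_non):.3f}")
```

The better the descriptor separates the two distributions, the lower this single-number summary.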


Parametric Descriptors

(Result figures.)

Observations on the learned descriptors:
1. Concave pooling shapes.
2. The farther from the center, the larger the summation region.
3. Performance better than SIFT, but at higher dimensionality.


Non-Parametric Descriptors

(Result figures; trained on Yosemite, tested on Notre Dame.)


Dimension-Reduced Parametric Descriptors

(Result figures.)


Effects of Normalization

(Figure: performance vs. the clipping threshold, expressed as r/√D for descriptor dimension D.)
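The N-block's normalize-clip-renormalize step can be sketched as follows, writing the clip threshold as r/√D per the slide; the ratio r = 1.6 used here is illustrative, not a value from the paper.

```python
import numpy as np

def n_block(desc, r=1.6):
    # Normalize to unit length.
    d = desc / (np.linalg.norm(desc) + 1e-12)
    # Clip elements at r / sqrt(D) to limit the influence of any single bin.
    d = np.minimum(d, r / np.sqrt(len(d)))
    # Renormalize so the clipped vector is unit length again.
    return d / (np.linalg.norm(d) + 1e-12)

v = np.array([10.0, 1.0, 1.0, 1.0])
out = n_block(v)
print(out.round(3), float(np.linalg.norm(out)))
```

Clipping caps the contribution of dominant bins (e.g., from a strong edge), which is the same robustness trick SIFT uses.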




Conclusions

The techniques have been used in Photosynth and ICE (Image Composite Editor).
Photosynth: www.photosynth.com
ICE: http://research.microsoft.com/ivm/ice.html




Conclusions

Recommendations by the authors:
1. Learn parameters from training data.
2. Use foveated summation regions.
3. Use non-linear filter responses.
4. Use LDA for discriminative dimensionality reduction.
5. Normalize the descriptor.


Thanks!

Questions?
