From Contours to 3D Object Detection and Pose Estimation

Nadia Payet and Sinisa Todorovic

Wednesday, November 30, 11


Problem Statement

Given a single image:

1. Detect an object of interest
2. Delineate its boundaries
3. Estimate its continuous 3D pose


Prior Work

Generative models, e.g., aspect graphs
Discriminative models, e.g., structured prediction

Koenderink & van Doorn 79
Kushal et al. 04
Savarese & Fei-Fei 07-09
Arie & Basri 09
Hu & Zhu 10
Hoiem et al. 07
Su et al. ICCV 09
Ozuysal et al. 09
Liebelt & Schmid 08-10
Gu & Ren 10

Main characteristics of recent work:

• Local image features
• Sophisticated models
• 3D pose = Interpolation of viewpoint classes


To Bridge the Semantic Gap...

Recent work, typically: pixels → local features → gap → model → semantic level


To Bridge the Semantic Gap...

Recent work, typically: pixels → local features → gap → model → semantic level

Our approach: pixels → contours → mid-level features → model → semantic level

Prior work: Lowe & Binford 85; Cyr & Kimia 04


To Bridge the Semantic Gap...

Recent work, typically: pixels → local features → gap → model → semantic level

Our approach: pixels → contours → BoBs → model → semantic level

Prior work: Zhu et al. 08; Zhang et al. 11


Bags of Boundaries = BoBs

If an object occurs, it must be in the spotlight of many BoBs jointly supporting the occurrence hypothesis.


Bags of Boundaries = BoBs

\[
\underbrace{s}_{\text{histogram of boundaries}}
\;=\;
\underbrace{\text{shape context}}_{\#\text{bins}\,\times\,\#\text{contours}}
\;\cdot\;
\underbrace{\text{latent indicator of boundaries}}_{\#\text{contours}}
\]

Zhu et al. 08, Zhang et al. 11
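The BoB definition on this slide is a matrix-vector product, which can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `bob_histogram`, the toy numbers, and the use of plain Python lists are hypothetical and not taken from the slides. The shape-context matrix has one row per bin and one column per contour, and the indicator has one entry per contour.

```python
def bob_histogram(shape_context, indicator):
    """Return s = shape_context * indicator (matrix-vector product).

    shape_context: list of rows, #bins x #contours.
    indicator:     one value in [0, 1] per contour; 1 means the contour
                   is treated as an object boundary (assumed convention).
    """
    if any(len(row) != len(indicator) for row in shape_context):
        raise ValueError("each row needs one entry per contour")
    return [sum(v * x for v, x in zip(row, indicator)) for row in shape_context]


# Hypothetical toy values: 3 bins, 4 contours.
shape_context = [
    [2, 0, 1, 0],
    [0, 3, 0, 1],
    [1, 1, 0, 4],
]
indicator = [1, 0, 1, 0]  # contours 0 and 2 are selected as boundaries

s = bob_histogram(shape_context, indicator)
print(s)  # [3, 0, 1]
```

Because the indicator is latent, the histogram changes whenever the selection of boundaries changes, which is what separates a BoB from a histogram over fixed, observable features.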


Bags of Boundaries vs. Bags-of-Words

BoBs: Histogram of hidden features that must be inferred
BoWs: Histogram of observable features


Approach

Stage labels in the pipeline figure: input → contour extraction (Zhu et al. ICCV07) → grid of BoBs → object model → estimate of 3D pose.

Later builds of the figure replace "contour extraction" with "selected boundaries" and "grid of BoBs" with "grid warping", and end with: input → object model → output.


Object Model = Shape Templates

2D probabilistic maps of shape for a set of viewpoints


Learning

Training images arranged as a grid: rows image 1, ..., image m; columns view 1, view 2, view 3, ..., view n.

Table top dataset, Sun et al. 10


Example Shape Templates

AUTOCAD dataset, Liebelt & Schmid 08-10


Representation of the Shape Template

Regular grid of shape-context descriptors + Affine projection matrix T
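To make the "regular grid + affine projection matrix T" representation concrete, here is a minimal sketch of warping grid locations with an affine map. It assumes, purely for illustration, that T is a 2×3 matrix acting on homogeneous 2D points [x, y, 1]; the slides do not state the dimensions of T, and the helper names `regular_grid` and `apply_affine` are hypothetical.

```python
def regular_grid(rows, cols, spacing=1.0):
    """Return (x, y) locations of a rows x cols regular grid, row by row."""
    return [(c * spacing, r * spacing) for r in range(rows) for c in range(cols)]


def apply_affine(T, points):
    """Map each (x, y) to T * [x, y, 1]^T for a 2x3 affine matrix T (assumed shape)."""
    (a, b, tx), (c, d, ty) = T
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Hypothetical values: a 2 x 3 grid, sheared along x and then translated.
grid = regular_grid(2, 3, spacing=10.0)
T = [
    [1.0, 0.5, 5.0],
    [0.0, 1.0, -2.0],
]
warped = apply_affine(T, grid)
for original, moved in zip(grid, warped):
    print(original, "->", moved)
```

Each grid location would carry one shape-context descriptor, so warping the grid moves the places where the descriptors are compared without changing the descriptors themselves.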


Inference = Matching of BoBs

Image BoBs are matched against template 1, template 2, ..., template n under an arbitrary affine projection.


Example Problem: Object Recognition

Given a set of edges in the image, detect and localize all object instances and estimate their 3D pose.

Payet & Todorovic ICCV11


Matching Formulation

\[
\min_{X,F,T}\;
\operatorname{tr}\{C^\top(X)\,F\}
\;+\; \alpha\,\|TQF^\top - P\|
\;+\; \beta\,\|(TQF^\top - P) - (TQF^\top - P)\,W^\top\|
\]

\[
\text{s.t.}\quad X \in [0,1]^N;\quad T \in \mathcal{T};\quad F \ge 0;\quad F^\top 1_N = 1_M;\quad F\,1_M \le 1_N
\]
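The constraints on the correspondence matrix F (non-negative entries, every column summing to one, every row summing to at most one) and the linear cost term tr{Cᵀ F} can be checked numerically with a short sketch. Everything here is an assumption made for illustration: F and C are taken to be N×M lists of lists, C is treated as a fixed matrix even though the formulation lets it depend on X, and the terms involving T, Q, P, and W are left out. The function names are hypothetical.

```python
def matching_cost(C, F):
    """Linear term tr(C^T F), which equals the sum of C[i][j] * F[i][j]."""
    return sum(c * f for c_row, f_row in zip(C, F) for c, f in zip(c_row, f_row))


def is_feasible(F, tol=1e-9):
    """Check F >= 0, F^T 1_N = 1_M (columns sum to 1), F 1_M <= 1_N (rows sum to <= 1)."""
    n_rows, n_cols = len(F), len(F[0])
    non_negative = all(f >= -tol for row in F for f in row)
    columns_sum_to_one = all(
        abs(sum(F[i][j] for i in range(n_rows)) - 1.0) <= tol for j in range(n_cols)
    )
    rows_at_most_one = all(sum(row) <= 1.0 + tol for row in F)
    return non_negative and columns_sum_to_one and rows_at_most_one


# Hypothetical toy problem with N = 3 rows and M = 2 columns.
C = [
    [0.2, 0.9],
    [0.8, 0.1],
    [0.5, 0.5],
]
F_good = [[1, 0], [0, 1], [0, 0]]  # each column used once, each row at most once
F_bad = [[1, 1], [0, 0], [0, 0]]   # row 0 is assigned twice

print(is_feasible(F_good), matching_cost(C, F_good))
print(is_feasible(F_bad))
```

Under these constraints every column of F receives exactly one unit of assignment, while rows may stay unmatched, so a feasible F behaves like a soft one-to-one matching from the M side into the N side.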


Results: Object Detection

PASCAL VOC 2006 car dataset
Car show dataset


Results: Viewpoint Classification

3D Object dataset: Cars


Results: 3D Pose Estimation

Correct detection, localization, and pose estimation


Conclusion

• Recent work:
  • Pre-selected local features
  • Sophisticated object models and algorithms
• Our approach:
  • Mid-level features allow for:
    • Abstracting low-level features
    • Synergistic bottom-up/top-down interaction
    • Simple models and algorithms
