Friday, 25 November 2011

ICCV 2011 Tutorial on Parts-Based Deformable Registration

Simon Lucey and Jason Saragih

ICT Centre Computer Vision (CI2CV)

Commonwealth Scientific and Industrial Research Organisation

Australia

What is an Object?

[Figure: a face and a body.]

What is an Object?

Face: Left Eye, Right Eye, Nose, Mouth

Body: Head, Torso, Right Upper Arm, Right Lower Arm, Left Upper Arm, Left Lower Arm, Right Thigh, Right Calf, Left Thigh, Left Calf

What is a Part?

[Figure: image patches labelled "Left Eye", "Not Left Eye", "Left Eye".]

When is a Collection of Parts an Object?

[Figure: two arrangements of facial parts labelled "Face", and a third arrangement labelled "Not a Face".]

What is an Object?

"A collection of semantically meaningful components with geometrical constraints on their spatial configuration."

Data

Registration

This is an image of a body. Show me the...

Head? Torso? Right Upper Arm? Right Lower Arm? Left Upper Arm? Left Lower Arm? Right Thigh? Right Calf? Left Thigh? Left Calf?

Applications

Puppetry, Gesture Controller, Analysis

No Silver Bullet

The best method is domain specific. Registration draws on Probability Theory, Graphical Models, Machine Learning, Optimisation, Signal Processing and Computational Geometry.

Regularising Part Geometry

Regularising Geometry: Outline

1. Heuristic Regularisation
2. Learned Regularisation
   1. Part Topology (Graphical Models)
   2. Instance Topology
      1. Linear Models (PCA)
      2. Mixture Models (Clustering & EM)
      3. Non-parametric Models (KPCA & GPLDM)
3. Image-specific Regularisation (PRA)

Why Regularise?

Appearance Variability: the same part can look very different from image to image, so a part detector misses it (False Negatives). [7] Gross et al.ʼ08

Appearance Ambiguity: other image regions can look like the part, so a part detector fires in the wrong place (False Positives). [7] Gross et al.ʼ08

Why Regularise?

[Figure: local part detections for Head and Right Shoulder [1] Felzenszwalb et al.’09, and for Left Lip Corner and Right Eye Corner.]

What is a Regulariser?

By Bayes' rule, the posterior over part locations x given the image I is

$$p(\mathbf{x} \mid I) \propto \underbrace{p(I \mid \mathbf{x})}_{\text{Likelihood}}\; \underbrace{p(\mathbf{x})}_{\text{Prior}}$$

Maximising the posterior is equivalent to minimising its negative logarithm:

$$\arg\max_{\mathbf{x}}\; p(I \mid \mathbf{x})\, p(\mathbf{x}) \;=\; \arg\min_{\mathbf{x}}\; -\log\{p(I \mid \mathbf{x})\, p(\mathbf{x})\} \;=\; \arg\min_{\mathbf{x}}\; -\log\{p(I \mid \mathbf{x})\} - \log\{p(\mathbf{x})\}$$

The second term, $-\log\{p(\mathbf{x})\}$, is the regulariser. It penalises part configurations that are unlikely under the prior.

What is a Regulariser?

[Figure: two valid facial configurations ("Face", "Face") and one with the parts in an invalid arrangement ("Not a Face").]

Regularisation helps disambiguate candidate locations of parts by enforcing geometric dependencies between parts!

[Figure: per-part likelihoods $p(I \mid \mathbf{x}_1)$ and $p(I \mid \mathbf{x}_2)$, and the posterior $p(\mathbf{x} \mid I)$, each plotted against θ.] [22] Saragih et al.ʼ10
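The toy problem below is a minimal sketch of this idea. It is not taken from [22]. Two parts live on a 1D grid. The appearance costs `u1` and `u2` are hypothetical, and the likelihood is assumed to factorise over parts. The prior is an assumed quadratic spring (`OFFSET`, `STIFFNESS`). Part 1 is ambiguous, so on its own it picks the wrong peak. Adding the regulariser lets the confident part 2 pull it back to the geometrically consistent location.

```python
import numpy as np

n = 20  # candidate locations per part on a 1D grid (toy setting)

# Hypothetical unary costs -log p(I | x_i); lower means a better appearance match.
u1 = np.full(n, 5.0)
u1[3], u1[12] = 0.5, 0.0  # part 1 is ambiguous: the wrong peak at 12 scores slightly better
u2 = np.full(n, 5.0)
u2[7] = 0.0               # part 2 is unambiguous

OFFSET, STIFFNESS = 4, 1.0  # assumed mean displacement x2 - x1 and spring stiffness


def neg_log_prior(x1, x2):
    """-log p(x) up to a constant: a quadratic spring on the relative location."""
    return STIFFNESS * (x2 - x1 - OFFSET) ** 2


# Likelihood only: each part independently takes its best-scoring location.
x_ml = (int(np.argmin(u1)), int(np.argmin(u2)))

# MAP: jointly minimise -log p(I|x) - log p(x) over every pair of locations.
X1, X2 = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
energy = u1[X1] + u2[X2] + neg_log_prior(X1, X2)
x_map = tuple(int(i) for i in np.unravel_index(np.argmin(energy), energy.shape))

print(x_ml, x_map)  # (12, 7) (3, 7)
```

Brute-force search over all pairs is only feasible for this toy. The graph-structured priors later in this section exist to make the joint minimisation tractable.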

Heuristic Regularisation

Laplacian Regularisation

[Figure: a 3D mesh (axes x, y, z) with vertices x1, ..., x9.] The vertex locations are stacked into a single vector x = [x1; ...; xn].

Smooth Deformation Basis

[Figure: basis vectors Ψ0, Ψ1, Ψ2, Ψ3, Ψ25, Ψ50, ordered by increasing (approximate) frequency.]
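A minimal sketch of a Laplacian regulariser follows. It assumes the smooth basis is the set of eigenvectors of the graph Laplacian, which is one common construction. It also replaces the 3D mesh with a hypothetical chain of 9 vertices and regularises one coordinate only; a 3D mesh would apply the same R(x) to each of the x, y and z coordinates. The quantity to watch is $R(\mathbf{x}) = \mathbf{x}^T L \mathbf{x}$. For a unit-norm eigenvector, R equals that eigenvector's eigenvalue, so low-frequency deformations are cheap and high-frequency ones are expensive.

```python
import numpy as np


def chain_laplacian(n):
    """Graph Laplacian of a chain x1 - x2 - ... - xn: |N_i| on the diagonal, -1 between neighbours."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def laplacian_regulariser(x, L):
    """R(x) = x^T L x, which equals the sum over edges of (x_i - x_j)^2."""
    return float(x @ L @ x)


n = 9
L = chain_laplacian(n)
evals, Psi = np.linalg.eigh(L)  # columns of Psi sorted by increasing eigenvalue (~ frequency)

# A unit-norm basis vector is penalised by exactly its eigenvalue, so smooth
# (low-frequency) deformations cost little and oscillatory ones cost a lot.
for k in (0, 1, 2, n - 1):
    print(f"Psi{k}: R = {laplacian_regulariser(Psi[:, k], L):.3f}")
```

Psi0 is the constant vector (a pure translation), so R assigns it zero cost.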

Heuristic Regularisation: Recap

- Regularisation is important because image measurements alone are not enough.
- Priors model the space of valid instances of an object’s geometry.
- Regularisers penalise object geometry outside the space of valid instances.
- The smoothness assumption is a good heuristic.
- Laplacian regularisers enforce smoothness by penalising high-frequency variations more heavily than lower-frequency variations.
- The notion of frequency being penalised can be specialised to the topology of the object by defining a specialised graph Laplacian.
- But... is that the best we can do?

Prior Regulariser

Data Driven (Learned) Regularisers

What if we have annotated data? [3] Huang et al.ʼ07

Topology of Samples vs. Parts

[Figure: the Data, viewed as a set of Samples and as a set of Parts.]

Types of Graphs

Maximally Connected: $R(\mathbf{x}) = R(\mathbf{x}_1, \ldots, \mathbf{x}_n)$, i.e. the regulariser couples every part with every other part.

[Figure: Maximally Connected, Star, Unconnected and Sparse graph topologies.]

Optimal Sparse Graphs

[Figure: for Face, Body and Hand, panels titled Mean, Procrustes, Correlation and Sparse Graph.] [24] Gu et al.’07

Tree-Structured Graphs

[Figure: exact inference on a tree by a Forwards pass followed by a Backwards pass; example tree topologies: Star and Spanning tree.]

Graph Potentials

ψij(xi, xj): encodes knowledge of the distribution of the relative spatial location of parts i and j.

Spring Models: [1] Felzenszwalb et al.’09, [25] Yang and Ramanan’11

$$\psi_{ij}(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{w}_{ij}^T [dx;\, dx^2;\, dy;\, dy^2], \qquad [dx;\, dy] = \mathbf{x}_i - \mathbf{x}_j$$

Sparse Gaussian Prior:

$$\psi_{ij}(\mathbf{x}_i, \mathbf{x}_j) = ([\mathbf{x}_i; \mathbf{x}_j] - \boldsymbol{\mu}_{ij})^T \Sigma_{ij}^{-1} ([\mathbf{x}_i; \mathbf{x}_j] - \boldsymbol{\mu}_{ij})$$

Summed over the edges of the tree, these potentials give a joint Gaussian regulariser

$$R(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}), \qquad \mathbf{x} = [x_1; \ldots; x_n; y_1; \ldots; y_n]$$

whose precision matrix $\Sigma^{-1}$ has a sparsity structure that follows the tree topology!
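A minimal sketch of the forwards/backwards pass on a tree follows. It assumes 1D part locations on a small grid, hypothetical unary costs, and a 1D spring potential $w(x_c - x_p - d)^2$ in the spirit of the spring models above. The minimisation over the parent's location for each edge is done by brute force, which costs O(n_locs²) per edge. [1] uses generalised distance transforms to make this step faster. The brute-force check confirms that the dynamic programme finds the global minimum.

```python
import itertools

import numpy as np


def tree_map(unary, edges, offsets, w=1.0):
    """Exact MAP on a tree by min-sum dynamic programming.

    unary   : (n_parts, n_locs) costs -log p(I | x_i) on a 1D grid of locations
    edges   : (parent, child) pairs, part 0 is the root, each parent listed before its children
    offsets : dict mapping (parent, child) to the spring's rest displacement d
    Pairwise cost (a 1D spring): psi(x_p, x_c) = w * (x_c - x_p - d) ** 2.
    """
    n_parts, n_locs = unary.shape
    locs = np.arange(n_locs)
    cost = unary.astype(float).copy()
    best_child = {}
    # Forwards pass (leaves -> root): fold each child's subtree cost into its parent.
    for p, c in reversed(edges):
        pair = w * (locs[None, :] - locs[:, None] - offsets[(p, c)]) ** 2  # rows x_p, cols x_c
        total = pair + cost[c][None, :]
        best_child[(p, c)] = total.argmin(axis=1)
        cost[p] += total.min(axis=1)
    # Backwards pass (root -> leaves): read off the optimal location of each child.
    x = np.zeros(n_parts, dtype=int)
    x[0] = int(cost[0].argmin())
    for p, c in edges:
        x[c] = best_child[(p, c)][x[p]]
    return x, float(cost[0].min())


# Hypothetical star-shaped model: a root part with two leaves.
rng = np.random.default_rng(0)
unary = 4 * rng.random((3, 6))
edges = [(0, 1), (0, 2)]
offsets = {(0, 1): 2, (0, 2): -1}
x, e = tree_map(unary, edges, offsets)


def energy(c):
    """Total cost of a joint configuration c = (x0, x1, x2), for the brute-force check."""
    return sum(unary[i, ci] for i, ci in enumerate(c)) + (c[1] - c[0] - 2) ** 2 + (c[2] - c[0] + 1) ** 2


brute = min(energy(c) for c in itertools.product(range(6), repeat=3))
print(x, e, brute)
```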

Outline

x = [x1; ...; xn; y1; ...; yn] → Procrustes → Learning [4] Matthews et al.ʼ07

Procrustes Analysis

• The geometry of an object in an image is a composition of rigid and non-rigid motion.
• Datasets are often contrived.
• An object can appear anywhere in the image.
• Should we place a prior over the rigid component of an object?
• Procrustes analysis: remove the rigid component before modelling!

[6] Gowerʼ04
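The alignment step can be sketched as follows. This sketch removes translation, rotation and isotropic scale, i.e. a similarity transform. Including scale is an assumption, although it is common practice for shape models. The rotation comes from the SVD solution to the orthogonal Procrustes problem, with a sign correction that forbids reflections. The generalised version alternates between aligning every shape to the current mean and re-estimating that mean. It is a simple iterative scheme, not the globally optimal method of [5].

```python
import numpy as np


def procrustes_align(shape, ref):
    """Similarity-align `shape` (n x 2) onto `ref` (n x 2): remove translation, rotation and scale."""
    X = shape - shape.mean(axis=0)
    Y = ref - ref.mean(axis=0)
    U, S, Vt = np.linalg.svd(X.T @ Y)
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
    R = U @ D @ Vt                                       # optimal rotation, applied as X @ R
    s = np.trace(np.diag(S) @ D) / np.sum(X ** 2)        # optimal isotropic scale
    return s * X @ R + ref.mean(axis=0)


def generalized_procrustes(shapes, n_iter=10):
    """Alternate between aligning every shape to the mean and re-estimating the mean."""
    mean = shapes[0] - shapes[0].mean(axis=0)
    mean = mean / np.linalg.norm(mean)
    for _ in range(n_iter):
        aligned = np.array([procrustes_align(s, mean) for s in shapes])
        mean = aligned.mean(axis=0)
        mean = mean - mean.mean(axis=0)
        mean = mean / np.linalg.norm(mean)  # fix the scale so the mean cannot shrink
    return aligned, mean
```

After alignment, only the non-rigid variation remains in the data. The linear and non-linear priors that follow are learned from this residual.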

Learning a Regulariser

How do we learn a prior (i.e. a regulariser) over part geometry?

[Figure: example training data from MultiPIE and CMU MoCap.]

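Principal Component Analysis (PCA)

[Figure: the Mean µ and the First 3 Modes of Variation Ψ1, Ψ2, Ψ3.]

PCA Regularisation

$$R(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = \tilde{\mathbf{x}}^T \Psi \Lambda^{-1} \Psi^T \tilde{\mathbf{x}}$$

where $\tilde{\mathbf{x}} = \mathbf{x} - \boldsymbol{\mu}$ and $\Sigma = \Psi \Lambda \Psi^T$ is the eigendecomposition of the covariance, with the modes as the columns of Ψ and their variances on the diagonal of Λ.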
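The identity above can be checked numerically with the sketch below. It fits PCA to a matrix of (assumed Procrustes-aligned) shape vectors, then evaluates R(x) through the shape parameters $\mathbf{b} = \Psi^T\tilde{\mathbf{x}}$. The helper names are illustrative. Keeping all modes reproduces the full Mahalanobis distance. Keeping only the top k modes, as is usual in practice, ignores the residual outside the subspace, so the truncated value can only be smaller.

```python
import numpy as np


def fit_pca(X, k):
    """X: (n_samples, d) aligned shapes. Returns mean, top-k modes Psi (d x k) and their variances."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    lam, Psi = np.linalg.eigh(Sigma)      # eigenvalues in ascending order
    top = np.argsort(lam)[::-1][:k]       # indices of the k largest modes
    return mu, Psi[:, top], lam[top]


def pca_regulariser(x, mu, Psi, lam):
    """R(x) = x~^T Psi Lambda^{-1} Psi^T x~ with x~ = x - mu (Mahalanobis distance within the subspace)."""
    b = Psi.T @ (x - mu)                  # shape parameters: coordinates along each mode
    return float(np.sum(b ** 2 / lam))
```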

Leveraging 3D Geometry

- Most objects we’re interested in exist in the 3D world!
- 2D vs. 3D Models: 2D models require up to 6 times as many basis vectors! [18] Matthews et al.ʼ06
- Use a 3D model to generate better 2D models! [19] Igual and De la Torreʼ10
- Given a set of images, how do we build a 3D model? Structure From Motion (SFM)

Pinhole Camera

2D Basis from 3D Models

[19] Igual and De la Torreʼ10

[Figure: 2D bases generated from 3D models, using MultiPIE data.] [7] Gross et al.ʼ08, [18] Matthews et al.ʼ06
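One way to make "2D basis from 3D models" concrete is sketched below. Several assumptions go into it. The camera is orthographic rather than a full pinhole model. Only yaw varies. The rotations are sampled at discrete angles, whereas [19] integrates over rotations continuously, so this is a simplification and not their method. The sketch renders a 3D shape at each angle, stacks each projection as x = [x1; ...; xn; y1; ...; yn], and runs PCA. For a rigid shape rotating in yaw, the projections span only two modes. The 2D model therefore needs more modes than the 3D model needs to explain pose alone.

```python
import numpy as np


def rot_y(theta):
    """Rotation about the vertical (y) axis, i.e. a change in yaw."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def project(P3, R):
    """Orthographic camera (an assumption, not a full pinhole model): rotate, then drop depth.
    Returns the 2D shape stacked as [x1; ...; xn; y1; ...; yn]."""
    P2 = (P3 @ R.T)[:, :2]
    return P2.T.ravel()


def basis_from_3d(P3, yaws, k):
    """PCA on 2D projections of a 3D shape rendered at a set of sampled yaw angles."""
    X = np.array([project(P3, rot_y(t)) for t in yaws])
    mu = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T, S ** 2 / (len(X) - 1)   # mean, 2D modes, variance of every mode
```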

Linear Models: Recap

How Good is a Linear Parameterisation?

[Figure: projections of 3000 faces onto the first 3 modes, annotated "Monsters!".]

Clustering

Spectral Clustering

Cluster in a space that respects local topology!

- Topology = Graph Connectivity

Graph Laplacian:

$$L_{ij} = \begin{cases} |\mathcal{N}_i| & \text{if } i = j \\ -1 & \text{if } j \in \mathcal{N}_i \\ 0 & \text{otherwise} \end{cases}$$

where $\mathcal{N}_i$ is the set of neighbours of node i.

- The eigenvectors with the smallest eigenvalues maximally preserve topology!
- The Laplacian is representation agnostic!

[Figure: Euclidean K-means vs. Spectral Coeffs vs. Spectral Clustering.] [9] Ng’01

Spectral Clustering: Why Does it Work?

A toy example: [Figure: a graph over the points x1, ..., x6.]
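A minimal sketch of the toy example follows. The six-node graph itself is hypothetical: two triangles joined by a single edge. The sketch builds the unnormalised Laplacian defined above and splits the graph by the sign of the eigenvector with the second-smallest eigenvalue (the Fiedler vector). This is a simplification. [9] uses a normalised Laplacian and runs k-means on several eigenvectors to obtain k clusters.

```python
import numpy as np


def graph_laplacian(A):
    """L_ij = |N_i| if i == j, -1 if j is a neighbour of i, 0 otherwise (A: symmetric 0/1 adjacency)."""
    return np.diag(A.sum(axis=1)) - A


def spectral_bipartition(A):
    """Two-way spectral split: the sign of the eigenvector with the second-smallest eigenvalue."""
    _, evecs = np.linalg.eigh(graph_laplacian(A))  # eigenvalues ascending; the first is 0 (constant vector)
    return (evecs[:, 1] > 0).astype(int)


# Hypothetical toy graph: triangles {x1, x2, x3} and {x4, x5, x6} joined by one edge x3 - x4.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

labels = spectral_bipartition(A)
print(labels)  # one triangle gets label 0, the other label 1 (which is which depends on the eigenvector's sign)
```

The split depends only on connectivity, not on where the points sit in space. This is the sense in which the Laplacian is representation agnostic.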

Mixture Model Regularisation

[Figure: the data PDF and its Mixture Model approximation.]
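A mixture prior turns into a regulariser in the same way as a single Gaussian: $R(\mathbf{x}) = -\log \sum_k \pi_k\, \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_k, \Sigma_k)$. The sketch below evaluates it stably with the log-sum-exp trick. The two-component parameters are hypothetical stand-ins for values that EM might return. Unlike a single Gaussian fitted to the same two clusters, the mixture does not favour the empty midpoint between them.

```python
import numpy as np


def gaussian_logpdf(x, mu, Sigma):
    """log N(x; mu, Sigma) for a single multivariate Gaussian."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(Sigma, diff))


def mixture_regulariser(x, weights, means, covs):
    """R(x) = -log sum_k pi_k N(x; mu_k, Sigma_k), evaluated stably via log-sum-exp."""
    logs = np.array([np.log(w) + gaussian_logpdf(x, m, S) for w, m, S in zip(weights, means, covs)])
    top = logs.max()
    return float(-(top + np.log(np.sum(np.exp(logs - top)))))


# Hypothetical two-cluster prior, e.g. parameters that EM might return.
weights = [0.5, 0.5]
means = [np.array([-5.0, 0.0]), np.array([5.0, 0.0])]
covs = [np.eye(2), np.eye(2)]
print(mixture_regulariser(means[1], weights, means, covs), mixture_regulariser(np.zeros(2), weights, means, covs))
```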

Clustering: Recap

- Linear (Gaussian) models may not be sufficiently descriptive of the data.
- Clustering: divide the data into regions that can each be modelled linearly.
- Clustering is NP-hard -> optimise using alternation (K-means).
- Euclidean K-means only works well for isotropic clusters in flat space.
- Spectral clustering better accounts for local topology.
- Jointly modelling a prior over clustered data: EM algorithm.
- But... does the data naturally form clusters?

[Figure from [15] Urtasunʼ06.]
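The alternation mentioned in the recap is Lloyd's K-means algorithm [8]. A minimal sketch follows. It assigns each point to its nearest centre, then moves each centre to the mean of its points, and repeats until the centres stop moving. Plain random initialisation is used here for brevity; k-means++ [10] is a better seeding strategy. Because the distance is Euclidean, the clusters it finds are implicitly isotropic.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre re-estimation."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centre moves to the mean of its points (kept in place if it lost them all).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres
```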

Non-parametric Models

Can we capture nonlinearity without clustering?

Kernel-PCA (KPCA): model a linear subspace in a non-linear feature space.

KPCA
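A minimal kernel PCA sketch in the spirit of [16] follows. The RBF kernel and its `gamma` are assumed choices. The sketch runs linear PCA in the feature space implicitly: it eigendecomposes the centred kernel matrix, then centres the kernel of new points in the same way before projecting them. Using KPCA as a shape regulariser, as in [23], would add a feature-space reconstruction error on top of this; that step is not shown here.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """k(a, b) = exp(-gamma * ||a - b||^2) for every pair of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)


def kpca_fit(X, k, gamma=1.0):
    """Kernel PCA: linear PCA in the feature space, computed through the centred kernel matrix."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    J = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(J @ K @ J)      # ascending order
    evals, evecs = evals[::-1][:k], evecs[:, ::-1][:, :k]
    alphas = evecs / np.sqrt(evals)               # normalise so each feature-space axis has unit length
    return K, alphas, evals


def kpca_project(X_new, X, K, alphas, gamma=1.0):
    """Coordinates of new points along the kernel principal components."""
    n, m = len(X), len(X_new)
    K_new = rbf_kernel(X_new, X, gamma)
    one_nn = np.ones((n, n)) / n
    one_mn = np.ones((m, n)) / n
    K_new_c = K_new - one_mn @ K - K_new @ one_nn + one_mn @ K @ one_nn  # centre in feature space
    return K_new_c @ alphas
```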

Data Driven Regularisation: Recap

• With enough data we can build highly sophisticated models to capture the spatial relationships between parts.
• The efficacy of a regulariser depends on how easy it is to optimise.
• Leading methods for face and body alignment use simple (Gaussian) priors.

[Figure: the trade-off between Power and Tractability.]

Holistic Registration

Why do we need regularisation again?

Why not learn how the whole object looks?

A Machine Learning Perspective

[Figure relating x and y.]

Image-Specific Regularisation

PRA Prior

Where are we?

[Figure: body parts (Right Lower Arm, Head, ...).]

References

[1] P. Felzenszwalb, R. Girshick, D. McAllester and D. Ramanan, "Object Detection with Discriminatively Trained Part-Based Models", PAMI (2009).

[2] B. Horn and B. Schunck, "Determining optical flow", AI (1981).

[3] G. Huang, M. Ramesh, T. Berg and E. Learned-Miller, "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments", University of Massachusetts, Amherst, Technical Report 07-49 (October 2007).

[4] I. Matthews, J. Xiao and S. Baker, "2D vs. 3D Deformable face models: Representational power, construction and real-time fitting", IJCV (2007).

[5] D. Pizarro and A. Bartoli, "Global optimization for optimal generalized procrustes analysis", CVPR (2011).

[6] J. Gower and G. Dijksterhuis, "Procrustes Problems", Oxford University Press (2004).

[7] R. Gross, I. Matthews, J. Cohn, T. Kanade and S. Baker, "Multi-PIE", AFGR (2008).

[8] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press (1967).

[9] A. Ng, M. Jordan and Y. Weiss, "On Spectral Clustering: Analysis and an Algorithm", NIPS (2001).

[10] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding", ACM-SIAM SODA (2007).

[11] J. Chen, L. Tang, J. Liu and J. Ye, "A Convex Formulation for Learning Shared Structures from Multiple Tasks", ICML (2009).

[12] P. Baldi and K. Hornik, "Neural Networks and Principal Component Analysis: Learning from Examples Without Local Minima", Neural Networks (1989).

[13] B. Moghaddam and A. Pentland, "Probabilistic Visual Learning for Object Representation", PAMI (1997).

[14] R. Tibshirani, "Regression shrinkage and selection via the lasso", J. Royal Statist. Soc. B (1996).

[15] R. Urtasun, "Motion Models for Robust 3D Human Body Tracking", PhD Thesis (2006).

[16] B. Schölkopf, A. Smola and K.-R. Müller, "Kernel Principal Component Analysis", ICANN (1997).

[17] R. Prim, "Shortest connection networks and some generalizations", Bell System Technical Journal, 36 (1957).

[18] I. Matthews, J. Xiao and S. Baker, "On the Dimensionality of Deformable Face Models", CMU-RI-TR-06-12 (2006).

[19] L. Igual and F. De la Torre, "Continuous Procrustes Analysis to Learn 2D Shape Models from 3D Objects", 3rd Workshop on Non-Rigid Shape and Deformable Image Alignment, in conjunction with CVPR (2010).

[20] J. Xiao, J. Chai and T. Kanade, "A closed-form solution to non-rigid shape and motion recovery", ECCV (2004).

[21] I. Akhter, Y. Sheikh, S. Khan and T. Kanade, "Trajectory Space: A Dual Representation for Nonrigid Structure from Motion", PAMI (2011).

[22] J. Saragih, S. Lucey and J. Cohn, "Deformable Model Fitting by Regularized Landmark Mean-Shifts", IJCV (2010).

[23] S. Romdhani, A. Psarrou and S. Gong, "Multi-View Nonlinear Active Shape Model using Kernel PCA", BMVC (1999).

[24] L. Gu, E. Xing and T. Kanade, "Learning GMRF Structures for Spatial Priors", CVPR (2007).

[25] Y. Yang and D. Ramanan, "Articulated pose estimation with flexible mixtures-of-parts", CVPR (2011).

[26] J. Saragih, "Principal Regression Analysis", CVPR (2011).

Learning Responses

Outline

1. Exhaustive Search - Sampling:
   (i) Tolerance to Shift
   (ii) Faces (Pixels) vs. Bodies (HOG)
   (iii) Photometric Normalization
2. Exhaustive Search - Learning Detectors:
   (i) Generative vs. Discriminative
   (ii) Linear Support Vector Machines
   (iii) Latent Support Vector Machines
3. Beyond Translation - Factorizing Warps

A Single Patch (N=1)

$$\min_{\mathbf{p}}\; D(\mathbf{p}), \qquad N = 1$$

Naive Approach

• If you were coming to this problem straight from machine learning, you might suggest the following. Take the "images at various warps" (p) and turn them into "vectors of pixel values at each warp position":

[255,134,45,.......,34,12,124,67]
[123,244,12,.......,134,122,24,02]
[67,13,245,.......,112,51,92,181]
...........
[65,09,67,.......,78,66,76,215]

Then apply a "matching function" D( ) to each vector and threshold it:

$$D(\cdot) \;\geq\; Th \;\Rightarrow\; \text{object}, \qquad D(\cdot) \;<\; Th \;\Rightarrow\; \text{background}$$

We refer to this as Exhaustive Search!!!
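A minimal sketch of exhaustive search over integer translations follows. It assumes the warp is a pure translation and uses the negated sum of squared differences as the matching function, so that a higher D means a better match, consistent with the "object if D ≥ Th" rule above. The threshold value is hypothetical.

```python
import numpy as np


def exhaustive_search(image, template, th):
    """Score every integer translation p with D(p) = -||I(p) - T||^2 (higher = better match),
    then label positions with D(p) >= th as 'object' and the rest as 'background'."""
    h, w = template.shape
    H, W = image.shape
    scores = np.empty((H - h + 1, W - w + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            patch = image[r:r + h, c:c + w]          # the image "at warp p" as a vector of pixels
            scores[r, c] = -np.sum((patch - template) ** 2)
    return scores, np.argwhere(scores >= th)
```

The cost grows with the number of warps visited times the template size. This is why the next question is how densely the warp parameters must be sampled.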

Sampling?

• How do we sample every warp parameter value?

[Figure: "Possible Source Warps" on a grid over the warp parameters p = {p1, p2}, with sampling steps ∆p1 and ∆p2.]

Sampling

• Digital data acquisition, Shannon sampling theorem: if you sample densely enough (above the Nyquist rate, i.e. more than twice the highest frequency present in the signal), you can perfectly reconstruct the original data.
• We will show that the desired sampling rate depends on the "centre frequency" of the salient edges.

[Figure: sampling in time vs. space.]

Measures of Similarity

• The sampling strategy is dependent on the similarity measure.
• Sum of squared differences (SSD), in "vector form", between the "source image" I and the "model" T:

$$D(\mathbf{p}) = \|I(\mathbf{p}) - T(\mathbf{0})\|^2$$

• Linear correlation, between the "source image" I and the "template" T:

$$D(\mathbf{p}) = -I(\mathbf{p})^T T(\mathbf{0})$$

This can be done efficiently using 2D convolutions....
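The two measures are closely related. Expanding the SSD gives $\|I(\mathbf{p})\|^2 - 2\,I(\mathbf{p})^T T(\mathbf{0}) + \|T(\mathbf{0})\|^2$. When the patch energy $\|I(\mathbf{p})\|^2$ is roughly constant, minimising SSD is therefore the same as maximising correlation. The sketch below computes the correlation term for every translation at once with an FFT, by convolving with the flipped template. It then builds the SSD from three such correlations. Translation-only warps and the function names are assumptions made for illustration.

```python
import numpy as np


def correlate_fft(image, template):
    """I(p)^T T for every translation p at once, via FFT-based convolution with the flipped template."""
    H, W = image.shape
    h, w = template.shape
    F = np.fft.rfft2(image)
    G = np.fft.rfft2(template[::-1, ::-1], s=(H, W))   # zero-padded to the image size
    full = np.fft.irfft2(F * G, s=(H, W))
    return full[h - 1:, w - 1:]                         # keep only placements fully inside the image


def ssd_fft(image, template):
    """D_SSD(p) = ||I(p)||^2 - 2 I(p)^T T + ||T||^2, with every term computed by correlation."""
    patch_energy = correlate_fft(image ** 2, np.ones_like(template))
    return patch_energy - 2 * correlate_fft(image, template) + np.sum(template ** 2)
```

The linear-correlation measure on the slide is then simply `-correlate_fft(image, template)`.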

Oriented Edges

• It is well known that natural images can be represented as a linear summation of oriented edges.
• "Useful as edges capture ONLY relative local change in intensity...."

[Figure: excerpt from Berg & Malik, 2001. Figures 5 and 6: % error rate when identifying 200 random test images after 20° and 40° rotation, plotted against the amount of geometric blur α (Figure 5) and of uniform Gaussian blur σ (Figure 6). Figure 7: the twelve half-wave rectified channels, contrast normalized, from the response of 6 oriented edge filters on the image; white indicates zero and black a positive value. The filter responses are sparse, which makes the individual channels appropriate for geometric blur.]

Excerpt (Berg & Malik, 2001): "The test patterns used in this example were random with each pixel in a disc of radius 25 pixels being turned on with probability 5%. Figures 5 and 6 show the mis-classification rate as the amount of blur, α or σ, is varied. Geometric blur has much better discriminative power, and manages to be general enough to handle large rotation somewhat more effectively than uniform blur. [...] If C = [C1(x) ... Cn(x)] is a vector of channel values at x, then the normalized version would be $\mathbf{C}/(\|\mathbf{C}\|_2 + \epsilon)$, where we use an ε = 0.3 for filters with response between +1 and −1. [...] One useful consequence of treating the positive and negative components of oriented edge responses separately is that information about zero crossings is not lost under blurring. Instead of blurring the signal response around a zero crossing to zero, the positive and negative responses are both blurred over the area, retaining the information that there was a zero crossing, but allowing uncertainty as to its position."

Sensitivity to Shift

[Figure: pixel intensity vs. pixel coordinates for the "model" T(0) and the "image" I(p) under Warp(p), and the resulting D(p) as a function of the warp; the annotations ∆pl and ∆ph mark the sampling steps for the two cases shown.]

Sensitivity to Shift

• Clearly, sampling is linked to the centre frequency of the edge.

[Figure: "Possible Source Warps" on a grid over p = {p1, p2}, sampled with step ∆pl or ∆ph.]

Sensitivity to Shift

• Edges have a centre frequency/wavelength λ.
• Clearly, the wavelength of the most "salient" edge is proportional to the sample size:

$$\lambda \propto \Delta p$$
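The proportionality can be illustrated numerically with the sketch below. It models an "edge" as a hypothetical Gaussian-windowed cosine (a Gabor profile) whose window width is tied to its wavelength λ. It then measures how far the SSD basin D(p) around the true alignment p = 0 extends before it stops rising. Doubling λ roughly doubles the basin, so the sampling step ∆p can grow in proportion to λ.

```python
import numpy as np

X = np.linspace(-100.0, 100.0, 4001)  # dense 1D pixel coordinates


def edge_profile(x, wavelength):
    """Hypothetical 'edge' with a given centre wavelength: a Gaussian-windowed cosine (a Gabor),
    with the window width tied to the wavelength (an assumption)."""
    return np.cos(2 * np.pi * x / wavelength) * np.exp(-x ** 2 / (2 * wavelength ** 2))


def ssd_vs_shift(wavelength, shifts):
    """D(p) = ||T(x - p) - T(x)||^2 for each shift p."""
    T = edge_profile(X, wavelength)
    return np.array([np.sum((edge_profile(X - p, wavelength) - T) ** 2) for p in shifts])


def shift_tolerance(wavelength, shifts):
    """Largest shift over which D(p) keeps rising from its minimum at p = 0, i.e. the basin half-width."""
    D = ssd_vs_shift(wavelength, shifts)
    i = 1
    while i < len(D) - 1 and D[i + 1] > D[i]:
        i += 1
    return float(shifts[i])


shifts = np.arange(0.0, 10.0, 0.05)
for lam in (4.0, 8.0):
    print(lam, shift_tolerance(lam, shifts))
```

For this profile the basin ends at roughly half a wavelength. Samples spaced further apart than that could step over the correct match entirely.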

Sensitivity to Shift

• Tolerance to shift is also important when learning a model. "More on this soon."

[Figure 8: Example of frontal upright face images used for training.]

Pixels Good for Faces

• Faces generally have their salient edges in the mid-frequency range, so dealing with pixels directly gives good results.

[Figure: face-detection review slide (Boosting; "Challenge #2: efficiency"): a 640×480 image scanned at scales 24, 36, 48, 60, 72, sliding every 8 pixels ⇒ 24,000 subwindows to classify.]

Schneiderman & Kanade, 1999; Viola & Jones, 2001

Pixels not Good for Bodies

• Human bodies rely on contrast with the background.
• They are reliant on high-frequency edges.
• As a result, they have poor tolerance to shift.

[Figures: from Deva Ramanan, "2.2 Oriented gradient descriptors", Fig. 3: an image and its HOG descriptor, visualized by rendering each oriented edge with intensity equal to its histogram count, where the histogram is computed over an 8 × 8 pixel neighborhood. From Dalal & Triggs, 2005: HOG detectors cue mainly on silhouette contours (especially the head, shoulders and feet).]
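A simplified HOG-style descriptor is sketched below to make the idea concrete. For each cell, it computes a histogram of unsigned gradient orientation weighted by gradient magnitude. The 8 × 8 cells and 9 orientation bins follow common choices but are assumptions here. Block normalisation and bin interpolation, both part of the full descriptor of Dalal & Triggs, are omitted. Pooling orientations over a cell is what buys the tolerance to small shifts that raw pixels lack.

```python
import numpy as np


def cell_orientation_histograms(img, cell=8, n_bins=9):
    """Simplified HOG-style features: for each cell x cell block of pixels, a histogram of unsigned
    gradient orientation weighted by gradient magnitude (no block normalisation or interpolation)."""
    gy, gx = np.gradient(img.astype(float))              # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, n_bins))
    for r in range(rows):
        for c in range(cols):
            win = (slice(r * cell, (r + 1) * cell), slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bins[win].ravel(), weights=mag[win].ravel(), minlength=n_bins)
    return hist
```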

What About Blurring?

• As pointed out in seminal work by Berg and Malik (CVPR’01), the effectiveness of SSD will degrade with significant viewpoint change.
• Two options to match patches:
  1. simultaneously estimate the distortion and position of the matching patch;
  2. "blur" the template window, performing matching coarse-to-fine.
• Option 2 is attractive: low computational cost!

Berg & Malik, 2001

“HF Edge”

“Blur Kernel”

Friday, 25 November 2011

What About Blurring?

∗

113

“HF Edge”

“Blur Kernel”

“Output”

Friday, 25 November 2011

What About Blurring?

∗

113

“HF Edge”

“Blur Kernel”

“Output”

Friday, 25 November 2011

What About Blurring?

λe

∗

λb

113

What About Blurring?

• Clearly, blurring a high-frequency edge filter simply lowers its centre frequency (not what we want): $\lambda_b > \lambda_e$, where $\lambda_b$ is the “blurred edge wavelength” and $\lambda_e$ is the “high frequency edge wavelength”.
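The following is a quick numerical check of the claim above. It is a minimal sketch under assumed parameters that do not come from the slides: a 1-D Gabor-like “HF edge” filter with a wavelength of 8 samples, and a Gaussian blur kernel with σ = 3 samples. Blurring moves the spectral peak to a lower frequency, so $\lambda_b > \lambda_e$.

```python
import numpy as np

def peak_frequency(signal):
    """Frequency (cycles/sample) at which the magnitude spectrum peaks."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size)
    return freqs[np.argmax(spectrum)]

n = 1024
x = np.arange(n) - n // 2
lambda_e = 8.0   # assumed wavelength of the "HF edge" filter (samples)
width = 8.0      # assumed std. dev. of the filter's Gaussian envelope

# A 1-D Gabor-like high-frequency edge filter.
edge = np.exp(-x**2 / (2 * width**2)) * np.cos(2 * np.pi * x / lambda_e)

# Normalized Gaussian "blur kernel" (assumed sigma = 3, truncated at +/-12).
k = np.arange(-12, 13)
blur = np.exp(-k**2 / (2 * 3.0**2))
blur /= blur.sum()

output = np.convolve(edge, blur, mode="same")
lambda_b = 1.0 / peak_frequency(output)
print(f"lambda_e = {1.0 / peak_frequency(edge):.2f}, lambda_b = {lambda_b:.2f}")
```

The blur acts as a low-pass weighting on the filter's spectrum. This drags the peak toward zero frequency instead of producing a spatially tolerant version of the same edge detector.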

Sparseness and Positiveness

• Blurring only works if the signals being matched are sparse

and positive.

• Unfortunately natural images are neither.

• Combination of oriented filter banks and rectification can

remedy this problem with little loss in performance.

[Figure: the image filtered (∗) by an oriented filter bank, followed by “Rectification”.]

[Figure: “Rectification” of a filter response $r$ (e.g. $r \cdot r$) “non-linearly sets the centre frequency to zero”.]

[Excerpt, Berg & Malik, 2001: “... and the rotated versions were blurred by either geometric blur or a uniform Gaussian blur. For geometric blur, a spatially varying kernel $K_x(y) = G_{\alpha|x|}(y)$, where $G_\sigma(y)$ is a Gaussian with standard deviation $\sigma$, was applied. For uniform Gaussian blur the kernel $G_\sigma(y)$ was applied. Then each blurred rotated pattern was compared to all the blurred original patterns using normalized correlation and matched to the closest one. The test patterns used in this example were random, with each pixel in a disc of radius 25 pixels being turned on with probability 5%. Figures 5 and 6 show the mis-classification rate as the amount of blur, α or σ, is varied. Geometric blur has much better discriminative power, and manages to be general enough to handle large rotation somewhat more effectively than uniform blur.”]

[Figure 5: Identifying 200 random test images after rotation, using various amounts (α) of geometric blur. Axes: % error rate vs. amount of geometric blur, α (0 to 1); curves for 20° and 40° rotation.]

[Figure 6: Identifying 200 random test images after rotation, using various amounts (σ) of uniform Gaussian blur. Axes: % error rate vs. amount of uniform blur, σ (0 to 10); curves for 20° and 40° rotation.]

[Excerpt, Berg & Malik, 2001: “... considering real images, we break images up into a number of channels. Each channel is a half-wave rectified oriented edge response. In particular, if $E(x)$ is a filter then two channels would be $C_1(x) = [(I * E)(x)]\,\chi[(I * E)(x) > 0]$ and $C_2(x) = -[(I * E)(x)]\,\chi[(I * E)(x) < 0]$ ... Figure 7 shows an image and a set of 12 channels resulting from 6 oriented edge filters.”]

[Figure 7: The twelve half-wave rectified channels, contrast normalized, from the response of 6 oriented edge filters on the image. White indicates zero, and black indicates a positive value. Note that the filter responses are sparse, making the individual channels appropriate for geometric blur.]
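The two-channel split can be sketched in NumPy. This is a minimal illustration, not Berg & Malik's implementation. The oriented filter $E$ is assumed to be a steered first derivative ($\cos\theta\,\partial_x I + \sin\theta\,\partial_y I$) rather than their filter bank, and no contrast normalization is applied. Six orientations give twelve non-negative, mostly-zero channels, which is what makes blurring them meaningful. The example also makes the growth in representation size concrete.

```python
import numpy as np

def rectified_channels(image, n_orient=6):
    """Half-wave rectify oriented edge responses into 2 * n_orient channels.

    Assumption: the oriented filter is a steered first derivative,
    r = cos(theta) * dI/dx + sin(theta) * dI/dy.
    """
    gy, gx = np.gradient(np.asarray(image, dtype=float))
    channels = []
    for k in range(n_orient):
        theta = np.pi * k / n_orient
        r = np.cos(theta) * gx + np.sin(theta) * gy
        channels.append(np.maximum(r, 0.0))    # C1: positive half
        channels.append(np.maximum(-r, 0.0))   # C2: negative half, sign flipped
    return np.stack(channels)                  # shape (2 * n_orient, H, W)

# A vertical step edge: the rectified channels are sparse and non-negative.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
C = rectified_channels(img)
print(C.shape, (C > 0).mean())
```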

Sparseness and Positiveness

• Comes at additional computational cost, as the new representation is F times larger (where F is the number of filters employed).

• Downsampling strategies can be employed to lessen this problem.

$I(p) \rightarrow \phi\{I(p)\}$, where $\phi\{\}$ is the image descriptor function.

[Figure: example descriptor $\phi\{I(p)\}$: (d) a test image; (f,g) its R-HOG descriptor weighted by, respectively, the positive and the negative SVM weights.]

Sensitivity to Shift

[Figure: matching cost $D(p)$ plotted against the warp $p$ (pixel coordinates) for edge-energy and rectified-edge representations, with No Blurring, Gaussian Blur, and Histogram Blur.]
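The shift-sensitivity plots can be mimicked in 1-D. The sketch uses assumed data: three isolated unit spikes stand in for a rectified edge map, blurred with a Gaussian of σ = 3 samples. $D(p)$ is the SSD between a signal and a circularly shifted copy of itself. `basin_width` is a hypothetical helper that counts the shifts whose cost stays below half its maximum. Without blur, only $p = 0$ lies in the basin; after blurring, the basin widens to several pixels.

```python
import numpy as np

def ssd_over_shifts(template, signal, max_shift):
    """D(p) = sum_x (T(x) - I(x + p))^2 for integer shifts p (circular)."""
    shifts = np.arange(-max_shift, max_shift + 1)
    costs = np.array([np.sum((template - np.roll(signal, -p)) ** 2) for p in shifts])
    return shifts, costs

def basin_width(costs):
    """Hypothetical helper: number of shifts with cost below half the maximum."""
    return int(np.sum(costs / costs.max() < 0.5))

# A sparse, positive "rectified edge" signal: three isolated spikes (assumed).
x = np.zeros(100)
x[[20, 50, 80]] = 1.0

# Normalized Gaussian blur (assumed sigma = 3 samples).
k = np.arange(-12, 13)
g = np.exp(-k**2 / (2 * 3.0**2))
g /= g.sum()
x_blur = np.convolve(x, g, mode="same")

_, d_raw = ssd_over_shifts(x, x, max_shift=8)
_, d_blur = ssd_over_shifts(x_blur, x_blur, max_shift=8)
print(basin_width(d_raw), basin_width(d_blur))
```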

Photometric Normalization

• Nearly all parts-based registration methods employ some sort of photometric normalization.

• Based on the old idea of the “Reflectance Perception Model”:

$I(x,y) \cdot \frac{1}{L(x,y)} = R(x,y)$ (Image × 1/Illuminance = Reflectance)

[Figure: Original PIE images and Processed PIE images. Fig. 2. Result of removing illumination variations with our algorithm for a set of images from the PIE database.]

[Excerpt, Gross & Brajovic, 2003, “3 Face Recognition Across Illumination / 3.1 Databases and Algorithms”: “We use images from two publicly available databases in our evaluation: CMU PIE database and Yale database. The CMU PIE database contains a total of 41,368 images taken from 68 individuals [18]. The subjects were imaged in the CMU 3D Room using a set of 13 synchronized high-quality color cameras and 21 flashes. For our experiments we use images from the more challenging illumination set which was captured without room lights (see Figure 2).”]

Gross & Brajovic 2003, Land & McCann, 1971
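The following is a minimal sketch of the reflectance model $R = I / L$. Gross & Brajovic estimate the illuminance $L$ with an anisotropic smoothing. This sketch swaps in a plain Gaussian blur of $I$, a simpler retinex-style estimate that is known to produce halos near strong edges, so treat it as an illustration of the model rather than their algorithm. The `sigma` and `eps` values are assumptions. $R$ is only recovered up to a global scale.

```python
import numpy as np

def gaussian_kernel(sigma):
    r = int(np.ceil(3 * sigma))
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2.0 * sigma**2))
    return k / k.sum()

def blur2d(image, sigma):
    """Separable Gaussian blur with reflect padding (output has input's shape)."""
    k = gaussian_kernel(sigma)
    r = k.size // 2
    padded = np.pad(image, r, mode="reflect")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def reflectance(image, sigma=8.0, eps=1e-6):
    """R = I / L, with L approximated by a Gaussian blur of I (an assumption)."""
    image = np.asarray(image, dtype=float)
    illuminance = blur2d(image, sigma)
    return image / (illuminance + eps)

H, W = 64, 64
illum = np.tile(np.linspace(1.0, 2.0, W), (H, 1))  # smooth left-to-right lighting
image = 0.5 * illum                                  # constant reflectance under it
R = reflectance(image, sigma=8.0)
r = 24                                               # kernel radius, ceil(3 * sigma)
print(R[r:-r, r:-r].min(), R[r:-r, r:-r].max())
```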

Photometric Normalization

• When dealing with “smaller” image patches, similar performance can be obtained through power normalization.
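One common reading of “power normalization” for small patches is to subtract the patch mean and scale to unit $\ell_2$ norm. That reading is assumed here rather than taken from the slides. The result is invariant to a global gain and bias change in intensity. The SSD between two normalized patches also equals $2 - 2\,\mathrm{ZNCC}$, which links it to correlation-based matching.

```python
import numpy as np

def power_normalize(patch, eps=1e-8):
    """Zero-mean, unit-norm normalization of an image patch (assumed meaning)."""
    v = np.asarray(patch, dtype=float).ravel()
    v = v - v.mean()
    return v / (np.linalg.norm(v) + eps)

rng = np.random.default_rng(0)
patch = rng.uniform(0, 255, size=(11, 11))
brighter = 1.7 * patch + 40.0   # global gain and bias change
print(np.allclose(power_normalize(patch), power_normalize(brighter)))
```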

SIFT descriptor [Lowe’99]

• Approach

– 8 orientations of the gradient (sparseness)

– 4x4 spatial grid (blur)

– soft-assignment to spatial bins of gradient magnitude (rectification)

– normalization of the descriptor to norm one (illumination)

– comparison with Euclidean distance (SSD)

[Figure: image patch → gradient → 3D histogram over (x, y, orientation)]

Lowe, 1999
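The recipe above can be sketched as a simplified SIFT-style descriptor. This is not Lowe's implementation. It uses hard (not soft) assignment to the 4x4 spatial bins and 8 orientation bins, and omits Gaussian weighting, dominant-orientation normalization and the clip-and-renormalize step. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_orient=8):
    """Simplified SIFT-style descriptor: grid x grid x n_orient histogram, unit norm."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)                                  # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)             # angle in [0, 2*pi)
    o_bin = np.minimum((ang * n_orient / (2 * np.pi)).astype(int), n_orient - 1)
    h, w = patch.shape
    y_bin, x_bin = np.meshgrid(np.arange(h) * grid // h,
                               np.arange(w) * grid // w, indexing="ij")
    hist = np.zeros((grid, grid, n_orient))
    np.add.at(hist, (y_bin, x_bin, o_bin), mag)             # hard spatial binning
    desc = hist.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc                # norm one (illumination)

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
d1 = sift_like_descriptor(patch)
d2 = sift_like_descriptor(3.0 * patch + 10.0)   # gain and bias change
print(d1.shape, np.sum((d1 - d2) ** 2))         # SSD between descriptors
```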

Learning Detectors

• SSD and correlation measures are good if the model stems from a single source image.

• They are not so good if there is noise stemming from appearance variation, e.g.:

[Figure: “Template”]

Discriminative Approaches

• Better generalization performance can often be realized by

learning the difference between two classes.

• We no longer get caught up with the problem of attempting

to synthesize all variations of an object.

“How do we get negative examples?”

Positive Examples for Detection

[Figure 8: Example of frontal upright face images used for training.]

• Coarsely normalize for scale, orientation and translation.

• It is a bad idea to normalize too much, due to the nature of the discrete exhaustive search (ES).

• HOG-style descriptors help when salient information is in high-frequency edges.

Negative Examples for Detection

• Obtain a large number of non-object images (through the web).

• Randomly sample windows at various positions within the images.

Linear Support Vector Machines

• Linear support vector machines (SVMs) are by FAR the

most popular classifier for deformable parts detection.

• They are extremely efficient to train (Pegasos, Liblinear, etc.).

• Can be interpreted as linear correlation, which can be applied very

efficiently using 2D convolutions.

• Have low capacity, as much of the complexity in the object is now

modeled in the shape, not the appearance.
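A linear SVM scores a window $z$ as $w^T z + b$, so evaluating it at every position of a feature map is just a 2-D correlation of the map with the weight template $w$. Below is a minimal NumPy sketch. It assumes NumPy ≥ 1.20 for `sliding_window_view`. The single-channel map and the planted template are illustrative assumptions, and `linear_score_map` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def linear_score_map(feature_map, w, b=0.0):
    """Score w^T z + b for every window z of feature_map (a 2-D correlation)."""
    windows = sliding_window_view(feature_map, w.shape)  # (H-h+1, W-w+1, h, w)
    return np.einsum("ijkl,kl->ij", windows, w) + b

rng = np.random.default_rng(0)
w = rng.standard_normal((5, 5))   # hypothetical learned weight template
image = np.zeros((30, 30))
image[5:10, 7:12] = w             # plant the template at row 5, column 7
scores = linear_score_map(image, w)
print(np.unravel_index(np.argmax(scores), scores.shape))
```

For multi-channel features such as HOG, the same idea applies per channel with the channel scores summed. In practice an FFT-based or library convolution would replace the explicit windowing.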


Linear Support Vector Machines

[Figure: training examples $z_i$ (the $i$th training example) with binary labels $y_i = +1$ and $y_i = -1$, separated by a hyperplane with weight vector $w$.]

$\min_{w,\xi} \; \frac{1}{2} w^T w + C \sum_i \xi_i$ subject to $y_i\, w^T z_i \ge 1 - \xi_i$, $\xi_i \ge 0$, where $C$ = cost of error.

margin $= 2 / \|w\| \propto (w^T w)^{-1/2}$

Cortes & Vapnik, 1995
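A linear SVM with cost-of-error parameter $C$ can be trained with Pegasos, one of the solvers mentioned earlier. The sketch below is a minimal version of its stochastic sub-gradient update. It minimizes the Pegasos objective $\frac{\lambda}{2} w^T w + \frac{1}{n}\sum_i \max(0, 1 - y_i w^T z_i)$, which corresponds to the soft-margin SVM with $C = 1/(n\lambda)$. The bias is handled by appending a constant feature, and the optional projection step is omitted. The toy data are made up.

```python
import numpy as np

def pegasos(Z, y, lam=0.01, n_iter=2000, seed=0):
    """Stochastic sub-gradient descent on
    lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w^T z_i)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(y))          # pick one training example z_i
        eta = 1.0 / (lam * t)             # Pegasos step size
        violated = y[i] * (w @ Z[i]) < 1.0
        w = (1.0 - eta * lam) * w         # shrink: gradient of the regularizer
        if violated:                      # hinge loss active: move toward y_i z_i
            w = w + eta * y[i] * Z[i]
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (100, 2)), rng.normal(-2.0, 0.5, (100, 2))])
Z = np.hstack([X, np.ones((200, 1))])     # constant feature acts as the bias
y = np.concatenate([np.ones(100), -np.ones(100)])
w = pegasos(Z, y)
print("training accuracy:", np.mean(np.sign(Z @ w) == y))
```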

Imbalance between + & -

• There is always an inherent imbalance between positive (finite) and negative (near-infinite) examples.

• A common methodology (Dalal & Triggs, 2005) is to start with random negatives, then repeat:

1. Train the model.

2. Harvest false positives to define “hard negatives”.
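The mining loop can be sketched as follows. Everything here is illustrative. `train_linear_svm` is a compact Pegasos-style trainer with the bias folded in as a constant feature, and `mine_hard_negatives` is a hypothetical helper. The 2-D toy data are made up so the effect is visible: easy negatives sit far from the positives, while a few hard ones sit close to them. False positives are pool negatives that score above zero.

```python
import numpy as np

def train_linear_svm(Z, y, lam=0.01, n_iter=3000, seed=0):
    """Pegasos-style stochastic sub-gradient training (bias is a feature)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(y))
        eta = 1.0 / (lam * t)
        violated = y[i] * (w @ Z[i]) < 1.0
        w = (1.0 - eta * lam) * w
        if violated:
            w = w + eta * y[i] * Z[i]
    return w

def mine_hard_negatives(pos, neg_pool, n_init=50, n_rounds=3, seed=0):
    """Start from random negatives, then alternate training and harvesting."""
    rng = np.random.default_rng(seed)
    cache = set(rng.choice(len(neg_pool), size=n_init, replace=False).tolist())
    w = None
    for _ in range(n_rounds):
        neg = neg_pool[sorted(cache)]
        Z = np.vstack([pos, neg])
        y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
        w = train_linear_svm(Z, y)                        # 1. train model
        false_pos = np.flatnonzero(neg_pool @ w > 0.0)    # 2. harvest false positives
        cache.update(false_pos.tolist())
    return w, sorted(cache)

def with_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

rng = np.random.default_rng(1)
pos = with_bias(rng.normal(2.0, 0.3, size=(100, 2)))
easy = rng.normal(-3.0, 0.3, size=(990, 2))        # far from the positives
hard = rng.normal([1.0, 0.5], 0.1, size=(10, 2))   # close to the positives
neg_pool = with_bias(np.vstack([easy, hard]))
w, cache = mine_hard_negatives(pos, neg_pool)
print(len(cache), int(np.sum(neg_pool @ w > 0.0)))
```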

Latent Support Vector Machines

• Up until now, we have considered learning $D_i(p)$ and $R(p)$ independently of each other.

• Recent work has proposed a method to learn them jointly using a Latent Support Vector Machine (LSVM), with a linear score $w^T \Phi(I, p)$.

Felzenszwalb, McAllester, & Ramanan, 2008

Beyond Translation

• Is sensitivity to shift only an issue for translation?

[Figure: “Motion Field for Translation”, “Motion Field for Scale”, and “Motion Field for Rotation”.]

Factorizing Complex Warps

• How does one sample warps more complicated than translation and scale?

• Options are limited for an exhaustive search strategy.

Warp Functions

$W(x; p)$

“Curse of Dimensionality”

• As the dimensionality of $p$ increases, and assuming the same number $n$ of discrete samples per dimension, the number of searches becomes $O(n^D)$.

[Figure: No. of searches vs. dimensionality of $p$ ($D$), annotated “$D \le 3$” and “Impractical for …”.]

• Two central strategies for overcoming this problem:

1. Keep $D$ low.

2. Factorize $p$ by $N$ (no. of parts), so that $O(n^D) \rightarrow O(N n^{D/N})$.
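A worked count helps here. The numbers are illustrative assumptions, not values from the slides: four parts, each with a 2-D translation, sampled at $n = 10$ positions per dimension. Searching the joint 8-D space takes $10^8$ evaluations, while factorizing into four 2-D searches takes $4 \times 10^2$. The factorized count assumes each part can be searched on its own.

```python
def exhaustive_searches(n, D):
    """Evaluations for a joint grid search with n samples in each of D dimensions."""
    return n ** D

def factorized_searches(n, D, N):
    """Evaluations when p splits into N independent blocks of D / N dimensions."""
    if D % N:
        raise ValueError("this sketch assumes D divides evenly among the N parts")
    return N * n ** (D // N)

n, D, N = 10, 8, 4   # assumed: 4 parts, each with a 2-D (x, y) translation
print(exhaustive_searches(n, D), factorized_searches(n, D, N))
```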

Warp Factorization

[Figure: “Motion Field for Rotation”; “Local regions can be approximated by translation.”]

• The dimensionality of each $p_i$ should be as small as possible to make local exhaustive search tractable.

[Figure: the rotation motion field divided into four local regions with parameters $p_1, p_2, p_3, p_4$.]

$p = [p_1^T, p_2^T, p_3^T, p_4^T]^T$

Exhaustive Search (N > 1)

“Source Image”

[255,134,45,.......,34,12,124,67]

[123,244,12,.......,134,122,24,02]

[67,13,245,.......,112,51,92,181]

...........

[65,09,67,.......,78,66,76,215]

“Vectors of pixel values at each warp position”

Exhaustive Search (N > 1)

[Figure: Feature Extraction → Local Search → Part Responses. The descriptor $\phi\{I\}$ of the image $I$ is filtered (∗) with each part template $\phi\{T_1(0)\}, \phi\{T_2(0)\}, \ldots, \phi\{T_N(0)\}$ to give the response maps $D_1, D_2, \ldots, D_N$. The response of the root filter (on the feature map) and the transformed responses of the part filters (on the feature map at twice the resolution) are summed (+) into a combined score of root locations; responses are shown with a color encoding of filter response values.]

Felzenszwalb, Girshick, McAllester & Ramanan, 2010

Faces and Bodies

[Figure: example parts, e.g. Head and Right Shoulder on a body; Left Lip Corner and Right Eye Corner on a face.]

Optimisation

Outline

1. Graph Fitting

(i) Tree Structures - Dynamic Programming

(ii) Beyond Trees - Tree Sub-Problems

(iii) Loopy Belief Propagation

2. Response Fitting

(i) Gradient Methods

(ii) Convex Quadratic Fitting

(iii) Constrained Mean Shift

3. Graph vs. Response Fitting

(i) No Silver Bullet

(ii) Future Directions

Important Message

• An objective function is only as useful as your ability to optimize it.

[Figure: Power vs. Tractability]

Exhaustive Search (N > 1)

[Figure: the response maps $D_1, D_2, \ldots, D_N$ of the $N$ part filters, transformed and summed (+) with the root-filter response into a combined score of root locations. Felzenszwalb, Girshick, McAllester & Ramanan, 2010]

Computational Cost

$p_1^* = \arg\min_{p_1} D_1(p_1)$, cost $O(M)$

$p_2^* = \arg\min_{p_2} D_2(p_2)$, cost $O(M)$

...

$p_N^* = \arg\min_{p_N} D_N(p_N)$, cost $O(M)$

[Figure: response maps $D_1, \ldots, D_N$ combined into a score over root locations. Felzenszwalb, Girshick, McAllester & Ramanan, 2010]
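When the parts are searched independently, each minimization is a single pass over that part's response map. The minimal sketch below assumes each $D_i$ is given as a 2-D array of costs over the M candidate positions. The `independent_part_search` helper and the random response maps are hypothetical.

```python
import numpy as np

def independent_part_search(responses):
    """Return p_i* = argmin D_i for each part, searched independently.

    responses: list of N 2-D cost arrays; responses[i][y, x] is D_i at (x, y).
    Each argmin scans all M = H * W candidates once: O(M) per part.
    """
    best = []
    for D in responses:
        y, x = np.unravel_index(np.argmin(D), D.shape)
        best.append((int(x), int(y)))
    return best

rng = np.random.default_rng(0)
responses = [rng.random((20, 20)) for _ in range(3)]
responses[0][4, 11] = -1.0   # plant a clear minimum for part 1 at (x=11, y=4)
print(independent_part_search(responses))
```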

Exhaustive Search

[Figure: a graph over the part parameters $p_1, p_2, p_3, p_4, p_5$.]

Joint exhaustive search over all parts costs $O(M^N)$.

“We can do much better than this if the graph is sparse”: $O(NM^2)$.

Tree Regularization

• A sparse graph of particular interest is a tree.

Felzenszwalb & Huttenlocher, 2005

Dynamic Programming

• The globally optimal solution for any tree-structured graph can be found using “Dynamic Programming”.

Backtracking

• Once we have the optimal parent score, we can back-track to find the children.

[Figure: a tree with root $p_1$ and children $p_2, \ldots, p_5$. Each child sends a message $m_i(p_1)$ to the root; back-tracking from $p_1^*$ recovers $p_2^*, p_3^*, p_4^*, p_5^*$.]
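The dynamic-programming and back-tracking steps can be sketched for the star-shaped tree in the figure, where $p_1$ is the root. The following are assumptions for illustration. Each part has M discrete candidates, and the unary costs $D_i(p_i)$ and the pairwise costs are given as arrays. The pairwise costs are written here as a hypothetical `pairwise[i, a, b]` for root label a and child label b. The total cost is $D_1(p_1) + \sum_{i \ge 2} [D_i(p_i) + \text{pairwise}_i(p_1, p_i)]$. Each message $m_i(p_1)$ costs $O(M^2)$, so the whole search is $O(NM^2)$. A brute-force $O(M^N)$ search is included only to check the answer on a tiny problem.

```python
import itertools
import numpy as np

def star_tree_dp(unary, pairwise):
    """Exact minimization on a star-shaped tree rooted at part 0 (p_1 on the slide).

    unary[i, m]       : cost D_i of placing part i at candidate m, shape (N, M)
    pairwise[i, a, b] : hypothetical deformation cost between the root at a and
                        child i at b, shape (N, M, M); pairwise[0] is unused.
    """
    n_parts = unary.shape[0]
    # totals[i, a, b] = D_{i+1}(b) + pairwise_{i+1}(a, b) for every child
    totals = unary[1:, None, :] + pairwise[1:]
    messages = totals.min(axis=2)        # m_i(p_1): O(M^2) per child
    best_child = totals.argmin(axis=2)   # stored for back-tracking
    root_scores = unary[0] + messages.sum(axis=0)
    p1 = int(np.argmin(root_scores))
    # Back-track: each child's best label given the optimal root label.
    labels = [p1] + [int(best_child[i, p1]) for i in range(n_parts - 1)]
    return labels, float(root_scores[p1])

def brute_force(unary, pairwise):
    """O(M^N) exhaustive search, used only to check the DP on tiny problems."""
    n_parts, M = unary.shape
    best_labels, best_cost = None, np.inf
    for labels in itertools.product(range(M), repeat=n_parts):
        cost = unary[0, labels[0]] + sum(
            unary[i, labels[i]] + pairwise[i, labels[0], labels[i]]
            for i in range(1, n_parts)
        )
        if cost < best_cost:
            best_labels, best_cost = list(labels), float(cost)
    return best_labels, best_cost

rng = np.random.default_rng(0)
N, M = 5, 4
unary = rng.random((N, M))
pairwise = rng.random((N, M, M))
print(star_tree_dp(unary, pairwise), brute_force(unary, pairwise))
```

For deeper trees, the same recursion runs from the leaves up to the root, and back-tracking runs from the root down.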

Results

[Figure: Matching results (after non-maximum suppression).]

~1 second to search all scales.

Felzenszwalb, Girshick, McAllester & Ramanan, 2010

Mixtures of Parts

• The approach is useful, as it allows smaller detectors.

• Smaller detectors increase the likelihood of the aperture problem.

Mixtures of Parts

[Figure 6: Results on the Parse dataset. We show 27 part bounding boxes reported by our algorithm for each image. The top 3 rows show successful examples, while the bottom row shows failure cases. Examining failure cases from left to right, we find our model is not flexible enough to model horizontal people, is confused by overlapping people, suffers from double-counting phenomena common to tree models (both the left and right legs fire on the same image region), and is confused when objects partially occlude people.]

Yang & Ramanan, 2011

Non-Trees?

• What about non-tree structures?

Tree Sub-Problems

• One popular solution is to break a more complicated graph into a number of tree sub-problems:

Felzenszwalb & Huttenlocher, 2005 (Importance Sampling using Trees)

Wang & Mori, 2008 (Mixtures of Trees)

Tian & Sclaroff, 2010 (Branch & Bound with Trees)

• These approaches have a couple of drawbacks:

• They assume that the problem can be broken down into tree sub-problems.

• Finding a solution is slow.

• They are only guaranteed to find a local minimum (with the exception of Branch & Bound methods).

Loopy Belief Propagation

• Loopy belief propagation is another approach.

• It attempts to apply tree-based belief propagation iteratively.

• It has a number of drawbacks:

• No guarantee of convergence (although it performs well empirically).

• It can be slow to converge.

• It requires the graph to be relatively sparse.

[Figure: top to bottom, results on face, human body and ... objects, with (b) the superimposition of training samples. Note that there is an interesting coherence between the graph and the physical structures of objects.]

Gu & Kanade, 2007

Another Direction

[Figure: similarity as a function of $p$ = {x-translation, y-translation}: “Correlation function for a natural image”.]

Constrained Local Models

• Constrained local models (CLMs) encapsulate many of the methods currently in the non-rigid face alignment literature.

• The best-known example is “Active Shape Models” (ASMs).

Cootes and Taylor, 1992 (Active Shape Models)

Zhou, Gu, and Zhang, 2003 (Bayesian Tangent Shape Models)

Cristinacce and Cootes, 2004 (Constrained Local Models)

• Related work also exists in “Active Appearance Models” (AAMs).

Cootes, Edwards and Taylor, 1998 (Active Appearance Models)

Matthews and Baker, 2004 (Active Appearance Models Revisited)

Constrained Local Models

• Start with an initial guess.

• Warp to a fixed-size template (e.g., 110x110 pixels).

CLM - Extracting Patch Responses

(110x110 pixels)

• Try to fit a single point.

[Figure: “Current Estimate for point n” and “Groundtruth for point n”.]

Constrained Local Models

(110x110 pixels)

[Figure: “nth Constrained Local Search Area”.]

Constrained Local Models

[Figure: the ith patch expert is applied (∗) over the ith constrained local search area (e.g., 30x30 pixels) to produce the “nth Patch Response”, which “Uses ZNCC” and costs O(L). Figure 1 excerpt: examples of local search responses, where (a) is the source image to be aligned and the other panels show local search responses of patch experts trained by 125 positive examples and 15k negative examples.]

Remember: O(L)
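The “nth Patch Response” can be sketched as zero-normalized cross-correlation (ZNCC) of a patch template against every offset in the local search area. This is a minimal NumPy sketch with made-up data, and it assumes NumPy ≥ 1.20 for `sliding_window_view`. `zncc_response` is a hypothetical name. Each of the offsets costs one patch-sized dot product, and the response is invariant to gain and bias changes in the search area.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def zncc_response(search_area, patch, eps=1e-8):
    """ZNCC between `patch` and every same-sized window of `search_area`.

    Returns an array of shape (H - h + 1, W - w + 1) with values in [-1, 1].
    """
    p = np.asarray(patch, dtype=float)
    p = p - p.mean()
    p = p / (np.linalg.norm(p) + eps)
    windows = sliding_window_view(np.asarray(search_area, dtype=float), p.shape)
    centred = windows - windows.mean(axis=(2, 3), keepdims=True)
    norms = np.linalg.norm(centred, axis=(2, 3)) + eps
    return np.einsum("ijkl,kl->ij", centred, p) / norms

rng = np.random.default_rng(0)
area = rng.random((30, 30))                     # e.g. a 30x30 local search area
patch = area[12:23, 8:19].copy()                # 11x11 patch cut at row 12, col 8
resp = zncc_response(2.0 * area + 5.0, patch)   # gain/bias change in the image
print(np.unravel_index(np.argmax(resp), resp.shape), resp.max())
```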

CLM - Extracting Patch Responses

(110x110 pixels)

• 68 points in total.

• Get responses for all N patch experts.

[Figure: response maps $D_1(p_1), \ldots, D_N(p_N)$, one per patch expert.]

[Excerpt from CVPR 2008 Submission #1001]

Why an Iterative Solution?

• An iterative approach allows for smaller, more accurate detectors.

• The aperture problem is handled by increasingly better estimates of scale and rotation.

• Traditionally, CLMs have been applied to raw or photometrically normalized pixels (not HOG-style descriptors that give greater invariance).

• As discussed earlier, dealing with raw pixels does not blur or throw away position information.

• Also, raw pixels have clear computational advantages over descriptors.

• A couple of solutions have been proposed in the literature.

• The most straightforward approach is to solve, at iteration $t$,

$p_1^* = \arg\min_{p_1} D_1^t(p_1)$

$p_2^* = \arg\min_{p_2} D_2^t(p_2)$

...

$p_N^* = \arg\min_{p_N} D_N^t(p_N)$

where $B$ is a $2N \times 2N$ diagonal matrix with $B_{(i,i)} = \frac{\partial \varrho(E(x_k, y_k); \sigma_k)}{\partial x_k}$ and $B_{(i+1,i+1)} = \frac{\partial \varrho(E(x_k, y_k); \sigma_k)}{\partial y_k}$, where $i = 2k$ and $k = 1, \ldots, N$. We shall refer to this method of fitting a CLM as robust convex quadratic fitting (RCQF).

Example Fits: Examples of local response surface fitting can be found in Figure 2. The red cross shows the ground truth location in the search window; the closer the peaks of the local responses are to the red cross, the better the method performs. We can see that in most cases the ELS, CQF, and RCQF methods can all achieve good performance. However, our proposed CQF and RCQF methods in (c) and (d), respectively, are less sensitive to local minima than the ELS method in (b).

Figure 2. Examples of fitting local search responses: (a) shows the local search responses in Figure 1(d) using patch experts trained by a linear support vector machine (SVM). (b–d) show the surface fitting results. More specifically, (b) picks the local displacement with the minimum response value in the search window, while (c) and (d) fit the local search response surface by a quadratic kernel in Equation 16 and a quadratic kernel with a robust error function in Equation 17, respectively. Brighter intensity means smaller matching error between the template and the source image patch. In each search window, the red cross illustrates the ground truth location. As we can see, in most cases the above three methods can all achieve good performance, while the proposed convex quadratic fitting (CQF) (c) and robust convex quadratic fitting (RCQF) (d) methods are less sensitive to local minima than the ELS method (b).
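The response-surface fitting idea can be sketched as a least-squares fit of a 2-D quadratic to a local matching-error surface, followed by solving for its minimum. This is a simplification, not the CQF/RCQF method in the excerpt. It fits without any constraint and only checks afterwards that the Hessian is positive definite (i.e. that the fit is convex), and it uses no robust error function. `fit_quadratic` is a hypothetical name, and the synthetic surface is made up.

```python
import numpy as np

def fit_quadratic(errors, xs, ys):
    """Least-squares fit of err(x, y) ~ a x^2 + b y^2 + c xy + d x + e y + f.

    errors[j, i] is the matching error at (xs[i], ys[j]).
    Returns the Hessian, a convexity flag, and the minimizer (None if not convex).
    """
    X, Y = np.meshgrid(xs, ys)  # shapes (len(ys), len(xs))
    x, y, r = X.ravel(), Y.ravel(), np.asarray(errors, dtype=float).ravel()
    A = np.column_stack([x**2, y**2, x * y, x, y, np.ones_like(x)])
    a, b, c, d, e, _ = np.linalg.lstsq(A, r, rcond=None)[0]
    H = np.array([[2 * a, c], [c, 2 * b]])          # Hessian of the quadratic
    convex = bool(np.all(np.linalg.eigvalsh(H) > 0))
    minimum = np.linalg.solve(H, -np.array([d, e])) if convex else None
    return H, convex, minimum

xs = ys = np.arange(-7, 8, dtype=float)            # a 15x15 local search window
X, Y = np.meshgrid(xs, ys)
errors = (X - 3.0) ** 2 + 2.0 * (Y + 1.0) ** 2     # synthetic error surface
H, convex, p_star = fit_quadratic(errors, xs, ys)
print(convex, p_star)
```

Replacing the raw response with a convex quadratic gives a single, well-defined minimum per patch. This is what makes the fitted surfaces less prone to the local minima seen with exhaustive local search.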


B (i,i) =

.

rained

to be

l show

ve fiteduce

e a di-

More

0. As

(16)

gram-

CLM

xperts

cult to

on we

obuste

been

ve rore

are

lected

particowing

from

to the

scale

∂ϱ(E(xk,yk); σk)

∂xk

B (i+1,i+1) = ∂ϱ(E(xk,yk); σk)

∂yk

where i = 2kand k = 1...N. We shall refer to this

method of fitting a CLM as robust convex quadratic fitting

(RCQF).

Example Fits: Examples of local response surface fitting

can be found in Figure 2. The red cross shows the ground

truth location in the search window. The closer the peaks

of the local responses are to the red cross indicates the

x better the performance of the method. We can see that in

most cases ELS, CQF, and RCQF methods can all achieve

good performance. However, our proposed CQF and RCQF

methods in (c) and (d) respectively are less sensitive to local

minima than the ELS method in (b).

(a) (b) (c) (d)

Figure 2. Examples of fitting local search responses: (a) is the local

search responses in Figure 1(d) using patch experts trained by

a linear support vector machine (SVM). (b-d) show the surface fitting

results. More specifically, (b) picks the local displacement

with the minimum response value in the search window, while (d)

and (e) fit the local search response surface by a quadratic kernel

in Equation 16 and a quadratic kernel with a robust error func-

∗ 1

x ∗ 2

x ∗ !"#$

%&''&

CVPR 2008 Submission #1001. CONFIDENTIAL REVIEW COPY. DO NOT DISTRIBUTE.

into the following form

$$
\arg\min_{a_{11},a_{22},a_{12},b_1,b_2,c} \sum_{x,y} \left\| E_k(x,y) - a_{11}x^2 - 2a_{12}xy - a_{22}y^2 + 2b_1x + 2b_2y - c \right\|^2
\quad \text{subject to} \quad a_{11} > 0,\ a_{11}a_{22} > a_{12}^2 \qquad (15)
$$

where $E_k(x,y) = \mathcal{E}_k\{Y(\mathbf{x}_k + \Delta\mathbf{x})\}$ and $\Delta\mathbf{x} = [x, y]^T$.

The above optimization is a quadratically constrained quadratic program (QCQP), which is in general costly to solve directly [5]. In the following sections, we show some simplified versions of this generic quadratic curve-fitting optimization.
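The constraint in Equation 15 is what makes the fitted surface a bowl with a single best displacement. The sketch below is plain Python; the function names are our own hypothetical helpers, not from the paper. It checks whether a fitted surface $a_{11}x^2 + 2a_{12}xy + a_{22}y^2 - 2b_1x - 2b_2y + c$ is convex. If it is, it returns the minimiser $\Delta\mathbf{x}^* = A_k^{-1}\mathbf{b}_k$, the displacement where the fitted response predicts the best match.

```python
# Hypothetical helpers illustrating the convexity constraint of Eq. 15.
# The fitted surface is f(x, y) = a11 x^2 + 2 a12 x y + a22 y^2 - 2 b1 x - 2 b2 y + c,
# i.e. dx^T A_k dx - 2 b_k^T dx + c with A_k = [[a11, a12], [a12, a22]].


def quadratic_surface(a11, a22, a12, b1, b2, c, x, y):
    """Evaluate the fitted quadratic response at displacement (x, y)."""
    return a11 * x * x + 2 * a12 * x * y + a22 * y * y - 2 * b1 * x - 2 * b2 * y + c


def is_convex(a11, a22, a12):
    """A_k is positive definite iff a11 > 0 and det(A_k) = a11 a22 - a12^2 > 0."""
    return a11 > 0 and a11 * a22 - a12 * a12 > 0


def surface_minimiser(a11, a22, a12, b1, b2):
    """Solve grad f = 2 A_k dx - 2 b_k = 0, i.e. dx* = A_k^{-1} b_k."""
    if not is_convex(a11, a22, a12):
        raise ValueError("A_k is not positive definite: no unique minimum")
    det = a11 * a22 - a12 * a12
    # Closed-form inverse of a symmetric 2x2 matrix applied to b_k.
    x_star = (a22 * b1 - a12 * b2) / det
    y_star = (a11 * b2 - a12 * b1) / det
    return x_star, y_star


print(is_convex(2.0, 1.0, 0.5))  # True
print(is_convex(1.0, 1.0, 2.0))  # False (det = 1 - 4 < 0)
```

Dropping the $a_{12}$ term, as the next paragraph does, reduces the convexity test to two sign checks and the minimiser to $(b_1/a_{11},\, b_2/a_{22})$.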

Quadratic Program Curve Fitting: One way to reduce the complexity of Equation 15 is to constrain $A_k$ to be a diagonal matrix with positive diagonal elements. More specifically, $A_k = \begin{bmatrix} a_{11} & 0 \\ 0 & a_{22} \end{bmatrix}$, where $a_{11}, a_{22} > 0$. As a result, Equation 15 simplifies to

$$
\arg\min_{a_{11},a_{22},b_1,b_2,c} \sum_{x,y} \left\| E_k(x,y) - a_{11}x^2 - a_{22}y^2 + 2b_1x + 2b_2y - c \right\|^2
\quad \text{subject to} \quad a_{11} > 0,\ a_{22} > 0 \qquad (16)
$$

which can be solved efficiently through quadratic programming [5]. We shall refer to this method of fitting a CLM as convex quadratic fitting (CQF).
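One simple way to solve the small QP in Equation 16 is to enumerate which of the two bound constraints are active. For each case, the sketch solves the least-squares normal equations for the remaining parameters and keeps the cheapest feasible candidate. It rests on our own assumptions: plain Python, hypothetical function names, and the strict inequalities relaxed to $a_{11}, a_{22} \ge \epsilon$. Because the objective is convex, the constrained optimum coincides with the unconstrained optimum of one of these restricted problems, provided the window has at least three distinct x and y values. This is an illustration only, not the QP solver referenced in [5].

```python
import itertools
import math


def _solve(m, v):
    """Solve the dense linear system m z = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    a = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("singular normal equations")
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (a[r][n] - sum(a[r][k] * z[k] for k in range(r + 1, n))) / a[r][r]
    return z


def fit_cqf(window, eps=1e-6):
    """Fit E(x, y) ~ a11 x^2 + a22 y^2 - 2 b1 x - 2 b2 y + c with a11, a22 >= eps.

    window maps integer displacements (x, y) to response values E_k(x, y).
    Returns the parameters (a11, a22, b1, b2, c).
    """
    feats = [[x * x, y * y, -2.0 * x, -2.0 * y, 1.0] for (x, y) in window]
    targets = list(window.values())
    best, best_cost = None, math.inf
    # Active sets: (), (a11,), (a22,), (a11, a22) held at eps.
    for r in range(3):
        for active in itertools.combinations((0, 1), r):
            free = [j for j in range(5) if j not in active]
            theta = [eps if j in active else 0.0 for j in range(5)]
            resid = [t - sum(theta[j] * f[j] for j in active) for f, t in zip(feats, targets)]
            # Least-squares normal equations over the free parameters only.
            m = [[sum(f[i] * f[j] for f in feats) for j in free] for i in free]
            v = [sum(f[i] * e for f, e in zip(feats, resid)) for i in free]
            for j, val in zip(free, _solve(m, v)):
                theta[j] = val
            if theta[0] < eps or theta[1] < eps:
                continue  # violates a11 >= eps or a22 >= eps
            cost = sum((t - sum(p * q for p, q in zip(theta, f))) ** 2
                       for f, t in zip(feats, targets))
            if cost < best_cost:
                best, best_cost = theta, cost
    return tuple(best)


# Synthetic error bowl with its minimum at (1, -2).
window = {(x, y): 2 * (x - 1) ** 2 + 0.5 * (y + 2) ** 2 + 3
          for x in range(-3, 4) for y in range(-3, 4)}
a11, a22, b1, b2, c = fit_cqf(window)
print(round(b1 / a11, 3), round(b2 / a22, 3))  # 1.0 -2.0
```

The fitted CQF surface has its minimum at $(b_1/a_{11},\, b_2/a_{22})$. This closed-form peak is what makes the local estimate cheap once the fit is done.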

When the local search responses from our patch experts contain outliers, as shown in Figure 1, accurate surface fitting can be difficult. In the following section, we introduce a robust error function to improve the robustness of the curve fitting.

Robust Error Function: Robust error functions have been used in many registration approaches [3, 18] to improve the robustness of non-rigid image alignment. Although there are many different choices [18], we select a sigmoid function similar to the weighting function in Equation 2. In particular, we define the robust error function as

$$
\varrho(E(\mathbf{x}); \sigma) = \frac{1}{1 + e^{-\|E(\mathbf{x})\|^2 + \sigma}}
$$

where σ is a scale parameter that can be estimated from E(x). Essentially, this function assigns lower weights to the response values whose fitting error is larger than the scale

parameter σ, since they are more likely to be outliers. As a result, the original curve-fitting problem in Equation 14 can be rewritten as

$$
\arg\min_{A_k, \mathbf{b}_k, c_k} \sum_{\Delta\mathbf{x}} \varrho(\tilde{E}(\Delta\mathbf{x}); \sigma) \quad \text{subject to} \quad A_k \succ 0 \qquad (17)
$$

where $\tilde{E}(\Delta\mathbf{x}) = E(\Delta\mathbf{x}) - \Delta\mathbf{x}^T A_k \Delta\mathbf{x} + 2\mathbf{b}_k^T \Delta\mathbf{x} - c_k$. By performing a first-order Taylor expansion of $\varrho(\tilde{E}(\Delta\mathbf{x}); \sigma)$, we can derive the global update Δp explicitly in a form similar to Equation 12, where B is a 2N × 2N diagonal matrix with

$$
B(i,i) = \frac{\partial \varrho(\tilde{E}(x_k,y_k); \sigma_k)}{\partial x_k}, \qquad
B(i+1,i+1) = \frac{\partial \varrho(\tilde{E}(x_k,y_k); \sigma_k)}{\partial y_k}
$$

where i = 2k − 1 and k = 1, …, N. We shall refer to this method of fitting a CLM as robust convex quadratic fitting (RCQF).
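The sigmoid in the robust error function is easy to examine numerically. The plain-Python sketch below shows that the penalty saturates at 1 once the squared residual is well beyond σ. It also shows that the derivative, which by the chain rule scales derivatives such as those in B, vanishes there, so outliers have little influence on the fit. The names `rho` and `rho_derivative` are our own, and treating the residual as a scalar is a simplification.

```python
import math


def rho(e, sigma):
    """Sigmoid robust error 1 / (1 + exp(-e^2 + sigma)) for a scalar residual e."""
    u = e * e - sigma
    # Numerically stable logistic of u (avoids overflow in exp for large |u|).
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)


def rho_derivative(e, sigma):
    """d rho / d e = 2 e rho (1 - rho): the influence of a residual on the fit."""
    r = rho(e, sigma)
    return 2.0 * e * r * (1.0 - r)


sigma = 4.0
for e in (0.5, 2.0, 10.0):
    # Squared error grows without bound; rho saturates at 1 and its slope vanishes.
    print(e, e * e, round(rho(e, sigma), 4), round(rho_derivative(e, sigma), 4))
```

The influence peaks near |e| ≈ √σ and decays on either side. In this sense σ acts as the scale that separates residuals the fit listens to from residuals it largely ignores.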

Example Fits: Examples of local response surface fitting can be found in Figure 2. The red cross shows the ground-truth location in the search window; the closer the peak of a local response lies to the red cross, the better the method performs. In most cases, the ELS, CQF, and RCQF methods all achieve good performance. However, our proposed CQF and RCQF methods, in (c) and (d) respectively, are less sensitive to local minima than the ELS method in (b).

(a) (b) (c) (d)

Figure 2. Examples of fitting local search responses: (a) shows the local search responses in Figure 1(d) using patch experts trained by a linear support vector machine (SVM). (b-d) show the surface-fitting results. More specifically, (b) picks the local displacement with the minimum response value in the search window, while (c) and (d) fit the local search response surface with a quadratic kernel (Equation 16) and a quadratic kernel with a robust error function (Equation 17), respectively. Brighter intensity indicates a smaller matching error between the template and the source image patch. In each search window, the red cross marks the ground-truth location. In most cases, all three methods achieve good performance, while the proposed convex quadratic fitting (CQF) (c) and robust convex quadratic fitting (RCQF) (d) methods are less sensitive to local minima than the exhaustive local search (ELS) method (b).
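For contrast with CQF and RCQF, the exhaustive local search (ELS) baseline in panel (b) simply returns the displacement with the minimum response. The plain-Python sketch below shows why it is sensitive to a single spurious minimum: one outlying low response is enough to move the estimate across the window. The function name and the toy window are our own.

```python
def exhaustive_local_search(window):
    """Return the displacement (x, y) with the smallest matching error in the window.

    window maps integer displacements (x, y) to errors E_k(x, y); every entry is
    visited once, so the cost grows linearly with the number of candidates.
    """
    return min(window, key=window.get)


# A smooth error bowl with its true minimum at (1, -2) ...
window = {(x, y): (x - 1) ** 2 + (y + 2) ** 2 for x in range(-3, 4) for y in range(-3, 4)}
print(exhaustive_local_search(window))  # (1, -2)
# ... and the same bowl with one spurious, outlying low response at (-3, 3).
window[(-3, 3)] = -1.0
print(exhaustive_local_search(window))  # (-3, 3)
```

A surface fit pools all entries of the window, so a single outlier perturbs it rather than dictating it. The robust error function in Equation 17 limits that perturbation further.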

Computational Complexity: In this section, we investigate the computational complexity of our proposed approach …


Active Shape Models

189

Can we do Better?

• A common approach in the vision literature is to use gradient information (e.g. the LK algorithm):

$$D(\mathbf{p}) = \|I(\mathbf{p}) - T(\mathbf{0})\|^2$$

$$D(\mathbf{p}) = f\{I(\mathbf{p}) - T(\mathbf{0})\}$$

$$D(\mathbf{p} + \Delta\mathbf{p}) \approx f\left\{I(\mathbf{p}) + \frac{\partial I(\mathbf{p})}{\partial \mathbf{p}}\Delta\mathbf{p} - T(\mathbf{0})\right\}$$

$$D(\mathbf{p} + \Delta\mathbf{p}) \approx \Delta\mathbf{p}^T A \Delta\mathbf{p} + \mathbf{b}^T \Delta\mathbf{p} + c$$

“Forces local response INDIRECTLY to be a convex quadratic.”

Lucas & Kanade, 1981 (LK Algorithm)

Matthews and Baker, 2004 (Active Appearance Models Revisited)

191
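To make the indirect quadratic concrete, the sketch below runs Gauss-Newton (Lucas-Kanade) for a 1-D translation. It assumes f is the sum of squared differences; the function names, the 1-D setting and the linear interpolation are our own simplifications. Linearising the warped image gives D(p + Δp) ≈ AΔp² + 2bΔp + c with A = ΣJ² and b = ΣJr. Because A ≥ 0, each step minimises a convex quadratic whose shape is dictated by the image gradients J rather than by the response itself.

```python
import math


def sample(signal, x):
    """Linearly interpolate a 1-D signal at real position x (clamped to its ends)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(math.floor(x)), len(signal) - 2)
    t = x - i
    return (1.0 - t) * signal[i] + t * signal[i + 1]


def gradient_1d(signal):
    """Central differences inside, one-sided differences at the two ends."""
    n = len(signal)
    g = [signal[1] - signal[0]]
    g += [(signal[i + 1] - signal[i - 1]) / 2.0 for i in range(1, n - 1)]
    g.append(signal[n - 1] - signal[n - 2])
    return g


def lk_translation_1d(image, template, p=0.0, iters=20):
    """Gauss-Newton estimate of p such that image(x + p) ~ template(x) for all x."""
    grad = gradient_1d(image)
    for _ in range(iters):
        a = b = 0.0
        for x, t in enumerate(template):
            r = sample(image, x + p) - t   # residual I(x + p) - T(x)
            j = sample(grad, x + p)        # J = dI(x + p)/dp
            a += j * j                     # A = sum J^2 >= 0, so the model is convex
            b += j * r                     # b = sum J r
        if a == 0.0:
            break                          # no gradient support: the update is undefined
        dp = -b / a                        # minimiser of A dp^2 + 2 b dp + const
        p += dp
        if abs(dp) < 1e-8:
            break
    return p


n = 60
template = [math.exp(-((x - 30) ** 2) / 32.0) for x in range(n)]
image = [math.exp(-((x - 32.5) ** 2) / 32.0) for x in range(n)]  # template shifted by +2.5
print(round(lk_translation_1d(image, template), 2))
```

If the image is shifted far beyond the width of the bump, J is nearly zero where the template has support. The quadratic then carries almost no information about the true displacement.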

Problems with Gradients

• Have small spatial support.

• Do not handle noise and illumination variation well.

• Need to be close to the global minimum to converge.

• Only compatible with certain objective functions.

Liu et al., 2008 (Boosted Active Appearance Models)

Lucey, 2008 (Support Vector Tracking)

[Figure: image I convolved (∗) with “Horizontal” and “Vertical” filters to give the gradient images ∇Ix and ∇Iy]

192
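The horizontal and vertical filtering in the figure can be sketched with central differences. The plain-Python helper below correlates the image with [-1/2, 0, 1/2] horizontally and vertically, which is equivalent to convolving with [1/2, 0, -1/2]. The helper name and the zero border are our own choices. Each gradient value depends on just two neighbouring pixels, which is the small spatial support noted in the first bullet.

```python
def image_gradients(img):
    """Central-difference gradients (gx, gy) of a 2-D image given as a list of rows.

    gx uses the 'horizontal' kernel [-1/2, 0, 1/2] and gy its transpose ('vertical').
    Border pixels are left at 0 for simplicity.
    """
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx[r][c] = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy[r][c] = (img[r + 1][c] - img[r - 1][c]) / 2.0
    return gx, gy


# A linear ramp I(r, c) = 3c + 2r has constant gradients in the interior.
ramp = [[3 * c + 2 * r for c in range(5)] for r in range(5)]
gx, gy = image_gradients(ramp)
print(gx[2][2], gy[2][2])  # 3.0 2.0
```

Smoothing the image first widens the effective support, at the cost of blurring the very detail that drives alignment.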

Convex Quadratic Fitting

• Another alternative is to force the responses DIRECTLY to be a convex quadratic:

RES ISO ANI GMM KDE ρ ∈ {20, 5, 1}

Fig. 3 Response maps, RES, and their approximations used in various methods, for the outer left eye corner, the nose bridge and chin. Red crosses on the response maps denote the true landmark locations. The GMM approximation has five cluster centers. The KDE approximations are shown for ρ ∈ {20, 5, 1}.

Wang, Lucey & Cohn, 2008

193

Results

AAM ASM CQF

194


Mode Seeking

• Minimizing a convex quadratic has a probabilistic interpretation as finding the mode of a Gaussian distribution.

• The Gaussian assumption is often too strong; mode seeking over other distributions can be entertained.

• If the distribution is non-Gaussian, however, the mode(s) MUST be found iteratively.

• Uses variations of the EM algorithm.

• Can be slow (iterative).

• Only guaranteed to find a local minimum.

195
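The probabilistic reading in the first bullet follows from completing the square. The derivation below uses the quadratic form from the LK slide, $\Delta\mathbf{p}^T A \Delta\mathbf{p} + \mathbf{b}^T \Delta\mathbf{p} + c$, written in a generic variable $\mathbf{x}$ and assuming A is symmetric positive definite.

```latex
\begin{aligned}
f(\mathbf{x}) &= \mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c, \qquad A = A^T \succ 0 \\
&= (\mathbf{x} - \boldsymbol{\mu})^T A (\mathbf{x} - \boldsymbol{\mu}) + c - \tfrac{1}{4}\mathbf{b}^T A^{-1}\mathbf{b},
\qquad \boldsymbol{\mu} = -\tfrac{1}{2}A^{-1}\mathbf{b} \\
e^{-f(\mathbf{x})} &\propto \exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T (2A)(\mathbf{x} - \boldsymbol{\mu})\right)
\;\propto\; \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu},\, (2A)^{-1}\right) \\
\arg\min_{\mathbf{x}} f(\mathbf{x}) &= \arg\max_{\mathbf{x}} \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu},\, (2A)^{-1}\right) = \boldsymbol{\mu}
\end{aligned}
```

For a non-Gaussian density, such as a mixture or a kernel density estimate, no comparable closed form exists for the mode. That is why the remaining bullets call for iterative, EM-style updates.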


Constrained Mean Shift

• Saragih et al. proposed constrained mean shift.

• Can be applied extremely efficiently using a look-up table (LUT).

• Vary the kernel density estimate (KDE) bandwidth to perform coarse-to-fine fitting.

Fig. 2 Response maps, p(l_i = aligned | x), and their approximations used in various methods, for the outer left eye corner, the …

“Constrained Mean Shift”

Saragih, Lucey and Cohn, ICCV 2009. (Constrained Mean Shifts)

196
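A minimal, unconstrained version of the KDE mean-shift idea for a single landmark is sketched below in plain Python. It treats the response map as weights on a grid and uses an isotropic Gaussian kernel whose variance ρ is annealed over {20, 5, 1}, following the KDE figure. Reading ρ as a variance is our assumption, and the function name is hypothetical. The shape-model constraint that couples all landmarks in constrained mean shift is deliberately left out.

```python
import math


def kde_mean_shift(response, start, rhos=(20.0, 5.0, 1.0), iters=10):
    """Mean-shift on a response map treated as a weighted Gaussian KDE.

    response: dict mapping grid node (x, y) -> non-negative weight (e.g. a detector
    likelihood). start: initial (x, y). rhos: kernel variances, annealed coarse to fine.
    """
    x, y = start
    for rho in rhos:
        for _ in range(iters):
            num_x = num_y = den = 0.0
            for (cx, cy), w in response.items():
                # Weight of node (cx, cy): response times Gaussian kernel centred on (x, y).
                k = w * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * rho))
                num_x += k * cx
                num_y += k * cy
                den += k
            if den == 0.0:
                break  # kernel sees no response mass; keep the current estimate
            x, y = num_x / den, num_y / den
    return x, y


# Single Gaussian-shaped response centred on (2, -1), searched from (0, 0).
resp = {(x, y): math.exp(-((x - 2) ** 2 + (y + 1) ** 2) / 4.0)
        for x in range(-6, 7) for y in range(-6, 7)}
print(kde_mean_shift(resp, (0.0, 0.0)))
```

A wide kernel first pulls the estimate toward the bulk of the response mass. The narrower kernels then refine it, which is the coarse-to-fine behaviour the bullet describes.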


Constrained Mean Shift

Fig. 4 Illustration of the use of a precomputed grid for efficient mean-shift. Kernel evaluations are precomputed between c and all other nodes in the grid. To approximate the true kernel evaluation, x_i is assumed to coincide with c and the likelihood of any …

Saragih, Lucey and Cohn, ICCV 2009. (Constrained Mean Shifts)

197
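Figure 4's precomputed grid can be imitated with a look-up table of kernel values indexed by integer offset. In the plain-Python sketch below, the current estimate is snapped to its nearest grid node c, as the caption describes. Every kernel value in a mean-shift step then becomes a table lookup instead of an `exp()` call. The function names are hypothetical, and the table radius and ρ are illustrative choices, not values from the paper.

```python
import math


def build_kernel_lut(radius, rho):
    """Precompute Gaussian kernel values for every integer offset within +/- radius."""
    return {(dx, dy): math.exp(-(dx * dx + dy * dy) / (2.0 * rho))
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def lut_mean_shift_step(response, x, y, lut):
    """One mean-shift step with (x, y) snapped to its nearest grid node c.

    Kernel values are read from the table; nodes beyond the table radius get weight 0.
    """
    cx, cy = round(x), round(y)
    num_x = num_y = den = 0.0
    for (gx, gy), w in response.items():
        k = w * lut.get((gx - cx, gy - cy), 0.0)
        num_x += k * gx
        num_y += k * gy
        den += k
    if den == 0.0:
        return x, y
    return num_x / den, num_y / den


resp = {(x, y): math.exp(-((x - 2) ** 2 + (y + 1) ** 2) / 4.0)
        for x in range(-6, 7) for y in range(-6, 7)}
lut = build_kernel_lut(radius=6, rho=5.0)
x, y = 0.0, 0.0
for _ in range(5):
    x, y = lut_mean_shift_step(resp, x, y, lut)
print(round(x, 2), round(y, 2))
```

The table is built once per kernel width. Each step therefore costs one lookup and a few multiply-adds per grid node, which is where the efficiency claimed for the LUT comes from.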

Results

ASM CQF

BTSM CMS

198

Results

MultiPie fitting curve (a): proportion of images vs. shape RMS error for ASM (88ms), CQF (98ms), BTSM/GMM (2410ms) and CMS/KDE (121ms).

199

Saragih, Lucey, Cohn, ICCV 2009.

Saragih, Lucey, Cohn, IJCV 2011.
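Fitting curves like the MultiPie one above plot, for each error threshold, the proportion of test images fitted at or below that error. The plain-Python sketch below assumes the shape RMS error is the root-mean-square Euclidean distance between corresponding predicted and ground-truth landmarks. Any normalisation used in the cited papers is not reproduced, and the function names are hypothetical.

```python
import math


def shape_rms_error(pred, truth):
    """Root-mean-square landmark distance between two shapes given as lists of (x, y)."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("shapes must be non-empty and have the same number of landmarks")
    sq = sum((px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, truth))
    return math.sqrt(sq / len(pred))


def fitting_curve(errors, thresholds):
    """Proportion of images whose shape RMS error is at most each threshold."""
    n = len(errors)
    return [sum(e <= t for e in errors) / n for t in thresholds]


errors = [1.2, 3.5, 2.0, 7.9]
print(fitting_curve(errors, [0, 2, 4, 6, 8, 10]))  # [0.0, 0.5, 0.75, 0.75, 1.0, 1.0]
```

A curve that rises earlier means more images are fitted accurately. Read alongside the reported timings, this is how the methods above trade accuracy against speed.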


CMS Results

200

Saragih, Lucey, Cohn, AFGR 2011.


Best Method is Domain Specific

No Silver Bullet


Graph vs. Response Fitting

Desired Characteristics

Pre-Computed Responses

Sub-Pixel Accuracy

No Initialization

No Descriptors

Real-Time Performance

Efficient Non-Tree Graphs

Joint learning

Mixtures of Detectors

3D inference

D(p) & R(p)

Graph Fitting Response Fitting

203


Promising Directions

• Applying space-time regularizations.

• Hybrid Graph- and Response-Fitting Approaches.

• Hybrid descriptor/pixel algorithms.

• 3D inference with Graph Fitting approaches.

204

THANKS