Polynomial Regression on Riemannian Manifolds


Jacob Hinkle, Tom Fletcher, Sarang Joshi 

May 11, 2012 

arxiv:1201.2395

Nonparametric Regression

Number of parameters tied to amount of data present

Example: kernel regression on images using diffeomorphisms (Davis2007)


Parametric Regression

Small number of parameters can be estimated more efficiently

[Figure from Fletcher 2011]

Geodesic regression (Niethammer2011, Fletcher2011) has recently received attention.


Polynomial Regression

[Figure: polynomial fits; x-axis: Independent Variable, y-axis: Dependent Variable]

Polynomials provide a more flexible framework for parametric regression on Riemannian manifolds


Riemannian Polynomials

At least three ways to define a polynomial in $\mathbb{R}^d$:

Algebraic: $\gamma(t) = c_0 + \frac{1}{1!} c_1 t + \frac{1}{2!} c_2 t^2 + \cdots + \frac{1}{k!} c_k t^k$

Variational: $\gamma = \operatorname{argmin}_\phi \int_0^T \left| \left(\frac{d}{dt}\right)^{\frac{k+1}{2}} \phi(t) \right|^2 dt$ s.t. BC/ICs

Differential: $\left(\frac{d}{dt}\right)^{k+1} \gamma(t) = 0$ s.t. initial conditions $\left(\frac{d}{dt}\right)^i \gamma(0) = c_i$
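As a quick Euclidean sanity check, the sketch below evaluates the algebraic form and differentiates it in the $t^i/i!$ basis, where differentiation simply shifts the coefficients. The function names are illustrative and not from the talk.

```python
import math
import numpy as np

def euclidean_polynomial(coeffs, t):
    """Evaluate gamma(t) = sum_i c_i t^i / i! for coefficients c_i in R^d."""
    coeffs = np.asarray(coeffs, dtype=float)          # shape (k+1, d)
    powers = np.array([t**i / math.factorial(i) for i in range(len(coeffs))])
    return powers @ coeffs

def derivative_coeffs(coeffs):
    """Coefficients of d/dt gamma in the same t^i / i! basis: drop c_0, shift the rest."""
    return np.asarray(coeffs, dtype=float)[1:]
```

Applying `derivative_coeffs` i times and evaluating at t = 0 returns c_i, and after k + 1 applications no coefficients remain, which is the differential definition.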







Covariant derivative: replace $\frac{d}{dt}$ of vectors with $\nabla_{\dot\gamma}$








Geodesic (k = 1) has both forms

$$\gamma = \operatorname{argmin}_\phi \int_0^T |\dot\phi(t)|^2 \, dt$$

$$\nabla_{\dot\gamma} \dot\gamma = 0 \quad \text{s.t. initial conditions } \gamma(0), \dot\gamma(0)$$

Well-studied (Fletcher, Younes, Trouve, …)








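Cubic spline satisfies (Noakes1989, Leite, Machado, …)

$$\gamma = \operatorname{argmin}_\phi \int_0^T |\nabla_{\dot\phi} \dot\phi(t)|^2 \, dt$$

Euler-Lagrange equation: $(\nabla_{\dot\gamma})^3 \dot\gamma = R(\dot\gamma, \nabla_{\dot\gamma}\dot\gamma)\dot\gamma$

Shape splines (Trouve-Vialard)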








k-order polynomial satisfies

$$(\nabla_{\dot\gamma})^k \dot\gamma = 0$$

subject to initial conditions $\gamma(0)$, $(\nabla_{\dot\gamma})^i \dot\gamma(0)$, $i = 0, \ldots, k-1$

Introduced via rolling maps by Jupp & Kent (1987)

Studied by Leite (2008), in rolling map setting











Rolling maps (Leite 2008)

Unroll curve $\alpha$ on manifold to curve $\alpha_{dev}$ on $\mathbb{R}^d$ without twisting or slipping. Then

$$(\nabla_{\dot\alpha})^k \dot\alpha = 0 \iff \left(\frac{d}{dt}\right)^k \dot\alpha_{dev} = 0$$



Unknown whether this satisfies a variational principle 



Generate via forward evolution of linearized system of first-order covariant ODEs

Forward Polynomial Evolution

repeat
    w ← v_1
    for i = 1, ..., k − 1 do
        v_i ← ParallelTransport_γ(Δt w, v_i + Δt v_{i+1})
    end for
    v_k ← ParallelTransport_γ(Δt w, v_k)
    γ ← Exp_γ(Δt w)
    t ← t + Δt
until t = T

Parametrized by ICs:

γ(0): position
v_1(0): velocity
v_2(0): acceleration
v_3(0): jerk
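The forward evolution can be tried out on a manifold where the exponential map and parallel transport are available in closed form. The following is a minimal sketch on the unit sphere, chosen here purely for illustration (the talk's applications use other manifolds). The function names and the fixed step count, used in place of the `until t = T` test, are assumptions of this sketch.

```python
import numpy as np

def sphere_exp(p, w):
    """Exponential map on the unit sphere: follow the great circle from p along w."""
    norm = np.linalg.norm(w)
    if norm < 1e-12:
        return p
    return np.cos(norm) * p + np.sin(norm) * w / norm

def sphere_transport(p, w, v):
    """Parallel transport of v from p to Exp_p(w) along the connecting geodesic."""
    norm = np.linalg.norm(w)
    if norm < 1e-12:
        return v
    u = w / norm
    a = np.dot(u, v)  # component of v along the direction of travel
    return v + a * ((np.cos(norm) - 1.0) * u - np.sin(norm) * p)

def integrate_polynomial(p0, vs0, T=1.0, steps=1000):
    """Forward polynomial evolution from point p0 with vectors vs0 = [v_1, ..., v_k]."""
    dt = T / steps
    p = np.asarray(p0, dtype=float)
    vs = [np.asarray(v, dtype=float) for v in vs0]
    k = len(vs)
    path = [p]
    for _ in range(steps):
        w = dt * vs[0]                      # step direction, fixed before any update
        for i in range(k - 1):
            vs[i] = sphere_transport(p, w, vs[i] + dt * vs[i + 1])
        vs[k - 1] = sphere_transport(p, w, vs[k - 1])
        p = sphere_exp(p, w)
        path.append(p)
    return np.array(path), vs
```

With a single vector (k = 1) the loop body reduces to transporting the velocity and stepping along it, so the result is a great circle traversed at constant speed.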



$(\nabla_{\dot\gamma})^k \dot\gamma = 0$ becomes linearized system

$$\dot\gamma = v_1$$

$$\nabla_{\dot\gamma} v_i = v_{i+1}, \quad i = 1, \ldots, k-1$$

$$\nabla_{\dot\gamma} v_k = 0.$$

Want to find initial conditions for this ODE that minimize

$$E(\gamma) = \sum_{i=1}^N g_i(\gamma(t_i))$$


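Lagrange multiplier (adjoint) vector fields $\lambda_i$ along $\gamma$:

$$E^*(\gamma, \{v_i\}, \{\lambda_i\}) = \sum_{i=1}^N g_i(\gamma(t_i)) + \int_0^T \langle \lambda_0, \dot\gamma - v_1 \rangle \, dt + \sum_{i=1}^{k-1} \int_0^T \langle \lambda_i, \nabla_{\dot\gamma} v_i - v_{i+1} \rangle \, dt + \int_0^T \langle \lambda_k, \nabla_{\dot\gamma} v_k \rangle \, dt$$

Euler-Lagrange for $\{\lambda_i\}$ gives forward system.

Vector field integration by parts:

$$\int_0^T \langle \lambda_i, \nabla_{\dot\gamma} v_i \rangle \, dt = [\langle \lambda_i, v_i \rangle]_0^T - \int_0^T \langle \nabla_{\dot\gamma} \lambda_i, v_i \rangle \, dt$$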




Rewrite using integration by parts

$$\begin{aligned}
E^*(\gamma, \{v_i\}, \{\lambda_i\}) = {} & \sum_{i=1}^N g_i(\gamma(t_i)) + \int_0^T \langle \lambda_0, \dot\gamma - v_1 \rangle \, dt \\
& + \sum_{i=1}^{k-1} [\langle \lambda_i, v_i \rangle]_0^T - \sum_{i=1}^{k-1} \int_0^T \langle \nabla_{\dot\gamma} \lambda_i, v_i \rangle \, dt - \sum_{i=1}^{k-1} \int_0^T \langle \lambda_i, v_{i+1} \rangle \, dt \\
& + [\langle \lambda_k, v_k \rangle]_0^T - \int_0^T \langle \nabla_{\dot\gamma} \lambda_k, v_k \rangle \, dt
\end{aligned}$$

So variation w.r.t. $\{v_i\}$ gives

$$\delta_{v_i} E^* = 0 = -\nabla_{\dot\gamma} \lambda_i - \lambda_{i-1}$$

$$\delta_{v_i(T)} E^* = 0 = \lambda_i(T)$$

$$\delta_{v_i(0)} E^* = -\lambda_i(0)$$


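Variation with respect to the curve $\gamma$:

Let $\{\gamma_s : s \in (-\epsilon, \epsilon)\}$ be a smooth family of curves, with:

$$\gamma_0 = \gamma, \qquad W(t) := \frac{d}{ds} \gamma_s(t) \Big|_{s=0}$$

Extend $v_i$, $\lambda_i$ away from curve via parallel transport:

$$\nabla_W v_i = 0, \qquad \nabla_W \lambda_i = 0$$

Then

$$\int_0^T \langle \delta_\gamma E^*(\gamma, \{v_i\}, \{\lambda_i\}), W \rangle \, dt = \frac{d}{ds} E^*(\gamma_s, \{v_i\}, \{\lambda_i\}) \Big|_{s=0}$$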


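For any smooth family of curves $\gamma_s(t)$, we have

$$\left[ \frac{d}{ds} \gamma_s(t), \frac{d}{dt} \gamma_s(t) \right] = [W, \dot\gamma_s] = 0$$

so

$$\nabla_W \dot\gamma = \nabla_{\dot\gamma} W.$$

We also need the Leibniz rule

$$\frac{d}{ds} \langle X, Y \rangle \Big|_{s=0} = \langle \nabla_W X, Y \rangle + \langle X, \nabla_W Y \rangle,$$

and the Riemann curvature tensor

$$R(X, Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z$$

which, since $[W, \dot\gamma] = 0$, gives

$$\nabla_W \nabla_{\dot\gamma} Z = \nabla_{\dot\gamma} \nabla_W Z + R(W, \dot\gamma)Z$$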


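For first term, $T_1 = \int_0^T \langle \lambda_0, \dot\gamma_s \rangle \, dt$

$$\begin{aligned}
\frac{d}{ds} T_1(\gamma_s) \Big|_{s=0} &= \frac{d}{ds} \int_0^T \langle \lambda_0, \dot\gamma_s \rangle \, dt \Big|_{s=0} \\
&= \int_0^T \langle \nabla_W \lambda_0, \dot\gamma_s \rangle + \langle \lambda_0, \nabla_W \dot\gamma_s \rangle \, dt \Big|_{s=0} \\
&= \int_0^T \langle 0, \dot\gamma_s \rangle + \langle \lambda_0, \nabla_{\dot\gamma} W \rangle \, dt \Big|_{s=0} \\
&= [\langle \lambda_0, W \rangle]_0^T - \int_0^T \langle \nabla_{\dot\gamma} \lambda_0, W \rangle \, dt
\end{aligned}$$

Variation of this term with respect to $\gamma$ is

$$\delta_{\gamma(t)} T_1 = -\nabla_{\dot\gamma} \lambda_0$$

$$\delta_{\gamma(T)} T_1 = 0 = \lambda_0(T)$$

$$\delta_{\gamma(0)} T_1 = -\lambda_0(0)$$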


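Now do the same with another term $T_i$

$$\begin{aligned}
\frac{d}{ds} T_i(\gamma_s) \Big|_{s=0} &= \frac{d}{ds} \int_0^T \langle \lambda_i, \nabla_{\dot\gamma} v_i \rangle \, dt \\
&= \int_0^T \langle \nabla_W \lambda_i, \nabla_{\dot\gamma} v_i \rangle + \langle \lambda_i, \nabla_W \nabla_{\dot\gamma} v_i \rangle \, dt \\
&= 0 + \int_0^T \langle \lambda_i, \nabla_{\dot\gamma} \nabla_W v_i + R(W, \dot\gamma) v_i \rangle \, dt \\
&= 0 + \int_0^T \langle R(\lambda_i, v_i) \dot\gamma, W \rangle \, dt
\end{aligned}$$

where we used Bianchi identities to rearrange the curvature term. So

$$\delta_{\gamma(t)} T_i = R(\lambda_i, v_i) \dot\gamma$$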


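Combine all terms to get adjoint equations

$$\nabla_{\dot\gamma} \lambda_0 = \sum_{i=1}^N \delta(t - t_i) \left( \operatorname{grad} g_i(\gamma(t)) \right) + \sum_{i=1}^k R(\lambda_i, v_i) v_1$$

$$\nabla_{\dot\gamma} \lambda_i = -\lambda_{i-1}$$

Initialization for $\lambda_i$ at $t = T$ is

$$\lambda_i(T) = 0,$$

Parameter gradients are

$$\delta_{\gamma(0)} E = -\lambda_0(0)$$

$$\delta_{v_i(0)} E = -\lambda_i(0)$$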




Typically, $g_i(\gamma) = d(\gamma, y_i)^2$, so that, up to a constant factor of 2,

$$\operatorname{grad} g_i(\gamma) = -\operatorname{Log}_\gamma y_i$$
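To make the data term concrete, here is a small sketch on the unit sphere (an illustrative choice of manifold, not one used in the talk). Differentiating the squared distance gives $-2\operatorname{Log}_\gamma y_i$. The constant factor only rescales the gradient step. The function names are hypothetical.

```python
import numpy as np

def sphere_log(p, y):
    """Log map on the unit sphere: tangent vector at p pointing to y, with length d(p, y)."""
    c = np.clip(np.dot(p, y), -1.0, 1.0)
    theta = np.arccos(c)                 # geodesic distance between p and y
    perp = y - c * p                     # part of y orthogonal to p
    n = np.linalg.norm(perp)
    if n < 1e-12:
        return np.zeros_like(p)
    return theta * perp / n

def grad_sq_dist(p, y):
    """Riemannian gradient at p of g(p) = d(p, y)^2 on the unit sphere."""
    return -2.0 * sphere_log(p, y)
```

The gradient points away from the data point, so a descent step moves the curve toward it.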



Algorithm

repeat
    Integrate γ, {v_i} forward to t = T
    Initialize λ_i(T) = 0, i = 0, ..., k
    Integrate {λ_i} via adjoint equations back to t = 0
    Gradient descent step:
        γ(0)^{n+1} = Exp_{γ(0)^n}(ε λ_0(0))
        v_i(0)^{n+1} = ParTrans_{γ(0)^n}(ε λ_0(0), v_i(0)^n + ε λ_i(0))
until convergence
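The loop above is easiest to check in the flat case. The sketch below assumes $\mathbb{R}^d$, where curvature vanishes, Exp is addition, and parallel transport is the identity, so the adjoint system reduces to jumps in $\lambda_0$ at the data times and $\dot\lambda_i = -\lambda_{i-1}$. It also assumes Euler time stepping, data times on the grid, and $g_i = |\gamma(t_i) - y_i|^2$. All names are illustrative, and this is not the Riemannian implementation from the talk.

```python
import numpy as np

def forward(gamma0, vs0, steps, dt):
    """Euler integration of gamma' = v_1, v_i' = v_{i+1}, v_k' = 0 in R^d."""
    gamma, vs = gamma0.copy(), vs0.copy()            # vs has shape (k, d)
    path = [gamma.copy()]
    for _ in range(steps):
        gamma = gamma + dt * vs[0]
        vs[:-1] = vs[:-1] + dt * vs[1:]              # right side uses the old values
        path.append(gamma.copy())
    return np.array(path)

def energy(gamma0, vs0, data, steps, dt):
    """E = sum_i |gamma(t_i) - y_i|^2 for data given as (grid index, y_i) pairs."""
    path = forward(gamma0, vs0, steps, dt)
    return sum(np.sum((path[n] - y) ** 2) for n, y in data)

def adjoint(gamma0, vs0, data, steps, dt):
    """Integrate the adjoint variables backward; returns lambda_0(0) and [lambda_i(0)]."""
    path = forward(gamma0, vs0, steps, dt)
    lam0 = np.zeros_like(gamma0)                     # lambda_0(T) = 0
    lams = np.zeros_like(vs0)                        # lambda_i(T) = 0, i = 1..k
    jumps = {}
    for n, y in data:
        jumps[n] = jumps.get(n, 0.0) + 2.0 * (path[n] - y)   # grad g_i
    for n in range(steps, -1, -1):
        if n in jumps:
            lam0 = lam0 - jumps[n]                   # delta-function jump in lambda_0
        if n > 0:
            lams[1:] = lams[1:] + dt * lams[:-1]     # lambda_i' = -lambda_{i-1}, i >= 2
            lams[0] = lams[0] + dt * lam0            # lambda_1' = -lambda_0
    return lam0, lams

def descent_step(gamma0, vs0, data, steps, dt, eps):
    """One gradient descent update of the initial conditions."""
    lam0, lams = adjoint(gamma0, vs0, data, steps, dt)
    return gamma0 + eps * lam0, vs0 + eps * lams
```

Because the parameter gradients are $-\lambda_0(0)$ and $-\lambda_i(0)$, stepping along $+\lambda$ is a descent direction, matching the update in the algorithm.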


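Special Case: Geodesic (k = 1)

Adjoint system is

$$\nabla_{\dot\gamma} \lambda_0 = \sum_{i=1}^N \delta(t - t_i) \left( \operatorname{grad} g_i(\gamma(t)) \right) + R(\lambda_1, v_1) v_1$$

$$\nabla_{\dot\gamma} \lambda_1 = -\lambda_0$$

Between data points this is

$$(\nabla_{\dot\gamma})^2 \lambda_1 = -R(\lambda_1, \dot\gamma) \dot\gamma$$

This is the Jacobi equation, $\lambda_1$ is a Jacobi field.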
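The intermediate step can be written out as follows: differentiate the second adjoint equation along the curve, then substitute the first one with the delta terms dropped (no data between observation times) and $v_1 = \dot\gamma$.

```latex
\nabla_{\dot\gamma}\lambda_1 = -\lambda_0
\;\Longrightarrow\;
(\nabla_{\dot\gamma})^2 \lambda_1
  = -\nabla_{\dot\gamma}\lambda_0
  = -R(\lambda_1, v_1)\, v_1
  = -R(\lambda_1, \dot\gamma)\, \dot\gamma .
```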


Kendall Shape Space

Space of N landmarks in d dimensions, $\mathbb{R}^{Nd}$, modulo translation, scale, rotation

Prevents skewed statistics due to similarity transformed data

For d = 2, this is complex projective space $\mathbb{CP}^{N-2}$


Kendall Shape Space Geometry (d = 2)

Center point-set and scale so that $\sum_{i=1}^N |x_i|^2 = 1$ (resulting object is called a preshape)

Preshapes lie on sphere $S^{2N-1}$, represented as vectors in $(\mathbb{R}^2)^N = \mathbb{C}^N$

Riemannian submersion from preshape to shape space: vertical direction holds rotations of $\mathbb{R}^2$

Exponential and log map available in closed form (for d = 2)
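The preshape construction is short enough to sketch directly. The code below maps planar landmarks to a centered, unit-norm complex vector and computes the standard rotation-invariant distance $\arccos|\langle z, w \rangle|$ between two preshapes. The function names are illustrative.

```python
import numpy as np

def to_preshape(landmarks):
    """Map an (N, 2) array of planar landmarks to a centered, unit-norm vector in C^N."""
    z = landmarks[:, 0] + 1j * landmarks[:, 1]
    z = z - z.mean()                   # remove translation
    return z / np.linalg.norm(z)       # remove scale

def shape_distance(z, w):
    """Distance between the shapes of two preshapes; |<z, w>| quotients out rotation."""
    return np.arccos(np.clip(abs(np.vdot(z, w)), 0.0, 1.0))
```

A rotation of the plane multiplies the preshape by a unit complex number, which leaves the modulus of the Hermitian inner product unchanged.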


Covariant derivative in shape space in terms of preshape (O'Neill1966):

$$\nabla^*_{X^*} Y^* = \mathcal{H} \nabla_X Y$$

Vertical direction is $JN$, where $N$ is outward unit normal at the preshape, $J$ is almost complex structure on $\mathbb{C}^N$.

So parallel transport in small steps in upstairs space, then do horizontal projection.
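The horizontal projection itself is a one-line operation once preshapes are stored as complex vectors. This sketch assumes a unit-norm preshape `z`, so that the outward normal is `z` and the vertical direction $JN$ is `1j * z`. The names are illustrative.

```python
import numpy as np

def real_inner(a, b):
    """Real inner product on C^N viewed as R^{2N}."""
    return np.real(np.vdot(a, b))

def horizontal_projection(z, x):
    """Remove from x its component along the vertical direction J N = i z."""
    vertical = 1j * z
    return x - real_inner(vertical, x) * vertical
```

Applying this after each small transport step on the sphere keeps the transported vector horizontal, which is the procedure described above.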


Curvature on preshape sphere $S^{2N-1}$:

$$R(X, Y)Z = \langle X, Z \rangle Y - \langle Y, Z \rangle X$$

For curvature, need first fundamental form $A$. For horizontal vector fields $X, Y$,

$$A_X Y = \frac{1}{2} \mathcal{V}[X, Y]$$

Curvature downstairs is

$$\langle R^*(X^*, Y^*)Z^*, H \rangle = \langle R(X, Y)Z, H \rangle + 2 \langle A_X Y, A_Z H \rangle - \langle A_Y Z, A_X H \rangle - \langle A_Z X, A_Y H \rangle$$


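First fundamental form is (O'Neill)

$$A_X Y = \langle X, JY \rangle JN$$

Adjoint of $A_Z$:

$$\langle A_X Y, A_Z H \rangle = \langle -J \langle X, JY \rangle Z, H \rangle$$

Curvature then is

$$R^*(X^*, Y^*)Z^* = R(X, Y)Z - 2J \langle X, JY \rangle Z + J \langle Y, JZ \rangle X + J \langle Z, JX \rangle Y$$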
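The closed-form curvature can be checked numerically. The sketch below implements the two formulas with $J$ as multiplication by $i$ and evaluates the sectional curvature $\langle R^*(X, Y)X, Y \rangle$, which in this sign convention should be 4 on a holomorphic plane ($Y = JX$) and 1 on a totally real plane, the known range for complex projective space. The function names are illustrative.

```python
import numpy as np

def inner(a, b):
    """Real inner product on C^N viewed as R^{2N}."""
    return np.real(np.vdot(a, b))

def sphere_curvature(x, y, z):
    """R(X, Y)Z on the unit preshape sphere, in the sign convention of the text."""
    return inner(x, z) * y - inner(y, z) * x

def shape_curvature(x, y, z):
    """R*(X*, Y*)Z* for horizontal X, Y, Z, with J acting as multiplication by i."""
    J = 1j
    return (sphere_curvature(x, y, z)
            - 2.0 * J * inner(x, J * y) * z
            + J * inner(y, J * z) * x
            + J * inner(z, J * x) * y)

def sectional_curvature(x, y):
    """<R*(X, Y)X, Y> for orthonormal horizontal X, Y."""
    return inner(shape_curvature(x, y, x), y)
```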


Bookstein Rat Calvarium Growth

8 landmark points
18 subjects
8 ages

[Figure: fitted landmark trajectories, with zoomed panels A, B, C, D]

k = 1: $R^2$ = 0.79
k = 2: $R^2$ = 0.85
k = 3: $R^2$ = 0.87


Corpus Callosum Aging (www.oasis-brains.org)

N = 32 patients
Age range 18–90
64 landmarks using ShapeWorks (sci.utah.edu)
(Fletcher 2011)

[Figure: three panels showing the Geodesic, Quadratic, and Cubic fits]

k = 1: $R^2$ = 0.12
k = 2: $R^2$ = 0.13
k = 3: $R^2$ = 0.21


Corpus Callosum Aging

[Figure: estimated initial conditions $\dot\gamma(0)$, $\nabla_{\dot\gamma}\dot\gamma(0)$, $(\nabla_{\dot\gamma})^2\dot\gamma(0)$]

Initial conditions are collinear, implying time reparametrization


Landmark Space

Space $L$ of N points in $\mathbb{R}^d$. Geodesic equations:

$$\frac{d}{dt} x_i = \sum_{j=1}^N \gamma(|x_i - x_j|^2) \alpha_j$$

$$\frac{d}{dt} \alpha_i = -2 \sum_{j=1}^N (x_i - x_j) \gamma'(|x_i - x_j|^2) \alpha_i^T \alpha_j$$

Usually use Gaussian kernel

$$\gamma(r) = e^{-r/(2\sigma^2)}$$

$x \in L$ and $\alpha \in T^*_x L$ is a covector (momentum)
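The geodesic equations are Hamilton's equations for $H = \frac{1}{2} \sum_{i,j} \alpha_i^T \alpha_j \gamma(|x_i - x_j|^2)$, which gives a convenient numerical check. A minimal NumPy sketch with the Gaussian kernel follows. The array layout (`x` and `alpha` of shape `(N, d)`) and the function names are assumptions of this sketch.

```python
import numpy as np

def kernel(r, sigma):
    """Gaussian kernel gamma(r) as a function of squared distance r."""
    return np.exp(-r / (2.0 * sigma**2))

def hamiltonian(x, alpha, sigma):
    """H = 1/2 sum_ij alpha_i^T alpha_j gamma(|x_i - x_j|^2)."""
    diff = x[:, None, :] - x[None, :, :]
    K = kernel(np.sum(diff**2, axis=-1), sigma)
    return 0.5 * np.sum(K * (alpha @ alpha.T))

def geodesic_rhs(x, alpha, sigma):
    """Right-hand side (dx/dt, dalpha/dt) of the landmark geodesic equations."""
    diff = x[:, None, :] - x[None, :, :]             # diff[i, j] = x_i - x_j
    K = kernel(np.sum(diff**2, axis=-1), sigma)      # gamma(|x_i - x_j|^2)
    dK = -K / (2.0 * sigma**2)                       # gamma'(|x_i - x_j|^2)
    dx = K @ alpha
    dalpha = -2.0 * np.einsum('ijd,ij,ij->id', diff, dK, alpha @ alpha.T)
    return dx, dalpha
```

Here `dx` equals $\partial H / \partial \alpha$ and `dalpha` equals $-\partial H / \partial x$, so $H$ is conserved along the exact flow.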




Have simple formula for cometric $g^{ij}$ (the kernel)

Parallel transport in terms of covectors, cometric:

$$\frac{d}{dt} \beta_l = \frac{1}{2} g_{il} \, g^{in}{}_{,j} \, g^{jm} (\alpha_m \beta_n - \alpha_n \beta_m) - \frac{1}{2} g^{mn}{}_{,l} \, \alpha_m \beta_n$$

Curvature more complicated (Mario’s Formula): 

2R ursv = −g ur,sv − g rv,us + g rs,uv + g uv,rs + 2Γ rv 

ρ Γ us 

σ g ρσ − 2Γ rs 

ρ Γ uv 

σ g ρσ 

+ g rλ,u g λµ g µv,s − g rλ,u g λµ g µs,v + g uλ,r g λµ g µs,v − g uλ,r g λµ g µv,s 

+ g rλ,s g λµ g µv,u + g uλ,v g λµ g µs,r − g rλ,v g λµ g µs,u − g uλ,s g λµ g µv,r . 


Landmark parallel transport in momenta (Younes2008):

$$\begin{aligned}
\frac{d}{dt} \beta_i = {} & K^{-1} \Bigg( \sum_{j=1}^N (x_i - x_j)^T ((K\beta)_i - (K\beta)_j) \gamma'(|x_i - x_j|^2) \alpha_j \\
& \qquad - \sum_{j=1}^N (x_i - x_j)^T ((K\alpha)_i - (K\alpha)_j) \gamma'(|x_i - x_j|^2) \beta_j \Bigg) \\
& - \sum_{j=1}^N (x_i - x_j) \gamma'(|x_i - x_j|^2) (\alpha_j^T \beta_i + \alpha_i^T \beta_j)
\end{aligned}$$

This is enough to integrate polynomials
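As a consistency check, transporting the momentum $\alpha$ along its own curve should reproduce the geodesic equation. Setting $\beta = \alpha$, the two sums inside $K^{-1}(\cdot)$ become identical and cancel, and the remaining term gives:

```latex
\frac{d}{dt}\alpha_i
  = -\sum_{j=1}^N (x_i - x_j)\,\gamma'(|x_i - x_j|^2)\,
      (\alpha_j^T \alpha_i + \alpha_i^T \alpha_j)
  = -2\sum_{j=1}^N (x_i - x_j)\,\gamma'(|x_i - x_j|^2)\,\alpha_i^T \alpha_j ,
```

which is the momentum equation of the landmark geodesic system.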


For curvature, need Christoffel symbols and their derivatives:

$$\begin{aligned}
(\Gamma(u, v))_i = {} & - \sum_{j=1}^N (x_i - x_j)^T (v_i - v_j) \gamma'(|x_i - x_j|^2) (K^{-1} u)_j \\
& - \sum_{j=1}^N (x_i - x_j)^T (u_i - u_j) \gamma'(|x_i - x_j|^2) (K^{-1} v)_j \\
& + \sum_{j=1}^N \gamma(|x_i - x_j|^2) \sum_{k=1}^N (x_j - x_k) \gamma'(|x_j - x_k|^2) \left( (K^{-1} u)_k^T (K^{-1} v)_j + (K^{-1} u)_j^T (K^{-1} v)_k \right)
\end{aligned}$$

Take derivative with respect to $x$, and combine using

$$R^l{}_{ijk} = \Gamma^l_{ki,j} - \Gamma^l_{ji,k} + \Gamma^l_{jm} \Gamma^m_{ki} - \Gamma^l_{km} \Gamma^m_{ji}$$


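$$\begin{aligned}
((D\Gamma(u, v)) w)_i = {} & \sum_{j=1}^N (w_i - w_j)^T (u_i - u_j) \gamma'(|x_i - x_j|^2) (K^{-1} v)_j \\
& + 2 \sum_{j=1}^N (x_i - x_j)^T (u_i - u_j) (x_i - x_j)^T (w_i - w_j) \gamma''(|x_i - x_j|^2) (K^{-1} v)_j \\
& + \sum_{j=1}^N (x_i - x_j)^T (u_i - u_j) \gamma'(|x_i - x_j|^2) \left( \left( \tfrac{d}{d\epsilon} K^{-1} \right) v \right)_j \\
& + \sum_{j=1}^N (w_i - w_j)^T (v_i - v_j) \gamma'(|x_i - x_j|^2) (K^{-1} u)_j \\
& + 2 \sum_{j=1}^N (x_i - x_j)^T (v_i - v_j) (x_i - x_j)^T (w_i - w_j) \gamma''(|x_i - x_j|^2) (K^{-1} u)_j \\
& + \sum_{j=1}^N (x_i - x_j)^T (v_i - v_j) \gamma'(|x_i - x_j|^2) \left( \left( \tfrac{d}{d\epsilon} K^{-1} \right) u \right)_j \\
& - 2 \sum_{j=1}^N (x_i - x_j)^T (w_i - w_j) \gamma'(|x_i - x_j|^2) \sum_{k=1}^N (x_j - x_k) \gamma'(|x_j - x_k|^2) \left( (K^{-1} u)_k^T (K^{-1} v)_j + (K^{-1} u)_j^T (K^{-1} v)_k \right) \\
& - \sum_{j=1}^N \gamma(|x_i - x_j|^2) \sum_{k=1}^N (w_j - w_k) \gamma'(|x_j - x_k|^2) \left( (K^{-1} u)_k^T (K^{-1} v)_j + (K^{-1} u)_j^T (K^{-1} v)_k \right) \\
& - 2 \sum_{j=1}^N \gamma(|x_i - x_j|^2) \sum_{k=1}^N (x_j - x_k) (x_j - x_k)^T (w_j - w_k) \gamma''(|x_j - x_k|^2) \left( (K^{-1} u)_k^T (K^{-1} v)_j + (K^{-1} u)_j^T (K^{-1} v)_k \right) \\
& - \sum_{j=1}^N \gamma(|x_i - x_j|^2) \sum_{k=1}^N (x_j - x_k) \gamma'(|x_j - x_k|^2) \\
& \qquad \times \left( \left( \tfrac{d}{d\epsilon} K^{-1} u \right)_k^T (K^{-1} v)_j + (K^{-1} u)_k^T \left( \tfrac{d}{d\epsilon} K^{-1} v \right)_j + \left( \tfrac{d}{d\epsilon} K^{-1} u \right)_j^T (K^{-1} v)_k + (K^{-1} u)_j^T \left( \tfrac{d}{d\epsilon} K^{-1} v \right)_k \right)
\end{aligned}$$

where $\frac{d}{d\epsilon}$ denotes the derivative in the direction $w$, and

$$\left( \left( \tfrac{d}{d\epsilon} K^{-1} \right) v \right)_i = - \left( K^{-1} \left( \tfrac{d}{d\epsilon} K \right) K^{-1} v \right)_i = -2 \left( K^{-1} \left[ \sum_{j=1}^N (x_k - x_j)^T (w_k - w_j) \gamma'(|x_k - x_j|^2) (K^{-1} v)_j \right]_k \right)_i$$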
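The last identity, the directional derivative of $K^{-1}$, is the piece most easily verified in isolation. The sketch below assumes the Gaussian kernel $\gamma(r) = e^{-r/(2\sigma^2)}$ and arrays of shape `(N, d)`. The function names are illustrative, and linear solves stand in for the explicit inverse.

```python
import numpy as np

def kernel_matrix(x, sigma):
    """K[k, j] = gamma(|x_k - x_j|^2) for the Gaussian kernel."""
    diff = x[:, None, :] - x[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

def kernel_matrix_derivative(x, w, sigma):
    """Derivative of K in direction w: 2 (x_k - x_j)^T (w_k - w_j) gamma'(|x_k - x_j|^2)."""
    dx = x[:, None, :] - x[None, :, :]
    dw = w[:, None, :] - w[None, :, :]
    dgamma = -kernel_matrix(x, sigma) / (2.0 * sigma**2)   # gamma' for the Gaussian
    return 2.0 * np.sum(dx * dw, axis=-1) * dgamma

def inverse_derivative_apply(x, w, v, sigma):
    """((d/d eps) K^{-1}) v = -K^{-1} (dK) K^{-1} v, for v of shape (N, d)."""
    K = kernel_matrix(x, sigma)
    dK = kernel_matrix_derivative(x, w, sigma)
    return -np.linalg.solve(K, dK @ np.linalg.solve(K, v))
```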


Landmark Regression Results

Same Bookstein rat data. Procrustes alignment, no scaling.

[Figure: landmark-space regression fits, with zoomed panels]

$R^2$ = 0.92 geodesic, 0.94 quadratic



Thank You!
