Grassmann Clustering
(logo: mixing model s(t) → A → x(t), with noise n(t))<br />
<strong>Grassmann</strong><br />
<strong>Clustering</strong><br />
Fabian Theis<br />
Institute of Biophysics<br />
University of Regensburg
Aim of my talk<br />
• clustering - an elementary concept of machine learning<br />
• review and illustrate a simple clustering algorithm<br />
• extend it to more general metric spaces<br />
• projective space<br />
• <strong>Grassmann</strong> manifolds<br />
• general submanifolds (via kernels)<br />
• applications (very brief - work in progress)<br />
• ICA, NMF<br />
• approximate combinatorial convex optimization<br />
Apr 6, 2006 :: Tübingen
I. Introduction<br />
Agenda<br />
II. Partitional <strong>Clustering</strong><br />
III. Projective <strong>Clustering</strong><br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
V. Submanifold <strong>Clustering</strong><br />
I. Introduction<br />
k-means illustration<br />
I. Introduction<br />
<strong>Clustering</strong><br />
goal:<br />
• given a multivariate data set A<br />
• determine<br />
• partition into groups (clusters)<br />
• representative cluster centers (centroids)<br />
approaches:<br />
• partitional clustering (here)<br />
• hierarchical clustering<br />
(figure: partitional vs. hierarchical clustering)<br />
I. Introduction<br />
<strong>Clustering</strong><br />
has many applications, for example<br />
feature extraction of immunological data sets<br />
[ Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003. ]<br />
(figure: hierarchical tree of the K-means clusters of the immunological data, with distances d at each split)<br />
I. Introduction<br />
Example: read digits!<br />
task: differentiate between the two digits shown<br />
use unsupervised learning<br />
I. Introduction<br />
dimension reduction: via PCA to only 2 dimensions<br />
(figure: the digit data after PCA projection to 2 dimensions)<br />
I. Introduction<br />
k-means<br />
• samples<br />
• distance measure<br />
• algorithm:<br />
• fix the number of clusters k<br />
• initialize centroids randomly<br />
• update rule: batch or sequential<br />
(figure: samples and centroids)<br />
(animation builds; batch k-means: partitioning, assignment, update; sequential k-means: pick an arbitrary sample, find the nearest centroid, update it)<br />
I. Introduction<br />
"digital" batch k-means<br />
(figure: k-means on the digit data, shown after 1-7 iterations)<br />
done after 7 iterations<br />
error: 4.5%<br />
hierarchical clustering<br />
also works:<br />
(figure: dendrogram of the 30 samples and the resulting clusters)<br />
error: 4.2%<br />
II. Partitional <strong>Clustering</strong><br />
divide and conquer<br />
II. Partitional <strong>Clustering</strong><br />
if A = {a1, ..., aT}, we get a constrained non-linear optimization problem:<br />
minimize $f(C, W) = \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it} \, d(c_i, a_t)^2$<br />
subject to $w_{it} \in \{0, 1\}$ and $\sum_{i=1}^{k} w_{it} = 1$ for all $t$<br />
with centroid locations C := {c1, . . . , ck }<br />
and partition matrix W := (wit)<br />
II. Partitional <strong>Clustering</strong><br />
minimize this!<br />
common approach: partial optimization for W and C,<br />
i.e. alternating minimization of W and C while<br />
keeping the other one fixed<br />
II. Partitional <strong>Clustering</strong><br />
minimize this!<br />
batch k-means<br />
cluster assignment: for each at,<br />
determine an index i(t) such that $d(c_{i(t)}, a_t) \le d(c_j, a_t)$ for all $j$<br />
cluster update: within each cluster i,<br />
determine the centroid ci by minimizing $\sum_{t:\, i(t)=i} d(c_i, a_t)^2$<br />
note: for a general distance measure, the centroid update step may be difficult<br />
II. Partitional <strong>Clustering</strong><br />
Euclidean case<br />
special case: $A \subset \mathbb{R}^n$ with Euclidean distance $d(x, y) = \|x - y\|$<br />
the centroid is given by the cluster mean $c_i = \frac{1}{|\{t : i(t)=i\}|} \sum_{t:\, i(t)=i} a_t$<br />
because the mean minimizes the sum of squared Euclidean distances<br />
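The batch update rules above, in the Euclidean special case, can be sketched as follows (a minimal NumPy sketch, not the original implementation; the function name and convergence check are mine):

```python
import numpy as np

def batch_kmeans(A, k, max_iter=100, seed=0):
    """Batch k-means: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # initialize centroids randomly from the samples
    C = A[rng.choice(len(A), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # cluster assignment: for each sample a_t pick the nearest centroid
        d = np.linalg.norm(A[:, None, :] - C[None, :, :], axis=2)
        new = d.argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break  # partition unchanged -> converged
        labels = new
        # cluster update: with Euclidean distance the centroid is the mean
        for i in range(k):
            if np.any(labels == i):
                C[i] = A[labels == i].mean(axis=0)
    return C, labels
```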
II. Partitional <strong>Clustering</strong><br />
(figure: k-plane clustering in three dimensions; 300 Gaussian samples, 3 hyperplanes)<br />
[Georgiev, Theis, Cichocki. IEEE Trans. Neural Networks, 16(4):992-996, 2005]<br />
III. Projective <strong>Clustering</strong><br />
line-groups<br />
III. Projective <strong>Clustering</strong><br />
(figure: projective clustering in three dimensions, k=4 clusters)<br />
III. Projective <strong>Clustering</strong><br />
Applications & Extensions<br />
• robustness of ICA<br />
• similar to [Meinecke et al., 2003]<br />
• determine "best" directions in the data set<br />
• hyperplane clustering<br />
• cluster normal vectors instead of<br />
line vectors<br />
• projective k-median clustering<br />
• the median is statistically more robust!<br />
(diagram: bootstrap the data set into different sample subsets, estimate a mixing matrix A on each, and find the best mixing directions by projective clustering)<br />
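A hedged sketch of the projective case (lines through the origin), under the assumption that the squared distance between lines [x] and [y], represented by unit vectors, is 1 − (xᵀy)², so that sign flips are irrelevant; the centroid of a cluster of lines is then the dominant eigenvector of the cluster correlation Σₜ aₜaₜᵀ. The function names are mine:

```python
import numpy as np

def line_distance(x, y):
    """Distance between lines [x], [y] (unit vectors); the sign of x, y
    does not matter, as required on projective space."""
    return np.sqrt(max(0.0, 1.0 - float(x @ y) ** 2))

def line_centroid(X):
    """Centroid of a cluster of lines (rows of X, unit vectors):
    dominant eigenvector of the cluster correlation sum_t a_t a_t^T."""
    R = X.T @ X  # sum of outer products a_t a_t^T
    w, V = np.linalg.eigh(R)  # eigenvalues in ascending order
    return V[:, -1]  # eigenvector of the largest eigenvalue
```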
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
subspaces instead of lines<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
<strong>Grassmann</strong> <strong>Clustering</strong><br />
• goal: clustering in the <strong>Grassmann</strong> manifold Gn,p of p-dimensional linear subspaces of R^n<br />
• for this we need a metric on Gn,p<br />
• different notions are possible<br />
• they may be derived from the geodesic metric induced<br />
by the natural Riemannian structure of Gn,p<br />
[Edelman et al. 1999]<br />
• here: the computationally simple projection F-norm<br />
$d([V], [W]) := 2^{-1/2} \|V V^\top - W W^\top\|_F$ for orthonormal bases V, W,<br />
with Frobenius norm $\|C\|_F = \sqrt{\operatorname{tr}(C^\top C)}$<br />
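The projection F-norm compares orthogonal projectors, so it is independent of the chosen orthonormal basis; a minimal sketch (function name mine):

```python
import numpy as np

def proj_fnorm_distance(V, W):
    """Projection F-norm distance between subspaces [V], [W] in G_{n,p};
    V, W are n x p matrices with orthonormal columns.  Comparing the
    projectors VV^T, WW^T makes the result basis-independent; the value
    equals the 2-norm of the sines of the principal angles."""
    P, Q = V @ V.T, W @ W.T  # orthogonal projectors onto the subspaces
    return np.linalg.norm(P - Q, "fro") / np.sqrt(2.0)
```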
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
generalizes well<br />
remark:<br />
• Gn,1 = RP^{n-1}, so projective clustering is the special case p=1<br />
• and the metrics coincide (not entirely obvious)<br />
main result:<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Centroid Calculation<br />
theorem: the centroid [Ci] of a cluster Bi is spanned<br />
by the p eigenvectors corresponding to the smallest<br />
eigenvalues of the generalized cluster correlation $\sum_{t \in B_i} (I_n - V_t V_t^\top)$<br />
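The centroid theorem can be sketched directly: with the projection F-norm, summing the squared distances gives tr(Cᵀ R C) up to a constant, with R = Σₜ (I − VₜVₜᵀ), so the minimizer is spanned by the p smallest eigenvectors of R (the form of R is reconstructed from this derivation, since the slide's formula image is lost; function name mine):

```python
import numpy as np

def grassmann_centroid(Vs):
    """Centroid of subspaces [V_1], ..., [V_T] in G_{n,p} w.r.t. the
    projection F-norm; each V_t is n x p with orthonormal columns.
    Centroid = span of the p eigenvectors of the generalized cluster
    correlation R = sum_t (I - V_t V_t^T) with smallest eigenvalues."""
    n, p = Vs[0].shape
    R = sum(np.eye(n) - V @ V.T for V in Vs)
    w, U = np.linalg.eigh(R)  # eigenvalues in ascending order
    return U[:, :p]  # orthonormal basis of the centroid subspace
```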
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Proof (optima, example)<br />
(figure: example polytope for n=4 with the points (1,0,0,0), (0,1,0,0), (0,0,1,0), (.5,0,.5,0), (0,.5,.5,0), (.3,.3,0,.3) marked; builds for p=1, p=2, p=3)<br />
• corners are in our case<br />
• so<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Proof (ctd.)<br />
• i.e. as claimed.<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Indeterminacies<br />
occur if eigenspaces are higher-dimensional or<br />
the eigenvalue zero is present<br />
two-sample case, e.g. Vi = ei:<br />
d0(e1, x) = cos(x1)^2 and d0(e2, x) = cos(x2)^2 for x = (x1, x2)<br />
where's the centroid?<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Examples<br />
• toy example<br />
• 10^4 samples of G4,2<br />
• drawn from k=6<br />
coordinate hyperplanes<br />
• add 10 dB noise<br />
• k-subspace clustering<br />
• convergence after 6 iterations<br />
• visualize clusters by<br />
index<br />
(figure: recovered cluster indices of the 6 hyperplane clusters)<br />
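Putting assignment and update together, k-subspace clustering as in the toy example can be sketched roughly as follows (a sketch under assumptions: projection F-norm distance, generalized cluster correlation Σ(I − VVᵀ), and, unlike the random initialization on the slides, deterministic initialization with the first k samples for simplicity):

```python
import numpy as np

def grassmann_kmeans(Vs, k, max_iter=50):
    """k-subspace clustering on G_{n,p} with the projection F-norm.
    Vs: list of n x p matrices with orthonormal columns."""
    n, p = Vs[0].shape
    Cs = [Vs[i].copy() for i in range(k)]  # init with the first k samples
    labels = None
    for it in range(max_iter):
        # assignment: nearest centroid w.r.t. ||VV^T - CC^T||_F
        d = np.array([[np.linalg.norm(V @ V.T - C @ C.T, "fro") for C in Cs]
                      for V in Vs])
        new = d.argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break  # partition unchanged -> converged
        labels = new
        # update: p smallest eigenvectors of the (assumed) generalized
        # cluster correlation sum_t (I - V_t V_t^T)
        for i in range(k):
            members = [V for V, l in zip(Vs, labels) if l == i]
            if members:
                R = sum(np.eye(n) - V @ V.T for V in members)
                w, U = np.linalg.eigh(R)
                Cs[i] = U[:, :p]
    return Cs, labels
```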
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Polytope identification<br />
• solve an approximation problem (from computational geometry):<br />
• given a set of points, identify the smallest enclosing<br />
convex polytope with a fixed number of faces k<br />
• algorithm:<br />
• compute the convex hull (QHull)<br />
• apply subspace k-means to the faces<br />
• note: an affine version is necessary<br />
• include sample weighting by volume<br />
• possibly intersect the resulting clusters<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Application to NMF<br />
• Nonnegative Matrix Factorization<br />
• observing X = AS + N with all matrices having nonnegative<br />
coefficients, find the unknown A and S<br />
• key idea:<br />
• X lies in the conic hull with edges given by the columns of A<br />
• projection onto the standard simplex yields:<br />
• X' lies in the convex hull with vertices given by the columns of A<br />
• example:<br />
• n=3, 100 uniform samples<br />
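The key idea can be checked numerically: scaling each column to coordinate sum 1 maps the conic hull of A's columns to the convex hull of their projections, so every projected sample is a convex combination of the projected columns of A (the mixing matrix below is a toy assumption of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1             # toy nonnegative mixing matrix (assumed)
S = rng.uniform(size=(3, 100))  # nonnegative sources
X = A @ S                       # samples lie in the conic hull of A's columns

def simplex_projection(Y):
    """Scale every column to coordinate sum 1 (projection onto the
    standard simplex): edges of the cone become hull vertices."""
    return Y / Y.sum(axis=0, keepdims=True)

Xp = simplex_projection(X)
Ap = simplex_projection(A)
# barycentric coordinates of each projected sample w.r.t. Ap's columns:
coeffs = np.linalg.solve(Ap, Xp)
```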
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
(figures: data points in simplex projection, extracted borders, and cluster centers for the 100 samples)<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Comparison<br />
• why? because NMF has a high indeterminacy<br />
• a minimality criterion seems to be a good idea<br />
• related:<br />
• [Cutler & Breiman, Archetypal analysis, 1994]<br />
• [Theis, how to make NMF unique, in prep.]<br />
(figure: cross-error vs. number of samples, 10^1 to 10^5, for NMF (mean square error) and <strong>Grassmann</strong> clustering)<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Other applications<br />
• Non-Gaussian Component Analysis<br />
• [Blanchard et al., 2005]<br />
• identify the non-noise subspace via k=2 clustering<br />
• inter-subject biomedical data analysis<br />
• identify common subspaces in multivariate data<br />
sets<br />
• allows for inter-patient variability while keeping<br />
vector space selectivity<br />
• robustness of subspace ICA<br />
• extension of the ICA case<br />
V. Submanifold <strong>Clustering</strong><br />
let's bend 'em<br />
V. Submanifold <strong>Clustering</strong><br />
Nonlinear generalization<br />
warning: very preliminary<br />
• goal: nonlinear clustering in the data space<br />
• common approach: hope it's linear in a higher-dimensional<br />
feature space, so embed the data via a map Φ<br />
• describe Φ only by dot products [Schölkopf &<br />
Smola 2001]<br />
• data set<br />
• linear subspaces<br />
• here: the result resembles an extension of kernel PCA<br />
V. Submanifold <strong>Clustering</strong><br />
Centroid Calculation<br />
nonlinear generalized cluster correlation:<br />
define the kernel matrix<br />
just solve the EVD<br />
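The formulas on this slide are lost in extraction; in the spirit of the kernel-PCA remark above, a plausible sketch is that feature-space directions are expansions Σₜ αₜΦ(xₜ), so the EVD of the kernel matrix stands in for the EVD of the feature-space correlation. All names and the exact form below are my assumptions, not the talk's derivation:

```python
import numpy as np

def poly2_kernel(X, Y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2,
    the dot product of the feature map (x1^2, x2^2, sqrt(2) x1 x2)."""
    return (X @ Y.T) ** 2

def kernel_centroid_coefficients(X, p):
    """Sketch (assumed form): expansion coefficients alpha of p
    feature-space directions sum_t alpha_t Phi(x_t), obtained from the
    EVD of the kernel matrix K, as in kernel PCA."""
    K = poly2_kernel(X, X)
    w, V = np.linalg.eigh(K)  # eigenvalues in ascending order
    alphas = V[:, -p:]        # top-p directions in feature space
    # normalize so each feature-space direction has unit norm
    return alphas / np.sqrt(np.maximum(w[-p:], 1e-12))
```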
Distance Calculation<br />
• necessary for the partitioning step of subspace k-means<br />
• distance:<br />
• here (from previous step):<br />
• so<br />
V. Submanifold <strong>Clustering</strong><br />
V. Submanifold <strong>Clustering</strong><br />
Example<br />
(figures: data space and feature space, coordinates (1,3))<br />
• use the simple polynomial<br />
kernel k(x,y) = (x^2, y^2, 2^{-1/2} xy)<br />
• draw 100 hyperplane<br />
samples from ellipses as<br />
shown<br />
• 3 samples per hyperplane<br />
• because we are in 4d after<br />
the affine embedding<br />
• convergence after 2 iterations<br />
• mismatch: 7 samples<br />
V. Submanifold <strong>Clustering</strong><br />
(figure: learnt vs. true cluster indices for the 100 samples)<br />
Conclusion<br />
done - but lots of todos<br />
Conclusions<br />
• showed<br />
• extension of k-means to <strong>Grassmann</strong> manifold<br />
• first results of nonlinear generalization via kernels<br />
• open<br />
• study applications in detail - new apps?<br />
• preimage problem in submanifold k-means<br />
• diff. subspace clustering/classification<br />
• acknowledgements<br />
• Peter Gruber & Harold Gutch<br />
• Motoaki Kawanabe and...<br />