
Grassmann Clustering

Fabian Theis
Institute of Biophysics
University of Regensburg

Apr 6, 2006 :: Tübingen

[Slide-header graphic, repeated on every slide: the noisy linear mixing model s(t) → A → x(t) with additive noise n(t)]


Aim of my talk

• clustering - an elementary concept of machine learning
• review and illustrate a simple clustering algorithm
• extend it to more general metric spaces:
  • projective space
  • Grassmann manifolds
  • general submanifolds (via kernels)
• applications (very brief - work in progress):
  • ICA, NMF
  • approximate combinatorial convex optimization


Agenda

I. Introduction
II. Partitional Clustering
III. Projective Clustering
IV. Grassmann Clustering
V. Submanifold Clustering


I. Introduction
k-means illustration


I. Introduction
Clustering

goal:
• given a multivariate data set A
• determine
  • a partition into groups (clusters)
  • representative cluster centers (centroids)

approaches:
• partitional clustering (here)
• hierarchical clustering

[Figure: partitional clustering vs. hierarchical clustering]


I. Introduction
Clustering

Clustering has many applications - for example, feature extraction of immunological data sets.

[Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003.]

[Figure: "K-means clusters" - a tree of clustered immunological samples with per-node statistics and group labels such as CB(1)-CB(3), ILD(1)-ILD(3), O(1)-O(2), nO(1)-nO(3) and x(1)-x(3)]


I. Introduction
Example: read digits!

task: differentiate between [digit image] and [digit image]

use unsupervised learning


I. Introduction

dimension reduction: via PCA to only 2 dimensions

[Figure: the digit data projected onto the first two principal components]
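The reduction step itself is small; a minimal numpy sketch (my own, assuming samples as rows):

```python
# Minimal PCA-to-2D sketch (assumption: samples are the rows of X).
import numpy as np

def pca_2d(X):
    """X: (T, n) data matrix; returns the (T, 2) projection onto the
    first two principal components."""
    Xc = X - X.mean(axis=0)                           # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False) # rows of Vt = PC directions
    return Xc @ Vt[:2].T                              # top-2 PC coordinates
```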


I. Introduction
k-means

• samples
• distance measure
• algorithm:
  • fix the number of clusters k
  • initialize the centroids randomly
  • update rule: batch or sequential

[Animation: batch k-means - samples and centroids, partition, assignment, centroid update; sequential k-means - arbitrary sample, nearest centroid, update]
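Both update rules fit in a few lines; a hedged numpy sketch of the two variants for Euclidean data (function names and the learning rate are mine):

```python
import numpy as np

def batch_kmeans(X, k, iters=100, rng=np.random.default_rng(0)):
    X = np.asarray(X, dtype=float)
    C = X[rng.choice(len(X), k, replace=False)]        # random initial centroids
    for _ in range(iters):
        # assignment step: nearest centroid for every sample
        labels = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        # update step: every centroid becomes its cluster mean
        C = np.array([X[labels == i].mean(0) if (labels == i).any() else C[i]
                      for i in range(k)])
    return C, labels

def sequential_kmeans(X, k, eta=0.05, passes=5, rng=np.random.default_rng(0)):
    X = np.asarray(X, dtype=float)
    C = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(passes):
        for t in rng.permutation(len(X)):              # pick an arbitrary sample
            i = ((C - X[t]) ** 2).sum(-1).argmin()     # find the nearest centroid
            C[i] += eta * (X[t] - C[i])                # move it toward the sample
    return C
```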


I. Introduction
"digital" batch k-means

[Figure: k-means on the 2D digit data after 1, 2, ..., 7 iterations - done after 7 iterations]

error: 4.5%


I. Introduction
hierarchical clustering

hierarchical clustering also works:

[Figure: dendrogram over the samples (leaves 1-30) and the resulting partition of the 2D digit data]

error: 4.2%
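The dendrogram step could be reproduced with standard tooling; a possible SciPy sketch (not the original code):

```python
from scipy.cluster.hierarchy import linkage, fcluster

def two_cluster_dendrogram(X):
    """X: (T, n) samples; returns 1-based labels of a 2-cluster cut."""
    Z = linkage(X, method="ward")                  # build the full dendrogram
    return fcluster(Z, t=2, criterion="maxclust")  # cut it into two clusters
```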


II. Partitional Clustering

divide and conquer


II. Partitional Clustering

for A = {a_1, ..., a_T}, this is a constrained non-linear optimization problem:

minimize  $E(C, W) = \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it}\, d(c_i, a_t)^2$

subject to  $w_{it} \in \{0, 1\}$ and $\sum_{i=1}^{k} w_{it} = 1$ for all $t$,

with centroid locations C := {c_1, . . . , c_k} and partition matrix W := (w_{it})


II. Partitional Clustering

minimize this!

common approach: partial optimization for W and C -
alternating minimization over W or C while keeping the other one fixed


II. Partitional Clustering

minimize this! batch k-means:

• cluster assignment: for each $a_t$, determine an index $i(t)$ such that
  $d(c_{i(t)}, a_t) = \min_j d(c_j, a_t)$
• cluster update: within each cluster, determine the centroid $c_i$ by minimizing
  $\sum_{t : i(t) = i} d(c_i, a_t)^2$  (this step may be difficult)
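In code, the alternating scheme looks the same in any metric space; a sketch (names mine) where `dist` and `centroid` are whatever the space provides - this is exactly what the later sections instantiate for projective and Grassmann spaces:

```python
import numpy as np

def generic_batch_kmeans(samples, k, dist, centroid, iters=20,
                         rng=np.random.default_rng(0)):
    C = [samples[i] for i in rng.choice(len(samples), k, replace=False)]
    for _ in range(iters):
        # assignment step: index i(t) of the nearest centroid
        labels = [min(range(k), key=lambda i: dist(C[i], a)) for a in samples]
        # update step: re-fit each centroid on its cluster (may be difficult!)
        for i in range(k):
            cluster = [a for a, l in zip(samples, labels) if l == i]
            if cluster:
                C[i] = centroid(cluster)
    return C, labels
```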


II. Partitional Clustering
Euclidean case

special case: samples $a_t \in \mathbb{R}^n$ and Euclidean distance

the centroid is then given by the cluster mean,

$c_i = \frac{1}{\#\{t : i(t)=i\}} \sum_{t : i(t)=i} a_t$,

because $\nabla_c \sum_t \|c - a_t\|^2 = 2 \sum_t (c - a_t)$ vanishes exactly at the mean


II. Partitional Clustering

[Figure: k-plane clustering in three dimensions (300 Gaussian samples, 3 hyperplanes)]

[Georgiev, Theis, Cichocki. IEEE Trans. Neural Networks, 16(4):992-996, 2005]


III. Projective Clustering

line-groups


III. Projective Clustering

[Figure: projective clustering in three dimensions (k=4 clusters)]


III. Projective Clustering
Applications & Extensions

• robustness of ICA
  • similar to [Meinecke et al, 2003]
  • determine the "best" directions in the data set
• hyperplane clustering
  • instead of line vectors, cluster the normal vectors
• projective k-median clustering
  • the median is statistically more robust!

[Diagram: data set → bootstrap → different sample subsets → matrix A, matrix A → projective clustering → best mixing directions]
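A concrete instantiation of the generic scheme for lines (my sketch, under my own conventions: a point of $\mathbb{RP}^{n-1}$ is a unit vector up to sign, the distance is $d([v],[w])^2 = 1 - (v^\top w)^2$, and the cluster centroid is then the dominant eigenvector of $\sum_t v_t v_t^\top$):

```python
import numpy as np

def projective_kmeans(V, k, iters=50, rng=np.random.default_rng(0)):
    """V: (T, n) unit row vectors; returns (k, n) line centroids and labels."""
    V = np.asarray(V, dtype=float)
    C = V[rng.choice(len(V), k, replace=False)]
    for _ in range(iters):
        labels = ((V @ C.T) ** 2).argmax(1)   # max |cos| = nearest line
        for i in range(k):
            Vi = V[labels == i]
            if len(Vi):
                # dominant eigenvector of the cluster correlation sum v v^T
                _, U = np.linalg.eigh(Vi.T @ Vi)
                C[i] = U[:, -1]
    return C, labels
```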


IV. Grassmann Clustering

subspaces instead of lines


IV. Grassmann Clustering
Grassmann Clustering

• goal: clustering in the Grassmann manifold $G_{n,p}$ of p-dimensional linear subspaces of $\mathbb{R}^n$
• for this, necessary: the definition of a metric
  • different notions are possible
  • they may be derived from the geodesic metric induced by the natural Riemannian structure of $G_{n,p}$ [Edelman et al 1999]
• here: the computationally simple projection F-norm

  $d([V], [W]) := 2^{-1/2}\, \| V V^\top - W W^\top \|_F$

  with the Frobenius norm $\|\cdot\|_F$ and orthonormal basis matrices V, W
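The projection F-norm written out directly in numpy (a sketch of the formula above; subspaces are represented by orthonormal basis matrices):

```python
import numpy as np

def grassmann_dist(V, W):
    """V, W: (n, p) orthonormal bases; projection F-norm distance."""
    P, Q = V @ V.T, W @ W.T            # orthogonal projectors onto the spans
    return np.linalg.norm(P - Q, "fro") / np.sqrt(2)
```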


IV. Grassmann Clustering
generalizes well

remark:
• $G_{n,1} = \mathbb{RP}^{n-1}$
• and the metrics coincide (not entirely obvious)


IV. Grassmann Clustering
Centroid Calculation

main result:

theorem: the centroid $[C_i]$ of a cluster $B_i$ is spanned by the p eigenvectors corresponding to the smallest eigenvalues of the generalized cluster correlation of $B_i$
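The slide's formula for the generalized cluster correlation was a graphic and did not survive extraction; the sketch below therefore relies on a standard equivalence rather than the slide's exact expression: minimizing $\sum_t \|CC^\top - V_t V_t^\top\|_F^2$ selects the top-p eigenvectors of $\sum_t V_t V_t^\top$, which are exactly the eigenvectors of the p smallest eigenvalues of $\sum_t (I - V_t V_t^\top)$.

```python
# Centroid of a Grassmann cluster, assuming the standard equivalence:
# argmin_C sum_t ||C C^T - V_t V_t^T||_F^2 is spanned by the top-p
# eigenvectors of sum_t V_t V_t^T (= bottom-p of sum_t (I - V_t V_t^T)).
import numpy as np

def grassmann_centroid(bases):
    """bases: list of (n, p) orthonormal matrices; returns an (n, p) centroid."""
    n, p = bases[0].shape
    R = sum(V @ V.T for V in bases)    # summed cluster correlation
    _, U = np.linalg.eigh(R)           # eigenvalues ascending
    return U[:, -p:]                   # span of the p largest eigenvectors
```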


IV. Grassmann Clustering
Proof (optima, example)

[Figure: optima on the standard simplex for n=4 - corners (1,0,0,0), (0,1,0,0), (0,0,1,0), the midpoints (0,.5,.5,0) and (.5,0,.5,0), and the point (.3,.3,0,.3), with the optima for p=1, p=2 and p=3 marked]


IV. Grassmann Clustering
Proof (ctd.)

• the corners are [equation] in our case
• so [equation]
• i.e. [equation], as claimed


IV. Grassmann Clustering
Indeterminacies

indeterminacies occur if eigenspaces are higher-dimensional or the eigenvalue zero is present

two-sample case, e.g. $V_i = e_i$:

[Figure: the distances $d_0(e_1, x) = \cos(x_1)^2$ and $d_0(e_2, x) = \cos(x_2)^2$ for $x = (x_1, x_2)$ - where's the centroid?]


IV. Grassmann Clustering
Examples

• toy example (see the sketch below):
  • $10^4$ samples of $G_{4,2}$
  • drawn from k=6 coordinate hyperplanes
  • add 10 dB noise
  • k-subspace clustering
  • convergence after 6 iterations
  • visualize the clusters by index

[Figure: recovered cluster indices 1-6 for the toy data]
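Put together, the toy experiment can be sketched as follows, reusing `grassmann_dist` and `grassmann_centroid` from the sketches above; the data generator is my own stand-in for "noisy coordinate planes", not the talk's exact noise model.

```python
import numpy as np

def noisy_coordinate_planes(T, sigma=0.1, rng=np.random.default_rng(0)):
    """Stand-in sampler: orthonormal bases of 2-planes in R^4 near the
    k=6 coordinate planes (assumed setup, not the talk's)."""
    planes = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # 6 planes
    bases, truth = [], []
    for t in range(T):
        a, b = planes[t % 6]
        B = sigma * rng.standard_normal((4, 2))
        B[a, 0] += 1.0
        B[b, 1] += 1.0
        Q, _ = np.linalg.qr(B)          # re-orthonormalize the noisy basis
        bases.append(Q)
        truth.append(t % 6)
    return bases, truth

def grassmann_kmeans(bases, k, iters=20, rng=np.random.default_rng(1)):
    C = [bases[i] for i in rng.choice(len(bases), k, replace=False)]
    for _ in range(iters):
        labels = [int(np.argmin([grassmann_dist(V, Ci) for Ci in C]))
                  for V in bases]
        for i in range(k):
            members = [V for V, l in zip(bases, labels) if l == i]
            if members:
                C[i] = grassmann_centroid(members)
    return C, labels
```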


IV. Grassmann Clustering
Polytope identification

• solve an approximation problem (from computational geometry):
  given a set of points, identify the smallest enclosing convex polytope with a fixed number of faces k
• algorithm (hull step sketched below):
  • compute the convex hull (QHull)
  • apply subspace k-means to the faces
  • note: an affine version is necessary
  • include sample weighting by face volume
  • possibly intersect the resulting clusters
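For the hull step, one plausible concrete reading (an assumption on my part, not the talk's code) uses SciPy's QHull wrapper, whose facet equations directly give the face normals that the (affine) subspace k-means would then cluster:

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_facets(points):
    """points: (T, n) array; returns facet normals a and offsets b,
    one row per face, with a.x + b <= 0 for interior points."""
    hull = ConvexHull(points)                     # SciPy wraps QHull
    a = hull.equations[:, :-1]                    # outward facet normals
    b = hull.equations[:, -1]                     # facet offsets
    return a, b
```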


IV. Grassmann Clustering
Application to NMF

• Nonnegative Matrix Factorization
  • observing X = AS + N, where all coefficients are nonnegative, find the unknown A and S
• key idea:
  • X lies in the conic hull whose edges are given by the columns of A
  • projection onto the standard simplex (see the sketch below) yields: X' lies in the convex hull whose vertices are given by A
• example:
  • n=3, 100 uniform samples
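The simplex projection, as I read it, is a central projection of the nonnegative data onto the standard simplex, i.e. a coordinate-sum normalization (hypothetical helper, not the talk's code):

```python
import numpy as np

def to_simplex(X, eps=1e-12):
    """X: (n, T) nonnegative data; scales each column to unit coordinate
    sum, so conic-hull edges become convex-hull vertices."""
    return X / np.maximum(X.sum(axis=0, keepdims=True), eps)
```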


IV. Grassmann Clustering

[Figure: data points in simplex projection (100 samples)]

[Figure: extracted borders (100 samples)]

[Figure: cluster centers (100 samples)]


IV. Grassmann Clustering
Comparison

• why? because NMF has a high degree of indeterminacy
• the minimality criterion seems to be a good idea
• related:
  • [Cutler, Archetypal analysis, 1994]
  • [Theis, How to make NMF unique, in prep.]

[Figure: crosserror (0 to 1.6) of NMF (mean square error) and of Grassmann clustering versus the number of samples, 10^1 to 10^5]


IV. Grassmann Clustering
Other applications

• Non-Gaussian Component Analysis
  • [Blanchard et al, 2005]
  • identify the non-noise subspace via k=2 clustering
• inter-subject biomedical data analysis
  • identify common subspaces in multivariate data sets
  • allows for inter-patient variability while keeping vector-space selectivity
• robustness of subspace ICA
  • extension of the ICA case


V. Submanifold Clustering

let's bend 'em


V. Submanifold Clustering
Nonlinear generalization

warning: very preliminary

• goal: nonlinear clustering in data space
• common approach: hope it is linear in a higher-dimensional feature space, so embed the data via a map Φ
  • describe Φ only by its dot products (kernel trick) [Schölkopf & Smola 2001]
  • data set: [equation]
  • linear subspaces: [equation]
• here: the result resembles an extension of kernel PCA


V. Submanifold Clustering
Centroid Calculation

nonlinear generalized cluster correlation: [equation]

define the kernel matrix: [equation]

then: just solve the EVD
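In the spirit of kernel PCA, the EVD can be carried out on the kernel matrix alone, without ever forming Φ explicitly; this is a sketch under that assumption (the slide's exact correlation formula did not survive extraction). A quadratic kernel such as `kernel=lambda x, y: (x @ y) ** 2` would be a natural choice for data lying on conics.

```python
import numpy as np

def kernel_centroid_coefficients(X, p, kernel):
    """X: (T, n) cluster samples; returns (T, p) expansion coefficients
    alpha such that the j-th centroid direction is sum_t alpha[t, j] Phi(x_t),
    normalized to unit length in feature space."""
    K = np.array([[kernel(x, y) for y in X] for x in X])  # kernel (Gram) matrix
    w, U = np.linalg.eigh(K)              # EVD, eigenvalues ascending
    alpha = U[:, -p:]                     # top-p eigenvectors
    return alpha / np.sqrt(np.maximum(w[-p:], 1e-12))
```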


V. Submanifold Clustering
Distance Calculation

• necessary for the partitioning step of subspace k-means
• distance: [equation]
• here (from the previous step): [equation]
• so: [equation]


V. Submanifold Clustering
Example

• use a simple polynomial kernel with feature map $\Phi(x, y) = (x^2, y^2, 2^{-1/2} xy)$
• draw 100 hyperplane samples from the ellipses as shown
  • 3 samples per hyperplane - because the planes live in 4d after the affine embedding
• convergence after 2 iterations
• mismatch: 7 samples

[Figure: data space - the sampled ellipses in [-2.5, 2.5]^2]

[Figure: feature space, coordinates (1,3)]


V. Submanifold Clustering

[Figure: learnt cluster indices vs. true cluster indices over the 100 samples]


Conclusion

done - but lots of todos


Conclusions

• showed
  • extension of k-means to Grassmann manifolds
  • first results of a nonlinear generalization via kernels
• open
  • study applications in detail - new apps?
  • preimage problem in submanifold k-means
  • diff. subspace clustering/classification
• acknowledgements
  • Peter Gruber & Harold Gutch
  • Motoaki Kawanabe and...
