
[slide template: noisy linear mixing model diagram, s(t) → A → x(t), with noise n(t)]

Grassmann Clustering

Fabian Theis
Institute of Biophysics
University of Regensburg

Apr 6, 2006 :: Tübingen



Aim of my talk

• clustering: an elementary concept of machine learning
• review and illustrate a simple clustering algorithm
• extend it to more general metric spaces:
  • projective space
  • Grassmann manifolds
  • general submanifolds (via kernels)
• applications (very brief, work in progress):
  • ICA, NMF
  • approximate combinatorial convex optimization



Agenda

I. Introduction
II. Partitional Clustering
III. Projective Clustering
IV. Grassmann Clustering
V. Submanifold Clustering



I. Introduction
k-means illustration

[figure]



I. Introduction
Clustering

goal:
• given a multivariate data set A
• determine
  • a partition into groups (clusters)
  • representative cluster centers (centroids)

approaches:
• partitional clustering (here)
• hierarchical clustering

[figure: partitional vs. hierarchical clustering]



I. Introduction
Clustering

Clustering has many applications, for example feature extraction from immunological data sets.

[ Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003. ]

[figure: tree of K-means clusters over the immunological features (CB and ILD patient groups)]



I. Introduction
Example: read digits!

task: differentiate between [two digit images]
use unsupervised learning







I. Introduction
dimension reduction: via PCA to only 2 dimensions
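As a minimal sketch of this step (the data matrix `X` is a hypothetical stand-in for the digit images; scikit-learn's PCA is assumed available):

```python
# Minimal sketch of the PCA reduction to 2 dimensions. `X` is a hypothetical
# stand-in for the digit images, one flattened image per row.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # placeholder for flattened digits

pca = PCA(n_components=2)                 # keep only 2 principal components
Y = pca.fit_transform(X)                  # (200, 2) projected samples
print(Y.shape, pca.explained_variance_ratio_)
```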



[figure: digit data after PCA projection to 2 dimensions]





I. Introduction
k-means

• samples
• distance measure
• algorithm:
  • fix the number of clusters k
  • initialize centroids randomly
  • update rule: batch or sequential

[figure: samples and centroids]


[animation, same bullets as above: batch k-means alternates a partition step and an assignment step; sequential k-means repeatedly picks an arbitrary sample, finds the nearest centroid, and updates it]
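A hedged sketch of the batch variant in the Euclidean setting (NumPy only; names and defaults are illustrative, not from the slides):

```python
# Batch k-means sketch: alternate assignment (nearest centroid) and update
# (cluster mean). X is (T, n) data; k is the fixed number of clusters.
import numpy as np

def batch_kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                      # assignment step
        newC = np.array([X[labels == i].mean(0) if np.any(labels == i)
                         else C[i] for i in range(k)])
        if np.allclose(newC, C):                       # converged
            break
        C = newC                                       # update step
    return C, labels
```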


I. Introduction
"digital" batch k-means

[figure sequence: k-means on the PCA-projected digit data after 1, 2, ..., 7 iterations; done after 7 iterations, error: 4.5%]


I. Introduction
hierarchical clustering

hierarchical clustering also works:

[figures: dendrogram over the 30 samples and the resulting partition of the digit data; error: 4.2%]



II. Partitional Clustering
divide and conquer



II. Partitional Clustering

if A = {a1, ..., aT}, this is a constrained non-linear optimization problem:

minimize f(C, W) = Σ_{t=1..T} Σ_{i=1..k} w_it d(c_i, a_t)²
subject to w_it ∈ {0, 1} and Σ_{i=1..k} w_it = 1 for all t,

with centroid locations C := {c1, ..., ck} and partition matrix W := (w_it)



II. Partitional Clustering

minimize this!

common approach: partial optimization for W and C, i.e. alternating minimization over W or C while keeping the other one fixed



II. Partitional Clustering
batch k-means

minimize this!

• cluster assignment: for each a_t, determine an index i(t) such that c_i(t) is a closest centroid, i.e. d(c_i(t), a_t) ≤ d(c_j, a_t) for all j
• cluster update: within each cluster, determine the centroid c_i by minimizing the sum of squared distances to the cluster's samples (this step may be difficult for general distances)


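A hedged sketch of these two alternating steps over an arbitrary metric space, with the (possibly difficult) centroid update left as a pluggable minimizer:

```python
# Generic k-means over a metric space. `dist(c, a)` is the metric d and
# `centroid(cluster)` solves the update step argmin_c sum_a d(c, a)^2,
# which may be hard for general d.
def metric_kmeans(samples, init_centroids, dist, centroid, iters=50):
    C = list(init_centroids)
    for _ in range(iters):
        clusters = [[] for _ in C]
        for a in samples:              # assignment: i(t) = argmin_i d(c_i, a_t)
            i = min(range(len(C)), key=lambda j: dist(C[j], a))
            clusters[i].append(a)
        # update: replace each centroid, keeping the old one for empty clusters
        C = [centroid(B) if B else c for B, c in zip(clusters, C)]
    return C, clusters
```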



II. Partitional Clustering
Euclidean case

special case: A ⊂ R^n and Euclidean distance

the centroid is given by the cluster mean, because the mean minimizes the within-cluster sum of squared Euclidean distances
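The one-line check behind the "because" (standard, not spelled out on the slide):

```latex
% The cluster mean minimizes the within-cluster sum of squared distances:
\nabla_c \sum_{a \in B_i} \|c - a\|^2 \;=\; 2 \sum_{a \in B_i} (c - a) \;=\; 0
\quad\Longrightarrow\quad
c \;=\; \frac{1}{|B_i|} \sum_{a \in B_i} a .
```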


II. Partitional Clustering

[figure: k-plane clustering in three dimensions (300 Gaussian samples, 3 hyperplanes)]

[Georgiev, Theis, Cichocki. IEEE Trans. Neural Networks, 16(4):992-996, 2005]



III. Projective Clustering
line-groups
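The slides give no formulas for this section, so here is a hedged sketch of projective k-means as it is commonly set up: samples are unit vectors identified with lines through the origin (points of RP^{n−1}), the distance ignores sign via d([v],[w])² = 1 − ⟨v, w⟩², and the centroid of a cluster is the dominant eigenvector of its correlation matrix:

```python
# Projective k-means sketch: cluster lines through the origin. Assumes the
# standard sign-invariant distance; centroid = top eigenvector of sum(a a^T).
import numpy as np

def projective_kmeans(A, k, iters=50, seed=0):
    A = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit representatives
    rng = np.random.default_rng(seed)
    C = A[rng.choice(len(A), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmax((A @ C.T) ** 2, axis=1)     # min d <=> max <a,c>^2
        for i in range(k):
            B = A[labels == i]
            if len(B):
                _, V = np.linalg.eigh(B.T @ B)         # cluster correlation EVD
                C[i] = V[:, -1]                        # dominant eigenvector
    return C, labels
```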


III. Projective Clustering

[figure: projective clustering in three dimensions (k=4 clusters)]



III. Projective Clustering
Applications & Extensions

• robustness of ICA
  • similar to [Meinecke et al., 2003]
  • determine the "best" directions in the data set
• hyperplane clustering
  • instead of line vectors, cluster normal vectors
• projective k-median clustering
  • the median is statistically more robust!

[diagram: data set → bootstrap → different sample subsets → estimated mixing matrices A → projective clustering → best mixing directions]



IV. Grassmann Clustering
subspaces instead of lines



IV. Grassmann Clustering
Grassmann Clustering

• goal: clustering in G_{n,p}, the manifold of p-dimensional subspaces of R^n
• for this we need a metric
• different notions are possible
• one may be derived from the geodesic metric induced by the natural Riemannian structure of G_{n,p} [Edelman et al. 1999]
• here: the computationally simple projection F-norm, d([V], [W]) := 2^{−1/2} ||V Vᵀ − W Wᵀ||_F, with ||·||_F the Frobenius norm and V, W orthonormal bases
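A direct transcription of this distance, with orthonormal bases as (n, p) matrices:

```python
# Projection F-norm distance on the Grassmannian G_{n,p}. V and W are
# (n, p) matrices with orthonormal columns representing [V] and [W].
import numpy as np

def grassmann_dist(V, W):
    P, Q = V @ V.T, W @ W.T                  # orthogonal projectors
    return np.linalg.norm(P - Q, "fro") / np.sqrt(2)
```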



IV. Grassmann Clustering
generalizes well

remark:
• G_{n,1} = RP^{n−1}
• and the metrics coincide (not entirely obvious)



IV. Grassmann Clustering
Centroid Calculation

main result:

theorem: the centroid [C_i] of a cluster B_i is spanned by the p eigenvectors corresponding to the smallest eigenvalues of the generalized cluster correlation
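The defining equation of the generalized cluster correlation was a displayed formula that did not survive extraction. One reading consistent with the projection F-norm above is R_i = Σ_{[V]∈B_i} (I_n − V Vᵀ): minimizing Σ_j d([C],[V_j])² is equivalent to maximizing Σ_j tr(Cᵀ V_j V_jᵀ C), i.e. to taking the p smallest-eigenvalue eigenvectors of R_i. A sketch under that assumption:

```python
# Centroid update for Grassmann k-means under the projection F-norm.
# Assumes the generalized cluster correlation R = sum_j (I - V_j V_j^T);
# the centroid is spanned by the p smallest-eigenvalue eigenvectors of R.
import numpy as np

def grassmann_centroid(Vs):
    n, p = Vs[0].shape
    R = sum(np.eye(n) - V @ V.T for V in Vs)
    _, U = np.linalg.eigh(R)      # eigenvalues in ascending order
    return U[:, :p]               # orthonormal basis of the centroid
```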


IV. Grassmann Clustering
Proof (optima, example)

[figure: optima on the eigenvalue simplex for n=4, regions marked for p=1, p=2, p=3; labeled points (1,0,0,0), (0,1,0,0), (0,0,1,0), (.5,0,.5,0), (0,.5,.5,0), (.3,.3,0,.3)]



IV. Grassmann Clustering
Proof (ctd.)

• the corners are in our case [equation lost in extraction]
• so [equation lost]
• i.e. [equation lost] as claimed.


IV. Grassmann Clustering
Indeterminacies

indeterminacies occur if eigenspaces are higher-dimensional or the eigenvalue zero is present

two-sample case, e.g. V_i = e_i:

[figure: the two coordinate lines e1, e2 and a point x = (x1, x2), with d0(e1, x) = cos(x1)² and d0(e2, x) = cos(x2)²; where's the centroid?]



IV. Grassmann Clustering
Examples

• toy example:
  • 10^4 samples of G_{4,2}
  • drawn from k=6 coordinate hyperplanes
  • add 10 dB noise
• k-subspace clustering (see the sketch below):
  • convergence after 6 iterations
  • visualize clusters by index

[figure: cluster index assignments for the 6 recovered clusters]
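A hedged sketch of the full loop for this kind of experiment, reusing grassmann_dist and grassmann_centroid from above (parameters mirror the toy setup; the noise model is omitted):

```python
# Grassmann k-means loop: samples are (n, p) orthonormal bases, e.g. the
# n=4, p=2, k=6 toy setting of the slide. Reuses grassmann_dist and
# grassmann_centroid from the earlier sketches.
import numpy as np

def grassmann_kmeans(samples, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    C = [samples[i] for i in rng.choice(len(samples), size=k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([grassmann_dist(V, c) for c in C])
                           for V in samples])
        C = [grassmann_centroid([V for V, l in zip(samples, labels) if l == i])
             if np.any(labels == i) else C[i] for i in range(k)]
    return C, labels
```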



IV. Grassmann Clustering
Polytope identification

• solve an approximation problem (from computational geometry): given a set of points, identify the smallest enclosing convex polytope with a fixed number of faces k
• algorithm (hull step sketched below):
  • compute the convex hull (QHull)
  • apply subspace k-means to the faces
  • note: an affine version is necessary
  • include sample weighting by volume
  • possibly intersect the resulting clusters
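A sketch of the hull step using SciPy's QHull wrapper; the subsequent affine subspace k-means over the facet hyperplanes and the volume weighting are not shown:

```python
# Convex-hull step for polytope identification. SciPy's ConvexHull wraps
# QHull; each row of hull.equations is (normal, offset) of a facet plane
# n.x + b = 0 -- the affine hyperplanes one would then cluster.
import numpy as np
from scipy.spatial import ConvexHull

def hull_facets(points):
    hull = ConvexHull(points)
    normals = hull.equations[:, :-1]    # outward facet normals
    offsets = hull.equations[:, -1]     # facet plane offsets
    return normals, offsets, hull.simplices
```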



IV. Grassmann Clustering
Application to NMF

• Nonnegative Matrix Factorization: observing X = AS + N, all with nonnegative coefficients, find the unknown A and S
• key idea:
  • X lies in the conic hull whose edges are given by the columns of A
  • projection onto the standard simplex yields: X′ lies in the convex hull whose vertices are given by A
• example: n=3, 100 uniform samples
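A hedged reading of the simplex projection (the slide does not spell it out): rescale each nonnegative column to unit coordinate sum, which maps the conic hull of the columns of A onto the convex hull of their rescaled versions:

```python
# Simplex projection as column rescaling: x -> x / sum(x) for nonnegative
# columns of X; degenerate all-zero columns are guarded by eps.
import numpy as np

def simplex_project(X, eps=1e-12):
    return X / np.maximum(X.sum(axis=0, keepdims=True), eps)
```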



IV. Grassmann Clustering

[figure: data points in simplex projection (100 samples)]


[figure: extracted borders (100 samples)]


[figure: cluster centers (100 samples)]



IV. Grassmann Clustering
Comparison

• why? because NMF has a high indeterminacy
• a minimality criterion seems to be a good idea
• related:
  • [Cutler, Archetypal analysis, 1994]
  • [Theis, How to make NMF unique, in prep.]

[figure: cross-error of NMF (mean square error) vs. Grassmann clustering as a function of the number of samples, 10^1 to 10^5]



IV. Grassmann Clustering
Other applications

• Non-Gaussian Component Analysis
  • [Blanchard et al., 2005]
  • identify the non-noise subspace via k=2 clustering
• inter-subject biomedical data analysis
  • identify common subspaces in multivariate data sets
  • allows for inter-patient variability while keeping vector-space selectivity
• robustness of subspace ICA
  • extension of the ICA case



V. Submanifold Clustering
let's bend 'em



V. Submanifold Clustering
Nonlinear generalization

warning: very preliminary

• goal: nonlinear clustering in the data space
• common approach: hope it's linear in a higher-dimensional feature space, so embed the data via a map Φ
• describe Φ only by dot products [Schölkopf & Smola 2001]
• embed the data set and cluster linear subspaces in the feature space
• here: the result resembles an extension of kernel PCA



V. Submanifold Clustering
Centroid Calculation

nonlinear generalized cluster correlation: [equation lost in extraction]

define the kernel matrix [equation lost], then just solve its EVD
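The displayed formulas are lost, so the following only sketches the kernel step in the usual kernel-PCA style (an assumption, not necessarily the slide's exact construction): collect pairwise feature-space dot products and eigendecompose:

```python
# Kernel-matrix step, kernel-PCA style (assumed construction). K holds the
# feature-space dot products k(a_i, a_j) = <Phi(a_i), Phi(a_j)> for one
# cluster; its EVD yields expansion coefficients for the centroid subspace.
import numpy as np

def kernel_matrix(A, kern):
    return np.array([[kern(a, b) for b in A] for a in A])

poly2 = lambda x, y: float(x @ y) ** 2      # degree-2 polynomial kernel
# usage (hypothetical cluster data `B`):
#   K = kernel_matrix(B, poly2); evals, alphas = np.linalg.eigh(K)
```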



V. Submanifold Clustering
Distance Calculation

• necessary for the partitioning step of subspace k-means
• distance: [equation lost in extraction]
• here (from the previous step): [equation lost]
• so: [equation lost]


V. Submanifold Clustering
Example

• use the simple polynomial kernel k(x,y) = (x², y², 2^(-1/2) xy)
• draw 100 hyperplane samples from the ellipses as shown
• 3 samples per hyperplane, because we are in 4d after the affine embedding
• convergence after 2 iterations
• mismatch: 7 samples

[figure: data space]


[figure: the same example in feature space, coordinates (1,3)]


V. Submanifold Clustering

[figure: learnt cluster indices vs. true cluster indices over the 100 samples]



Conclusion
done, but lots of todos



Conclusions

• showed:
  • an extension of k-means to Grassmann manifolds
  • first results of a nonlinear generalization via kernels
• open:
  • study applications in detail; new apps?
  • the preimage problem in submanifold k-means
  • diff. subspace clustering/classification
• acknowledgements:
  • Peter Gruber & Harold Gutch
  • Motoaki Kawanabe and...


