Grassmann Clustering
(logo: mixing model s(t) → A → x(t), with noise n(t))<br />
<strong>Grassmann</strong><br />
<strong>Clustering</strong><br />
Fabian Theis<br />
Institute of Biophysics<br />
University of Regensburg
Aim of my talk<br />
• clustering - an elementary concept of machine learning<br />
• review and illustrate a simple clustering algorithm<br />
• extend it to more general metric spaces<br />
• projective space<br />
• <strong>Grassmann</strong> manifolds<br />
• general submanifolds (via kernels)<br />
• applications (very brief - work in progress)<br />
• ICA, NMF<br />
• approximate combinatorial convex optimization<br />
Apr 6, 2006 :: Tübingen
I. Introduction<br />
Agenda<br />
II. Partitional <strong>Clustering</strong><br />
III. Projective <strong>Clustering</strong><br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
V. Submanifold <strong>Clustering</strong><br />
I. Introduction<br />
k-means illustration<br />
I. Introduction<br />
<strong>Clustering</strong><br />
goal:<br />
• given a multivariate data set A<br />
• determine<br />
• partition into groups (clusters)<br />
• representative cluster centers (centroids)<br />
approaches:<br />
• partitional clustering (here)<br />
• hierarchical clustering<br />
(figure: partitional vs. hierarchical clustering)<br />
I. Introduction<br />
<strong>Clustering</strong><br />
has many applications, for example<br />
feature extraction of immunological data sets<br />
[ Theis, Hartl, Krauss-Etschmann, Lang. Neural network signal analysis in immunology. Proc. ISSPA 2003. ]<br />
(figure: hierarchical tree of the K-means clusters of the immunological data, with distances d at each split)<br />
I. Introduction<br />
Example: read digits!<br />
task: differentiate between the two digits shown<br />
use unsupervised learning<br />
I. Introduction<br />
dimension reduction: via PCA to only 2 dimensions<br />
(figure: the digit data after PCA projection to 2 dimensions)<br />
I. Introduction<br />
k-means<br />
• samples<br />
• distance measure<br />
• algorithm:<br />
• fix the number of clusters k<br />
• initialize centroids randomly<br />
• update rule: batch or sequential<br />
(figure: samples and centroids)<br />
(animation builds; batch k-means: partitioning, assignment, update; sequential k-means: pick an arbitrary sample, find the nearest centroid, update it)<br />
I. Introduction<br />
"digital" batch k-means<br />
(figure: k-means on the digit data, shown after 1-7 iterations)<br />
done after 7 iterations<br />
error: 4.5%<br />
hierarchical clustering<br />
also works:<br />
(figure: dendrogram of the 30 samples and the resulting clusters)<br />
error: 4.2%<br />
II. Partitional <strong>Clustering</strong><br />
divide and conquer<br />
II. Partitional <strong>Clustering</strong><br />
if A = {a1, ..., aT}, we get a constrained non-linear optimization problem:<br />
minimize $f(C, W) = \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it} \, d(c_i, a_t)^2$<br />
subject to $w_{it} \in \{0, 1\}$ and $\sum_{i=1}^{k} w_{it} = 1$ for all $t$<br />
with centroid locations C := {c1, . . . , ck }<br />
and partition matrix W := (wit)<br />
II. Partitional <strong>Clustering</strong><br />
minimize this!<br />
common approach: partial optimization for W and C,<br />
i.e. alternating minimization of W and C while<br />
keeping the other one fixed<br />
II. Partitional <strong>Clustering</strong><br />
minimize this!<br />
batch k-means<br />
cluster assignment: for each at,<br />
determine an index i(t) such that $d(c_{i(t)}, a_t) \le d(c_j, a_t)$ for all $j$<br />
cluster update: within each cluster i,<br />
determine the centroid ci by minimizing $\sum_{t:\, i(t)=i} d(c_i, a_t)^2$<br />
note: for a general distance measure, the centroid update step may be difficult<br />
II. Partitional <strong>Clustering</strong><br />
Euclidean case<br />
special case: $A \subset \mathbb{R}^n$ with Euclidean distance $d(x, y) = \|x - y\|$<br />
the centroid is given by the cluster mean $c_i = \frac{1}{|\{t : i(t)=i\}|} \sum_{t:\, i(t)=i} a_t$<br />
because the mean minimizes the sum of squared Euclidean distances<br />
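The batch update rules above, in the Euclidean special case, can be sketched as follows (a minimal NumPy sketch, not the original implementation; the function name and convergence check are mine):

```python
import numpy as np

def batch_kmeans(A, k, max_iter=100, seed=0):
    """Batch k-means: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # initialize centroids randomly from the samples
    C = A[rng.choice(len(A), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # cluster assignment: for each sample a_t pick the nearest centroid
        d = np.linalg.norm(A[:, None, :] - C[None, :, :], axis=2)
        new = d.argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break  # partition unchanged -> converged
        labels = new
        # cluster update: with Euclidean distance the centroid is the mean
        for i in range(k):
            if np.any(labels == i):
                C[i] = A[labels == i].mean(axis=0)
    return C, labels
```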
II. Partitional <strong>Clustering</strong><br />
(figure: k-plane clustering in three dimensions; 300 Gaussian samples, 3 hyperplanes)<br />
[Georgiev, Theis, Cichocki. IEEE Trans. Neural Networks, 16(4):992-996, 2005]<br />
III. Projective <strong>Clustering</strong><br />
line-groups<br />
III. Projective <strong>Clustering</strong><br />
(figure: projective clustering in three dimensions, k=4 clusters)<br />
III. Projective <strong>Clustering</strong><br />
Applications & Extensions<br />
• robustness of ICA<br />
• similar to [Meinecke et al., 2003]<br />
• determine "best" directions in the data set<br />
• hyperplane clustering<br />
• cluster normal vectors instead of<br />
line vectors<br />
• projective k-median clustering<br />
• the median is statistically more robust!<br />
(diagram: bootstrap the data set into different sample subsets, estimate a mixing matrix A on each, and find the best mixing directions by projective clustering)<br />
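A hedged sketch of the projective case (lines through the origin), under the assumption that the squared distance between lines [x] and [y], represented by unit vectors, is 1 − (xᵀy)², so that sign flips are irrelevant; the centroid of a cluster of lines is then the dominant eigenvector of the cluster correlation Σₜ aₜaₜᵀ. The function names are mine:

```python
import numpy as np

def line_distance(x, y):
    """Distance between lines [x], [y] (unit vectors); the sign of x, y
    does not matter, as required on projective space."""
    return np.sqrt(max(0.0, 1.0 - float(x @ y) ** 2))

def line_centroid(X):
    """Centroid of a cluster of lines (rows of X, unit vectors):
    dominant eigenvector of the cluster correlation sum_t a_t a_t^T."""
    R = X.T @ X  # sum of outer products a_t a_t^T
    w, V = np.linalg.eigh(R)  # eigenvalues in ascending order
    return V[:, -1]  # eigenvector of the largest eigenvalue
```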
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
subspaces instead of lines<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
<strong>Grassmann</strong> <strong>Clustering</strong><br />
• goal: clustering in the <strong>Grassmann</strong> manifold Gn,p of p-dimensional linear subspaces of R^n<br />
• for this we need a metric on Gn,p<br />
• different notions are possible<br />
• they may be derived from the geodesic metric induced<br />
by the natural Riemannian structure of Gn,p<br />
[Edelman et al. 1999]<br />
• here: the computationally simple projection F-norm<br />
$d([V], [W]) := 2^{-1/2} \|V V^\top - W W^\top\|_F$ for orthonormal bases V, W,<br />
with Frobenius norm $\|C\|_F = \sqrt{\operatorname{tr}(C^\top C)}$<br />
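The projection F-norm compares orthogonal projectors, so it is independent of the chosen orthonormal basis; a minimal sketch (function name mine):

```python
import numpy as np

def proj_fnorm_distance(V, W):
    """Projection F-norm distance between subspaces [V], [W] in G_{n,p};
    V, W are n x p matrices with orthonormal columns.  Comparing the
    projectors VV^T, WW^T makes the result basis-independent; the value
    equals the 2-norm of the sines of the principal angles."""
    P, Q = V @ V.T, W @ W.T  # orthogonal projectors onto the subspaces
    return np.linalg.norm(P - Q, "fro") / np.sqrt(2.0)
```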
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
generalizes well<br />
remark:<br />
• Gn,1 = RP^{n-1}, so projective clustering is the special case p=1<br />
• and the metrics coincide (not entirely obvious)<br />
main result:<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Centroid Calculation<br />
theorem: the centroid [Ci] of a cluster Bi is spanned<br />
by the p eigenvectors corresponding to the smallest<br />
eigenvalues of the generalized cluster correlation $\sum_{t \in B_i} (I_n - V_t V_t^\top)$<br />
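The centroid theorem can be sketched directly: with the projection F-norm, summing the squared distances gives tr(Cᵀ R C) up to a constant, with R = Σₜ (I − VₜVₜᵀ), so the minimizer is spanned by the p smallest eigenvectors of R (the form of R is reconstructed from this derivation, since the slide's formula image is lost; function name mine):

```python
import numpy as np

def grassmann_centroid(Vs):
    """Centroid of subspaces [V_1], ..., [V_T] in G_{n,p} w.r.t. the
    projection F-norm; each V_t is n x p with orthonormal columns.
    Centroid = span of the p eigenvectors of the generalized cluster
    correlation R = sum_t (I - V_t V_t^T) with smallest eigenvalues."""
    n, p = Vs[0].shape
    R = sum(np.eye(n) - V @ V.T for V in Vs)
    w, U = np.linalg.eigh(R)  # eigenvalues in ascending order
    return U[:, :p]  # orthonormal basis of the centroid subspace
```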
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Proof (optima, example)<br />
(figure: example polytope for n=4 with the points (1,0,0,0), (0,1,0,0), (0,0,1,0), (.5,0,.5,0), (0,.5,.5,0), (.3,.3,0,.3) marked; builds for p=1, p=2, p=3)<br />
• corners are in our case<br />
• so<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Proof (ctd.)<br />
• i.e. as claimed.<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Indeterminacies<br />
occur if eigenspaces are higher-dimensional or<br />
the eigenvalue zero is present<br />
two-sample case, e.g. Vi = ei:<br />
d0(e1, x) = cos(x1)^2 and d0(e2, x) = cos(x2)^2 for x = (x1, x2)<br />
where's the centroid?<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Examples<br />
• toy example<br />
• 10^4 samples of G4,2<br />
• drawn from k=6<br />
coordinate hyperplanes<br />
• add 10 dB noise<br />
• k-subspace clustering<br />
• convergence after 6 iterations<br />
• visualize clusters by<br />
index<br />
(figure: recovered cluster indices of the 6 hyperplane clusters)<br />
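Putting assignment and update together, k-subspace clustering as in the toy example can be sketched roughly as follows (a sketch under assumptions: projection F-norm distance, generalized cluster correlation Σ(I − VVᵀ), and, unlike the random initialization on the slides, deterministic initialization with the first k samples for simplicity):

```python
import numpy as np

def grassmann_kmeans(Vs, k, max_iter=50):
    """k-subspace clustering on G_{n,p} with the projection F-norm.
    Vs: list of n x p matrices with orthonormal columns."""
    n, p = Vs[0].shape
    Cs = [Vs[i].copy() for i in range(k)]  # init with the first k samples
    labels = None
    for it in range(max_iter):
        # assignment: nearest centroid w.r.t. ||VV^T - CC^T||_F
        d = np.array([[np.linalg.norm(V @ V.T - C @ C.T, "fro") for C in Cs]
                      for V in Vs])
        new = d.argmin(axis=1)
        if labels is not None and np.array_equal(new, labels):
            break  # partition unchanged -> converged
        labels = new
        # update: p smallest eigenvectors of the (assumed) generalized
        # cluster correlation sum_t (I - V_t V_t^T)
        for i in range(k):
            members = [V for V, l in zip(Vs, labels) if l == i]
            if members:
                R = sum(np.eye(n) - V @ V.T for V in members)
                w, U = np.linalg.eigh(R)
                Cs[i] = U[:, :p]
    return Cs, labels
```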
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Polytope identification<br />
• solve an approximation problem (from computational geometry):<br />
• given a set of points, identify the smallest enclosing<br />
convex polytope with a fixed number of faces k<br />
• algorithm:<br />
• compute the convex hull (QHull)<br />
• apply subspace k-means to the faces<br />
• note: an affine version is necessary<br />
• include sample weighting by volume<br />
• possibly intersect the resulting clusters<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Application to NMF<br />
• Nonnegative Matrix Factorization<br />
• observing X = AS + N with all matrices having nonnegative<br />
coefficients, find the unknown A and S<br />
• key idea:<br />
• X lies in the conic hull with edges given by the columns of A<br />
• projection onto the standard simplex yields:<br />
• X' lies in the convex hull with vertices given by the columns of A<br />
• example:<br />
• n=3, 100 uniform samples<br />
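The key idea can be checked numerically: scaling each column to coordinate sum 1 maps the conic hull of A's columns to the convex hull of their projections, so every projected sample is a convex combination of the projected columns of A (the mixing matrix below is a toy assumption of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(3) + 0.1             # toy nonnegative mixing matrix (assumed)
S = rng.uniform(size=(3, 100))  # nonnegative sources
X = A @ S                       # samples lie in the conic hull of A's columns

def simplex_projection(Y):
    """Scale every column to coordinate sum 1 (projection onto the
    standard simplex): edges of the cone become hull vertices."""
    return Y / Y.sum(axis=0, keepdims=True)

Xp = simplex_projection(X)
Ap = simplex_projection(A)
# barycentric coordinates of each projected sample w.r.t. Ap's columns:
coeffs = np.linalg.solve(Ap, Xp)
```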
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
(figures: data points in simplex projection, extracted borders, and cluster centers for the 100 samples)<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Comparison<br />
• why? because NMF has a high indeterminacy<br />
• a minimality criterion seems to be a good idea<br />
• related:<br />
• [Cutler & Breiman, Archetypal analysis, 1994]<br />
• [Theis, how to make NMF unique, in prep.]<br />
(figure: cross-error vs. number of samples, 10^1 to 10^5, for NMF (mean square error) and <strong>Grassmann</strong> clustering)<br />
IV. <strong>Grassmann</strong> <strong>Clustering</strong><br />
Other applications<br />
• Non-Gaussian Component Analysis<br />
• [Blanchard et al., 2005]<br />
• identify the non-noise subspace via k=2 clustering<br />
• inter-subject biomedical data analysis<br />
• identify common subspaces in multivariate data<br />
sets<br />
• allows for inter-patient variability while keeping<br />
vector space selectivity<br />
• robustness of subspace ICA<br />
• extension of the ICA case<br />
V. Submanifold <strong>Clustering</strong><br />
let's bend 'em<br />
V. Submanifold <strong>Clustering</strong><br />
Nonlinear generalization<br />
warning: very preliminary<br />
• goal: nonlinear clustering in the data space<br />
• common approach: hope it's linear in a higher-dimensional<br />
feature space, so embed the data via a map Φ<br />
• describe Φ only by dot products [Schölkopf &<br />
Smola 2001]<br />
• data set<br />
• linear subspaces<br />
• here: the result resembles an extension of kernel PCA<br />
V. Submanifold <strong>Clustering</strong><br />
Centroid Calculation<br />
nonlinear generalized cluster correlation:<br />
define the kernel matrix<br />
just solve the EVD<br />
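The formulas on this slide are lost in extraction; in the spirit of the kernel-PCA remark above, a plausible sketch is that feature-space directions are expansions Σₜ αₜΦ(xₜ), so the EVD of the kernel matrix stands in for the EVD of the feature-space correlation. All names and the exact form below are my assumptions, not the talk's derivation:

```python
import numpy as np

def poly2_kernel(X, Y):
    """Homogeneous degree-2 polynomial kernel k(x, y) = (x . y)^2,
    the dot product of the feature map (x1^2, x2^2, sqrt(2) x1 x2)."""
    return (X @ Y.T) ** 2

def kernel_centroid_coefficients(X, p):
    """Sketch (assumed form): expansion coefficients alpha of p
    feature-space directions sum_t alpha_t Phi(x_t), obtained from the
    EVD of the kernel matrix K, as in kernel PCA."""
    K = poly2_kernel(X, X)
    w, V = np.linalg.eigh(K)  # eigenvalues in ascending order
    alphas = V[:, -p:]        # top-p directions in feature space
    # normalize so each feature-space direction has unit norm
    return alphas / np.sqrt(np.maximum(w[-p:], 1e-12))
```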
Distance Calculation<br />
• necessary for the partitioning step of subspace k-means<br />
• distance:<br />
• here (from previous step):<br />
• so<br />
V. Submanifold <strong>Clustering</strong><br />
V. Submanifold <strong>Clustering</strong><br />
Example<br />
(figures: data space and feature space, coordinates (1,3))<br />
• use the simple polynomial<br />
kernel k(x,y) = (x^2, y^2, 2^{-1/2} xy)<br />
• draw 100 hyperplane<br />
samples from ellipses as<br />
shown<br />
• 3 samples per hyperplane<br />
• because we are in 4d after<br />
the affine embedding<br />
• convergence after 2 iterations<br />
• mismatch: 7 samples<br />
V. Submanifold <strong>Clustering</strong><br />
(figure: learnt vs. true cluster indices for the 100 samples)<br />
Conclusion<br />
done - but lots of todos<br />
Conclusions<br />
• showed<br />
• extension of k-means to <strong>Grassmann</strong> manifold<br />
• first results of nonlinear generalization via kernels<br />
• open<br />
• study applications in detail - new apps?<br />
• preimage problem in submanifold k-means<br />
• diff. subspace clustering/classification<br />
• acknowledgements<br />
• Peter Gruber & Harold Gutch<br />
• Motoaki Kawanabe and...<br />