Mathematics in Independent Component Analysis
246 Chapter 18. Proc. EUSIPCO 2006

GRASSMANN CLUSTERING

Peter Gruber and Fabian J. Theis
Institute of Biophysics, University of Regensburg
93040 Regensburg, Germany
phone: +49 941 943 2924, fax: +49 941 943 2479
email: fabian@theis.name, web: http://fabian.theis.name

ABSTRACT

An important tool in high-dimensional, explorative data mining is given by clustering methods. They aim at identifying samples or regions of similar characteristics, and often code them by a single codebook vector or centroid. One of the most commonly used partitional clustering techniques is the k-means algorithm, which in its batch form partitions the data set into k disjoint clusters by simply iterating between cluster assignments and cluster updates. The latter step implies calculating a new centroid within each cluster. We generalize the concept of k-means by applying it not to the standard Euclidean space but to the manifold of subvectorspaces of a fixed dimension, also known as the Grassmann manifold. Important examples include projective space, i.e. the manifold of lines, and the space of all hyperplanes. Detecting clusters in multiple samples drawn from a Grassmannian is a problem arising in various applications. In this manuscript, we provide corresponding metrics for a Grassmann k-means algorithm, and solve the centroid calculation problem explicitly in closed form. An application to nonnegative matrix factorization illustrates the feasibility of the proposed algorithm.
1. PARTITIONAL CLUSTERING

Many algorithms for clustering, i.e. the detection of common features within a data set, are discussed in the literature. In the following, we will study clustering within the framework of k-means [2].
In general, its goal can be described as follows: given a set A of points in some metric space (M, d), find a partition of A into disjoint non-empty subsets B_i, \bigcup_i B_i = A, together with centroids c_i \in M, so as to minimize the sum of the squares of the distances of each point of A to the centroid c_i of the cluster B_i containing it. In other words, minimize
E(B_1, c_1, \dots, B_k, c_k) := \sum_{i=1}^{k} \sum_{a \in B_i} d(a, c_i)^2. (1)
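The objective (1) is straightforward to evaluate once a metric and a cluster assignment are fixed. As a minimal sketch (the function and variable names below are ours, not the paper's):

```python
import math

def kmeans_cost(points, centroids, assign, dist):
    """Evaluate the k-means objective (1): the sum of squared distances
    of each point to the centroid of the cluster it is assigned to."""
    return sum(dist(a, centroids[assign[t]]) ** 2 for t, a in enumerate(points))

# Euclidean example in the plane: two clusters
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
centroids = [(0.5, 0.0), (10.0, 0.0)]
assign = [0, 0, 1]  # point t belongs to cluster assign[t]
cost = kmeans_cost(points, centroids, assign, math.dist)  # 0.25 + 0.25 + 0 = 0.5
```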
If the set A contains only finitely many elements a_1, \dots, a_T, then this can easily be re-formulated as a constrained non-linear optimization problem: minimize

E(W, C) := \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it} \, d(a_t, c_i)^2 (2)

subject to

w_{it} \in \{0, 1\}, \quad \sum_{i=1}^{k} w_{it} = 1 \quad for 1 \le i \le k, 1 \le t \le T. (3)
Here C := \{c_1, \dots, c_k\} are the centroid locations, and W := (w_{it}) is the partition matrix corresponding to the partition B_i of A.
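Constraint (3) simply says that each column of W contains exactly one 1, i.e. each point belongs to exactly one cluster. A small sketch of this encoding (our notation):

```python
def partition_matrix(assign, k):
    """Build the partition matrix W = (w_it) of eqs. (2)-(3) from cluster
    assignments: w_it = 1 iff point a_t lies in cluster B_i."""
    return [[1 if assign[t] == i else 0 for t in range(len(assign))]
            for i in range(k)]

W = partition_matrix([0, 0, 1, 2], k=3)
# constraint (3): every entry is 0 or 1, and each column sums to 1,
# i.e. each point belongs to exactly one cluster
columns_ok = all(sum(W[i][t] for i in range(3)) == 1 for t in range(4))
```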
A common approach to minimizing (2) subject to (3) is partial optimization for W and C, i.e. alternating minimization of either W or C while keeping the other one fixed. The batch k-means algorithm employs precisely this strategy: after an initial, random choice of centroids c_1, \dots, c_k, it iterates between the following two steps until convergence, measured by a suitable stopping criterion:
• cluster assignment: for each a_t determine an index i(t) such that

i(t) = \arg\min_i d(a_t, c_i) (4)

• cluster update: within each cluster B_i := \{a_t \mid i(t) = i\} determine the centroid c_i by minimizing

c_i := \arg\min_c \sum_{a \in B_i} d(a, c)^2 (5)
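The two alternating steps can be sketched generically, with the metric d and the centroid computation passed in as parameters — the same structure the paper later instantiates on the Grassmann manifold. The names and the simple stopping criterion below are ours:

```python
import math
import random

def batch_kmeans(points, k, dist, centroid_fn, iters=100, seed=0):
    """Batch k-means on an arbitrary metric space: alternate the cluster
    assignment step (4) and the cluster update step (5) until the
    centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    assign = []
    for _ in range(iters):
        # assignment step (4): each point goes to its closest centroid
        assign = [min(range(k), key=lambda i: dist(a, centroids[i]))
                  for a in points]
        # update step (5): recompute the centroid within each cluster;
        # an empty cluster keeps its previous centroid
        new = []
        for i in range(k):
            cluster = [a for t, a in enumerate(points) if assign[t] == i]
            new.append(centroid_fn(cluster) if cluster else centroids[i])
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, assign

def euclidean_mean(cluster):
    """Closed-form centroid in R^n: the coordinate-wise cluster mean."""
    return tuple(sum(a[j] for a in cluster) / len(cluster)
                 for j in range(len(cluster[0])))

pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
cents, assign = batch_kmeans(pts, 2, math.dist, euclidean_mean)
# the two recovered centroids are the cluster means (1, 0) and (11, 0)
```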
The cluster assignment step corresponds to minimizing (2) for fixed C, which means choosing the partition W such that each element of A is assigned to the i-th cluster if c_i is the closest centroid. In the cluster update step, (2) is minimized for a fixed partition W, implying that c_i is constructed as the centroid within the i-th cluster; this indeed corresponds to minimizing E(W, C) for fixed W because in this case the cost function is a sum of functions depending on different parameters, so we can minimize them separately, leading to the centroid equation (5). This general update rule converges to a local minimum under rather weak conditions [3, 7].
An important special case is given by M := R^n and the Euclidean distance d(x, y) := \|x - y\|. The centroids from equation (5) can then be calculated in closed form, and each centroid is simply given by the cluster mean c_i := (1/|B_i|) \sum_{a \in B_i} a; this follows directly from

\sum_{a \in B_i} \|a - c_i\|^2 = \sum_{a \in B_i} \sum_{j=1}^{n} (a_j - c_{ij})^2 = \sum_{j=1}^{n} \sum_{a \in B_i} (a_j^2 - 2 a_j c_{ij} + c_{ij}^2),

which can be minimized separately for each coordinate j and is minimal with respect to c_{ij} if the derivative of the quadratic function is zero, i.e. if |B_i| c_{ij} = \sum_{a \in B_i} a_j.
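That the cluster mean minimizes the summed squared Euclidean distances is easy to check numerically (a small sanity check of the derivation, not from the paper):

```python
import math
import random

cluster = [(0.0, 1.0), (2.0, 3.0), (4.0, 2.0)]
n = len(cluster[0])

# closed-form centroid: the coordinate-wise mean, here (2, 2)
mean = tuple(sum(a[j] for a in cluster) / len(cluster) for j in range(n))

def cost(c):
    """Sum of squared Euclidean distances from the cluster points to c."""
    return sum(math.dist(a, c) ** 2 for a in cluster)

# no randomly perturbed candidate centroid does better than the mean
rng = random.Random(1)
for _ in range(1000):
    c = tuple(mean[j] + rng.uniform(-1.0, 1.0) for j in range(n))
    assert cost(mean) <= cost(c)
```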
In the following, we are interested in more complex metric spaces. Typically, k-means can be implemented efficiently if the cluster centroids can be calculated quickly. In the example of R^n, we saw that it was crucial to minimize the squared distances and to use the Euclidean distance. Hence we will study metrics which also allow a closed-form centroid solution.
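As a first taste of such a metric on a Grassmann manifold, the distance between two lines through the origin can be measured via their orthogonal projectors; this projection-type distance equals the sine of the angle between the lines, and is well defined on the line itself rather than on a chosen spanning vector. The sketch below is our illustration restricted to lines in R^2, not the paper's general construction:

```python
import math

def line_projector(u):
    """Orthogonal projector in R^2 onto the line spanned by the unit vector u.
    The projector depends only on the line, not on the choice of u or -u."""
    u1, u2 = u
    return [[u1 * u1, u1 * u2], [u1 * u2, u2 * u2]]

def proj_distance(u, v):
    """Distance between the lines spanned by u and v, computed as the scaled
    Frobenius norm of the difference of their projectors; for lines this
    equals sin(theta), where theta is the angle between them."""
    P, Q = line_projector(u), line_projector(v)
    frob_sq = sum((P[i][j] - Q[i][j]) ** 2 for i in range(2) for j in range(2))
    return math.sqrt(frob_sq / 2)

x_axis = (1.0, 0.0)
diagonal = (1 / math.sqrt(2), 1 / math.sqrt(2))
d = proj_distance(x_axis, diagonal)        # sin(45 degrees), about 0.7071
same = proj_distance(x_axis, (-1.0, 0.0))  # 0: u and -u span the same line
```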