Mathematics in Independent Component Analysis

246 Chapter 18. Proc. EUSIPCO 2006

GRASSMANN CLUSTERING

Peter Gruber and Fabian J. Theis
Institute of Biophysics, University of Regensburg
93040 Regensburg, Germany
phone: +49 941 943 2924, fax: +49 941 943 2479
email: fabian@theis.name, web: http://fabian.theis.name

ABSTRACT

An important tool in high-dimensional, explorative data mining is given by clustering methods. They aim at identifying samples or regions of similar characteristics, and often code them by a single codebook vector or centroid. One of the most commonly used partitional clustering techniques is the k-means algorithm, which in its batch form partitions the data set into k disjoint clusters by simply iterating between cluster assignments and cluster updates. The latter step implies calculating a new centroid within each cluster. We generalize the concept of k-means by applying it not to the standard Euclidean space but to the manifold of vector subspaces of a fixed dimension, also known as the Grassmann manifold. Important examples include projective space, i.e. the manifold of lines, and the space of all hyperplanes. Detecting clusters in multiple samples drawn from a Grassmannian is a problem arising in various applications. In this manuscript, we provide corresponding metrics for a Grassmann k-means algorithm, and solve the centroid calculation problem explicitly in closed form. An application to nonnegative matrix factorization illustrates the feasibility of the proposed algorithm.

1. PARTITIONAL CLUSTERING

Many algorithms for clustering, i.e. the detection of common features within a data set, are discussed in the literature. In the following, we will study clustering within the framework of k-means [2].

In general, its goal can be described as follows: Given a set A of points in some metric space (M, d), find a partition of A into disjoint non-empty subsets B_i with \bigcup_i B_i = A, together with centroids c_i \in M, so as to minimize the sum of the squares of the distances of each point of A to the centroid c_i of the cluster B_i containing it. In other words, minimize

E(B_1, c_1, \ldots, B_k, c_k) := \sum_{i=1}^{k} \sum_{a \in B_i} d(a, c_i)^2.   (1)

If the set A contains only finitely many elements a_1, \ldots, a_T, then this can easily be re-formulated as a constrained non-linear optimization problem: minimize

E(W, C) := \sum_{i=1}^{k} \sum_{t=1}^{T} w_{it} \, d(a_t, c_i)^2   (2)

subject to

w_{it} \in \{0, 1\}, \quad \sum_{i=1}^{k} w_{it} = 1 \quad \text{for } 1 \le i \le k, \; 1 \le t \le T.   (3)

Here C := {c_1, \ldots, c_k} are the centroid locations, and W := (w_{it}) is the partition matrix corresponding to the partition B_i of A.
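For concreteness, the following minimal sketch (our illustration, not part of the original paper; the function name kmeans_cost and the (k, T) layout of W are assumptions) evaluates the cost (2) for Euclidean data:

```python
import numpy as np

def kmeans_cost(A, W, C):
    """Evaluate E(W, C) = sum_i sum_t w_it * d(a_t, c_i)^2, cf. equation (2).

    A: (T, n) array of samples a_1, ..., a_T
    W: (k, T) binary partition matrix, each column summing to 1, cf. (3)
    C: (k, n) array of centroid locations c_1, ..., c_k
    """
    # squared Euclidean distances d(a_t, c_i)^2 for all pairs (i, t)
    D2 = ((A[None, :, :] - C[:, None, :]) ** 2).sum(axis=2)  # shape (k, T)
    return (W * D2).sum()
```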

A common approach to minimizing (2) subject to (3) is partial optimization for W and C, i.e. alternating minimization over either W or C while keeping the other one fixed. The batch k-means algorithm employs precisely this strategy: after an initial, random choice of centroids c_1, \ldots, c_k, it iterates between the following two steps until convergence, as measured by a suitable stopping criterion (a minimal code sketch follows the two steps):

• cluster assignment: for each sample a_t, determine an index i(t) such that

  i(t) = \operatorname{argmin}_i \, d(a_t, c_i)   (4)

• cluster update: within each cluster B_i := \{a_t \mid i(t) = i\}, determine the centroid c_i by minimizing

  c_i := \operatorname{argmin}_c \sum_{a \in B_i} d(a, c)^2   (5)
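The two steps translate directly into code. Here is a minimal sketch for the Euclidean case (our own example, assuming numpy; the name batch_kmeans and the convergence test on centroid movement are one possible "suitable stopping criterion", not the paper's prescription):

```python
import numpy as np

def batch_kmeans(A, k, n_iter=100, tol=1e-9, seed=0):
    """Batch k-means: alternate cluster assignment (4) and cluster update (5)."""
    rng = np.random.default_rng(seed)
    # initial, random choice of centroids c_1, ..., c_k (sampled from the data)
    C = A[rng.choice(len(A), size=k, replace=False)]
    for _ in range(n_iter):
        # cluster assignment: i(t) = argmin_i d(a_t, c_i), equation (4)
        D2 = ((A[None, :, :] - C[:, None, :]) ** 2).sum(axis=2)  # shape (k, T)
        labels = D2.argmin(axis=0)
        # cluster update, equation (5): in Euclidean space the minimizer is
        # the cluster mean (empty clusters simply keep their old centroid)
        C_new = np.array([A[labels == i].mean(axis=0) if np.any(labels == i)
                          else C[i] for i in range(k)])
        # stopping criterion: centroids have (almost) stopped moving
        if np.linalg.norm(C_new - C) < tol:
            break
        C = C_new
    return labels, C
```

Note that only the update step uses the Euclidean structure here; the assignment step (4) works verbatim in any metric space.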

The cluster assignment step corresponds to minimizing (2) for fixed C, which means choosing the partition W such that each element of A is assigned to the i-th cluster exactly when c_i is the closest centroid. In the cluster update step, (2) is minimized for a fixed partition W, implying that c_i is constructed as the centroid within the i-th cluster; this indeed corresponds to minimizing E(W, C) for fixed W because in this case the cost function is a sum of functions depending on different parameters, so we can minimize them separately, leading to the centroid equation (5). This general update rule converges to a local minimum under rather weak conditions [3, 7].

An important special case is given by M := \mathbb{R}^n and the Euclidean distance d(x, y) := \|x - y\|. The centroids from equation (5) can then be calculated in closed form, and each centroid is simply given by the cluster mean c_i := (1/|B_i|) \sum_{a \in B_i} a; this follows directly from

\sum_{a \in B_i} \|a - c_i\|^2 = \sum_{a \in B_i} \sum_{j=1}^{n} (a_j - c_{ij})^2 = \sum_{j=1}^{n} \sum_{a \in B_i} (a_j^2 - 2 a_j c_{ij} + c_{ij}^2),

which can be minimized separately for each coordinate j and is minimal with respect to c_{ij} when the derivative of the quadratic function vanishes, i.e. if |B_i| c_{ij} = \sum_{a \in B_i} a_j.
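As a quick, purely illustrative sanity check of this closed-form solution (our own hypothetical example, not from the paper), one can verify numerically that the cluster mean beats randomly perturbed candidate centroids:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(50, 3))     # a cluster B_i of 50 points in R^3
c_mean = B.mean(axis=0)          # closed-form centroid (1/|B_i|) sum_a a

def sq_dist_sum(c):
    """Sum of squared Euclidean distances from all points of B to c."""
    return ((B - c) ** 2).sum()

# any perturbation of the mean increases the cost
for _ in range(5):
    c_other = c_mean + rng.normal(scale=0.1, size=3)
    assert sq_dist_sum(c_mean) <= sq_dist_sum(c_other)
```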

In the following, we are interested in more complex metric spaces. Typically, k-means can be implemented efficiently if the cluster centroids can be calculated quickly. In the example of \mathbb{R}^n, we saw that it was crucial to minimize the squared distances and to use the Euclidean distance. Hence we will study metrics which also allow a closed-form centroid solution.
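To see why a closed-form centroid matters, consider a generic k-means skeleton in which the metric d and the centroid rule are pluggable (a sketch under our own naming; generic_kmeans, dist, and centroid are illustrative assumptions, not the paper's API). The per-iteration cost is dominated by the centroid calls, so the algorithm is practical exactly when that minimizer, like the Euclidean mean above, is available in closed form:

```python
import numpy as np

def generic_kmeans(A, k, dist, centroid, n_iter=100, seed=0):
    """k-means over an arbitrary metric space (M, d).

    dist(a, c)    -- the metric d on M
    centroid(pts) -- closed-form minimizer of sum_a d(a, c)^2 over c in M
    """
    rng = np.random.default_rng(seed)
    C = [A[i] for i in rng.choice(len(A), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step (4): nearest centroid under d
        labels = [int(np.argmin([dist(a, c) for c in C])) for a in A]
        # update step (5): closed-form centroid of each non-empty cluster
        for i in range(k):
            members = [a for a, lab in zip(A, labels) if lab == i]
            if members:
                C[i] = centroid(members)
    return labels, C

# Euclidean instance: d(x, y) = ||x - y||, centroid = cluster mean
# labels, C = generic_kmeans(A, 3,
#                            dist=lambda a, c: np.linalg.norm(a - c),
#                            centroid=lambda pts: np.mean(pts, axis=0))
```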
