
Mathematics in Independent Component Analysis



Chapter 1. Statistical machine learning of biomedical data

Other metrics may be defined on $G_{n,p}$, and they result in different Riemannian geometries on the manifold. Optimization on non-Euclidean geometries is non-trivial and has been studied for a long time; see, for example, Edelman et al. (1999) and references therein. In the context of ICA, for instance, Amari's seminal paper (Amari, 1998), which takes into account the geometry of the search space $\mathrm{Gl}(n)$, yielded a considerable increase in performance and accuracy. Learning in these matrix manifolds has been reviewed in Theis (2005b) and extended in Squartini and Theis (2006).

To apply batch k-means to $(G_{n,p}, d)$, we only had to solve the cluster assignment equation (1.17). It turned out that no elaborate optimization was necessary for this; instead, a closed-form solution requiring only an eigenvalue decomposition could be found. We state this in the following theorem, proved in Gruber and Theis (2006):

Theorem 1.5.3 (Grassmann centroids). The centroid $[C] \in G_{n,p}$ of a set of points $[V_1], \ldots, [V_l] \in G_{n,p}$ according to (1.17) is spanned by $p$ independent eigenvectors corresponding to the smallest eigenvalues of the generalized cluster correlation $\frac{1}{l}\sum_{i=1}^{l} V_i V_i^\top$.
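Batch k-means on the Grassmannian can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the implementation of Gruber and Theis (2006). Because this excerpt does not restate $d$ or (1.17), we assume the projection distance $d([V],[W]) = 2^{-1/2}\|VV^\top - WW^\top\|_F$ on orthonormal representatives; the farthest-point initialization and all function names are likewise our own. For this particular metric, $\sum_i d([V_i],[C])^2 = lp - l\,\mathrm{tr}(C^\top M C)$ with $M$ the generalized cluster correlation, so the minimizer is spanned by eigenvectors of the $p$ *largest* eigenvalues of $M$. The theorem as stated refers to the smallest; which end of the spectrum applies depends on the exact form of $d$ and (1.17), so check it before reusing the sketch.

```python
import numpy as np

def orthonormal_basis(V):
    """Orthonormal basis (n x p) for the column span of V, i.e. a representative of [V]."""
    Q, _ = np.linalg.qr(V)
    return Q

def projection_distance(V, W):
    """Assumed metric: d([V],[W]) = 2^{-1/2} * ||V V^T - W W^T||_F (V, W orthonormal)."""
    return np.linalg.norm(V @ V.T - W @ W.T) / np.sqrt(2.0)

def grassmann_centroid(Vs, p):
    """Closed-form centroid from the generalized cluster correlation M = (1/l) sum_i V_i V_i^T.

    Under the assumed projection distance, sum_i d([V_i],[C])^2 = l*p - l*tr(C^T M C),
    which is minimized by the eigenvectors of the p largest eigenvalues of M.
    """
    M = sum(V @ V.T for V in Vs) / len(Vs)
    _, eigvecs = np.linalg.eigh(M)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, -p:]           # columns for the p largest eigenvalues

def grassmann_kmeans(Vs, k, n_iter=100):
    """Batch k-means on G_{n,p}: nearest-centroid assignment, then closed-form update."""
    p = Vs[0].shape[1]
    # Deterministic farthest-point initialization (an illustrative choice).
    centroids = [Vs[0]]
    while len(centroids) < k:
        gaps = [min(projection_distance(V, C) for C in centroids) for V in Vs]
        centroids.append(Vs[int(np.argmax(gaps))])
    labels = None
    for _ in range(n_iter):
        new_labels = np.array(
            [int(np.argmin([projection_distance(V, C) for C in centroids])) for V in Vs]
        )
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [V for V, c in zip(Vs, labels) if c == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = grassmann_centroid(members, p)
    return labels, centroids

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Lines (p = 1) in R^3 scattered around the e1 and e2 axes.
    Vs = []
    for axis in (0, 1):
        for _ in range(20):
            v = np.zeros((3, 1))
            v[axis, 0] = 1.0
            Vs.append(orthonormal_basis(v + 0.1 * rng.standard_normal((3, 1))))
    labels, centroids = grassmann_kmeans(Vs, k=2)
    print(labels)
```

Representing each point by the projector $VV^\top$ makes the distance independent of the chosen basis, so sign flips or rotations of a representative do not affect the clustering.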

Application to nonnegative matrix factorization<br />

Detecting clusters in multiple samples drawn from a Grassmannian is a problem arising in various applications. In Gruber and Theis (2006), we applied this to NMF in order to illustrate the feasibility of the proposed algorithm.

Consider the matrix factorization problem (1.12) with the additional nonnegativity constraints $S, A \geq 0$. If we assume that $S$ spans the whole nonnegative orthant, then the data $X = AS$ fill the conic hull of the columns of $A$, i.e., a cone whose edges are spanned by the columns of $A$. After projection onto the standard simplex, the conic hull reduces to a convex hull, and the projected, known mixture data $X$ lie within a convex polytope with $n$ vertices, $n$ being the number of rows of $S$. Hence we face the problem of identifying $n$ edges of a sampled polytope in $\mathbb{R}^{m-1}$.
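The reduction from cone to polytope can be checked numerically. A minimal NumPy sketch, assuming that "projection onto the standard simplex" means the central projection $x \mapsto x / \mathbf{1}^\top x$ (the natural choice for nonnegative data, but an assumption here); $A$ and $S$ are randomly generated placeholders, and the function names are hypothetical. For $x = As$, the projected sample equals $\sum_j \frac{a_j}{\mathbf{1}^\top a_j} w_j$ with $w_j = \frac{(\mathbf{1}^\top a_j)\, s_j}{\mathbf{1}^\top x} \geq 0$ and $\sum_j w_j = 1$, so every projected sample is a convex combination of the projected columns of $A$, which are the vertices of the polytope.

```python
import numpy as np

def project_to_simplex(X):
    """Central projection of nonnegative columns onto the standard simplex: x -> x / (1^T x)."""
    return X / X.sum(axis=0, keepdims=True)

def simplex_weights(A, S):
    """Convex weights w_j = (1^T a_j) s_j / (1^T x) expressing each projected sample
    as a combination of the projected columns of A."""
    col_mass = A.sum(axis=0)[:, None]                      # 1^T a_j for each column j
    return col_mass * S / (A @ S).sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
m, n, T = 3, 3, 1000
A = rng.uniform(0.1, 1.0, size=(m, n))   # hypothetical nonnegative mixing matrix
S = rng.exponential(size=(n, T))          # hypothetical nonnegative sources
X = A @ S

Xp = project_to_simplex(X)                # projected mixtures
Ap = project_to_simplex(A)                # projected cone edges = polytope vertices
W = simplex_weights(A, S)

# Projected samples are convex combinations of the polytope vertices.
print(np.allclose(Ap @ W, Xp), bool((W >= 0).all()), np.allclose(W.sum(axis=0), 1.0))
```

Since every projected column sums to one, the points lie in an affine hyperplane of dimension $m-1$, which is why the polytope lives in $\mathbb{R}^{m-1}$.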

In two dimensions (i.e., $m = 3$ before projection), this amounts to finding the $n$ edges of a polytope of which only samples from the interior are known. We used the Quickhull algorithm (Barber et al., 1993) to construct the convex hull of the samples, thereby identifying candidate edges of the polytope. However, due to the finite sample size, the identified polytope has far too many edges. Therefore, we applied affine Grassmann $n$-means clustering to these edges, weighting each edge according to its volume (its length, in two dimensions), in order to identify the $n$ bounding edges; see the example in figure 1.20.
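The two-dimensional pipeline (hull construction, then weighted clustering of hull edges) can be sketched end to end. This is a hedged illustration, not the code behind figure 1.20. Several pieces are assumptions: the 2-D Quickhull is a compact textbook version written for this sketch; "affine" Grassmann clustering is modelled by lifting each edge's line through homogeneous coordinates $(x, y) \mapsto (x, y, 1)$ to a 2-D linear subspace of $\mathbb{R}^3$; distances use the projection distance $2^{-1/2}\|VV^\top - WW^\top\|_F$, for which the weighted centroid is spanned by the eigenvectors of the $p$ largest eigenvalues of the weighted cluster correlation; and each edge is weighted by its length. All function names are hypothetical.

```python
import numpy as np

def cross2(o, a, b):
    """z-component of (a - o) x (b - o); negative when b lies right of the line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def quickhull_2d(points):
    """Textbook 2-D Quickhull: hull vertices in counterclockwise order (collinear points dropped)."""
    pts = sorted(set(map(tuple, np.asarray(points, dtype=float))))
    left, right = pts[0], pts[-1]

    def chain(a, b, cands):
        # Hull vertices strictly right of the directed line a -> b, ordered from a to b.
        outside = [p for p in cands if cross2(a, b, p) < 0]
        if not outside:
            return []
        far = min(outside, key=lambda p: cross2(a, b, p))  # farthest from the line
        return chain(a, far, outside) + [far] + chain(far, b, outside)

    return [left] + chain(left, right, pts) + [right] + chain(right, left, pts)

def hull_edges(vertices):
    """Edges of the closed polygon and their lengths (the 1-D volume used as weight)."""
    V = np.asarray(vertices)
    ends = np.roll(V, -1, axis=0)
    return list(zip(V, ends)), np.linalg.norm(ends - V, axis=1)

def lift_segment(p, q):
    """Affine line through p, q in R^2 -> 2-D linear subspace of R^3 via (x, y) -> (x, y, 1)."""
    Q, _ = np.linalg.qr(np.array([[p[0], q[0]], [p[1], q[1]], [1.0, 1.0]]))
    return Q

def plane_to_line(C):
    """Coefficients (a, b, c) of the planar line a*x + b*y + c = 0 encoded by span(C)."""
    return np.cross(C[:, 0], C[:, 1])

def proj_dist(V, W):
    """Assumed projection distance 2^{-1/2} * ||V V^T - W W^T||_F."""
    return np.linalg.norm(V @ V.T - W @ W.T) / np.sqrt(2.0)

def weighted_grassmann_kmeans(Vs, w, k, n_iter=100):
    """Batch k-means on subspaces with per-sample weights w (here: edge lengths)."""
    w = np.asarray(w, dtype=float)
    p = Vs[0].shape[1]
    # Greedy weighted farthest-point initialization (illustrative, not from the source).
    centroids = [Vs[int(np.argmax(w))]]
    while len(centroids) < k:
        score = [wi * min(proj_dist(V, C) for C in centroids) for V, wi in zip(Vs, w)]
        centroids.append(Vs[int(np.argmax(score))])
    labels = None
    for _ in range(n_iter):
        new = np.array([int(np.argmin([proj_dist(V, C) for C in centroids])) for V in Vs])
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
        for j in range(k):
            idx = np.flatnonzero(labels == j)
            if idx.size:
                # Weighted cluster correlation; its top-p eigenvectors minimize the
                # weighted sum of squared projection distances.
                M = sum(w[i] * (Vs[i] @ Vs[i].T) for i in idx) / w[idx].sum()
                centroids[j] = np.linalg.eigh(M)[1][:, -p:]
    return labels, centroids

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.uniform(0.1, 1.0, size=(3, 3))   # hypothetical mixing matrix (m = n = 3)
    X = A @ rng.exponential(size=(3, 2000))   # hypothetical nonnegative sources
    pts = (X / X.sum(axis=0))[:2].T           # simplex projection, 2-D coordinates
    edges, lengths = hull_edges(quickhull_2d(pts))
    labels, cents = weighted_grassmann_kmeans(
        [lift_segment(p, q) for p, q in edges], lengths, k=3)
    print(len(edges), "hull edges; cluster lines:", [plane_to_line(C) for C in cents])
```

Weighting by length lets the long hull edges, which lie close to the true bounding edges, dominate the centroids, while the many short edges cutting the corners contribute little.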

Biomedical applications of other matrix factorization methods are discussed in the next chapter. Here we only briefly mention Meyer-Bäse et al. (2005), where we applied NMF and related unsupervised clustering techniques to the self-organized segmentation of biomedical image time series, grouping pixels that exhibit similar local signal dynamics.
