Mathematics in Independent Component Analysis
1.5 Machine learning for data preprocessing
for NGCA essentially using the idea of separated characteristic functions from the proof was proposed in (Kawanabe and Theis, 2007).
Finally, in (Theis and Kawanabe, 2007), we presented a modification of NGCA that evaluates the time structure of the multivariate observations instead of their higher-order statistics. We differentiated the signal subspace from noise by searching for a subspace of non-trivially autocorrelated data. In contrast to blind source separation approaches, however, we did not require the existence of sources, so the model is applicable to any wide-sense stationary time series without restrictions. Moreover, since the method is based on second-order time structure, it can be implemented efficiently even for large dimensions, which we illustrated with an application to dimension reduction of functional MRI recordings.
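The second-order idea can be sketched as follows. This is a simplified illustration, not the algorithm of (Theis and Kawanabe, 2007): it assumes whitened, zero-mean observations and extracts the subspace whose symmetrized lag-one autocovariance is largest in magnitude.

```python
import numpy as np

def autocorrelated_subspace(x, lag=1, dim=2):
    """Estimate a `dim`-dimensional subspace of non-trivially autocorrelated
    directions from x, an (n, T) array of whitened, zero-mean, wide-sense
    stationary observations. Simplified sketch, not the published algorithm."""
    n, T = x.shape
    c = x[:, :-lag] @ x[:, lag:].T / (T - lag)  # lagged autocovariance matrix
    c = (c + c.T) / 2                           # symmetrize
    w, v = np.linalg.eigh(c)                    # eigendecomposition
    order = np.argsort(-np.abs(w))              # strongest |autocorrelation| first
    return v[:, order[:dim]]                    # orthonormal basis of the subspace
```

Directions with eigenvalues near zero carry no autocorrelation at the chosen lag and can be discarded as noise; since only one lagged covariance and one symmetric eigendecomposition are needed, the cost stays modest even in large dimensions.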
1.5.3 Clustering
Clustering methods are an important tool in high-dimensional explorative data mining. They aim at identifying samples or regions of similar characteristics, and often code them by a single codebook vector or centroid. In this section, we review clustering algorithms, and employ these methods to solve the blind matrix factorization problem (1.12) from above under various source assumptions.
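As a point of reference for the centroid-coding idea, here is a minimal k-means sketch; the farthest-point initialization is our own choice for determinism, not part of any method discussed in this section.

```python
import numpy as np

def kmeans(x, k, iters=100):
    """Minimal k-means: code each cluster by a single codebook vector,
    namely its mean. x: (N, d) array of samples."""
    # farthest-point initialization: start from x[0], then repeatedly add
    # the sample farthest from all centroids chosen so far
    centroids = [x[0]]
    for _ in range(k - 1):
        d2 = np.min([((x - c) ** 2).sum(1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(d2)])
    centroids = np.array(centroids)
    for _ in range(iters):
        # assign every sample to its nearest centroid
        labels = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        # move each centroid to the mean of its assigned samples
        new = np.array([x[labels == j].mean(0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```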
Clustering for solving overcomplete BSS problems
In Theis et al. (2006), see chapter 17, we discussed the blind source separation problem (1.1) in the difficult case of overcomplete BSS, where fewer mixtures than sources are observed (m < n). We focused on the usually more elaborate matrix-recovery part. Assuming statistically independent sources with existing variance and at most one Gaussian component, it is well-known that A is determined uniquely by the mixtures x(t) (Eriksson and Koivunen, 2003). However, how to do this algorithmically is far from obvious, and although some algorithms have been proposed recently (Bofill and Zibulevsky, 2001; Lee et al., 1999; O'Grady and Pearlmutter, 2004), performance is still limited.
The most commonly used overcomplete algorithms rely on sparse sources (after possible sparsification by preprocessing), which can be identified by clustering, usually by k-means or some extension (Bofill and Zibulevsky, 2001; O'Grady and Pearlmutter, 2004). However, apart from the fact that theoretical justifications have not been found, mean-based clustering identifies the correct A only if the data density approaches a delta distribution. In figure 1.18, we illustrate this deficiency of mean-based clustering: we get an error of up to 5° per mixing angle, which is rather substantial considering the sparse density and the simple, complete case of m = n = 2. Moreover, the figure indicates that median-based clustering performs much better. Indeed, mean-based clustering does not possess any equivariance property, which would imply performance independent of the choice of A.
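The robustness of the median against the non-peaked part of a sparse density is the statistical principle behind this observation. A deterministic one-dimensional toy example (not the experiment of figure 1.18) makes the effect visible:

```python
import numpy as np

# One cluster of 100 points symmetric around 0 (the "peak" of a sparse
# density), plus a flat one-sided background of 30 points in [1, 3]
# standing in for samples that lie between the density's peaks.
peak = np.linspace(-0.1, 0.1, 100)
background = np.linspace(1.0, 3.0, 30)
data = np.concatenate([peak, background])

# the mean is dragged toward the background, the median stays on the peak
print(data.mean())      # ~0.46
print(np.median(data))  # ~0.03
```

A mean-based centroid update inherits exactly this pull toward the background samples between the mixing directions, while a median-based update largely ignores them.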
We proposed a novel overcomplete, median-based clustering method in (Theis et al., 2006), and proved its equivariance and convergence. Simply put, we first pick 2n normalized starting vectors w1, w1', . . . , wn, wn', and iterate the following steps until an appropriate abort condition has been met: Choose a sample x(t) ∈ R^m and normalize it, y(t) := π(x(t)) = x(t)/|x(t)|. Let