
1.2. Uniqueness issues in independent component analysis

Instead of a multivariate random process s(t), the theorem is formulated for a random vector s, which is equivalent to assuming an i.i.d. process. Moreover, the assumption of equal source dimension (n) and mixture dimension (m) is made, although relaxation to the undercomplete case (1 < n < m) is straightforward, and to the overcomplete case (n > m > 1) is possible (Eriksson and Koivunen, 2003). The assumption of at most one Gaussian component is crucial, since independence of white, multivariate Gaussians is invariant under orthogonal transformations, so theorem 1.2.1 cannot hold in this case.
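To spell this invariance out (a standard covariance computation, added here for concreteness): if s \sim \mathcal{N}(0, I_n) is white Gaussian and W is any orthogonal matrix, then

\operatorname{cov}(Ws) = W \operatorname{cov}(s)\, W^\top = W W^\top = I_n ,

so Ws \sim \mathcal{N}(0, I_n) again has mutually independent components, and the mixing matrix can be recovered at best up to an orthogonal factor on a Gaussian subspace.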

An algorithm for separation—Hessian ICA<br />

The proof of theorem 1.2.1 is constructive, and the Gaussian exception comes into play naturally, as zeros of a certain differential equation. It now becomes quite clear why separation is possible. Furthermore, an algorithm can be extracted from the pattern used in the proof:

After decorrelation, we can assume that the mixing matrix A is orthogonal. By using the transformation properties of the Hessian matrix, we can employ the linear relationship x = As to get

H_{\ln p_x} = A^{-\top} H_{\ln p_s} A^{-1}    (1.2)

for the Hessian of the mixtures. The key idea, as we have seen in the previous section, is that due to statistical independence, the source Hessian H_{\ln p_s} is diagonal everywhere. Therefore equation (1.2) represents a diagonalization of the mixture Hessian, and the diagonalizer equals the mixing matrix A. Such a diagonalization is unique if the eigenspaces of the Hessian are one-dimensional at some point, and this is precisely the case if x(t) contains at most one Gaussian component (Theis, 2004a, lemma 5). Hence, the mixing matrix and the sources can be extracted algorithmically by simply diagonalizing the mixture Hessian evaluated at some point. The Hessian ICA algorithm consists of a local Hessian diagonalization of the logarithmic density (or, equivalently, of the characteristic function, which is easier to estimate). In order to improve robustness, multiple matrices are diagonalized jointly. Applying this algorithm to the mixtures from our toy example in figure 1.1 recovers the sources very well, as shown in figure 1.1(c).
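The following is a minimal numerical sketch of this procedure in Python; it is not the exact estimator of Theis (2004a). It estimates the density with a Gaussian kernel density estimate (scipy.stats.gaussian_kde), approximates the Hessian of the logarithmic density by finite differences, and diagonalizes it at a single point; all function names, the step size, the evaluation point, and the toy sources are illustrative choices.

import numpy as np
from scipy.stats import gaussian_kde

def whiten(x):
    # Center and decorrelate the mixtures (shape (n, N)) so that the
    # remaining mixing matrix can be assumed orthogonal.
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(x))
    return (E @ np.diag(d ** -0.5) @ E.T) @ x

def log_density_hessian(logp, x0, h=0.15):
    # Four-point finite-difference Hessian of the log-density at x0.
    n = len(x0)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (logp(x0 + ei + ej) - logp(x0 + ei - ej)
                       - logp(x0 - ei + ej) + logp(x0 - ei - ej)) / (4 * h * h)
    return (H + H.T) / 2  # symmetrize against estimation noise

def hessian_ica(x, x0):
    # Diagonalize the mixture Hessian at the point x0; the eigenvectors
    # estimate the orthogonal mixing matrix A up to permutation and sign.
    z = whiten(x)
    kde = gaussian_kde(z)  # kernel density estimate of the mixture density
    logp = lambda p: float(np.log(kde(p.reshape(-1, 1))[0]))
    H = log_density_hessian(logp, x0)
    _, A_hat = np.linalg.eigh(H)
    return A_hat, A_hat.T @ z  # estimated sources via s = A^T x

# Toy example: one uniform and one Gaussian source (at most one Gaussian),
# mixed by a rotation and separated by a single Hessian diagonalization.
rng = np.random.default_rng(0)
s = np.vstack([rng.uniform(-1, 1, 5000), rng.standard_normal(5000)])
t = 0.6
A = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
A_hat, s_hat = hessian_ica(A @ s, x0=np.zeros(2))

Evaluated at the origin, the Gaussian direction contributes a curvature of roughly -1 to the logarithmic density, while the smoothed uniform direction is nearly flat; the two eigenvalues are therefore distinct and the eigenvectors well-defined, matching the one-dimensional eigenspace condition above. The joint diagonalization of Hessians at several points used to improve robustness would replace the single call to np.linalg.eigh.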

A similar algorithm was already proposed by Lin (1998), but without considering the assumptions necessary for its successful application. In Theis (2004a, theorem 3), we gave precise conditions for when this algorithm may be applied and showed that points satisfying these conditions can indeed be found if the sources contain at most one Gaussian component. Lin used a discrete approximation of the derivative operator to approximate the Hessian; we suggested using kernel-based density estimation, which can be differentiated directly. A similar algorithm based on Hessian diagonalization had already been proposed by Yeredor (2000), using the character (characteristic function) of a random vector. However, the character is complex-valued, and additional care has to be taken when applying a complex logarithm: essentially, it is well-defined only locally, at non-zeros of the character. In algorithmic terms, the character can easily be approximated from samples. Yeredor suggested joint diagonalization of the Hessian of the logarithmic character evaluated at several points in order to overcome the locality of the algorithm. Instead of joint diagonalization, we proposed a combined energy function based on the previously defined separator. This also takes global information into account, but does not have the drawback of being singular at zeros of the density or of the character, respectively.
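To illustrate how the character enters algorithmically, here is a companion sketch in the same spirit, again an illustration rather than Yeredor's actual estimator: the empirical character is averaged from samples, and the Hessian of its logarithm is approximated by finite differences at a point u0. The evaluation point and step size are arbitrary choices, and the character must stay away from zero near u0 for the complex logarithm to be locally well-defined.

import numpy as np

def empirical_character(x, u):
    # phi(u) = E[exp(i u^T x)], averaged over the sample columns of x (shape (n, N))
    return np.exp(1j * (u @ x)).mean()

def log_character_hessian(x, u0, h=0.05):
    # Finite-difference Hessian of log phi at u0, using the principal branch of
    # the complex logarithm, which is safe locally as long as phi does not vanish.
    n = len(u0)
    f = lambda u: np.log(empirical_character(x, u))
    H = np.zeros((n, n), dtype=complex)
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(u0 + ei + ej) - f(u0 + ei - ej)
                       - f(u0 - ei + ej) + f(u0 - ei - ej)) / (4 * h * h)
    return (H + H.T) / 2  # the Hessian is complex symmetric

# For mixtures x = As with independent sources, both the real and imaginary
# parts of this Hessian are diagonalized by an orthogonal A, so a single
# eigendecomposition already yields an estimate of the mixing matrix.
rng = np.random.default_rng(1)
s = np.vstack([rng.uniform(-1, 1, 5000), rng.laplace(0, 1, 5000)])
A = np.linalg.qr(rng.standard_normal((2, 2)))[0]   # random orthogonal mixing
H = log_character_hessian(A @ s, u0=np.array([0.3, -0.2]))
_, A_hat = np.linalg.eigh(H.real)                  # columns estimate A up to sign/permutation

Evaluating such Hessians at several points u0 and diagonalizing them jointly, or combining them in a single energy function as described above, trades the locality of one evaluation point for robustness.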
