Mathematics in Independent Component Analysis
1.2. Uniqueness issues in independent component analysis
Instead of a multivariate random process s(t), the theorem is formulated for a random vector s, which is equivalent to assuming an i.i.d. process. Moreover, the source dimension (n) and the mixture dimension (m) are assumed to be equal, although relaxation to the undercomplete case (1 < n < m) is straightforward, and relaxation to the overcomplete case (n > m > 1) is possible (Eriksson and Koivunen, 2003). The assumption of at most one Gaussian component is crucial: independence of white, multivariate Gaussians is invariant under orthogonal transformations, so theorem 1.2.1 cannot hold in this case.
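This rotation invariance is easy to verify numerically. The following sketch (our own illustration, not from the text) mixes two white Gaussian sources with an arbitrary rotation and checks that the mixtures are again white; since uncorrelated Gaussians are automatically independent, no ICA method can identify the rotation in this case.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, unit-variance Gaussian sources ("white" Gaussians).
s = rng.standard_normal((2, 100_000))

# An arbitrary orthogonal mixing matrix (a plane rotation).
theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = A @ s  # mixtures

# The mixtures are again white: their covariance is (close to) the identity.
# For Gaussians, uncorrelatedness implies independence, so the mixtures are
# just as independent as the sources were -- the rotation leaves no trace.
print(np.round(np.cov(x), 2))
```

Any other orthogonal matrix would give the same result, which is precisely why at most one Gaussian component may be allowed.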
An algorithm for separation: Hessian ICA
The proof of theorem 1.2.1 is constructive, and the exception of the Gaussians enters naturally as zeros of a certain differential equation. The reason why separation is possible thus becomes quite clear. Furthermore, an algorithm can be extracted from the pattern used in the proof: after decorrelation, we can assume that the mixing matrix A is orthogonal. Using the transformation properties of the Hessian matrix, we can employ the linear relationship x = As to get

    H_{ln p_x} = A^T H_{ln p_s} A                          (1.2)
for the Hessian of the mixtures. The key idea, as we have seen in the previous section, is that due to statistical independence, the source Hessian H_{ln p_s} is diagonal everywhere. Equation (1.2) therefore represents a diagonalization of the mixture Hessian, and the diagonalizer equals the mixing matrix A. Such a diagonalization is unique if the eigenspaces of the Hessian are one-dimensional at some point, and this is precisely the case if x(t) contains at most one Gaussian component (Theis, 2004a, lemma 5). Hence, the mixing matrix and the sources can be extracted algorithmically by simply diagonalizing the mixture Hessian evaluated at some point. The Hessian ICA algorithm consists of locally diagonalizing the Hessian of the logarithmic density (or, equivalently, of the easier-to-estimate characteristic function). To improve robustness, multiple such matrices are diagonalized jointly. Applying this algorithm to the mixtures from our toy example of figure 1.1 yields well-recovered sources, shown in figure 1.1(c).
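The procedure can be sketched in a few lines. The following is only an illustrative implementation under assumptions of our own, not the estimator of Theis (2004a): we use a Gaussian kernel density estimate, whose log-density Hessian has the closed form Cov_w / h^4 - I / h^2 (with Cov_w the kernel-weighted sample covariance around the evaluation point), and instead of joint diagonalization we simply pick, among a few candidate sample points, the one whose Hessian has the largest eigenvalue gap, i.e. where the diagonalizer is best determined.

```python
import numpy as np

def whiten(x):
    """Center and whiten mixtures so the residual mixing matrix is orthogonal."""
    x = x - x.mean(axis=1, keepdims=True)
    evals, evecs = np.linalg.eigh(np.cov(x))
    return evecs @ np.diag(evals ** -0.5) @ evecs.T @ x

def log_density_hessian(z, y, h):
    """Hessian of the log of a Gaussian KDE of the columns of z, at point y.

    For a Gaussian kernel of bandwidth h this has the closed form
    Cov_w / h**4 - I / h**2, with Cov_w the kernel-weighted covariance."""
    diffs = z - y[:, None]
    w = np.exp(-0.5 * (diffs ** 2).sum(axis=0) / h ** 2)
    w = w / w.sum()
    m = diffs @ w                                 # weighted mean of z - y
    cov_w = (diffs * w) @ diffs.T - np.outer(m, m)
    return cov_w / h ** 4 - np.eye(len(y)) / h ** 2

def hessian_ica(x, h=0.5, n_candidates=30, seed=0):
    """Single-point Hessian ICA sketch: whiten, evaluate the log-density
    Hessian at the candidate point with the largest eigenvalue gap, and
    take its eigenvectors as the estimated orthogonal mixing matrix."""
    rng = np.random.default_rng(seed)
    z = whiten(x)
    idx = rng.choice(z.shape[1], size=n_candidates, replace=False)
    best_gap, best_v = -np.inf, None
    for k in idx:
        evals, vecs = np.linalg.eigh(log_density_hessian(z, z[:, k], h))
        gap = np.diff(evals).min()
        if gap > best_gap:
            best_gap, best_v = gap, vecs
    return best_v.T @ z   # recovered sources, up to permutation and sign
```

On simple toy mixtures of non-Gaussian (e.g. Laplacian) sources, the recovered components should match the true sources up to permutation and sign; the bandwidth h and the candidate-point heuristic are hand-picked for the sketch.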
A similar algorithm was proposed earlier by Lin (1998), but without consideration of the assumptions necessary for its successful application. In Theis (2004a, theorem 3), we gave precise conditions for when this algorithm may be applied and showed that points satisfying these conditions can indeed be found if the sources contain at most one Gaussian component. Lin used a discrete approximation of the derivative operator to approximate the Hessian; we suggested using kernel-based density estimation, which can be differentiated directly. A similar algorithm based on Hessian diagonalization had been proposed by Yeredor (2000) using the characteristic function of a random vector. However, the characteristic function is complex-valued, and additional care has to be taken when applying a complex logarithm; essentially, this is only well-defined locally at non-zeros. In algorithmic terms, the characteristic function can easily be approximated from samples. Yeredor suggested joint diagonalization of the Hessians of the logarithmic characteristic function evaluated at several points in order to avoid the locality of the algorithm. Instead of joint diagonalization, we proposed to use a combined energy function based on the previously defined separator. This also takes global information into account, but does not have the drawback of being singular at zeros of the density or of the characteristic function, respectively.
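To convey what joint diagonalization of several Hessians buys, note that in two dimensions the least-squares joint diagonalizer has a closed form: one seeks the rotation angle minimizing the summed squared off-diagonal entries of all rotated matrices. The sketch below (the helper name is ours; this is the standard Jacobi-type angle underlying such methods, not the energy-function separator discussed above) shows how several noisy or individually degenerate Hessians pin down a single rotation.

```python
import numpy as np

def joint_rotation_angle(mats):
    """Angle theta of the plane rotation R minimizing
    sum_k offdiag(R^T M_k R)^2 over symmetric 2x2 matrices M_k.

    With u_k = (M_k[0,0] - M_k[1,1]) / 2 and v_k = M_k[0,1], the rotated
    off-diagonal entry is v_k*cos(2t) - u_k*sin(2t), and minimizing the
    sum of its squares gives 4t = atan2(2*sum(u*v), sum(u^2 - v^2))."""
    u = np.array([(M[0, 0] - M[1, 1]) / 2 for M in mats])
    v = np.array([M[0, 1] for M in mats])
    return 0.25 * np.arctan2(2 * (u * v).sum(), (u ** 2 - v ** 2).sum())
```

If all matrices M_k are exactly of the form R0 D_k R0^T with diagonal D_k, as in equation (1.2), the formula recovers the angle of R0 exactly (up to the usual permutation/sign ambiguity of a quarter turn); with estimated Hessians it returns the least-squares compromise, which is why using several evaluation points stabilizes the estimate.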