
Chapter 5. IEICE TF E87-A(9):2355–2363, 2004

2

2.1 Model

The overdetermined ICA model can be formulated as follows: Let x be a given m-dimensional random vector. An n × m matrix W with m > n ≥ 2 is called an overdetermined ICA of x if

y = Wx (1)

is independent. In order to distinguish between overdetermined and ordinary ICA, in the case m = n we call W a square ICA of x.

Note that, unlike in the usual formulation, W is not assumed to have full rank here; Theorem 2.1 shows that under reasonable assumptions full rank holds automatically.

Often overdetermined ICA is used to solve the overdetermined BSS problem given by

x = As (2)

where s is an n-dimensional independent random vector and A an m × n matrix with m > n ≥ 2. Note that A can be assumed to have full rank (rank A = n); otherwise the system could be reduced to the case n − 1: If A = (a_1, . . . , a_n) with columns a_i and rank A < n, we may without loss of generality assume that a_n = \sum_{i=1}^{n-1} \lambda_i a_i. Then

x = As = \sum_{j=1}^{n} s_j a_j = \sum_{j=1}^{n-1} (s_j + \lambda_j s_n) a_j,

so s_n no longer appears as a separate source in the mixture, and the model can be reduced to the case n − 1. Overdetermined ICAs of x are usually considered solutions to this BSS problem.
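To make the noise-free setup concrete, the following sketch (my own NumPy illustration; the dimensions, the Laplacian source distribution and the use of the pseudoinverse as unmixing matrix are assumptions for the example, not prescriptions of the text) mixes n = 2 independent sources into m = 4 observations and checks that the pseudoinverse of A is one valid overdetermined ICA of x.

```python
# Sketch of the overdetermined BSS model x = As (equation 2) with m > n.
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 2, 4, 1000                    # sources, observations, samples

s = rng.laplace(size=(n, T))            # independent non-Gaussian sources
A = rng.standard_normal((m, n))         # mixing matrix, generically full rank
assert np.linalg.matrix_rank(A) == n    # rank A = n, as assumed in the text

x = A @ s                               # overdetermined mixture, equation (2)

# In the noise-free case the pseudoinverse A+ = (A^T A)^{-1} A^T is one
# admissible unmixing matrix: A+ A = I, so W x = s exactly.
W = np.linalg.pinv(A)
y = W @ x                               # equation (1)
print(np.allclose(y, s))                # True
```

By Theorem 2.1 below, any other overdetermined ICA of x differs from this particular W only by left multiplication with LP and by adding rows taken from the kernel of A⊤.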

Often, overdetermined BSS is stated in the noisy case,

x = As + ν (3)

where ν is a decorrelated Gaussian 'noise' random vector, independent of s. Without additional noise, the sources can be found by solving, for example, the square ICA problem constructed from equation 1 by leaving out the last m − n observations, provided the resulting projected mixing matrix is non-degenerate. In the presence of noise, projection by principal component analysis (PCA) is usually chosen in order to reduce this problem to the square case [14]. In the next section, however, the indeterminacies of the noise-free models represented by equations 1 and 2 are studied, because overdetermined ICA will only be needed later, after reduction of the bilinear model, and in that model we do not allow any noise. However, the overdetermined noisy model from equation 3 can easily be reduced to the noise-free model by including ν as additional sources:

x = \begin{pmatrix} A & I \end{pmatrix} \begin{pmatrix} s \\ \nu \end{pmatrix}.


In this case n increases and we could possibly deal with underdetermined ICA (where extra care has to be taken with the now increased number of Gaussians in the sources). Uniqueness and separability results for this case are given in [7], which shows that the following theorems also hold in this noisy ICA model.
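As an aside, the PCA-based reduction to the square case mentioned above can be sketched as follows; this is a minimal illustration assuming scikit-learn's PCA and FastICA, and the library and parameter choices are mine, not those of [14].

```python
# Reduce the noisy overdetermined model x = As + nu (equation 3) to a
# square ICA problem by PCA projection, then unmix the projected data.
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(1)
n, m, T = 2, 5, 2000

s = rng.laplace(size=(T, n))                 # independent sources (rows = samples)
A = rng.standard_normal((m, n))              # full-rank m x n mixing matrix
nu = 0.05 * rng.standard_normal((T, m))      # weak decorrelated Gaussian noise
x = s @ A.T + nu                             # noisy mixtures, equation (3)

x_proj = PCA(n_components=n).fit_transform(x)                      # project to n dimensions
y = FastICA(n_components=n, random_state=0).fit_transform(x_proj)  # square ICA
# y approximates the sources up to scaling and permutation (and the noise floor).
```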

2.2 Indeterminacies

The following theorem presents the indeterminacy of the unmixing matrix in the case of overdetermined mixtures, with the slight generalization that this unmixing matrix does not necessarily have to be of full rank. Later in this section we show that it is necessary to assume that the observed data set x is indeed a mixture.

Theorem 2.1 (Indeterminacies of overdetermined ICA). Let m ≥ n ≥ 2. Let x = As as in the model of equation 2, and let the n × m matrix W be an overdetermined or square ICA of x such that at most one component of Wx is deterministic. Furthermore assume one of the following:

i. s has at most one Gaussian component and the variances of s exist.

ii. s has no Gaussian component.

Then there exist a permutation matrix P and an invertible scaling matrix L with

W = LP(A⊤A)^{-1}A⊤ + C (4)

where C is an n × m matrix whose rows lie in the kernel of A⊤, that is, with CA = 0. The converse also holds, i.e. if W fulfills equation 4 then Wx is independent.

A less general form of this indeterminacy has been given by Joho et al. [14]. In the square case, the above theorem shows that it is not necessary to assume that the mixing and, in particular, the demixing matrix have full rank, provided the transformation is assumed to have at most one deterministic component. Obviously, this assumption is not necessary if W is required to have full rank.

Since rank A = n, the pseudoinverse (Moore–Penrose inverse) A+ of A has the explicit form A+ = (A⊤A)^{-1}A⊤. Note that A+A = I. So from equation 4 we get WA = LP. We remark as a corollary to the above theorem that overdetermined ICA is separable, which means that the sources are uniquely (except for scaling and permutation) reconstructible, because for the approximated sources y we get y = Wx = WAs = LPs. This of course is well known, because overdetermined BSS can be reduced to square BSS (m = n) by projection; yet the indeterminacies of the demixing matrix, which are simple permutation and scaling in the square case, are not so obvious for overdetermined BSS.
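A quick numerical check of this corollary (my own illustration; the concrete L, P and the construction of C are arbitrary choices made for the example) confirms that any W of the form of equation 4 satisfies WA = LP and hence y = Wx = LPs:

```python
# Verify the indeterminacy of equation (4): W = L P (A^T A)^{-1} A^T + C
# with C A = 0 implies W A = L P, so the sources reappear up to L and P.
import numpy as np

rng = np.random.default_rng(2)
n, m, T = 3, 5, 4
A = rng.standard_normal((m, n))              # full-rank mixing matrix
s = rng.laplace(size=(n, T))
x = A @ s

L = np.diag([2.0, -0.5, 3.0])                # invertible scaling matrix
P = np.eye(n)[[2, 0, 1]]                     # permutation matrix
A_pinv = np.linalg.inv(A.T @ A) @ A.T        # pseudoinverse (A^T A)^{-1} A^T

# Rows of C must lie in the kernel of A^T, i.e. C A = 0: project arbitrary
# rows onto the orthogonal complement of the column space of A.
C = rng.standard_normal((n, m)) @ (np.eye(m) - A @ A_pinv)
assert np.allclose(C @ A, 0)

W = L @ P @ A_pinv + C
print(np.allclose(W @ A, L @ P))             # True: WA = LP
print(np.allclose(W @ x, L @ P @ s))         # True: y = LPs
```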

Proof of theorem 2.1. Consider B := WA and y := Bs. Let b_1, . . . , b_n ∈ R^n denote the (transposed) rows of B. Then
