Séparation de sources. Principes et algorithmes - Gretsi
Séparation de sources. Principes et algorithmes
(Source separation: principles and algorithms)

Pierre Comon (1), Christian Jutten (2)
(1) I3S, CNRS, Univ. of Nice, Sophia-Antipolis
(2) GIPSA-lab, CNRS, Univ. de Grenoble
© 2011
P. Comon & C. Jutten, Peyresq, July 2011 (BSS)
Contents of the course

I Introduction
II Tools and Optimization criteria
III Second Order Methods
IV Algorithms with Prewhitening
V Algorithms without Prewhitening
VI Convolutive Mixtures
VII NonLinear Mixtures
VIII UnderDetermined Mixtures
IX Nonnegative mixtures
X Applications
B Bibliography
I. The problem and the solution principles
1 The source separation problem
2 Principles of ICA
3 Dependence measures
Source separation: a common problem?

An example
The signal delivered by a sensor (electrode, antenna, microphone, etc.) is a mixture of signals.

The question
Is it possible to recover the individual signals (the sources) from the mixture observed at the sensor output?
If so, under which conditions? And how?
Source separation with priors (1/3)

The idea of Widrow and Hoff (1960)
A reference of the noise signal is available:
- the reference is the noise up to a gain, s2(t), or a filtered version of the noise, G(s2);
- a filter is estimated that cancels the contribution of s2 in the observation, i.e. such that the difference is no longer correlated with the noise reference.
Source separation with priors (2/3)

Writing the observation as x1(t) = a s1(t) + b s2(t) and the output as x1(t) - k s2(t), the decorrelation condition reads:

E[(a s1(t) + (b − k) s2(t)) s2(t)] = 0   (1)

hence, since s1 and s2 are uncorrelated:

E[x1(t) s2(t)] = k E[s2²(t)]   (2)

k = E[x1(t) s2(t)] / E[s2²(t)]   (3)
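The gain formula (3) is easy to verify numerically. A minimal Python/NumPy sketch of the Widrow-Hoff noise canceller (the signal models, the gains a and b, and the random seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
s1 = np.sign(rng.standard_normal(n))     # useful signal (zero-mean)
s2 = rng.standard_normal(n)              # noise, whose reference is available
a, b = 1.0, 0.8
x1 = a * s1 + b * s2                     # observed mixture

k = np.mean(x1 * s2) / np.mean(s2 ** 2)  # gain from (3): k = E[x1 s2] / E[s2^2]
y = x1 - k * s2                          # output, decorrelated from the reference
```

By construction the output y is exactly decorrelated from the reference s2 on the sample, and k converges to b as the sample size grows, so y converges to a s1.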
Source separation with priors (3/3)

Classical approach
If the useful signal and the noise lie in different frequency bands,
separation can be performed by filtering.

Without priors
What can be done...
- if the useful signal and the noise lie in the same frequency range?
- if no noise reference is available?
Source separation: the key idea

Several observations
Separation becomes possible with several observations:
- more sensors than sources,
- mixtures that are different, i.e. not proportional: ad ≠ bc.
This is spatial diversity.
Origin of the problem

Motion decoding in vertebrates [HJA85]

fI(t) = a11 p(t) + a12 v(t)
fII(t) = a21 p(t) + a22 v(t).

Questions
Can p(t) and v(t) be recovered from the mixtures?
If so, under which conditions? And how?
Mathematical formalization

Assumptions on the (invertible) mixture
- Linear instantaneous mixture
- Linear convolutive mixture
- Nonlinear mixture

Solution principles
- A separating block B, adapted to the mixture A so as to invert it
- How can B be estimated?
An ill-posed problem?

Impossible without additional assumptions
- Assumption: A known, B adapted to A
- The problem is ill-posed, whatever the type of mixture
- Additional assumptions on the sources s1(t), ..., sP(t) are needed

Which priors?
- Uncorrelated sources: not sufficient
- Statistically independent sources → ICA
- Other priors: discrete sources, bounded-support sources, sparse sources, nonnegative sources, etc.
Darmois's result (1953) [Dar53]

Linear factor analysis
- Linear mixture: x = As
- Assumption: the components of the random vector s are mutually independent

Theoretical result
- Separation is impossible if the sources are independent and identically distributed (iid) AND Gaussian

Leads for separation
- Separation is possible if the sources are iid AND NON-Gaussian: ICA, and statistics of order higher than 2
- Separation is possible if the sources are Gaussian (second-order statistics) AND NOT iid:
  - temporally correlated sources
  - nonstationary sources
Comments on the assumptions

iid non-Gaussian sources
- non-Gaussian: statistics of order higher than two can (and must) be used
- iid: the temporal relations between samples are not exploited...
- consequently, ICA can separate sources whether they are iid or not

Gaussian non-iid sources
- non-iid: the temporal relations between samples are exploited...
- Gaussian: restriction to statistics up to order 2
- consequently, these second-order approaches are optimal for Gaussian sources, but can also separate non-Gaussian sources
Independent component analysis (ICA)

Principle of the method
- General mixture: x = A(s), with P = K and A invertible
- Assumption: the components of the source s are mutually independent
- Idea: estimate B such that y = B(x) has mutually independent components

Questions
- If y is independent, is B = A⁻¹?
- The criterion for estimating B is the independence of y. How can it be measured?
ICA: linear instantaneous mixtures

Mixture model
x = As

Theoretical result (Comon, HOS 1991 and SP 1994) [Com94a]
Let x(t) = As(t), where A is a regular matrix and s(t) is a source vector with statistically independent components, at most one of which is Gaussian; then y(t) = Bx(t) is a vector with mutually independent components if and only if BA = DP, where D is a diagonal matrix and P is a permutation matrix.

Comments
- Independence = separation, under weak assumptions: A regular, independent non-Gaussian sources
- Scale and permutation indeterminacies
- What if several sources are Gaussian...?
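This result can be illustrated in the simplest possible setting: two uniform (hence non-Gaussian) sources, prewhitening, and a brute-force search over the remaining rotation angle using fourth-order marginal cumulants. All numerical choices below (sources, mixing matrix, angle grid) are illustrative assumptions; the actual algorithms are the subject of later chapters.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
s = rng.uniform(-1, 1, size=(2, n))          # independent non-Gaussian sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # regular mixing matrix
x = A @ s

d, E = np.linalg.eigh(np.cov(x))             # prewhitening: z has identity covariance
W = E @ np.diag(d ** -0.5) @ E.T
z = W @ x

def rot(t):
    return np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

def kurt_sum(y):
    """Sum of marginal fourth-order cumulants kappa_4 = m4 - 3 m2^2."""
    m2 = np.mean(y ** 2, axis=1)
    return float(np.sum(np.mean(y ** 4, axis=1) - 3 * m2 ** 2))

# Uniform sources are sub-Gaussian (kappa_4 < 0), so the separating rotation
# minimizes the summed marginal kurtosis of y = R(theta) z
thetas = np.linspace(0, np.pi / 2, 901)
best = min(thetas, key=lambda t: kurt_sum(rot(t) @ z))
B = rot(best) @ W
G = B @ A                                    # global matrix, close to D P
```

Each row of G has a single dominant entry, in a distinct column: the sources are recovered up to scale and permutation, as the theorem states.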
ICA: linear convolutive mixtures

Mixture model
x(t) = A(t) ∗ s(t)
or in discrete time:
xi(n) = Σj Σk aij(k) sj(n − k), ∀i = 1, ..., K

First theoretical results [YW94, TJ95]
Let x(t) = [A(z)]s(t), where A(z) is an invertible filter matrix and s(t) is a source vector with statistically independent components, at most one of which is Gaussian; then y(t) = [B(z)]x(t) is a vector with mutually independent components if and only if B(z)A(z) = D(z)P, where D(z) is a diagonal filter matrix and P is a permutation matrix.

Comments
- The sources are recovered up to a filter and a permutation
ICA: nonlinear (NL) mixtures

Mixture model
x(t) = A(s(t))

Impossible in general [Dar53, HP99]
- In the general case, independence of y does not imply source separation: y(t) = B(x(t)) can have independent components even though B ∘ A is not diagonal!
- When ICA does achieve separation, the solution is determined only up to a permutation and a nonlinear function: f(si), with f(.) an unknown NL function.

Possible for certain mixtures
- Post-nonlinear mixtures [TJ99a, BZJN02, JBZH04, AJ05]
- Linearizable NL mixtures [KLR73, EK02]
- Bilinear and linear-quadratic mixtures (without identifiability proof: [HD03, DH09])
Blind, did you say blind?

Source separation is a problem that is...
- not blind: the nature of the mixture is assumed
- not blind: additional assumptions are made on the sources
- source independence: a very broad assumption (even if complex to implement), hence the (abusive) term blind associated with ICA

Other prior information
- Other priors: other criteria and other methods
- Sources with temporal dependences between samples: second-order methods
- Discrete or bounded-support sources: algebraic or geometric methods
- Sparsity: masking in the t-f plane, sparse component analysis (SCA)
- Nonnegativity: nonnegative factorization
Determined and overdetermined mixtures

Determined mixtures
- As many mixtures as sources, K = P
- Identifying A or its inverse directly provides the sources.

Overdetermined mixtures
- More mixtures than sources, K > P
- In the linear case: a solution exists if the mixing matrix has full rank
- Preprocessing by PCA to project onto the signal space before separation

In both cases, one can use
- the direct method: identify A, then invert it,
- the indirect method: identify B, an inverse of A.
Underdetermined mixtures

- More sources than mixtures, P > K
- Even if A is known (it has no inverse!), s cannot be deduced simply
- Identifying A and restoring the sources are two distinct and difficult problems
- Without additional priors, there is no solution
- In the linear case, if A is known, the sources can be recovered only up to a random vector [TJ99a]
- A solution is possible if the sources are discrete or sparse
- The indirect method is therefore impossible
Indeterminacies of the solutions, revisited

Properties of independence
If two sources si and sj are independent, then
- sj and si are also independent,
- kj sj and ki si, where ki, kj are any two scalars, are also independent,
- fj(sj) and fi(si), where fi, fj are any two nonlinear functions, are also independent,
- Hj(sj) and Hi(si), where Hi, Hj are any two filters, are also independent.

Properties of the observations: linear instantaneous case
The observations xi(t) = Σj aij sj(t) cannot be distinguished from xi(t) = Σj (aij/αj) αj sj(t).
Consequences of the indeterminacies

Direct, on the mixing matrix
- The power of the sources cannot be estimated
- P parameters of A are not identifiable; only the column vectors of A, i.e. the "directions" of the sources, can be estimated
- The constraint aii = 1 can be imposed on A

Indirect, on the separating matrix
- The power of the estimated sources can be fixed arbitrarily: normalization or prewhitening
- P parameters of B are not identifiable: one can fix bii = 1, or normalize the rows of B
Independence of random variables and vectors

Factorization of the probability laws or densities (pdf)

Two random variables
Two random variables S1 and S2 are independent iff
pS1S2(u1, u2) = pS1(u1) pS2(u2)

Random vectors
Let S be a random vector with P components S1, ..., SP.
The components are mutually independent iff
pS1S2...SP(u1, u2, ..., uP) = pS1(u1) pS2(u2) ... pSP(uP)

Comments
- A criterion that is delicate to implement
- Equality between two multivariate functions
Characteristic functions

First characteristic function
- It is the inverse Fourier transform of the pdf of S
- Consequence: it carries the same information as the pdf
Φ(ν) = E[exp(jνᵀS)] = ∫...∫ pS(u) exp(jνᵀu) du

Second characteristic function (SCF)
- It is the log of the first characteristic function
- Consequence: same information as the pdf
Ψ(ν) = log Φ(ν)

Properties of the SCF
- Ψ(ν) = 0 for ν = 0
- It is continuous and differentiable at ν = 0
Expansion of the SCF and cumulants

Expansion of the SCF
Ψ(ν) = Σ_{p=1}^∞ κp (jν)^p / p!

Cumulant of order p
κp = (−j)^p [d^p Ψ(ν) / dν^p] |_{ν=0}

Cumulants of a centered variable
κ1 = µ1 = 0          (4)
κ2 = µ2              (5)
κ3 = µ3              (6)
κ4 = µ4 − 3 µ2²      (7)
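Formulas (4)-(7) can be checked directly on sample moments; a small sketch (the test distributions and the sample size are illustrative assumptions):

```python
import numpy as np

def cumulants_1d(x):
    """Sample cumulants of orders 2, 3, 4 of a centered scalar sample,
    via kappa_2 = mu_2, kappa_3 = mu_3, kappa_4 = mu_4 - 3 mu_2^2."""
    x = x - x.mean()                 # center the sample
    mu2 = np.mean(x ** 2)
    mu3 = np.mean(x ** 3)
    mu4 = np.mean(x ** 4)
    return mu2, mu3, mu4 - 3 * mu2 ** 2

rng = np.random.default_rng(0)
k2g, k3g, k4g = cumulants_1d(rng.standard_normal(1_000_000))   # Gaussian
k2u, k3u, k4u = cumulants_1d(rng.uniform(-1, 1, 1_000_000))    # uniform
```

For a Gaussian, all cumulants of order higher than 2 vanish (κ4 ≈ 0), while a uniform variable on [−1, 1] has κ2 = 1/3 and κ4 = −2/15: non-Gaussianity becomes visible at order 4.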
A small exercise

Compute the cumulants of orders 2, 3 and 4 of a centered variable

Definition
Ψ(ν) = Σ_{p=1}^∞ κp (jν)^p / p!

Expansion around ν = 0
Ψ(ν) = ln Φ(ν) = ln Φ(0) + [d ln Φ(ν)/dν]|_{ν=0} ν + [d² ln Φ(ν)/dν²]|_{ν=0} ν²/2 + ...

Determination of the cumulants...
- by simple identification of the two expansions
Independence, characteristic function and cumulants

Independence and characteristic function
pS1S2...SP(u1, u2, ..., uP) = pS1(u1) pS2(u2) ... pSP(uP)   (8)

Expansion of the SCF
Φ(ν) = Φ(ν1) Φ(ν2) ... Φ(νP)   (9)
ln Φ(ν) = ln Φ(ν1) + ... + ln Φ(νP)   (10)
Ψ(ν) = Ψ(ν1) + Ψ(ν2) + ... + Ψ(νP)   (11)

- the left-hand side contains cross terms
- the right-hand side contains only derivatives with respect to a single variable

Problem
- independence iff all the cross-cumulants are zero
- there are infinitely many cumulants
- Can one stop at a given order? How should it be chosen?
Kullback-Leibler divergence

Definition
KL(f‖g) = ∫...∫ f(u) log [f(u) / g(u)] du
- KL(f‖g) measures the gap between the two distributions f and g
- it is a nonnegative scalar which vanishes iff f = g (the proof is easy using Jensen's inequality)

KL divergence for measuring independence
KL(pY‖Π pYi) = ∫...∫ pY(u) log [pY(u) / Πi pYi(ui)] du
- KL(pY‖Π pYi) is a nonnegative scalar which vanishes iff the components of the random vector Y are mutually independent
- This measure, KL(pY‖Π pYi), can be identified with the mutual information (MI) I(Y) defined in information theory
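The KL-based independence measure can be approximated with a crude plug-in histogram estimator. The sketch below (bin count, sample size and test distributions are illustrative assumptions, and the estimator is biased in general) shows that it is near zero for independent variables and clearly positive for a mixture:

```python
import numpy as np

def mutual_information(x, y, bins=40):
    """Histogram plug-in estimate of I(X;Y) = KL(p_XY || p_X p_Y), in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)     # marginal of X
    py = pxy.sum(axis=0, keepdims=True)     # marginal of Y
    m = pxy > 0
    return float(np.sum(pxy[m] * np.log(pxy[m] / (px @ py)[m])))

rng = np.random.default_rng(0)
a = rng.standard_normal(200_000)
b = rng.standard_normal(200_000)
i_indep = mutual_information(a, b)          # near zero: independent
i_mixed = mutual_information(a, a + b)      # clearly positive: a mixture
```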
Decomposition of the MI using entropies

Definition
I(Y) = ∫...∫ pY(u) log [pY(u) / Πi pYi(ui)] du   (12)
     = Σi H(Yi) − H(Y)   (13)

with the following definitions of the marginal and joint entropies:
H(Yi) = −∫ pYi(ui) log pYi(ui) dui   (14)
H(Y) = −∫...∫ pY(u) log pY(u) du   (15)

Problem
- Estimating the joint and marginal entropies requires estimating the joint and marginal densities, respectively.
A trick in the linear case

In the determined case, with an invertible mixing matrix A, one has:

The trick
I(Y) = Σi H(Yi) − H(Y)   (16)
     = Σi H(Yi) − H(X) − E[log |det B|]   (17)

Consequence
minB [Σi H(Yi) − H(Y)]  ⇔  minB [Σi H(Yi) − log |det B|]

- The trick avoids estimating the joint entropy!
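The resulting contrast, Σi H(Yi) − log |det B|, can be evaluated numerically with a plug-in entropy estimator. In the sketch below (uniform sources, histogram entropies and a grid search over rotations are all illustrative assumptions), B is restricted to B = R(θ)W with W a fixed whitener, so log |det B| does not depend on θ and minimizing the contrast reduces to minimizing the sum of marginal entropies:

```python
import numpy as np

def entropy_hist(u, bins=60):
    """Plug-in histogram estimate of the differential entropy H(U), in nats."""
    p, edges = np.histogram(u, bins=bins, density=True)
    w = np.diff(edges)
    m = p > 0
    return float(-np.sum(p[m] * np.log(p[m]) * w[m]))

def rot(t):
    return np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])

rng = np.random.default_rng(3)
n = 100_000
s = rng.uniform(-1, 1, (2, n))              # independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])
x = A @ s

d, E = np.linalg.eigh(np.cov(x))
W = E @ np.diag(d ** -0.5) @ E.T            # whitener
z = W @ x

# B = R(theta) W: log|det B| is constant in theta, so the contrast reduces
# to the sum of the marginal entropies of y = R(theta) z
thetas = np.linspace(0, np.pi / 2, 181)
vals = [sum(entropy_hist(yi) for yi in rot(t) @ z) for t in thetas]
best = thetas[int(np.argmin(vals))]
G = rot(best) @ W @ A                       # global matrix, close to D P
```

At the minimizing angle the global matrix R(θ)WA is close to DP: minimizing the contrast separates the sources.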
Minimization of the MI and score function

In the linear case, one estimates a separating matrix B that minimizes I(Y).

Derivative of the MI with respect to B
Using (17), with Y = BX:
dI(Y)/dB = Σi dH(Yi)/dB − d log |det B| / dB
         = E[ϕY(Y) Xᵀ] − B⁻ᵀ
where ϕYi(yi) = −d log pYi(yi)/dyi is the score function of yi, and ϕY(Y) is the vector of the marginal scores ϕYi(Yi).

Minimization of the MI and cancellation of HOS
Writing dI(Y)/dB = 0:
E[ϕY(Y) Xᵀ] − B⁻ᵀ = 0
and, right-multiplying by Bᵀ:
dI(Y)/dB = 0  ⇔  E[ϕY(Y) Yᵀ] − I = 0
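The stationarity condition can be sanity-checked on sources that are already independent, using known score functions (with the convention ϕ(y) = −d log p(y)/dy, a separating point satisfies E[ϕ(Y)Yᵀ] = I); the Gaussian/Laplace pair below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
y1 = rng.standard_normal(n)                 # Gaussian: phi(y) = y
y2 = rng.laplace(0.0, 1 / np.sqrt(2), n)    # unit-variance Laplace: phi(y) = sqrt(2) sign(y)
Y = np.vstack([y1, y2])
phi = np.vstack([y1, np.sqrt(2) * np.sign(y2)])

M = (phi @ Y.T) / n                         # sample estimate of E[phi(Y) Y^T]
```

M is close to the identity: the diagonal entries equal 1 for any density (E[ϕ(Yi)Yi] = 1 by integration by parts), and the off-diagonal entries vanish because the components are independent.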
Exploiting priors (1/4)

Why?
- Better-suited criteria
- Simpler algorithms
- Relaxed assumptions
- Selection of the separated sources

Colored and nonstationary sources
- Exploits the temporal dependences (non-iid sources)
- See the chapter on second-order methods

Nonnegative sources
- Exploits the nonnegativity of the sources and/or of the mixtures
- See the chapter devoted to nonnegative mixtures
Exploiting priors (2/4)

Discrete sources
- If D is the number of states, the (noisy) mixtures are sets of points (clusters)
- In total, D^P points (clusters)
- Linear mixture of binary sources: points on the vertices of a parallelepiped
s1 = 0 ⇒ x = a2 s2
s2 = 0 ⇒ x = a1 s1
where ai = column i of A
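A quick numerical illustration of the cluster structure (the mixing matrix, noise level and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
s = rng.integers(0, 2, size=(2, n)).astype(float)   # two binary sources, D = 2
A = np.array([[1.0, 0.7], [0.2, 1.0]])
x = A @ s + 0.01 * rng.standard_normal((2, n))      # slightly noisy mixtures

# The observations gather around the D**P = 4 vertices {0, a1, a2, a1 + a2}
vertices = np.stack([np.zeros(2), A[:, 0], A[:, 1], A[:, 0] + A[:, 1]])
dist = np.linalg.norm(x.T[:, None, :] - vertices[None, :, :], axis=2)
label = dist.argmin(axis=1)                         # cluster assignment
```

The vertices directly reveal the columns of A, which is why clustering identifies the mixture in this case.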
Exploiting priors (3/4)

Bounded-support sources
- Sources whose pdf has bounded support, with nonzero values at the bounds
- In the linear mixture case: the observations are inscribed in a parallelepiped
s1 = 0 ⇒ x = a2 s2
s2 = 0 ⇒ x = a1 s1
where ai = column i of A
Exploiting priors (4/4)

Sparse sources
- Sources whose samples are almost always zero
- Sparsity in a suitable representation space
- Linear mixture case: x = As ⇒ T(x) = T(As) = A T(s)
- See the courses by Stark and Daudet
s1 = 0 ⇒ x = a2 s2
s2 = 0 ⇒ x = a1 s1
where ai = column i of A
Separation of noisy mixtures

Linear mixture case
- The observations are x(t) = As(t) + n(t), where n is an additive noise, independent of s.
- The presence of noise generally induces an error in the estimate of A or of its inverse.
- If B is assumed to be perfectly estimated, i.e. B = A⁻¹, then:
y(t) = Bx(t) = s(t) + Bn(t)
- Ideal separation obviously yields the best SIR, but not necessarily the best SNR
- For details, see Chapters 1 (1.4) and 4 (4.7) in [CJ10]
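The SIR/SNR trade-off is easy to exhibit with an ill-conditioned mixture: a perfect separator cancels the interference entirely but amplifies the sensor noise through B. A minimal sketch (sources, noise level and mixing matrix are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
s = np.sign(rng.standard_normal((2, n)))        # unit-power sources
noise = 0.05 * rng.standard_normal((2, n))      # additive sensor noise
A = np.array([[1.0, 0.95], [0.9, 1.0]])         # nearly collinear columns
x = A @ s + noise

B = np.linalg.inv(A)                            # perfectly estimated separator
y = B @ x                                       # = s + B n: zero residual interference
noise_out = y - s                               # equals B @ noise
gain = np.mean(noise_out ** 2) / np.mean(noise ** 2)  # output/input noise power
```

Here gain is roughly ‖B‖F²/2 (several tens with this A): the SIR is ideal, but the output SNR is far worse than the input SNR.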
ICA: supervised or not?

ICA: a learning method
- Estimation from empirical data
- Learning in the "machine learning" sense
- Supervised or not?

Independence criterion
- The independence measure relies on the data alone
- Unsupervised... blind, as opposed to "supervised", which requires external information (supervisor, desired values)
II. Tools and optimization criteria
Contents of course II

4 Algebraic tools
- Spatial whitening
- Tensors

5 Statistical tools
- Independence
- Cumulants
- Mutual Information

6 Optimization Criteria
- Cumulant matching
- Contrasts
Spatial whitening (1)

Standardization via Cholesky or QR
Let x be a zero-mean r.v. with covariance matrix:
Γx := E{x xᴴ}
Then Cholesky yields:
∃ L / L Lᴴ = Γx
Consequence: L⁻¹x has identity covariance.
The variable x̃ := L⁻¹x is a standardized random variable.
- QR factorization of the data matrix as X = L X̃ yields the same L as the Cholesky factorization of the sample covariance, but is more accurate.
- Limitation: L may not be invertible if the covariance Γx is not full rank.
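A minimal sketch of Cholesky standardization on synthetic data (the true covariance and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_cov = np.array([[4.0, 1.0], [1.0, 2.0]])
x = np.linalg.cholesky(true_cov) @ rng.standard_normal((2, n))  # zero-mean data

Gamma = (x @ x.T) / n               # sample covariance Gamma_x = E{x x^H}
L = np.linalg.cholesky(Gamma)       # L L^H = Gamma_x
x_tilde = np.linalg.solve(L, x)     # standardized variable L^{-1} x
```

By construction, the sample covariance of x̃ is exactly the identity matrix.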
Spatial whitening (2)

Standardization via PCA

Definition
- PCA is based on second-order statistics
- Observed random variable x of dimension K. Then ∃ (U, z): x = Uz, U unitary, where the principal components zi are uncorrelated
- The ith column ui of U is called the ith PC loading vector

Two possible calculations:
- EVD of the covariance Rx: Rx = U Σ² Uᴴ
- Sample estimate by SVD: X = U Σ Vᴴ
Algebraic tools Statistical tools Criteria Spatial whitening Tensors<br />
Spatial whitening (3)<br />
Summary: find a linear transform L such that the vector x̃ def= L x has unit covariance. Many possibilities, including:<br />
PCA yields x̃ = Σ^{−1} U^H x<br />
Cholesky Rx = L L^H yields x̃ = L^{−1} x<br />
Remarks:<br />
Infinitely many possibilities: L is as good as L Q, for any unitary Q.<br />
If Rx is not invertible, then L is not invertible (ill-posed). One may use the pseudo-inverse of Σ in PCA to compute L, or regularize Rx.<br />
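The two standardization routes above can be sketched in a few lines of NumPy; this is a minimal illustration (the mixing matrix `A` and all variable names are our own example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated zero-mean data: K x N sample matrix (assumed synthetic setup).
K, N = 3, 100_000
A = rng.normal(size=(K, K))
x = A @ rng.normal(size=(K, N))

# Route 1: Cholesky of the sample covariance Rx = L L^H, then x_tilde = L^{-1} x.
Rx = (x @ x.T) / N
L = np.linalg.cholesky(Rx)
x_tilde_chol = np.linalg.solve(L, x)

# Route 2: PCA/EVD route, x_tilde = Sigma^{-1} U^H x.
eigval, U = np.linalg.eigh(Rx)
x_tilde_pca = np.diag(eigval ** -0.5) @ U.T @ x

# Both routes give unit sample covariance (up to numerical error).
for xt in (x_tilde_chol, x_tilde_pca):
    cov = (xt @ xt.T) / N
    assert np.allclose(cov, np.eye(K), atol=1e-8)
```

The two outputs differ by a unitary factor, consistent with the remark that L Q is as good as L for any unitary Q.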
Bibliography<br />
Antenna array processing: [BK83] [Sch86]<br />
Implementations: [CJ10], LAPACK, LINPACK<br />
Adaptive versions: via QR [Com89a], via SVD [CG90], or via stochastic relative gradient [CL96]<br />
Arrays and Multi-linearity<br />
A tensor of order d is a multi-linear map:<br />
S₁* ⊗ … ⊗ Sm* → S_{m+1} ⊗ … ⊗ S_d<br />
Once bases of the spaces Sℓ are fixed, tensors can be represented by d-way arrays of coordinates:<br />
➽ a bilinear form, or a linear operator, is represented by a matrix<br />
➽ a trilinear form, or a bilinear operator, by a 3rd-order tensor.<br />
Arrays and Tensors<br />
Definitions. Table T = {Tij..k}:<br />
Order of T def= # of its ways = # of its indices<br />
Dimension Kℓ def= range of the ℓth index<br />
T is cubic when all dimensions Kℓ = K are equal<br />
T is symmetric when it is square and when its entries do not change under any permutation of indices<br />
Compact notation<br />
Multi-linearity. Linear change in contravariant spaces:<br />
T′_{ijk} = Σ_{npq} A_{in} B_{jp} C_{kq} T_{npq}<br />
Denoted compactly:<br />
T′ = (A, B, C) · T   (18)<br />
Example: covariance matrix<br />
z = A x ⇒ Rz = (A, A) · Rx = A Rx A^T<br />
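The multilinear transform (A, B, C) · T is exactly an `einsum` contraction; a short sketch (our own example tensors, not the slides'):

```python
import numpy as np

rng = np.random.default_rng(1)

# A 3rd-order tensor T and three matrices acting on its modes.
T = rng.normal(size=(2, 3, 4))
A = rng.normal(size=(5, 2))
B = rng.normal(size=(6, 3))
C = rng.normal(size=(7, 4))

# T' = (A, B, C) . T, i.e. T'_ijk = sum_npq A_in B_jp C_kq T_npq.
Tp = np.einsum('in,jp,kq,npq->ijk', A, B, C, T)

# Matrix special case from the slide: Rz = (A, A) . Rx = A Rx A^T.
Rx = rng.normal(size=(2, 2))
Rz = np.einsum('in,jp,np->ij', A, A, Rx)
assert np.allclose(Rz, A @ Rx @ A.T)
```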
Decomposable tensor<br />
A dth-order “decomposable” tensor is the tensor product of d vectors:<br />
T = u ⊗ v ⊗ … ⊗ w<br />
and has coordinates Tij...k = ui vj … wk.<br />
It may be seen as a discretization of a multivariate function whose variables separate:<br />
t(x, y, …, z) = u(x) v(y) … w(z)<br />
These are nothing else but rank-1 tensors, with the forthcoming definition.<br />
Example<br />
Take v = [1, −1]^T. Then<br />
v^{⊗3} def= v ⊗ v ⊗ v = [ 1 −1 ; −1 1 | −1 1 ; 1 −1 ]<br />
This is a “decomposable” symmetric tensor → rank 1.<br />
(figure: the 2 × 2 × 2 cube of entries; blue bullets = 1, red bullets = −1.)<br />
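The cube above is easy to reproduce with an outer product (a small sketch of the slide's example):

```python
import numpy as np

# The 2 x 2 x 2 decomposable tensor v ⊗ v ⊗ v from the slide.
v = np.array([1.0, -1.0])
T = np.einsum('i,j,k->ijk', v, v, v)

# Entries are T_ijk = v_i v_j v_k: frontal slices [[1,-1],[-1,1]] and its negative.
assert np.array_equal(T[0], np.array([[1.0, -1.0], [-1.0, 1.0]]))
assert np.array_equal(T[1], -T[0])
```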
From SVD to tensor decompositions<br />
The matrix SVD, M = (U, V) · Σ, may be extended in at least two ways to tensors:<br />
Keep orthogonality: orthogonal Tucker, HOSVD<br />
T = (U, V, W) · C<br />
where C is R1 × R2 × R3: multilinear rank = (R1, R2, R3)<br />
Keep diagonality: Canonical Polyadic decomposition (CP)<br />
T = (A, B, C) · L   (19)<br />
where L is R × R × R diagonal, λi ≠ 0: rank = R.<br />
Canonical Polyadic (CP) decomposition<br />
Any I × J × · · · × K tensor T can be decomposed as<br />
T = Σ_{q=1}^{R(T)} λq u(q) ⊗ v(q) ⊗ … ⊗ w(q)   (20)<br />
➽ “Polyadic form” [Hitchcock’27]<br />
The tensor rank of T is the minimal number R(T) of “decomposable” terms such that equality holds.<br />
One may impose unit-norm vectors u(q), v(q), … w(q).<br />
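A rank-R tensor of the form (20) can be built from factor matrices with unit-norm columns; a minimal sketch (dimensions and weights are our own example), checking that an unfolding then has matrix rank at most R:

```python
import numpy as np

rng = np.random.default_rng(2)

# T = sum_q lambda_q u(q) ⊗ v(q) ⊗ w(q), with R unit-norm factor columns.
R, I, J, K = 2, 4, 5, 6
U = rng.normal(size=(I, R)); U /= np.linalg.norm(U, axis=0)
V = rng.normal(size=(J, R)); V /= np.linalg.norm(V, axis=0)
W = rng.normal(size=(K, R)); W /= np.linalg.norm(W, axis=0)
lam = np.array([3.0, 1.5])
T = np.einsum('q,iq,jq,kq->ijk', lam, U, V, W)

# A rank-R tensor has mode-1 unfolding of matrix rank <= R (equal, generically).
unfold1 = T.reshape(I, J * K)
assert np.linalg.matrix_rank(unfold1) == R
```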
Frank Lauren Hitchcock (1875-1957) [Courtesy of L-H. Lim]<br />
Claude Elwood Shannon (1916-2001)<br />
Towards a unique terminology<br />
Minimal Polyadic Form [Hitchcock’27]<br />
Canonical decomposition [Weinstein’84, Carroll’70, Chiantini-Ciliberto’06, Comon’00, Khoromskij, Tyrtyshnikov]<br />
Parafac [Harshman’70, Sidiropoulos’00]<br />
Optimal computation [Strassen’83]<br />
Minimum-length additive decomposition (AD) [Iarrobino’96]<br />
Suggestion: Canonical Polyadic decomposition (CP) [Comon’08, Grasedyk, Espig...]<br />
CP also already stands for Candecomp/Parafac [Bro’97, Kiers’98, tenBerge’04...]<br />
Richard A. Harshman (1970), Psychometrics<br />
J. Douglas Carroll (1970)<br />
Uniqueness: Kruskal 1/2<br />
The Kruskal rank of a matrix A is the maximum number kA such that any subset of kA columns is linearly independent.<br />
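The definition translates directly into a (brute-force) computation; a small sketch, with our own example matrix:

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A: np.ndarray) -> int:
    """Largest k such that EVERY subset of k columns is linearly independent."""
    n = A.shape[1]
    k = 0
    for size in range(1, n + 1):
        if all(np.linalg.matrix_rank(A[:, list(cols)]) == size
               for cols in combinations(range(n), size)):
            k = size
        else:
            break
    return k

# Three columns, two of them collinear: matrix rank 2 but Kruskal rank only 1.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
assert np.linalg.matrix_rank(A) == 2
assert kruskal_rank(A) == 1
```

This makes explicit that kA ≤ rank(A): one bad pair of columns already caps the Kruskal rank.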
Uniqueness: Kruskal 2/2<br />
Sufficient condition for uniqueness of the CP [Kruskal’77, Sidiropoulos-Bro’00, Landsberg’09]:<br />
Essential uniqueness is ensured if the tensor rank R is below the so-called Kruskal bound:<br />
2R + 2 ≤ kA + kB + kC   (21)<br />
or generically, for a tensor of order d and dimensions Nℓ:<br />
2R ≤ Σ_{ℓ=1}^{d} min(Nℓ, R) − d + 1<br />
➽ Bound much smaller than the expected rank: ∃ a much better bound, in an almost-sure sense.<br />
Rank-3 example 1/2<br />
Example (figure): a 2 × 2 × 2 tensor T drawn as a cube, written as a sum of three decomposable cubes, T = (cube) + (cube) + 2 (cube); blue bullets = 1, red bullets = −1.<br />
Rank-3 example 2/2<br />
Conclusion: the 2 × 2 × 2 tensor with slices<br />
T = 2 · [ 0 1 ; 1 0 | 1 0 ; 0 0 ]<br />
admits the CP<br />
T = [1; 1]^{⊗3} + [−1; 1]^{⊗3} + 2 [0; −1]^{⊗3}<br />
and has rank 3, hence larger than the dimension.<br />
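The stated CP of this 2 × 2 × 2 tensor can be verified entry by entry in a few lines:

```python
import numpy as np

def cube(v):
    # v ⊗ v ⊗ v for a length-2 vector v.
    return np.einsum('i,j,k->ijk', v, v, v)

# The slide's tensor: T_112 = T_121 = T_211 = 2, all other entries 0.
T = np.zeros((2, 2, 2))
T[0, 0, 1] = T[0, 1, 0] = T[1, 0, 0] = 2.0

# Its CP: T = [1,1]^⊗3 + [-1,1]^⊗3 + 2 [0,-1]^⊗3, a sum of 3 rank-1 terms.
T_cp = (cube(np.array([1.0, 1.0]))
        + cube(np.array([-1.0, 1.0]))
        + 2.0 * cube(np.array([0.0, -1.0])))
assert np.array_equal(T, T_cp)
```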
Why need for approximation?<br />
Additive noise in measurements<br />
Noise has a continuous probability distribution<br />
Then the tensor rank is generic<br />
Hence there are often infinitely many CP decompositions<br />
➽ Approximations aim at getting rid of noise, and at restoring uniqueness:<br />
Arg inf_{a(p), b(p), c(p)} || T − Σ_{p=1}^{r} a(p) ⊗ b(p) ⊗ … ⊗ c(p) ||²   (22)<br />
But the infimum may be reached for tensors of rank > r!<br />
Border rank<br />
T has border rank R iff it is the limit of tensors of rank R, and not the limit of tensors of lower rank.<br />
[Bini’79, Schönhage’81, Strassen’83, Lickteig’85]<br />
(figure: a point x of rank > R lying in the closure of the set of tensors of rank ≤ R.)<br />
Example<br />
Let u and v be non-collinear vectors. Define T0 [CGLM08]:<br />
T0 = u ⊗ u ⊗ u ⊗ v + u ⊗ u ⊗ v ⊗ u + u ⊗ v ⊗ u ⊗ u + v ⊗ u ⊗ u ⊗ u<br />
And define the sequence Tε = (1/ε) [ (u + ε v)^{⊗4} − u^{⊗4} ].<br />
Then Tε → T0 as ε → 0.<br />
➽ Hence rank{T0} = 4, but the border rank is 2.<br />
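The convergence Tε → T0 is easy to observe numerically; a sketch with a concrete choice u = e1, v = e2 (our own instantiation of the slide's u, v):

```python
import numpy as np

def outer4(a):
    return np.einsum('i,j,k,l->ijkl', a, a, a, a)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])  # non-collinear with u

# T0: the four placements of a single v among three u's.
T0 = (np.einsum('i,j,k,l->ijkl', u, u, u, v)
      + np.einsum('i,j,k,l->ijkl', u, u, v, u)
      + np.einsum('i,j,k,l->ijkl', u, v, u, u)
      + np.einsum('i,j,k,l->ijkl', v, u, u, u))

# T_eps = (1/eps) [(u + eps v)^⊗4 - u^⊗4] is rank 2 for every eps > 0,
# yet it converges to the rank-4 tensor T0 at rate O(eps).
for eps in (1e-1, 1e-3, 1e-5):
    T_eps = (outer4(u + eps * v) - outer4(u)) / eps
    assert np.linalg.norm(T_eps - T0) < 5 * eps
```

Note the two rank-1 terms of Tε blow up like 1/ε while nearly cancelling, which is exactly the ill-posedness mechanism described on the next slide.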
Ill-posedness<br />
Tensors whose border rank is smaller than their rank are such that the approximating sequence contains several decomposable tensors which<br />
tend to infinity and<br />
cancel each other, viz, some columns become close to collinear.<br />
Ideas towards a well-posed problem: prevent collinearity or bound the columns.<br />
Remedies<br />
1 Impose orthogonality of columns within factor matrices [Com92a]<br />
2 Impose orthogonality between decomposable tensors [Kol01]<br />
3 Prevent the tendency to infinity by a norm constraint on factor matrices [Paa00]<br />
4 Nonnegative tensors: impose decomposable tensors to be nonnegative [LC09] → “nonnegative rank”<br />
5 Impose a minimal angle between columns of factor matrices [CL11]<br />
Tensors as linear operators<br />
Warning: considering a tensor as a linear operator is restrictive (e.g. the rank is bounded by the smallest dimension)<br />
Representation with a matrix: flattening or unfolding matrix<br />
Resorts to special matrix products: the Kronecker ⊗ and Khatri-Rao ⊙ matrix products.<br />
NB: do not confuse the tensor product ⊗ and the Kronecker product ⊗...<br />
Matrix products (1/2)<br />
The Kronecker product between two matrices A and B, of dimensions m × n and p × q respectively, is a matrix of size mp × nq defined as:<br />
A ⊗ B def= [ A11 B A12 B · · · ; A21 B A22 B · · · ; ⋮ ⋮ ]<br />
The Khatri-Rao product between two matrices with the same number of columns:<br />
A ⊙ B def= [ a1 ⊗ b1 a2 ⊗ b2 · · · ].<br />
☞ This is a column-wise Kronecker product.<br />
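Both products are one-liners in NumPy; a small sketch with our own example matrices (NumPy ships `np.kron`, while the Khatri-Rao product is written out explicitly here):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # m x n = 2 x 2
B = np.array([[0.0, 1.0, 2.0],
              [3.0, 4.0, 5.0]])     # p x q = 2 x 3

# Kronecker product: block matrix [A11 B, A12 B; A21 B, A22 B], size mp x nq.
Kr = np.kron(A, B)
assert Kr.shape == (4, 6)
assert np.array_equal(Kr[:2, :3], 1.0 * B)   # top-left block is A11 * B

# Khatri-Rao product: column-wise Kronecker (same number of columns).
def khatri_rao(A, B):
    assert A.shape[1] == B.shape[1]
    return np.column_stack([np.kron(A[:, j], B[:, j]) for j in range(A.shape[1])])

KR = khatri_rao(A, B[:, :2])
assert KR.shape == (4, 2)
assert np.array_equal(KR[:, 0], np.kron(A[:, 0], B[:, 0]))
```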
Matrix products (2/2)<br />
Example: let a ∈ S, a space with basis {e(i)} of dim. I; b ∈ S′, with basis {e′(j)} of dim. J; c ∈ S″, with basis {e″(k)} of dim. K.<br />
Then the tensor a ⊗ b ⊗ c has coordinates given by the (Kronecker) vector a ⊗ b ⊗ c in the basis {e(i) ⊗ e′(j) ⊗ e″(k)}.<br />
Decorrelation vs Independence<br />
Example 1: mixture of 2 iid sources<br />
Consider the mixture of two independent sources:<br />
[ x1 ; x2 ] = [ 1 1 ; 1 −1 ] · [ s1 ; s2 ]<br />
where E{si²} = 1 and E{si} = 0. Then the xi are uncorrelated:<br />
E{x1 x2} = E{s1²} − E{s2²} = 0<br />
But the xi are not independent since, for instance:<br />
E{x1² x2²} − E{x1²} E{x2²} = E{s1⁴} + E{s2⁴} − 6 ≠ 0<br />
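A quick simulation with uniform sources (E{s⁴} = 9/5, so the fourth-order gap is 9/5 + 9/5 − 6 = −2.4) makes the point concrete; this is our own illustration of the example above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two iid zero-mean, unit-variance uniform sources.
N = 200_000
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), N)
x1, x2 = s1 + s2, s1 - s2

# Uncorrelated: E{x1 x2} = E{s1^2} - E{s2^2} = 0 ...
assert abs(np.mean(x1 * x2)) < 0.02

# ... but not independent: E{x1^2 x2^2} - E{x1^2} E{x2^2} ≈ -2.4 here.
gap = np.mean(x1**2 * x2**2) - np.mean(x1**2) * np.mean(x2**2)
assert gap < -2.0
```

With Gaussian sources (E{s⁴} = 3) the gap would vanish, consistent with Darmois's theorem below.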
PCA vs ICA<br />
Example 2: 2 sources and 2 sensors (figure)<br />
Statistical independence<br />
Definition: the components sk of a K-dimensional r.v. s are mutually independent<br />
⇕<br />
the joint pdf equals the product of the marginal pdfs:<br />
ps(u) = Π_k p_{sk}(uk)   (23)<br />
Definition: the components sk of s are pairwise independent ⇔ any pair of components (sk, sℓ) is mutually independent.<br />
Mutual and Pairwise independence<br />
Darmois’s Theorem (1953)<br />
Let two random variables be defined as linear combinations of independent random variables xi:<br />
X1 = Σ_{i=1}^{N} ai xi,   X2 = Σ_{i=1}^{N} bi xi<br />
Then, if X1 and X2 are independent, those xj for which aj bj ≠ 0 are Gaussian.<br />
Mutual and Pairwise independence (cont’d)<br />
Corollary [Com92a]<br />
If z = C s, where the si are independent r.v. with at most one of them Gaussian, then the following properties are equivalent:<br />
1 Components zi are pairwise independent<br />
2 Components zi are mutually independent<br />
3 C = Λ P, with Λ diagonal and P a permutation<br />
Characteristic functions<br />
First c.f.:<br />
Real scalar: Φx(t) def= E{e^{jtx}} = ∫_u e^{jtu} dFx(u)<br />
Real multivariate: Φx(t) def= E{e^{j t^T x}} = ∫_u e^{j t^T u} dFx(u)<br />
Second c.f.:<br />
Ψ(t) def= log Φ(t)<br />
Properties:<br />
Always exists in the neighborhood of 0<br />
Uniquely defined as long as Φ(t) ≠ 0<br />
Definition of Multivariate cumulants<br />
Moments:<br />
μ′_{ij..k} def= E{xi xj .. xk} = (−j)^r ∂^r Φ(t) / ∂ti ∂tj .. ∂tk |_{t=0}   (24)<br />
Cumulants:<br />
C_{ij..k} def= Cum{xi, xj, .., xk} = (−j)^r ∂^r Ψ(t) / ∂ti ∂tj .. ∂tk |_{t=0}   (25)<br />
where r is the number of indices (differentiation r times).<br />
NB: the Leonov-Shiryayev relationship between moments and cumulants is obtained by expanding both sides of log Φx(t) = Ψx(t) in Taylor series.<br />
Definition of Multivariate cumulants (cont’d)<br />
First cumulants:<br />
μ′_i = Ci<br />
μ′_{ij} = Cij + Ci Cj<br />
μ′_{ijk} = Cijk + [3] Ci Cjk + Ci Cj Ck<br />
with [n]: McCullagh’s bracket notation.<br />
Next, for zero-mean variables:<br />
μ_{ijkℓ} = Cijkℓ + [3] Cij Ckℓ<br />
μ_{ijkℓm} = Cijkℓm + [10] Cij Ckℓm<br />
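In the scalar zero-mean case the fourth relation reduces to μ4 = C4 + 3 C2², i.e. C4 = μ4 − 3 μ2² (the unnormalized kurtosis). A quick empirical check, with our own sample sizes:

```python
import numpy as np

rng = np.random.default_rng(4)

# For a zero-mean scalar x: mu'_2 = C2, mu'_4 = C4 + 3 C2^2 (the [3] bracket).
# On Gaussian data every cumulant beyond order 2 vanishes, so C4 ≈ 0.
x = rng.normal(0.0, 2.0, 1_000_000)
m2 = np.mean(x**2)
m4 = np.mean(x**4)
C4 = m4 - 3 * m2**2          # 4th cumulant (unnormalized kurtosis)
assert abs(C4) / m2**2 < 0.05

# On uniform data the 4th cumulant is strictly negative (sub-Gaussian).
u = rng.uniform(-1.0, 1.0, 1_000_000)
C4u = np.mean(u**4) - 3 * np.mean(u**2)**2
assert C4u < 0
```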
Definition of Complex Cumulants<br />
Definition: let z = x + j y. Then the pdf pz = the joint pdf px,y.<br />
Notation. Characteristic function:<br />
Φz(w) = E{exp[j(x^T u + y^T v)]} = E{exp[j ℜ(z^H w)]}<br />
where w def= u + j v.<br />
Generates moments & cumulants, e.g.:<br />
Variance: Var{z}ij = C_i^j<br />
Higher orders: Cum{zi, .., zj, z*_k, .., z*_ℓ} = C_{ij}^{kℓ}<br />
where conjugated r.v. are labeled in superscript.<br />
Properties of Cumulants<br />
Multi-linearity (also enjoyed by moments):<br />
Cum{αX, Y, .., Z} = α Cum{X, Y, .., Z}   (26)<br />
Cum{X1 + X2, Y, .., Z} = Cum{X1, Y, .., Z} + Cum{X2, Y, .., Z}<br />
Cancellation: if {Xi} can be partitioned into 2 groups of independent r.v., then<br />
Cum{X1, X2, .., Xr} = 0   (27)<br />
Additivity: if X and Y are independent, then<br />
Cum{X1 + Y1, X2 + Y2, .., Xr + Yr} = Cum{X1, X2, .., Xr} + Cum{Y1, Y2, .., Yr}<br />
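The additivity property can be observed on sample estimates of the fourth cumulant; a small simulation of our own (uniform[−a, a] has Cum4 = −(2/15) a⁴):

```python
import numpy as np

rng = np.random.default_rng(5)

def kappa4(x):
    # 4th-order cumulant of a zero-mean sample: Cum{x,x,x,x} = E{x^4} - 3 E{x^2}^2.
    return np.mean(x**4) - 3 * np.mean(x**2)**2

N = 2_000_000
x = rng.uniform(-1.0, 1.0, N)   # Cum4 = -2/15
y = rng.uniform(-2.0, 2.0, N)   # Cum4 = -32/15 (scales as a^4, by multi-linearity)

# Additivity: for independent x and y, Cum4{x + y} = Cum4{x} + Cum4{y}.
assert np.isclose(kappa4(x + y), kappa4(x) + kappa4(y), atol=0.05)
```

This additivity is precisely why cumulants (and not moments) are the natural tool for separating sums of independent sources.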
Mutual Information: definition<br />
According to the definition of page 95, one should measure a divergence:<br />
δ( px, Π_{i=1}^{N} p_{xi} )<br />
If the Kullback divergence is used:<br />
K(px, py) def= ∫ px(u) log( px(u) / py(u) ) du,<br />
then we get the Mutual Information as an independence measure:<br />
I(px) = ∫ px(u) log( px(u) / Π_{i=1}^{N} p_{xi}(ui) ) du.   (28)<br />
Mutual Information: properties<br />
MI is always positive<br />
Cancels iff the r.v. are mutually independent<br />
MI is invariant under scale change<br />
Example 3: Gaussian case<br />
I(gx) = (1/2) log( Π_i Vii / det V )<br />
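The closed-form Gaussian MI above is one line of NumPy; a minimal sketch (example covariances are ours):

```python
import numpy as np

# MI of a Gaussian vector from its covariance V: I(g_x) = 0.5 log(prod_i V_ii / det V).
def gaussian_mi(V):
    return 0.5 * np.log(np.prod(np.diag(V)) / np.linalg.det(V))

# Diagonal covariance: components independent, so MI = 0.
V_indep = np.diag([1.0, 4.0, 9.0])
assert abs(gaussian_mi(V_indep)) < 1e-12

# Correlated pair with correlation rho: I = -0.5 log(1 - rho^2) > 0.
V_corr = np.array([[1.0, 0.8],
                   [0.8, 1.0]])
assert np.isclose(gaussian_mi(V_corr), -0.5 * np.log(1 - 0.8**2))
```

Scale invariance is visible in the formula: replacing V by D V D for diagonal D scales numerator and denominator identically.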
Mutual Information: decomposition<br />
Define the Negentropy as the divergence:<br />
J(px) = K(px, gx) = ∫ px(u) log( px(u) / gx(u) ) du.   (29)<br />
Negentropy is invariant under invertible transforms.<br />
Then MI can be decomposed into:<br />
I(px) = I(gx) + J(px) − Σ_i J(p_{xi}).   (30)<br />
(figure: diagram relating I(p) to I(g), J(p) and Σ_i J(pi).)<br />
Edgeworth expansion of a pdf (2)<br />
Francis Edgeworth (1845-1926).<br />
px(v) / gx(v) = 1 + (1/3!) κ3 h3(v) + (1/4!) κ4 h4(v) + (10/6!) κ3² h6(v) + (1/5!) κ5 h5(v) + (35/7!) κ3 κ4 h7(v) + (280/9!) κ3³ h9(v) + …<br />
Edgeworth expansion of the MI<br />
For standardized random variables x, the approximation [Com92a]:<br />
I(px) = J(px) − (1/48) Σ_i [ 4 C_{iii}² + C_{iiii}² + 7 C_{iii}⁴ − 6 C_{iii}² C_{iiii} ] + o(m^{−2}).   (31)<br />
If 3rd order ≠ 0, then I(px) ≈ J(px) − (1/12) Σ_i C_{iii}²<br />
If 3rd order ≈ 0, then I(px) ≈ J(px) − (1/48) Σ_i C_{iiii}²<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS II.Tools and optimization criteria 81
Optimization Criteria

Cumulant matching
Contrast criteria:
  Mutual Information
  Likelihood
  CoM family (after prewhitening)
  STO, JAD (after prewhitening)
  Entropy (after NL transform)
Identification by Cumulant matching

Principle:
  Estimate the mixture by solving the I/O multilinear equations
  Apply a separating filter based on the latter estimate

  s → [A] → x
Noiseless mixture of 2 sources

Example 4: 2 × 2 by Cumulant matching (cf. demo PCA-ICA)

After standardization, the mixture takes the form

  x = [ cos α           −sin α e^{jϕ} ] s        (32)
      [ sin α e^{−jϕ}    cos α        ]

Denote γ^{kℓ}_{ij} = Cum{x_i, x_j, x*_k, x*_ℓ} and κ_i = Cum{s_i, s_i, s*_i, s*_i}.
Then by multilinearity:

  γ^{12}_{12} = cos²α sin²α (κ1 + κ2)
  γ^{12}_{11} = cos³α sin α e^{jϕ} κ1 − cos α sin³α e^{jϕ} κ2
  γ^{22}_{12} = cos α sin³α e^{jϕ} κ1 − cos³α sin α e^{jϕ} κ2

Compact solution:

  (γ^{22}_{12} − γ^{12}_{11}) / γ^{12}_{12} = −2 cot 2α e^{jϕ}
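The three cumulant relations and the compact solution can be checked numerically; a small sketch with arbitrary (hypothetical) values of α, ϕ and the source kurtoses κ1, κ2:

```python
import numpy as np

alpha, phi = 0.6, 0.8            # hypothetical mixing parameters
k1, k2 = -1.2, -0.9              # hypothetical source kurtoses (sub-Gaussian)
c, s, e = np.cos(alpha), np.sin(alpha), np.exp(1j * phi)

# Multilinearity formulas for the observation cumulants
g12_12 = c**2 * s**2 * (k1 + k2)
g12_11 = c**3 * s * e * k1 - c * s**3 * e * k2
g22_12 = c * s**3 * e * k1 - c**3 * s * e * k2

# Compact solution: the source kurtoses cancel out
lhs = (g22_12 - g12_11) / g12_12
assert abs(lhs - (-2 / np.tan(2 * alpha)) * e) < 1e-12
# phi can then be read off the phase of lhs, alpha from its modulus
```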
Now the inverse approach

Cumulant matching (direct approach: identification)
Contrast Criteria (inverse approach: equalization):

  s → [A] → x → [B] → z
Noisy Mixtures of 2 sources

Example 7: Separation of 2 non-Gaussian sources by contrast maximization
Source additional hypotheses

H1. Each source s_j[k] is an i.i.d. sequence, for any fixed j
H2. Sources s_j are mutually statistically independent
H3. At most one source is Gaussian
H4. At most one source has a null marginal cumulant
H5. Sources are discrete and belong to some known alphabet (but may be statistically dependent)
H6. Sources s_j[k] are sufficiently exciting
H7. Sources are colored, and the set of source spectra forms a family of linearly independent functions
H8. Sources are nonstationary, and have different time profiles
Trivial Filters

They account for the inherent indeterminacies that remain after assuming the source additional hypotheses. For instance:

For dynamic (convolutive) mixtures, under H1, H2, H3: Ť[z] = P Ď[z], where P is a permutation and Ď[z] a diagonal filter with entries of the form Ď_pp[z] = λ_p z^{δ_p}, where δ_p is an integer.

For static mixtures, under H2, H3: T = PD, where P is a permutation and D a diagonal invertible matrix.

In other words, if s satisfies Hi, then so does Ts.
Contrast criteria: definition

Axiomatic definition: a contrast optimization criterion Υ should enjoy 3 properties:

Invariance: Υ should not change under the action of trivial filters (cf. definition slide 116)
Domination: if sources are already separated, any filter should decrease (or leave unchanged) Υ
Discrimination: the maximum achievable value should be reached only when sources are separated (i.e. all absolute maxima are related to each other by trivial filters)

To summarize:

  ∀Q, Υ(Q) ≤ Υ(I), with equality iff Q is trivial        (33)
Examples of contrasts (1/2)

Minimize Mutual Information I(z)
Maximize Noiseless Likelihood Υ_ML = −I(z) − Σ_i K(z_i, s_i), if the p_{s_i} are known
Maximize Entropy H(z), if the H(z_i) are kept constant

(diagram relating −Υ_ML, I(z), H(z), Σ_i H(z_i) and Σ_i K_i)
Examples of contrasts (2/2)

Contrasts based on tensor slices:

  Υ^(r)_CoM ≝ Σ_i |C_iiii|^r,   Υ_STO ≝ Σ_ij |C_iiij|²,   Υ_JAD ≝ Σ_ijk |C_iijk|²

➽ Υ^(2)_CoM ≤ Υ_STO ≤ Υ_JAD shows that Υ_CoM is more sensitive

Example for 4 × 4 × 4 tensors: matrix slices diagonalization = tensor diagonalization
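The ordering Υ^(2)_CoM ≤ Υ_STO ≤ Υ_JAD holds for any fourth-order tensor, since each sum ranges over a superset of the previous one's index patterns; a quick numerical check with a random stand-in tensor (not an actual cumulant tensor):

```python
import numpy as np

rng = np.random.default_rng(5)
C = rng.standard_normal((4, 4, 4, 4))   # stand-in for a 4th-order cumulant tensor
i = np.arange(4)

com2 = np.sum(C[i, i, i, i] ** 2)                                        # Σ_i C_iiii²
sto = np.sum(C[i[:, None], i[:, None], i[:, None], i] ** 2)              # Σ_ij C_iiij²
jad = np.sum(C[i[:, None, None], i[:, None, None], i[:, None], i] ** 2)  # Σ_ijk C_iijk²
assert com2 <= sto <= jad
```

Setting j = i in Υ_STO recovers the Υ_CoM terms, and setting j = i in Υ_JAD recovers the Υ_STO terms, which is exactly why the chain of inequalities holds.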
Optimization criteria: bibliographical notes

Expansion of the MI in terms of cumulants [Com92a] [Com94a]
Contrasts: CoM2 [Com92a] [Com94a], JADE [CS93], STO [LMV01]
Link between Likelihood and Infomax [Car99] [Car97]
Likelihood, Estimating equations: [CJ10, ch.4]
Pairwise sweeping in tensors: [Com89b] [Com92a] [CS93] [Com94a] [Com94b] [MvL08]
III. Second Order Methods

Outline of course III

7 Introduction
8 Colored sources
9 Nonstationary sources
10 Related approaches
11 Discussion
Independence or decorrelation?

Independence ⇒ decorrelation, but the converse is false
However, decorrelation is a first step towards independence, hence 2-step approaches with whitening or sphering

Algebraic point of view

For 2 linear mixtures of 2 sources (assumed zero-mean, iid):
  B has 4 unknown parameters,
  using SOS, one has 3 equations, related to E(Y_1²), E(Y_2²) and E(Y_1 Y_2)

For P linear mixtures of P sources (assumed zero-mean, iid):
  B has P² unknown parameters,
  using SOS, one has P + P(P − 1)/2 = P(P + 1)/2 equations, related to E(Y_1²), ..., E(Y_P²) and E(Y_1 Y_2), E(Y_1 Y_3), ..., E(Y_{P−1} Y_P)
Zeroing the correlation

Assumptions:
  2 linear mixtures of two sources
  a_jj = 1, and enforcing b_jj = 1
  zero-mean sources with variances σ_i²

Decorrelation equation: in the plane (b12, b21), the solution set of E[Y1 Y2] = 0 is a family of hyperbolas:

  b21 = − [ b12 (a21² + σ2²/σ1²) + a21 + a12 σ2²/σ1² ] / [ b12 (a21 + a12 σ2²/σ1²) + 1 + a12² σ2²/σ1² ]

Each hyperbola depends on the mixture coefficients and on the variance ratio.
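The separating point b12 = −a12, b21 = −a21 (the rows of A⁻¹ rescaled so that b_jj = 1) lies on the hyperbola for every variance ratio, which is why the hyperbolas intersect there; a sketch with hypothetical mixing coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
a12, a21 = 0.4, 0.7                       # hypothetical mixing coefficients
A = np.array([[1.0, a12], [a21, 1.0]])

b12, b21 = -a12, -a21                     # rows of A^{-1}, rescaled so that bjj = 1
B = np.array([[1.0, b12], [b21, 1.0]])

for var_ratio in (1.0, 2.0):              # (sigma2/sigma1)^2
    s = rng.standard_normal((2, 100000))
    s[1] *= np.sqrt(var_ratio)
    y = B @ (A @ s)
    corr = np.mean(y[0] * y[1])           # sample E[Y1 Y2]
    assert abs(corr) < 0.02               # decorrelated for every variance ratio
```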
Decorrelation hyperbolas

Zeroing the correlation: the hyperbolas intersect at points related to the mixing matrix coefficients.

  A = [ 1    0.4 ]
      [ 0.7  1   ]

(figure: (σ2/σ1)² = 1 in blue dots, (σ2/σ1)² = 2 in red +)
Gaussian non-iid sources

Second order separation
  Colored sources: AMUSE, Tong et al. [TSLH90]; SOBI, Belouchrani et al. [BAM93]
  Nonstationary sources: Matsuoka et al. [MOK95]; Pham and Cardoso [PC01]

Related approaches
  GEVD problem
  Common Spatial Pattern
  Periodic Component Analysis [TSS+09, SJS10]

Discussion
  Joint diagonalization
  Criteria and practical issues
  Conclusions
A 2-step approach

For real linear mixtures, B can be factorized into 2 matrices:
  a whitening (or sphering) matrix W
  an orthogonal matrix U

W is computed so that E[ZZ^T] = I, i.e.

  E[WAS(WAS)^T] = WA E[SS^T] (WA)^T = WA (WA)^T = I

This means that WA is an orthogonal matrix; consequently the remaining unknown U must be an orthogonal matrix too.

W is a symmetric matrix, and its estimation requires P(P + 1)/2 parameters
U is associated with P(P − 1)/2 plane (Givens) rotations
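A minimal sketch of the first step, assuming unit-variance iid sources and a random mixing matrix: the whitening matrix is computed as W = Rx^{−1/2} from the sample covariance, and WA comes out (close to) orthogonal, leaving only a rotation to identify:

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 3, 200000
S = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), (P, N))  # unit-variance iid sources
A = rng.standard_normal((P, P))                       # hypothetical mixing matrix
X = A @ S

# Whitening matrix W = Rx^{-1/2}, from the eigendecomposition of the sample covariance
d, E = np.linalg.eigh(X @ X.T / N)
W = E @ np.diag(d ** -0.5) @ E.T
Z = W @ X
assert np.allclose(Z @ Z.T / N, np.eye(P))            # Z is spatially white

# WA is (close to) orthogonal: the remaining unknown U is an orthogonal matrix
M = W @ A
assert np.allclose(M @ M.T, np.eye(P), atol=0.05)
```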
Decorrelation is not enough for iid Gaussian sources

After whitening:
  samples z(t) are spatially uncorrelated: E[ZZ^T] = I
  for Gaussian iid sources, after whitening, sources are statistically independent: there is no way to estimate U!

Using mutual information:
  Remember the estimating equations: E[ϕ_{y_i}(y_i) y_j] = 0, ∀i ≠ j, where ϕ_{y_i} is the score function
  for Gaussian (iid or not) sources with unit variance, ϕ_{y_i}(y_i) = y_i
  Consequently, for Gaussian iid variables, MI minimization is equivalent to decorrelation!

Short visual demo
Basic idea and early works

Idea:
  Compute a transform B which simultaneously decorrelates y_i(t) and y_j(t − τ), for various values of τ,
  i.e. simultaneously cancel E[y_i(t) y_j(t − τ)], ∀i ≠ j, or simultaneously diagonalize E[y(t) y^T(t − τ)], for at least two values of τ.

Early works:
  Féty, in his PhD (1988) [Fét88]
  AMUSE, proposed by Tong et al. in 1990 [TSLH90]
  SOBI, proposed by Belouchrani et al. in 1993 [BAM93]
  Molgedey and Schuster in 1994 [MS94]
Notations and assumptions

Model: linear instantaneous mixture, x(n) = A s(n), n = 1, ..., N

Assumptions:
  Sources s_p(.), p = 1, ..., P, are zero-mean, real, second order stationary processes; they are spatially decorrelated
  A is a real-valued, square and regular matrix

Notations:
  Covariance matrices of sources s: R_s(τ) = E[s(t) s^T(t − τ)]
  Due to source spatial decorrelation, the R_s(τ) are real diagonal matrices. We assume R_s(0) = I.
  Covariance matrices of observations x: R_x(τ) = E[x(t) x^T(t − τ)]
  Since R_x(τ) = A R_s(τ) A^T, they can be simultaneously diagonalized
Identifiability: first step

First step: diagonalization of R_x(0) = whitening of x

  B0 R_x(0) B0^T = I
  B0 (A R_s(0) A^T) B0^T = I
  (B0 A) R_s(0) (B0 A)^T = I

Since R_s(0) = I, we deduce that B0 A = U0^T is orthogonal
B0 provides a source estimate s0(.) = B0 x(.) = U0^T s(.)
Identifiability: second step

Second step: diagonalization of R_{s0}(1)

One computes the orthogonal matrix U1 which diagonalizes R_{s0}(1):

  R_{s0}(1) = U1^T D1 U1 = B0 R_x(1) B0^T = U0^T R_s(1) U0

It is clearly equivalent to diagonalizing R_x(1).
B1 = U1 B0 then jointly diagonalizes R_x(0) and R_x(1):

  B1 R_x(0) B1^T = I and B1 R_x(1) B1^T = D1

The new source estimate s1(.) = B1 x(.) = U1 U0^T s(.) is a source separation solution if R_s(1) has distinct diagonal entries
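The two steps above can be sketched in a few lines of NumPy: whiten, then diagonalize one symmetrized lagged covariance (this is the AMUSE idea; the AR(1) sources and mixing matrix below are hypothetical test values):

```python
import numpy as np

def two_step_sos(x, tau=1):
    """Whiten x (step 1), then diagonalize a lagged covariance (step 2)."""
    P, N = x.shape
    x = x - x.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(x @ x.T / N)           # step 1: diagonalize Rx(0)
    B0 = np.diag(d ** -0.5) @ E.T                # whitening matrix
    z = B0 @ x
    R1 = z[:, tau:] @ z[:, :-tau].T / (N - tau)  # lagged covariance of whitened data
    _, U1 = np.linalg.eigh((R1 + R1.T) / 2)      # step 2: orthogonal diagonalizer
    return U1.T @ B0                             # separating matrix B = U1^T B0

# Two AR(1) sources with distinct lag-1 autocorrelations (0.9 vs -0.5)
rng = np.random.default_rng(2)
N = 50000
s = np.zeros((2, N))
for t in range(1, N):
    s[:, t] = np.array([0.9, -0.5]) * s[:, t - 1] + rng.standard_normal(2)
A = np.array([[1.0, 0.6], [0.4, 1.0]])
B = two_step_sos(A @ s)

G = B @ A                                        # global matrix
Gn = G / np.linalg.norm(G, axis=1, keepdims=True)
# Gn is close to a signed permutation: one dominant entry per row
assert np.min(np.max(np.abs(Gn), axis=1)) > 0.95
```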
If R_s(1) has equal diagonal entries

Case of 2 mixtures of 2 sources: since R_s(τ) = ρ(τ) I, it is invariant under any orthogonal transform U. In fact:

  U R_s(τ) U^T = ρ(τ) U U^T = ρ(τ) I

Hence:

  x(.) = A s(.) = [A U^T][U s(.)]
  R_x(.) = A R_s(.) A^T = [A U^T] R_s(.) [A U^T]^T

It is thus impossible to distinguish A s(.) from [A U^T][U s(.)].
In other words, A can be estimated only up to an orthogonal matrix!
If R_s(1) has equal diagonal entries: general case

If two entries of R_s(1) are equal, say ρ_i(τ) = ρ_j(τ), it is impossible to separate the two sources s_i and s_j, due to the orthogonal indeterminacy; i.e. A is only identifiable up to a plane (Givens) rotation in the (s_i, s_j) plane
All the other sources, with different entries, can be separated after the second step

Solution: for separating the sources with the same entries in R_s(1), we continue with a third step, similar to the second one, i.e. one computes an orthogonal matrix U2 which diagonalizes R_x(2). B2 = U2 U1 B0 then jointly diagonalizes R_x(0), R_x(1) and R_x(2):

  B2 R_x(0) B2^T = I, B2 R_x(1) B2^T = D1 and B2 R_x(2) B2^T = D2
Identifiability theorem

Theorem: the mixing matrix A is identifiable from second order statistics (up to scale and permutation indeterminacies) iff the correlation sequences of all sources are pairwise linearly independent, i.e. if (ρ_i(1), ..., ρ_i(K)) ≠ (ρ_j(1), ..., ρ_j(K)), ∀i ≠ j

In the frequency domain: due to the uniqueness of the Fourier transform, the identifiability condition in the time domain can be transposed to the frequency domain:

The mixing matrix A is identifiable from second order statistics (up to scale and permutation indeterminacies) iff the sources have distinct spectra, i.e. pairwise linearly independent spectra.
Consequences

Since the separation is achieved using second order statistics (SOS), Gaussian sources can be separated
Since we just use SOS, maximum likelihood approaches can be developed assuming Gaussian densities.
Noisy mixtures

Notations and assumptions: x(t) = A s(t) + v(t)
  Sources are zero-mean, real, second order stationary processes; spatially decorrelated
  A is a K × P full rank matrix, with K ≥ P
  v is a real, zero-mean, second order stationary process, assumed uncorrelated with the sources, and spatially as well as temporally white

Equations:

  R_x(0) = A A^T + σ_v² I and R_x(τ) = A R_s(τ) A^T, ∀τ ≥ 1

The condition of different ρ_i(τ) is no longer sufficient for identifying A
In fact, the extra parameter σ_v² must be known or identified
It can be done iff A A^T is identifiable
Example of non-identifiable noisy mixtures

Same model x(t) = A s(t) + v(t). In addition to the previous assumptions, sources are first order MA, so that R_s(τ) = 0 for τ ≥ 2.

Consequently, one only has 2 matrix equations:

  R_x(0) = A A^T + σ_v² I and R_x(1) = A R_s(1) A^T

Since R_x(0) and R_x(1) are symmetric P × P, there are P + P(P − 1)/2 = P(P + 1)/2 scalar equations per matrix equation, i.e. a total of P(P + 1) equations
But the number of unknowns is P(P + 1) + 1: P² (matrix A), P (entries of R_s(1)), 1 for σ_v²!
Nonstationary sources

Covariance matrices on two time windows:
  Assume x(t) = A s(t), with the same assumptions as above
  In addition, we assume that sources are nonstationary, with pairwise variance ratios (σ_i/σ_j)²(t) ≠ 1
  On two time windows with different variance ratios, one gets two covariance matrices, R1(0) and R2(0), which are jointly diagonalizable

(figure: (σ2/σ1)² = 1 in blue dots, (σ2/σ1)² = 2 in red +)
Nonstationary sources (con't)

Pham and Cardoso approach [PC01]:
  Time is divided into p time blocks W_i with lengths T_i
  The covariance matrix on each window is estimated using

    R̂_{x,i} = (1/T_i) Σ_{t ∈ W_i} x(t) x^T(t), ∀i = 1, ..., p

  Assuming Gaussian sources (SOS method), Pham and Cardoso [PC01] prove that the ML estimate of the separating matrix B, for unknown source variances, is provided by the joint diagonalization of the estimated matrices R̂_{x,i}.
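The block-covariance structure behind this approach can be sketched numerically (with hypothetical variance profiles and mixing matrix): each block covariance is ≈ A D_i A^T with D_i diagonal, so A^{-1} R̂_{x,i} A^{-T} comes out near-diagonal on every block:

```python
import numpy as np

rng = np.random.default_rng(6)
N, p = 60000, 4
L = N // p
# Nonstationary sources: piecewise-constant standard-deviation profiles
prof = rng.uniform(0.5, 2.0, (2, p))
s = np.concatenate([prof[:, [i]] * rng.standard_normal((2, L)) for i in range(p)],
                   axis=1)
A = np.array([[1.0, 0.5], [0.3, 1.0]])
x = A @ s

# Block covariance estimates: R_hat_i = (1/Ti) * sum_{t in Wi} x(t) x(t)^T
R = [x[:, i*L:(i+1)*L] @ x[:, i*L:(i+1)*L].T / L for i in range(p)]

# Each R_hat_i ~= A D_i A^T, so A^{-1} R_hat_i A^{-T} is near-diagonal
Ainv = np.linalg.inv(A)
coh = []
for Ri in R:
    D = Ainv @ Ri @ Ainv.T
    coh.append(abs(D[0, 1]) / np.sqrt(D[0, 0] * D[1, 1]))  # residual correlation
assert max(coh) < 0.05
```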
General Eigen Value Decomposition (GEVD) problem

Observation: x(t) = A s(t) + n(t) = x^s(t) + x^n(t), where x^s and x^n are the desired and undesired signals, resp.
Objective: estimate b such that y = b^T x has a desired property
Maximizing a power ratio between 2 conditions can be associated with the joint diagonalization of two matrices M1 and M2, called the GEVD problem
Using the Rayleigh-Ritz theorem, maximizing the ratio is equivalent to solving the GEVD of (M1, M2), i.e.

  max_B (B^T M1 B)/(B^T M2 B) ⇔ B such that M1 B = M2 B D

where D is the diagonal matrix of eigenvalues
The method is used for extracting sources with given properties
General Eigen Value Decomposition (GEVD) problem (con't)

SNR maximizer:

  max_b SNR(b) = max_b E[y_s²]/E[y_n²] = max_b (b^T R_{x^s} b)/(b^T R_{x^n} b) ⇔ b = GEVD(R_{x^s}, R_{x^n})
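A minimal sketch of solving such a GEVD with plain NumPy (reducing to an ordinary symmetric eigenproblem via M2^{-1/2}, assuming M2 positive definite), applied to a toy SNR maximizer with a hypothetical signal direction:

```python
import numpy as np

def gevd_max(M1, M2):
    """b maximizing (b^T M1 b)/(b^T M2 b): top generalized eigenvector of (M1, M2)."""
    d, E = np.linalg.eigh(M2)
    M2i = E @ np.diag(d ** -0.5) @ E.T    # M2^{-1/2} (M2 assumed positive definite)
    _, V = np.linalg.eigh(M2i @ M1 @ M2i)
    return M2i @ V[:, -1]                 # back-transform the top eigenvector

# Toy SNR maximizer: desired signal along direction u, white noise
u = np.array([3.0, 4.0]) / 5.0
Rs = 10.0 * np.outer(u, u)                # rank-one "signal" covariance R_{x^s}
Rn = np.eye(2)                            # "noise" covariance R_{x^n}
b = gevd_max(Rs, Rn)
b /= np.linalg.norm(b)
assert min(np.linalg.norm(b - u), np.linalg.norm(b + u)) < 1e-8
```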
General Eigen Value Decomposition (GEVD) problem (con't)

Periodicity maximizer [TSS+09, SJS10]:

  min_b E_t[(y(t + τ) − y(t))²] / E[y(t)²] ⇔ b = GEVD(R_x(τ), R_x(0))

Application to ECG extraction: components are sorted by similarity to the ECG (R peaks)

(figure: mixtures, sources estimated by ICA (JADE), sources estimated by PiCA)
General Eigen Value Decomposition (GEVD) problem (con't)

Spectral-contrast maximizer:

  max_b E_{f∈BP}[|Y(f)|²] / E_f[|Y(f)|²] = max_b (b^T S_{x,BP} b)/(b^T S_x b) ⇔ b = GEVD(S_{x,BP}, S_x)

Application: extraction of a signal with the prior that its power is concentrated in a given frequency band BP. A similar idea is used for atrial fibrillation extraction [PZL10, LISC11]
General Eigen Value Decomposition (GEVD) problem (con't)

Nonstationarity maximizer:

  max_b E_{t∈W1}[y²(t)] / E_{t∈W2}[y²(t)] = max_b (b^T R_{x,W1}(0) b)/(b^T R_{x,W2}(0) b) ⇔ b = GEVD(R_{x,W1}(0), R_{x,W2}(0))

Common spatial pattern:
  CSP was introduced by K. Fukunaga and W. L. G. Koontz (Application of the K-L expansion to feature selection and ordering, IEEE Trans. on Computers, no. 4, pp. 311-318, 1970)
  CSP aims at maximizing the ratio of variances between two classes, associated with windows W_i, i = 1, 2
  CSP is then nothing but a "nonstationarity maximizer"
Common Spatial Pattern: example

Epileptic focus localization, Samadi et al. [SASZJ11]
  Data: set of intracranial EEG (iEEG) recordings from epileptic patients
  Objective: estimate the spatial filter B which provides the iEEG sources most strongly related to interictal discharges (IED)
  The two classes are then samples in either "IED" (class 1) or "NON IED" (class 2) time intervals
  CSP provides sources, sorted in decreasing order according to their power in the class-1 condition
Joint diagonalization: exact or approximate?

Exact joint diagonalization (EJD):
  EJD of two symmetric matrices, one of them positive definite, always exists
  For more than 2 matrices, with exact matrices, EJD still exists
  However, in practice, due to matrix estimation errors:
    EJD does not exist for more than 2 matrices
    EJD of two matrices is sensitive to matrix estimation errors

Approximate joint diagonalization (AJD):
  AJD of a set of matrices is more robust than EJD of 2 matrices
  How to choose the set of matrices (delays τ1, ..., τK) for the best estimation?
  AJD is a common problem in data analysis ⇒ huge number of algorithms, with different structures (with or without whitening), selection of optimal delays τ_i, criteria, etc.
Criteria for approximate joint diagonalization

Algorithms aim at jointly diagonalizing a set of p matrices R_i, i.e. estimating U such that

  U R_i U^T = D_i, ∀i = 1, ..., p

Practical criteria aim at minimizing the sum of squared off-diagonal entries of the matrices U R_i U^T, ∀i = 1, ..., p. Criteria usually write like:

  C(R1, ..., Rp; U) = Σ_{i=1}^p Σ_{k≠l} [U R_i U^T]²_{kl} = Σ_{i=1}^p ‖offdiag(U R_i U^T)‖²
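The AJD criterion is straightforward to evaluate; a minimal sketch that builds a set of exact matrices sharing an orthogonal diagonalizer Q and checks that the cost vanishes at Q and is larger elsewhere:

```python
import numpy as np

def ajd_cost(U, Rs):
    """Sum of squared off-diagonal entries of U Ri U^T over the matrix set."""
    c = 0.0
    for R in Rs:
        M = U @ R @ U.T
        c += np.sum(M ** 2) - np.sum(np.diag(M) ** 2)
    return c

# Exact matrices sharing the (random orthogonal) diagonalizer Q
D1, D2 = np.diag([1.0, 2.0, 3.0]), np.diag([3.0, 1.0, 2.0])
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((3, 3)))
Rs = [Q.T @ D1 @ Q, Q.T @ D2 @ Q]

assert ajd_cost(Q, Rs) < 1e-12                      # exact joint diagonalizer
assert ajd_cost(np.eye(3), Rs) > ajd_cost(Q, Rs)    # any other U costs more
```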
Joint diagonalization: practical issues

In the frequency domain: since the Fourier transform preserves linearity:

  F{x(t)} = F{A s(t)} = A F{s(t)}

Practically, with a short-term Fourier transform on a time window W_i:

  F_{W_i}{x(t), T_i} = F_{W_i}{A s(t), T_i} = A F_{W_i}{s(t), T_i}

In the frequency domain, it is then possible to jointly diagonalize S_x(ν) = E[X_{W_i}(ν) X_{W_i}^T(ν)]

For both colored and nonstationary sources, one can jointly diagonalize a double set of matrices:
  a set {R_x(τ), τ = 0, ..., K1}, using the coloration property,
  a set {R_{x,W_i}(0), i = 0, ..., K2}, using nonstationarity.
Conclusions on second order methods

Can be used for Gaussian non-iid sources
Non-iid means colored sources or nonstationary sources
Gaussian sources can be separated (contrary to HOS-based methods), but Gaussianity is not required
Algorithms are based either on ML using Gaussian assumptions, or on joint diagonalization algorithms
Huge number of simple and fast algorithms for Approximate Joint Diagonalization
Joint diagonalization can be applied to matrices computed in the time or frequency domains, combining coloration and nonstationarity properties
Other priors can be used for computing spatial filters leading to source separation, estimated using SOS through the GEVD problem (e.g. Common Spatial Pattern, Periodic Component Analysis)
IV. Algorithms with prewhitening
Contents of course IV

12 Introduction
13 Problem in dimension 2
14 Orthogonal transform
   Pair sweeping
   CoM
   JAD
   STO
   Limits
   Other orthogonal approaches
15 Algorithms based on Deflation
   Intro
   Algos
   FastICA
   RobustICA
   SAUD
16 Discussion, Bibliography
Algorithms

Batch (off-line):
  algebraic (e.g. in dim 2)
  semi-algebraic (e.g. pair sweeping)
  iterative (e.g. descent of order r)

Adaptive (on-line): stochastic versions of the above, in particular
  order 0 (e.g. stochastic iteration)
  order 1 (e.g. stochastic gradient)
  order 2 (e.g. stochastic Newton)

Nonstationarities: block- vs sample-wise updates
Intro Dim2 Orthogonal Deflation Discussion<br />
Spatial whitening<br />
Batch algorithms: already covered in the QR-PCA slides<br />
Adaptive algorithms:<br />
Adaptive QR [Com89a]<br />
Adaptive SVD [CG90]<br />
Gradient [CA02]<br />
Relative gradient [CL96] [CA02]<br />
In this course IV, data are assumed whitened: Rx = I.<br />
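The batch whitening assumed throughout this part can be sketched in a few lines; the Laplace sources and the 3 × 3 mixture below are illustrative assumptions:

```python
import numpy as np

def spatial_whitening(x):
    """Return z = L x with cov(z) = I, via the inverse symmetric square
    root of the sample covariance (batch spatial whitening)."""
    d, E = np.linalg.eigh(np.cov(x))    # covariance eigendecomposition
    L = E @ np.diag(d ** -0.5) @ E.T    # L = R_x^{-1/2}
    return L @ x, L

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))         # illustrative square mixture
s = rng.laplace(size=(3, 10000))        # illustrative non-Gaussian sources
z, L = spatial_whitening(A @ s)
assert np.allclose(np.cov(z), np.eye(3), atol=1e-8)   # R_z = I
```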
Direct vs Inverse<br />
Two formulations in terms of cumulants:<br />
1 Direct: look for A so as to fit the input-output multilinearity relation for cumulants:<br />
$\min_A \big\| C_{y,ijk\ldots\ell} - \sum_{p=1}^{P} A_{ip}A_{jp}A_{kp}\cdots A_{\ell p}\, C_{s,pp\ldots p} \big\|^2$<br />
i.e. decompose $C_y$ into a sum of P rank-one terms<br />
2 Inverse: look for B:<br />
$\min_B \sum_{mnp\ldots q \neq pp\ldots p} \big| \sum_{ijk\ldots\ell} C_{y,ijk\ldots\ell}\, B_{im} B_{jn} B_{kp} \cdots B_{\ell q} \big|^2$<br />
i.e. try to diagonalize $C_y$ by a linear transform, z = B y<br />
Orthogonal decomposition<br />
If Q is orthogonal, the two problems are equivalent:<br />
1 Direct:<br />
$\min_{Q,\Lambda} \big\| C_{ijk\ell} - \sum_{p=1}^{P} Q_{ip}Q_{jp}Q_{kp}Q_{\ell p}\, \Lambda_{pppp} \big\|^2$<br />
2 Inverse: $\min_{Q,\Lambda} \big\| \sum_{ijk\ell} Q_{ip}Q_{jq}Q_{kr}Q_{\ell s}\, C_{ijk\ell} - \Lambda_{pppp}\, \delta_{pqrs} \big\|^2$, or e.g.<br />
$\max_Q \sum_p \big| \sum_{ijk\ell} Q_{ip}Q_{jp}Q_{kp}Q_{\ell p}\, C_{ijk\ell} \big|^2$<br />
Proof. The Frobenius norm is invariant under an orthogonal change of coordinates.<br />
Divide to conquer<br />
Well-posed problem: the set of orthogonal matrices is closed<br />
But a difficulty: many unknowns, in the real or complex field<br />
1 1st idea: address a sequence of problems of smaller dimension instead of a single one in larger dimension (deflation, pair sweeping, fixed descent directions, ...)<br />
2 2nd idea: decompose A into two factors, A = L Q, and compute L so as to exactly standardize the data. Look for the best Q in a second stage.<br />
Algorithms: pair Sweeping (1)<br />
Pairwise processing<br />
Split the orthogonal matrix into a product of plane Givens rotations:<br />
$G[i,j] \stackrel{\mathrm{def}}{=} \dfrac{1}{\sqrt{1+\theta^2}} \begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix}$<br />
acting in the subspace defined by $(z_i, z_j)$.<br />
➽ the dimension has been reduced to 2, and we are left with a single unknown, θ, which can be constrained to lie in (−1, 1].<br />
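A sketch of this parameterization (the helper name is ours):

```python
import numpy as np

def givens(P, i, j, theta):
    """Plane (Givens) rotation G[i, j] acting on coordinates (z_i, z_j);
    the single unknown theta plays the role of tan(beta)."""
    G = np.eye(P)
    c = 1.0 / np.sqrt(1.0 + theta**2)
    G[i, i] = c
    G[i, j] = c * theta
    G[j, i] = -c * theta
    G[j, j] = c
    return G

G = givens(4, 0, 2, 0.5)
assert np.allclose(G @ G.T, np.eye(4))               # G is orthogonal
assert np.isclose(G[0, 0], np.cos(np.arctan(0.5)))   # theta = tan(beta)
```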
Algorithms: pair Sweeping (2)<br />
Cyclic sweeping with fixed ordering: example in dimension P = 3<br />
[Diagram: x → L⁻¹ → x̃ → successive Givens rotations → z; portrait of Carl Jacobi, 1804-1851]<br />
Algorithms: pair Sweeping (3)<br />
Sweeping a 3 × 3 × 3 symmetric tensor<br />
[Figure: slices of a 3 × 3 × 3 symmetric tensor during one sweep. After each Givens rotation, one diagonal entry is maximized while the off-diagonal entries of the corresponding pair are minimized, the remaining entries being unchanged; the last entries are driven by the final Givens rotation.]<br />
X : maximized; x : minimized; . : unchanged<br />
Algorithms: pair Sweeping (4)<br />
Criteria $\Upsilon^{CoM}$, $\Upsilon^{STO}$ and $\Upsilon^{JAD}$ are rational functions of θ, and their absolute maxima can be computed algebraically.<br />
To prove this, consider the elementary 2 × 2 problem $z = G\,\tilde{x}$, denote $C_{ijk\ell}$ the cumulants of z, and<br />
$G \stackrel{\mathrm{def}}{=} \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} \stackrel{\mathrm{def}}{=} \dfrac{1}{\sqrt{1+\theta^2}} \begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix}$<br />
Solution for $\Upsilon^{CoM}$<br />
We have $\Upsilon^{CoM} \stackrel{\mathrm{def}}{=} (C_{1111})^2 + (C_{2222})^2$.<br />
Denote ξ = θ − 1/θ. Then the criterion is a rational function in ξ:<br />
$\psi_4(\xi) = (\xi^2 + 4)^{-2} \sum_{i=0}^{4} b_i\, \xi^i$<br />
Its stationary points are the roots of a polynomial of degree 4:<br />
$\omega_4(\xi) = \sum_{i=0}^{4} c_i\, \xi^i$<br />
obtainable algebraically via Ferrari's technique. The coefficients $b_i$ and $c_i$ are given functions of the cumulants of $\tilde{x}$.<br />
θ is obtained from ξ by rooting a 2nd-degree trinomial.<br />
[Figure: separation of a noisy mixture by maximization of contrast; CoM2, influence of the pair ordering]<br />
Solution for $\Upsilon^{JAD}$<br />
Goal: maximize the squares of the diagonal terms of $G^H N(r)\, G$, where the matrix slices $N(r) = \begin{pmatrix} a_r & b_r \\ c_r & d_r \end{pmatrix}$ contain cumulants of $\tilde{x}$<br />
Let $v \stackrel{\mathrm{def}}{=} [\cos 2\beta, \sin 2\beta]^T$. Then this amounts to maximizing the quadratic form $v^T M v$, where<br />
$M \stackrel{\mathrm{def}}{=} \sum_r \begin{pmatrix} a_r - d_r \\ b_r + c_r \end{pmatrix} \begin{pmatrix} a_r - d_r & b_r + c_r \end{pmatrix}$<br />
Thus, 2β is given by the dominant eigenvector of M, and G is obtained by rooting a 2nd-degree trinomial.<br />
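This recipe can be checked numerically on jointly diagonalizable slices (a sketch; the function name is ours):

```python
import numpy as np

def jad_givens_2x2(slices):
    """Optimal Givens rotation for 2 x 2 joint approximate diagonalization:
    2*beta is the dominant eigenvector direction of M = sum_r u_r u_r^T,
    with u_r = [a_r - d_r, b_r + c_r]."""
    M = np.zeros((2, 2))
    for N in slices:
        u = np.array([N[0, 0] - N[1, 1], N[0, 1] + N[1, 0]])
        M += np.outer(u, u)
    v = np.linalg.eigh(M)[1][:, -1]     # dominant eigenvector [cos 2b, sin 2b]
    if v[0] < 0:
        v = -v
    beta = 0.5 * np.arctan2(v[1], v[0])
    cb, sb = np.cos(beta), np.sin(beta)
    return np.array([[cb, sb], [-sb, cb]])

# jointly diagonalizable toy slices sharing the rotation Q
t = 0.3
Q = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
slices = [Q.T @ np.diag(d) @ Q for d in ([2.0, -1.0], [0.5, 3.0])]
G = jad_givens_2x2(slices)
for N in slices:
    assert abs((G @ N @ G.T)[0, 1]) < 1e-10   # off-diagonals are cancelled
```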
Solution for $\Upsilon^{STO}$<br />
Goal: maximize the squares of the diagonal terms of the 3rd-order tensors<br />
$T[\ell]_{pqr} \stackrel{\mathrm{def}}{=} \sum_{ijk} G_{pi} G_{qj} G_{rk}\, C_{ijk\ell}$<br />
If $G = \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix}$, denote $v \stackrel{\mathrm{def}}{=} [\cos 2\beta, \sin 2\beta]^T$.<br />
The angle 2β is given by the vector v maximizing a quadratic form $v^T B v$, where B is 2 × 2 symmetric and contains sums of products of cumulants of $\tilde{x}$<br />
θ is then obtained by rooting a 2nd-degree trinomial.<br />
First conclusions<br />
Pair sweeping can be executed thanks to the equivalence between pairwise and mutual independence<br />
The cumulant tensor can be diagonalized iteratively via a Jacobi-like algorithm<br />
For each pair, there is a closed-form solution for the optimal Givens rotation (absolute maximum of the contrast criterion).<br />
Questions:<br />
But what about global convergence?<br />
Pairwise processing holds only if the model is exact<br />
Stationary points: symmetric matrix case<br />
Given a matrix m with components $m_{ij}$, we seek an orthogonal matrix Q maximizing $\Upsilon_2$:<br />
$\Upsilon_2(Q) = \sum_i M_{ii}^2\,; \qquad M_{ij} = \sum_{p,q} Q_{ip} Q_{jq}\, m_{pq}$<br />
Stationary points of $\Upsilon_2$ satisfy, for any pair of indices (q, r), q ≠ r:<br />
$M_{qq} M_{qr} = M_{rr} M_{qr}$<br />
Next, $d^2\Upsilon_2 < 0 \Leftrightarrow M_{qr}^2 < (M_{qq} - M_{rr})^2$, which proves that<br />
$M_{qr} = 0, \; \forall q \neq r$ yields a maximum<br />
$M_{qq} = M_{rr}, \; \forall q, r$ yields a minimum<br />
Other stationary points are saddle points<br />
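In the matrix case, these stationary-point properties are what make cyclic Jacobi sweeps reliable: each rotation zeroes one off-diagonal pair, and the iteration converges to the diagonal form, i.e. to the maximum of $\Upsilon_2$. A minimal sketch (function name is ours):

```python
import numpy as np

def jacobi_diagonalize(M, sweeps=10):
    """Cyclic Jacobi sweeps maximizing Upsilon_2 = sum_i M_ii^2 by zeroing
    one off-diagonal pair (q, r) per Givens rotation."""
    M = M.copy()
    P = M.shape[0]
    Q = np.eye(P)
    for _ in range(sweeps):
        for q in range(P - 1):
            for r in range(q + 1, P):
                # angle cancelling M[q, r]: tan(2*beta) = 2 M_qr / (M_qq - M_rr)
                beta = 0.5 * np.arctan2(2.0 * M[q, r], M[q, q] - M[r, r])
                G = np.eye(P)
                c, s = np.cos(beta), np.sin(beta)
                G[q, q] = c; G[q, r] = s
                G[r, q] = -s; G[r, r] = c
                M = G @ M @ G.T
                Q = G @ Q
    return M, Q

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4))
S = X + X.T                         # symmetric test matrix
D, Q = jacobi_diagonalize(S)
assert np.max(np.abs(D - np.diag(np.diag(D)))) < 1e-10
assert np.allclose(np.sort(np.diag(D)), np.sort(np.linalg.eigvalsh(S)))
```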
Stationary points: symmetric tensor case<br />
Similarly, one can look at the relations characterizing the local maxima of the criterion Υ:<br />
$T_{qqqq} T_{qqqr} - T_{rrrr} T_{qrrr} = 0$,<br />
$4 T_{qqqr}^2 + 4 T_{qrrr}^2 - \big(T_{qqqq} - \tfrac{3}{2} T_{qqrr}\big)^2 - \big(T_{rrrr} - \tfrac{3}{2} T_{qqrr}\big)^2 < 0$<br />
for any pair of indices (q, r), q ≠ r.<br />
As a conclusion, contrary to $\Upsilon_2$ in the matrix case, Υ might in theory have spurious local maxima in the tensor case (order > 2).<br />
Open problem<br />
1 At each step, a plane rotation is computed and yields the<br />
global maximum of the objective Υ restricted to one variable<br />
2 There is no proof that the sequence of successive plane<br />
rotations yields the global maximum, in the general case<br />
(tensors that are not necessarily diagonalizable)<br />
3 Yet, no counter-example has been found since 1991<br />
Other algorithms for orthogonal diagonalization<br />
The 3 previous criteria summarize most ways to address tensor<br />
approximate diagonalization.<br />
But the orthogonal matrix can be treated differently, e.g. via other parameterizations:<br />
express an orthogonal matrix as the exponential of a skew-symmetric matrix: Q = exp S<br />
or use the Cayley parameterization: Q = (I − S)(I + S)⁻¹,<br />
...<br />
Joint Approximate Diagonalization<br />
The idea is to consider the symmetric tensor of dimension K and order d as a collection of $K^{d-2}$ symmetric K × K matrices<br />
This is the Joint Approximate Diagonalization (JAD) problem<br />
Algorithms based on Deflation<br />
Principle: joint extraction vs deflation<br />
Unitary adaptive deflation<br />
A so-called “fixed point” algorithm: FastICA<br />
RobustICA<br />
Deflation without spatial prewhitening, algebraic deflation<br />
Discussion on MISO criteria<br />
Deflation:<br />
Joint extraction vs Deflation<br />
[Diagram: joint extraction applies a full separating matrix to x, while deflation extracts one output z = f^H x, then removes its contribution by regression before extracting the next source.]<br />
Advantages: (a) reduced complexity at each stage, (b) simpler to understand<br />
Drawbacks: (i) accumulation of regression errors, limiting the number of sources that can be extracted, (ii) possibly larger final complexity<br />
Adaptive algorithms<br />
Deflation by Kurtosis Gradient Ascent<br />
Again the same idea:<br />
After standardization, it is equivalent to maximize the 4th-order moment criterion $M^{(4)}_z = E\{|z|^4\}$, whose gradient is:<br />
$\nabla M = 4\, E\{x\, (f^H x)(x^H f)^2\}$<br />
Overview<br />
Fixed-step gradient on angular parameters: [DL95, CJ10]<br />
Locally optimal step gradient on filter taps: FastICA [CJ10, ch.6]<br />
Globally optimal step gradient on filter taps: RobustICA [CJ10, ZC10]<br />
Semi-Algebraic Unitary Deflation (SAUD) [ACX07]<br />
Adaptive implementation<br />
Adaptive algorithms<br />
Fully adaptive solutions (update at every sample arrival) are nowadays of limited use<br />
It is always easy to derive fully adaptive or block-adaptive solutions from block semi-algebraic algorithms (but the reverse is not true!)<br />
Unitary adaptive deflation (1)<br />
Extraction<br />
To extract the first source, find a unitary matrix U so as to maximize the kurtosis of the first output<br />
The matrix U can be iteratively determined by a sequence of Givens rotations<br />
At each step, determine the best angle of the Givens rotation, e.g. by a gradient ascent [DL95]<br />
NB: only P − 1 Givens rotations are involved<br />
Deflation<br />
After convergence, the first output is extracted, and the P − 1 remaining outputs of U can be processed in the same way<br />
Unitary adaptive deflation (2)<br />
At stage k, $Q = \begin{pmatrix} q_k^H \\ Q_k^H \end{pmatrix}$ is unitary of size P − k + 1, and only its first row is used to extract source k, 1 ≤ k ≤ P − 1<br />
[Diagram: cascade of stages $(q_1^H, Q_1^H) \to (q_2^H, Q_2^H) \to (q_3^H, Q_3^H)$]<br />
A so-called fixed point: FastICA (1)<br />
Any gradient ascent of a function $M_\rho = E\{\rho(f^H x)\}$ under the unit-norm constraint $\|f\|^2 = 1$ admits the Lagrangian formulation:<br />
$E\{x\, \dot\rho(f^H x)\} = \lambda\, f$<br />
Convergence: when ∇C and f are collinear (and not when the gradient is null, because of the constraint ||f|| = 1).<br />
One can take ρ(z) = |z|⁴ (but this choice makes FastICA uninteresting)<br />
A so-called fixed point: FastICA (2)<br />
Details of the algorithm proposed in [Hyv99] in the real field; the only difference compared to [Tug97] is the fixed step size.<br />
Gradient: $\nabla M = 4\, E\{x\, (f^T x)^3\}$<br />
Hessian: $12\, E\{x x^T (f^T x)^2\}$<br />
Heavy approximation of Hyvärinen [Hyv99]:<br />
$E\{x x^T (f^T x)^2\} \approx E\{x x^T\}\, E\{(f^T x)^2\}$<br />
If x is standardized and f has unit norm, the Hessian is then proportional to the Identity.<br />
This yields an “approximate Newton iteration”, which is a mere fixed-step gradient:<br />
$f \leftarrow f - \tfrac{1}{3}\, E\{x (f^T x)^3\}$ or $f \leftarrow E\{x (f^T x)^3\} - 3 f$<br />
$f \leftarrow f / \|f\|$<br />
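The resulting one-unit iteration can be sketched as follows; the uniform sub-Gaussian sources and the orthogonal mixing (which keeps the data standardized) are illustrative assumptions:

```python
import numpy as np

def fastica_unit(x, iters=500, tol=1e-12, seed=3):
    """One-unit FastICA with the kurtosis nonlinearity on standardized data:
    f <- E{x (f^T x)^3} - 3 f, followed by renormalization."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(x.shape[0])
    f /= np.linalg.norm(f)
    for _ in range(iters):
        y = f @ x
        f_new = (x * y**3).mean(axis=1) - 3.0 * f
        f_new /= np.linalg.norm(f_new)
        if 1.0 - abs(f_new @ f) < tol:   # sign flips count as converged
            return f_new
        f = f_new
    return f

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(3, 20000))  # unit variance, sub-Gaussian
A = np.linalg.qr(rng.standard_normal((3, 3)))[0]               # orthogonal: x stays white
x = A @ s
f = fastica_unit(x)
g = f @ A   # global filter: close to a signed canonical basis vector
assert np.max(np.abs(g)) > 0.95
```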
FastICA: weaknesses<br />
This is a mere fixed step-size projected gradient algorithm,<br />
inheriting problems such as:<br />
Saddle points (slow/ill convergence)<br />
Flat areas (slow convergence)<br />
Local maxima (ill convergence)<br />
NB: slow convergence may mean high complexity to reach the solution, or stopping the iterations before convergence is reached (depending on the stopping criterion).<br />
Polynomial rooting<br />
Theorem (1830). A polynomial of degree higher than 4 cannot in general be rooted algebraically in terms of a finite number of additions, subtractions, multiplications, divisions, and radicals (root extractions).<br />
Niels Abel, 1802-1829 Evariste Galois 1811-1832<br />
How to fix most drawbacks: RobustICA<br />
Principle: Cheap exhaustive Line Search of a criterion J<br />
Look for absolute maximum in the gradient direction (1-dim<br />
search)<br />
Not costly when the criteria are polynomials or rational functions of low degree (same as AMiSRoF [GC98]: a polynomial to root, but here of degree at most 4)<br />
Applies to Kurtosis Maximization (KMA), Constant-Modulus<br />
(CMA), Constant-Power (CPA) Algorithms...<br />
This yields corresponding Optimal-Step (OS) algorithms: OS-KMA,<br />
OS-CMA, OS-CPA... [ZC08] [ZC05]<br />
RobustICA<br />
Algorithm:<br />
compute the coefficients of the polynomial $\frac{\partial}{\partial\mu} J(f + \mu\nabla)$ for fixed f and ∇<br />
compute all its roots $\{\mu_i\}$<br />
select among those roots the $\mu_{opt}$ yielding the absolute maximum<br />
set $f \leftarrow f + \mu_{opt}\nabla$<br />
See: [ZC10, ZC08]<br />
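A sketch of this optimal step for the real kurtosis contrast. Recovering the polynomial coefficients by exact fitting (rather than the closed-form expressions of [ZC10]) is our implementation shortcut; the data and function names are illustrative:

```python
import numpy as np
from numpy.polynomial import polynomial as Pn

def kurt(y):
    return (y**4).mean() / (y**2).mean()**2 - 3.0

def optimal_step(f, g, x):
    """Globally optimal step along g for the kurtosis contrast of z = (f+mu*g)^T x.
    E{z^4} is an exact quartic and E{z^2} an exact quadratic in mu, so their
    coefficients are recovered by exact fits; the stationary points solve a
    degree-4 polynomial (the mu^5 terms cancel)."""
    mus = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    ys = [(f + m * g) @ x for m in mus]
    a = Pn.polyfit(mus, [(y**4).mean() for y in ys], 4)
    c = Pn.polyfit(mus, [(y**2).mean() for y in ys], 2)
    # d/dmu [a / c^2] = 0  <=>  a' c - 2 a c' = 0
    num = Pn.polysub(Pn.polymul(Pn.polyder(a), c), 2.0 * Pn.polymul(a, Pn.polyder(c)))
    num = num[:5]   # degree <= 4: drop the analytically cancelled mu^5 term
    cands = [r.real for r in Pn.polyroots(num) if abs(r.imag) < 1e-8] + [0.0]
    return max(cands, key=lambda m: abs(kurt((f + m * g) @ x)))

rng = np.random.default_rng(1)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 10000))
x = np.linalg.qr(rng.standard_normal((2, 2)))[0] @ s
f, g = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mu = optimal_step(f, g, x)
assert abs(kurt((f + mu * g) @ x)) >= abs(kurt(f @ x))   # step never decreases |kurt|
```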
RobustICA vs FastICA<br />
[Figure: Signal Mean Square Error (SMSE, dB) versus SNR (dB) for sources 1-4, comparing FastICA, OS-CMA, OS-KMA and the MMSE bound.]<br />
Semi-Algebraic Unitary Deflation<br />
CoM1 [Com01]:<br />
loop on sweeps<br />
  for i = 1 to P − 1<br />
    for j = i to P<br />
      algebraic 2 × 2 separation<br />
    end<br />
  end<br />
end<br />
Extraction<br />
SAUD [ACX07]:<br />
for i = 1 to P − 1<br />
  loop on sweeps<br />
    for j = i to P<br />
      algebraic 2 × 2 separation<br />
    end<br />
  end<br />
  Extraction<br />
end<br />
Equivalence between KMA and CMA<br />
Recall the 2 criteria:<br />
$\Upsilon_{KMA} = \dfrac{\mathrm{Cum}\{z, z, z^*, z^*\}}{[E\{|z|^2\}]^2}, \qquad J_{CMA} = E\{[\,|z|^2 - R\,]^2\}$<br />
Assume 2nd-order circular sources: E{s²} = 0<br />
Then KMA and CMA are equivalent [Reg02] [Com04]<br />
Discussion on Deflation (MISO) criteria<br />
Let $z \stackrel{\mathrm{def}}{=} f^H x$. The criteria below are stationary iff the differentials of p and q are collinear:<br />
Ratio: $\max_f \; p(f)/q(f)$<br />
Example: Kurtosis, with $p = E\{|z|^4\} - 2E\{|z|^2\}^2 - |E\{z^2\}|^2$ and $q = E\{|z|^2\}^2$<br />
Difference: $\min_f \; p(f) - \alpha\, q(f)$<br />
Examples: Constant Modulus, with $p = E\{|z|^4\}$ and $q = 2a\,E\{|z|^2\} - a^2$, or Constant Power, with $q = 2a\,\Re(E\{z^2\}) - a^2$<br />
Constrained: $\max_{q(f)=1} p(f)$<br />
Example: Cumulant, with $p = E\{|z|^4\} - 2E\{|z|^2\}^2 - |E\{z^2\}|^2$<br />
Example: Moment, with $p = E\{|z|^4\}$, if standardized, and with either $q = \|f\|^2$ or $q = E\{|z|^2\}^2$<br />
Bibliographical comments<br />
Orthogonal diagonalization of symmetric tensors:<br />
Direct diagonalization: CoM without slicing [Com92a] [Com94b]<br />
JAD in 2 modes: [Lee78] (R), [CS93] (C), with matrix exponential [TF07]<br />
JAD in 2 modes with positive definite matrices: [FG86]<br />
JAD in 3 modes: [LMV01]<br />
Orthogonal diagonalization of non-symmetric tensors:<br />
ALS type: [Kro83] [Kie92]<br />
ALS on pairs: [MvL08] [Sør10]<br />
JAD in 2 modes (R): [PPPP01]<br />
Matrix exponential: [SICD08]<br />
Deflation, FastICA, RobustICA: [CJ10, ch.6] [ZC07] [ZCK06]<br />
[ZC10]<br />
V. Algorithms without prewhitening<br />
Contents of course V<br />
17 Invertible transform<br />
18 Iterative algorithms<br />
HJ algorithm<br />
InfoMax<br />
Relative gradient<br />
Probabilistic approach<br />
RobustICA<br />
19 Semi-algebraic algorithms<br />
Alternating Least Squares<br />
Diagonally dominant matrices<br />
Joint Schur<br />
ICAR<br />
20 Concluding remarks<br />
Approximate diagonalization by invertible transform<br />
One view of the problem: tensor (approximate) diagonalization by<br />
invertible transform<br />
Joint Approximate Diagonalization (JAD) of a collection of:<br />
symmetric matrices<br />
symmetric diagonally dominant matrices<br />
symmetric positive definite matrices<br />
matrices that are not necessarily symmetric (2 invertible transforms)<br />
...<br />
Direct approaches without slicing the tensor into a collection of matrices: algorithms devised for underdetermined mixtures apply (e.g. ICAR)<br />
NB: ill-posedness of the optimization over the set of invertible matrices<br />
Hérault-Jutten algorithm (1/2)<br />
Recursive structure: z = x − C z, i.e. F = (I + C)⁻¹, with Diag{C} = 0.<br />
Update rule: $C_{ij}[k+1] = C_{ij}[k] + \mu\, f(z_i[k])\, g(z_j[k])$<br />
The iteration C[k+1] = C[k] + µ Φ(C[k]) actually searches for the zeros of a function Φ having negative derivatives if 0 < µ < 1 (resp. positive if −1 < µ < 0).<br />
Stochastic iteration, Robbins-Monro (1951) [CJH91]: if Φ(C) is the expectation of some ϕ(C, z), then use the stochastic version<br />
C[k+1] = C[k] + µ ϕ(C[k], z[k])<br />
Hérault-Jutten algorithm (2/2)<br />
Choice of the functions f(·) and g(·):<br />
at least one should be nonlinear<br />
they should be different<br />
they should be odd<br />
Link with Estimating Equations if f(z) = ψ(z) and g(z) = z<br />
Convergence is slow at the end, and the limiting zero depends on f and g. Permuting f and g changes the limiting zero [Sor91].<br />
Direct implementations exist [MM97]<br />
Normalization improves behavior [DAC02]<br />
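With the common illustrative choice f(z) = z³ and g(z) = z, one can check numerically that the separating point is an equilibrium of the rule: at C* = A − I the outputs equal the sources, so the expected update E{f(z_i) g(z_j)} vanishes off the diagonal. A sketch (names and the 2 × 2 mixture are ours):

```python
import numpy as np

def hj_update_direction(C, x, f=lambda z: z**3, g=lambda z: z):
    """Batch (expected) Herault-Jutten update E{f(z) g(z)^T}, off-diagonal only,
    with z obtained from the recursive structure z = x - C z."""
    z = np.linalg.solve(np.eye(C.shape[0]) + C, x)
    dC = (f(z) @ g(z).T) / x.shape[1]
    np.fill_diagonal(dC, 0.0)
    return dC

rng = np.random.default_rng(0)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 100000))  # independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])
x = A @ s
C_star = A - np.eye(2)          # separating point: (I + C*)^{-1} x = s
dC = hj_update_direction(C_star, x)
assert np.max(np.abs(dC)) < 0.05   # E{s_i^3 s_j} = 0 off the diagonal
```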
InfoMax<br />
Idea: minimize the output Mutual Information [BS95] [Car99]<br />
$\Upsilon_{MI}(z) = \sum_i H(z_i) - H(z)$<br />
But it may tend to infinity!<br />
Impose as constraints that the H(z_i) are constant ⇒ well posed<br />
Then it suffices to maximize the output joint entropy<br />
But how?<br />
Build z = g(Bx), where g(·) is a c.d.f. mapping ℝ to [0, 1].<br />
Maximize the output entropy H(g(Bx)) w.r.t. B by a stochastic version of:<br />
B ← B + µ ∇H(g(Bx)).<br />
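For the logistic choice of g, H(g(Bx)) equals, up to the constant H(x), an explicit empirical surrogate, log |det B| + E{Σᵢ log g′(uᵢ)} with u = Bx, whose gradient is the Bell-Sejnowski rule. The sketch below (function names are ours) computes both; the gradient can be checked against finite differences:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def infomax_objective(B, x):
    """H(g(Bx)) up to the constant H(x): log|det B| + E{sum_i log g'(u_i)},
    with g the logistic c.d.f. and u = Bx."""
    u = B @ x
    s = sigmoid(u)
    return np.linalg.slogdet(B)[1] + np.log(s * (1.0 - s)).sum(axis=0).mean()

def infomax_gradient(B, x):
    """Gradient of the objective above: B^{-T} + E{(1 - 2 g(u)) x^T}."""
    u = B @ x
    return np.linalg.inv(B).T + ((1.0 - 2.0 * sigmoid(u)) @ x.T) / x.shape[1]

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 500))
B = np.array([[1.2, 0.3], [-0.2, 0.9]])
grad = infomax_gradient(B, x)   # ascent direction for B <- B + mu * grad
```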
Relative gradient<br />
Standard gradient:<br />
$\Upsilon(x + h) - \Upsilon(x) \approx \langle g(x), h \rangle$<br />
Relative gradient: useful when x belongs to a continuous multiplicative group. Take h = ɛx:<br />
$\Upsilon(x + \epsilon x) - \Upsilon(x) \approx \langle g^R(x), \epsilon \rangle$<br />
Relationship: because $\langle g(x), \epsilon x \rangle = \langle g(x)\, x^T, \epsilon \rangle$, we have:<br />
$g^R(x) = g(x)\, x^T$<br />
Estimating Equations in BSS<br />
Mutual Information: $\Upsilon_{MI}(z) = \sum_i H(z_i) - H(z)$<br />
If z = Bx, then H(z) = H(x) + log |det B|<br />
Gradient: $\frac{\partial H(z)}{\partial B} = B^{-T}$ because H(x) is constant, and<br />
$\frac{\partial}{\partial B} \sum_i H(z_i) = -E\Big\{\sum_i \frac{\partial}{\partial B} \log p_i(z_i)\Big\} = -E\Big\{\sum_i \frac{\partial \log p_i(z_i)}{\partial z_i}\, \frac{\partial z_i}{\partial B}\Big\} = -E\{\psi_z(z)\, x^T\}$<br />
where $\psi_z(z)$ is the score function. Hence<br />
$\frac{\partial \Upsilon_{MI}(z)}{\partial B} = -E\{\psi_z(z)\, x^T\} - B^{-T}$<br />
Relative gradient: $\nabla^R \Upsilon = \nabla\Upsilon \cdot B^T$; cancelling it yields the Estimating Equation:<br />
$E\{\psi_z(z)\, z^T\} + I = 0 \qquad (34)$<br />
Equivariance<br />
Assume we want to maximize Υ(B)<br />
$\Upsilon(B + \epsilon B) - \Upsilon(B) \approx \langle \nabla\Upsilon(B), \epsilon B \rangle = \langle \nabla\Upsilon(B)\, B^T, \epsilon \rangle$<br />
Maximal if $\epsilon = \nabla\Upsilon(B)\, B^T = \nabla^R \Upsilon(B)$<br />
Update rule: B ← (I + µ ɛ)B<br />
Now denote G = B A the global filter. Post-multiplying by A shows that the update rule is equivalent to<br />
G ← (I + µ ɛ)G<br />
which does not depend on A ⇒ uniform performance<br />
☞ For more details, see [CJ10, ch.4].<br />
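The uniform-performance property can be seen directly: one multiplicative step B ← (I + µɛ)B induces the same recursion on the global filter G = BA whatever the mixing matrix A. A small demonstration (illustrative sketch only):

```python
import numpy as np

rng = np.random.default_rng(3)

# One relative-gradient step B <- (I + mu*eps) B, applied for two different
# mixing matrices A: the global filter G = B A follows the SAME recursion
# G <- (I + mu*eps) G, so performance does not depend on A.
mu, eps = 0.1, rng.standard_normal((3, 3))
for A in (np.eye(3), rng.standard_normal((3, 3))):
    B = np.linalg.inv(A)              # same global starting point G = I
    G_before = B @ A
    B_new = (np.eye(3) + mu * eps) @ B
    G_after = B_new @ A
    # the global update is independent of A
    assert np.allclose(G_after, (np.eye(3) + mu * eps) @ G_before)
print("uniform performance: global update independent of A")
```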
Probabilistic approach (1)

Let T(q) be a collection of symmetric positive semidefinite matrices.
Look for B such that the M(q) := B T(q) B^T are as diagonal as possible. Let α_q ∈ [0, 1] and ∑_q α_q = 1.

1. Criterion to maximize:
   Υ := ∑_q α_q log [ det M(q) / det Diag{M(q)} ]
   We have Υ ≤ 0 from Hadamard's inequality.
   - Avoids singularity
   - Linked to Maximum Likelihood
2. Use a multiplicative update: B^(ℓ+1) = U B^(ℓ)
3. Criterion after update:
   Υ = ∑_q α_q log [ det M(q) det²U / det Diag{U M(q) U^T} ]
Probabilistic approach (2)

4. Variation of Υ during one update:
   ∑_q α_q [ 2 log det U − log det Diag{U M(q) U^T} + log det Diag{M(q)} ]
5. Update two rows at a time, i.e. U is equal to the Identity except for entries (i, i), (i, j), (j, i), (j, j).
   By concavity of the log, get a lower bound on the variation:
   2 log det U − log (U P U^T)_11 − log (U Q U^T)_22 + constant
   where P and Q are the 2 × 2 matrices:
   P = ∑_q (α_q / M(q)_ii) M(q)[i, j]   and   Q = ∑_q (α_q / M(q)_jj) M(q)[i, j]
   and M(q)[i, j] denotes the 2 × 2 submatrix built on rows and columns (i, j).
6. Maximize this bound instead. This leads to rooting a 2nd degree trinomial. Sweep all the pairs in turns.
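The criterion Υ of the previous slide can be evaluated directly; Hadamard's inequality guarantees Υ ≤ 0, with equality exactly when all the M(q) are diagonal. A minimal numerical sketch (random SPD slices, uniform weights; `slogdet` is used for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(4)

def crit(Ms, alphas):
    # Upsilon = sum_q alpha_q log( det M(q) / det Diag{M(q)} )
    return sum(a * (np.linalg.slogdet(M)[1] - np.sum(np.log(np.diag(M))))
               for a, M in zip(alphas, Ms))

# random symmetric positive definite slices
Ms = []
for _ in range(5):
    R = rng.standard_normal((4, 4))
    Ms.append(R @ R.T + 4 * np.eye(4))
alphas = np.ones(5) / 5

u = crit(Ms, alphas)
print(u <= 1e-12)                       # Hadamard: Upsilon <= 0
print(abs(crit([np.diag(np.diag(M)) for M in Ms], alphas)) < 1e-12)  # = 0 when diagonal
```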
RobustICA again, vs FastICA

[Figure: Signal Mean Square Error (SMSE, dB, from 0 down to −25) versus complexity per source per sample (flops, 10^1 to 10^5), for K = 5, 10, 20; the curves compare FastICA, pw+FastICA, RobustICA and pw+RobustICA [ZC07].]
Survey of some semi-algebraic algorithms

1. An algorithm of ALS type: ACDC
2. An algorithm with a multiplicative update, valid if A is diagonally dominant
3. An algorithm based on joint triangularization
4. An SVD-based algorithm: ICAR (Biome4)
Alternating Least Squares

1. Two writings of the criterion:
   Υ = ∑_q ||T(q) − B Λ(q) B^H||²    and    Υ = ∑_q ||t(q) − B λ(q)||²
2. Stationary values for Diag Λ(q): λ(q) = {B^H B}^{−1} B^H t(q)
3. Stationary value for each column b[ℓ] of matrix B is the dominant eigenvector of the Hermitian matrix
   P[ℓ] = (1/2) ∑_q λ_ℓ(q) { T̃[q; ℓ]^H + T̃[q; ℓ] }
   where T̃[q; ℓ] := T(q) − ∑_{n≠ℓ} λ_n(q) b[n] b[n]^H.
4. ALS: calculate Λ(q) and B alternately. Use the LS solution when matrices are singular.
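The diagonal step of the ALS iteration above can be sketched as follows. This is an illustrative implementation, not the course's own code: it uses the vectorized writing vec(B Λ B^H) = K λ, where the p-th column of K is vec(b_p b_p^H), and solves the least-squares problem for λ(q) with B fixed.

```python
import numpy as np

rng = np.random.default_rng(5)

def diag_step(B, Tq):
    # LS-optimal diagonal Lambda(q) for || T(q) - B Lambda(q) B^H ||^2.
    # K[:, p] = vec(b_p b_p^H) (row-major), so vec(B Lambda B^H) = K @ lambda
    K = np.einsum('ip,jp->ijp', B, B.conj()).reshape(-1, B.shape[1])
    lam, *_ = np.linalg.lstsq(K, Tq.reshape(-1), rcond=None)
    return lam

# sanity check on an exactly factorable slice
B = rng.standard_normal((4, 3))
lam_true = rng.standard_normal(3)
Tq = B @ np.diag(lam_true) @ B.T
print(np.allclose(diag_step(B, Tq), lam_true))  # True
```

The full ALS loop alternates this step (for each q) with the eigenvector update of B given on the slide.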
Diagonally dominant matrices (1)

One wishes to minimize iteratively ∑_q ||T(q) − A Λ(q) A^T||²
Assume A is strictly diagonally dominant: |A_ii| > ∑_{j≠i} |A_ij| (cf. Levy-Desplanques theorem)

1. Initialize A = I
2. Update A multiplicatively as A ← (I + W)A, where W is zero-diagonal
3. Compute the best W assuming that it is small and that the T(q) are almost diagonal (first-order approximation)
Diagonally dominant matrices (2)

Computational details:
We have: T^(ℓ+1)(q) ← (I + W) T^(ℓ)(q) (I + W)^T
Write T^(ℓ)(q) := D − E, where D is diagonal and E is zero-diagonal.
If W and E are small:
T^(ℓ+1)(q) ≈ D(q) + W D(q) + D(q) W^T − E(q)
Hence minimize, w.r.t. W:
∑_q ∑_{i≠j} |W_ij D_jj(q) + W_ji D_ii(q) − E_ij(q)|²
This is of the form min_w ||J w − e||², where J is sparse: J^T J is block diagonal.
Thus one gets at each iteration a collection of decoupled 2 × 2 linear systems.
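The decoupled per-pair least-squares step can be sketched as follows (an illustrative sketch, assuming symmetric slices and the writing T(q) = D(q) − E(q), so E_ij = −T_ij off the diagonal):

```python
import numpy as np

rng = np.random.default_rng(6)

def pair_update(Ts):
    # One multiplicative step: find zero-diagonal W so that
    # (I + W) T(q) (I + W)^T is as diagonal as possible, to first order.
    # Each pair (i, j) gives the decoupled 2-unknown LS problem
    #   min_{W_ij, W_ji} sum_q | W_ij D_jj(q) + W_ji D_ii(q) - E_ij(q) |^2
    n = Ts[0].shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            J = np.array([[T[j, j], T[i, i]] for T in Ts])  # ~ (D_jj, D_ii)
            e = np.array([-T[i, j] for T in Ts])            # E_ij = -T_ij
            w, *_ = np.linalg.lstsq(J, e, rcond=None)
            W[i, j], W[j, i] = w
    return W

# slices jointly diagonalized by (I + W0)^{-1}, with W0 small and zero-diagonal
W0 = 0.01 * rng.standard_normal((3, 3)); np.fill_diagonal(W0, 0.0)
Ds = [np.diag(rng.uniform(1.0, 2.0, 3)) for _ in range(6)]
Ts = [(np.eye(3) + W0) @ D @ (np.eye(3) + W0).T for D in Ds]

W = pair_update(Ts)
print(np.abs(W + W0).max())  # ~ 0: the step undoes the small mixing, W ~ -W0
```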
Joint triangularization of matrix slices

1. From T, determine a collection of matrices (e.g. matrix slices) T(q) satisfying T(q) = A D(q) B^T, D(q) := diag{C_{q,:}}.
2. Compute the generalized Schur decomposition Q T(q) Z = R(q), where the R(q) are upper triangular.
3. Since Q^T R(q) Z^T = A D(q) B^T, the matrices R′ := Q A and R″ := B^T Z are upper triangular, and can be assumed to have a unit diagonal. Hence R′ and R″ can be computed by solving the triangular system
   R(q) = R′ D(q) R″
   from the bottom to the top, two entries R′_ij and R″_ij at a time.
4. Compute A = Q^T R′ and B^T = R″ Z^T
5. Compute matrix C from T, A and B by solving the over-determined linear system C · {(B ⊙ A)^T} = T_{K×JI}
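Step 2 can be illustrated with SciPy's QZ routine, which computes the generalized Schur decomposition of a pair of slices. Note SciPy's convention T(q) = Q R(q) Z^H, equivalent to the slide's Q T(q) Z = R(q) up to transposition of the orthogonal factors. This is a sketch on exactly factorable slices:

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(7)

# Two slices T(q) = A D(q) B^T share the factors A and B, so a generalized
# Schur (QZ) decomposition of the pair triangularizes both simultaneously.
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
D1, D2 = (np.diag(rng.uniform(1.0, 2.0, n)) for _ in range(2))
T1, T2 = A @ D1 @ B.T, A @ D2 @ B.T

R1, R2, Q, Z = qz(T1, T2, output='complex')   # T(q) = Q R(q) Z^H

# both R(q) are (numerically) upper triangular
print(np.abs(np.tril(R1, -1)).max() < 1e-8 * np.abs(R1).max())
print(np.abs(np.tril(R2, -1)).max() < 1e-8 * np.abs(R2).max())

# generalized eigenvalues recover the ratios D1_ii / D2_ii up to permutation
ratios = np.sort(np.real(np.diag(R1) / np.diag(R2)))
print(np.allclose(ratios, np.sort(np.diag(D1) / np.diag(D2)), atol=1e-6))
```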
ICAR (1/2)

Source kurtosis signs are known and stored in a diagonal matrix ∆.

1. Compute the 4th-order cumulants of x and store them in an N² × N² "quadricovariance" matrix Q_x.
2. Upon appropriate ordering, we have:
   Q_x = [A ⊙ A] D∆D [A ⊙ A]^H    (35)
   where D∆D is the N × N diagonal matrix of source marginal cumulants, D_ii ∈ R⁺.
3. Compute a square root of Q_x = E Λ E^H of rank N as Q_x^{1/2} = E(Λ∆)^{1/2}. Then Q_x^{1/2} ∆ Q_x^{H/2} = Q_x and
   ∃ V : Q_x^{1/2} = [A ⊙ A] D V^H, with V^H ∆ V = ∆
ICAR (2/2)

4. Q_x^{1/2} has N² rows and N columns. Denote by Γ_n each N × N block. Then
   Γ_n = A* Φ_n D V^H, where Φ_n = Diag{A(n, :)}
5. Compute the products Θ_{m,n} = Γ_m⁻ Γ_n, where Γ_m⁻ denotes a left inverse of Γ_m. Then we have
   Θ_{m,n} = V^{−H} Φ_m^{−1} Φ_n V^H
6. Compute V by jointly diagonalizing the Θ_{m,n}, 1 ≤ m < n ≤ N.
7. Compute an estimate of A from the N blocks of Q_x^{1/2} V^{−H} = A ⊙ A, and eventually an estimate of the sources.

For further details, see [AFCC05].
Bibliographical comments

For collections of symmetric matrices:
- maximize iteratively a lower bound on the variation of a probabilistic objective [Pha01]
- alternately minimize ∑_q ||T(q) − A Λ(q) A^T||² w.r.t. A and Λ(q) [Yer02]. See also [LZ07] [VO06]
- minimize iteratively ∑_q ||T(q) − A Λ(q) A^T||² under the assumption that A is diagonally dominant [ZNM04]

For collections of non-symmetric matrices:
- factor A into orthogonal and triangular parts, and perform a joint Schur decomposition [LMV04]
- others: algorithms applicable to the underdetermined case also work here, e.g. ICAR [AFCC05]. Also [CL13].
Summary

For over-determined mixtures, solving the invertible problem is sufficient.

Orthogonal framework:
- Fully use 2nd-order statistics, and then decompose the tensor approximately under orthogonal constraint.
- Easier to handle singularity, but arbitrary.
- Several (contrast) criteria and algorithms for orthogonal decomposition

Invertible framework:
- Decompose the higher-order cumulant tensor directly under invertible constraint
- Ill-posed
- Several algorithms, mainly working with matrix slices

Principles & algorithms dedicated to under-determined mixtures can also apply (cf. subsequent course).
VI. Convolutive Mixtures
Contents of course VI

21 Introduction
   Notations
   Models of sources
22 Invertibility of MIMO convolutive mixtures
23 Methods in the time domain
   Contrasts
   Blind Equalization
   Blind Identification
   Matching algorithms
   Subspace algorithms
   Algorithms for ARMA channels
24 Methods in the frequency domain
   Introduction
   SOS
   Permutation
   Applications
Why convolutive mixtures?

Limitations of linear instantaneous (or memoryless) mixtures:
- a source signal may arrive at the sensors with different delays, following different paths
- a source signal can be filtered due to the channel propagation

Such situations occur in:
- telecommunications, due to multiple paths,
- audio and speech processing, due to sound propagation in air and reverberation in rooms.

For better modelling of mixtures in such situations, one has to consider mixtures of filtered sources, i.e. convolutive mixtures.
Notations

Sources and mixtures
Vector of unknown source signals: s(n) = (s_1(n), ..., s_P(n))^T
Vector of observed mixtures: x(n) = (x_1(n), ..., x_K(n))^T
Delayed samples of the sources contribute to the observations at a given time n; this is modelled by a MIMO (multi-input multi-output) linear time-invariant (LTI) filter with impulse response A(n).
The relation between sources and mixtures is

x(n) = ∑_{k∈Z} A(k) s(n − k)
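The mixing model above can be implemented in a few lines. A minimal sketch with a hypothetical helper `mimo_mix` (causal FIR case, sources assumed zero for n < 0):

```python
import numpy as np

rng = np.random.default_rng(8)

def mimo_mix(A, s):
    # x(n) = sum_k A(k) s(n - k): causal K x P FIR MIMO filter.
    # A has shape (L, K, P); s has shape (P, N); returns x of shape (K, N),
    # with s(n) = 0 assumed for n < 0.
    L, K, P = A.shape
    N = s.shape[1]
    x = np.zeros((K, N))
    for k in range(L):
        x[:, k:] += A[k] @ s[:, :N - k]
    return x

P, K, L, N = 2, 3, 4, 1000
s = rng.standard_normal((P, N))
A = rng.standard_normal((L, K, P))
x = mimo_mix(A, s)
print(x.shape)                                                            # (3, 1000)
print(np.allclose(x[:, 10], sum(A[k] @ s[:, 10 - k] for k in range(L))))  # True
```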
Notations (cont'd)

Separation structure and estimated sources
Estimated sources: y(n) = (y_1(n), ..., y_P(n))^T
y(n) is obtained through a MIMO LTI filter B(n), the objective of which is to "invert" the channel mixture A(n):

y(n) = ∑_{k∈Z} B(k) x(n − k)
z-transform notations

Mixing and separating filters:
A[z] = ∑_{k∈Z} A(k) z^{−k}   and   B[z] = ∑_{k∈Z} B(k) z^{−k}

Global filter
The global filter, denoted G(n), relates y(n) and s(n):
y(n) = ∑_{k∈Z} G(k) s(n − k), ∀n ∈ Z
The global filter depends on the mixing and separating filters:
G(n) = ∑_{k∈Z} B(n − k) A(k), ∀n ∈ Z   and   G[z] = B[z] A[z].
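The composition G(n) = ∑_k B(n − k) A(k) is a convolution of the coefficient sequences, and can be checked by applying A then B to a signal and comparing with G applied directly. A small sketch (hypothetical helper `filt`, causal FIR filters, zero initial conditions assumed):

```python
import numpy as np

rng = np.random.default_rng(9)

def filt(H, u):
    # y(n) = sum_k H(k) u(n - k); H of shape (L, out, in), causal FIR
    L = H.shape[0]
    N = u.shape[1]
    y = np.zeros((H.shape[1], N))
    for k in range(L):
        y[:, k:] += H[k] @ u[:, :N - k]
    return y

# G(n) = sum_k B(n - k) A(k): coefficient-wise convolution of the two filters
A = rng.standard_normal((3, 2, 2))   # mixing filter, 3 taps
B = rng.standard_normal((4, 2, 2))   # separating filter, 4 taps
G = np.zeros((3 + 4 - 1, 2, 2))
for n in range(G.shape[0]):
    for k in range(A.shape[0]):
        if 0 <= n - k < B.shape[0]:
            G[n] += B[n - k] @ A[k]

# sanity check: applying A then B equals applying G
s = rng.standard_normal((2, 200))
print(np.allclose(filt(B, filt(A, s)), filt(G, s)))  # True
```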
Models of sources

The model of the sources is important for modelling practical situations. It also plays an important role in the indeterminacies.

iid sources
The signal {s(n)}, n ∈ Z, is an independent and identically distributed (iid) sequence if the sample distributions are identical and the samples are mutually statistically independent.

Definition of linear sources
The signal {s(n)}, n ∈ Z, is a linear signal if it can be written as the output of a filter fed by an iid sequence.

Definition of nonlinear sources
The signal {s(n)}, n ∈ Z, is a nonlinear signal if it is not linear, i.e. if it cannot be written as the output of a filter fed by an iid sequence.
Other properties of sources

Other priors on the sources, encountered in various situations or application domains, can be exploited, e.g.:
- Nonstationarity, for instance in music or speech processing
- Cyclostationarity, for instance in telecommunications
- Colored sources
- Etc.
Assumptions

Three main assumptions can be made on the filters, leading to different situations and levels of difficulty:
H1: The mixing filter is stable
H2: The mixing filter is causal
H3: The mixing filter is a finite impulse response (FIR) filter
General invertibility results

Stable mixing filter
Proposition 1: For P = K, the stable P × P filter A[z] is invertible by a stable filter iff det(A[z]) ≠ 0, ∀z, |z| = 1.

Stable and causal mixing filter
Proposition 2: For P = K, the stable P × P filter A[z] is invertible by a stable and causal filter iff det(A[z]) ≠ 0 in a domain {z ∈ C, |z| > r} ∪ {∞}, where 0 ≤ r < 1.

Stable mixing filter with P ≠ K
We now assume that P ≠ K.
Proposition 3: The stable K × P filter A[z] is invertible by a stable filter iff rank{A[z]} = P, ∀z, |z| = 1.

Proofs can be found in [CJ10, ch.8]
Contrast criteria (1)

Indeterminacies in convolutive mixtures

- White sources: the only filters preserving both strong whiteness (i.i.d.) and independence are
  T[z] = Λ[z] P
  where P is a permutation, Λ[z] is diagonal, and λ_ii[z] = α_i z^{β_i}, β_i ∈ Z, i.e. pure delays that are integer multiples of the sampling period.
- Colored sources: mutual independence alone does not allow one to recover the source color. Trivial filters are of the same form, where Λ[z] is any diagonal matrix.
- Linear processes: when each source is the output of a linear filter fed by a white process, it is then equivalent to assume that the sources are white.
Contrast criteria (2)

Contrasts based on output cumulants: proofs derived in the static case hold true in the convolutive case when the sources are iid.

Examples: the following are contrasts, if (a) at most one source cumulant is null, (b) p + q > 2 and (c) r ≥ 1:

Υ^(r)_{p,q} = ∑_i |Cum_{p,q}{z_i}|^r    (36)

More generally, if φ is convex and strictly increasing on R⁺:

Υ^φ_{p,q} = ∑_i φ(|Cum_{p,q}{z_i}|)    (37)

For more details, see [CJ10, ch.3] [Com04] [MP97] [Com96] and refs therein.
Para-unitarity

What are the filters preserving second-order whiteness?
The so-called para-unitary filters:

U[z] U[1/z*]^H = I

or, in the time domain:

∑_k U(k) U(k − n)^H = δ(n) I
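The time-domain condition can be checked on a concrete para-unitary filter. A standard degree-one building block (a sketch, assuming a real unit vector v and an orthogonal Q0) is U[z] = (I − v v^T + v v^T z^{−1}) Q0:

```python
import numpy as np

rng = np.random.default_rng(10)

# Degree-one para-unitary building block:
# U(0) = (I - v v^T) Q0,  U(1) = v v^T Q0, with ||v|| = 1 and Q0 orthogonal
n = 3
v = rng.standard_normal(n); v /= np.linalg.norm(v)
Q0, _ = np.linalg.qr(rng.standard_normal((n, n)))
U = np.stack([(np.eye(n) - np.outer(v, v)) @ Q0, np.outer(v, v) @ Q0])

# time-domain para-unitarity: sum_k U(k) U(k - m)^H = delta(m) I
def corr(U, m):
    L = U.shape[0]
    return sum(U[k] @ U[k - m].T for k in range(L) if 0 <= k - m < L)

print(np.allclose(corr(U, 0), np.eye(n)))   # True
print(np.abs(corr(U, 1)).max() < 1e-12)     # True
```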
Contrast criteria (3)

It is also possible to devise new families of contrasts for para-unitary equalizers after prewhitening [CM97] [Com96]. For instance at order 4:

Υ(y) = ∑_i ∑_j ∑_k ∑_p ∑_q |Cum{y_i[n], y_i[n]*, y_j[n − p], y_k[n − q]*}|²    (38)

- In the above, one can conjugate any of the variables y_ℓ, provided at most one source cumulant is null
- Holds true for almost any cumulants of order ≥ 3
- Only two indices need to be identical with the same delay
SIMO mixture with diversity K = 2 (1)

[Figure: a single source s feeds two FIR channels h_1 and h_2, yielding the observations x_1 and x_2.]

Disparity condition:
h_1[z] ∧ h_2[z] = 1 ⇒ x_1[z] ∧ x_2[z] = s[z]

Bézout:
∃ v_1[z], v_2[z] / v_1[z] h_1[z] + v_2[z] h_2[z] = 1

Thus
v_1[z] x_1[z] + v_2[z] x_2[z] = s[z]

The FIR filter h = (h_1, h_2)^T admits the FIR inverse v = (v_1, v_2).
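The Bézout coefficients v_1, v_2 can be computed by solving a Sylvester-type linear system built from convolution matrices. A sketch (random channels, which are coprime with probability 1):

```python
import numpy as np

rng = np.random.default_rng(11)

def conv_mat(h, n_cols):
    # Toeplitz convolution matrix: conv_mat(h, n) @ v == np.convolve(h, v)
    L = len(h)
    M = np.zeros((L + n_cols - 1, n_cols))
    for j in range(n_cols):
        M[j:j + L, j] = h
    return M

# two coprime FIR channels of length 4 (degree 3)
h1 = rng.standard_normal(4)
h2 = rng.standard_normal(4)

# Bezout: find FIR v1, v2 of length 3 with v1*h1 + v2*h2 = delta
Lv = 3
S = np.hstack([conv_mat(h1, Lv), conv_mat(h2, Lv)])  # 6 x 6 system
delta = np.zeros(6); delta[0] = 1.0
v = np.linalg.solve(S, delta)
v1, v2 = v[:Lv], v[Lv:]

print(np.allclose(np.convolve(v1, h1) + np.convolve(v2, h2), delta))  # True
```

The system matrix S is invertible exactly when h_1 and h_2 are coprime, which connects directly to the resultant criterion of the next slide.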
SIMO mixture with diversity K = 2 (2)

Theorem
If two polynomials p(z) = ∑_{i=0}^{m} a_i z^i and q(z) = ∑_{i=0}^{n} b_i z^i are coprime, then the resultant below is nonzero:

R(p, q) = det [ A over B ] ≠ 0

where A is the n × (m + n) band matrix whose rows are the shifted coefficient sequences (a_0, ..., a_m), and B is the m × (m + n) band matrix whose rows are the shifted coefficient sequences (b_0, ..., b_n): this is the Sylvester matrix of p and q.
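The resultant test can be reproduced directly by building the Sylvester matrix from the shifted coefficient rows, as in the theorem (a sketch on a hand-picked example):

```python
import numpy as np

def sylvester(a, b):
    # Sylvester matrix of p(z) = sum a_i z^i (degree m) and q(z) (degree n):
    # n shifted rows of a stacked over m shifted rows of b
    m, n = len(a) - 1, len(b) - 1
    S = np.zeros((m + n, m + n))
    for i in range(n):
        S[i, i:i + m + 1] = a
    for i in range(m):
        S[n + i, i:i + n + 1] = b
    return S

# coprime pair: (z - 1)(z - 2) and (z - 3) -> nonzero resultant
p = np.array([2.0, -3.0, 1.0])    # 2 - 3z + z^2 = (z-1)(z-2)
q = np.array([-3.0, 1.0])         # z - 3
print(abs(np.linalg.det(sylvester(p, q))) > 1e-9)   # True

# common root z = 1: resultant vanishes
r = np.array([-1.0, 1.0])         # z - 1
print(abs(np.linalg.det(sylvester(p, r))) < 1e-9)   # True
```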
Use of time diversity

Time diversity
If the channel bandwidth exceeds the symbol rate 1/T_s (excess bandwidth), then sampling faster than 1/T_s brings extra information on the channel [Slo94] [TXK94].

How to build a SIMO channel from a SISO one?
- sample twice faster: x[k] = x(k T_s/2)
- denote the odd samples x_1[k] = x[2k + 1], and the even samples x_2[k] = x[2k]
- then (x_1[k], x_2[k])^T = (H_1, H_2)^T s[k] := H s[k]

Matrix H is full rank (well conditioned) if there is sufficient excess bandwidth.
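The polyphase split above can be verified on a discrete-time sketch: a T/2-spaced channel applied to an upsampled symbol stream yields two symbol-rate subchannels, one per sample phase (even/odd here, matching the slide's x_2/x_1 up to naming):

```python
import numpy as np

rng = np.random.default_rng(12)

# A SISO channel sampled at twice the symbol rate splits into two
# symbol-rate subchannels (even/odd polyphase components)
h = rng.standard_normal(7)          # T/2-spaced channel impulse response
h_even, h_odd = h[0::2], h[1::2]    # two virtual symbol-rate channels

s = rng.standard_normal(100)        # symbol sequence
# T/2-spaced received signal: upsample s by 2, then convolve with h
x = np.convolve(np.kron(s, [1, 0]), h)[:2 * len(s)]

x_even, x_odd = x[0::2], x[1::2]
print(np.allclose(x_even, np.convolve(s, h_even)[:len(s)]))  # True
print(np.allclose(x_odd,  np.convolve(s, h_odd)[:len(s)]))   # True
```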
MISO Dynamic Extractor: Deflation

- Fixed-step gradient deflation [Tug97]
- No spurious minima of the contrast criterion [CJ10, ch.6] [DL95]
- Optimal line search along a descent direction, OS-KMA [ZC05] [Com02]
After Prewhitening: PAJOD (1)

Technique applied after space-time prewhitening.
One then looks for a para-unitary equalizer, by maximizing the contrast

J_{2,r}(y) = ∑_b ∑_β ||Diag{H^H M(m_b, m_β) H}||²

- Matrix H is now defined differently, and is semi-unitary.
- The matrices M(·) contain cumulants of the whitened observations.
- Contrast (38) is maximized again by a sweeping technique.
PAJOD (2)

PAJOD: Partial Approximate Joint Diagonalization of matrix slices
One actually attempts to diagonalize only a portion of the tensor.

[Figure: schematic of an N × N × L cumulant tensor, of which only a q × q portion of each slice is diagonalized.]
MIMO Blind Equalization

- linear prediction after BI [Com90]
- linear prediction [AMML97] [GL99]
- subspace [MDCM95] [GDM97] [CL97] [AMLM97]
- identifiability issues by subspace techniques [LM00] [Des01]
Equalization after prior Blind Identification

Assume the channel H[z] has been identified, with:
x[z] = H[z] · s[z] + v[z]
An estimate of s[z] is obtained with F[z] ⋆ x[z].

Possible equalizers F[z]:
- Zero-Forcing: F[z] = H[z]^{−1}
- Matched Filter: F[z] = H[1/z*]^H
  (used in MLSE; optimal if the channel noise is AWGN; maximizes the output SNR)
- Minimum Mean Square Error (MMSE):
  F[z] = (H[z] H[1/z*]^H + R_v[z])^{−1} H[1/z*]^H

➽ One can insert soft or hard decisions to stabilize the inverse, or to reduce noise, e.g. Decision Feedback Equalizers (DFE).
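The three equalizers can be compared numerically in the SISO case, where on the unit circle z = e^{jω} the term H[1/z*]^H reduces to conj(H) pointwise. A sketch assuming a short channel and white noise of PSD σ²:

```python
import numpy as np

# SISO illustration of the three equalizers, evaluated on an FFT grid
N = 256
h = np.array([1.0, 0.5, 0.2])              # channel impulse response
sigma2 = 0.01                               # white noise PSD R_v
H = np.fft.fft(h, N)                        # H(e^{jw}) on the grid

F_zf   = 1.0 / H                            # zero-forcing
F_mf   = np.conj(H)                         # matched filter
F_mmse = np.conj(H) / (np.abs(H) ** 2 + sigma2)

# MMSE tends to ZF as noise vanishes, and to the matched-filter shape
# (up to scale) as noise dominates
print(np.allclose(np.conj(H) / (np.abs(H) ** 2 + 1e-12), F_zf, atol=1e-6))
print(np.allclose(np.conj(H) / (np.abs(H) ** 2 + 1e6) * 1e6, F_mf, atol=1e-3))
```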
Overview

Interest
- MA identifiability (second order vs HOS)
- SISO: cumulant matching
- MIMO: cumulant matching and linear prediction (non-monic MA)
- Algebraic approaches, quotient ring
- SIMO: subspace approaches
- MIMO: subspace, IIR, ...
BE vs BI

If the sources s_p[k] are discrete, it is:
- rather easy to define a BE optimization criterion in order to match an output alphabet
- difficult to exploit a source alphabet in BI

Example: the constant-modulus property of an alphabet is mainly used in Blind Equalization: CMA (Constant Modulus Algorithm)
Interest of Blind Identification

- When the mixture does not have a stable inverse
  ➽ one may want to control stability by soft/hard decisions in a Feedback Equalizer
- When the sources are not of interest (e.g. channel characteristics, or localization only)
SISO Cumulant matching (1)<br />
Consider first the SISO case<br />
x[n] = Σ_{k=0}^{L} h[k] s[n − k] + v[n]<br />
where v[n] is Gaussian stationary, and s[n] is 4th-order white<br />
stationary.<br />
Then, by the multilinearity property of cumulants (cf. slide):<br />
Cx(i, j) def= Cum{x[t + i], x[t + j], x[t + L], x[t]} = h[i] h[j] h[L] h[0] cs<br />
with cs def= Cum{s[n], s[n], s[n], s[n]}.<br />
By substitution of the unknown h[L] h[0] cs, one gets a whole<br />
family of equations [SGS94] [Com92b]:<br />
h[i] h[j] Cx(k, ℓ) = h[k] h[ℓ] Cx(i, j), ∀i, j, k, ℓ (39)<br />
SISO Cumulant matching (2)<br />
A solution to the subset of (39) for which j = ℓ can be easily<br />
obtained:<br />
h[i] Cx(k, j) − h[k] Cx(i, j) = 0, 0 ≤ i < k ≤ L, 0 ≤ j ≤ L<br />
(40)<br />
This is a linear system of L(L + 1)²/2 equations in L + 1<br />
unknowns<br />
⇒ Least Squares (LS) solution, up to a scale factor (e.g. h[0] = 1).<br />
Since only 4th-order cumulants are used, it is asymptotically (for<br />
large samples) insensitive to Gaussian noise.<br />
Total Least Squares (TLS) solution possible as well<br />
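The LS identification above can be illustrated numerically. A minimal sketch (not from the course, hypothetical dimensions), assuming exact noise-free cumulants Cx(i, j) = h[i] h[j] h[L] h[0] cs:

```python
import numpy as np

# Sketch: solve the SISO cumulant-matching system (40) in the LS
# sense, assuming EXACT noise-free cumulants and hypothetical sizes.
rng = np.random.default_rng(0)
L = 3
h = rng.standard_normal(L + 1)
h /= h[0]                               # scale convention h[0] = 1
c_s = -2.0                              # source kurtosis (BPSK: -2)
C = c_s * h[L] * h[0] * np.outer(h, h)  # C[i, j] = C_x(i, j)

# Stack equations h[i] C(k,j) - h[k] C(i,j) = 0, 0 <= i < k <= L, 0 <= j <= L
rows = []
for i in range(L + 1):
    for k in range(i + 1, L + 1):
        for j in range(L + 1):
            r = np.zeros(L + 1)
            r[i] += C[k, j]
            r[k] -= C[i, j]
            rows.append(r)
A = np.array(rows)                      # L(L+1)^2/2 equations, L+1 unknowns

# LS solution = right singular vector of the smallest singular value
h_est = np.linalg.svd(A)[2][-1]
h_est /= h_est[0]                       # fix the scale so h_est[0] = 1
```

With exact cumulants the recovered filter matches the true one up to the fixed scale.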
MIMO Cumulant matching (1)<br />
Indeterminacy<br />
Scale (scalar) factor for SISO, but ΛP factor for MIMO<br />
Reduction to a monic model [Com94b]<br />
If H[0] is invertible, one has<br />
y[n] = H[0] s[n], (41)<br />
x[n] = Σ_{k=0}^{L} B[k] y[n − k] + w[n] (42)<br />
where B[k] def= H[k] H[0]^{−1}.<br />
Because B[0] = I, the MA model (42) is said to be monic.<br />
Indeterminacy is only in (41), which is solved by ICA if s[n] is<br />
spatially white<br />
MIMO Cumulant matching (2)<br />
Kronecker notation<br />
Store 4th-order cumulant tensors in vector form:<br />
c_{a,b,c,d} def= vec{Cum{a, b, c, d}}<br />
Then, we have the property (⊡ denotes the term-wise Hadamard<br />
product, 1_α the all-ones vector with the dimension of a, etc.):<br />
c_{a,b,c,d} = E{a ⊗ b ⊗ c ⊗ d} − E{a ⊗ b} ⊗ E{c ⊗ d}<br />
− E{a ⊗ E{b ⊗ c} ⊗ d}<br />
− E{a ⊗ 1_β ⊗ c ⊗ 1_δ} ⊡ E{1_α ⊗ b ⊗ 1_γ ⊗ d}<br />
MIMO Cumulant matching (3)<br />
Assume the monic MA model (42) where s[n] is white in time and L<br />
is fixed, and denote<br />
cx(i, j) def= vec{Cum{x[t + i], x[t + j], x[t + L], x[t]}}<br />
Then we can prove [Com92b]:<br />
Cx(i, j) = Cx(0, j) B[i]^T, ∀j, 0 ≤ j ≤ L<br />
where Cx(i, j) def= Unvec_P(cx) is of size P³ × P<br />
For every fixed i, B[i] is obtained by solving the system of<br />
(L + 1)P⁴ equations in P² unknowns in the LS sense:<br />
[I_P ⊗ Cx(0, j)] vec{B[i]} = cx(i, j) (43)<br />
MIMO Cumulant matching (4)<br />
Summary of the algorithm<br />
Choose a maximum L<br />
Estimate the cumulants of the observations, cx(i, j) for i, j ∈ {0, . . . , L}<br />
Solve the (L + 1) systems (43) in B[i]<br />
Compute the residue y[t] (Linear Prediction)<br />
Solve the ICA problem y[t] = H[0] s[t]<br />
Weaknesses<br />
H[0] must be invertible<br />
FIR model needs to have a stable inverse<br />
Algebraic Blind identification (1)<br />
Types of discrete sources studied<br />
BPSK: b[k] ∈ {−1, 1}, i.i.d.<br />
MSK: m[k + 1] = j m[k] b[k]<br />
QPSK: p[k] ∈ {−1, −j, 1, j}, i.i.d.<br />
π/4-DQPSK: d[k + 1] = e^{jπ/4} d[k] p[k]<br />
8-PSK: q[k] ∈ {e^{jnπ/4}, n ∈ Z}, i.i.d.<br />
etc.<br />
Algebraic Blind identification (2)<br />
Input/Output relations:<br />
For s[k] BPSK: E{x[n] x[n − ℓ]} = s[0]² Σ_{m=0}^{L} h[m] h[m + ℓ]<br />
For s[k] MSK: E{x[n] x[n − ℓ]} = s[0]² Σ_{m=0}^{L} (−1)^m h[m] h[m + ℓ]<br />
For s[k] QPSK: E{x[n]² x[n − ℓ]²} = s[0]² Σ_{m=0}^{L} h[m]² h[m + ℓ]²<br />
etc.<br />
Algebraic Blind identification (3)<br />
Principle:<br />
Compute all roots of the polynomial system in h[n].<br />
For instance for MSK <strong>sources</strong> and a channel of length 2<br />
[GCMT02]:<br />
h[0]² − h[1]² + h[2]² = β0<br />
h[0] h[1] − h[1] h[2] = β1<br />
h[0] h[2] = β2<br />
Choose among these roots the one that best matches the I/O<br />
correlation:<br />
E{x[n] x[n − ℓ]*} = Σ_{m=0}^{L} h[m] h[m + ℓ]*<br />
SIMO mixture (1)<br />
FIR of length L and dimension K:<br />
x(n) = Σ_{i=0}^{L} h(i) s(n − i) + b(n)<br />
with: E{b(m) b(n)^H} = σ_b² I δ(m − n) and E{b(m) s(n)*} = 0<br />
For T + 1 successive values:<br />
⎛ x(n)   ⎞   ⎛ h(0) h(1) . . . h(L)   0   . . .   0  ⎞ ⎛ s(n)     ⎞<br />
⎜ x(n−1) ⎟ = ⎜  0   h(0) . . .      h(L)  . . .   0  ⎟ ⎜   .      ⎟<br />
⎝ x(n−T) ⎠   ⎝  0   . . .  0   h(0) h(1)  . . . h(L) ⎠ ⎝ s(n−T−L) ⎠<br />
Or in compact form:<br />
X(n : n − T) = H_T S(n : n − T − L) (44)<br />
Here, H_T is of size (T + 1)K × (T + L + 1)<br />
SIMO mixture (2)<br />
Condition of “column” matrix<br />
H has strictly more rows than columns iff<br />
(T + 1)K > T + L + 1<br />
⇔ T > L/(K − 1) − 1 ⇐ T ≥ L<br />
It suffices that T exceeds channel memory.<br />
Disparity condition<br />
Columns of H_T are linearly independent iff<br />
h[z] ≠ 0, ∀z<br />
Noise subspace<br />
Under these 2 conditions, there exists a “noise subspace”:<br />
∃v / v^H H_T = 0<br />
SIMO mixture (3)<br />
Properties of vectors in the null space<br />
Since Rx def= E{X X^H} = H_T H_T^H + σ_b² I,<br />
vectors v^(p) of the noise space can be computed from Rx:<br />
Rx v^(p) = σ_b² v^(p)<br />
And since convolution is commutative:<br />
v^(p)H H_T = h^H V^(p)<br />
where V^(p) is block-Toeplitz, built on v^(p).<br />
Thus h^H = [h(0)^H, h(1)^H, . . . , h(L)^H] is obtained by<br />
computing the left singular vector common to the V^(p).<br />
SIMO mixture (4)<br />
Summary of the SIMO Subspace Algorithm<br />
Choose T ≥ L<br />
Compute Rx, correlation matrix of size (T + 1)K<br />
Compute the d = T(K − 1) + K − L − 1 vectors v^(p) of the<br />
noise space<br />
Compute the vector h minimizing the quadratic form<br />
h^H [ Σ_{p=1}^{d} V^(p) V^(p)H ] h<br />
Cut h into L + 1 slices h(i) of length K<br />
Under the assumed hypotheses, the solution is unique up to a<br />
scale factor [MDCM95]<br />
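The subspace algorithm above can be sketched as follows (an illustrative implementation under stated assumptions: noiseless data, unit-variance white sources, hypothetical K = 3, L = 2):

```python
import numpy as np

# Sketch of the SIMO noise-subspace method, noiseless case.
rng = np.random.default_rng(1)
K, L = 3, 2
T = L                                    # T >= L
h = rng.standard_normal((L + 1, K))      # h[i] = i-th channel tap (length K)

# Block-Toeplitz filtering matrix H_T, size (T+1)K x (T+L+1)
HT = np.zeros(((T + 1) * K, T + L + 1))
for t in range(T + 1):
    for i in range(L + 1):
        HT[t * K:(t + 1) * K, t + i] = h[i]

# Noiseless covariance for unit-variance white sources
Rx = HT @ HT.T
w, U = np.linalg.eigh(Rx)                # eigenvalues ascending
d = (T + 1) * K - (T + L + 1)            # noise-subspace dimension
Q = np.zeros(((L + 1) * K, (L + 1) * K))
for p in range(d):
    v = U[:, p].reshape(T + 1, K)        # noise vector, cut into T+1 slices
    # Block-Toeplitz V(p): block row i, column m holds v(m - i)
    Vp = np.zeros(((L + 1) * K, T + L + 1))
    for i in range(L + 1):
        for m in range(T + L + 1):
            if 0 <= m - i <= T:
                Vp[i * K:(i + 1) * K, m] = v[m - i]
    Q += Vp @ Vp.T

# Stacked h minimizes the quadratic form -> smallest eigenvector of Q
h_est = np.linalg.eigh(Q)[1][:, 0].reshape(L + 1, K)
idx = np.argmax(np.abs(h_est))
scale = h_est.flat[idx] / h.flat[idx]    # solution unique up to a scale factor
```

The recovered taps match the true channel up to the scale ambiguity stated in the theorem.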
SIMO mixture (5)<br />
Summary of the SIMO Subspace Algorithm when K = 2<br />
Choose T = L. There is a single vector v in the noise space<br />
Compute Rx, correlation matrix of size (T + 1)K<br />
Compute the vector v of the noise space<br />
Cut v into L + 1 slices v(i) of length K = 2<br />
Compute h(i) = [0 −1; 1 0] v(i)<br />
In fact xi = hi ⋆ s ⇒ h2 ⋆ x1 − h1 ⋆ x2 = 0<br />
Approach called SRM (Subchannel Response Matching)<br />
[XLTK95] [GN95]<br />
SISO Identifiability<br />
Second-order statistics<br />
αℓ = E{x[n] x[n − ℓ]*} ➽ allows estimating |h[m]|<br />
βℓ = E{x[n] x[n − ℓ]} ➽ allows estimating h[m] if E{s²} ≠ 0<br />
Fourth-order statistics ➽ many (polynomial) additional equations<br />
γ_{0jkℓ} = Cum{x[n], x[n − j], x[n − k], x[n − ℓ]}<br />
γ_{0j}^{kℓ} = Cum{x[n], x[n − j], x[n − k]*, x[n − ℓ]*}<br />
If some sources are 2nd-order circular, sample statistics of order<br />
higher than 2 are mandatory, but otherwise not [GCMT02]!<br />
SIMO Identifiability<br />
With receive diversity, (deterministic) identifiability conditions are<br />
weaker [Hua96] [XLTK95]<br />
Definition A length-N input sequence s[n] has p modes iff<br />
the Hankel matrix below is full row rank:<br />
⎛ s[1]  s[2]      . . .  s[N − p + 1] ⎞<br />
⎜ s[2]  s[3]      . . .  s[N − p + 2] ⎟<br />
⎜  .     .        . . .       .       ⎟<br />
⎝ s[p]  s[p + 1]  . . .  s[N]         ⎠<br />
Theorem A K × L FIR channel h is identifiable if:<br />
Channels hk[z] do not have common zeros<br />
The observation length of each xk[n] is at least L + 1<br />
The input sequence has at least L + 1 modes<br />
(sufficiently exciting)<br />
Subspace algorithm for MIMO mixtures<br />
Similarly to the SIMO case, we have the compact form:<br />
X(n) = HT S(n) + B(n)<br />
where H is now built on the matrices H(k), 0 ≤ k ≤ L, and is of<br />
size (T + 1)K × (T + L + 1)P.<br />
For large enough T , this matrix is "column shaped"<br />
Again Rx = H_T H_T^H + σ_b² I<br />
But now, vectors of the noise space characterize H[z] only up to<br />
a constant post-multiplicative matrix ⇒ ICA must be used<br />
afterwards<br />
Foundations of the MIMO subspace algorithm are more<br />
complicated [Loubaton’99]<br />
In the MIMO case, HOS are in general mandatory.<br />
MIMO Subspace parameterization<br />
The basic idea is to expand the filtering operations, in which matrix<br />
entries are polynomials in z, into large matrices with scalar entries.<br />
Source and observation vectors<br />
Vector of sources:<br />
S(n) = (s(n), s(n − 1), . . . , s(n − (L + L ′ ) + 1)), where<br />
s(k) = (s1(k), s2(k), . . . , sP(k))<br />
Vector of observations:<br />
X(n) = (x(n), x(n − 1), . . . , x(n − L ′ + 1)), where<br />
x(k) = (x1(k), x2(k), . . . , xK (k))<br />
Subspace parameterization<br />
L and L′ correspond to channel length and window size,<br />
respectively; they must satisfy KL′ ≥ P(L + L′)<br />
The above condition ensures a determined model, i.e. with an<br />
over-determined matrix H, of size KL′ × P(L + L′)<br />
Mixing matrix<br />
With the above notations, the model x(n) = A[z]s(n) can be<br />
written X(n) = HS(n), where H is a huge Sylvester<br />
block-Toeplitz matrix:<br />
⎛ x(n)          ⎞   ⎛ A(0) A(1) . . . A(L)   0   . . .   0  ⎞ ⎛ s(n)              ⎞<br />
⎜ x(n − 1)      ⎟ = ⎜  0   A(0) A(1) . . .  A(L)  0      0  ⎟ ⎜ s(n − 1)          ⎟<br />
⎜   .           ⎟   ⎜  .    .    .    .      .    .      .  ⎟ ⎜   .               ⎟<br />
⎝ x(n − L′ + 1) ⎠   ⎝  0   . . .  0   A(0)  A(1) . . . A(L) ⎠ ⎝ s(n − (L + L′) + 1) ⎠<br />
Subspace parameterization<br />
Source separation<br />
If L and L′ satisfy the above condition and the<br />
channel condition, there exists a left inverse of H which provides<br />
source separation.<br />
More details on algorithms for estimating the left inverse can<br />
be found in [MJL00]<br />
Main drawbacks of the method are: huge matrix; the number of<br />
observations (K) must be strictly larger than the number of sources<br />
SISO ARMA mixtures<br />
What are the tools when the channel is IIR?<br />
In general, just consider it as a FIR filter (truncation) → already<br />
seen<br />
But it is also possible to assume the presence of a recursive part<br />
Define the I/O relation: Σ_{i=0}^{p} ai x[n − i] = Σ_{j=0}^{q} bj w[n − j]<br />
where w[·] is i.i.d. and a0 = b0 = 1<br />
Second order: cx(τ) def= E{x[n] x[n + τ]} can be used to identify<br />
the ak:<br />
Σ_{k=0}^{p} ak cx(τ − k) = 0, ∀τ > q<br />
Then compute the residue and identify the bℓ with HOS (cf. slide)<br />
Also possible with HOS only for the AR part [SGS94]<br />
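The second-order AR identification step can be checked on a hypothetical ARMA(2,1) model (a sketch, using exact autocovariances computed from the impulse response rather than sample estimates):

```python
import numpy as np

# Sketch: identify the AR coefficients a_k from second-order statistics
# via the modified Yule-Walker equations sum_k a_k c_x(tau - k) = 0, tau > q.
a = np.array([1.0, -0.5, 0.2])   # a_0 = 1, stable AR part (hypothetical)
b = np.array([1.0, 0.4])         # b_0 = 1, MA part (hypothetical)
p, q = len(a) - 1, len(b) - 1

# Impulse response g[n] of B(z)/A(z): g[n] = b_n - sum_{k>=1} a_k g[n-k]
N = 200
g = np.zeros(N)
for n in range(N):
    g[n] = (b[n] if n <= q else 0.0) \
        - sum(a[k] * g[n - k] for k in range(1, p + 1) if n - k >= 0)

# Exact autocovariance for a unit-variance white input:
# c(tau) = sum_n g[n] g[n + tau]
def c(tau):
    tau = abs(tau)
    return float(g[:N - tau] @ g[tau:])

# Modified Yule-Walker system for tau = q+1, ..., q+p
M = np.array([[c(tau - k) for k in range(1, p + 1)]
              for tau in range(q + 1, q + p + 1)])
rhs = -np.array([c(tau) for tau in range(q + 1, q + p + 1)])
a_est = np.linalg.solve(M, rhs)  # estimates a_1, ..., a_p
```

The residue can then be computed and the MA part identified with HOS, as stated above.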
MIMO ARMA mixtures (1)<br />
Results of the SISO case can be extended [Com92b]<br />
Take a K-dimensional ARMA model, with I/O relation:<br />
Σ_{i=0}^{p} Ai x[n − i] = Σ_{j=0}^{q} Bj w[n − j]<br />
where w[·] is i.i.d., A0 = I and B0 is invertible<br />
For instance at order 4, AR identification is based on:<br />
Σ_{j=1}^{p} Aj c̄x(t, τ − j) = −c̄x(t, τ), ∀τ > q, ∀t<br />
with c̄x(i, j) def= Unvec_K(Cum{x[n], x[n], x[n + i], x[n + j]})<br />
MIMO ARMA mixtures (2)<br />
Limitations<br />
Sources need to be linear processes<br />
B0 needs to be invertible<br />
AR residuals need to be computed (MA filtering) to compute<br />
the Bi<br />
One can compute the MA residuals (AR filtering) if the input s[n]<br />
is requested → but this filtering might be unstable<br />
Representation in the frequency domain<br />
Fourier domain?<br />
One difficulty in the convolutive model x(t) = A(t) ∗ s(t) is<br />
related to the convolution ∗<br />
Using Plancherel theorem, ∗ becomes a simple product in the<br />
Fourier domain<br />
Moreover:<br />
Fourier transform preserves linearity,<br />
information is the same in time and frequency domain.<br />
Representation in frequency domain (cont'd)<br />
BSS equations in Fourier domain<br />
Taking the Fourier transform, x(t) = A(t) ∗ s(t) becomes:<br />
X(ν) = A(ν)S(ν)<br />
∗ is cancelled; but one has just one equation: not sufficient<br />
for estimating A(ν) or its inverse!<br />
Separation problem in the frequency domain<br />
BSS equations in Fourier domain with STFT<br />
Applying the short-term Fourier transform (STFT) on sliding<br />
windows:<br />
X(ν, t) = A(ν)S(ν, t)<br />
A(ν) is not modified if the channel impulse response is time<br />
invariant. Of course, it is modified if the channel changes, e.g. if<br />
sources are moving.<br />
For each value ν0 of the variable ν, one has an instantaneous<br />
mixture, with complex-valued entries:<br />
X(ν0, t) = A(ν0)S(ν0, t)<br />
In the Fourier domain, the (real-valued) convolutive mixing<br />
equation is then transformed into an (infinite) set of<br />
instantaneous (complex-valued) mixing equations<br />
Separation problem in the discrete frequency domain<br />
BSS equations in the discrete Fourier domain<br />
With discrete data in time, similar equations are obtained by the<br />
discrete Fourier transform (DFT):<br />
X(νk, t) = A(νk)S(νk, t)<br />
where νk corresponds to the k-th frequency bin of the DFT.<br />
Denoting Nν the number of bins, the (real-valued) convolutive<br />
mixing equation is then transformed into a set of Nν instantaneous<br />
(complex-valued) mixing equations<br />
Solving the separation problem in frequency domain<br />
Solving in any frequency bin<br />
Based on observations X(νk, t),<br />
X(νk, t) = A(νk)S(νk, t)<br />
can be solved using any method for instantaneous linear<br />
mixtures.<br />
Source reconstruction<br />
For reconstructing the wideband sources, we just have to add the<br />
estimated sources in each frequency bin...<br />
Just??? No, due to permutation (and scale) indeterminacies!<br />
B(νk) = P(νk)A(νk)^{−1}<br />
Problem if P(νk) ≠ P(νl), k ≠ l<br />
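The per-bin instantaneous-mixture property can be verified numerically; the sketch below uses a circular convolution so that the DFT diagonalizes the mixing exactly (all dimensions are hypothetical):

```python
import numpy as np

# Sketch: the DFT turns a (circular) convolutive mixture into one
# instantaneous complex mixture per frequency bin, X(nu) = A(nu) S(nu).
rng = np.random.default_rng(2)
Nfft, P, K, L = 64, 2, 2, 3
s = rng.standard_normal((P, Nfft))       # sources
A = rng.standard_normal((K, P, L + 1))   # FIR mixing filters a_kp[l]

# Circular convolutive mixture x_k[n] = sum_{p,l} a_kp[l] s_p[n - l]
x = np.zeros((K, Nfft))
for k in range(K):
    for p in range(P):
        for l in range(L + 1):
            x[k] += A[k, p, l] * np.roll(s[p], l)

S = np.fft.fft(s, axis=1)
X = np.fft.fft(x, axis=1)
Af = np.fft.fft(A, Nfft, axis=2)         # one K x P matrix A(nu_m) per bin

# In every bin, the mixture is instantaneous and complex-valued
ok = all(np.allclose(X[:, m], Af[:, :, m] @ S[:, m]) for m in range(Nfft))
```

With a true linear (non-circular) convolution and STFT windows, the relation holds only approximately, which is the usual framework in practice.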
SOS in each frequency bin<br />
SOS in frequency domain<br />
If sources are not i.i.d., we can use SOS methods, exploiting<br />
either coloration or nonstationarity.<br />
In each frequency band νk, the PSD (Fourier transform of the<br />
covariance matrix) satisfies, ∀τ:<br />
E[X(νk, t)X^H(νk, t − τ)] = A(νk)E[S(νk, t)S^H(νk, t − τ)]A^H(νk)<br />
With the notation Sx(νk, τ) = E[X(νk, t)X^H(νk, t − τ)], one<br />
obtains equations similar to those in the time domain (due to FT<br />
linearity):<br />
Sx(νk, τ) = A(νk)Ss(νk, τ)A^H(νk), ∀τ<br />
Joint diagonalization can be used on a set of spectral matrices:<br />
Sx(νk, τ), τ = 0, 1, . . . , K for colored sources,<br />
Sx,Wi(νk, 0) = E_{Wi}[X(νk, t)X^H(νk, t)], computed on time<br />
windows Wi, exploiting source nonstationarity<br />
Solving the permutation indeterminacies<br />
Many methods and criteria, exploiting two main ideas:<br />
Continuity of filters<br />
Smooth variation of B(ν):<br />
short impulse response [PS00]<br />
initialization of the B(νk) estimate by B̂(νk−1) [PSB03]<br />
Problem with impulse responses containing echoes, e.g. in case<br />
of reverberation<br />
Continuity of sources<br />
Time continuity of the PSD of the estimated sources, over<br />
neighboring frequency bins [SP03]<br />
Results of permutation regularization<br />
Most of the regularization criteria perform well when source<br />
power or channel coefficients are large enough<br />
Basically, there remain<br />
a global permutation (not important),<br />
and frequency-block permutations, since a regularization failure<br />
in one frequency bin usually implies failure on a whole block!<br />
Reprinted from the PhD defense talk of Bertrand Rivet<br />
Applications<br />
Main application domains of convolutive mixtures are:<br />
Telecommunications<br />
Acoustics, audio with speech and music processing<br />
For speech applications, refer to the lecture of L. Daudet and<br />
the book [MLS07]<br />
VII. Non Linear Mixtures<br />
Contents of course VII<br />
25 Identifiability using ICA<br />
Problem<br />
General results<br />
26 ICA for NL mixtures<br />
General NL<br />
PNL mixtures<br />
Inversion of Wiener system<br />
27 Discussion<br />
Model and question<br />
Model<br />
K noiseless nonlinear mixtures of P independent sources<br />
x(t) = A(s(t))<br />
Question<br />
Assuming the mixing mapping A is invertible, is it possible to<br />
estimate an inverse mapping B using independence?<br />
In other words: output independence ⇔ source separation?<br />
Darmois's results<br />
Indeterminacy<br />
Let si and sj be 2 independent random variables; then fi(si) and<br />
fj(sj) are independent too<br />
Source separation is achieved if yi = hi(sσ(i)), i.e. the global<br />
mapping G = B ◦ A is diagonal, up to a permutation.<br />
Such diagonal (up to a permutation) mappings G will be<br />
defined as trivial mappings<br />
Nonlinear mixtures are not identifiable using ICA<br />
There always exist non-diagonal nonlinear mixing mappings<br />
which preserve independence<br />
Darmois [Dar53] proposed a general method for constructing<br />
such mappings. The idea has since been used by Hyvärinen and<br />
Pajunen [HP99].<br />
Darmois's construction<br />
Assumptions<br />
Consider the mapping y = G(x) where G is invertible<br />
Then: py(y) = px(x) |det JG(x)|^{−1}<br />
Without loss of generality, one can assume the yi are uniform<br />
on [0, 1]<br />
Mapping G preserving independence<br />
We are looking for G such that the random vector y is<br />
independent, i.e.<br />
|det JG(x)| = px(x)<br />
A possible mapping (1/2)<br />
Choosing the non-diagonal mapping G<br />
g1(x) = g1(x1)<br />
g2(x) = g2(x1, x2)<br />
.<br />
gK (x) = gK (x1, . . . , xK )<br />
leads to a lower-triangular Jacobian:<br />
⎛ ∂g1/∂x1     0        . . .     0       ⎞<br />
⎜ ∂g2/∂x1  ∂g2/∂x2     0       . . .     ⎟<br />
⎜    .         .       . . .      .      ⎟<br />
⎝ ∂gK/∂x1  ∂gK/∂x2     . . .  ∂gK/∂xK    ⎠<br />
whose determinant reduces to |det JG| = ∏_i ∂gi/∂xi<br />
A possible mapping (2/2)<br />
The last condition implies<br />
∏_i ∂gi/∂xi = px(x) = p(x1) p(x2|x1) . . . p(xK|x1, . . . , xK−1)<br />
A possible solution is:<br />
g1(x) = F_{x1}(x1)<br />
g2(x) = F_{x2|x1}(x1, x2)<br />
. . .<br />
gK(x) = F_{xK|x1,...,xK−1}(x1, . . . , xK)<br />
where F_x is the (conditional) cumulative distribution function of x<br />
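Darmois's construction can be illustrated on a correlated Gaussian pair (a sketch with a hypothetical mixing matrix; the conditional law used below is specific to this choice of A):

```python
import numpy as np
from math import erf, sqrt

# Sketch of Darmois's construction: g1 = F_{x1}, g2 = F_{x2|x1} is a
# NON-diagonal mapping whose output is nevertheless independent.
rng = np.random.default_rng(5)
n = 200_000
s = rng.standard_normal((2, n))             # independent sources
A = np.array([[1.0, 0.0], [0.8, 0.6]])      # hypothetical mixing: x2 depends on x1
x1, x2 = A @ s

# Standard normal CDF via the error function
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))
y1 = Phi(x1)                     # F_{x1}: x1 ~ N(0, 1)
y2 = Phi((x2 - 0.8 * x1) / 0.6)  # x2 | x1 ~ N(0.8 x1, 0.6^2) for this A

# y1 and y2 are empirically uncorrelated and uniform on [0, 1]
corr = np.corrcoef(y1, y2)[0, 1]
```

The mapping (x1, x2) → (y1, y2) is invertible and not diagonal, yet its output is independent: exactly the non-identifiability obstacle described above.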
A simple example [JBZH04]<br />
Consider 2 independent Gaussian variables x1 and x2 with joint pdf<br />
px(x1, x2) = (1/2π) exp(−(x1² + x2²)/2)<br />
Consider the following mapping and its Jacobian:<br />
x1 = r cos θ, x2 = r sin θ, J = [cos θ  −r sin θ; sin θ  r cos θ]<br />
The joint pdf of r and θ is<br />
p_{rθ}(r, θ) = (r/2π) exp(−r²/2) if (r, θ) ∈ R⁺ × [0, 2π], 0 otherwise<br />
Although r and θ both depend on x1 and x2, they are<br />
statistically independent!<br />
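This counterexample is easy to verify by simulation (a sketch; the correlation thresholds are loose statistical bounds, not exact values):

```python
import numpy as np

# Numerical illustration of the polar counterexample: for a 2D standard
# Gaussian, r and theta are independent although each depends on both
# x1 and x2.
rng = np.random.default_rng(3)
n = 200_000
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
r = np.hypot(x1, x2)
theta = np.arctan2(x2, x1)

# Sample correlations between functions of r and theta stay near zero
c1 = np.corrcoef(r, theta)[0, 1]
c2 = np.corrcoef(r**2, np.cos(theta))[0, 1]
```

Vanishing correlations between arbitrary functions of r and θ are what independence predicts here; a full independence test would require more than two statistics.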
Conclusions<br />
General results<br />
Statistical independence is not sufficient for ensuring<br />
identifiability of NL mixtures<br />
For any sources, there exist invertible mappings with<br />
non-diagonal Jacobian (i.e. mixing mappings) which preserve<br />
statistical independence<br />
If the mapping can be identified, sources can be recovered up<br />
to a NL mapping (and permutation)<br />
For overcoming the problem,<br />
one can consider approaches which reduce the set of nontrivial<br />
mappings which preserve independence<br />
Regularization for avoiding nontrivial mappings preserving<br />
independence<br />
Smooth mappings<br />
Restrict B to be a smooth mapping [MA99, TWZ01]<br />
Structurally constrained mappings<br />
Restrict B by structural constraints: post-nonlinear mixtures<br />
[TJ99b], mixtures satisfying addition theorem [KLR73, EK02],<br />
bilinear and linear-quadratic mixtures [HD03, DH09]<br />
Priors on sources<br />
Consider priors on sources: bounded sources [BZJN02],<br />
Markov [LJH04] or colored sources [HJ03]<br />
Almeida’s claim<br />
Smooth mappings<br />
For smooth mappings (e.g. mappings provided by multi-layer<br />
perceptrons (MLP)), ICA can lead to separation<br />
Counterexample [JBZH04]<br />
Smoothness of the mappings is not a sufficient prior<br />
Structural constraints: general results<br />
Trivial mappings: definition<br />
Definition: H is a trivial mapping if it transforms any random<br />
vector with independent components into another random vector<br />
with independent components.<br />
The set of trivial mappings will be denoted Z<br />
Trivial mappings: properties<br />
A trivial mapping is thus a mapping preserving independence<br />
for any random vector<br />
It can be shown that a trivial mapping satisfies<br />
Hi(s1, . . . , sK) = hi(sσ(i)), ∀i = 1, . . . , K<br />
The Jacobian matrix of a trivial mapping is a diagonal matrix<br />
up to a permutation<br />
Structural constraints<br />
There is an infinity of nontrivial mappings preserving independence<br />
Constrained model of mixtures<br />
If the mapping G = B ◦ A is constrained to belong to a model<br />
C, the indeterminacies can be reduced, and hopefully cancelled<br />
Consider Ω = {F_{s1}, . . . , F_{sP}}, the set of source distributions<br />
for which there exists G ∈ C − Z preserving independence<br />
Ω then contains all the (particular) source distributions which<br />
cannot be separated by mappings belonging to C.<br />
Separation is then possible<br />
(1) for source distributions which do not belong to Ω, (2) with<br />
indeterminacies in G ∈ Z ∩ C<br />
Structural constraints<br />
Case of linear memoryless mixtures<br />
C is the set of square matrices<br />
Z ∩ C is the set of square matrices which are the product of a<br />
diagonal matrix and a permutation matrix<br />
Ω is the set of distributions which contain at least 2 Gaussian<br />
sources (consequence of the Darmois-Skitovich theorem)<br />
Conclusions<br />
For linear memoryless mixtures, source separation is possible<br />
using statistical independence (1) for sources which are not in<br />
Ω (i.e. at most one Gaussian source) and (2) with scale and<br />
permutation indeterminacies.<br />
Structural constraints: PNL mixtures<br />
Post-nonlinear (PNL) mixtures<br />
PNL mixtures are particular nonlinear mixtures with structural
constraints: a linear part, followed by componentwise nonlinear
mappings.
PNL mixtures are realistic enough: linear channel, nonlinear sensors
PNL identifiability [TJ99a, AJ05] with a suitable B:
if (1) at most one source is Gaussian, (2) the mixing matrix
has at least two nonzero entries per row and per column, and
(3) the NL mappings fi are invertible and satisfy fi′(0) ≠ 0,
then y has independent components iff gi ◦ fi is linear and BA = DP
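To make the PNL structure concrete, here is a minimal simulation sketch (the matrix A and the nonlinearities fi below are illustrative choices satisfying the conditions above, not taken from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
s = rng.uniform(-1, 1, size=(2, n))      # independent non-Gaussian sources
A = np.array([[1.0, 0.6],                # mixing matrix with at least two
              [0.5, 1.0]])               # nonzero entries per row and column
f1 = np.tanh                             # invertible, f1'(0) != 0
f2 = lambda v: v + 0.2 * v**3            # invertible, f2'(0) != 0
z = A @ s                                # linear stage of the PNL model
x = np.vstack([f1(z[0]), f2(z[1])])      # componentwise nonlinear sensors
# A separating structure y = B g(x) must blindly estimate g_i ~ f_i^{-1}
# and B ~ A^{-1}, up to scale and permutation.
print(x.shape)  # (2, 10000)
```

The separating functions gi and matrix B are what the algorithms of the next slides estimate blindly.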
Comments<br />
Structural constraints: PNL mixtures<br />
PNL mixtures have the same indeterminacies as linear mixtures!
Separation does not work if A does not mix the sources!
PNL mixtures are a special case of mixtures satisfying an addition theorem
(see below)
Extension of the D-S theorem<br />
Constrained NL mappings<br />
The D-S theorem has been extended to NL mappings A,
continuous at least separately in each variable, satisfying an
addition theorem [KLR73]
f(x + y) = A[f(x), f(y)] = f(x) ◦ f(y)
Example:
f(·) = tan(·)
A(u, v) = (u + v)/(1 − uv)
tan(x + y) = (tan(x) + tan(y)) / (1 − tan(x) tan(y))
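A quick numerical sanity check of this addition theorem, using the standard tan addition identity (the test points x, y are arbitrary):

```python
import math

def A_op(u, v):
    # addition operation associated with f = tan:
    # tan(x + y) = (tan x + tan y) / (1 - tan x tan y)
    return (u + v) / (1 - u * v)

x, y = 0.3, 0.5
lhs = math.tan(x + y)
rhs = A_op(math.tan(x), math.tan(y))
print(abs(lhs - rhs))  # ~ 0 (floating-point error only)
```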
If (E, ◦) is an Abelian group, one can define a multiplication,
denoted ∗, satisfying
f(cx) = c ∗ f(x), or equivalently c f⁻¹(u) = f⁻¹(c ∗ u)
c1 f⁻¹(u1) + . . . + ck f⁻¹(uk) = f⁻¹(c1 ∗ u1 ◦ . . . ◦ ck ∗ uk)
Theorem<br />
Constrained NL mappings<br />
Let X1, . . . , Xn be independent variables such that
E1 = a1 ∗ X1 ◦ . . . ◦ an ∗ Xn
E2 = b1 ∗ X1 ◦ . . . ◦ bn ∗ Xn
are independent, where ∗ and ◦ satisfy the above
conditions; then f⁻¹(Xi) is normally distributed if ai bi ≠ 0.
The proof is based on applying the D-S theorem to f⁻¹(E1)
and f⁻¹(E2):
f⁻¹(E1) = f⁻¹(a1 ∗ X1 ◦ . . . ◦ an ∗ Xn) = a1 f⁻¹(X1) + . . . + an f⁻¹(Xn)
f⁻¹(E2) = f⁻¹(b1 ∗ X1 ◦ . . . ◦ bn ∗ Xn) = b1 f⁻¹(X1) + . . . + bn f⁻¹(Xn)
Comments<br />
Constrained NL mappings<br />
These NL mappings are basically linearisable mappings<br />
PNL mixtures are particular NL mappings satisfying the addition
theorem
A(u1, u2) = ( f1(a11 f1⁻¹(u1) + a12 f2⁻¹(u2)) ,
              f2(a21 f1⁻¹(u1) + a22 f2⁻¹(u2)) )
In fact, denoting s1 = f1⁻¹(u1) and s2 = f2⁻¹(u2), one gets
PNL mixtures:
A(f1(s1), f2(s2)) = ( f1(a11 s1 + a12 s2) ,
                      f2(a21 s1 + a22 s2) )
Constraints on sources: bounded sources
PNL with bounded sources [BZJN02]
Theorem: A component-wise mapping preserves the linearity of
the support borders iff it is a linear mapping
New proof for identifiability and separability of PNL mixtures,
restricted to bounded sources
Algorithm with independent criteria and estimations for the linear
and nonlinear parts of the PNL separating structure
Constraints on sources: temporally correlated sources
Temporally correlated sources
Modelled by AR or Markov models [HJ03]
The set of NL mappings preserving independence is reduced if
sources are non-iid, due to the stronger independence criterion
(independence of random processes)
Constraints on sources: temporally correlated sources
Darmois’s construction<br />
For 2 mixtures of 2 sources
y1(t) = g1(x1(t), x2(t)) = FX1(x1(t))
y2(t) = g2(x1(t), x2(t)) = FX2|X1(x1(t), x2(t))
If y1(t) and y2(t) are independent,
pY1Y2(y1(t), y2(t)) = pY1(y1(t)) pY2(y2(t))
Since sources are temporally correlated, one can prove that
generally:
pY1Y2(y1(t + 1), y2(t)) ≠ pY1(y1(t + 1)) pY2(y2(t))
i.e. p(FX1(x1(t + 1)), FX2|X1(x1(t), x2(t))) ≠
p(FX1(x1(t + 1))) p(FX2|X1(x1(t), x2(t)))
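Darmois's construction can be made concrete when the conditional CDF is known in closed form, e.g. for a correlated Gaussian pair (an illustrative sketch with an assumed distribution, not the course's example): at equal times the two outputs are independent uniforms.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n = 50_000
rho = 0.8
x1 = rng.standard_normal(n)
x2 = rho * x1 + sqrt(1 - rho**2) * rng.standard_normal(n)  # dependent pair

# standard normal CDF, applied elementwise
Phi = np.vectorize(lambda v: 0.5 * (1.0 + erf(v / sqrt(2.0))))

y1 = Phi(x1)                                   # F_X1(x1(t))
y2 = Phi((x2 - rho * x1) / sqrt(1 - rho**2))   # F_{X2|X1}(x1(t), x2(t))
# y1(t), y2(t) are independent Uniform(0,1) at equal times
print(abs(np.corrcoef(y1, y2)[0, 1]))  # small: decorrelated (in fact independent)
```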
Constraints on sources: temporally correlated sources
After some computations, one can show the relation becomes:
∫ pA|BC(a | b, c) pB(b) db = pA(a)
But the left-hand side is a function of the variables a and c,
while the right-hand side depends only on a
Consequently, the above mapping is generally not a mixing
mapping preserving independence for colored sources.
The set of mixing mappings preserving independence is then
reduced for colored sources.
Conclusions<br />
For NL mixtures, independence generally does not ensure
identifiability and separability
Regularization is required for reducing solutions to trivial
mappings:
Generally, smoothness is not a sufficient constraint
Structural constraints lead to particular identifiable and
separable NL mixtures
Priors on the sources can reduce the set of solutions
Extension to NL ICA
Are NL mixtures useful? Theoretical curiosity or practical
applications?
Mutual information in NL mixtures
Mutual information and its derivative
I(Y) = Σi H(Yi) − H(Y) = Σi H(Yi) − H(X) − E[log |det JB|]
Estimating equation: ∂I(Y)/∂Θ = 0, where Θ represents the
parameter vector
It then leads to the following equations:
∂I(Y)/∂Θ = Σi ∂H(Yi)/∂Θ − ∂E[log |det JB|]/∂Θ
         = −Σi E[ (d log pYi(yi)/dyi) ∂yi/∂Θ ] − ∂E[log |det JB|]/∂Θ = 0
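The quantity I(Y) = Σi H(Yi) − H(Y) can be estimated crudely with histograms; this sketch (the bin count and the Gaussian test signals are illustrative assumptions) shows it near zero for independent components and clearly positive after mixing:

```python
import numpy as np

def mutual_information(y1, y2, bins=40):
    """Histogram estimate of I(Y1; Y2) = H(Y1) + H(Y2) - H(Y1, Y2), in nats."""
    c, _, _ = np.histogram2d(y1, y2, bins=bins)
    p12 = c / c.sum()                           # joint probabilities
    p1, p2 = p12.sum(axis=1), p12.sum(axis=0)   # marginals
    H = lambda p: -(p[p > 0] * np.log(p[p > 0])).sum()
    return H(p1) + H(p2) - H(p12.ravel())

rng = np.random.default_rng(2)
s = rng.standard_normal((2, 100_000))
x = np.array([[1.0, 0.9],
              [0.9, 1.0]]) @ s
print(mutual_information(s[0], s[1]))  # close to 0 (independent components)
print(mutual_information(x[0], x[1]))  # clearly positive (mixed components)
```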
Mutual information<br />
Mutual information for PNL mixtures
I(Y) = Σi H(Yi) − H(Y)
     = Σi H(Yi) − H(X) − E[log |det JG|] − E[log |det JB|]
     = Σi H(Yi) − H(X) − Σi E[ log |∂gi(xi, θi)/∂xi| ] − log |det B|   (45)
Mutual information for NL mixtures
Derivatives of the mutual information
With respect to B:
∂I(Y)/∂B = Σi ∂H(Yi)/∂B − ∂ log |det B|/∂B
         = −Σi E[ (d log pYi(yi)/dyi) ∂yi/∂B ] − B⁻ᵀ = 0
(the term Σi E[log |∂gi(xi, θi)/∂xi|] does not depend on B)
With respect to the parameters θk of the gi's:
∂I(Y)/∂θk = Σi ∂H(Yi)/∂θk − Σi ∂E[ log |gi′(xi, θi)| ]/∂θk
          = −Σi E[ (d log pYi(yi)/dyi) ∂yi/∂θk ] − Σi E[ ∂ log |gi′(xi, θi)|/∂θk ] = 0
(log |det B| does not depend on θk)
Separation of PNL mixtures<br />
The algorithm consists of 3 steps:
estimation of the marginal score functions of the estimated sources,
estimation of the nonlinear functions gi,
estimation of the separating matrix B.
Separation of PNL mixtures
Sources and mixtures [figure]
Separation of PNL mixtures
Estimated sources [figure]
PNL mixtures and NL deconvolution
Wiener systems [figure]
Hammerstein systems [figure]
Blind inversion of Wiener system<br />
The Wiener system is a usual NL model in biology, in satellite
communications, etc.
Classical approaches<br />
Classical identification methods for nonlinear systems are
based on higher-order cross-correlations
Usually, the input signal is assumed to be iid Gaussian
If the distortion input is available, the compensation of the
nonlinearity is almost straightforward, after identification of
the NL
However, in a real-world situation, we know neither the
nonlinear system input nor the input distribution
Parameterization
Blind inversion of Wiener system<br />
source: S(t) = [. . . , s(t − k + 1), s(t − k), s(t − k − 1), . . .]ᵀ
observation: X(t) = [. . . , x(t − k + 1), x(t − k), x(t − k − 1), . . .]ᵀ
channel:
H = ( · · ·    · · ·      · · ·      · · ·    · · ·
      · · ·  h(k − 1)    h(k)     h(k + 1)   · · ·
      · · ·  h(k − 2)  h(k − 1)    h(k)      · · ·
      · · ·    · · ·      · · ·      · · ·    · · · )
Due to the iid assumption, S has independent components; the Wiener
system is nothing but an infinite-dimension PNL mixture,
X(t) = f[H S(t)], with a particular (Toeplitz) mixing matrix H.
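A Wiener system is straightforward to simulate, which makes the "infinite-dimension PNL" view concrete (the filter taps h and the distortion f below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000
s = rng.uniform(-1, 1, n)                 # iid non-Gaussian input s(t)
h = np.array([1.0, 0.5, -0.2])            # channel h (illustrative taps)
z = np.convolve(s, h, mode="same")        # z = h * s, i.e. H S(t), Toeplitz H
f = lambda v: np.tanh(2.0 * v)            # invertible memoryless distortion f
x = f(z)                                  # observed Wiener-system output x(t)

# Inversion (Hammerstein) structure y = w * g(x): here g = f^{-1} is used
# directly just to check the model; blindly, g and w must both be estimated.
g = lambda v: np.arctanh(v) / 2.0
print(np.max(np.abs(g(x) - z)))  # ~ 0: g recovers the linear stage
```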
I<strong>de</strong>ntifiability<br />
Blind inversion of Wiener system<br />
If the Wiener system satisfies:
Subsystems h and f are unknown and invertible; h can be a
nonminimum-phase filter
The input s(t) is an unknown (a priori) non-Gaussian iid
process
then it is a PNL mixture, with a particular Toeplitz mixing matrix
H and with the same NL function f on each channel.
PNL identifiability implies Wiener system invertibility based
on (temporal) sample statistical independence.
PNL separability is only proved for finite dimensions; it is
conjectured for infinite dimensions.
Practically, the filter h(k) and its inverse w(k) are truncated.
Criterion for inversion<br />
Blind inversion of Wiener system<br />
Output of the inversion (Hammerstein) structure:<br />
y(t) = w ∗ z(t) with z(t) = g(x(t))<br />
Y(t) spatially independent ⇔ sequence {y(t)} iid
The mutual information for infinite-dimensional stationary
random vectors is defined from entropy rates [CT91]:
H(Y) = lim_{T→∞} 1/(2T + 1) · H(y(−T), . . . , y(T))
I(Y) = lim_{T→∞} 1/(2T + 1) · [ Σ_{t=−T}^{+T} H(y(t)) − H(y(−T), . . . , y(T)) ]
I(Y) is always nonnegative and vanishes if and only if {y(t)} is an iid
sequence
For more details, see [TSJ01]
Algorithms for PNL<br />
Other approaches for NL mixtures<br />
Relatively many algorithms have been designed for PNL
mixtures. See a few references in [CJ10, ch.14].
Methods for smooth mappings
MISEP, proposed by Marques and Almeida [MA99]: the
nonlinearity is modelled by multi-layer perceptrons (MLP)
trained to ensure output independence.
Tan et al. [TWZ01] used radial basis functions.
Bayesian approaches<br />
Nonlinear Factor Analysis (NFA), proposed by the Helsinki team
(Valpola, Karhunen, Honkela, etc.). See details and references
in [CJ10, ch.14]<br />
A Bayesian approach was also developed by Duarte et al. [DJM10]
for chemical NL mixtures, with positivity and other constraints
Conclusions<br />
NL mixtures: conclusions and perspectives<br />
Generally, independence is not sufficient for ensuring
identifiability and separability
Independence can ensure identifiability and separability in
structured NL mixtures. PNL mixtures are particular separable
NL mixtures.
Deconvolution and Wiener system inversion are related to
BSS: time independence (iid) is then related to spatial
independence.
Mutual information or information rate can then be used for
measuring spatial independence or time independence (iid)
Independence is powerful enough to blindly compensate
strong NL distortions (1) in PNL multichannel systems:
satellite antennas, sensor arrays, etc., or (2) in Wiener systems
(NL dynamic SISO)
NL mixtures: conclusions and perspectives<br />
Other contributions<br />
Extension to MIMO dynamic NL systems: Convolutive PNL<br />
mixtures [BZJN01]<br />
Bilinear and multi-linear models [HD03]
Separation of NL mixtures with positivity constraints [DJM10]<br />
Practical applications for chemical sensor arrays
[DJM10, DJTB+10] or scanned image processing
[Alm05, MBBZJ08] (see slide 339 of this course)
Perspectives and open issues<br />
Noisy NL mixtures (problem of noise amplification)<br />
Identifiability for bilinear and multi-linear models
Identifiability based on other priors, criteria and mixing
structures
NL dynamic problems, e.g. encountered in gas sensor arrays
Problem Algorithms Biblio SCA<br />
VIII. UnderDetermined Mixtures
I II III IV V VI VII VIII IX X<br />
Contents of course VIII<br />
28 Problem formulation
nth Characteristic Functions
Identifiability
Cumulants
29 Algorithms
Symmetric tensors of dim. 2
BIOME
Foobi
Algorithms based on CF
Deterministic approaches
30 Bibliography
31 Sparse Component Analysis
Characteristic functions
Again
First c.f.
Real scalar: Φx(t) def= E{e^{jtx}} = ∫u e^{jtu} dFx(u)
Real multivariate: Φx(t) def= E{e^{j tᵀx}} = ∫u e^{j tᵀu} dFx(u)
Second c.f.
Ψ(t) def= log Φ(t)
Properties:
Always exists in the neighborhood of 0
Uniquely defined as long as Φ(t) ≠ 0
Characteristic functions (cont'd)
Properties of the 2nd characteristic function (cont'd):
if a c.f. Ψx(t) is a polynomial, then its degree is at most 2 and
x is Gaussian (Marcinkiewicz 1938) [Luka70]
if (x, y) are statistically independent, then
Ψx,y(u, v) = Ψx(u) + Ψy(v) (46)
Proof.
Ψx,y(u, v) = log[E{exp(j(ux + vy))}]
           = log[E{exp(jux)} E{exp(jvy)}].
Problem posed in terms of Characteristic Functions
If the sp are independent and x = A s, we have the core equation:
Ψx(u) = Σp Ψsp( Σq uq Aqp ) (47)
Proof.
Plug x = A s into the definition of Φx and get
Φx(u) def= E{exp(j uᵀA s)} = E{exp(j Σp,q uq Aqp sp)}
Since the sp are independent, Φx(u) = Πp E{exp(j Σq uq Aqp sp)}
Taking the log concludes.
Problem: decompose a multivariate function into a sum of
univariate ones
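The core equation (47) can be checked empirically with sample averages (an illustrative sketch; the matrix A, the source law, and the test point u are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
s = rng.uniform(-1, 1, size=(2, n))        # independent sources s_p
A = np.array([[1.0, 0.7],
              [0.3, 1.0]])
x = A @ s

def second_cf(samples_1d, t):
    """Empirical second c.f. log E{exp(j t x)} of a scalar sample set."""
    return np.log(np.mean(np.exp(1j * t * samples_1d)))

u = np.array([0.4, -0.2])
lhs = np.log(np.mean(np.exp(1j * (u @ x))))        # Psi_x(u)
rhs = sum(second_cf(s[p], (A.T @ u)[p]) for p in range(2))
print(abs(lhs - rhs))  # small: sampling error only
```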
Darmois-Skitovich theorem (1953)<br />
Theorem<br />
Let si be statistically independent random variables, and two linear
statistics:
y1 = Σi ai si and y2 = Σi bi si
If y1 and y2 are statistically independent, then the random variables sk
for which ak bk ≠ 0 are Gaussian.
NB: holds in both R and C
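A small numerical illustration of the theorem (a sketch; the coefficients are chosen so that y1 and y2 are decorrelated): for non-Gaussian sources the two statistics remain dependent, visible at fourth order, while for Gaussian sources no such dependence appears.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

def dependence_at_order_4(s):
    """cov(y1^2, y2^2) for y1 = s1 + s2, y2 = s1 - s2 (y1, y2 decorrelated)."""
    y1, y2 = s[0] + s[1], s[0] - s[1]
    return np.cov(y1**2, y2**2)[0, 1]

d_uniform = dependence_at_order_4(rng.uniform(-1, 1, (2, n)))  # non-Gaussian
d_gauss = dependence_at_order_4(rng.standard_normal((2, n)))   # Gaussian
print(d_uniform)  # clearly nonzero: y1, y2 dependent
print(d_gauss)    # ~ 0: consistent with D-S (here y1, y2 are independent)
```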
Sketch of proof
Let characteristic functions
Ψ1,2(u, v) = log E{exp(j y1 u + j y2 v)}
Ψk(w) = log E{exp(j yk w)}
ϕp(w) = log E{exp(j sp w)}
1 Independence between the sp's implies:
Ψ1,2(u, v) = Σ_{k=1}^{P} ϕk(u ak + v bk)
Ψ1(u) = Σ_{k=1}^{P} ϕk(u ak)
Ψ2(v) = Σ_{k=1}^{P} ϕk(v bk)
2 Independence between y1 and y2 implies
Ψ1,2(u, v) = Ψ1(u) + Ψ2(v)
Problem Algorithms Biblio SCA nCF I<strong>de</strong>ntif CumTens<br />
Does not restrict generality to assume that [ak, bk] not<br />
collinear. To simplify, assume also ϕp differentiable.<br />
3 Hence P k=1 ϕp(u ak + v bk) = P k=1 ϕk(u ak) + ϕk(v bk)<br />
Trivial for terms for which akbk = 0.<br />
From now on, restrict the sum to terms akbk = 0<br />
4 Write this at u + α/aP and v − α/bP:<br />
P<br />
k=1<br />
ϕk<br />
<br />
u ak + v bk + α( ak<br />
aP<br />
− bk<br />
<br />
) = f (u) + g(v)<br />
bP<br />
5 Subtract to cancel Pth term, divi<strong>de</strong> by α, and l<strong>et</strong> α → 0:<br />
P−1 <br />
k=1<br />
( ak<br />
aP<br />
− bk<br />
) ϕ<br />
bP<br />
(1)<br />
k (u ak + v bk) = f (1) (u) + g (1) (v)<br />
for some univariate functions f (1) (u) and g (1) (u).<br />
Conclusion: We have one term less<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.Un<strong>de</strong>rD<strong>et</strong>ermined mixtures 284
6. Repeat the procedure (P − 1) times and get:

   ∏_{j=2}^{P} (a_1/a_j − b_1/b_j) ϕ_1^{(P−1)}(u a_1 + v b_1) = f^{(P−1)}(u) + g^{(P−1)}(v)

7. Hence ϕ_1^{(P−1)}(u a_1 + v b_1) is linear, as a sum of two univariate functions (ϕ_1^{(P)} is a constant because a_1 b_1 ≠ 0).

8. Eventually ϕ_1 is a polynomial.

9. Lastly, invoke the Marcinkiewicz theorem to conclude that s_1 is Gaussian.

10. The same is true for any ϕ_p such that a_p b_p ≠ 0: s_p is Gaussian.

NB: the result also holds if the ϕ_p are not differentiable.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 285
Equivalent representations

Let y admit two representations

   y = A s   and   y = B z

where s (resp. z) has independent components, and A (resp. B) has pairwise noncollinear columns.

These representations are equivalent if every column of A is proportional to some column of B, and vice versa.

If all representations of y are equivalent, they are said to be essentially unique (permutation & scale ambiguities only).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 286
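The notion of equivalence above can be checked mechanically: two mixing matrices are equivalent exactly when their columns match up to permutation and scaling. A minimal numpy sketch (the helper `are_equivalent` is ours, not part of any BSS toolbox):

```python
import numpy as np

def are_equivalent(A, B, tol=1e-10):
    """True if every column of A is collinear with some column of B
    and vice versa (equality up to permutation and scaling)."""
    def covered(M, N):
        return all(
            any(np.linalg.matrix_rank(np.column_stack([m, n]), tol=tol) == 1
                for n in N.T)
            for m in M.T)
    return covered(A, B) and covered(B, A)

A = np.array([[1.0, 1.0], [1.0, 0.0]])
B = 3.0 * A[:, ::-1]                     # permuted, rescaled copy: equivalent
C = np.array([[1.0, 1.0], [1.0, -1.0]])  # second column differs: not equivalent
print(are_equivalent(A, B), are_equivalent(A, C))  # True False
```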
Identifiability & uniqueness theorems

Let y be a random vector of the form y = A s, where the s_p are independent, and A has pairwise noncollinear columns.

Identifiability theorem: y can be represented as y = A1 s1 + A2 s2, where s1 is non-Gaussian, s2 is Gaussian and independent of s1, and A1 is essentially unique.

Uniqueness theorem: if in addition the columns of A1 are linearly independent, then the distribution of s1 is unique up to scale and location indeterminacies.

Remark 1: if s2 is 1-dimensional, then A2 is also essentially unique.
Remark 2: the proofs are not constructive [KLR73].

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 287
Example of uniqueness

Let the s_i be independent with no Gaussian component, and the b_i be independent Gaussian. Then the linear model below is identifiable, but A2 is not essentially unique whereas A1 is:

   [ s1 + s2 + 2 b1 ]  =  A1 s + A2 [ b1 ]  =  A1 s + A3 [ b1 + b2 ]
   [ s1 + 2 b2      ]              [ b2 ]               [ b1 − b2 ]

with

   A1 = [ 1  1 ],   A2 = [ 2  0 ],   A3 = [ 1   1 ]
        [ 1  0 ]         [ 0  2 ]         [ 1  −1 ]

Hence the distribution of s is essentially unique, but (A1, A2) is not equivalent to (A1, A3).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 288
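The identity between the two rewritings can be verified numerically; note that b1 + b2 and b1 − b2 are again independent Gaussian variables, which is what makes the second representation legitimate. A quick numpy check of the matrix identities:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(2)          # stand-ins for the non-Gaussian sources
b = rng.standard_normal(2)          # the Gaussian pair (b1, b2)

A1 = np.array([[1.0, 1.0], [1.0, 0.0]])
A2 = np.array([[2.0, 0.0], [0.0, 2.0]])
A3 = np.array([[1.0, 1.0], [1.0, -1.0]])

y_direct = np.array([s[0] + s[1] + 2*b[0], s[0] + 2*b[1]])
y_rep2 = A1 @ s + A2 @ b                                    # sources (b1, b2)
y_rep3 = A1 @ s + A3 @ np.array([b[0]+b[1], b[0]-b[1]])     # sources (b1+b2, b1-b2)

print(np.allclose(y_direct, y_rep2), np.allclose(y_rep2, y_rep3))  # True True
```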
Example of non-uniqueness

Let the s_i be independent with no Gaussian component, and the b_i be independent Gaussian. Then the linear model below is identifiable, but the distribution of s is not unique, because a 2 × 4 matrix cannot be full column rank:

   [ s1 + s3 + s4 + 2 b1 ]  =  A [ s1           ]  =  A [ s1 + 2 b1 ]
   [ s2 + s3 − s4 + 2 b2 ]      [ s2           ]       [ s2 + 2 b2 ]
                                [ s3 + b1 + b2 ]       [ s3        ]
                                [ s4 + b1 − b2 ]       [ s4        ]

with

   A = [ 1  0  1   1 ]
       [ 0  1  1  −1 ]

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 289
CP of the cumulant tensor

If we take the dth derivative of the core equation (47) at u = 0, we get a relation between cumulants of order d. For instance, at order d = 4 in the real case:

   C_{x,ijkℓ} = ∑_p A_{ip} A_{jp} A_{kp} A_{ℓp} κ_p        (48)

➽ This is of the same form as the CP decomposition (20) in slides 55-60.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 290
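Formula (48) can be illustrated with a small numpy sketch that builds the rank-P symmetric tensor directly from A and the source kurtoses κ_p (random placeholders here), and checks the full symmetry a cumulant tensor must have:

```python
import numpy as np

rng = np.random.default_rng(1)
K, P = 3, 5                       # K sensors, P sources (P > K: underdetermined)
A = rng.standard_normal((K, P))   # mixing matrix
kappa = rng.standard_normal(P)    # 4th-order source cumulants (kurtoses)

# C_{ijkl} = sum_p A_ip A_jp A_kp A_lp kappa_p  -- the CP structure of (48)
C = np.einsum('ip,jp,kp,lp,p->ijkl', A, A, A, A, kappa)

# The resulting tensor is fully symmetric, as a cumulant tensor must be
print(np.allclose(C, C.transpose(1, 0, 2, 3)),
      np.allclose(C, C.transpose(0, 1, 3, 2)))  # True True
```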
Link with polynomials

Symmetric tensors of order d and dimension N are bijectively mapped to homogeneous polynomials of degree d in N variables. For instance, if d = 3:

   p(x) = ∑_{ijk} T_{ijk} x_i x_j x_k

Third-order tensors of dimensions I × J × K are bijectively related to trilinear forms in I + J + K variables:

   p(x, y, z) = ∑_{i,j,k} T_{ijk} x_i y_j z_k

➽ Hence, decomposing a polynomial is equivalent to decomposing a tensor.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 291
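The correspondence can be checked in numpy: the cubic form attached to a third-order tensor only sees its symmetric part, which is why the bijection is stated for symmetric tensors:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
N = 2
T = rng.standard_normal((N, N, N))   # a generic (non-symmetric) tensor
x = rng.standard_normal(N)

# Polynomial attached to the tensor: p(x) = sum_{ijk} T_ijk x_i x_j x_k
p = np.einsum('ijk,i,j,k->', T, x, x, x)

# Symmetrizing T does not change p: only the symmetric part of T matters,
# hence the bijection holds between polynomials and *symmetric* tensors
S = sum(T.transpose(perm) for perm in permutations(range(3))) / 6.0
p_sym = np.einsum('ijk,i,j,k->', S, x, x, x)
print(np.isclose(p, p_sym))  # True
```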
Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Symmetric tensors of dimension 2

Binary forms of degree d

James Joseph Sylvester (1814–1897)

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 292
Sylvester's theorem

Sylvester's theorem in R (1886)

A binary quantic t(x1, x2) = ∑_{i=0}^{d} c(i) γ_i x1^i x2^{d−i} can be written in R[x1, x2] as a sum of dth powers of r distinct linear forms,

   t(x1, x2) = ∑_{j=1}^{r} λ_j (α_j x1 + β_j x2)^d,

if and only if:

1. there exists a vector g of dimension r + 1 such that

   [ γ0     γ1   · · ·  γr   ]   [ g0 ]
   [ γ1     γ2   · · ·  γr+1 ]   [ g1 ]
   [  ⋮                  ⋮   ]   [  ⋮ ]  =  0        (49)
   [ γd−r        · · ·  γd   ]   [ gr ]

2. q(x1, x2) := ∑_{ℓ=0}^{r} g_ℓ x1^ℓ x2^{r−ℓ} has r distinct real roots.

Then q(x1, x2) = ∏_{j=1}^{r} (β_j x1 − α_j x2) yields the r forms.

Valid even in non-generic cases.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 293
Proof of Sylvester's theorem (1)

Lemma

For homogeneous polynomials of degree d parameterized as p(x) := ∑_{|i|=d} c(i) γ(i; p) x^i, define the apolar scalar product:

   ⟨p, q⟩ = ∑_{|i|=d} c(i) γ(i; p) γ(i; q)

Then L(x) := a^T x  ⇒  ⟨p, L^d⟩ = ∑_{|i|=d} c(i) γ(i; p) a^i = p(a)

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 294
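The lemma is easy to check numerically for binary forms (N = 2), where the multi-index i reduces to a single exponent and c(i) = C(d, i); by the binomial theorem the γ coefficients of L^d are a1^i a2^(d−i):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
d = 4
gamma = rng.standard_normal(d + 1)   # coefficients gamma_i of p, with c(i) = C(d, i)
a = rng.standard_normal(2)           # linear form L(x) = a1*x1 + a2*x2

def p_eval(x1, x2):
    return sum(comb(d, i) * gamma[i] * x1**i * x2**(d - i) for i in range(d + 1))

# gamma coefficients of L^d are a1^i * a2^(d-i)  (binomial theorem)
gamma_Ld = np.array([a[0]**i * a[1]**(d - i) for i in range(d + 1)])

# apolar scalar product <p, L^d> = sum_i C(d,i) gamma_i(p) gamma_i(L^d) = p(a)
apolar = sum(comb(d, i) * gamma[i] * gamma_Ld[i] for i in range(d + 1))
print(np.isclose(apolar, p_eval(a[0], a[1])))  # True
```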
Proof of Sylvester's theorem (2)

1. Assume the r distinct linear forms L_j = α_j x1 + β_j x2 are given. Let q(x) := ∏_{j=1}^{r} (β_j x1 − α_j x2). Then q(α_j, β_j) = 0, ∀j.

2. Hence, from the lemma, for every m(x) of degree d − r, ⟨mq, L_j^d⟩ = (mq)(a_j) = 0, and therefore ⟨mq, t⟩ = 0.

3. Take for instance the monomials m_μ(x) = x1^μ x2^{d−r−μ}, 0 ≤ μ ≤ d − r, and denote by g_ℓ the coefficients of q:

   ⟨m_μ q, t⟩ = 0  ⇒  ∑_{ℓ=0}^{r} g_ℓ γ_{ℓ+μ} = 0

   This is exactly (49) expressed in the canonical basis.

4. The roots of q(x1, x2) are distinct and real since the forms L_j are.

5. The reasoning also goes backwards.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 295
Algorithm for dth-order symmetric tensors of dimension 2

Start with r = 1 (a d × 2 Hankel matrix) and increase r until the matrix loses column rank and the roots are distinct:

   [ 1 2 ]        [ 1 2 3 ]        [ 1 2 3 4 ]
   [ 2 3 ]        [ 2 3 4 ]        [ 2 3 4 5 ]
   [ 3 4 ]        [ 3 4 5 ]        [ 3 4 5 6 ]
   [ 4 5 ]  −→    [ 4 5 6 ]  −→    [ 4 5 6 7 ]
   [ 5 6 ]        [ 5 6 7 ]        [ 5 6 7 8 ]
   [ 6 7 ]        [ 6 7 8 ]
   [ 7 8 ]

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 296
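The incremental procedure can be sketched in numpy for a binary quantic given by its coefficients γ_i (with c(i) = C(d, i)). The sketch assumes generic inputs — real distinct roots, no root at infinity — and dehomogenizes the forms as ζ_j x1 + x2:

```python
import numpy as np
from math import comb

def sylvester_decompose(gamma, tol=1e-9):
    """Sylvester's algorithm for a binary quantic
    t = sum_i C(d,i) gamma_i x1^i x2^(d-i).
    Returns (lam, zeta) with t = sum_j lam_j (zeta_j x1 + x2)^d.
    Generic case only: real distinct roots, no root at infinity."""
    d = len(gamma) - 1
    for r in range(1, d + 1):
        # Hankel matrix of condition (49), size (d-r+1) x (r+1)
        H = np.array([gamma[mu:mu + r + 1] for mu in range(d - r + 1)])
        U, sv, Vt = np.linalg.svd(H)
        if (r + 1) - np.sum(sv > tol) < 1:
            continue                 # full column rank, no kernel: increase r
        g = Vt[-1]                   # kernel vector of the Hankel matrix
        zeta = np.roots(g[::-1])     # roots of q(z) = sum_l g_l z^l
        if (np.any(np.abs(zeta.imag) > tol)
                or len(np.unique(np.round(zeta.real, 6))) < r):
            continue                 # roots not real and distinct: increase r
        zeta = zeta.real
        # Solve sum_j lam_j zeta_j^i = gamma_i  (Vandermonde system)
        M = np.vander(zeta, d + 1, increasing=True).T
        lam = np.linalg.lstsq(M, gamma, rcond=None)[0]
        return lam, zeta
    raise ValueError("no generic decomposition found")

# t = (x1 + x2)^3 + (2 x1 - x2)^3, i.e. gamma = (0, 3, -3, 9), rank 2
gamma = np.array([0.0, 3.0, -3.0, 9.0])
lam, zeta = sylvester_decompose(gamma)
x1, x2 = 0.7, -1.3
t = sum(comb(3, i) * gamma[i] * x1**i * x2**(3 - i) for i in range(4))
t_dec = sum(l * (z * x1 + x2)**3 for l, z in zip(lam, zeta))
print(np.isclose(t, t_dec))  # True
```

For this example the r = 1 Hankel matrix has full column rank, so the loop moves on to r = 2, whose kernel polynomial has the two real distinct roots ζ = 1 and ζ = −2.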
Decomposition of maximal rank: x1 x2^{d−1}

1. Maximal rank r = d, for which (49) reduces to a 1-row matrix:

   [0, 0, . . . , 0, 1, 0] g = 0

2. Find (α_j, β_j) such that q(x1, x2) = ∏_{j=1}^{d} (β_j x1 − α_j x2) := ∑_{ℓ=0}^{d} g_ℓ x1^ℓ x2^{d−ℓ} has d distinct roots.

3. Take α_j = 1. Then g_{d−1} = 0 just means ∑_j β_j = 0. Choose arbitrary distinct β_j's satisfying this constraint.

4. Compute the λ_j's by solving the Vandermonde linear system:

   [ 1     . . .  1     ]         [ 0 ]
   [ β1    . . .  βd    ]         [ ⋮ ]
   [ β1^2  . . .  βd^2  ]  λ  =   [ 0 ]
   [  ⋮            ⋮    ]         [ 1 ]
   [ β1^d  . . .  βd^d  ]         [ 0 ]

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 297
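For a concrete instance with d = 3 and t = x1 x2², choosing distinct β_j's with ∑_j β_j = 0 (our reading of the constraint in step 3) makes the overdetermined Vandermonde-type system consistent, and the decomposition of the maximal-rank orbit into d cubes can be verified directly:

```python
import numpy as np
from math import comb

d = 3
beta = np.array([1.0, 2.0, -3.0])   # distinct, and sum(beta) = 0 (the constraint)

# (x1 + beta_j x2)^d = sum_i C(d,i) beta_j^i x1^(d-i) x2^i.
# Match coefficients against t = x1 * x2^(d-1): 1 for i = d-1, 0 otherwise.
M = np.array([[comb(d, i) * b**i for b in beta] for i in range(d + 1)])
rhs = np.zeros(d + 1)
rhs[d - 1] = 1.0
lam = np.linalg.lstsq(M, rhs, rcond=None)[0]

# Verify the polynomial identity at a random point
rng = np.random.default_rng(4)
x1, x2 = rng.standard_normal(2)
t_dec = sum(l * (x1 + b * x2)**d for l, b in zip(lam, beta))
print(np.allclose(t_dec, x1 * x2**(d - 1)))  # True
```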
Summary

Algorithm
Sylvester's theorem allows one to compute the decomposition, even in non-generic cases: [BCMT10] [CGLM08] [CM96].

Rank of binary quantics
A binary quantic of odd degree 2n + 1 has generic rank n + 1.
A binary quantic of even degree 2n has generic rank n + 1.
A binary quantic of degree d may reach the maximal rank d; the orbit of maximal rank is x1 x2^{d−1}.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 298
Sylvester's theorem in C

Similarly, Sylvester's theorem in C:

A binary quantic t(x1, x2) = ∑_{i=0}^{d} c(i) γ_i x1^i x2^{d−i} can be written as a sum of dth powers of r distinct linear forms,

   t(x1, x2) = ∑_{j=1}^{r} λ_j (α_j x1 + β_j x2)^d,

if and only if:

1. there exists a vector g of dimension r + 1 such that

   [ γ0     γ1   · · ·  γr   ]
   [ γ1     γ2   · · ·  γr+1 ]
   [  ⋮                  ⋮   ]  g  =  0
   [ γd−r        · · ·  γd   ]

2. q(x1, x2) := ∑_{ℓ=0}^{r} g_ℓ x1^ℓ x2^{r−ℓ} has r distinct roots.

Then q(x1, x2) = ∏_{j=1}^{r} (β_j* x1 − α_j* x2) yields the r forms.

Valid even in non-generic cases.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 299
BIOME algorithms

These algorithms work with a cumulant tensor of even order 2r > 4.

They are related to the symmetric flattening introduced in previous lectures.

We take the case 2r = 6 for the presentation, and denote

   C_{ijk}^{ℓmn} := Cum{x_i, x_j, x_k, x_ℓ*, x_m*, x_n*}        (50)

In that case, we have

   C_{x,ijk}^{ℓmn} = ∑_{p=1}^{P} A_{ip} A_{jp} A_{kp} A_{ℓp}* A_{mp}* A_{np}* Δ_p

where the Δ_p := Cum{s_p, s_p, s_p, s_p*, s_p*, s_p*} denote the diagonal entries of a P × P diagonal matrix, ∆^{(6)}.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 300
Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF D<strong>et</strong>erm<br />
Writing in matrix form<br />
Tensor Cx is of dimensions K × K × K × K × K × K and<br />
enjoys symm<strong>et</strong>ries and Hermitian symm<strong>et</strong>ries.<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.Un<strong>de</strong>rD<strong>et</strong>ermined mixtures 301
Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF D<strong>et</strong>erm<br />
Writing in matrix form<br />
Tensor Cx is of dimensions K × K × K × K × K × K and<br />
enjoys symm<strong>et</strong>ries and Hermitian symm<strong>et</strong>ries.<br />
Tensor Cx can be stored in a K 3 × K 3 Hermitian matrix, C (6)<br />
x ,<br />
called the hexacovariance. With an appropriate storage of the<br />
tensor entries, we have<br />
C (6)<br />
x = A ⊙3 ∆ (6) A ⊙3H<br />
(51)<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.Un<strong>de</strong>rD<strong>et</strong>ermined mixtures 301
Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF D<strong>et</strong>erm<br />
Writing in matrix form<br />
Tensor Cx is of dimensions K × K × K × K × K × K and<br />
enjoys symm<strong>et</strong>ries and Hermitian symm<strong>et</strong>ries.<br />
Tensor Cx can be stored in a K 3 × K 3 Hermitian matrix, C (6)<br />
x ,<br />
called the hexacovariance. With an appropriate storage of the<br />
tensor entries, we have<br />
C (6)<br />
x = A ⊙3 ∆ (6) A ⊙3H<br />
Because C (6)<br />
x is Hermitian, ∃V unitary, such that<br />
(51)<br />
(C (6)<br />
x ) 1/2 = A ⊙3 (∆ (6) ) 1/2 V (52)<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.Un<strong>de</strong>rD<strong>et</strong>ermined mixtures 301
Writing in matrix form<br />
Tensor Cx is of dimensions K × K × K × K × K × K and enjoys symmetries and Hermitian symmetries.<br />
Tensor Cx can be stored in a K^3 × K^3 Hermitian matrix, C_x^(6), called the hexacovariance. With an appropriate storage of the tensor entries, we have<br />
C_x^(6) = A^⊙3 ∆^(6) (A^⊙3)^H    (51)<br />
Because C_x^(6) is Hermitian, ∃V unitary such that<br />
(C_x^(6))^1/2 = A^⊙3 (∆^(6))^1/2 V    (52)<br />
Idea: Use the redundancy existing between blocks of (C_x^(6))^1/2.<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 301
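As a quick numerical sanity check of the matricized model (51), the sketch below builds a hexacovariance-like matrix and verifies its Hermitian structure. The `khatri_rao` helper and the conjugation pattern in `A3` are assumptions for illustration (the slides do not spell out which of the three factors is conjugated); the dimensions are arbitrary.

```python
import numpy as np

def khatri_rao(X, Y):
    # Column-wise Kronecker product: column p is kron(X[:, p], Y[:, p])
    return np.einsum('ip,jp->ijp', X, Y).reshape(-1, X.shape[1])

rng = np.random.default_rng(0)
K, P = 3, 5
A = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))

# A^(.3): threefold column-wise Kronecker product (conjugation pattern assumed)
A3 = khatri_rao(khatri_rao(A, A), A.conj())      # K^3 x P

delta6 = np.diag(rng.standard_normal(P))         # real diagonal of source 6th-order cumulants
C6 = A3 @ delta6 @ A3.conj().T                   # K^3 x K^3 hexacovariance, as in eq. (51)

print(C6.shape)                                  # (27, 27)
print(np.allclose(C6, C6.conj().T))              # True: C6 is Hermitian
```

Since ∆^(6) is real diagonal, C6 = A3 ∆ A3^H is Hermitian by construction, which is what makes the square-root factorization (52) possible.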
Using the invariance to estimate V<br />
1 Cut the K^3 × P matrix (C_x^(6))^1/2 into K blocks of size K^2 × P. Each of these blocks, Γ[n], satisfies:<br />
Γ[n] = (A ⊙ A^∗) D[n] (∆^(6))^1/2 V<br />
where D[n] is the P × P diagonal matrix containing the nth row of A, 1 ≤ n ≤ K. Hence the matrices Γ[n] share the same right singular space.<br />
2 Compute the joint EVD of the K(K − 1) matrices Θ[m, n] := Γ[m]† Γ[n], as: Θ[m, n] = V Λ[m, n] V^H.<br />
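A small numerical illustration of why the blocks share a common right factor, and why the matrices Θ[m, n] are jointly diagonalized by it. Here `M` stands in for A ⊙ A^∗, and the diagonal matrices absorb both D[n] and (∆^(6))^1/2; all names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K2, P = 9, 4                       # K^2 = 9 rows per block, P columns (sources)

# M plays the role of (A .kr. A*); Vu is the common unitary right factor
M = rng.standard_normal((K2, P)) + 1j * rng.standard_normal((K2, P))
Vu, _ = np.linalg.qr(rng.standard_normal((P, P)) + 1j * rng.standard_normal((P, P)))

def gamma(d):
    # Block model Gamma[n] = M diag(d) V, the diagonal absorbing (Delta^(6))^(1/2)
    return M @ np.diag(d) @ Vu

d_m = rng.uniform(0.5, 2.0, P)     # nonzero diagonal entries
d_n = rng.uniform(0.5, 2.0, P)
Theta = np.linalg.pinv(gamma(d_m)) @ gamma(d_n)

# Theta is diagonalized by the common unitary factor: V Theta V^H is diagonal
Lam = Vu @ Theta @ Vu.conj().T
off = Lam - np.diag(np.diag(Lam))
print(np.max(np.abs(off)) < 1e-10)            # True
print(np.allclose(np.diag(Lam), d_n / d_m))   # True: eigenvalues are ratios of diagonals
```

Since Γ[m]† Γ[n] = V^H D[m]^{-1} D[n] V when M has full column rank, every pair (m, n) is diagonalized by the same unitary matrix, which is the invariance the joint EVD exploits.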
Estimation of A<br />
Matrices Λ[m, n] cannot be used directly because (∆^(6))^1/2 is unknown. But we use V to obtain the estimate of A^⊙3 up to a scale factor:<br />
A^⊙3 = (C_x^(6))^1/2 V    (53)<br />
One possibility to get A from A^⊙3 is as follows:<br />
3 Build K^2 matrices Ξ[m] of size K × P from rows of A^⊙3<br />
4 From the Ξ[m], find A and diagonal matrices D[m] in the LS sense:<br />
Ξ[m] D[m] ≈ A, 1 ≤ m ≤ K^2<br />
Estimation of A (details)<br />
Stationary values of the criterion Σ_{m=1}^{M} ||Ξ_m D_m − A||²_F, with M := K^2, yield the solution below.<br />
Obtain the vectors d_p := [D_1(p, p), D_2(p, p), · · · , D_M(p, p)]^T by solving the linear systems:<br />
F_p d_p = 0<br />
where the matrices F_p are defined as<br />
F_p(m1, m2) = (M − 1) (Ξ_{m1}^H Ξ_{m1})(p, p) if m1 = m2, and −(Ξ_{m1}^H Ξ_{m2})(p, p) otherwise.<br />
Deduce the estimate A = (1/M) Σ_{m=1}^{M} Ξ_m D_m<br />
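The stationarity condition above can be checked numerically: when the blocks satisfy Ξ_m diag(d_m) = A exactly, the vector of true scales lies in the null space of each F_p. The sketch below verifies this on synthetic data (matrix sizes, `G`, and the construction of the Ξ_m are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(2)
K, P, M = 4, 3, 6                        # any M >= 2 illustrates the identity

A = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))
G = rng.uniform(0.5, 2.0, size=(M, P))   # true diagonal scales: Xi_m diag(G[m]) = A exactly
Xi = [A / G[m] for m in range(M)]        # column p of Xi_m is a_p / G[m, p]

resids = []
for p in range(P):
    # F_p(m1, m2) = (M-1)(Xi_m1^H Xi_m1)(p,p) if m1 == m2, else -(Xi_m1^H Xi_m2)(p,p)
    F = np.empty((M, M), dtype=complex)
    for m1 in range(M):
        for m2 in range(M):
            g = (Xi[m1].conj().T @ Xi[m2])[p, p]
            F[m1, m2] = (M - 1) * g if m1 == m2 else -g
    resids.append(np.linalg.norm(F @ G[:, p]))   # G[:, p] is the stationary d_p

print(max(resids) < 1e-10)   # True: each d_p lies in the null space of F_p
```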
Conditions of identifiability of BIOME(2r)<br />
Source cumulants of order 2r > 4 are nonzero and have the same sign<br />
Column vectors of the mixing matrix A are not collinear<br />
Matrix A^⊙(r−1) is full column rank.<br />
This last condition implies that the tensor rank must be at most K^(r−1) (e.g. P ≤ K^2 for order 2r = 6).<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 305
FOOBI algorithms<br />
Again the same problem: given a K^2 × P matrix A^⊙2, find a real orthogonal matrix Q such that the P columns of A^⊙2 Q are of the form a[p] ⊗ a[p]^∗<br />
FOOBI: use the K^4 determinantal equations characterizing rank-1 matrices a[p] a[p]^H, of the form:<br />
φ(X, Y)_{ijkℓ} = x_{ij} y_{ℓk} − x_{ik} y_{ℓj} + y_{ij} x_{ℓk} − y_{ik} x_{ℓj}<br />
FOOBI2: use the K^2 equations of the form:<br />
Φ(X, Y) = X Y + Y X − trace(X) Y − trace(Y) X<br />
where the matrices X and Y are K × K Hermitian.<br />
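Both maps vanish exactly on rank-1 Hermitian matrices and are generically nonzero otherwise, which is what makes them usable as rank-1 detectors. A minimal numerical check (the einsum-based `phi` is a direct transcription of the determinantal formula above; sizes and random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
K = 4
a = rng.standard_normal(K) + 1j * rng.standard_normal(K)
b = rng.standard_normal(K) + 1j * rng.standard_normal(K)

X1 = np.outer(a, a.conj())                           # rank-1 Hermitian
X2 = np.outer(a, a.conj()) + np.outer(b, b.conj())   # rank-2 Hermitian

def phi(X, Y):
    # phi(X,Y)_{ijkl} = x_ij y_lk - x_ik y_lj + y_ij x_lk - y_ik x_lj
    t = np.einsum('ij,lk->ijkl', X, Y) - np.einsum('ik,lj->ijkl', X, Y)
    return t + np.einsum('ij,lk->ijkl', Y, X) - np.einsum('ik,lj->ijkl', Y, X)

def Phi(X, Y):
    # Phi(X,Y) = XY + YX - trace(X) Y - trace(Y) X
    return X @ Y + Y @ X - np.trace(X) * Y - np.trace(Y) * X

print(np.allclose(phi(X1, X1), 0))   # True:  both maps vanish on a a^H
print(np.allclose(Phi(X1, X1), 0))   # True
print(np.allclose(phi(X2, X2), 0))   # False: nonzero for rank > 1
print(np.allclose(Phi(X2, X2), 0))   # False
```

For X = a a^H one has x_{ij} y_{ℓk} = x_{ik} y_{ℓj}, so every term of φ cancels; likewise X² = trace(X) X, so Φ(X, X) = 0.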
FOOBI<br />
1 Normalize the columns of the K^2 × P matrix A^⊙2 such that the matrices A[r] := Unvec_K(a^⊙2[r]) are Hermitian<br />
2 Compute the K^2(K − 1)^2 × P(P − 1)/2 matrix P defined by φ(A[r], A[s]), 1 ≤ r ≤ s ≤ P.<br />
3 Compute the P weakest right singular vectors of P, Unvec them, and store them in P matrices W[r]<br />
4 Jointly diagonalize the W[r] by a real orthogonal matrix Q<br />
5 Then compute F := (C_x^(4))^1/2 ∆ Q and deduce â[r] as the dominant left singular vector of Unvec(f[r]).<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 307
FOOBI2<br />
1 Normalize the columns of the K^2 × P matrix A^⊙2 such that the matrices A[r] := Unvec_K(a^⊙2[r]) are Hermitian<br />
2 Compute the K(K + 1)/2 Hermitian matrices B[i, j] of size P × P, defined by:<br />
Φ(A[r], A[s])_{ij} = B[i, j]_{rs}<br />
3 Jointly cancel the diagonal entries of the matrices B[i, j] by a real congruent orthogonal transform Q<br />
4 Then compute F := (C_x^(4))^1/2 ∆ Q and deduce â[r] as the dominant left singular vector of Unvec(f[r]).<br />
NB: Better bound than FOOBI and BIOME(4), but iterative algorithm sensitive to initialization<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 308
Algorithms based on characteristic functions<br />
Fit with a model of exact rank<br />
1 Back to the core equation (47):<br />
Ψ_x(u) = Σ_p Ψ_{s_p}(Σ_q u_q A_{qp})<br />
2 Goal: Find a matrix A such that the K-variate function Ψ_x(u) decomposes into a sum of P univariate functions ψ_p := Ψ_{s_p}.<br />
3 Idea: Fit both sides on a grid of values u[ℓ] ∈ G<br />
Equations derived from the CAF<br />
Assumption: the functions ψ_p, 1 ≤ p ≤ P, admit finite derivatives up to order r in a neighborhood of the origin containing G.<br />
Then, taking r = 3 as a working example:<br />
∂³Ψ_x / ∂u_i ∂u_j ∂u_k (u) = Σ_{p=1}^{P} A_{ip} A_{jp} A_{kp} ψ_p^(3)(Σ_{q=1}^{K} u_q A_{qp})<br />
If there are L > 1 points in the grid G, this yields another mode in the tensor.<br />
Putting the problem in tensor form<br />
A decomposition into a sum of rank-1 terms:<br />
T_{ijkℓ} = Σ_p A_{ip} A_{jp} A_{kp} B_{ℓp}<br />
or equivalently<br />
T = Σ_p a(p) ⊗ a(p) ⊗ a(p) ⊗ b(p)<br />
Tensor T is K × K × K × L, symmetric in all modes except the last.<br />
Joint use of different derivative orders<br />
Example [CR06]<br />
Derivatives of order 3:<br />
T^(3)_{ijkℓ} = Σ_p A_{ip} A_{jp} A_{kp} B_{ℓp}<br />
Derivatives of order 4:<br />
T^(4)_{ijkmℓ} = Σ_p A_{ip} A_{jp} A_{kp} A_{mp} C_{ℓp}<br />
Derivatives of orders 3 and 4:<br />
T_{ijkℓ}[m] = Σ_p A_{ip} A_{jp} A_{kp} D_{ℓp}[m]<br />
with D_{ℓp}[m] = A_{mp} C_{ℓp} and D_{ℓp}[0] = B_{ℓp}.<br />
Iterative algorithms<br />
Goal: Minimize ||T − Σ_{p=1}^{r} u_p ⊗ v_p ⊗ . . . ⊗ w_p||²<br />
Gradient with fixed or variable (ad hoc) stepsize<br />
Alternating Least Squares (ALS)<br />
Levenberg-Marquardt<br />
Newton<br />
Conjugate gradient<br />
...<br />
Remarks<br />
The Hessian is generally huge, but sparse<br />
Problem of local minima: ELS variants exist for all of the above<br />
Warning: special constraints are needed to ensure the existence of a best low-rank approximation<br />
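Of the methods listed, ALS is the simplest to sketch: each factor is updated in turn by solving a linear least-squares problem while the others are held fixed. A minimal third-order version (the `kr` helper, unfolding conventions, sizes, and iteration count are illustrative assumptions; this plain version has none of the ELS safeguards against swamps mentioned above):

```python
import numpy as np

def kr(X, Y):
    # Column-wise Kronecker (Khatri-Rao) product
    R = X.shape[1]
    return np.einsum('ir,jr->ijr', X, Y).reshape(-1, R)

def cp_als(T, R, n_iter=500, seed=0):
    """Plain ALS for a third-order CP model T ~ sum_p a_p o b_p o c_p."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, R)) for n in (I, J, K))
    T1 = T.reshape(I, J * K)                       # mode-1 unfolding
    T2 = np.moveaxis(T, 1, 0).reshape(J, I * K)    # mode-2 unfolding
    T3 = np.moveaxis(T, 2, 0).reshape(K, I * J)    # mode-3 unfolding
    for _ in range(n_iter):
        # Each update solves the normal equations of a linear LS problem
        A = T1 @ kr(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = T2 @ kr(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = T3 @ kr(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Synthetic exact-rank tensor
rng = np.random.default_rng(7)
A0, B0, C0 = (rng.standard_normal((n, 3)) for n in (6, 7, 8))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)

A, B, C = cp_als(T, R=3)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
print(err)   # small relative error on this exact-rank example
```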
Tensor package<br />
Matlab codes are freely downloadable at:<br />
http://www.i3s.unice.fr/∼pcomon/TensorPackage.html<br />
Codes include:<br />
Alternating Least Squares (ALS) at any order<br />
Gradient<br />
Quasi-Newton (BFGS update)<br />
Levenberg-Marquardt<br />
Conjugate gradient<br />
Enhanced Line Search (ELS) for any of the above<br />
Special codes for NonNegative tensors<br />
Non symmetric tensors<br />
Tensors with partial symmetries, or no symmetry at all, often need to be approximated by decompositions of lower rank:<br />
1 Characteristic functions: one mode corresponds to u<br />
2 INDSCAL: tensors with positive definite matrix slices<br />
3 Deterministic approaches: no symmetry at all (cf. course on Applications)<br />
Bibliography<br />
Existence: [LC09] [CGLM08] [Ste08] [CL11] [LC10]<br />
Uniqueness: [SB00] [BS02]<br />
Computation: [AFCC04] [Ste06] [RCH08] [CL13] [Paa99] [Paa97]<br />
General: [SBG04] [CLDA09]<br />
Sparse Component Analysis<br />
Review paper (see [GL06] and Chapter 10 in [CJ10])<br />
Main ideas<br />
Initial underdetermined source separation problem:<br />
x(t) = As(t), with the number of sources P ≫ K, the number of observations<br />
Transform with a sparsifying mapping T which preserves linearity (e.g. wavelet transform, short-time Fourier transform, etc.):<br />
x̃(t) = T(x(t)) = T(As(t)) = A T(s(t)) = A s̃(t)<br />
Solve the source separation problem in the sparse space, obtaining an estimate ŝ̃(t)<br />
Come back to the initial space with the inverse of T:<br />
ŝ(t) = T^{−1}(ŝ̃(t))<br />
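The key point is that a linear transform applied along time commutes with the (instantaneous) mixing matrix, so the mixing model is unchanged in the transformed domain. A minimal check with the DFT as the sparsifying map (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
K, P, N = 2, 3, 64
A = rng.standard_normal((K, P))
s = rng.standard_normal((P, N))      # sources, one per row
x = A @ s                            # mixtures

# A linear transform along time (here the DFT) commutes with the mixing matrix
x_t = np.fft.fft(x, axis=1)          # transform the observations
s_t = np.fft.fft(s, axis=1)          # transform the sources
print(np.allclose(x_t, A @ s_t))     # True: T(A s) = A T(s)
```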
Examples of sources and mixtures<br />
Sources may be sparse in time or in any domain preserving the linear nature of the mixtures<br />
[Figure: 2 mixtures of 3 sparse sources]<br />
Solving the under-determined source separation problem<br />
Since A is not invertible, inverting the mixture is not possible. One has to consider (1) identification of A and then (2) restoration of s, i.e.:<br />
A 3-step approach<br />
Step 0 (representation space): if necessary, apply a sparsifying mapping T<br />
Step 1 (identification): Estimate A, e.g. by clustering.<br />
Step 2 (source restoration): At each instant n0, find the sparsest solution of<br />
A s(n0) = x(n0), n0 = 0, . . . , N<br />
Main question: HOW to find the sparsest solution of an Underdetermined System of Linear Equations (USLE)?<br />
Uniqueness of the sparsest solution<br />
x = As, K equations (observations), P unknowns (sources): P ≫ K<br />
Question: Is the sparse solution unique?<br />
Theorem [Don04, GN03]: If there is a solution ŝ with fewer than K/2 non-zero components, then it is unique with probability 1 (that is, for almost all matrices A).<br />
How to find the sparsest solution?<br />
x = As, K equations (observations), P unknowns (sources): P ≫ K<br />
Goal: Find the sparsest solution (at least P − K sources are zero)<br />
Direct method (minimum L0 norm):<br />
min ||s||_0 s.t. x = As<br />
1 Set P − K (arbitrary) sources equal to zero<br />
2 Solve the remaining system of K equations and K unknowns<br />
3 Repeat steps 1 and 2 for all possible choices, and keep the sparsest answer.<br />
Not tractable!<br />
Example: P = 50, K = 30; then C(P, K) = C(50, 30) ≈ 5 × 10^13 choices.<br />
On a basic PC, solving one 30 × 30 system of equations takes about 2 × 10^−4 s<br />
Total time: (5 × 10^13) × (2 × 10^−4 s) ≈ 300 years!<br />
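The cost estimate above, and the direct method itself on a toy problem, can be reproduced in a few lines (the timing of 2 × 10^−4 s per solve is taken from the slide; the small problem sizes and seed are illustrative):

```python
import itertools
import math
import numpy as np

# Cost of the exhaustive minimum-L0 search from the slide
n_choices = math.comb(50, 30)
years = n_choices * 2e-4 / (365.25 * 24 * 3600)
print(f"{n_choices:.1e} systems, about {years:.0f} years")

# The same direct method works fine for tiny sizes: P = 6 sources, K = 5 observations
rng = np.random.default_rng(5)
K, P = 5, 6
A = rng.standard_normal((K, P))
s0 = np.zeros(P); s0[[1, 4]] = [1.0, -2.0]   # 2 < K/2 nonzeros -> sparsest solution unique
x = A @ s0

best = None
for support in itertools.combinations(range(P), K):     # zero out P - K sources
    cols = list(support)
    s = np.zeros(P)
    s[cols] = np.linalg.solve(A[:, cols], x)            # square K x K system
    if best is None or np.sum(np.abs(s) > 1e-8) < np.sum(np.abs(best) > 1e-8):
        best = s
print(np.allclose(best, s0))   # True: the sparsest candidate is the true source vector
```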
A few faster algorithms<br />
Method of Frames (MoF) [DDD04]: linear method, fast but not accurate<br />
Matching Pursuit [MZ93]: greedy method, fast but not accurate<br />
Basis Pursuit (minimum L1 norm, Linear Programming) [CDS99]: accurate and tractable method, but still time consuming<br />
Smoothed L0 [MBZJ09]: fast and accurate, based on the GNC principle<br />
Theorem [DE03]: under some (sparsity) conditions: argmin L1 = argmin L0<br />
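Basis Pursuit replaces the combinatorial L0 problem by the linear program min ||s||_1 s.t. As = x, solvable after the standard split s = u − v with u, v ≥ 0. A sketch using `scipy.optimize.linprog` as a stand-in LP solver (SciPy is not part of the course material; sizes and seed are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
K, P = 15, 30
A = rng.standard_normal((K, P))
s0 = np.zeros(P); s0[[3, 11]] = [2.0, -1.0]        # sparse ground truth
x = A @ s0

# min ||s||_1 s.t. A s = x, with s = u - v and u, v >= 0
c = np.ones(2 * P)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
s_hat = res.x[:P] - res.x[P:]

print(np.allclose(A @ s_hat, x))                             # True: feasibility holds
print(np.sum(np.abs(s_hat)) <= np.sum(np.abs(s0)) + 1e-6)    # True: L1 optimum no worse than s0
```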
Applications of SCA<br />
Among many others, applications in<br />
Audio and music (see the lecture of L. Daudet)<br />
Astrophysics (see the lecture of Starck)<br />
IX. NonNegative Mixtures<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS IX.NonNegative Mixtures 324
Contents of course IX<br />
32 Introduction<br />
33 Non-negative Matrix Factorization<br />
Condition for NN ICA<br />
Algorithm principles<br />
Discussion<br />
34 NonNegative Tensor Factorization<br />
Why non-negative mixtures?<br />
In many real-world applications, sources are non-negative (NN): images, chemical concentrations, note or speech amplitudes in audio, etc.<br />
In addition, mixtures can sometimes be non-negative too: superposition of images, mixtures of chemical molecules or ions, etc.<br />
NN for the linear source separation problem<br />
The mixing equation x(t) = As(t), t = 1, . . . , N, can be written as:<br />
X = AS<br />
where X is K × N, and S is P × N.<br />
Solving source separation consists in factorizing the observed mixtures X into two matrices, with non-negativity constraints on S, on A, or on both S and A.<br />
Matrix and tensor factorization<br />
From matrices to tensors<br />
More complex data (3D, 3D+t, etc.), e.g. sequences of images (video, PET or MRI, hyperspectral images) or multi-modal data, can be represented by arrays of order N > 2<br />
Non-negative matrix factorization (NMF) can then be extended to Non-negative Tensor Factorization (NTF)<br />
Earlier works<br />
In NMF<br />
Paatero and Tapper [PT94] considered positivity constraints on A and s in environmental modeling,<br />
Lee and Seung [LS99] used NMF for feature extraction in face images<br />
In Nonnegative ICA<br />
Plumbley et al. addressed the problem for audio and especially music separation [PAB+02]<br />
Plumbley proposed theoretical conditions for nonnegative independent component analysis in 2002 [Plu02]<br />
The first algorithms were proposed by Plumbley [Plu03, Plu05] and Plumbley and Oja [PO04].<br />
Conditions for NN ICA [Plu02]<br />
Assumptions and pre-processing<br />
x(t) = As(t), and the sources s(t) are NN and well-grounded<br />
Definition: A source s is well-grounded if it has a non-zero pdf in the positive neighborhood of s = 0.<br />
One then applies a whitening W, computed on the zero-mean data (x(t) − E[x]) but applied to x: z = Wx<br />
Conditions for NN ICA [Plu02]<br />
Theorem [Plu02]<br />
Let z be a random vector of real-valued, non-negative and well-grounded independent sources, each with unit variance, and let y = Uz be an orthogonal rotation of z; then U is a permutation matrix iff y is nonnegative with probability 1.<br />
Practically, the matrix U can be estimated by minimizing the criterion:<br />
J(U) = (1/2) E[ ||z − U^T y^+||² ]<br />
where y^+ = (y_1^+, . . . , y_P^+), with y_i^+ = max(y_i, 0).<br />
In [Plu03], Plumbley proposes a few algorithms, based on<br />
NN PCA,<br />
Givens rotations,<br />
line search,<br />
geodesic search.<br />
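The criterion can be checked numerically: it vanishes exactly when U is a permutation (then y ≥ 0, y^+ = y, and U^T y^+ = z), and is strictly positive for a generic rotation, which truncates the negative parts it creates. A sketch with well-grounded exponential sources (distribution, sizes, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
P, N = 3, 20000
z = rng.exponential(size=(P, N))         # nonnegative, well-grounded sources
z /= z.std(axis=1, keepdims=True)        # unit variance

def J(U, z):
    # Sample estimate of J(U) = (1/2) E ||z - U^T y^+||^2, with y = U z
    y_plus = np.maximum(U @ z, 0.0)
    e = z - U.T @ y_plus
    return 0.5 * np.mean(np.sum(e**2, axis=0))

Pm = np.eye(P)[[2, 0, 1]]                               # a permutation matrix
Q, _ = np.linalg.qr(rng.standard_normal((P, P)))        # a generic rotation

print(J(Pm, z))          # 0.0: a permutation keeps y nonnegative, criterion vanishes
print(J(Q, z) > 1e-3)    # True: a generic rotation produces negative parts
```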
Simple gradient algorithm<br />
Cost function and its differential<br />
The cost is the squared error, written with the Frobenius norm:<br />
J(A, S) = (1/2) Σ_{i,j} (x_{ij} − (AS)_{ij})² = (1/2) ||X − AS||²_F<br />
Its gradient with respect to the entries of S writes:<br />
∂J(A, S)/∂s_{ij} = −[A^T X − A^T A S]_{ij}<br />
Update equations<br />
They are similar for S and A, due to the symmetry between A and S in the cost function:<br />
s_{ij} = [ s_{ij} + µ_{ij}([A^T X]_{ij} − [A^T A S]_{ij}) ]_+<br />
a_{kl} = [ a_{kl} + µ_{kl}([X S^T]_{kl} − [A S S^T]_{kl}) ]_+<br />
Gradient with multiplicative updates<br />
To enhance the gradient algorithm, Lee and Seung [LS99] proposed multiplicative updates:<br />
Choice of the step size<br />
One chooses µ_{ij} so that the first and third right-hand-side terms of the update equation s_{ij} = [ s_{ij} + µ_{ij}([A^T X]_{ij} − [A^T A S]_{ij}) ]_+ cancel, i.e. s_{ij} = µ_{ij}[A^T A S]_{ij}, hence:<br />
µ_{ij} = s_{ij} / [A^T A S]_{ij}<br />
New update equations<br />
The new (multiplicative) update equations are then:<br />
s_{ij} = s_{ij} [A^T X]_{ij} / [A^T A S]_{ij}   and   a_{ij} = a_{ij} [X S^T]_{ij} / [A S S^T]_{ij}<br />
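The multiplicative updates above translate directly into a few lines of NumPy; since all factors stay elementwise nonnegative, no projection is needed. A sketch on exactly factorable synthetic data (sizes, seed, and the small epsilon guarding the divisions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
K, P, N = 8, 3, 40
X = rng.uniform(size=(K, P)) @ rng.uniform(size=(P, N))   # exactly factorable nonneg data

A = rng.uniform(0.1, 1.0, (K, P))
S = rng.uniform(0.1, 1.0, (P, N))
cost = lambda: 0.5 * np.linalg.norm(X - A @ S, 'fro')**2

c0 = cost()
for _ in range(200):
    S *= (A.T @ X) / (A.T @ A @ S + 1e-12)    # epsilon guards against division by zero
    A *= (X @ S.T) / (A @ S @ S.T + 1e-12)

print(cost() < c0)                        # True: the cost has decreased
print(np.all(A >= 0) and np.all(S >= 0))  # True: nonnegativity preserved by construction
```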
Alternating Least Squares<br />
Newton-like method<br />
The gradient of J(A, S) with respect to S writes:<br />
∂J(A, S)/∂S = −(A^T X − A^T A S)<br />
It must be equal to 0 at the minimum, i.e.<br />
(A^T A) S = A^T X<br />
Newton-like update equations<br />
The Newton-like update equations are then:<br />
S = [(A^T A)^{−1} A^T X]_+   and   A = [X S^T (S S^T)^{−1}]_+<br />
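The projected-ALS updates above can be sketched as follows; `lstsq` is used instead of forming (A^T A)^{−1} explicitly, which is numerically equivalent but safer when a factor becomes rank-deficient (sizes, seed, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(10)
K, P, N = 8, 3, 40
X = rng.uniform(size=(K, P)) @ rng.uniform(size=(P, N))   # exactly factorable nonneg data

A = rng.uniform(0.1, 1.0, (K, P))
for _ in range(100):
    # Solve the least-squares problems, then project onto the nonnegative orthant
    S = np.clip(np.linalg.lstsq(A, X, rcond=None)[0], 0.0, None)       # [(A^T A)^-1 A^T X]_+
    A = np.clip(np.linalg.lstsq(S.T, X.T, rcond=None)[0].T, 0.0, None) # [X S^T (S S^T)^-1]_+

err = np.linalg.norm(X - A @ S) / np.linalg.norm(X)
print(err)                                # small on this exactly factorable example
print(np.all(A >= 0) and np.all(S >= 0))  # True by construction
```

Unlike the multiplicative updates, the projection step means monotone decrease of the cost is not guaranteed here.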
Comments<br />
Well-grounded condition<br />
This condition is very important, since one needs samples on the edges of the scatter plot<br />
With a uniform distribution it is satisfied, but with a triangular pdf...<br />
A similar problem was noted by Moussaoui (ICASSP 06) in chemical spectroscopy<br />
Moussaoui mentioned the problem of trends in chemical spectra<br />
Noisy mixtures of NN sources<br />
Finally, NMF is not robust to additive noise<br />
Moreover, is additive noise realistic?<br />
Relevant noise modelling must be done in relation with the physics of the data<br />
Beyond NMF<br />
NMF pros and cons<br />
Simple formulation and algorithms<br />
No proof of unique factorization<br />
Requires strict enough conditions<br />
Poor robustness to noisy mixtures<br />
NN Bayesian approach<br />
NN priors are simple to consider<br />
Although Bayesian approaches are computationally costly, they are efficient for small sample sizes<br />
See the lecture of A. Mohammad-Djafari in this course<br />
NonNegative Tensor Factorization (1/2)<br />
In a CP decomposition<br />
T = (A, B, .., C) · L    (54)<br />
the solution is not unique: constraints are needed (to avoid e.g. A → ∞)<br />
If T is nonnegative, one can impose {A, B, .., C, L} to be all nonnegative → concept of nonnegative rank [LC09]<br />
Impose unit-norm factors in (54): ||A|| = ||B|| = .. = ||C|| = 1<br />
Choice of norm?<br />
NonNegative Tensor Factorization (2/2)<br />
Define ||T||_E = Σ_{ij..k} |T_{ij..k}|, better suited in R+ than the Frobenius norm<br />
Then the constraint<br />
||a_p||_1 = ||b_p||_1 = .. = ||c_p||_1 = 1    (55)<br />
fixes the scale indeterminacy.<br />
With that constraint, ||u ⊗ v ⊗ .. ⊗ w||_E = 1, and ||T||_E = ||λ||_1.<br />
The function Υ(A, B, .., C, λ) = ||T − Σ_{i=1}^{R} λ_i a_i ⊗ b_i ⊗ .. ⊗ c_i||_E is unbounded but coercive and continuous. Hence its infimum is reached, even if it is not defined on a compact set.<br />
Conclusion: The best low-rank nonnegative tensor approximation problem is well posed under constraint (55).<br />
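The scale-fixing constraint (55) can be checked numerically: rescaling each factor column to unit L1 norm while absorbing the scales into λ leaves the tensor unchanged, and for nonnegative factors ||T||_E then equals ||λ||_1. A sketch (sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
I, J, K, R = 4, 5, 6, 3
A = rng.uniform(size=(I, R)); B = rng.uniform(size=(J, R)); C = rng.uniform(size=(K, R))

# Enforce constraint (55): unit-L1 columns, scales absorbed into lambda
lam = A.sum(0) * B.sum(0) * C.sum(0)     # column L1 norms, since all entries are nonnegative
A1, B1, C1 = A / A.sum(0), B / B.sum(0), C / C.sum(0)

T = np.einsum('r,ir,jr,kr->ijk', lam, A1, B1, C1)
T_orig = np.einsum('ir,jr,kr->ijk', A, B, C)
print(np.allclose(T, T_orig))                    # True: rescaling leaves the tensor unchanged
print(np.isclose(np.abs(T).sum(), lam.sum()))    # True: ||T||_E = ||lambda||_1
```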
X. Applications<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS X.Applications 339
Contents of course X<br />
35 Introduction<br />
36 Deterministic approaches<br />
Fluorescence<br />
Antenna Array Processing<br />
Spread spectrum communications<br />
37 Hyperspectral image of Mars<br />
38 Back to Cumulant-based approaches<br />
Discrete sources<br />
Algorithms<br />
39 Smart chemical sensor arrays<br />
Problem and model<br />
Method based on silent sources<br />
Bayesian approach<br />
40 Removing show-through in Scanned images<br />
A large range of applications<br />
Source separation is a common problem for multi-dimensional recordings...<br />
Biomedical signal processing: non-invasive methods, localisation and artefact cancellation in EEG, ECG, MEG, fMRI,<br />
Communications and antenna processing, sonar, radar, mobile phones,<br />
Audio and music processing,<br />
Image processing in astrophysics,<br />
Monitoring,<br />
Pre-processing for classification,<br />
Scanned image processing,<br />
Smart sensor design.<br />
Intro D<strong>et</strong>erm Mars Cum Chemical Show-through<br />
How to use ICA?
ICA: a solution for source separation
ICA is now a standard method for solving blind source separation problems.
ICA is an estimation method, which requires:
a separating model, suited to the mixing model,
an independence criterion,
an optimisation algorithm.
When running an ICA algorithm, one obtains a solution, which is the best one under the above constraints.
Are the Independent Components (ICs) relevant?
ICA: relevance of independent components
Relevance requires:
a good model,
sources satisfying the independence assumption,
an optimisation algorithm that does not stop in a bad local minimum.
In real-world problems, relevance is basically not guaranteed, since:
the model is often an approximation of the physical system,
the source independence assumption is not certain,
we know neither the sources nor their number.
Use all the possible priors
Before applying ICA
Positivity, sparsity, temporal dependence, periodicity or cyclostationarity, discrete-valued data, etc.
Priors can lead to better criteria than independence.
Careful modelling of the observations
How does the physical system provide the observations?
Modelling leads to a mixing model, and hence to a suited model of the separating structure.
Modelling is important for showing what the estimated sources should be.
The physics of the system is important for component interpretation.
Choose the best representation domain for applying BSS.
Dam monitoring
The dam wall moves according to the water level, the temperature, etc.
Simple pendulums are hung on the wall, at different locations: the pendulum deviations measure the wall motions, with different sensitivities to the water level, the temperature, etc.
What is the relation between the deviations and the water level, temperature, etc.?
Multi-access control
Y. Deville, L. Andry, Application of blind source separation techniques for multi-tag contactless identification systems (NOLTA 95; IEICE Trans. on Fundamentals of Electronics, 1996; French patent Sept. 1995, subsequently extended)
ICA: preprocessing for classification
Bayesian classification framework
Relate each multi-dimensional observation x to a class Ci.
Bayesian classification requires the conditional densities p(x|Ci).
Multivariate density estimation is difficult (small sample size, complexity).
A simple idea [Com95, AAG03]
Transform the observations x into x̃ by ICA, for all observations associated with Ci.
In the transformed space, the components are close to independence, so that one can write: p(x̃|Ci) ≈ ∏k p(x̃k|Ci).
Then, using the inverse transform, one gets p(x|Ci).
The method replaces 1 multivariate (size N) density estimation by N scalar density estimations.
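The idea above can be sketched numerically. The minimal FastICA implementation and the kernel bandwidth below are illustrative assumptions, not the exact method of [Com95, AAG03]:

```python
import numpy as np

rng = np.random.default_rng(0)

def fastica(X, n_iter=200):
    """Minimal symmetric FastICA (tanh nonlinearity): returns W such that
    the rows of W @ X are close to independent."""
    N, T = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(Xc))
    V = E @ np.diag(d ** -0.5) @ E.T            # whitening matrix
    Z = V @ Xc
    W = rng.standard_normal((N, N))
    for _ in range(n_iter):
        Y = np.tanh(W @ Z)
        W = (Y @ Z.T) / T - np.diag((1 - Y ** 2).mean(axis=1)) @ W
        U, _, Vt = np.linalg.svd(W)             # symmetric decorrelation
        W = U @ Vt
    return W @ V

def log_density(x, W, Y_train, h=0.3):
    """log p(x|C) ≈ log|det W| + Σ_k log p_k((W x)_k), with scalar Gaussian
    kernel density estimates p_k built from the training components."""
    y = W @ x
    logp = np.log(abs(np.linalg.det(W)))
    for k, yk in enumerate(y):
        kern = np.exp(-0.5 * ((yk - Y_train[k]) / h) ** 2).mean()
        logp += np.log(kern / (h * np.sqrt(2 * np.pi)) + 1e-300)
    return logp

# toy class Ci: a linear mixture of two independent uniform sources
S = rng.uniform(-1, 1, (2, 2000))
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S
W = fastica(X)
Y = W @ X                                       # near-independent components
ll = log_density(X[:, 0], W, Y)
```

The N scalar kernel density estimates replace one N-dimensional estimate, which is the whole point of the transform.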
Fluorescence Spectroscopy 1/4
An optical excitation produces several effects:
Rayleigh scattering
Raman scattering
Fluorescence
At low concentrations, the Beer-Lambert law can be linearized [LMRB09]:
I(λf, λe, k) = I0 Σℓ γℓ(λf) εℓ(λe) ck,ℓ
➽ Hence a third-order array decomposition with real nonnegative factors [SBG04].
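As a concrete illustration, here is a minimal projected-ALS sketch for such a nonnegative third-order decomposition, on a generic toy tensor rather than fluorescence data (dimensions, rank and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def unfold(T, mode):
    """Mode-n unfolding of a third-order tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    """Column-wise Kronecker product."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def nn_cp_als(T, R, n_iter=200):
    """Projected ALS for T ≈ Σ_r a_r ∘ b_r ∘ c_r with nonnegative factors:
    plain least-squares update per mode, then clipping to the nonnegative orthant."""
    F = [rng.random((d, R)) + 0.1 for d in T.shape]
    for _ in range(n_iter):
        for m in range(3):
            rest = [F[n] for n in range(3) if n != m]
            KR = khatri_rao(rest[0], rest[1])
            F[m] = np.clip(unfold(T, m) @ KR @ np.linalg.pinv(KR.T @ KR), 1e-12, None)
    return F

# exact nonnegative rank-3 toy tensor
At, Bt, Ct = (rng.random((d, 3)) for d in (10, 12, 8))
T = np.einsum('ir,jr,kr->ijk', At, Bt, Ct)
A, B, C = nn_cp_als(T, 3)
rel_err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
```

On exact nonnegative low-rank data the fit is essentially perfect; the clipping is the only difference from unconstrained ALS.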
Fluorescence Spectroscopy 2/4<br />
Mixture of 4 solutes (one concentration shown)<br />
Fluorescence Spectroscopy 3/4<br />
Results obtained with R = 4
With projected ALS [Cichocki’09] and R = 5<br />
With nonnegative BFGS [Royer et al'10] and R = 5
Link with Bayesian approaches
A similar nonnegative tensor decomposition appears in Bayesian approaches:
p(x1, x2, . . . , xp) ≈ Σi=1..R p(zi) p(x1|zi) · · · p(xp|zi)
See [LC09], and the course on Bayesian techniques.
Antenna Array Processing<br />
Receiver<br />
Transmitter<br />
Narrow band model in the far field
Modeling the signals received on an array of antennas generally leads to a matrix decomposition:
Tij = Σq aiq sjq
i: space; j: time; q: path/source; A: antenna gains; S: transmitted signals
Narrow band model in the far field
In the presence of additional diversity, a tensor can be constructed, thanks to a new index p:
Tijp = Σq aiq sjq hpq
i: space; j: time; q: path/source; p: diversity index; A: antenna gains; S: transmitted signals
Possible diversities in Signal Processing
space translation (array geometry)
time translation (chip)
frequency/wavenumber (nonstationarity)
excess bandwidth (oversampling)
cyclostationarity
polarization
finite alphabet
...
Space translation diversity (1/2)<br />
Space translation diversity (2/2)
Aiq: gain between sensor i and source q
Hpq: transfer between the reference subarray and subarray p
Sjq: sample j of source q
βq: path loss; dq: DOA; bi: sensor location; p: index of the subarray
Tensor model (NB far field) [SBG00]
Reference subarray: Aiq = βq exp(j (ω/C) biᵀ dq)
Space translation (from the reference subarray): βq exp(j (ω/C) [bi + Δp]ᵀ dq) =def Aiq Hpq
Trilinear model: Tijp = Σq Aiq Sjq Hpq
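The trilinear structure can be checked numerically; the array geometry, wavelength and DOAs below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(9)

I, J, P, Q = 6, 100, 3, 2           # sensors per subarray, samples, subarrays, sources
w_over_c = 2 * np.pi / 0.3          # omega/C for a 0.3 m wavelength (illustrative)
b = 0.15 * np.arange(I)             # reference-subarray sensor positions (1-D array)
delta = 0.9 * np.arange(P)          # subarray translations
d = np.sin(np.array([0.3, -0.7]))   # 1-D direction cosines of the two DOAs
beta = np.array([1.0, 0.8])         # path losses

A = beta * np.exp(1j * w_over_c * np.outer(b, d))   # A_iq = beta_q exp(j w/C b_i d_q)
H = np.exp(1j * w_over_c * np.outer(delta, d))      # H_pq = exp(j w/C Delta_p d_q)
S = (rng.standard_normal((J, Q)) + 1j * rng.standard_normal((J, Q))) / np.sqrt(2)

# sensor i of subarray p sits at b_i + Delta_p, so its gain factors as A_iq H_pq:
T = np.einsum('iq,jq,pq->ijp', A, S, H)             # T_ijp = sum_q A_iq S_jq H_pq
```

Each slab T[:, :, p] is the same mixture A S^T up to the diagonal phase scaling H[p, :], which is exactly what makes the decomposition unique beyond the matrix case.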
Time translation diversity: DS-CDMA
Aiq: gain between sensor i and user q
Cq(n): spreading sequence of user q (chip number n)
Hq(n): channel of user q
Bqp = (Hq ⋆ Cq)(p), after removal of ISI (guard chips)
Sqj: symbol j of user q
Tensor model (p: index of a chip with no ISI):
Tijp = Σq Aiq Sqj Bqp
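A noiseless sketch of this tensor model (all parameter values are illustrative). As a sanity check of the structure, the codes are assumed known here, which reduces symbol recovery to least squares; the blind tensor approach drops that knowledge:

```python
import numpy as np

rng = np.random.default_rng(2)

I, J, P, Q = 4, 50, 8, 2                 # sensors, symbols, ISI-free chips, users
A = rng.standard_normal((I, Q))          # antenna gains (illustrative flat channels)
B = rng.choice([-1.0, 1.0], (Q, P))      # effective chip signatures B_qp
S = rng.choice([-1.0, 1.0], (Q, J))      # BPSK symbols S_qj
T = np.einsum('iq,qj,qp->ijp', A, S, B)  # T_ijp = sum_q A_iq S_qj B_qp

# with A and B known, stacking the (sensor, chip) pairs gives X = M S
M = np.einsum('iq,qp->ipq', A, B).reshape(I * P, Q)
X = T.transpose(0, 2, 1).reshape(I * P, J)
S_hat = np.sign(np.linalg.pinv(M) @ X)
```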
Bibliographical comments
Nonnegative tensors: projected ALS [CZPA09], exact parameterization [RTMC11], tensor package: url.
Antenna Array Processing: [SBG00], [CL11], [LC10]
CDMA: Sidiropoulos [SGB00]
Discrete sources and TCOM: [TVP96], [Com04], [ZC05], [ZC08]
Audio (L. Daudet)
Image (D. Fadili)
Content
Hyperspectral image of Mars
Modeling the observations
Applying ICA
IC interpretation
Discussion
with H. Hauksdottir (MSc thesis, Univ. of Iceland), J. Chanussot (GIPSA-lab, Grenoble), S. Moussaoui (IRCCyN, Nantes), F. Schmidt and S. Douté (Planetology Lab., Grenoble), D. Brie (CRAN, Nancy), J. Benediksson (Univ. of Iceland)
Hyperspectral image of Mars: Data
Observations
OMEGA instrument: a spectrometer on board Mars Express
256 frequency bands in the visible and IR, from 0.926 µm to 5.108 µm, with a spectral resolution of 0.014 µm,
variable spatial resolution (elliptic orbit), from 300 m to 4 km,
only 174 frequency bands are kept (the others are too noisy),
portion of image: about 256 × 256 pixels,
one hyperspectral image is a cube of about 10,000,000 data values.
The South Polar Cap of Mars mainly contains 3 endmembers: CO2 ice, H2O ice and dust.
Physical properties of observations
Light reflected onto the spectrometer:
sunlight reflected according to the ground reflectance,
atmospheric attenuation and direct sensor illumination,
a geometrical mixture occurs when a few different chemicals directly reflect the sunlight according to their local abundances,
an intimate mixture corresponds to a (nonlinear) mixture at very small scale,
the solar position with respect to the Polar Cap leads to a luminance gradient.
Model of observations
Luminance at location (x, y) and wavelength λ
Assuming a geometrical mixture, the luminance writes:
L(x, y, λ) = φ(λ) [ La(λ) + Σp=1..P αp(x, y) Lp(λ) cos θ(x, y) ]
With simple algebra, this leads to a linear model:
L(x, y, λ) = Σp=1..P α′p(x, y) L′p(λ) + E(x, y, λ) + noise
The estimated spectra differ from the original endmember spectra, since they are modified by the luminance gradient (cos θ(x, y)) and the atmospheric effect (φ(λ)).
What are the independent sources: the abundances or the spectra?
Spectral or spatial ICA?
ICA approximations: spatial or spectral decomposition?
Choosing the number of ICs
Spatial ICA solutions:
at least equal to the number of endmembers,
sufficient for ensuring an accurate approximation.
3 endmembers: H2O ice, CO2 ice and dust.
Using PCA with Nc = 7 components, one recovers 98% of the initial image power.
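The "98% of the power" selection can be sketched on synthetic data; the toy dimensions, latent rank and noise level below are illustrative, not the OMEGA data:

```python
import numpy as np

rng = np.random.default_rng(3)

# toy "hyperspectral" matrix: 174 bands x 1000 pixels with 7 latent components
L = rng.standard_normal((174, 7)) @ rng.standard_normal((7, 1000))
L += 0.05 * rng.standard_normal(L.shape)
Lc = L - L.mean(axis=1, keepdims=True)
s = np.linalg.svd(Lc, compute_uv=False)
power = np.cumsum(s**2) / np.sum(s**2)        # fraction of power kept
Nc = int(np.searchsorted(power, 0.98) + 1)    # components needed for 98%
```

The dominant singular values carry the latent components; the noise floor contributes almost nothing, so Nc matches the latent dimension here.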
Spatial ICA results
After PCA, JADE provides 7 ICs (images).
How to interpret the ICs?
For the endmember ICs, one has ground truth, with a reference classification and reference spectra.
For the others?
IC interpretation: using reference spectra
IC interpretation: using reference classification
Interpretation of other ICs
Labelling through expert interaction
Results with preprocessed data, after removal of luminance and atmospheric effects and of defective sensors.
On these data, the powers associated with IC1, IC3, IC4 and IC7 are very small.
We deduce that these ICs are related to effects removed by the preprocessing.
The spectra of these ICs are more or less flat.
Interpretation of the last ICs
Labelling through expert interaction
The spectrum looks like a nonlinear mixture of the CO2 ice and dust spectra.
Physicists suspect it could be associated with intimate mixtures.
Hyperspectral images of Mars: Discussion
Comments
5 ICs seem related to artefacts, and 2 look like endmembers.
The artefacts cannot be explained with endmembers, and appear as ICs. The initial model should be improved.
The 2 endmembers are strongly correlated!
Spectra are sometimes negative!
Conclusions
ICA is not suited to this problem!
Classification cannot be done with these ICs.
One can design a method based on true assumptions, like the positivity of spectra, e.g. based on Bayesian approaches [MHS+08].
Finite alphabets
Back to contrast criteria: APF
Approximation of the MAP estimate
Semi-algebraic blind extraction algorithm: AMiSRoF
Blind extraction by ILSP
Convolutive model
Presence of carrier offset (in digital communications)
Constant modulus
Contrast for discrete inputs (1)
Hypothesis H5: the sources take their values in a finite alphabet A, defined by the roots in C of some polynomial q(z) = 0.
Theorem [Com04]
Under H5, the following is a contrast over the set H of invertible P × P FIR filters:
Υ(G; z) =def −Σn Σi |q(zi[n])|²   (56)
APF: Algebraic Polynomial Fitting
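A quick numerical illustration of (56) for the BPSK alphabet, where q(z) = z² − 1 (the mixing matrix and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def apf_contrast(Z, q):
    """APF contrast -sum_n sum_i |q(z_i[n])|^2: it is 0 iff every output
    sample is a root of q, i.e. lies in the alphabet."""
    return -np.sum(np.abs(q(Z)) ** 2)

q_bpsk = lambda z: z**2 - 1                 # roots +/-1: BPSK alphabet
S = rng.choice([-1.0, 1.0], (2, 500))       # two (possibly correlated) BPSK sources
sep = apf_contrast(S, q_bpsk)               # separated outputs: maximal contrast
M = np.array([[1.0, 0.5], [0.3, 1.0]]) @ S  # still-mixed outputs
mix = apf_contrast(M, q_bpsk)
```

The contrast reaches its maximum, 0, exactly on alphabet-valued outputs, which is why it works even for correlated sources.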
Contrast for discrete inputs (2)
For a given alphabet A, denote by G the set of numbers c such that cA ⊆ A.
Lemma 1: the trivial filters satisfying H5 are of the form P D[z], with D[z] diagonal and Dpp[z] = cp zⁿ, for some n ∈ Z and some cp ∈ G.
Because A is finite, any c ∈ G must be of unit modulus, and we must have cA = A, ∀c ∈ G.
Also, any c ∈ G has an inverse c⁻¹ in G.
Contrast for discrete inputs: summary
Advantages
The previous contrast allows one to separate correlated sources.
But it requires all sources to have the same (known) alphabet.
If the sources have different alphabets, one can extract them in parallel with different criteria: Parallel Extraction [RZC05].
By deflation with different criteria, one can extract more sources than sensors: Parallel Deflation [RZC05].
Equivalence between KMA and CMA
Kurtosis: ΥKMA = Cum{z, z, z*, z*} / [E{|z|²}]²
Constant modulus: JCMA = E{[ |z|² − R ]²}
Assume 2nd-order circular sources: E{s²} = 0.
Then KMA and CMA are equivalent [Reg02] [Com04].
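A numerical sanity check of this equivalence on circular QPSK (the mixture coefficients are arbitrary): both criteria prefer the separated, constant-modulus output.

```python
import numpy as np

rng = np.random.default_rng(5)

def kma(z):
    """Normalized kurtosis contrast of a (circular) complex sequence."""
    m2 = np.mean(np.abs(z) ** 2)
    c4 = np.mean(np.abs(z) ** 4) - 2 * m2**2 - np.abs(np.mean(z**2)) ** 2
    return c4 / m2**2

def cma(z, R=1.0):
    """Constant-modulus cost."""
    return np.mean((np.abs(z) ** 2 - R) ** 2)

qpsk = lambda n: np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, n)))
s1, s2 = qpsk(5000), qpsk(5000)    # circular sources: E{s^2} = 0
sep = s1                           # separated output (constant modulus)
mix = 0.8 * s1 + 0.6 * s2          # still-mixed output
```

The CM cost vanishes on the separated output, while the kurtosis contrast is most negative there; a mixture degrades both by the same mechanism.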
Discussion on Deflation (MISO) criteria
Let z =def fᴴx. The criteria below are stationary iff the differentials ∇p and ∇q are collinear.
Ratio: Maxf p(f)/q(f)
Example: Kurtosis, with p = E{|z|⁴} − 2 E{|z|²}² − |E{z²}|² and q = E{|z|²}²
Difference: Minf p(f) − α q(f)
Example: Constant Modulus, with p = E{|z|⁴} and q = 2a E{|z|²} − a²; or Constant Power, with q = 2a ℜ(E{z²}) − a²
Constrained: Max p(f) subject to q(f) = 1
Example: Cumulant, with p = E{|z|⁴} − 2 E{|z|²}² − |E{z²}|²
Example: Moment, with p = E{|z|⁴}, if standardized, and with either q = ||f||² or q = E{|z|²}²
Parallel Deflation<br />
Parallel Extraction<br />
Parallel extraction
Parallel extraction of 3 sources (QPSK, QAM16, PSK-6) from a 3-sensor, length-3 random Gaussian channel [RZC05] (Lc = 2, Lh = 13).
[Figure: Symbol Error Rate (%) versus Signal-to-Noise Ratio (dB), from 0 to 30 dB, for the QAM16, QPSK and PSK-6 sources.]
APF extraction
Parallel Deflation from a mixture of 4 sources (2 QPSK and 2 QAM16) received on 3 sensors; extraction of a QPSK source, compared to MMSE [RZC05].
[Figure: MSE versus SNR (dB), from 0 to 20 dB, for the underdetermined mixture (4 inputs, 3 outputs), 600 samples: APF extraction filter versus MMSE filter.]
Optimal solution
MAP estimate:
(Ĥ, ŝ)MAP = Arg MaxH,s p s|x,H(x, s, H)
If sp ∈ A, and if the noise is Gaussian, then
(Ĥ, ŝ)MAP = Arg MinH, s∈Aᴾ ||x − H s||²
Less costly to search with the inverse filter (when it exists), or by deflation:
(F̂, ŝ)MAP = Arg MinF, s∈Aᴾ ||F x − s||²
(f̂, ŝ)MAP = Arg Minf, s∈Aᴾ ||fᴴ x − s||²   (57)
Approximation of the MAP estimate
For an alphabet of constant modulus, the MAP criterion (57) is asymptotically equivalent (for a large sample size T) to [GC98]:
ΥT(f) = (1/T) Σt=1..T ∏j=1..card A |fᴴ x[t] − aj|²,  where aj ∈ A
We have transformed an exhaustive search into a polynomial alphabet fit.
Algorithm AMiSRoF
Absolute Minimum Search by Root Finding [GC98]
Initialize f = f0.
For k = 1 to kmax, and while |µk| > threshold, do:
compute the gradient gk and the Hessian Hk at fk−1,
compute a search direction vk, e.g. vk = Hk⁻¹ gk,
normalize vk so that ||vk|| = 1,
compute the absolute minimum µk of the rational function in µ: Φ(µ) =def ΥT(fk−1 + µ vk),
set fk = fk−1 + µk vk.
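The key step, finding the absolute minimum of Φ(µ) by root finding, can be sketched for a real BPSK cost, where Φ is a quartic polynomial in µ; the numerical gradient direction and fixed step count below are simplifications, not the exact scheme of [GC98]:

```python
import numpy as np

rng = np.random.default_rng(6)

# BPSK APF-type cost: Upsilon(f) = mean_t ((f.x[t])^2 - 1)^2
S = rng.choice([-1.0, 1.0], (2, 400))
X = np.array([[1.0, 0.7], [0.2, 1.0]]) @ S

def cost(f):
    y = f @ X
    return np.mean((y**2 - 1) ** 2)

def absolute_min_mu(f, v):
    """Global minimizer of Phi(mu) = cost(f + mu v), a quartic in mu,
    obtained exactly from the real roots of its cubic derivative."""
    y, w = f @ X, v @ X
    a = np.mean(w**4)                          # mu^4 coefficient
    b = np.mean(4 * y * w**3)                  # mu^3
    c = np.mean(6 * y**2 * w**2 - 2 * w**2)    # mu^2
    d = np.mean(4 * y**3 * w - 4 * y * w)      # mu^1
    e = np.mean((y**2 - 1) ** 2)               # mu^0
    crit = np.roots([4 * a, 3 * b, 2 * c, d])  # Phi'(mu) = 0
    crit = crit[np.abs(crit.imag) < 1e-9].real
    return crit[np.argmin(np.polyval([a, b, c, d, e], crit))]

f = np.array([1.0, 0.0])
for _ in range(30):
    eps = 1e-6 * np.eye(2)
    g = np.array([(cost(f + e) - cost(f - e)) / 2e-6 for e in eps])  # numerical gradient
    v = -g / (np.linalg.norm(g) + 1e-12)
    f = f + absolute_min_mu(f, v) * v
```

Since Φ has positive leading coefficient, its global minimum lies at a real critical point, so each step is an exact (not approximate) line search.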
Algorithm ILSP
Iterative Least-Squares with Projection [TVP96]
Assumes the components si[n] ∈ A, a known alphabet.
Assumes the columns of H belong to a known array manifold.
Initialize H, then loop:
compute the LS estimate of matrix S in the equation X = H S,
project S onto A,
compute the LS estimate of H in the equation X = H S,
project H onto the array manifold.
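A minimal sketch of this loop on noiseless BPSK with a random initialization; for brevity the array-manifold projection of H is omitted here, which is a simplification of [TVP96]:

```python
import numpy as np

rng = np.random.default_rng(7)

def ilsp(X, Q, alphabet, n_iter=50):
    """Iterative Least-Squares with Projection: alternate LS estimates of S and
    H in X = H S, projecting each entry of S onto the nearest alphabet point.
    (The projection of H onto the array manifold is skipped in this sketch.)"""
    alphabet = np.asarray(alphabet, dtype=float)
    H = rng.standard_normal((X.shape[0], Q))
    for _ in range(n_iter):
        S = np.linalg.lstsq(H, X, rcond=None)[0]
        S = alphabet[np.argmin(np.abs(S[..., None] - alphabet), axis=-1)]
        H = np.linalg.lstsq(S.T, X.T, rcond=None)[0].T
    return H, S

S_true = rng.choice([-1.0, 1.0], (2, 200))                # BPSK symbols
H_true = np.array([[1.0, 0.4], [0.3, 1.0], [0.5, -0.7]])  # 3 sensors, 2 sources
X = H_true @ S_true
H_est, S_est = ilsp(X, 2, [-1.0, 1.0])
```

The projection step is what injects the finite-alphabet prior; without it the alternation would just be an unconstrained matrix factorization.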
Content
Smart chemical sensor arrays
Nicolsky-Eisenman model
Method based on source silences (Eusipco 2008) [DJ08]
Method based on positivity (ICA 2009) [DJM09]
These works were done by L. Duarte during his PhD thesis at GIPSA-lab (2006-2009).
Ion-selective electrodes: the interference problem
Aim: to estimate the concentrations of several ions in a solution.
[Figure: responses (mV) of a Ca2+ electrode and a Na+ electrode to two mixtures, together with the Ca2+ and Na+ source concentrations (M): each electrode responds to its target ion but also to the interfering ion.]
Electrode description - The Nicolsky-Eisenman model
xi(t) = ci + di log[ si(t) + Σj≠i aij sj(t)^(zi/zj) ]   (58)
si(t) ⇒ target ion concentration; sj(t) ⇒ interfering ion concentrations
ci, di, aij ⇒ mixing model parameters
zi and zj ⇒ valences of the ions i and j
When zi = zj ⇒ post-nonlinear (PNL) mixing model.
We are interested in the case in which zi ≠ zj.
We consider a scenario with two ions and two electrodes.
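Model (58) is easy to simulate; the parameter values below are arbitrary illustrative choices, not calibrated electrode constants:

```python
import numpy as np

rng = np.random.default_rng(8)

def nicolsky_eisenman(S, c, d, a, z):
    """Electrode outputs x_i = c_i + d_i log(s_i + sum_{j != i} a_ij s_j^(z_i/z_j)),
    for S: (n_ions, T) positive concentrations and interference gains a (a_ii = 0)."""
    n, _ = S.shape
    X = np.empty_like(S)
    for i in range(n):
        acc = S[i].copy()
        for j in range(n):
            if j != i:
                acc = acc + a[i, j] * S[j] ** (z[i] / z[j])
        X[i] = c[i] + d[i] * np.log(acc)
    return X

S = rng.uniform(0.01, 0.1, (2, 500))              # two ion concentrations (M)
c, d = np.array([0.1, 0.2]), np.array([0.025, 0.05])
a = np.array([[0.0, 0.3], [0.2, 0.0]])
z = np.array([2, 1])                              # k = z1/z2 = 2 (e.g. Ca2+ vs Na+)
X = nicolsky_eisenman(S, c, d, a, z)
```

With unequal valences the interference term is raised to a non-unit power, which is exactly what pushes the model outside the PNL class.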
P.Comon & C.Jutten, Peyresq July 2011 BSS X.Applications 389
Intro D<strong>et</strong>erm Mars Cum Chemical Show-through Problem Silence Bayesian<br />
The Nicolsky-Eisenman mo<strong>de</strong>l (cont.)<br />
In previous works, we assumed that the component-wise<br />
functions were known in advance.<br />
Resulting Mo<strong>de</strong>l<br />
In this work, we consi<strong>de</strong>r the compl<strong>et</strong>e mo<strong>de</strong>l<br />
<br />
x1(t) = d1 log s1(t) + a12s2(t) k<br />
<br />
<br />
x2(t) = d2 log s2(t) + a21s1(t) 1<br />
, (59)<br />
k<br />
with k = z1/z2.<br />
P.Comon & C.Jutten, Peyresq July 2011 BSS X.Applications 390
The Nicolsky-Eisenman model (cont.)
In previous works, we assumed that the component-wise functions were known in advance.
[Diagram: mixing process and separating system. (s_1, s_2) -> Nonlinear Mixing -> (p_1, p_2) -> d_i log(p_i) -> (x_1, x_2) -> exp(x_i/u_i) -> (e_1, e_2) -> Recurrent Network -> (y_1, y_2)]
Resulting Model
In this work, we consider the complete model

x_1(t) = d_1 log( s_1(t) + a_{12} s_2(t)^k )
x_2(t) = d_2 log( s_2(t) + a_{21} s_1(t)^{1/k} ),    (59)

with k = z_1/z_2.
Assumptions
1. The sources are statistically independent;
2. The sources are positive and bounded, i.e., s_i(t) \in [S_i^{min}, S_i^{max}], where S_i^{max} > S_i^{min} > 0;
3. The mixing system is invertible in the region given by [S_1^{min}, S_1^{max}] x [S_2^{min}, S_2^{max}];
4. k (the ratio between the valences) is known and takes only positive integer values.
Basic idea
Additional assumption: there is, at least, a period of time where one, and only one, source has zero variance.
[Figure: example sources s_1 and s_2, and scatter plots in the (p_1, p_2) and (x_1, x_2) planes along the chain (s_1, s_2) -> Nonlinear Mixing -> (p_1, p_2) -> d_i log(p_i) -> (x_1, x_2) -> exp(x_i/u_i) -> (e_1, e_2) -> Recurrent Network -> (y_1, y_2)]
Basic idea in equations
During the silent periods (s_1(t) = S_1):

p_1(t) = S_1 + a_{12} s_2(t)^k
p_2(t) = s_2(t) + a_{21} S_1^{1/k}.    (60)

In the (p_1, p_2) plane, we have a polynomial of order k:

p_1(t) = S_1 + a_{12} ( p_2(t) - a_{21} S_1^{1/k} )^k,    (61)

p_1(t) = \sum_{i=0}^{k} \varphi_i p_2(t)^i.    (62)
Recovering of polynomials
Mapping between the (p_1, p_2) and (e_1, e_2) planes:

e_1(t) = exp( d_1 log(p_1(t)) / u_1 ) = p_1(t)^{d_1/u_1}
e_2(t) = exp( d_2 log(p_2(t)) / u_2 ) = p_2(t)^{d_2/u_2}.    (63)

Using this equation and the polynomial relation in the (p_1, p_2) plane:

e_1(t) = ( \sum_{i=0}^{k} \varphi_i e_2(t)^{i u_2/d_2} )^{d_1/u_1}.    (64)

For what values of u_1 and u_2 is the function (64) a polynomial of order k?
Ideal solution: u_1 = d_1 and u_2 = d_2;
When all the coefficients except \varphi_k in (64) are null, then (u_1 = D d_1, u_2 = D d_2) also yields a polynomial of order k.
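The order-k criterion above can be checked numerically: fit a polynomial of order k to (e_2, e_1) and inspect the residual, which is essentially the inner part of criterion (67). The constants below (S_1, a_{12}, k = 2, and the wrong compensation u_i = 2 d_i) are illustrative assumptions.

```python
import numpy as np

def poly_fit_residual(e1, e2, k):
    """Sum of squared residuals of the best least-squares polynomial
    of order k fitted to e1 as a function of e2."""
    coef = np.polyfit(e2, e1, k)
    return float(np.sum((e1 - np.polyval(coef, e2)) ** 2))

# Silent-period relation (61) in the (p1, p2) plane, with k = 2.
p2 = np.linspace(0.2, 3.0, 300)
p1 = 1.0 + 0.5 * (p2 - 0.1) ** 2

# Correct compensation (u_i = d_i): e_i = p_i, the relation stays polynomial.
r_good = poly_fit_residual(p1, p2, 2)

# Wrong compensation (u_i = 2 d_i): e_i = sqrt(p_i), no longer a polynomial.
r_bad = poly_fit_residual(np.sqrt(p1), np.sqrt(p2), 2)
```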
How to detect the silent periods?
During the silent periods,

x_1 = g_1(s_2)
x_2 = g_2(s_2).    (65)

Maximum (nonlinear) correlation between x_1 and x_2.
Normalized mutual information:

\varsigma(x_1, x_2) = 1 - exp(-2 I(x_1, x_2)).    (66)

\varsigma(x_1, x_2) = 0 when x_1 and x_2 are statistically independent;
\varsigma(x_1, x_2) -> 1 when there is a deterministic relation between x_1 and x_2.
Result for silent periods detection
[Figure: result of the silent-period detection]
Final Algorithm
1. Detection of silent periods
   Estimate the mutual information between the mixtures x_1 and x_2 for a moving time window.
   Select the time window in which the mutual information is maximum.
2. Estimation of the component-wise functions by the polynomial recovering idea.
   For the selected time window, minimize the expression

   min_{u_1, u_2} \sum_t ( e_1(t) - \sum_{i=0}^{k} \alpha_i e_2(t)^i )^2.    (67)

3. Training the recurrent network
   Apply on e_i the algorithm for the simplified version of the NE model.
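Step 1 can be sketched with a plain histogram estimate of the mutual information and the normalized index (66); the bin count is an illustrative choice, and a real implementation would use a finer MI estimator.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Crude histogram-based estimate of I(x, y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def detect_silent_window(x1, x2, win=151):
    """Return the start of the window maximizing the normalized
    mutual information of eq. (66): zeta = 1 - exp(-2 I)."""
    best, best_t = -1.0, 0
    for t in range(len(x1) - win):
        zeta = 1.0 - np.exp(-2.0 * mutual_information(x1[t:t + win], x2[t:t + win]))
        if zeta > best:
            best, best_t = zeta, t
    return best_t
```

During a silent period, eq. (65) makes x_2 a deterministic function of x_1, so zeta peaks there.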
Results
Estimation of the concentration of the ions Ca^{2+} and Na^+ using two electrodes;
Mixing parameters a_{12} = 0.79 and a_{21} = 0.40 (extracted from an IUPAC report), and d_1 = 0.0129 and d_2 = 0.0258;
Sources artificially generated;
Number of samples: 1000;
Window length for silent period detection: 151;
Performance index:

SIR_i = 10 log( E{s_i^2} / E{(s_i - y_i)^2} ).    (68)
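The performance index (68) in code (assuming, as is standard for dB values, that log means log10):

```python
import numpy as np

def sir_db(s, y):
    """Eq. (68): SIR_i = 10 log10( E{s_i^2} / E{(s_i - y_i)^2} )."""
    s, y = np.asarray(s, dtype=float), np.asarray(y, dtype=float)
    return 10.0 * np.log10(np.mean(s ** 2) / np.mean((s - y) ** 2))
```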
Results (cont.)
SIR_1 = 40.52 dB, SIR_2 = 38.27 dB, and SIR = 39.39 dB.
[Figure: sources, mixtures, retrieved sources]
Conclusions
A complete nonlinear BSS method was developed for a chemical sensing application.
The developed method can avoid time-demanding calibration stages.
There are still several challenging points to be investigated:
The number of samples in a real application may be small.
The independence assumption may be rather strong, e.g. if a regulatory process between the ions exists.
Motivations for using the Bayesian approach
Prior information is available:

x_{it} = e_i + d_i log_{10}( s_{it} + \sum_{j \ne i} a_{ij} s_{jt}^{z_i/z_j} ) + n_{it}.    (69)

e_i takes values in the interval [0.05, 0.35]; [Gru07]
Theoretical value for the Nernstian slope => d_i = RT ln(10)/(z_i F) (0.059 V at room temperature); [FF03]
The selectivity coefficients a_{ij} are always non-negative, and very often in the interval [0, 1]; [UBU+00]
The sources are positive.
Takes noise into account;
In contrast to ICA, the statistical independence is rather a working assumption in the Bayesian approach [FG06];
May work even if the number of samples is small.
Bayesian source separation method: problem and notations
Problem: given X, estimate the unknown parameters θ = [S, A, d, e, σ, φ];
S => sources;
φ => source hyperparameters;
A => selectivity coefficients;
d => Nernstian slopes;
e => offset parameters;
σ => noise variances.
Bayesian source separation method: an overview
Problem: given X, estimate the unknown parameters θ = [S, A, d, e, σ, φ];
In the Bayesian approach, estimation of θ is based on the posterior information

p(θ|X) \propto p(X|θ) p(θ).    (70)

The likelihood function is given by:

p(X|θ) = \prod_{t=1}^{n_d} \prod_{i=1}^{n_c} N_{x_{it}}( e_i + d_i log( \sum_{j=1}^{n_s} a_{ij} s_{jt}^{z_i/z_j} ), σ_i^2 ),    (71)

since we assume an i.i.d. Gaussian noise which is spatially independent.
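Once the model mean is computed (the NE expression inside the Normal), the log of the likelihood (71) is a plain sum of Gaussian terms, one per electrode i and sample t. A minimal sketch:

```python
import numpy as np

def gaussian_loglik(X, mean, sigma2):
    """Log of eq. (71): independent Gaussian noise with variance
    sigma2[i] on row i (electrodes), columns indexing time."""
    X = np.asarray(X, dtype=float)
    mean = np.asarray(mean, dtype=float)
    s2 = np.asarray(sigma2, dtype=float)[:, None]
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * s2)
                        - (X - mean) ** 2 / (2.0 * s2)))
```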
Prior definitions
Log-normal prior distribution for the sources (non-negative distribution):

p(s_{jt}) = \frac{1}{s_{jt} \sqrt{2\pi σ_{s_j}^2}} exp( -\frac{(log(s_{jt}) - μ_{s_j})^2}{2 σ_{s_j}^2} ) \mathbb{1}_{[0,+\infty[}(s_{jt}).    (72)

Motivations
The estimation of φ_j = [μ_{s_j}, σ_{s_j}^2] is not difficult, since we can define a conjugate pair.
Ionic activities are expected to have a small variation in the logarithmic scale.
The sources are assumed i.i.d. and statistically mutually independent:

p(S) = \prod_{j=1}^{n_s} \prod_{t=1}^{n_d} p(s_{jt}).    (73)
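The log-normal density (72) in code, with a quick numerical sanity check that it integrates to one:

```python
import numpy as np

def lognormal_pdf(s, mu, sigma2):
    """Log-normal prior of eq. (72); the density is zero for s <= 0."""
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    pos = s > 0
    out[pos] = (np.exp(-(np.log(s[pos]) - mu) ** 2 / (2.0 * sigma2))
                / (s[pos] * np.sqrt(2.0 * np.pi * sigma2)))
    return out
```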
Prior definitions (cont.)
Source parameters φ_j = [μ_{s_j}, σ_{s_j}^2]:

p(μ_{s_j}) = N(\tilde{μ}_{s_j}, \tilde{σ}_{s_j}^2),  p(1/σ_{s_j}^2) = G(α_{σ_{s_j}}, β_{σ_{s_j}}).    (74)

Selectivity coefficients a_{ij}: very often within [0, 1]:

p(a_{ij}) = U(0, 1).    (75)

Nernstian slopes d_i: ideally 0.059 V at room temperature:

p(d_i) = N(μ_{d_i} = 0.059/z_i, σ_{d_i}^2).    (76)

Offset parameters e_i lie in the interval [0.050, 0.350] V:

p(e_i) = N(μ_{e_i} = 0.20, σ_{e_i}^2).    (77)

Noise variances σ_i:

p(1/σ_i^2) = G(α_{σ_i}, β_{σ_i}).    (78)
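For simulation purposes, one joint draw from the priors (74)-(78) could look as follows; all hyperparameter values (the Gamma shapes and scales, the standard deviations, the assumed valences, and the unit diagonal of A) are illustrative assumptions, not values from the slides.

```python
import numpy as np

def sample_prior(rng, n_ions=2):
    """One joint draw from the priors (74)-(78)."""
    mu_s = rng.normal(0.0, 1.0, n_ions)           # (74) p(mu_sj): Gaussian
    prec_s = rng.gamma(2.0, 1.0, n_ions)          # (74) p(1/sigma_sj^2): Gamma
    A = rng.uniform(0.0, 1.0, (n_ions, n_ions))   # (75) selectivity in [0, 1]
    np.fill_diagonal(A, 1.0)                      # own-ion coefficient (assumed)
    z = np.arange(1, n_ions + 1)                  # valences (assumed known)
    d = rng.normal(0.059 / z, 0.005)              # (76) Nernstian slopes
    e = rng.normal(0.20, 0.05, n_ions)            # (77) offsets
    prec_n = rng.gamma(2.0, 1.0, n_ions)          # (78) noise precisions
    return mu_s, prec_s, A, d, e, prec_n
```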
The posterior distribution
The posterior distribution is given by

p(θ|X) \propto p(X|θ) \cdot \prod_{j=1}^{n_s} \prod_{t=1}^{n_d} p(s_{jt}|μ_{s_j}, σ_{s_j}^2) \cdot \prod_{j=1}^{n_s} p(μ_{s_j}) \cdot \prod_{j=1}^{n_s} p(σ_{s_j}) \cdot \prod_{i=1}^{n_c} \prod_{j=1}^{n_s} p(a_{ij}) \cdot \prod_{i=1}^{n_c} p(e_i) \cdot \prod_{i=1}^{n_c} p(d_i) \cdot \prod_{i=1}^{n_c} p(σ_i).    (79)

Bayesian MMSE estimator => θ_{MMSE} = \int θ p(θ|X) dθ (difficult to calculate!)
Given θ^1, θ^2, ..., θ^M (samples drawn from p(θ|X)), the Bayesian MMSE estimator can be approximated by:

θ_{MMSE} \approx \frac{1}{M} \sum_{i=1}^{M} θ^i.    (80)
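A sketch of approximation (80): draw posterior samples and average them. The generic random-walk Metropolis sampler, its step size, and the toy 1-D Gaussian posterior below are illustrative assumptions; the slides do not specify the sampling scheme.

```python
import numpy as np

def metropolis(log_post, theta0, n_iter=6000, step=0.8, seed=0):
    """Random-walk Metropolis: produces the samples needed by eq. (80)."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    out = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # accept/reject
            theta, lp = prop, lp_prop
        out.append(theta.copy())
    return np.array(out)

def mmse_from_samples(samples):
    """Eq. (80): empirical mean of posterior samples."""
    return np.mean(samples, axis=0)
```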
Results on synthetic data
Synthetic sources (drawn from log-normal distributions);
Noise level => SNR_1 = SNR_2 = 18 dB;
Number of samples: 500;
Performance index:

SIR_i = 10 log( E{s_i(t)^2} / E{(s_i(t) - \hat{s}_i(t))^2} ).    (81)
Results on synthetic data: mixtures
Two ISEs (n_c = 2) for detecting the activities of NH_4^+ and K^+ (n_s = 2) => post-nonlinear model;
Mixing coefficients a_{12} = 0.3, a_{21} = 0.4, d_1 = 0.056, d_2 = 0.056, e_1 = 0.090, e_2 = 0.105;
SIR_1 = 19.7 dB, SIR_2 = 18.9 dB, SIR = 19.3 dB.
(a) Mixtures. (b) Original sources.
Results on synthetic data: estimated sources
(a) Mixtures. (b) Retrieved sources.
Results on real data
ISE array (NH_4^+-ISE and K^+-ISE)
[Figure: Sources -> ISE Array]
Results on real data (cont.)
n_d = 169, SIR_1 = 25.1 dB, SIR_2 = 23.7 dB, SIR = 24.4 dB.
(a) ISE array response. (b) Actual sources.
Since the sources are clearly dependent here, an ICA-based method failed in this case.
Results on real data (cont.)
(a) ISE array response. (b) Retrieved signals.
Conclusions
A Bayesian non-linear source separation was proposed for processing the outputs of an ISE array;
Good results were obtained even in a complicated situation (dependent sources and reduced number of samples);
Future work includes:
the development of a model that also takes into account the time-structure of the sources;
the case where neither the number of ions in the solution nor their valences are available.
Show-through effect
Cooperation with F. Merrikh-Bayat and M. Babaie-Zadeh, Sharif Univ. of Technology [MBBZJ08, MBBZJ11]
Show-through, due to paper transparency and thickness;
Pigment oil penetration;
Vehicle oil component, due to loss of opacity.
State-of-the-art
Often applied to text and handwritten documents: 1-side methods or 2-side methods;
ICA assuming:
a linear model of mixtures (Tonazzini et al., 2007; Ophir, Malah, 2007);
a nonlinear model of mixtures (Almeida, 2005; Sharma, 2001).
In this work, we consider:
modeling of the nonlinear mixture;
the blurring effect.
Nonlinearity of show-through: experimental study
Evidence
The sum of luminances is nonlinear.
The whiter the pixel, the more important the show-through; darker than black is impossible!
Nonlinearity of show-through: mathematical model
Basic equation
Show-through has a gain which depends on the grayscale of the front image.
It leads to the model of mixtures:

f^s_r(x, y) = a_1 f^i_r(x, y) + b_1 f^i_v(x, y) g_1[f^i_r(x, y)]
f^s_v(x, y) = a_2 f^i_v(x, y) + b_2 f^i_r(x, y) g_2[f^i_v(x, y)]

where i = initial, s = scanned, r = recto, v = verso; a_i and b_i denote unknown mixing parameters, and g_i(.) denote nonlinear gains.
Nonlinearity of show-through: mathematical model
Shape of the gain
The gain functions can be estimated by computing

g_1[f^i_r(x, y)] = [f^s_r(x, y) - a_1 f^i_r(x, y)] / (b_1 f^i_v(x, y))
g_2[f^i_v(x, y)] = [f^s_v(x, y) - a_2 f^i_v(x, y)] / (b_2 f^i_r(x, y))
Nonlinearity of show-through: mathematical model
Approximation of the gain function
The gain function can be approximated by an exponential:

g_1[f^i_r(x, y)] = γ_1 exp[β_1 f^i_r(x, y)] \approx γ_1 (1 + β_1 f^i_r(x, y))
g_2[f^i_v(x, y)] = γ_2 exp[β_2 f^i_v(x, y)] \approx γ_2 (1 + β_2 f^i_v(x, y))

It leads to the approximated mixing model:

f^s_r(x, y) = a_1 f^i_r(x, y) + b'_1 f^i_v(x, y) [1 + β_1 f^i_r(x, y)]
f^s_v(x, y) = a_2 f^i_v(x, y) + b'_2 f^i_r(x, y) [1 + β_2 f^i_v(x, y)]

And finally:

f^s_r(x, y) = a_1 f^i_r(x, y) - l_1 f^i_v(x, y) - q_1 f^i_v(x, y) f^i_r(x, y)
f^s_v(x, y) = a_2 f^i_v(x, y) + l_2 f^i_r(x, y) - q_2 f^i_r(x, y) f^i_v(x, y)
Recursive structure
Separation structure
Studied by Deville and Hosseini [HD03, DH09]

f^s_r(x, y) = a_1 f^i_r(x, y) - l_1 f^i_v(x, y) - q_1 f^i_v(x, y) f^i_r(x, y)
f^s_v(x, y) = a_2 f^i_v(x, y) + l_2 f^i_r(x, y) - q_2 f^i_r(x, y) f^i_v(x, y)
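A numerical sketch of the bilinear model above and of the recursive separating structure, implemented here as a simple fixed-point iteration; the coefficient values are illustrative, and this iteration is only one way to realize the recurrent network of Deville and Hosseini.

```python
import numpy as np

def mix(s1, s2, a=(1.0, 1.0), l=(0.2, 0.2), q=(0.1, 0.1)):
    """Forward bilinear show-through model, signs as in the slides."""
    x1 = a[0] * s1 - l[0] * s2 - q[0] * s2 * s1
    x2 = a[1] * s2 + l[1] * s1 - q[1] * s1 * s2
    return x1, x2

def separate(x1, x2, a=(1.0, 1.0), l=(0.2, 0.2), q=(0.1, 0.1), n_iter=200):
    """Recursive separating structure: invert the bilinear model by
    fixed-point iteration (a contraction for small l and q)."""
    y1, y2 = x1.copy(), x2.copy()
    for _ in range(n_iter):
        y1 = (x1 + l[0] * y2 + q[0] * y2 * y1) / a[0]
        y2 = (x2 - l[1] * y1 + q[1] * y1 * y2) / a[1]
    return y1, y2
```

For larger l and q the iteration may diverge, which mirrors the remark below that the bilinear model is neither always invertible nor always stable.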
Cancellation of show-through: preliminary results
Preliminary results with the NL model
The bilinear model is neither always invertible nor always stable.
Parameters are estimated by ML.
Comments
The other-side image is never perfectly removed, especially where there is no superimposition.
This means that the difference between the verso image and the recto image is not a simple gain.
Diffusion in the paper => blurring effect, modelled by a 2-D filter.
Improved model and recursive structure
Model with filtering
The mixture is not the nonlinear superimposition of the recto image with the verso image itself, but with a filtered version of the verso image; hence the final separation structure.
Cancellation of show-through
Final results with NL modeling and filtering
Summary
Scanned images: conclusions and perspectives
Show-through is a nonlinear phenomenon which can be modeled by bilinear mixtures;
In addition, it is necessary to consider the blurring effect, which can be viewed as a 2-D filtering effect;
Experimental results show that the mixture model has to take into account both nonlinear and convolutive effects.
Perspectives
To avoid the registration step due to two-side scanning, one could explore whether two scans of the same side (under different conditions) provide sufficient diversity;
Other priors, like positivity of the images and of the coefficients, could be exploited, e.g. by NMF or Bayesian approaches.
Bibliography<br />
XI. Bibliography<br />
References
[AAG03] U. Amato, A. Antoniadis, and G. Gregoire. Independent component discriminant analysis. Int. J. Mathematics, 3:735-753, 2003.
[ACCF04] L. Albera, P. Comon, P. Chevalier, and A. Ferreol. Blind identification of underdetermined mixtures based on the hexacovariance. In ICASSP'04, volume II, pages 29-32, Montreal, May 17-21 2004.
[ACX07] L. Albera, P. Comon, and H. Xu. SAUD, un algorithme d'ICA par déflation semi-algébrique. In GRETSI, Troyes, September 2007.
[AFCC04] L. Albera, A. Ferreol, P. Comon, and P. Chevalier. Blind identification of overcomplete mixtures of sources (BIOME). Lin. Algebra Appl., 391:1-30, November 2004.
[AFCC05] L. Albera, A. Ferreol, P. Chevalier, and P. Comon. ICAR, a tool for blind source separation using fourth order statistics only. IEEE Trans. Sig. Proc., 53(10):3633-3643, October 2005.
[AJ05] S. Achard and C. Jutten. Identifiability of post nonlinear mixtures. IEEE Signal Processing Letters, 12(5):423-426, May 2005.
[Alm05] L. Almeida. Separating a real-life nonlinear image mixture. Journal of Machine Learning Research, 6:1199-1229, July 2005.
[Alm06] L. Almeida. Nonlinear Source Separation. Synthesis Lectures on Signal Processing, vol. 2. Morgan & Claypool Publishers, 2006.
[AMLM97] K. Abed-Meraim, P. Loubaton, and E. Moulines. A subspace algorithm for certain blind identification problems. IEEE Trans. Inf. Theory, pages 499-511, March 1997.
[AMML97] K. Abed-Meraim, E. Moulines, and P. Loubaton. Prediction error method for second-order blind identification. IEEE Trans. Sig. Proc., 45(3):694-705, March 1997.
[APJ01] S. Achard, D.T. Pham, and C. Jutten. Blind source separation in post nonlinear mixtures. In Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), pages 295-300, San Diego (California, USA), 2001.
[APJ03] S. Achard, D.T. Pham, and C. Jutten. Quadratic dependence measure for nonlinear blind source separation. In S.-I. Amari, A. Cichocki, S. Makino, and N. Murata, editors, Proc. of 4th Int. Symp. on Independent Component Analysis and Blind Source Separation (ICA2003), pages 263-268, Nara, Japan, April 2003.
[APJ04] S. Achard, D.T. Pham, and C. Jutten. Criteria based on mutual information minimization for blind source separation in post-nonlinear mixtures. Signal Processing, 85(5):965-974, 2004.
[BAM93] A. Belouchrani and K. Abed-Meraim. Séparation aveugle au second ordre de sources corrélées. In GRETSI, pages 309-312, Juan-Les-Pins, France, September 1993.
[BCA+10] H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet. Multiway analysis for source localization and extraction. In EUSIPCO'2010, pages 1349-1353, Aalborg, DK, August 23-27 2010. hal-00501845. Best student paper award.
[BCMT10] J. Brachat, P. Comon, B. Mourrain, and E. Tsigaridas. Symmetric tensor decomposition. Linear Algebra Appl., 433(11/12):1851-1872, December 2010. hal:inria-00355713, arXiv:0901.3706.
[BGR80] A. Benveniste, M. Goursat, and G. Ruget. Robust identification of a non-minimum phase system. IEEE Trans. Auto. Control, 25(3):385-399, June 1980.
[BK83] G. Bienvenu and L. Kopp. Optimality of high-resolution array processing using the eigensystem approach. IEEE Trans. ASSP, 31(5):1235-1248, October 1983.
[BS95] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129-1159, November 1995.
[BS02] J. M. F. Ten Berge and N. Sidiropoulos. On uniqueness in Candecomp/Parafac. Psychometrika, 67(3):399-409, September 2002.
[BZJN01] M. Babaie-Zadeh, C. Jutten, and K. Nayebi. Separating convolutive post non-linear mixtures. In Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), pages 138-143, San Diego (California, USA), 2001.
[BZJN02] M. Babaie-Zadeh, C. Jutten, and K. Nayebi. A geometric approach for separating post nonlinear mixtures. In Proc. of the XI European Signal Processing Conf. (EUSIPCO 2002), volume II, pages 11-14, Toulouse, France, September 2002.
[BZJN04] M. Babaie-Zadeh, C. Jutten, and K. Nayebi. Differential of mutual information. IEEE Signal Processing Letters, 11(1):48-51, January 2004.
[CA02] A. Cichocki and S-I. Amari. Adaptive Blind Signal and Image Processing. Wiley, New York, 2002.
[Car97] J. F. Cardoso. Infomax and Maximum Likelihood for source separation. IEEE Sig. Proc. Letters, 4(4):112-114, April 1997.
[Car99] J. F. Cardoso. High-order contrasts for independent component analysis. Neural Computation, 11(1):157-192, January 1999.
[CBDC09] P. Comon, J. M. F. Ten Berge, L. De Lathauwer, and J. Castaing. Generic and typical ranks of multi-way arrays. Linear Algebra Appl., 430(11-12):2997-3007, June 2009. hal-00410058.
[CDS99] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1999.
[CG90] P. Comon and G. H. Golub. Tracking of a few extreme singular values and vectors in signal processing. Proceedings of the IEEE, 78(8):1327-1343, August 1990. (published from Stanford report 78NA-89-01, Feb. 1989).
[CGLM08] P. Comon, G. Golub, L-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM Journal on Matrix Analysis Appl., 30(3):1254-1279, September 2008. hal-00327599.
[CJ10] P. Comon and C. Jutten, editors. Handbook of Blind Source Separation, In<strong>de</strong>pen<strong>de</strong>nt Component<br />
Analysis and Applications. Aca<strong>de</strong>mic Press, Oxford UK, Burlington USA, 2010. ISBN:<br />
978-0-12-374726-6, 19 chapters, 830 pages. hal-00460653.<br />
[CJH91] P. Comon, C. Jutten, and J. Herault. Separation of sources, part II: Problems statement. Signal Processing, Elsevier, 24(1):11–20, July 1991.
[CL96] J. F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans. on Sig. Proc., 44(12):3017–3030, December 1996.
[CL97] A. Chevreuil and P. Loubaton. Modulation and second order cyclostationarity: Structured subspace method and new identifiability conditions. IEEE Sig. Proc. Letters, 4(7):204–206, July 1997.
[CL11] P. Comon and L.-H. Lim. Sparse representations and low-rank tensor approximation. Research Report ISRN I3S//RR-2011-01-FR, I3S, Sophia-Antipolis, France, February 2011.
[CL13] P. Comon and L.-H. Lim. Tensor Computations. Cambridge Univ. Press, 2013. In preparation.
[CLDA09] P. Comon, X. Luciani, and A. L. F. De Almeida. Tensor decompositions, alternating least squares and other tales. Jour. Chemometrics, 23:393–405, August 2009. hal-00410057.
[CM96] P. Comon and B. Mourrain. Decomposition of quantics in sums of powers of linear forms. Signal Processing, Elsevier, 53(2):93–107, September 1996. Special issue on Higher-Order Statistics.
[CM97] P. Comon and E. Moreau. Improved contrast dedicated to blind separation in communications. In ICASSP, pages 3453–3456, Munich, April 20-24 1997.
[Com89a] P. Comon. Séparation de mélanges de signaux. In XIIème Colloque Gretsi, pages 137–140, Juan les Pins, June 12-16 1989.
[Com89b] P. Comon. Separation of stochastic processes. In Proc. Workshop on Higher-Order Spectral Analysis, pages 174–179, Vail, Colorado, June 28-30 1989. IEEE-ONR-NSF.
[Com90] P. Comon. Analyse en composantes indépendantes et identification aveugle. Traitement du Signal, 7(3):435–450, December 1990. Numéro spécial non linéaire et non gaussien.
[Com92a] P. Comon. Independent Component Analysis. In J.-L. Lacoume, editor, Higher Order Statistics, pages 29–38. Elsevier, Amsterdam, London, 1992.
[Com92b] P. Comon. MA identification using fourth order cumulants. Signal Processing, Elsevier, 26(3):381–388, 1992.
[Com94a] P. Comon. Independent Component Analysis, a new concept? Signal Processing, Elsevier, 36(3):287–314, April 1994. Special issue on Higher-Order Statistics. hal-00417283.
[Com94b] P. Comon. Tensor diagonalization, a useful tool in signal processing. In M. Blanke and T. Soderstrom, editors, IFAC-SYSID, 10th IFAC Symposium on System Identification, volume 1, pages 77–82, Copenhagen, Denmark, July 4-6 1994. Invited session.
[Com95] P. Comon. Supervised classification, a probabilistic approach. In Verleysen, editor, ESANN, European Symposium on Artificial Neural Networks, pages 111–128, Brussels, April 19-21 1995. D Facto Publ. Invited paper.
[Com96] P. Comon. Contrasts for multichannel blind deconvolution. IEEE Signal Processing Letters, 3(7):209–211, July 1996.
[Com97] P. Comon. Cumulant tensors. IEEE SP Workshop on HOS, Banff, Canada, July 21-23 1997. Invited keynote address, www.i3s.unice.fr/~comon/plenaryBanff97.html.
[Com01] P. Comon. From source separation to blind equalization, contrast-based approaches. In Int. Conf. on Image and Signal Processing (ICISP'01), Agadir, Morocco, May 3-5 2001. Invited plenary.
[Com02a] P. Comon. Blind equalization with discrete inputs in the presence of carrier residual. In Second IEEE Int. Symp. Sig. Proc. Inf. Theory, Marrakech, Morocco, December 2002. Invited session.
[Com02b] P. Comon. Tensor decompositions, state of the art and applications. In J. G. McWhirter and I. K. Proudler, editors, Mathematics in Signal Processing V, pages 1–24. Clarendon Press, Oxford, UK, 2002. arXiv:0905.0454v1.
[Com04] P. Comon. Contrasts, independent component analysis, and blind deconvolution. Int. Journal Adapt. Control Sig. Proc., 18(3):225–243, April 2004. Special issue on Signal Separation: http://www3.interscience.wiley.com/cgi-bin/jhome/4508. Preprint: I3S Research Report RR-2003-06.
[Com09] P. Comon. Tensors, usefulness and unexpected properties. In 15th IEEE Workshop on Statistical Signal Processing (SSP'09), pages 781–788, Cardiff, UK, Aug. 31 - Sep. 3 2009. Keynote. hal-00417258.
[Com10a] P. Comon. Tensor decompositions viewed from different fields. Workshop on Tensor Decompositions and Applications (TDA 2010), Bari, Italy, September 13-17 2010. Invited.
[Com10b] P. Comon. Tensors in Signal Processing. EUSIPCO'2010, Aalborg, DK, August 23-27 2010. Invited plenary.
[Com10c] P. Comon. When Tensor Decomposition meets Compressed Sensing. 9th Int. Conf. Latent Variable Analysis, Saint Malo, France, September 27-30 2010. Invited keynote.
[CR06] P. Comon and M. Rajih. Blind identification of under-determined mixtures based on the characteristic function. Signal Processing, 86(9):2271–2281, September 2006. hal-00263668.
[CS93] J. F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. IEE Proceedings - Part F, 140(6):362–370, December 1993. Special issue on Applications of High-Order Statistics.
[CT91] T. Cover and J. Thomas. Elements of Information Theory. Wiley Series in Telecommunications, 1991.
[CZA08] A. Cichocki, R. Zdunek, and S.-I. Amari. Nonnegative matrix and tensor factorization. IEEE Sig. Proc. Mag., pages 142–145, 2008.
[CZPA09] A. Cichocki, R. Zdunek, A. H. Phan, and S.-I. Amari. Nonnegative Matrix and Tensor Factorization. Wiley, 2009.
[DAC02] Y. Deville, O. Albu, and N. Charkani. New self-normalized blind source separation networks for instantaneous and convolutive mixtures. Int. J. Adapt. Contr. Sig. Proc., 16:753–761, 2002.
[Dar53] G. Darmois. Analyse générale des liaisons stochastiques. Rev. Inst. Intern. Stat., 21:2–8, 1953.
[DDD04] I. Daubechies, M. Defrise, and C. DeMol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004.
[DE03] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Nat. Aca. Sci., 100(5):2197–2202, March 2003.
[Des01] F. Desbouvries. Problèmes structurés et algorithmes du second ordre en traitement du signal. HdR, Marne-la-Vallée, June 22 2001.
[DH09] Y. Deville and S. Hosseini. Recurrent networks for separating extractable-target nonlinear mixtures, part I: Non-blind configurations. Signal Processing, 89(4):378–393, 2009.
[DJ08] L. Duarte and C. Jutten. A nonlinear source separation approach for the Nicolsky-Eisenman model. In CD-ROM Proceedings of EUSIPCO 2008, Lausanne, Switzerland, September 2008.
[DJM09] L. T. Duarte, C. Jutten, and S. Moussaoui. Bayesian source separation of linear-quadratic and linear mixtures through a MCMC method. In Machine Learning for Signal Processing (MLSP 2009), 6 pages, Grenoble, France, September 2009.
[DJM10] L. Duarte, C. Jutten, and S. Moussaoui. Bayesian source separation of linear and linear-quadratic mixtures using truncated priors. Journal of Signal Processing Systems, 2010.
[DJTB+10] L. Duarte, C. Jutten, P. Temple-Boyer, A. Benyahia, and J. Launay. A dataset for the design of smart ion-selective electrode arrays for quantitative analysis. IEEE Sensors Journal, 10(12):1891–1892, 2010.
[DL95] N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources: a deflation approach. Signal Processing, 45:59–83, 1995.
[Don04] D. L. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Technical report, 2004.
[EK02] J. Eriksson and V. Koivunen. Blind identifiability of class of nonlinear instantaneous ICA models. In Proc. of the XI European Signal Proc. Conf. (EUSIPCO 2002), volume 2, pages 7–10, Toulouse, France, September 2002.
[Fét88] L. Féty. Méthodes de traitement d'antenne adaptées aux radiocommunications. PhD thesis, ENST Paris, 1988.
[FF03] P. Fabry and J. Fouletier, editors. Microcapteurs chimiques et biologiques. Lavoisier, 2003. In French.
[FG86] B. N. Flury and W. Gautschi. An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form. SIAM J. Sci. Stat. Comput., 7(1):169–184, 1986.
[FG06] C. Févotte and S. J. Godsill. A Bayesian approach for blind separation of sparse sources. IEEE Trans. on Audio, Speech and Language Processing, 14:2174–2188, 2006.
[GC98] O. Grellier and P. Comon. Blind separation of discrete sources. IEEE Signal Processing Letters, 5(8):212–214, August 1998.
[GCMT02] O. Grellier, P. Comon, B. Mourrain, and P. Trebuchet. Analytical blind channel identification. IEEE Trans. Signal Processing, 50(9):2196–2207, September 2002.
[GDM97] D. Gesbert, P. Duhamel, and S. Mayrargue. On-line blind multichannel equalization based on mutually referenced filters. IEEE Trans. Sign. Proc., 45(9):2307–2317, September 1997.
[GL99] A. Gorokhov and P. Loubaton. Blind identification of MIMO-FIR systems: a generalized linear prediction approach. Signal Processing, Elsevier, 73:105–124, February 1999.
[GL06] R. Gribonval and S. Lesage. A survey of sparse component analysis for blind source separation: principles, perspectives and new challenges. In Proc. ESANN 2006, Bruges, Belgium, April 2006.
[GN95] M. I. Gürelli and C. L. Nikias. EVAM: An eigenvector-based algorithm for multichannel blind deconvolution of input colored signals. IEEE Trans. Sig. Proc., 43(1):134–149, January 1995.
[GN03] R. Gribonval and M. Nielsen. Sparse decompositions in unions of bases. IEEE Trans. Inform. Theory, 49(12):3320–3325, December 2003.
[Gru07] P. Gruendler. Chemical Sensors: An Introduction for Scientists and Engineers. Springer, 2007.
[HD03] S. Hosseini and Y. Deville. Blind separation of linear-quadratic mixtures of real sources using a recurrent structure. In Proc. of the 7th Int. Work-Conference on Artificial and Natural Neural Networks (IWANN 2003), Lecture Notes in Computer Science, vol. 2686, pages 241–248, Menorca, Spain, June 2003. Springer-Verlag.
[HJ03] S. Hosseini and C. Jutten. On the separability of nonlinear mixtures of temporally correlated sources. IEEE Signal Processing Letters, 10(2):43–46, February 2003.
[HJA85] J. Hérault, C. Jutten, and B. Ans. Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. In Actes du Xème colloque GRETSI, pages 1017–1022, Nice, France, May 20-24 1985.
[HP99] A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999.
[Hua96] Y. Hua. Fast maximum likelihood for blind identification of multiple FIR channels. IEEE Trans. Sig. Proc., 44(3):661–672, March 1996.
[Hyv99] A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Networks, 10(3):626–634, 1999.
[JBZH04] C. Jutten, M. Babaie-Zadeh, and S. Hosseini. Three easy ways for separating nonlinear mixtures? Signal Processing, 84(2):217–229, February 2004.
[JT00] C. Jutten and A. Taleb. Source separation: from dusk till dawn. In Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA 2000), pages 15–26, Helsinki, Finland, 2000.
[KASC08] A. Kachenoura, L. Albera, L. Senhadji, and P. Comon. ICA: a potential tool for BCI systems. Sig. Proc. Magazine, pages 57–68, January 2008. Special issue on Brain-Computer Interfaces.
[Kie92] H. A. L. Kiers. Tuckals core rotations and constrained Tuckals modelling. Statistica Applicata, 4(4):659–667, 1992.
[KLR73] A. M. Kagan, Y. V. Linnik, and C. R. Rao. Characterization Problems in Mathematical Statistics. Probability and Mathematical Statistics. Wiley, New York, 1973.
[Kol01] T. G. Kolda. Orthogonal tensor decomposition. SIAM Jour. Matrix Ana. Appl., 23(1):243–255, 2001.
[Kro83] P. Kroonenberg. Three-Mode Principal Component Analysis. DSWO Press, Leiden, 1983.
[KS77] M. Kendall and A. Stuart. The Advanced Theory of Statistics, Distribution Theory, volume 1. C. Griffin, 1977.
[LAC97] J. L. Lacoume, P. O. Amblard, and P. Comon. Statistiques d'ordre supérieur pour le traitement du signal. Collection Sciences de l'Ingénieur. Masson, 1997. ISBN: 2-225-83118-1.
[LC09] L.-H. Lim and P. Comon. Nonnegative approximations of nonnegative tensors. Jour. Chemometrics, 23:432–441, August 2009. hal-00410056.
[LC10] L.-H. Lim and P. Comon. Multiarray signal processing: Tensor decomposition meets compressed sensing. Comptes-Rendus Mécanique de l'Académie des Sciences, 338(6):311–320, June 2010. hal-00512271.
[LdAC11] X. Luciani, A. L. F. de Almeida, and P. Comon. Blind identification of underdetermined mixtures based on the characteristic function: The complex case. IEEE Trans. Signal Proc., 59(2):540–553, February 2011.
[Lee78] J. De Leeuw. A new computational method to fit the weighted Euclidean distance model. Psychometrika, 43(4):479–490, December 1978.
[LISC11] R. Llinares, J. Igual, A. Salazar, and A. Camacho. Semi-blind source extraction of atrial activity by combining statistical and spectral features. Digital Signal Processing, 21(2):391–403, 2011.
[LJH04] A. Larue, C. Jutten, and S. Hosseini. Markovian source separation in non-linear mixtures. In C. Puntonet and A. Prieto, editors, Proc. of the 5th Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA 2004), Lecture Notes in Computer Science, vol. 3195, pages 702–709, Granada, Spain, September 2004. Springer-Verlag.
[LLCL01] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee. Application of nonnegative matrix factorization to dynamic positron emission tomography. In Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA 2001), pages 629–632, San Diego (California, USA), 2001.
[LM00] P. Loubaton and E. Moulines. On blind multiuser forward link channel estimation by subspace method: Identifiability results. IEEE Trans. Sig. Proc., 48(8):2366–2376, August 2000.
[LMRB09] X. Luciani, S. Mounier, R. Redon, and A. Bois. A simple correction method of inner filter effects affecting FEEM and its application to the Parafac decomposition. Chemometrics and Intel. Lab. Syst., 96(2):227–238, 2009.
[LMV01] L. De Lathauwer, B. De Moor, and J. Vandewalle. Independent Component Analysis and (simultaneous) third-order tensor diagonalization. IEEE Trans. Sig. Proc., pages 2262–2271, October 2001.
[LMV04] L. De Lathauwer, B. De Moor, and J. Vandewalle. Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM Journal on Matrix Analysis and Applications, 26(2):295–327, 2004.
[LS99] D. D. Lee and H. S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791, 1999.
[LZ07] X. L. Li and X. D. Zhang. Nonorthogonal joint diagonalization free of degenerate solution. IEEE Trans. Signal Process., 55(5):1803–1814, 2007.
[MA99] G. Marques and L. Almeida. Separation of nonlinear mixtures using pattern repulsion. In Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 277–282, Aussois, France, 1999.
[MBBZJ08] F. Merrikh-Bayat, M. Babaie-Zadeh, and C. Jutten. A nonlinear blind source separation solution for removing the show-through effect in the scanned documents. In Proc. of the 16th European Signal Processing Conf. (EUSIPCO 2008), Lausanne, Switzerland, August 2008.
[MBBZJ11] F. Merrikh-Bayat, M. Babaie-Zadeh, and C. Jutten. Linear-quadratic blind source separating structure for removing show-through in scanned documents. International Journal of Document Analysis and Recognition, 15 pages, 2011. Published online Oct. 28, 2010; to appear.
[MBZJ09] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten. A fast approach for overcomplete sparse decomposition based on smoothed ℓ0 norm. IEEE Transactions on Signal Processing, 57(1):289–301, 2009.
[Mcc87] P. McCullagh. Tensor Methods in Statistics. Monographs on Statistics and Applied Probability. Chapman and Hall, 1987.
[MDCM95] E. Moulines, P. Duhamel, J. F. Cardoso, and S. Mayrargue. Subspace methods for the blind identification of multichannel FIR filters. IEEE Trans. Sig. Proc., 43(2):516–525, February 1995.
[MHS+08] S. Moussaoui, H. Hauksdóttir, F. Schmidt, C. Jutten, J. Chanussot, S. Douté, and J. A. Benediktsson. On the decomposition of Mars hyperspectral data by ICA and Bayesian positive source separation. Neurocomputing, 71(10-12):2194–2208, June 2008.
[MJL00] A. Mansour, C. Jutten, and Ph. Loubaton. An adaptive subspace algorithm for blind separation of independent sources in convolutive mixture. IEEE Trans. on Signal Processing, 48(2):583–587, February 2000.
[MLS07] S. Makino, T.-W. Lee, and H. Sawada, editors. Blind Speech Separation. Springer, 2007.
[MM97] O. Macchi and E. Moreau. Self-adaptive source separation. Part I: Convergence analysis of a direct linear network controlled by the Herault-Jutten algorithm. IEEE Trans. Sign. Proc., 45(4):918–926, April 1997.
[MOK95] K. Matsuoka, M. Ohya, and M. Kawamoto. A neural net for blind separation of nonstationary signals. Neural Networks, 8(3):411–419, 1995.
[MP97] E. Moreau and J. C. Pesquet. Generalized contrasts for multichannel blind deconvolution of linear systems. IEEE Signal Processing Letters, 4(6):182–183, June 1997.
[MS94] L. Molgedey and H. Schuster. Separation of a mixture of independent signals using time delayed correlations. Physical Review Letters, 72:3634–3636, 1994.
[MvL08] C. D. M. Martin and C. van Loan. A Jacobi-type method for computing orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl., 30(3):1219–1232, 2008.
[MZ93] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. on Signal Proc., 41(12):3397–3415, 1993.
[Paa97] P. Paatero. A weighted non-negative least squares algorithm for three-way Parafac factor analysis. Chemometrics Intell. Lab. Syst., 38:223–242, 1997.
[Paa99] P. Paatero. The multilinear engine: A table-driven, least squares program for solving multilinear problems, including the n-way parallel factor analysis model. Journal of Computational and Graphical Statistics, 8(4):854–888, December 1999.
[Paa00] P. Paatero. Construction and analysis of degenerate Parafac models. Jour. Chemometrics, 14:285–299, 2000.
[PAB+02] M. Plumbley, S. A. Abdallah, J. P. Bello, M. E. Davies, G. Monti, and M. B. Sandler. Automatic music transcription and audio source separation. Cybern. Syst., 33:603–627, 2002.
[PC01] D. T. Pham and J.-F. Cardoso. Blind separation of instantaneous mixtures of nonstationary sources. IEEE Trans. on Signal Processing, 49(9):1837–1848, 2001.
[Pha01] D. T. Pham. Joint approximate diagonalization of positive definite Hermitian matrices. SIAM Journal on Matrix Analysis, 22(4):1136–1152, 2001.
[Plu02] M. Plumbley. Conditions for nonnegative independent component analysis. IEEE Signal Processing Letters, 9:177–180, 2002.
[Plu03] M. Plumbley. Algorithms for nonnegative independent component analysis. IEEE Trans. on Neural Networks, 14(3):534–543, 2003.
[Plu05] M. Plumbley. Geometrical methods for non-negative ICA. Neurocomputing, 67:161–197, 2005.
[PO04] M. Plumbley and E. Oja. Blind separation of positive sources by globally convergent gradient search. Neural Computation, 19(9):1811–1825, 2004.
[PPPP01] B. Pesquet-Popescu, J.-C. Pesquet, and A. P. Petropulu. Joint singular value decomposition - a new tool for separable representation of images. In ICISP, 2001.
[PS00] L. Parra and C. Spence. Convolutive blind separation of nonstationary sources. IEEE Trans. on Speech and Audio Processing, 8(3):320–327, 2000.
[PSB03] D. T. Pham, C. Servière, and H. Boumaraf. Blind separation of convolutive audio mixtures using nonstationarity. In Independent Component Analysis and Blind Signal Separation (ICA 2003), Nara, Japan, April 2003.
[PT94] P. Paatero and U. Tapper. Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.
[PZL10] R. Phlypo, V. Zarzoso, and I. Lemahieu. Source extraction by maximising the variance in the conditional distribution tails. IEEE Trans. on Signal Processing, 58(1):305–316, 2010.
[RCH08] M. Rajih, P. Comon, and R. Harshman. Enhanced line search: A novel method to accelerate Parafac. SIAM Journal on Matrix Analysis Appl., 30(3):1148–1171, September 2008.
[Reg02] P. A. Regalia. Blind deconvolution and source separation. In J. G. McWhirter and I. K. Proudler, editors, Mathematics in Signal Processing V, pages 25–35. Clarendon Press, Oxford, UK, 2002.
[RK03] P. A. Regalia and E. Kofidis. Monotonic convergence of fixed-point algorithms for ICA. IEEE Transactions on Neural Networks, 14(4):943–949, July 2003.
[RTMC11] J.-P. Royer, N. Thirion-Moreau, and P. Comon. Computing the polyadic decomposition of nonnegative third order tensors. Signal Processing, 2011.
[RZC05] L. Rota, V. Zarzoso, and P. Comon. Parallel deflation with alphabet-based criteria for blind source extraction. In IEEE SSP'05, pages 751–756, Bordeaux, France, July 17-20 2005.
[SASZJ11] S. Samadi, L. Amini, H. Soltanian-Zadeh, and C. Jutten. Identification of brain regions involved in epilepsy using common spatial pattern. In A. Ferrari, P. Djuric, and C. Richard, editors, IEEE Workshop on Statistical Signal Processing (SSP 2011), pages 829–832, Nice, France, 2011. IEEE.
[SB00] N. D. Sidiropoulos and R. Bro. On the uniqueness of multilinear decomposition of N-way arrays. Jour. Chemo., 14:229–239, 2000.
[SBG00] N. D. Sidiropoulos, R. Bro, and G. B. Giannakis. Parallel factor analysis in sensor array processing. IEEE Trans. Sig. Proc., 48(8):2377–2388, August 2000.
[SBG04] A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis. Wiley, 2004.
[SC10] A. Stegeman and P. Comon. Subtracting a best rank-1 approximation does not necessarily decrease tensor rank. Linear Algebra Appl., 433(7):1276–1300, December 2010. hal-00512275.
[Sch86] R. O. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Trans. Antennas Propagation, 34(3):276–280, March 1986.
[SGB00] N. D. Sidiropoulos, G. B. Giannakis, and R. Bro. Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans. on Sig. Proc., 48(3):810–823, March 2000.
[SGS94] A. Swami, G. Giannakis, and S. Shamsunder. Multichannel ARMA processes. IEEE Trans. Sign. Proc., 42(4):898–913, April 1994.
[SICD08] M. Sorensen, S. Icart, P. Comon, and L. Deneire. Gradient based approximate joint diagonalization by orthogonal transforms. In 16th European Signal Processing Conference (Eusipco'08), Lausanne, August 25-29 2008.
[SJS10] R. Sameni, C. Jutten, and M. Shamsollahi. A deflation procedure for subspace decomposition. IEEE Transactions on Signal Processing, 58(4):2363–2374, 2010.
[Slo94] D. T. M. Slock. Blind fractionally-spaced equalization, perfect-reconstruction filter banks and multichannel linear prediction. In Proc. ICASSP 94 Conf., Adelaide, Australia, April 1994.
[Sor91] E. Sorouchyari. Separation of sources, part III: Stability analysis. Signal Processing, Elsevier, 24(1):21–29, July 1991.
[Sør10] M. Sørensen. Tensor Tools with Application in Signal Processing. PhD thesis, Université de Nice Sophia Antipolis, 2010.
[SP03] C. Servière and D. T. Pham. A novel method for permutation correction in frequency-domain in blind separation of speech mixtures. In Proc. of Independent Component Analysis and Blind Signal Separation (ICA 2004), pages 807–815, Granada, Spain, 2003.
[Ste06] A. Stegeman. Degeneracy in Candecomp/Parafac explained for p × p × 2 arrays of rank p + 1 or higher. Psychometrika, 71(3):483–501, 2006.
[Ste08] A. Stegeman. Low-rank approximation of generic p × q × 2 arrays and diverging components in the Candecomp/Parafac model. SIAM Journal on Matrix Analysis Appl., 30(3):988–1007, 2008.
[TF07] T. Tanaka and S. Fiori. Least squares approximate joint diagonalization on the orthogonal group. In ICASSP, volume II, pages 649–652, Honolulu, April 2007.
[TJ95] H. L. Nguyen Thi and C. Jutten. Blind source separation for convolutive mixtures. Signal Processing, 45:209–229, 1995.
[TJ99a] A. Taleb and C. Jutten. On underdetermined source separation. In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1999), volume 3, pages 1445–1448, Washington, DC, USA, 1999. IEEE Computer Society.
[TJ99b] A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Trans. Sign. Proc., 47:2807–2820, October 1999.
[TSJ01] A. Taleb, J. Sole, and C. Jutten. Quasi-nonparametric blind inversion of Wiener systems. IEEE Trans. on Signal Processing, 49(5):917–924, 2001.
[TSLH90] L. Tong, V. Soon, R. Liu, and Y. Huang. AMUSE: a new blind identification algorithm. In Proc. ISCAS, New Orleans, USA, 1990.
[TSS+09] T. Tsalaile, R. Sameni, S. Sanei, C. Jutten, and J. Chambers. Sequential blind source extraction for quasi-periodic signals with time-varying period. IEEE Transactions on Biomedical Engineering, 56(3):646–655, 2009.
[Tug97] J. K. Tugnait. Identification and deconvolution of multichannel non-Gaussian processes using higher order statistics and inverse filter criteria. IEEE Trans. Sig. Proc., 45:658–672, March 1997.
[TVP96] S. Talwar, M. Viberg, and A. Paulraj. Blind estimation of multiple co-channel digital signals arriving at an antenna array: Part I, algorithms. IEEE Trans. Sign. Proc., pages 1184–1197, May 1996.
[TWZ01] Y. Tan, J. Wang, and J. Zurada. Nonlinear blind source separation using a radial basis function network. IEEE Trans. on Neural Networks, 12(1):124–134, 2001.
[TXK94] L. Tong, G. Xu, and T. Kailath. Blind identification and equalization based on second-order statistics: a time domain approach. IEEE Trans. Inf. Theory, 40(2):340–349, March 1994.
[UBU+00] Y. Umezawa, P. Bühlmann, K. Umezawa, K. Tohda, and S. Amemiya. Potentiometric selectivity coefficients of ion-selective electrodes. Pure and Applied Chemistry, 72:1851–2082, 2000.
[VO06] R. Vollgraf and K. Obermayer. Quadratic optimization for simultaneous matrix diagonalization. IEEE Trans. Sig. Proc., 54:3270–3278, September 2006.
[XLTK95] G. Xu, H. Liu, L. Tong, and T. Kailath. A least-squares approach to blind channel identification. IEEE Trans. Sig. Proc., 43(12):813–817, December 1995.
[Yer02] A. Yeredor. Non-orthogonal joint diagonalization in the least-squares sense with application in blind source separation. IEEE Trans. Sign. Proc., 50(7):1545–1553, 2002.
[YW94] D. Yellin and E. Weinstein. Criteria for multichannel signal separation. IEEE Trans. Signal Processing, pages 2158–2168, August 1994.
[ZC05] V. Zarzoso and P. Comon. Blind and semi-blind equalization based on the constant power criterion. IEEE Trans. Sig. Proc., 53(11):4363–4375, November 2005.
[ZC07] V. Zarzoso and P. Comon. Comparative speed analysis of FastICA. In M. E. Davies, C. J. James, S. A. Abdallah, and M. D. Plumbley, editors, ICA'07, volume 4666 of Lecture Notes in Computer Science, pages 293–300, London, UK, September 9-12 2007. Springer.
[ZC08a] V. Zarzoso and P. Comon. Optimal step-size constant modulus algorithm. IEEE Trans. Com., 56(1):10–13, January 2008.
[ZC08b] V. Zarzoso and P. Comon. Robust independent component analysis for blind source separation and extraction with application in electrocardiography. In 30th Int. IEEE Engineering in Medicine and Biology Conf. (EMBC'08), Vancouver, CA, August 20-24 2008.
[ZC10] V. Zarzoso and P. Comon. Robust independent component analysis. IEEE Trans. Neural Networks, 21(2):248–261, February 2010. hal-00457300.
[ZCK06] V. Zarzoso, P. Comon, and M. Kallel. How fast is FastICA? In XIV European Signal Processing Conference, Eusipco'06, Florence, Italy, September 4-8 2006.
[Ziv95] V. Zivojnovic. Higher-order statistics and Huber's robustness. In IEEE-ATHOS Workshop on Higher-Order Statistics, pages 236–240, Begur, Spain, June 12-14 1995.
[ZNM04] A. Ziehe, G. Nolte, and K. R. Müller. A fast algorithm for joint diagonalization with non-orthogonal transformations and its application to blind source separation. Journal of Machine Learning Research, 5:777–800, December 2004.