
Séparation de sources. Principes et algorithmes
(Source Separation: Principles and Algorithms)

Pierre Comon (1), Christian Jutten (2)

(1): I3S, CNRS, Univ. of Nice, Sophia-Antipolis
(2): GIPSA-lab, CNRS, Univ. de Grenoble

© 2011


Contents of the course

I    Introduction ......................................... 4
II   Tools and Optimization Criteria .................... 40
III  Second Order Methods .............................. 122
IV   Algorithms with Prewhitening ...................... 152
V    Algorithms without Prewhitening ................... 199
VI   Convolutive Mixtures .............................. 230
VII  Nonlinear Mixtures ................................ 291
VIII Underdetermined Mixtures .......................... 329
IX   Nonnegative Mixtures .............................. 441
X    Applications ...................................... 456
B    Bibliography ...................................... 562


I. The problem and the principles of resolution


1 The source separation problem
2 Principles of ICA
3 Measuring dependence


Source separation: a common problem?

An example
The signal delivered by a sensor (electrode, antenna, microphone, etc.) is a mixture of signals.

The question
Is it possible to recover the individual signals (the sources) from the mixture observed at the sensor output?
If so, under what conditions? And how?


Source separation with priors (1/3)

The idea of Widrow and Hoff (1960)
A reference of the noise signal is available:
the reference is the noise up to a gain, s2(t), or a filtered version of the noise, G(s2);
a filter is estimated that cancels the contribution of s2 in the observation, i.e., such that the difference is no longer correlated with the noise reference.


d’où :<br />

Problème ACI Indépendance<br />

<strong>Séparation</strong> <strong>de</strong> <strong>sources</strong> avec <strong>de</strong>s a priori (2/3)<br />

E[(as1(t) + (b − k)s2(t))s2(t)] = 0 (1)<br />

E[x1(t)s2(t)] = kE[s 2 2(t)], (2)<br />

k = E[x1(t)s2(t)]/E[s 2 2(t)] (3)<br />

P.Comon & C.Jutten, Peyresq July 2011 BSS I.Le problème <strong>et</strong> les principes <strong>de</strong> résolution 7
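A minimal numerical sketch of this reference-based canceller (the synthetic signals and gains below are illustrative assumptions, following the slide's notation x1 = a s1 + b s2): the gain k of Eq. (3) is estimated by sample averages.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
s1 = np.sin(0.02 * np.arange(n))          # useful signal (illustrative)
s2 = rng.standard_normal(n)               # noise, whose reference is available
a, b = 1.0, 0.7
x1 = a * s1 + b * s2                      # observation

k = np.mean(x1 * s2) / np.mean(s2**2)     # Eq. (3), by sample means
y = x1 - k * s2                           # output decorrelated from the reference

print(k)                  # close to b = 0.7
print(np.mean(y * s2))    # residual correlation with s2, close to 0
```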


Source separation with priors (3/3)

The classical approach
If the useful signal and the noise lie in different frequency bands,
separation can be performed by filtering.

Without priors
What can be done...
if the useful signal and the noise lie in the same frequency range?
if no noise reference is available?


Source separation: the fundamental idea

Several observations
Separation becomes possible when several observations are available:
more sensors than sources,
different, non-proportional mixtures: ad ≠ bc.
This is spatial diversity.


Origin of the problem

Motion decoding in vertebrates [HJA85]

fI(t) = a11 p(t) + a12 v(t)
fII(t) = a21 p(t) + a22 v(t)

Questions
Can p(t) and v(t) be recovered from the mixtures?
If so, under what conditions? And how?


Mathematical formalization

Hypotheses on the (invertible) mixture
Linear instantaneous mixture
Linear convolutive mixture
Nonlinear mixture

Principles of solution
A separation block B, adapted to the mixture A in order to invert it
How can B be estimated?


An ill-posed problem?

Impossible without additional hypotheses
Hypothesis: A is known, with B adapted to A
The problem is ill-posed, whatever the type of mixture
Hypotheses must be added on the sources s1(t), ..., sP(t)

Which priors?
Uncorrelated sources: insufficient
Statistically independent sources → ICA
Other priors: discrete sources, bounded-support sources, sparse sources, nonnegative sources, etc.


Darmois's result (1953) [Dar53]

Linear factor analysis
Linear mixture: x = As
Hypothesis: the components of the random vector s are mutually independent

Theoretical result
Separation is impossible if the sources are independent and identically distributed (iid) AND Gaussian

Leads for separation
Separation is possible if the sources are iid AND NON-Gaussian: ICA, and statistics of order higher than 2
Separation is possible if the sources are Gaussian (second-order statistics) AND NON-iid:
temporally correlated sources
nonstationary sources


Comments on the hypotheses

iid non-Gaussian sources
non-Gaussian: statistics of order higher than two can (must) be used
iid: the temporal relations between samples are not exploited...
consequently, ICA can separate sources whether iid or not

Gaussian non-iid sources
non-iid: the temporal relations between samples are exploited...
Gaussian: statistics up to order 2 only
consequently, these second-order approaches are optimal for Gaussian sources, but can also separate non-Gaussian sources


Independent component analysis (ICA)

Principle of the method
General mixture: x = A(s), with P = K and A invertible
Hypothesis: the components of the source s are mutually independent
Idea: estimate B such that y = B(x) has mutually independent components

Questions
If y is independent, is B = A⁻¹?
The criterion for estimating B is the independence of s. How can it be measured?


ICA: linear instantaneous mixtures

Mixture model
x = As

Theoretical result (Comon, HOS 1991 and SP 1994) [Com94a]
Let x(t) = A s(t), where A is a regular matrix and s(t) is a source vector with statistically independent components, at most one of which is Gaussian. Then y(t) = B x(t) is a vector with mutually independent components if and only if BA = DP, where D is a diagonal matrix and P is a permutation matrix.

Comments
Independence = separation, under weak hypotheses: A regular, independent non-Gaussian sources
Scale and permutation indeterminacies
What if several sources are Gaussian...?
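An illustrative sketch of this theorem (not the course's own algorithm; scikit-learn's FastICA is assumed available as one possible ICA routine): the global matrix BA of the estimated separator is close to DP, i.e., each of its rows has a single dominant entry.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(5000, 2))      # non-Gaussian iid sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                   # regular mixing matrix
X = S @ A.T                                  # observations x = A s

ica = FastICA(n_components=2, random_state=0)
ica.fit(X)

G = ica.components_ @ A                      # global matrix BA
print(np.round(G / np.abs(G).max(axis=1, keepdims=True), 2))
# one entry per row is +/-1 and the other ~0: BA = DP up to estimation error
```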


ICA: linear convolutive mixtures

Mixture model
x(t) = A(t) ∗ s(t)
or, in discrete time:
xi(n) = Σj Σk aij(k) sj(n − k),  ∀i = 1, ..., K

First theoretical results [YW94, TJ95]
Let x(t) = [A(z)] s(t), where A(z) is an invertible matrix of filters and s(t) is a source vector with statistically independent components, at most one of which is Gaussian. Then y(t) = [B(z)] x(t) is a vector with mutually independent components if and only if B(z)A(z) = D(z)P, where D(z) is a diagonal matrix of filters and P is a permutation matrix.

Comments
Sources are recovered up to a filter and a permutation


ICA: nonlinear (NL) mixtures

Mixture model
x(t) = A(s(t))

Impossible in general [Dar53, HP99]
In the general case, independence of y does not imply source separation: y(t) = B(x(t)) can have independent components with B ∘ A non-diagonal!
When ICA does allow separation, the solution is determined only up to a permutation and a nonlinear function: f(si), with f(·) an unknown NL function.

Possible for certain mixtures
Post-nonlinear mixtures [TJ99a, BZJN02, JBZH04, AJ05]
Linearizable NL mixtures [KLR73, EK02]
Bilinear and linear-quadratic mixtures (without identifiability proof: [HD03, DH09])


Blind, did you say blind?

Source separation is a problem that is...
not blind: the nature of the mixture is known
not blind: additional hypotheses on the sources
source independence: a very broad hypothesis (even if its implementation is complex), hence the (abusive) term "blind" associated with ICA

Other prior information
Other priors: other criteria and other methods
Sources with temporal dependences between samples: second-order methods
Discrete or bounded-support sources: algebraic or geometric methods
Sparsity: masking in the t-f plane, sparse component analysis (SCA)
Nonnegativity: nonnegative factorization


Determined and overdetermined mixtures

Determined mixtures
As many mixtures as sources, K = P
Identifying A or its inverse directly provides the sources.

Overdetermined mixtures
More mixtures than sources, K > P
In the linear case: a solution exists if the mixing matrix has full rank
Preprocessing by PCA to project onto the signal space before separation

In both cases, one can use
the direct method: identify A, then invert it,
the indirect method: identify B, an inverse of A.


Underdetermined mixtures

More sources than mixtures, P > K
Even if A is known (it has no inverse!), s cannot be simply deduced
Identifying A and restoring the sources are two distinct and difficult problems
Without additional priors, there is no solution
In the linear case, if A is known, the sources can be recovered only up to a random vector [TJ99a]
A solution is possible if the sources are discrete or sparse
The indirect method is therefore impossible


Indeterminacies of the solutions revisited

Properties of independence
If two sources si and sj are independent, then
sj and si are also independent,
ki si and kj sj, where ki and kj are arbitrary scalars, are also independent,
fi(si) and fj(sj), where fi and fj are arbitrary nonlinear functions, are also independent,
Hi(si) and Hj(sj), where Hi and Hj are arbitrary filters, are also independent.

Properties of the observations: linear instantaneous case
The observations xi(t) = Σj aij sj(t) cannot be distinguished from xi(t) = Σj (aij/αj) (αj sj(t)).


Consequences of the indeterminacies

Direct, on the mixing matrix
The power of the sources cannot be estimated
P parameters of A are not identifiable; only the column vectors of A, i.e., the "directions" of the sources, can be estimated
the constraint aii = 1 can be imposed on A

Indirect, on the separating matrix
The power of the estimated sources can be fixed arbitrarily: normalization or prewhitening
P parameters of B are not identifiable: one can fix bii = 1, or normalize the rows of B


Independence of random variables and vectors

Factorization of the probability laws or densities (pdf)

Two random variables
Two random variables S1 and S2 are independent iff
pS1S2(u1, u2) = pS1(u1) pS2(u2)

Random vectors
Let S be a random vector with P components S1, ..., SP
The components are mutually independent iff
pS1S2...SP(u1, u2, ..., uP) = pS1(u1) pS2(u2) ... pSP(uP)

Comments
A criterion that is delicate to implement
An equality between two multivariate functions


First characteristic function

Characteristic functions
It is the inverse Fourier transform of the pdf of S
Consequence: it carries the same information as the pdf

Φ(ν) = E[exp(jνᵀS)] = ∫···∫ pS(u) exp(jνᵀu) du

Second characteristic function (SCF)
It is the log of the first characteristic function
Consequence: it carries the same information as the pdf

Ψ(ν) = log Φ(ν)

Properties of the SCF
Ψ(ν) = 0 for ν = 0
It is continuous and differentiable at ν = 0


Expansion of the SCF and cumulants

Expansion of the SCF
Ψ(ν) = Σ_{p=1}^{∞} κp (jν)^p / p!

Cumulant of order p
κp = (−j)^p (d^p Ψ(ν) / dν^p) |_{ν=0}

Cumulants of a centered variable
κ1 = µ1 = 0    (4)
κ2 = µ2    (5)
κ3 = µ3    (6)
κ4 = µ4 − 3µ2²    (7)
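A small numerical sketch of Eqs. (4)-(7) (illustrative, using sample moments of synthetic data): κ3 and κ4 vanish for a Gaussian variable, while a uniform variable has κ4 < 0.

```python
import numpy as np

def cumulants_centered(x):
    """Cumulants of orders 1 to 4 of a sample, via Eqs. (4)-(7)."""
    x = x - x.mean()                 # center the data
    mu2 = np.mean(x**2)
    mu3 = np.mean(x**3)
    mu4 = np.mean(x**4)
    return 0.0, mu2, mu3, mu4 - 3 * mu2**2

rng = np.random.default_rng(2)
print(cumulants_centered(rng.standard_normal(100_000)))  # k3, k4 ~ 0
print(cumulants_centered(rng.uniform(-1, 1, 100_000)))   # k4 < 0 (sub-Gaussian)
```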


A small exercise

Computing the cumulants of orders 2, 3 and 4 of a centered variable

Definition
Ψ(ν) = Σ_{p=1}^{∞} κp (jν)^p / p!

Expansion at ν = 0
Ψ(ν) = ln Φ(ν) = ln Φ(0) + (d ln Φ(ν)/dν)|_{ν=0} ν + (d² ln Φ(ν)/dν²)|_{ν=0} ν²/2 + ...

Determination of the cumulants...
by simple identification of the two expansions


Independence, characteristic function and cumulants

Independence and characteristic function
pS1S2...SP(u1, u2, ..., uP) = pS1(u1) pS2(u2) ... pSP(uP)    (8)

Expansion of the SCF
Φ(ν) = Φ(ν1) Φ(ν2) ... Φ(νP)    (9)
ln Φ(ν) = ln Φ(ν1) + ... + ln Φ(νP)    (10)
Ψ(ν) = Ψ(ν1) + Ψ(ν2) + ... + Ψ(νP)    (11)

the left-hand term contains cross terms
the right-hand terms contain only derivatives with respect to a single variable

Problem
independence iff all cross-cumulants are zero
there are infinitely many cumulants
Can one stop at a given order? How should it be chosen?


Kullback-Leibler divergence

Definition
KL(f‖g) = ∫···∫ f(u) log( f(u) / g(u) ) du

KL(f‖g) measures the discrepancy between the two distributions f and g
it is a nonnegative scalar that vanishes iff f = g (the proof is easy using Jensen's inequality)

KL divergence for measuring independence
KL(pY ‖ Π pYi) = ∫···∫ pY(u) log( pY(u) / Πi pYi(ui) ) du

KL(pY ‖ Π pYi) is a nonnegative scalar that vanishes iff the components of the random vector Y are mutually independent
This measure, KL(pY ‖ Π pYi), can be identified with the mutual information (MI) I(Y) defined in information theory
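A minimal discrete-case sketch (an illustrative assumption: the distributions are given as finite histograms) showing that KL(f‖g) is a nonnegative scalar vanishing iff f = g:

```python
import numpy as np

def kl(f, g):
    """Kullback-Leibler divergence between two discrete distributions."""
    f = np.asarray(f, float)
    g = np.asarray(g, float)
    mask = f > 0                       # convention: 0 log 0 = 0
    return np.sum(f[mask] * np.log(f[mask] / g[mask]))

f = np.array([0.5, 0.3, 0.2])
g = np.array([0.4, 0.4, 0.2])
print(kl(f, g))    # > 0
print(kl(f, f))    # = 0 iff the two distributions coincide
```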


Decomposition of the MI using entropies

Definition
I(Y) = ∫···∫ pY(u) log( pY(u) / Πi pYi(ui) ) du    (12)
     = Σi H(Yi) − H(Y)    (13)

with the following definitions of the marginal and joint entropies
H(Yi) = −∫ pYi(ui) log pYi(ui) dui    (14)
H(Y) = −∫···∫ pY(u) log pY(u) du    (15)

Problem
Estimating the joint and marginal entropies requires estimating the joint and marginal densities, respectively.


A trick in the linear case

In the determined case, with an invertible mixing matrix A, one has:

The trick
I(Y) = Σi H(Yi) − H(Y)    (16)
     = Σi H(Yi) − H(X) − E[log |det B|]    (17)

Consequence
min_B ( Σi H(Yi) − H(Y) )  ⇔  min_B ( Σi H(Yi) − log |det B| )

The trick avoids the estimation of the joint entropy!


Minimization of the MI and score function

In the linear case, one estimates a separating matrix B that minimizes I(y).

Derivative of the MI with respect to B
dI(Y)/dB = Σi dH(Yi)/dB − d log |det B| / dB
         = E[ϕY(Y) Xᵀ] − B⁻ᵀ
where ϕYi(yi) = −d log pYi(yi)/dyi is the score function of yi, and ϕY(Y) collects the marginal scores.

Minimization of the MI and cancellation of HOS
Setting dI(Y)/dB = 0:
dI(Y)/dB = E[ϕY(Y) Xᵀ] − B⁻ᵀ = 0
or, multiplying on the right by Bᵀ:
dI(Y)/dB = 0 ⇔ E[ϕY(Y) Yᵀ] − I = 0


Exploiting priors (1/4)

Why?
Better-suited criteria
Simpler algorithms
Relaxed hypotheses
Selection of the separated sources

Colored and nonstationary sources
Exploits temporal dependences (non-iid sources)
See the chapter on second-order methods

Nonnegative sources
Exploits the nonnegativity of the sources and/or of the mixtures
See the chapter devoted to nonnegative mixtures


Exploiting priors (2/4)

Discrete sources
If D is the number of states,
the (noisy) mixtures form sets of points (clusters)
D^P points (clusters) in total
Linear mixture of binary sources: points on the vertices of a parallelepiped
s1 = 0 ⇒ x = a2 s2
s2 = 0 ⇒ x = a1 s1
where ai = column i of A


Exploiting priors (3/4)

Bounded-support sources
Sources whose pdf has bounded support, with nonzero values at the bounds
In the linear mixture case: the observations are inscribed in a parallelepiped
s1 = 0 ⇒ x = a2 s2
s2 = 0 ⇒ x = a1 s1
where ai = column i of A


Exploiting priors (4/4)

Sparse sources
Sources whose samples are almost always zero
Sparsity in a suitable representation space
Linear mixture case: x = As ⇒ T(x) = T(As) = A T(s)
See the courses by Starck and Daudet
s1 = 0 ⇒ x = a2 s2
s2 = 0 ⇒ x = a1 s1
where ai = column i of A


Separation of noisy mixtures

Case of linear mixtures
The observations are x(t) = A s(t) + n(t), where n is an additive noise, independent of s.
The presence of noise generally induces an error in the estimation of A or of its inverse.
If B is assumed to be perfectly estimated, i.e., B = A⁻¹, then:
y(t) = B x(t) = s(t) + B n(t)
Ideal separation obviously leads to the best SIR, but not necessarily to the best SNR.
For details, see Chapters 1 (1.4) and 4 (4.7) in [CJ10]


ICA: supervised or not?

ICA: a learning method
Estimation from empirical data
Learning in the "machine learning" sense
Supervised or not?

Independence criterion
The independence measure relies on the data
Unsupervised... "blind", by opposition to "supervised", which requires external information (a supervisor, desired values)


II. Tools and optimization criteria


Contents of course II

4 Algebraic tools
  Spatial whitening
  Tensors

5 Statistical tools
  Independence
  Cumulants
  Mutual Information

6 Optimization Criteria
  Cumulant matching
  Contrasts


Spatial whitening (1)

Standardization via Cholesky or QR
Let x be a zero-mean r.v. with covariance matrix
Γx ≝ E{x xᴴ}
Then the Cholesky factorization yields
∃ L such that L Lᴴ = Γx
Consequence: L⁻¹x has unit covariance.
The variable x̃ ≝ L⁻¹x is a standardized random variable.
The QR factorization of the data matrix, X = L X̃, yields the same L as the Cholesky factorization of the sample covariance, but is more accurate.
Limitation: L may not be invertible if the covariance Γx is not full rank.


Spatial whitening (2)

Standardization via PCA

Definition
PCA is based on second-order statistics
Observed random variable x of dimension K. Then ∃ (U, z):
x = U z, U unitary
where the Principal Components zi are uncorrelated
The ith column ui of U is called the ith PC Loading vector

Two possible calculations:
EVD of the covariance Rx: Rx = U Σ² Uᴴ
Sample estimate by SVD: X = U Σ Vᴴ


Spatial whitening (3)

Summary
Find a linear transform L such that the vector x̃ ≝ L x has unit covariance. Many possibilities, including:
PCA yields x̃ = Σ⁻¹ Uᴴ x
Cholesky Rx = L Lᴴ yields x̃ = L⁻¹ x

Remarks
Infinitely many possibilities: L is as good as L Q, for any unitary Q.
If Rx is not invertible, then L is not invertible (ill-posed). One may use the pseudo-inverse of Σ in PCA to compute L, or regularize Rx.
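A numpy sketch of both standardizations above (with illustrative synthetic data): the Cholesky route x̃ = L⁻¹x and the PCA route x̃ = Σ⁻¹Uᴴx both yield unit covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5000, 3)) @ np.array([[2.0, 0.5, 0.0],
                                               [0.0, 1.0, 0.3],
                                               [0.0, 0.0, 0.5]])
Rx = np.cov(X, rowvar=False)                # sample covariance

# Cholesky whitening: Rx = L L^H, x_tilde = L^{-1} x
L = np.linalg.cholesky(Rx)
Xw1 = np.linalg.solve(L, X.T).T

# PCA whitening: Rx = U S^2 U^H (EVD), x_tilde = S^{-1} U^H x
w, U = np.linalg.eigh(Rx)
Xw2 = X @ U / np.sqrt(w)

print(np.round(np.cov(Xw1, rowvar=False), 2))   # ~ identity
print(np.round(np.cov(Xw2, rowvar=False), 2))   # ~ identity
```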


Bibliography

Antenna Array Processing: [BK83] [Sch86]
Implementations: [CJ10], LAPACK, LINPACK
Adaptive versions: via QR [Com89a], or via SVD [CG90], or stochastic relative gradient [CL96]


Arrays and Multi-linearity

A tensor of order d is a multi-linear map:
S1* ⊗ ... ⊗ Sm* → Sm+1 ⊗ ... ⊗ Sd
Once bases of the spaces Sℓ are fixed, it can be represented by a d-way array of coordinates
➽ a bilinear form, or a linear operator: represented by a matrix
➽ a trilinear form, or a bilinear operator: by a 3rd-order tensor


Arrays and Tensors

Definitions: table T = {Tij..k}
Order of T ≝ number of its ways = number of its indices
Dimension Kℓ ≝ range of the ℓth index
T is cubic when all dimensions Kℓ = K are equal
T is symmetric when it is square and its entries do not change under any permutation of indices


Compact notation

Multi-linearity
Linear change in contravariant spaces:
T′ijk = Σ_{npq} Ain Bjp Ckq Tnpq
Denoted compactly
T′ = (A, B, C) · T    (18)

Example: covariance matrix
z = A x ⇒ Rz = (A, A) · Rx = A Rx Aᵀ
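An illustrative numpy sketch of the compact notation (18) (the helper name is made up), checked against the covariance example:

```python
import numpy as np

def multilinear(T, A, B, C):
    """T' = (A, B, C) . T, i.e., T'_ijk = sum_npq A_in B_jp C_kq T_npq (Eq. 18)."""
    return np.einsum('in,jp,kq,npq->ijk', A, B, C, T)

rng = np.random.default_rng(4)
T = rng.standard_normal((2, 3, 4))
A = rng.standard_normal((5, 2))
B = rng.standard_normal((5, 3))
C = rng.standard_normal((5, 4))
print(multilinear(T, A, B, C).shape)        # (5, 5, 5)

# Matrix case: (A, A) . Rx equals A Rx A^T
Rx = np.cov(rng.standard_normal((2, 500)))
print(np.allclose(np.einsum('in,jp,np->ij', A, A, Rx), A @ Rx @ A.T))  # True
```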


Decomposable tensor

A dth-order "decomposable" tensor is the tensor product of d vectors:
T = u ⊗ v ⊗ ... ⊗ w
and has coordinates Tij...k = ui vj ... wk.

It may be seen as a discretization of a multivariate function whose variables separate:
t(x, y, ..., z) = u(x) v(y) ... w(z)

These are nothing else but rank-1 tensors, with forthcoming definition


Example

Take v = (1, −1)ᵀ. Then
v^⊗3 ≝ v ⊗ v ⊗ v, with frontal slices [[1, −1], [−1, 1]] and [[−1, 1], [1, −1]]

This is a "decomposable" symmetric tensor → rank 1
(Shown as a 2 × 2 × 2 cube diagram in the original slides; blue bullets = 1, red bullets = −1.)


From SVD to tensor decompositions

The matrix SVD, M = (U, V) · Σ, may be extended in at least two ways to tensors:

Keep orthogonality: orthogonal Tucker, HOSVD
T = (U, V, W) · C
C is R1 × R2 × R3: multilinear rank = (R1, R2, R3)

Keep diagonality: Canonical Polyadic decomposition (CP)
T = (A, B, C) · L    (19)
L is R × R × R diagonal, λi ≠ 0: rank = R.


Canonical Polyadic (CP) decomposition

Any I × J × ··· × K tensor T can be decomposed as

T = Σ_{q=1}^{R(T)} λq u(q) ⊗ v(q) ⊗ ... ⊗ w(q)    (20)

➽ the "Polyadic form" [Hitchcock'27]

The tensor rank of T is the minimal number R(T) of "decomposable" terms such that equality holds.
One may impose unit-norm vectors u(q), v(q), ..., w(q).
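An illustrative numpy sketch of the polyadic form (20) for a third-order tensor (names and sizes are made up): the columns of U, V, W hold the vectors u(q), v(q), w(q).

```python
import numpy as np

def cp_tensor(lam, U, V, W):
    """T = sum_q lam_q u(q) (x) v(q) (x) w(q)   (Eq. (20), order-3 case)."""
    return np.einsum('q,iq,jq,kq->ijk', lam, U, V, W)

rng = np.random.default_rng(5)
R = 3                                    # number of decomposable terms
U = rng.standard_normal((4, R))
V = rng.standard_normal((5, R))
W = rng.standard_normal((6, R))
T = cp_tensor(np.ones(R), U, V, W)
print(T.shape)                           # (4, 5, 6); tensor rank at most R
```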


Frank Lauren Hitchcock (1875-1957) [Courtesy of L-H. Lim]

Claude Elwood Shannon (1916-2001)


Towards a unique terminology

Minimal Polyadic Form [Hitchcock'27]
Canonical decomposition [Weinstein'84, Carroll'70, Chiantini-Ciliberto'06, Comon'00, Khoromskij, Tyrtyshnikov]
Parafac [Harshman'70, Sidiropoulos'00]
Optimal computation [Strassen'83]
Minimum-length additive decomposition (AD) [Iarrobino'96]

Suggestion:
Canonical Polyadic decomposition (CP) [Comon'08, Grasedyck, Espig...]
CP also already stands for Candecomp/Parafac [Bro'97, Kiers'98, tenBerge'04...]


Richard A. Harshman (1970), Psychometrics

J. Douglas Carroll (1970)


Uniqueness: Kruskal 1/2

The Kruskal rank of a matrix A is the maximum number kA such that every subset of kA columns is linearly independent.
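A brute-force numpy sketch of this definition (illustrative; exponential in the number of columns, so only for small matrices):

```python
import numpy as np
from itertools import combinations

def kruskal_rank(A, tol=1e-10):
    """Largest k such that every subset of k columns is linearly independent."""
    n = A.shape[1]
    for k in range(n, 0, -1):
        if all(np.linalg.matrix_rank(A[:, list(cols)], tol=tol) == k
               for cols in combinations(range(n), k)):
            return k
    return 0

A = np.array([[1., 0., 1.],
              [0., 1., 1.]])
print(kruskal_rank(A))   # 2: any two columns are independent, all three are not
```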


Uniqueness: Kruskal 2/2

Sufficient condition for uniqueness of the CP [Kruskal'77, Sidiropoulos-Bro'00, Landsberg'09]:
Essential uniqueness is ensured if the tensor rank R is below the so-called Kruskal bound:

2R + 2 ≤ kA + kB + kC    (21)

or generically, for a tensor of order d and dimensions Nℓ:

2R ≤ Σ_{ℓ=1}^{d} min(Nℓ, R) − d + 1

➽ This bound is much smaller than the expected (generic) rank:
∃ a much better bound, in an almost-sure sense


Rank-3 example 1/2

(Pictorial slide: the 2 × 2 × 2 tensor of the next slide, drawn as cube diagrams, is written as a sum of two rank-1 cubes plus twice a third one; blue bullets = 1, red bullets = −1.)


Rank-3 example 2/2

Conclusion: the 2 × 2 × 2 tensor T with frontal slices
T(:, :, 1) = 2 [[0, 1], [1, 0]],   T(:, :, 2) = 2 [[1, 0], [0, 0]]
admits the CP
T = (1, 1)ᵀ^⊗3 + (−1, 1)ᵀ^⊗3 + 2 (0, −1)ᵀ^⊗3
and has rank 3, hence larger than its dimension
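A quick numpy verification of this rank-3 identity (an illustrative check, not part of the course):

```python
import numpy as np

def outer3(v):
    """Third tensor power v^(x)3 of a vector v."""
    return np.einsum('i,j,k->ijk', v, v, v)

u, v, w = np.array([1., 1.]), np.array([-1., 1.]), np.array([0., -1.])
T = outer3(u) + outer3(v) + 2 * outer3(w)

# Frontal slices: 2[[0,1],[1,0]] and 2[[1,0],[0,0]]
print(T[:, :, 0])
print(T[:, :, 1])
```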


Why the need for approximation?

Additive noise in measurements
Noise has a continuous probability distribution
Then the tensor rank is generic
Hence there are often infinitely many CP decompositions

➽ Approximations aim at getting rid of the noise, and at restoring uniqueness:

Arg inf_{a(p), b(p), c(p)} ‖ T − Σ_{p=1}^{r} a(p) ⊗ b(p) ⊗ ... ⊗ c(p) ‖²    (22)

But the infimum may be reached by tensors of rank > r!


Border rank

T has border rank R iff it is the limit of tensors of rank R, and not the limit of tensors of lower rank.
[Bini'79, Schönhage'81, Strassen'83, Lickteig'85]

(Sketch in the original slides: a tensor of rank > R lying in the closure of the set of tensors of rank ≤ R.)


Example

Let u and v be non-collinear vectors. Define T0 [CGLM08]:

T0 = u ⊗ u ⊗ u ⊗ v + u ⊗ u ⊗ v ⊗ u + u ⊗ v ⊗ u ⊗ u + v ⊗ u ⊗ u ⊗ u

and define the sequence Tε = (1/ε) ((u + ε v)^⊗4 − u^⊗4).

Then Tε → T0 as ε → 0
➽ Hence rank{T0} = 4, but the border rank is 2
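A numerical illustration (numpy, with an arbitrary choice of u and v) that the rank-2 tensors Tε approach T0, whose rank is 4:

```python
import numpy as np

def outer4(a, b, c, d):
    """Fourth-order tensor product a (x) b (x) c (x) d."""
    return np.einsum('i,j,k,l->ijkl', a, b, c, d)

u, v = np.array([1., 0.]), np.array([0., 1.])       # non-collinear vectors
T0 = (outer4(u, u, u, v) + outer4(u, u, v, u)
      + outer4(u, v, u, u) + outer4(v, u, u, u))

for eps in (1e-1, 1e-2, 1e-3):
    w = u + eps * v
    Teps = (outer4(w, w, w, w) - outer4(u, u, u, u)) / eps   # rank-2 combination
    print(eps, np.linalg.norm(Teps - T0))                    # error ~ O(eps)
```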


Ill-posedness

Tensors whose border rank is smaller than their rank are such that the approximating sequence contains several decomposable tensors which
tend to infinity, and
cancel each other, viz. some columns become close to collinear

Ideas towards a well-posed problem:
prevent collinearity, or bound the columns.


Remedies

1 Impose orthogonality of the columns within factor matrices [Com92a]
2 Impose orthogonality between decomposable tensors [Kol01]
3 Prevent the tendency to infinity by a norm constraint on factor matrices [Paa00]
4 Nonnegative tensors: impose that the decomposable tensors be nonnegative [LC09] → "nonnegative rank"
5 Impose a minimal angle between the columns of factor matrices [CL11]


Tensors as linear operators

Warning: considering a tensor as a linear operator is restrictive (e.g. the rank is then bounded by the smallest dimension)
Representation by a matrix: the flattening, or unfolding, matrix
This resorts to special matrix products: the Kronecker ⊗ and Khatri-Rao ⊙ matrix products.
NB: do not confuse the tensor product ⊗ and the Kronecker product ⊗...


Matrix products (1/2)

The Kronecker product between two matrices A and B, of dimensions m × n and p × q respectively, is the mp × nq matrix defined as:

A ⊗ B ≝ [ A11 B   A12 B   ···
          A21 B   A22 B   ···
            ⋮       ⋮        ]

The Khatri-Rao product between two matrices with the same number of columns:

A ⊙ B ≝ [ a1 ⊗ b1   a2 ⊗ b2   ··· ]

☞ This is a column-wise Kronecker product.
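A small numpy sketch of both products (np.kron is numpy's built-in Kronecker product; the Khatri-Rao helper below is an illustrative implementation):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: A (.) B = [a1 (x) b1, a2 (x) b2, ...]."""
    assert A.shape[1] == B.shape[1], "same number of columns required"
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

A = np.arange(6.).reshape(2, 3)        # m x n = 2 x 3
B = np.ones((2, 3))                    # p x q = 2 x 3
print(np.kron(A, B).shape)             # (mp, nq) = (4, 9)
print(khatri_rao(A, B).shape)          # (mp, n)  = (4, 3)
```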


Matrix products (2/2)

Example: let
a ∈ S, a space with basis {e(i)} of dimension I,
b ∈ S′, a space with basis {e′(j)} of dimension J,
c ∈ S″, a space with basis {e″(k)} of dimension K.
Then the tensor a ⊗ b ⊗ c has coordinates given by the (Kronecker) vector a ⊗ b ⊗ c in the basis {e(i) ⊗ e′(j) ⊗ e″(k)}


Decorrelation vs Independence

Example 1: mixture of 2 iid sources
Consider the mixture of two independent sources
(x1, x2)ᵀ = [[1, 1], [1, −1]] · (s1, s2)ᵀ
where E{si} = 0 and E{si²} = 1. Then the xi are uncorrelated:
E{x1 x2} = E{s1²} − E{s2²} = 0
But the xi are not independent since, for instance:
E{x1² x2²} − E{x1²} E{x2²} = E{s1⁴} + E{s2⁴} − 6 ≠ 0 in general
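A quick numpy check (illustrative) with iid uniform sources, which are zero-mean with unit variance on [−√3, √3]: the mixture is uncorrelated but fails the fourth-order independence test above.

```python
import numpy as np

rng = np.random.default_rng(6)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 100_000))  # zero mean, unit variance
x1, x2 = s[0] + s[1], s[0] - s[1]

print(np.mean(x1 * x2))                                      # ~ 0: uncorrelated
print(np.mean(x1**2 * x2**2) - np.mean(x1**2) * np.mean(x2**2))
# ~ E{s1^4} + E{s2^4} - 6 = 2*(9/5) - 6 = -2.4: not independent
```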


PCA vs ICA

Example 2: 2 sources and 2 sensors (scatter plots in the original slides)


Statistical independence

Definition
Components s_k of a K-dimensional r.v. s are mutually independent
⇕
the joint pdf equals the product of the marginal pdf's:
$$p_s(u) = \prod_k p_{s_k}(u_k) \qquad (23)$$

Definition
Components s_k of s are pairwise independent ⇔ any pair of components (s_k, s_ℓ) is mutually independent.


Mutual and Pairwise independence

Darmois's Theorem (1953)
Let two random variables be defined as linear combinations of independent random variables x_i:
$$X_1 = \sum_{i=1}^{N} a_i x_i, \qquad X_2 = \sum_{i=1}^{N} b_i x_i$$

Then, if X_1 and X_2 are independent, those x_j for which a_j b_j ≠ 0 are Gaussian.


Mutual and Pairwise independence (cont'd)

Corollary [Com92a]
If z = C s, where the s_i are independent r.v., with at most one of them being Gaussian, then the following properties are equivalent:
1. Components z_i are pairwise independent
2. Components z_i are mutually independent
3. C = Λ P, with Λ diagonal and P a permutation


Characteristic functions

First c.f.
Real scalar: $\Phi_x(t) \stackrel{\mathrm{def}}{=} E\{e^{\jmath t x}\} = \int_u e^{\jmath t u}\, dF_x(u)$
Real multivariate: $\Phi_x(t) \stackrel{\mathrm{def}}{=} E\{e^{\jmath t^T x}\} = \int_u e^{\jmath t^T u}\, dF_x(u)$

Second c.f.
$\Psi(t) \stackrel{\mathrm{def}}{=} \log \Phi(t)$

Properties:
- Always exists in a neighborhood of 0
- Uniquely defined as long as Φ(t) ≠ 0


Definition of Multivariate cumulants

Moments:
$$\mu'_{ij..k} \stackrel{\mathrm{def}}{=} E\{x_i x_j \cdots x_k\} = (-\jmath)^r \left.\frac{\partial^r \Phi(t)}{\partial t_i\, \partial t_j \cdots \partial t_k}\right|_{t=0} \qquad (24)$$

Cumulants:
$$C_{ij..k} \stackrel{\mathrm{def}}{=} \mathrm{Cum}\{x_i, x_j, \ldots, x_k\} = (-\jmath)^r \left.\frac{\partial^r \Psi(t)}{\partial t_i\, \partial t_j \cdots \partial t_k}\right|_{t=0} \qquad (25)$$

where r is the number of indices (the derivative is taken r times).

NB: the Leonov-Shiryayev relationship between Moments and Cumulants is obtained by expanding both sides of log Φ_x(t) = Ψ_x(t) in Taylor series.


Definition of Multivariate cumulants (cont'd)

First cumulants:
μ′_i = C_i
μ′_ij = C_ij + C_i C_j
μ′_ijk = C_ijk + [3] C_i C_jk + C_i C_j C_k

with [n]: McCullagh's bracket notation.

Next, for zero-mean variables:
μ_ijkℓ = C_ijkℓ + [3] C_ij C_kℓ
μ_ijkℓm = C_ijkℓm + [10] C_ij C_kℓm
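For instance, the relation μ_iiii = C_iiii + 3 C_ii² yields the familiar estimator of the marginal fourth-order cumulant (kurtosis) of a zero-mean variable; a minimal sketch (function name mine):

```python
import numpy as np

def c4(x):
    """Marginal 4th-order cumulant C_iiii = mu_iiii - 3*mu_ii^2 (zero-mean x)."""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

rng = np.random.default_rng(0)
print(c4(rng.standard_normal(100_000)))  # ~0 for a Gaussian variable
print(c4(rng.uniform(-1, 1, 100_000)))   # <0 for a uniform variable
```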


Definition of Complex Cumulants

Definition
Let z = x + ȷy. Then the pdf p_z = the joint pdf p_{x,y}

Notation
Characteristic function:
$$\Phi_z(w) = E\{\exp[\jmath(x^T u + y^T v)]\} = E\{\exp[\jmath\,\Re(z^H w)]\}$$
where w def= u + ȷv.

Generates Moments & Cumulants, e.g.:
Variance: $\mathrm{Var}\{z\}_{ij} = C_{z\,i}^{\;j}$
Higher orders: $\mathrm{Cum}\{z_i, \ldots, z_j, z_k^*, \ldots, z_\ell^*\} = C_{z\,i\ldots j}^{\;k\ldots \ell}$
where conjugated r.v. are labeled in superscript.


Properties of Cumulants

Multi-linearity (also enjoyed by moments):
Cum{αX, Y, .., Z} = α Cum{X, Y, .., Z}   (26)
Cum{X1 + X2, Y, .., Z} = Cum{X1, Y, .., Z} + Cum{X2, Y, .., Z}

Cancellation: if {X_i} can be partitioned into 2 groups of independent r.v., then
Cum{X1, X2, .., Xr} = 0   (27)

Additivity: if X and Y are independent, then
Cum{X1 + Y1, X2 + Y2, .., Xr + Yr} = Cum{X1, X2, .., Xr} + Cum{Y1, Y2, .., Yr}


Mutual Information: definition

According to the definition of page 95, one should measure a divergence:
$$\delta\Big(p_x,\ \prod_{i=1}^{N} p_{x_i}\Big)$$

If the Kullback divergence is used:
$$K(p_x, p_y) \stackrel{\mathrm{def}}{=} \int p_x(u)\, \log \frac{p_x(u)}{p_y(u)}\, du,$$

then we get the Mutual Information as an independence measure:
$$I(p_x) = \int p_x(u)\, \log \frac{p_x(u)}{\prod_{i=1}^{N} p_{x_i}(u_i)}\, du. \qquad (28)$$


Mutual Information: properties

- MI is always positive
- Cancels iff the r.v. are mutually independent
- MI is invariant under scale change

Example 3: Gaussian case
$$I(g_x) = \frac{1}{2} \log \frac{\prod_i V_{ii}}{\det V}$$


Mutual Information: decomposition

Define the Negentropy as the divergence:
$$J(p_x) = K(p_x, g_x) = \int p_x(u)\, \log \frac{p_x(u)}{g_x(u)}\, du. \qquad (29)$$

Negentropy is invariant under invertible transforms.

Then MI can be decomposed into:
$$I(p_x) = I(g_x) + J(p_x) - \sum_i J(p_{x_i}). \qquad (30)$$

Edgeworth expansion of a pdf (2)

Francis Edgeworth (1845-1926).

$$\frac{p(v)}{g(v)} = 1 + \frac{1}{3!}\,\kappa_3 h_3(v) + \frac{1}{4!}\,\kappa_4 h_4(v) + \frac{10}{6!}\,\kappa_3^2 h_6(v) + \frac{1}{5!}\,\kappa_5 h_5(v) + \frac{35}{7!}\,\kappa_3 \kappa_4 h_7(v) + \frac{280}{9!}\,\kappa_3^3 h_9(v) + \cdots$$


Edgeworth expansion of the MI

For standardized random variables x, the approximation [Com92a]:
$$I(p_x) = J(p_x) - \frac{1}{48} \sum_i \left( 4\,C_{iii}^2 + C_{iiii}^2 + 7\,C_{iii}^4 - 6\,C_{iii}^2\, C_{iiii} \right) + o(m^{-2}). \qquad (31)$$

If the 3rd order ≠ 0, then I(p_x) ≈ J(p_x) − (1/12) Σ_i C_iii²
If the 3rd order ≈ 0, then I(p_x) ≈ J(p_x) − (1/48) Σ_i C_iiii²


Optimization Criteria

- Cumulant matching
- Contrast criteria
- Mutual Information
- Likelihood
- CoM family (after prewhitening)
- STO, JAD (after prewhitening)
- Entropy (after NL transform)


Identification by Cumulant matching

Principle
- Estimate the mixture by solving the I/O multi-linear equations
- Apply a separating filter based on the latter estimate

[diagram: s → A → x]


Noiseless mixture of 2 sources

Example 4: 2 × 2 by Cumulant matching (cf. demo PCA-ICA)

After standardization, the mixture takes the form
$$x = \begin{pmatrix} \cos\alpha & -\sin\alpha\, e^{\jmath\varphi} \\ \sin\alpha\, e^{-\jmath\varphi} & \cos\alpha \end{pmatrix} s \qquad (32)$$

Denote γ_{ij}^{kℓ} = Cum{x_i, x_j, x_k*, x_ℓ*} and κ_i = Cum{s_i, s_i, s_i*, s_i*}.
Then by multi-linearity:
$$\gamma_{12}^{12} = \cos^2\alpha\, \sin^2\alpha\, (\kappa_1 + \kappa_2)$$
$$\gamma_{11}^{12} = \cos^3\alpha\, \sin\alpha\, e^{\jmath\varphi} \kappa_1 - \cos\alpha\, \sin^3\alpha\, e^{\jmath\varphi} \kappa_2$$
$$\gamma_{12}^{22} = \cos\alpha\, \sin^3\alpha\, e^{\jmath\varphi} \kappa_1 - \cos^3\alpha\, \sin\alpha\, e^{\jmath\varphi} \kappa_2$$

Compact solution:
$$\frac{\gamma_{12}^{22} - \gamma_{11}^{12}}{\gamma_{12}^{12}} = -2\,\cot 2\alpha\; e^{\jmath\varphi}$$
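The compact solution can be checked numerically; a sketch (the estimator `cum4` and the source choice are mine, not taken from the demo):

```python
import numpy as np

def cum4(a, b, c, d):
    """Cum{a, b, c*, d*} for zero-mean samples, via the moment-cumulant relations."""
    c, d = np.conj(c), np.conj(d)
    return (np.mean(a*b*c*d) - np.mean(a*b)*np.mean(c*d)
            - np.mean(a*c)*np.mean(b*d) - np.mean(a*d)*np.mean(b*c))

rng = np.random.default_rng(1)
s = np.sign(rng.standard_normal((2, 200_000)))   # binary sources, kappa_i = -2
alpha, phi = 0.4, 0.3
A = np.array([[np.cos(alpha), -np.sin(alpha)*np.exp(1j*phi)],
              [np.sin(alpha)*np.exp(-1j*phi), np.cos(alpha)]])
x = A @ s

g = ((cum4(x[0], x[1], x[1], x[1]) - cum4(x[0], x[0], x[0], x[1]))
     / cum4(x[0], x[1], x[0], x[1]))
print(g, -2 / np.tan(2*alpha) * np.exp(1j*phi))  # both ~ -2 cot(2a) e^{j phi}
```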


Now the inverse approach

- Cumulant matching (direct approach: identification)
- Contrast Criteria (inverse approach: equalization):

[diagram: s → A → x → B → z]


Noisy Mixtures of 2 sources

Example 5: Separation of 2 non-Gaussian sources by contrast maximization


Source additional hypotheses

H1. Each source s_j[k] is an i.i.d. sequence, for any fixed j
H2. Sources s_j are mutually statistically independent
H3. At most one source is Gaussian
H4. At most one source has a null marginal cumulant
H5. Sources are discrete, and belong to some known alphabet (but may be stat. dependent)
H6. Sources s_j[k] are sufficiently exciting
H7. Sources are colored, and the set of source spectra forms a family of linearly independent functions
H8. Sources are non-stationary, and have different time profiles


Trivial Filters

They account for the inherent indeterminacies remaining after assuming the source additional hypotheses.

For instance:
- For dynamic (convolutive) mixtures, under H1, H2, H3: Ť[z] = P Ď[z], where P is a permutation, and Ď[z] a diagonal filter with entries of the form Ď_pp[z] = λ_p z^{δ_p}, where δ_p is an integer.
- For static mixtures, under H2, H3: T = PD, where P is a permutation and D diagonal invertible.

In other words, if s satisfies Hi, then so does Ts.


Contrast criteria: definition

Axiomatic definition
A Contrast optimization criterion Υ should enjoy 3 properties:
- Invariance: Υ should not change under the action of trivial filters (cf. definition slide 116)
- Domination: if the sources are already separated, any filter should decrease (or leave unchanged) Υ
- Discrimination: the maximum achievable value should be reached only when the sources are separated (i.e. all absolute maxima are related to each other by trivial filters)

To summarize:
∀Q, Υ(Q) ≤ Υ(I), with equality iff Q is trivial   (33)


Examples of contrasts (1/2)

- Minimize the Mutual Information I(z)
- Maximize the Noiseless Likelihood ΥML = −I(z) − Σ_i K(z_i, s_i), if the p_{s_i} are known
- Maximize the Entropy H(z), if the H(z_i) are kept constant

[diagram: relations between −ΥML, I(z), H(z), Σ_i H(z_i) and Σ_i K_i]


Examples of contrasts (2/2)

Contrasts based on tensor slices:
$$\Upsilon_{CoM}^{(r)} \stackrel{\mathrm{def}}{=} \sum_i |C_{iiii}|^r, \qquad \Upsilon_{STO} \stackrel{\mathrm{def}}{=} \sum_{ij} |C_{iiij}|^2, \qquad \Upsilon_{JAD} \stackrel{\mathrm{def}}{=} \sum_{ijk} |C_{iijk}|^2$$

➽ Υ_CoM^(2) ≤ Υ_STO ≤ Υ_JAD shows that Υ_CoM is the more sensitive (example for 4 × 4 × 4 tensors)

Matrix slice diagonalization ≠ Tensor diagonalization
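A sketch of the three slice-based contrasts computed from an empirical cumulant tensor of real whitened data (helper names are mine):

```python
import numpy as np

def cumulant_tensor(z):
    """4th-order cumulant tensor of zero-mean, real data z (P x N)."""
    P, N = z.shape
    C = np.einsum('in,jn,kn,ln->ijkl', z, z, z, z) / N
    R = z @ z.T / N
    C -= (np.einsum('ij,kl->ijkl', R, R) + np.einsum('ik,jl->ijkl', R, R)
          + np.einsum('il,jk->ijkl', R, R))
    return C

def contrasts(C):
    """Upsilon_CoM^(2), Upsilon_STO and Upsilon_JAD from a symmetric tensor C."""
    P = C.shape[0]
    com2 = sum(C[i, i, i, i]**2 for i in range(P))
    sto = sum(C[i, i, i, j]**2 for i in range(P) for j in range(P))
    jad = sum(C[i, i, j, k]**2 for i in range(P) for j in range(P) for k in range(P))
    return com2, sto, jad
```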


Optimization criteria

Bibliographical notes
- Expansion of the MI in terms of cumulants [Com92a] [Com94a]
- Contrasts: CoM2 [Com92a] [Com94a], JADE [CS93], STO [LMV01]
- Link between Likelihood and Infomax [Car99] [Car97]
- Likelihood, Estimating equations: [CJ10, ch.4]
- Pairwise sweeping in tensors: [Com89b] [Com92a] [CS93] [Com94a] [Com94b] [MvL08]


III. Second Order Methods

I II III IV V VI VII VIII IX X


Contents of course III

7. Introduction
8. Colored sources
9. Nonstationary sources
10. Related approaches
11. Discussion


Independence or decorrelation?

- Independence ⇒ decorrelation, but the converse is false
- However, decorrelation is a first step towards independence, hence 2-step approaches with whitening or sphering

Algebraic point of view
- For 2 linear mixtures of 2 sources (assumed zero-mean, iid):
  - B has 4 unknown parameters,
  - using SOS, one only has 3 equations, related to E(Y1²), E(Y2²) and E(Y1Y2)
- For P linear mixtures of P sources (assumed zero-mean, iid):
  - B has P² unknown parameters,
  - using SOS, one has P + P(P − 1)/2 = P(P + 1)/2 equations, related to E(Y1²), …, E(YP²) and E(Y1Y2), E(Y1Y3), …, E(Y_{P−1}Y_P)


Zeroing the decorrelation

Assumptions
- a_jj = 1 and enforcing b_jj = 1
- zero-mean sources with variances σ_i²

Decorrelation equation
In the plane (b12, b21), the solution to E[Y1Y2] = 0 is a set of hyperbolas:
$$b_{21} = -\,\frac{b_{12}\left(a_{21}^{2} + \frac{\sigma_2^2}{\sigma_1^2}\right) + a_{21} + a_{12}\,\frac{\sigma_2^2}{\sigma_1^2}}{b_{12}\left(a_{21} + a_{12}\,\frac{\sigma_2^2}{\sigma_1^2}\right) + 1 + a_{12}^{2}\,\frac{\sigma_2^2}{\sigma_1^2}}$$

Each hyperbola depends on the mixture coefficients and on the variance ratio.


Decorrelation hyperbolas

The hyperbolas intersect at points related to the mixing matrix coefficients:
$$A = \begin{pmatrix} 1 & 0.4 \\ 0.7 & 1 \end{pmatrix}$$

[figure: decorrelation hyperbolas; (σ2/σ1)² = 1 in blue dots, (σ2/σ1)² = 2 in red +]


Second order separation

Gaussian non-iid sources
- Colored sources: AMUSE, Tong et al. [TSLH90]; SOBI, Belouchrani et al. [BAM93]
- Nonstationary sources: Matsuoka et al. [MOK95]; Pham and Cardoso [PC01]

Related approaches
- GEVD problem
- Common Spatial Pattern
- Periodic Component Analysis [TSS+09, SJS10]

Discussion
- Joint diagonalization
- Criteria and practical issues
- Conclusions


A 2-step approach

For real linear mixtures, B can be factorized into 2 matrices:
- a whitening (or sphering) matrix W
- an orthogonal matrix U

W is computed so that E[ZZᵀ] = I, i.e.
E[WAS(WAS)ᵀ] = WA E[SSᵀ] (WA)ᵀ = WA(WA)ᵀ = I

It means that WA is an orthogonal matrix; consequently U must be an orthogonal matrix too.

- W is a symmetric matrix, and its estimation requires P(P + 1)/2 parameters
- U is associated with P(P − 1)/2 plane (Givens) rotations
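A minimal whitening sketch (assuming zero-mean data; the symmetric inverse square root is one choice of W among many):

```python
import numpy as np

def whiten(x):
    """Return (W, z) with z = W x and E[z z^T] = I, for x of shape (P, N)."""
    d, E = np.linalg.eigh(np.cov(x))     # covariance R = E diag(d) E^T
    W = E @ np.diag(d ** -0.5) @ E.T     # symmetric inverse square root of R
    return W, W @ x
```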


Decorrelation not enough for iid Gaussian sources

After whitening
- samples z(t) are spatially uncorrelated: E[ZZᵀ] = I
- for Gaussian iid sources, after whitening, the whitened signals are statistically independent: there is no way to estimate U!

Using mutual information
- Remember the estimating equations: E[ϕ_{y_i}(y_i) y_j] = 0, ∀i ≠ j, where ϕ_{y_i} is the score function
- for Gaussian (iid or not) sources with unit variance, ϕ_{y_i}(y) = y
- Consequently, for Gaussian iid variables, MI minimization is equivalent to decorrelation!


Basic idea and early works

Idea
- Compute a transform B which decorrelates simultaneously y_i(t) and y_j(t − τ), for various values of τ
- Equivalent to cancelling simultaneously E[y_i(t) y_j(t − τ)], ∀i ≠ j, or to diagonalizing simultaneously E[y(t) yᵀ(t − τ)], for at least two values of τ

Early works
- Féty, in his PhD (1988) [Fét88]
- AMUSE, proposed by Tong et al. in 1990 [TSLH90]
- SOBI, proposed by Belouchrani et al. in 1993 [BAM93]
- Molgedey and Schuster in 1994 [MS94]


Notations and assumptions

Model: linear instantaneous mixture: x(n) = A s(n), n = 1, …, N

Assumptions
- Sources s_p(.), p = 1, …, P, are zero-mean, real, second-order stationary processes; they are spatially decorrelated
- A is a real-valued, square and regular matrix

Notations
- Covariance matrices of the sources s: R_s(τ) = E[s(t) sᵀ(t − τ)]
- Due to the source spatial decorrelation, the R_s(τ) are real diagonal matrices. We assume R_s(0) = I.
- Covariance matrices of the observations x: R_x(τ) = E[x(t) xᵀ(t − τ)]
- Since R_x(τ) = A R_s(τ) Aᵀ, they can be simultaneously diagonalized


Identifiability: first step

First step: diagonalization of R_x(0)
- Diagonalization of R_x(0) = whitening of x:
  B0 R_x(0) B0ᵀ = I
  B0 (A R_s(0) Aᵀ) B0ᵀ = I
  (B0 A) R_s(0) (B0 A)ᵀ = I
- Since R_s(0) = I, we deduce that B0 A = U0ᵀ is orthogonal
- B0 provides a source estimate s0(.) = B0 x(.) = U0ᵀ s(.)


Identifiability: second step

Second step: diagonalization of R_{s0}(1)
- One computes the orthogonal matrix U1 which diagonalizes R_{s0}(1) = U1ᵀ D1 U1 = B0 R_x(1) B0ᵀ = U0ᵀ R_s(1) U0
- It is clearly equivalent to diagonalizing R_x(1)
- B1 = U1 B0 then jointly diagonalizes R_x(0) and R_x(1): B1 R_x(0) B1ᵀ = I and B1 R_x(1) B1ᵀ = D1
- The new source estimate s1(.) = B1 x(.) = U1 U0ᵀ s(.) is a source separation solution if R_s(1) has distinct diagonal entries
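The two steps combine into the following sketch, in the spirit of AMUSE (the symmetrization of the lagged covariance before `eigh` is a practical detail I added):

```python
import numpy as np

def amuse(x, tau=1):
    """Two-step second-order separation of x (P x N, zero-mean): whiten, then rotate."""
    d, E = np.linalg.eigh(np.cov(x))
    B0 = np.diag(d ** -0.5) @ E.T        # whitening: B0 Rx(0) B0^T = I
    z = B0 @ x
    R_tau = z[:, tau:] @ z[:, :-tau].T / (z.shape[1] - tau)
    R_tau = (R_tau + R_tau.T) / 2        # symmetrize the estimated Rz(tau)
    _, U1 = np.linalg.eigh(R_tau)
    B = U1.T @ B0                        # jointly diagonalizes Rx(0) and Rx(tau)
    return B, B @ x
```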


If R_s(1) has equal diagonal entries

Case of 2 mixtures of 2 sources
- Since R_s(τ) = ρ(τ) I, it is invariant under any orthogonal transform U. In fact:
  U R_s(τ) Uᵀ = ρ(τ) U Uᵀ = ρ(τ) I
- Hence:
  x(.) = A s(.) = [A Uᵀ][U s(.)]
  R_x(.) = A R_s(.) Aᵀ = [A Uᵀ] R_s(.) [A Uᵀ]ᵀ
- It is thus impossible to distinguish A s(.) from [A Uᵀ][U s(.)]
- In other words, A can be estimated only up to an orthogonal matrix!


If R_s(1) has equal diagonal entries: general case

Solution
- If two entries of R_s(1) are equal, say ρ_i(τ) = ρ_j(τ), it is impossible to separate the two sources s_i and s_j, due to the orthogonal indeterminacy, i.e. A is only identifiable up to a plane (Givens) rotation in the (s_i, s_j) plane
- All the other sources, with different entries, can be separated after the second step
- For separating the sources with the same entries in R_s(1), we continue with a third step, similar to the second one: one computes an orthogonal matrix U2 which diagonalizes R_x(2); B2 = U2 U1 B0 then jointly diagonalizes R_x(0), R_x(1) and R_x(2):
  B2 R_x(0) B2ᵀ = I, B2 R_x(1) B2ᵀ = D1 and B2 R_x(2) B2ᵀ = D2


Identifiability theorem

Theorem
The mixing matrix A is identifiable from second order statistics (up to scale and permutation indeterminacies) iff the correlation sequences of all sources are pairwise linearly independent, i.e. if (ρ_i(1), …, ρ_i(K)) ≠ (ρ_j(1), …, ρ_j(K)), ∀i ≠ j

In the frequency domain
Due to the uniqueness of the Fourier transform, the identifiability condition in the time domain can be transposed to the frequency domain:
The mixing matrix A is identifiable from second order statistics (up to scale and permutation indeterminacies) iff the sources have distinct spectra, i.e. pairwise linearly independent spectra.


Consequences

- Since the separation is achieved using second order statistics (SOS), Gaussian sources can be separated
- Since we just use SOS, maximum likelihood approaches can be developed assuming Gaussian densities


Noisy mixtures

Notations and assumptions
x(t) = A s(t) + v(t)
- Sources are zero-mean, real, second-order stationary processes, spatially decorrelated
- A is a K × P full-rank matrix, with K ≥ P
- v is a real, zero-mean, second-order stationary process, assumed uncorrelated with the sources, and spatially as well as temporally white

Equations
- R_x(0) = A Aᵀ + σ_v² I and R_x(τ) = A R_s(τ) Aᵀ, ∀τ ≥ 1
- The condition of different ρ_i(τ) is no longer sufficient for identifying A
- In fact, the extra parameter σ_v² must be known or identified
- It can be done iff A Aᵀ is identifiable


Example of non-identifiable noisy mixtures

Notations and assumptions
x(t) = A s(t) + v(t)
- In addition to the previous assumptions, sources are first-order MA, so that R_s(τ) = 0 for τ ≥ 2
- Consequently, one only has 2 equations:
  R_x(0) = A Aᵀ + σ_v² I and R_x(1) = A R_s(1) Aᵀ
- Since R_x(0) and R_x(1) are symmetric P × P, there are P + P(P − 1)/2 = P(P + 1)/2 equations for each matrix equation, i.e. a total number of P(P + 1) equations
- But the number of unknowns is P(P + 1) + 1: P² (matrix A), P (entries of R_s(1)), 1 for σ_v²!


Non-stationary sources

Covariance matrices on two time windows
- Assume x(t) = A s(t), with the same assumptions as above
- In addition, we assume that the sources are nonstationary, with pairwise variance ratios (σ_i/σ_j)²(t) ≠ 1
- On two time windows with different variance ratios, one gets two covariance matrices, R1(0) and R2(0), which are jointly diagonalizable

[figure: (σ2/σ1)² = 1 in blue dots, (σ2/σ1)² = 2 in red +]


Non-stationary sources (cont'd)

Pham and Cardoso approach [PC01]
- Time is divided into p time blocks W_i of length T_i
- The covariance matrix on each window is estimated using
$$\hat{R}_{x,i} = \frac{1}{T_i} \sum_{t \in W_i} x(t)\, x^T(t), \qquad \forall i = 1, \ldots, p$$
- Assuming Gaussian sources (SOS method), Pham and Cardoso [PC01] prove that the ML estimate of the separating matrix B, for unknown source variances, is provided by the joint diagonalization of the estimated matrices R̂_{x,i}


General Eigen Value Decomposition (GEVD) problem

- Observation: x(t) = A s(t) + n(t) = x_s(t) + x_n(t), where x_s and x_n are the desired and undesired signals, resp.
- Objective: estimate b such that y = bᵀx has a desired property
- Maximizing the power according to 2 conditions can be associated with the joint diagonalization of two matrices M1 and M2, called the GEVD problem
- Using the Rayleigh-Ritz theorem, maximizing the above ratio is equivalent to solving the GEVD of (M1, M2), i.e.
$$\max_B \frac{B^T M_1 B}{B^T M_2 B} \;\Leftrightarrow\; B \text{ such that } M_1 B = M_2 B D$$
where D is the diagonal matrix of eigenvalues
- The method is used for extracting sources with given properties


General Eigen Value Decomposition (GEVD) problem (cont'd)

SNR maximizer
$$\max_b \mathrm{SNR}(b) = \max_b \frac{E[y_s^2]}{E[y_n^2]} = \max_b \frac{b^T R_{x_s}\, b}{b^T R_{x_n}\, b} \;\Leftrightarrow\; b = \mathrm{GEVD}(R_{x_s}, R_{x_n})$$
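All the GEVD-based extractors below share the same computational core; a sketch using SciPy's generalized symmetric eigensolver (M2 is assumed positive-definite; function name mine):

```python
import numpy as np
from scipy.linalg import eigh

def gevd_filter(M1, M2):
    """b maximizing (b^T M1 b)/(b^T M2 b): dominant generalized eigenvector."""
    w, V = eigh(M1, M2)   # generalized eigenvalues, in ascending order
    return V[:, -1]
```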


General Eigen Value Decomposition (GEVD) problem (cont'd)

Periodicity maximizer [TSS+09, SJS10]
$$\min_b \frac{E_t[(y(t+\tau) - y(t))^2]}{E[y(t)^2]} \;\Leftrightarrow\; b = \mathrm{GEVD}(R_x(\tau), R_x(0))$$

Application to ECG extraction: components are sorted by similarity to the ECG (R peaks)

[figure: mixtures, estimated sources with ICA (JADE), estimated sources with PiCA]


General Eigen Value Decomposition (GEVD) problem (cont'd)

Spectral-contrast maximizer
$$\max_b \frac{E_{f \in BP}[|Y(f)|^2]}{E_f[|Y(f)|^2]} = \max_b \frac{b^T S_{x,BP}\, b}{b^T S_x\, b} \;\Leftrightarrow\; b = \mathrm{GEVD}(S_{x,BP}, S_x)$$

Application to the extraction of a signal with the prior that its power is concentrated in a given frequency band BP. A similar idea is used for atrial fibrillation extraction [PZL10, LISC11]


General Eigen Value Decomposition (GEVD) problem (cont'd)

Nonstationarity maximizer
$$\max_b \frac{E_{t \in W_1}[y^2(t)]}{E_{t \in W_2}[y^2(t)]} = \max_b \frac{b^T R_{x,W_1}(0)\, b}{b^T R_{x,W_2}(0)\, b} \;\Leftrightarrow\; b = \mathrm{GEVD}(R_{x,W_1}(0), R_{x,W_2}(0))$$

Common spatial pattern
- CSP was introduced by K. Fukunaga and W. L. G. Koontz (Application of the K-L expansion to feature selection and ordering, IEEE Trans. on Computers, no. 4, pp. 311-318, 1970)
- CSP aims at maximizing the ratio of variances between two classes, associated with windows W_i, i = 1, 2
- CSP is then nothing but a "nonstationarity maximizer"


Common Spatial Pattern: Example

Epileptic focus localization, Samadi et al. [SASZJ11]
- Data: set of intracranial EEG recordings for epileptic patients
- Objective: estimate the spatial filter B which provides the iEEG sources most strongly related to interictal discharges (IED)
- The two classes are then samples in either "IED" (class 1) or "NON IED" (class 2) time intervals
- CSP provides sources, sorted in decreasing order, according to their power in the class-1 condition


Joint diagonalization: exact or approximate?

Exact joint diagonalization (EJD)
- EJD of two symmetric matrices (one of them positive-definite) exists
- For more than 2 matrices, with exact matrices, EJD still exists
- However, in practice, due to matrix estimation errors:
  - EJD does not exist for more than 2 matrices
  - EJD of two matrices is sensitive to matrix estimation errors

Approximate joint diagonalization (AJD)
- AJD of a set of matrices is more robust than EJD of 2 matrices
- How to choose the set of matrices (delays τ1, …, τK) for the best estimation?
- AJD is a usual problem in data analysis ⇒ huge number of algorithms, with different structures (with or without whitening), selection of the optimal delays τ_i, criteria, etc.


Criteria for approximate joint diagonalization

- Algorithms aim at jointly diagonalizing a set of p matrices R_i, i.e. estimating U such that
  U R_i Uᵀ = D_i, ∀i = 1, …, p
- Practical criteria aim at minimizing the squared sum of the off-diagonal entries of the matrices U R_i Uᵀ, ∀i = 1, …, p
- Criteria usually write like:
$$C(R_1, \ldots, R_p; U) = \sum_{i=1}^{p} \sum_{k \neq l} [U R_i U^T]_{kl}^2 = \sum_{i=1}^{p} \|\mathrm{offdiag}(U R_i U^T)\|^2$$
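This criterion is straightforward to evaluate; a minimal sketch (function name mine):

```python
import numpy as np

def off_criterion(R_list, U):
    """Sum over i of the squared off-diagonal entries of U R_i U^T."""
    total = 0.0
    for R in R_list:
        M = U @ R @ U.T
        total += np.sum(M**2) - np.sum(np.diag(M)**2)
    return total
```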


Joint diagonalization: practical issues

In the frequency domain
- Since the Fourier transform preserves linearity:
  F{x(t)} = F{A s(t)} = A F{s(t)}
- Practically, with a short-term Fourier transform on a time window W_i:
  F_{W_i}{x(t), T_i} = F_{W_i}{A s(t), T_i} = A F_{W_i}{s(t), T_i}
- In the frequency domain, it is then possible to jointly diagonalize S_x(ν) = E[X_{W_i}(ν) X_{W_i}ᵀ(ν)]

For both colored and non-stationary sources
One can jointly diagonalize a double set of matrices:
- a set {R_x(τ), τ = 0, …, K1}, using the coloration property,
- a set {R_{x,W_i}(0), i = 0, …, K2}, using the nonstationarity.


Conclusions on second order methods

- Can be used for Gaussian non-iid sources
- Non-iid means colored or nonstationary sources
- Gaussian sources can be separated (contrary to HOS-based methods), but Gaussianity is not required
- Algorithms are based either on ML using Gaussian assumptions, or on joint diagonalization algorithms
- Huge number of simple and fast algorithms for Approximate Joint Diagonalization
- Joint diagonalization can be applied to matrices computed in the time or in the frequency domain, mixing coloration and non-stationarity properties
- Other priors can be used for computing spatial filters leading to source separation, estimated using SOS through the GEVD problem (e.g. Common Spatial Pattern, Periodic Component Analysis)


IV. Algorithms with prewhitening

I II III IV V VI VII VIII IX X


Contents of course IV

12. Introduction
13. Problem in dimension 2
14. Orthogonal transform
    - Pair sweeping
    - CoM
    - JAD
    - STO
    - Limits
    - Other orthogonal approaches
15. Algorithms based on Deflation
    - Intro
    - Algos
    - FastICA
    - RobustICA
    - SAUD
16. Discussion, Bibliography


Algorithms

Batch (off-line):
- algebraic (e.g. in dim 2)
- semi-algebraic (e.g. pair sweeping)
- iterative (e.g. descent of order r)

Adaptive (on-line): stochastic versions of the above, and in particular
- order 0 (e.g. stochastic iteration)
- order 1 (e.g. stochastic gradient)
- order 2 (e.g. stochastic Newton)

Non-stationarities: block- vs sample-wise updates


Spatial whitening

- Batch algorithms: already seen in the slides on QR-PCA
- Adaptive algorithms:
  - Adaptive QR [Com89a]
  - Adaptive SVD [CG90]
  - Gradient [CA02]
  - Relative gradient [CL96] [CA02]

In this course IV, data are assumed whitened: R_x = I.


Direct vs Inverse

Two formulations in terms of cumulants:

1. Direct: look for A so as to fit the input-output multilinearity cumulant relation:
$$\min_A \Big\| C_{y,ijk..\ell} - \sum_{p=1}^{P} A_{ip} A_{jp} A_{kp} \cdots A_{\ell p}\; C_{s,ppp..p} \Big\|^2$$
i.e. decompose C_y into a sum of P rank-one terms

2. Inverse: look for B:
$$\min_B \sum_{mnp..q \neq ppp..p} \Big| \sum_{ijk..\ell} C_{y,ijk..\ell}\, B_{im} B_{jn} B_{kp} \cdots B_{\ell q} \Big|^2$$
i.e. try to diagonalize C_y by the linear transform z = B y


Orthogonal decomposition

If Q is orthogonal, the two problems are equivalent:

1. Direct:
$$\min_{Q,\Lambda} \Big\| C_{ijk\ell} - \sum_{p=1}^{P} Q_{ip} Q_{jp} Q_{kp} Q_{\ell p}\, \Lambda_{pppp} \Big\|^2$$

2. Inverse:
$$\min_{Q,\Lambda} \Big\| \sum_{ijk\ell} Q_{ip} Q_{jq} Q_{kr} Q_{\ell s}\, C_{ijk\ell} - \Lambda_{pppp}\, \delta_{pqrs} \Big\|^2 \quad\text{or e.g.}\quad \max_Q \sum_p \Big| \sum_{ijk\ell} Q_{ip} Q_{jp} Q_{kp} Q_{\ell p}\, C_{ijk\ell} \Big|^2$$

Proof.
The Frobenius norm is invariant under an orthogonal change of coordinates.


Divide to conquer

Well-posed problem: the set of orthogonal matrices is closed.
But difficulty: many unknowns, in the real or complex field.

1. 1st idea: address a sequence of problems of smaller dimension instead of a single one in larger dimension (deflation, pair sweeping, fixed descent directions...)
2. 2nd idea: decompose A into two factors, A = L Q, and compute L so as to exactly standardize the data. Look for the best Q in a second stage.


Algorithms: pair sweeping (1)

Pairwise processing
Split the orthogonal matrix into a product of plane Givens rotations:
$$G[i,j] \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{1+\theta^2}} \begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix}$$
acting in the subspace defined by (z_i, z_j).

➽ the dimension has been reduced to 2, and we have a single unknown, θ, that can be imposed to lie in (−1, 1].
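A sketch of the corresponding P × P rotation, embedding the 2 × 2 block at coordinates (i, j) (function name mine):

```python
import numpy as np

def givens(P, i, j, theta):
    """P x P plane rotation acting on (z_i, z_j), parameterized by theta = tan(beta)."""
    G = np.eye(P)
    c = 1.0 / np.hypot(1.0, theta)   # 1 / sqrt(1 + theta^2)
    G[i, i] = G[j, j] = c
    G[i, j] = c * theta
    G[j, i] = -c * theta
    return G
```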


Algorithms: pair sweeping (2)

Cyclic sweeping with fixed ordering: example in dimension P = 3

[diagram: x → L⁻¹ → x̃ → successive Givens rotations → z]

Carl Jacobi, 1804-1851


Algorithms: pair sweeping (3)

Sweeping a 3 × 3 × 3 symmetric tensor

[diagram: the slices of the tensor across successive Givens rotations, with legend: X = entry maximized, x = entry minimized, . = entry unchanged by the last Givens rotation]


Algorithms: pair sweeping (4)

The criteria Υ_CoM, Υ_STO and Υ_JAD are rational functions of θ, and their absolute maxima can be computed algebraically.

To prove this, consider the elementary 2 × 2 problem z = G x̃, denote C_ijkℓ the cumulants of z, and
$$G \stackrel{\mathrm{def}}{=} \begin{pmatrix} \cos\beta & \sin\beta \\ -\sin\beta & \cos\beta \end{pmatrix} \stackrel{\mathrm{def}}{=} \frac{1}{\sqrt{1+\theta^2}} \begin{pmatrix} 1 & \theta \\ -\theta & 1 \end{pmatrix}$$


Solution for Υ_CoM

We have Υ_CoM def= (C_1111)² + (C_2222)²

Denote ξ = θ − 1/θ. Then Υ_CoM is a rational function in ξ:
$$\psi_4(\xi) = (\xi^2 + 4)^{-2} \sum_{i=0}^{4} b_i\, \xi^i$$

Its stationary points are roots of a polynomial of degree 4:
$$\omega_4(\xi) = \sum_{i=0}^{4} c_i\, \xi^i$$
obtainable algebraically via Ferrari's technique. The coefficients b_i and c_i are given functions of the cumulants of x̃.

θ is obtained from ξ by rooting a 2nd-degree trinomial.


Separation of a noisy mixture by maximization of contrast

CoM2: influence of the ordering


Solution for Υ_JAD

Goal: maximize the squares of the diagonal terms of G^H N^{(r)} G, where the matrix slices, denoted
$$N^{(r)} = \begin{pmatrix} a_r & b_r \\ c_r & d_r \end{pmatrix},$$
are cumulants of x̃.

Let v def= [cos 2β, sin 2β]ᵀ. Then this amounts to maximizing the quadratic form vᵀMv where
$$M \stackrel{\mathrm{def}}{=} \sum_r \begin{pmatrix} a_r - d_r \\ b_r + c_r \end{pmatrix} \begin{pmatrix} a_r - d_r & b_r + c_r \end{pmatrix}$$

Thus, 2β is given by the dominant eigenvector of M, and G is obtained by rooting a 2nd-degree trinomial.
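The construction of M and the extraction of the angle can be sketched as follows, for real 2 × 2 slices (names are mine):

```python
import numpy as np

def jad_angle(slices):
    """Best Givens angle beta from 2 x 2 cumulant slices [[a, b], [c, d]]."""
    M = np.zeros((2, 2))
    for N in slices:
        v = np.array([N[0, 0] - N[1, 1], N[0, 1] + N[1, 0]])   # [a-d, b+c]
        M += np.outer(v, v)
    _, V = np.linalg.eigh(M)
    cos2b, sin2b = V[:, -1]        # dominant eigenvector = [cos 2b, sin 2b]
    return 0.5 * np.arctan2(sin2b, cos2b)
```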


Solution for Υ_STO

Goal: maximize the squares of the diagonal terms of the 3rd-order tensors
$$T[\ell]_{pqr} \stackrel{\mathrm{def}}{=} \sum_{ijk} G_{pi} G_{qj} G_{rk}\, C_{ijk\ell}$$

If G is the plane rotation of angle β (as above), denote v def= [cos 2β, sin 2β]ᵀ.

The angle 2β is given by the vector v maximizing a quadratic form vᵀBv, where B is 2 × 2 symmetric and contains sums of products of cumulants of x̃.

θ is obtained from ξ by rooting a 2nd-degree trinomial.


First conclusions

- Pair sweeping can be executed thanks to the equivalence between pairwise and mutual independence
- The cumulant tensor can be diagonalized iteratively via a Jacobi-like algorithm
- For each pair, there is a closed-form solution for the optimal Givens rotation (absolute maximum of the contrast criterion)

Questions:
- What about global convergence?
- Pairwise processing holds valid only if the model is exact


Stationary points: symmetric matrix case

Given a matrix m with components m_ij, one seeks an orthogonal matrix Q such that Υ2 is maximized:
$$\Upsilon_2(Q) = \sum_i M_{ii}^2; \qquad M_{ij} = \sum_{p,q} Q_{ip} Q_{jq}\, m_{pq}.$$

Stationary points of Υ2 satisfy, for any pair of indices (q, r), q ≠ r:
$$M_{qq} M_{qr} = M_{rr} M_{qr}$$

Next, d²Υ2 < 0 ⇔ M_qr² < (M_qq − M_rr)², which proves that
- M_qr = 0, ∀q ≠ r yields a maximum
- M_qq = M_rr, ∀q, r yields a minimum
- Other stationary points are saddle points


Stationary points: symmetric tensor case

Similarly, one can look at the relations characterizing the local maxima of the criterion Υ:
$$T_{qqqq} T_{qqqr} - T_{rrrr} T_{qrrr} = 0,$$
$$4\,T_{qqqr}^2 + 4\,T_{qrrr}^2 - \left(T_{qqqq} - \tfrac{3}{2} T_{qqrr}\right)^2 - \left(T_{rrrr} - \tfrac{3}{2} T_{qqrr}\right)^2 < 0,$$
for any pair of indices (q, r), q ≠ r.

As a conclusion, contrary to Υ2 in the matrix case, Υ might theoretically have spurious local maxima in the tensor case (order > 2).


Open problem

1. At each step, a plane rotation is computed and yields the global maximum of the objective Υ restricted to one variable
2. There is no proof that the sequence of successive plane rotations yields the global maximum, in the general case (tensors that are not necessarily diagonalizable)
3. Yet, no counter-example has been found since 1991


Other algorithms for orthogonal diagonalization

The 3 previous criteria summarize most ways to address tensor approximate diagonalization.

But the orthogonal matrix can be treated differently, e.g. via other parameterizations:
- express an orthogonal matrix as the exponential of a skew-symmetric matrix: Q = exp S
- or use the Cayley parameterization: Q = (I − S)(I + S)⁻¹
- ...
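Both parameterizations indeed produce orthogonal matrices; a quick numerical check of the Cayley one (a sketch):

```python
import numpy as np

def cayley(S):
    """Orthogonal Q = (I - S)(I + S)^{-1} from a skew-symmetric S."""
    I = np.eye(S.shape[0])
    return (I - S) @ np.linalg.inv(I + S)

S = np.array([[0.0, 0.3], [-0.3, 0.0]])   # skew-symmetric: S^T = -S
Q = cayley(S)
print(np.allclose(Q.T @ Q, np.eye(2)))    # True
```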


Joint Approximate Diagonalization

The idea is to consider the symmetric tensor of dimension K and order d as a collection of K^{d−2} symmetric K × K matrices.

This is the Joint Approximate Diagonalization (JAD) problem.


Algorithms based on Deflation

- Principle: joint extraction vs deflation
- Unitary adaptive deflation
- A so-called "fixed point" algorithm: FastICA
- RobustICA
- Deflation without spatial prewhitening, algebraic deflation
- Discussion on MISO criteria


Joint extraction vs Deflation

[diagram: deflation scheme, with an extraction filter f^H producing z, followed by regression of the extracted source]

Advantages: (a) reduced complexity at each stage, (b) simpler to understand
Drawbacks: (i) accumulation of regression errors, limiting the number of extracted sources, (ii) possibly larger final complexity


Adaptive algorithms: Deflation by Kurtosis Gradient Ascent

Again the same idea: after standardization, it is equivalent to maximize the 4th-order moment criterion $M^{(4)}_z = E\{|z|^4\}$, whose gradient is:
$$\nabla M = 4\, E\{x\, (f^H x)(x^H f)^2\}$$

Overview:
Fixed-step gradient on angular parameters: [DL95, CJ10]
Locally optimal step gradient on filter taps: FastICA [CJ10, ch.6]
Globally optimal step gradient on filter taps: RobustICA [CJ10, ZC10]
Semi-Algebraic Unitary Deflation (SAUD) [ACX07]


Adaptive implementation

Fully adaptive solutions (update at every sample arrival) are nowadays of little use.
It is always easy to devise fully adaptive or block-adaptive solutions from block semi-algebraic algorithms (but the reverse is not true!).


Unitary adaptive deflation (1)

Extraction
To extract the first source, find a unitary matrix U so as to maximize the kurtosis of the first output.
Matrix U can be iteratively determined by a sequence of Givens rotations.
At each step, determine the best angle of the Givens rotation, e.g. by a gradient ascent [DL95].
NB: only P − 1 Givens rotations are involved.

Deflation
After convergence, the first output is extracted, and the P − 1 remaining outputs of U can be processed in the same way.


Unitary adaptive deflation (2)

At stage k, $Q = \begin{pmatrix} q_k^H \\ Q_k^H \end{pmatrix}$ is unitary of size $P-k+1$, and only its first row $q_k^H$ is used to extract source k, $1 \le k \le P-1$.

[Figure: nested structure of the successive unitary matrices — rows $q_1^H, q_2^H, q_3^H$ are extracted while the blocks $Q_1^H, Q_2^H, Q_3^H$ are passed to the next stage.]


A so-called fixed point: FastICA (1)

Any gradient ascent of a function $M_\rho = E\{\rho(f^H x)\}$ under the unit-norm constraint $\|f\|^2 = 1$ admits the Lagrangian formulation:
$$E\{x\, \dot\rho(f^H x)\} = \lambda\, f$$
Convergence: when $\nabla C$ and $f$ are collinear (and not when the gradient is null, because of the constraint $\|f\| = 1$).
One can take $\rho(z) = |z|^4$ (but this choice makes FastICA uninteresting).


A so-called fixed point: FastICA (2)

Details of the algorithm proposed in [Hyv99] in the real field; the only difference compared to [Tug97] is the fixed step size.

Gradient: $\nabla M = 4\, E\{x\, (f^T x)^3\}$
Hessian: $12\, E\{x x^T (f^T x)^2\}$
Heavy approximation of Hyvarinen [Hyv99]: $E\{x x^T (f^T x)^2\} \approx E\{x x^T\}\, E\{(f^T x)^2\}$
If x is standardized and f has unit norm, then the Hessian equals the Identity.
This yields an "approximate Newton iteration": a mere fixed-step gradient!
$$f \leftarrow f - \tfrac{1}{3}\, E\{x (f^T x)^3\} \quad\text{or}\quad f \leftarrow E\{x (f^T x)^3\} - 3 f$$
$$f \leftarrow f / \|f\|$$
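The iteration above is short enough to sketch directly. Below is a minimal one-unit, real-valued version (an illustrative sketch, not the reference implementation; it assumes the rows of X, of shape N × T, have already been prewhitened/standardized):

```python
import numpy as np

def fastica_unit(X, n_iter=100, tol=1e-8, seed=0):
    """One-unit FastICA with the kurtosis nonlinearity, real field."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal(X.shape[0])
    f /= np.linalg.norm(f)
    for _ in range(n_iter):
        z = f @ X                                   # z = f^T x, all samples
        f_new = (X * z**3).mean(axis=1) - 3.0 * f   # E{x (f^T x)^3} - 3 f
        f_new /= np.linalg.norm(f_new)              # projection on unit sphere
        if np.abs(np.abs(f_new @ f) - 1.0) < tol:   # sign-invariant stop test
            return f_new
        f = f_new
    return f
```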


FastICA: weaknesses

This is a mere fixed step-size projected gradient algorithm, inheriting problems such as:
Saddle points (slow/ill convergence)
Flat areas (slow convergence)
Local maxima (ill convergence)

NB: slow convergence may mean high complexity to reach the solution, or stopping iterations before reaching convergence (depending on the stopping criterion).


Polynomial rooting

Theorem (1830). A polynomial of degree higher than 4 cannot in general be rooted algebraically in terms of a finite number of additions, subtractions, multiplications, divisions, and radicals (root extractions).

Niels Abel, 1802-1829; Évariste Galois, 1811-1832


How to fix most drawbacks: RobustICA

Principle: cheap exhaustive line search of a criterion J
Look for the absolute maximum in the gradient direction (1-dim search).
Not costly when criteria are polynomials or rational functions of low degree (same as AMiSRoF [GC98]: a polynomial to root, but here of degree at most 4).
Applies to Kurtosis Maximization (KMA), Constant-Modulus (CMA), Constant-Power (CPA) Algorithms...
This yields the corresponding Optimal-Step (OS) algorithms: OS-KMA, OS-CMA, OS-CPA... [ZC08] [ZC05]


RobustICA: Algorithm

Compute the coefficients of the polynomial $\frac{\partial}{\partial\mu} J(f + \mu\nabla)$ for fixed $f$ and $\nabla$
Compute all its roots $\{\mu_i\}$
Select $\mu_{\rm opt}$ among those roots as the one yielding the absolute maximum
Set $f \leftarrow f + \mu_{\rm opt}\nabla$

See: [ZC10, ZC08]
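As an illustration of the optimal-step idea (a sketch under simplifying assumptions, not the official RobustICA code): for the moment-based kurtosis contrast $K(f) = E\{z^4\}/E\{z^2\}^2$ in the real case, numerator and denominator are exact polynomials in µ along the line f + µg, so the derivative numerator can be formed and rooted explicitly:

```python
import numpy as np
from numpy.polynomial import polynomial as P

def optimal_step(f, g, X):
    """Globally optimal step mu for K(f+mu*g) = E{z^4}/E{z^2}^2 (real case)."""
    a, b = f @ X, g @ X
    # E{z^4} and E{z^2} as polynomials in mu (ascending coefficients)
    p = [np.mean(a**4), 4*np.mean(a**3*b), 6*np.mean(a**2*b**2),
         4*np.mean(a*b**3), np.mean(b**4)]
    e2 = [np.mean(a**2), 2*np.mean(a*b), np.mean(b**2)]
    q = P.polymul(e2, e2)
    # numerator of dK/dmu = (p' q - p q') / q^2; root it exactly
    num = P.polysub(P.polymul(P.polyder(p), q), P.polymul(p, P.polyder(q)))
    mus = P.polyroots(num)
    mus = np.append(mus[np.isreal(mus)].real, 0.0)   # keep real roots, allow mu=0
    K = [P.polyval(m, p) / P.polyval(m, q) for m in mus]
    return mus[int(np.argmax(K))]
```

The selected root gives the global maximizer along the search direction, which is what distinguishes the OS algorithms from fixed-step gradients.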


RobustICA vs FastICA

[Figure: Signal Mean Square Error (SMSE, dB, 0 down to −25) versus SNR (dB, 0 to 40) for sources 1–4, comparing FastICA, OS-CMA, OS-KMA and the MMSE bound.]


Semi-Algebraic Unitary Deflation

CoM1 [Com01]:
    loop on sweeps
        for i = 1 to P − 1
            for j = i to P
                algebraic 2 × 2 separation
            end
        end
    end
    extraction

SAUD [ACX07]:
    for i = 1 to P − 1
        loop on sweeps
            for j = i to P
                algebraic 2 × 2 separation
            end
        end
        extraction
    end


Equivalence between KMA and CMA

Recall the 2 criteria:
$$\Upsilon_{\rm KMA} = \frac{{\rm Cum}\{z, z, z^*, z^*\}}{[E\{|z|^2\}]^2}, \qquad J_{\rm CMA} = E\{[\,|z|^2 - R\,]^2\}$$
Assume 2nd-order circular sources: $E\{s^2\} = 0$.
Then KMA and CMA are equivalent [Reg02] [Com04].


Discussion on Deflation (MISO) criteria

Let $z \stackrel{\rm def}{=} f^H x$. The criteria below are stationary iff the differentials of p and q are collinear:

Ratio: $\max_f \; p(f)/q(f)$
Example: Kurtosis, with $p = E\{|z|^4\} - 2E\{|z|^2\}^2 - |E\{z^2\}|^2$ and $q = E\{|z|^2\}^2$

Difference: $\min_f \; p(f) - \alpha\, q(f)$
Examples: Constant Modulus, with $p = E\{|z|^4\}$ and $q = 2 a\, E\{|z|^2\} - a^2$, or Constant Power, with $q = 2 a\, \Re(E\{z^2\}) - a^2$

Constrained: $\max_{q(f)=1} p(f)$
Example: Cumulant, with $p = E\{|z|^4\} - 2E\{|z|^2\}^2 - |E\{z^2\}|^2$
Example: Moment, with $p = E\{|z|^4\}$, if standardized and with either $q = \|f\|^2$ or $q = E\{|z|^2\}^2$


Bibliographical comments

Orthogonal diagonalization of symmetric tensors:
Direct diagonalization: CoM without slicing [Com92a] [Com94b]
JAD in 2 modes: [Lee78] (R), [CS93] (C); with matrix exponential [TF07]
JAD in 2 modes with positive definite matrices: [FG86]
JAD in 3 modes: [LMV01]

Orthogonal diagonalization of non-symmetric tensors:
ALS type [Kro83] [Kie92]
ALS on pairs: [MvL08] [Sør10]
JAD in 2 modes (R): [PPPP01]
Matrix exponential [SICD08]

Deflation, FastICA, RobustICA: [CJ10, ch.6] [ZC07] [ZCK06] [ZC10]


V. Algorithms without prewhitening


Contents of course V

17 Invertible transform
18 Iterative algorithms: HJ algorithm; InfoMax; Relative gradient; Probabilistic approach; RobustICA
19 Semi-algebraic algorithms: Alternating Least Squares; Diagonally dominant matrices; Joint Schur; ICAR
20 Concluding remarks


Approximate diagonalization by invertible transform

One view of the problem: tensor (approximate) diagonalization by an invertible transform.

Joint Approximate Diagonalization (JAD) of a collection of:
symmetric matrices
symmetric diagonally dominant matrices
symmetric positive definite matrices
matrices that are not necessarily symmetric (2 invertible transforms)
...

Direct approaches without slicing the tensor into a collection of matrices: algorithms devised for underdetermined mixtures apply (e.g. ICAR).

NB: ill-posedness of the optimization over the set of invertible matrices.


Hérault-Jutten algorithm (1/2)

Recursive structure: $z = x - C z$, i.e. $F = (I + C)^{-1}$, with ${\rm Diag}\{C\} = 0$.
Update rule: $C_{ij}[k+1] = C_{ij}[k] + \mu\, f(z_i[k])\, g(z_j[k])$
The iteration $C[k+1] = C[k] + \mu\, \Phi(C[k])$ actually searches for zeros of the function $\Phi$ having negative derivatives if $0 < \mu < 1$ (resp. positive if $-1 < \mu < 0$).
Stochastic iteration of Robbins-Monro (1951) [CJH91]: if $\Phi(C)$ is the expectation of some $\varphi(C, z)$, then the stochastic version reads
$$C[k+1] = C[k] + \mu\, \varphi(C[k], z[k])$$
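A toy sketch of the rule on a 2 × 2 memoryless mixture (illustrative assumptions only: sub-Gaussian uniform sources and the common choice f(z) = z³, g(z) = z; step size and sign follow the stability discussion above):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(2, 20000))      # independent sub-Gaussian sources
A = np.array([[1.0, 0.6], [0.5, 1.0]])
X = A @ S

C = np.zeros((2, 2))
mu = 0.01
for x in X.T:
    z = np.linalg.solve(np.eye(2) + C, x)    # recursive structure z = x - C z
    dC = mu * np.outer(z**3, z)              # C_ij += mu f(z_i) g(z_j)
    np.fill_diagonal(dC, 0.0)                # keep Diag{C} = 0
    C += dC
```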


Hérault-Jutten algorithm (2/2)

Choice of the functions f(·) and g(·):
At least one should be nonlinear
Different from each other
Odd
Link with Estimating Equations if $f(z) = \psi(z)$ and $g(z) = z$

Convergence is slow at the end, and the limiting zero depends on f and g. Permuting f and g changes the limiting zero [Sor91].
Direct implementations exist [MM97].
Normalization improves the behavior [DAC02].


InfoMax

Idea: minimize the output Mutual Information [BS95] [Car99]
$$\Upsilon_{\rm MI}(z) = \sum_i H(z_i) - H(z)$$
But it may tend to infinity! Impose as constraints that the $H(z_i)$ are constant ⇒ well posed.
Now just maximize the output joint entropy H(y). But how?
Build $z = g(Bx)$, where g(·) is a c.d.f. mapping R to [0, 1].
Maximize the output entropy $H(g(Bx))$ w.r.t. B by a stochastic version of:
$$B \leftarrow B + \mu\, \nabla H(g(Bx))$$


Relative gradient

Standard gradient:
$$\Upsilon(x + h) - \Upsilon(x) \approx \langle g(x), h \rangle$$
Relative gradient: useful when x belongs to a continuous multiplicative group. Take $h = \epsilon x$:
$$\Upsilon(x + \epsilon x) - \Upsilon(x) \approx \langle g^R(x), \epsilon \rangle$$
Relationship: because $\langle g(x), \epsilon x \rangle = \langle g(x)\, x^T, \epsilon \rangle$, we have:
$$g^R(x) = g(x)\, x^T$$


Estimating Equations in BSS

Mutual Information: $\Upsilon_{\rm MI}(z) = \sum_i H(z_i) - H(z)$
If $z = Bx$, then $H(z) = H(x) + \log|\det B|$.
Gradient: $\frac{\partial H(z)}{\partial B} = B^{-T}$ because H(x) is constant, and
$$\frac{\partial}{\partial B} \sum_i H(z_i) = E\Big\{ \sum_i \frac{\partial \log p_i(z_i)}{\partial z_i}\, \frac{\partial z_i}{\partial B} \Big\} = E\{\psi_z(z)\, x^T\}$$
where $\psi_z(z)$ is the score function. Hence
$$\frac{\partial \Upsilon_{\rm MI}(z)}{\partial B} = E\{\psi_z(z)\, x^T\} + B^{-T}$$
The relative gradient $\nabla^R \Upsilon = \nabla\Upsilon \cdot B^T$ yields the Estimating Equation:
$$\nabla^R \Upsilon_{\rm MI}(z) = E\{\psi_z(z)\, z^T\} + I \qquad (34)$$
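A sketch of one relative-gradient step driven by the estimating equation (34) (the score is unknown in practice; the fixed nonlinearity ψ(z) = −tanh(z), suited to super-Gaussian sources, is an assumption of this sketch, not part of the slide):

```python
import numpy as np

def relative_gradient_step(B, X, mu=0.1):
    """One equivariant update B <- (I + mu G) B, G = E{psi(z) z^T} + I."""
    Z = B @ X                                            # outputs, N x T
    G = -np.tanh(Z) @ Z.T / Z.shape[1] + np.eye(B.shape[0])
    return B + mu * G @ B                                # multiplicative update
```

Because the correction multiplies B on the left, the trajectory of the global filter G = BA does not depend on A, which is the equivariance property discussed next.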


Equivariance

Assume we want to maximize Υ(B):
$$\Upsilon(B + \epsilon B) - \Upsilon(B) \approx \langle \nabla^R \Upsilon(B), \epsilon B \rangle = \langle \nabla^R \Upsilon(B)\, B^T, \epsilon \rangle$$
Maximal if $\epsilon = \nabla^R \Upsilon(B)\, B^T$
Update rule: $B \leftarrow (I + \mu\, \epsilon)\, B$
Now denote $G = B A$ the global filter. Post-multiplying by A shows that the update rule is equivalent to
$$G \leftarrow (I + \mu\, \epsilon)\, G$$
which does not depend on A ⇒ uniform performance.
☞ For more details, see [CJ10, ch.4].


Probabilistic approach (1)

Let T(q) be a collection of symmetric positive semidefinite matrices. Look for B such that the $M(q) \stackrel{\rm def}{=} B\, T(q)\, B^T$ are as diagonal as possible. Let $\alpha_k \in [0, 1]$ and $\sum_k \alpha_k = 1$.

1. Criterion to maximize:
$$\Upsilon \stackrel{\rm def}{=} \sum_q \alpha_q \log \frac{\det M(q)}{\det {\rm Diag}\{M(q)\}}$$
We have Υ ≤ 0 from Hadamard's inequality.
Avoids singularity
Linked to Maximum Likelihood

2. Use a multiplicative update: $B^{(\ell+1)} = U\, B^{(\ell)}$

3. Criterion after update:
$$\Upsilon = \sum_q \alpha_q \log \frac{\det M(q)\, \det^2 U}{\det {\rm Diag}\{U M(q) U^T\}}$$


Probabilistic approach (2)

4. Variation of Υ during one update:
$$\sum_q \alpha_q \Big[\, 2 \log \det U - \log \det {\rm Diag}\{U M(q) U^T\} + \log \det {\rm Diag}\, M(q) \,\Big]$$

5. Update two rows at a time, i.e. U is equal to the Identity except for the entries (i, i), (i, j), (j, i), (j, j). By concavity of the log, get a lower bound on the variation:
$$\sum_q \alpha_q \Big[\, 2 \log \det U - \log (U P U^T)_{11} - \log (U Q U^T)_{22} \,\Big]$$
where P and Q are the 2 × 2 matrices
$$P = \sum_q \frac{\alpha_q}{M(q)_{ii}}\, M(q)[i, j], \qquad Q = \sum_q \frac{\alpha_q}{M(q)_{jj}}\, M(q)[i, j]$$
with $M(q)[i, j]$ denoting the 2 × 2 submatrix of M(q) built on rows and columns (i, j).

6. Maximize this bound instead. This leads to rooting a 2nd-degree trinomial. Sweep all the pairs in turn.


RobustICA again, vs FastICA

[Figure: Signal Mean Square Error (SMSE, dB, 0 down to −25) versus complexity per source per sample (flops, 10^1 to 10^5), for K = 5, 10, 20, comparing FastICA, pw+FastICA, RobustICA and pw+RobustICA; from [ZC07].]


Survey of some semi-algebraic algorithms

1. An algorithm of ALS type: ACDC
2. An algorithm with a multiplicative update, valid if A is diagonally dominant
3. An algorithm based on joint triangularization
4. An SVD-based algorithm: ICAR (Biome4)


Alternating Least Squares

1. Two writings of the criterion:
$$\Upsilon = \sum_q \|T(q) - B\, \Lambda(q)\, B^H\|^2, \qquad \Upsilon = \sum_q \|t(q) - B\, \lambda(q)\|^2$$

2. Stationary values for ${\rm Diag}\,\Lambda(q)$: $\lambda(q) = \{B^H B\}^{-1} B^H t(q)$

3. The stationary value for each column b[ℓ] of matrix B is the dominant eigenvector of the Hermitian matrix
$$P[\ell] = \frac{1}{2} \sum_q \lambda_\ell(q)\, \{\tilde T[q; \ell]^H + \tilde T[q; \ell]\}$$
where $\tilde T[q; \ell] \stackrel{\rm def}{=} T(q) - \sum_{n \neq \ell} \lambda_n(q)\, b[n]\, b[n]^H$.

4. ALS: calculate Λ(q) and B alternately. Use the LS solution when matrices are singular.
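An ACDC-flavoured sketch of this alternation for real symmetric slices (illustrative only; Ts is a list of N × N slices T(q), and column scales are absorbed into Λ(q)):

```python
import numpy as np

def als_jad(Ts, n_sweeps=50, seed=0):
    N = Ts[0].shape[0]
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((N, N))
    lams = None
    for _ in range(n_sweeps):
        # diagonal step: t(q) = (B kr B) lam(q), solved in the LS sense
        KR = np.stack([np.kron(b, b) for b in B.T], axis=1)   # N^2 x N
        lams = [np.linalg.lstsq(KR, T.ravel(), rcond=None)[0] for T in Ts]
        # column step: b[l] <- dominant eigenvector of P[l]
        for l in range(N):
            Pl = np.zeros((N, N))
            for T, lam in zip(Ts, lams):
                Tt = T - sum(lam[n] * np.outer(B[:, n], B[:, n])
                             for n in range(N) if n != l)
                Pl += 0.5 * lam[l] * (Tt + Tt.T)
            w, V = np.linalg.eigh(Pl)
            B[:, l] = V[:, -1]        # direction only; scale goes to lam(q)
    return B, lams
```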


Diagonally dominant matrices (1)

One wishes to minimize iteratively $\sum_q \|T(q) - A\, \Lambda(q)\, A^T\|^2$.
Assume A is strictly diagonally dominant: $|A_{ii}| > \sum_{j \neq i} |A_{ij}|$ (cf. Levy-Desplanques theorem).

1. Initialize A = I
2. Update A multiplicatively as A ← (I + W)A, where W is zero-diagonal
3. Compute the best W assuming that it is small and that the T(q) are almost diagonal (first-order approximation)


Diagonally dominant matrices (2)

Computational details:
We have: $T^{(\ell+1)}(q) \leftarrow (I + W)\, T^{(\ell)}(q)\, (I + W)^T$
Write $T^{(\ell)}(q) \stackrel{\rm def}{=} D - E$, where D is diagonal and E is zero-diagonal.
If W and E are small:
$$T^{(\ell+1)}(q) \approx D(q) + W\, D(q) + D(q)\, W^T - E(q)$$
Hence minimize, w.r.t. W:
$$\sum_q \sum_{i \neq j} |W_{ij}\, D_{jj}(q) + W_{ji}\, D_{ii}(q) - E_{ij}(q)|^2$$
This is of the form $\min_w \|J w - e\|^2$, where J is sparse: $J^T J$ is block diagonal.
Thus one gets at each iteration a collection of decoupled 2 × 2 linear systems.
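A sketch of one such update under the slide's convention T(q) = D(q) − E(q) (so the off-diagonal target is $E_{ij}(q) = -T_{ij}(q)$); each pair $(W_{ij}, W_{ji})$ is a tiny least-squares problem over the slices:

```python
import numpy as np

def update_W(Ts):
    """One sweep of the decoupled 2x2 LS problems for the update matrix W."""
    N = Ts[0].shape[0]
    W = np.zeros((N, N))
    D = [np.diag(T) for T in Ts]          # diagonal entries D(q)
    for i in range(N):
        for j in range(i + 1, N):
            # rows [D_jj(q), D_ii(q)] . [W_ij, W_ji]^T ~ E_ij(q) = -T_ij(q)
            M = np.array([[d[j], d[i]] for d in D])
            e = np.array([-T[i, j] for T in Ts])
            (W[i, j], W[j, i]), *_ = np.linalg.lstsq(M, e, rcond=None)
    return W

# then apply the multiplicative step: A <- (I + W) A
```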


Joint triangularization of matrix slices

1. From T, determine a collection of matrices (e.g. matrix slices) T(q) satisfying $T(q) = A\, D(q)\, B^T$, with $D(q) \stackrel{\rm def}{=} {\rm diag}\{C_{q,:}\}$.

2. Compute the generalized Schur decomposition $Q\, T(q)\, Z = R(q)$, where the R(q) are upper triangular.

3. Since $Q^T R(q)\, Z^T = A\, D(q)\, B^T$, the matrices $R' \stackrel{\rm def}{=} Q A$ and $R'' \stackrel{\rm def}{=} B^T Z$ are upper triangular, and can be assumed to have a unit diagonal. Hence the entries $R'_{ij}$ and $R''_{ij}$ can be computed, two at a time, by solving from the bottom to the top the triangular system
$$R(q) = R'\, D(q)\, R''$$

4. Compute $A = Q^T R'$ and $B^T = R''\, Z^T$

5. Compute matrix C from T, A and B by solving the over-determined linear system $C \cdot (B \odot A)^T = T_{K \times JI}$


ICAR (1/2)

Source kurtosis signs are known and stored in a diagonal matrix ∆.

1. Compute the 4th-order cumulants of x and store them in an $N^2 \times N^2$ "quadricovariance" matrix $Q_x$.

2. Upon appropriate ordering, we have:
$$Q_x = [A \odot A]\, D \Delta D\, [A \odot A]^H \qquad (35)$$
where D∆D is the N × N diagonal matrix of source marginal cumulants, $D_{ii} \in R^+$.

3. Compute a square root of $Q_x = E \Lambda E^H$ of rank N as $Q_x^{1/2} = E\, (\Lambda \Delta)^{1/2}$. Then $Q_x^{1/2}\, \Delta\, Q_x^{H/2} = Q_x$ and
$$\exists\, V : \; Q_x^{1/2} = [A \odot A]\, D\, V^H, \quad \text{with } V^H \Delta V = \Delta$$


ICAR (2/2)

4. $Q_x^{1/2}$ has $N^2$ rows and N columns. Denote by $\Gamma_n$ each N × N block. Then
$$\Gamma_n = A^* \Phi_n\, D\, V^H, \quad \text{where } \Phi_n = {\rm Diag}\{A(n, :)\}$$

5. Compute the products $\Theta_{m,n} = \Gamma_m^{-} \Gamma_n$. Then we have
$$\Theta_{m,n} = V^{-H}\, \Phi_m^{-1} \Phi_n\, V^H$$

6. Compute V by jointly diagonalizing the $\Theta_{m,n}$, $1 \le m < n \le N$.

7. Compute an estimate of A from the N blocks of $Q_x^{1/2} V^{-H} = A \odot A$, and finally an estimate of the sources.

For further details, see [AFCC05].


Bibliographical comments

For a collection of symmetric matrices:
maximize iteratively a lower bound on the decrease of a probabilistic objective [Pha01]
alternately minimize $\sum_q \|T(q) - A\, \Lambda(q)\, A^T\|^2$ w.r.t. A and Λ(q) [Yer02]; see also [LZ07] [VO06]
minimize iteratively $\sum_q \|T(q) - A\, \Lambda(q)\, A^T\|^2$ under the assumption that A is diagonally dominant [ZNM04]

For a collection of non-symmetric matrices:
factor A into orthogonal and triangular parts, and perform a joint Schur decomposition [LMV04]
others: algorithms applicable to the underdetermined case work here, e.g. ICAR [AFCC05]; also [CL13]


Summary

For over-determined mixtures, solving the invertible problem is sufficient.

Orthogonal framework:
Fully use 2nd-order statistics, and then decompose the tensor approximately under an orthogonal constraint.
Easier to handle singularity, but arbitrary.
Several (contrast) criteria and algorithms for orthogonal decomposition.

Invertible framework:
Decompose the higher-order cumulant tensor directly under an invertible constraint.
Ill-posed.
Several algorithms, mainly working with matrix slices.

Principles & algorithms dedicated to under-determined mixtures can also apply (cf. subsequent course).


VI. Convolutive Mixtures


Contents of course VI

21 Introduction: Notations; Models of sources
22 Invertibility of MIMO convolutive mixtures
23 Methods in the time domain: Contrasts; Blind Equalization; Blind Identification; Matching algorithms; Subspace algorithms; Algorithms for ARMA channels
24 Methods in the frequency domain: Introduction; SOS; Permutation; Applications


Why convolutive mixtures?

Limitations of linear instantaneous (or memoryless) mixtures:
a source signal may arrive at the sensors with different delays, following different paths
a source signal can be filtered by the channel propagation
such situations occur in:
telecommunications, due to multiple paths,
audio and speech processing, due to sound propagation in air and room reverberation.

For a better modelling of mixtures in such situations, one has to consider mixtures of filtered sources, i.e. convolutive mixtures.


Notations

Sources and mixtures
Vector of unknown source signals: $s(n) = (s_1(n), \ldots, s_P(n))^T$
Vector of observable mixtures: $x(n) = (x_1(n), \ldots, x_K(n))^T$
Delayed samples of the sources contribute to the observations at a given time n, which is modelled by a MIMO (multi-input multi-output) linear time-invariant (LTI) filter with impulse response A(n).
The relation between sources and mixtures is
$$x(n) = \sum_{k \in Z} A(k)\, s(n - k)$$
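A minimal sketch of this mixing model (illustrative sizes, with an FIR impulse response A(0..L) drawn at random):

```python
import numpy as np

rng = np.random.default_rng(2)
P, K, L, T = 2, 3, 5, 1000                 # sources, sensors, channel order, samples
S = np.sign(rng.standard_normal((P, T)))   # e.g. BPSK-like independent sources
A = rng.standard_normal((L + 1, K, P))     # taps A(0..L), each K x P

X = np.zeros((K, T))
for k in range(L + 1):                     # x(n) = sum_k A(k) s(n - k)
    X[:, k:] += A[k] @ S[:, :T - k]        # delayed contribution of tap k
```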


Notations (con't)

Separation structure and estimated sources
Estimated sources: $y(n) = (y_1(n), \ldots, y_P(n))^T$
y(n) is obtained through a MIMO LTI filter B(n), whose objective is to "invert" the channel mixture A(n):
$$y(n) = \sum_{k \in Z} B(k)\, x(n - k)$$


z-transform notations

Mixing and separating filters:
$$A[z] = \sum_{k \in Z} A(k)\, z^{-k} \quad \text{and} \quad B[z] = \sum_{k \in Z} B(k)\, z^{-k}$$

Global filter
The global filter, denoted G(n), relates y(n) to s(n):
$$y(n) = \sum_{k \in Z} G(k)\, s(n - k), \quad \forall n \in Z$$
The global filter depends on the mixing and separating filters:
$$G(n) = \sum_{k \in Z} B(n - k)\, A(k), \quad \forall n \in Z \qquad \text{and} \qquad G[z] = B[z]\, A[z]$$


Models of sources

The model of the sources is important for modelling practical situations. It also plays an important role in the indeterminacies.

iid sources
The signal {s(n)}, n ∈ Z, is an independent and identically distributed (iid) sequence if the sample distributions are identical and mutually statistically independent.

Definition of linear sources
The signal {s(n)}, n ∈ Z, is a linear signal if it can be written as the output of a filter fed by an iid sequence.

Definition of nonlinear sources
The signal {s(n)}, n ∈ Z, is a nonlinear signal if it is not linear, i.e. if it cannot be written as the output of a filter fed by an iid sequence.


Other properties of sources

Other priors on the sources, encountered in various situations or application domains, can be exploited, e.g.:
Nonstationarity, for instance in music or speech processing
Cyclo-stationarity, for instance in telecommunications
Colored sources
Etc.


Assumptions

Three main assumptions can be made on the filters, leading to different situations and levels of difficulty:
H1: The mixing filter is stable
H2: The mixing filter is causal
H3: The mixing filter is a finite impulse response (FIR) filter


General invertibility results

Stable mixing filter
Proposition 1: For P = K, the stable P × P filter A[z] is invertible by a stable filter iff $\det(A[z]) \neq 0$, ∀z, |z| = 1.

Stable and causal mixing filter
Proposition 2: For P = K, the stable P × P filter A[z] is invertible by a stable and causal filter iff $\det(A[z]) \neq 0$ in a domain $\{z \in C, |z| > r\} \cup \{\infty\}$, where 0 ≤ r < 1.

Stable P ≠ K mixing filter
We now assume that P ≠ K.
Proposition 3: The stable K × P filter A[z] is invertible by a stable filter iff ${\rm rank}\{A[z]\} = P$, ∀z, |z| = 1.

Proofs can be found in [CJ10, ch.8].


Contrast criteria (1)

Indeterminacies in convolutive mixtures

White sources: the only filters preserving both strong whiteness (i.i.d.) and independence are
$$T[z] = \Lambda[z]\, P$$
where Λ[z] is diagonal and $\lambda_{ii}[z] = \alpha_i\, z^{\beta_i}$, $\beta_i \in Z$, i.e. pure delays that are integer multiples of the sampling period.

Colored sources: mutual independence alone does not allow one to recover the source color. Trivial filters are of the same form, where Λ[z] is any diagonal matrix.

Linear processes: when each source is the output of a linear filter fed by a white process, it is equivalent to assume that the sources are white.


Contrast criteria (2)

Contrasts based on output cumulants: the proofs derived in the static case hold true in the convolutive case when the sources are iid.

Examples: the following are contrasts, if (a) at most one source cumulant is null, (b) p + q > 2, and (c) r ≥ 1:
$$\Upsilon^{(r)}_{p,q} = \sum_i |{\rm Cum}_{p,q}\{z_i\}|^r \qquad (36)$$
More generally, if φ is convex and strictly increasing on $R^+$:
$$\Upsilon^{\phi}_{p,q} = \sum_i \phi(|{\rm Cum}_{p,q}\{z_i\}|) \qquad (37)$$
For more details, see [CJ10, ch.3] [Com04] [MP97] [Com96] and refs therein.


Para-unitarity

What are the filters preserving second-order whiteness? The so-called para-unitary filters:
$$U[z]\, U[1/z^*]^H = I$$
Or in the time domain:
$$\sum_k U(k)\, U(k - n)^H = \delta(n)\, I$$
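A small numerical check of the time-domain condition (a sketch; U is stored as an array of FIR taps U(0..L), each K × K):

```python
import numpy as np

def is_paraunitary(U, tol=1e-10):
    """Check sum_k U(k) U(k-n)^H = delta(n) I for an FIR filter bank."""
    L1, K, _ = U.shape                    # U[k] is K x K, k = 0..L
    for n in range(-(L1 - 1), L1):
        G = sum(U[k] @ U[k - n].conj().T
                for k in range(L1) if 0 <= k - n < L1)
        target = np.eye(K) if n == 0 else np.zeros((K, K))
        if not np.allclose(G, target, atol=tol):
            return False
    return True

# a pure delay combined with a unitary mixing is para-unitary: U[z] = Q z^{-2}
Q = np.linalg.qr(np.random.default_rng(3).standard_normal((2, 2)))[0]
U = np.zeros((3, 2, 2)); U[2] = Q
assert is_paraunitary(U)
```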


Contrast criteria (3)

It is also possible to devise new families of contrasts for para-unitary equalizers after prewhitening [CM97] [Com96]. For instance, at order 4:
$$\Upsilon(y) = \sum_{i,j,k} \sum_{p,q} |{\rm Cum}\{y_i[n], y_i[n]^*, y_j[n-p], y_k[n-q]^*\}|^2 \qquad (38)$$
In the above, one can conjugate any of the variables $y_\ell$ provided at most one source cumulant is null.
Holds true for almost any cumulants of order ≥ 3.
Only two indices need to be identical with the same delay.


SIMO mixture with diversity K = 2 (1)

[Figure: a single source s feeding two subchannels h1 and h2, producing the two observations x1 and x2.]

Disparity condition:
$$h_1[z] \wedge h_2[z] = 1 \;\Rightarrow\; x_1[z] \wedge x_2[z] = s[z]$$
Bézout:
$$\exists\, v_1[z], v_2[z] \;/\; v_1[z]\, h_1[z] + v_2[z]\, h_2[z] = 1$$
Thus
$$v_1[z]\, x_1[z] + v_2[z]\, x_2[z] = s[z]$$
The FIR filter $h = \binom{h_1}{h_2}$ admits the FIR inverse $v = (v_1, v_2)$.


SIMO mixture with diversity K = 2 (2)

Theorem
If two polynomials $p(z) = \sum_{i=0}^m a_i z^i$ and $q(z) = \sum_{i=0}^n b_i z^i$ are coprime, then the resultant below is nonzero:
$$R(p, q) = \det \begin{pmatrix}
a_0 & \ldots & a_m & & 0 \\
 & \ddots & & \ddots & \\
0 & & a_0 & \ldots & a_m \\
b_0 & \ldots & b_n & & 0 \\
 & \ddots & & \ddots & \\
0 & & b_0 & \ldots & b_n
\end{pmatrix} \stackrel{\rm def}{=} \det \begin{pmatrix} A \\ B \end{pmatrix}$$


Use of time diversity

Time diversity
If the channel bandwidth exceeds the symbol rate $1/T_s$ (excess bandwidth), then sampling faster than $1/T_s$ brings extra information on the channel [Slo94] [TXK94].

How to build a SIMO channel from a SISO one?
sample twice as fast: $x[k] = x(k\, T_s/2)$
denote the odd samples $x_1[k] = x[2k+1]$ and the even samples $x_2[k] = x[2k]$
then
$$\begin{pmatrix} x_1[k] \\ x_2[k] \end{pmatrix} = \begin{pmatrix} H_1 \\ H_2 \end{pmatrix} s[k] \stackrel{\rm def}{=} H\, s[k]$$
Matrix H is full rank (well conditioned) if there is sufficient excess bandwidth.
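A sketch of this construction (illustrative: random symbols and a random $T_s/2$-spaced channel; the even/odd polyphase split yields the two virtual subchannels exactly):

```python
import numpy as np

rng = np.random.default_rng(4)
s = np.sign(rng.standard_normal(500))        # unit-rate symbols
h = rng.standard_normal(9)                   # channel sampled at Ts/2

s_up = np.zeros(2 * len(s)); s_up[::2] = s   # upsample symbols by 2
x = np.convolve(s_up, h)                     # received signal at rate 2/Ts

x_even, x_odd = x[::2], x[1::2]              # two "virtual sensors"
h_even, h_odd = h[::2], h[1::2]              # their SISO subchannels
# each subsequence is a symbol-rate convolution with its own subchannel:
assert np.allclose(x_even[:len(s)], np.convolve(s, h_even)[:len(s)])
assert np.allclose(x_odd[:len(s)],  np.convolve(s, h_odd)[:len(s)])
```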


MISO Dynamic Extractor: Deflation

Fixed-step gradient deflation [Tug97]
No spurious minima of the contrast criterion [CJ10, ch.6] [DL95]
Optimal line search along a descent direction: OS-KMA [ZC05] [Com02]


After Prewhitening: PAJOD (1)

Technique applied after space-time prewhitening.
One then looks for a para-unitary equalizer, by maximizing the contrast
$$J_{2,r}(y) = \sum_{m_b,\, m_\beta} \|{\rm Diag}\{H^H\, M(m_b, m_\beta)\, H\}\|^2$$
Matrix H is now defined differently, and is semi-unitary.
The matrices M(·) contain cumulants of the whitened observations.
Contrast (38) is maximized again by a sweeping technique.


PAJOD (2)

PAJOD: Partial Approximate Joint Diagonalization of matrix slices.
One actually attempts to diagonalize only a portion of the tensor.

[Figure: sketch of the cumulant tensor (dimensions q, N, L), only part of which is jointly diagonalized.]


MIMO Blind Equalization

Linear prediction after BI [Com90]
Linear prediction [AMML97] [GL99]
Subspace [MDCM95] [GDM97] [CL97] [AMLM97]
Identifiability issues with subspace techniques [LM00] [Des01]


Equalization after prior Blind Identification

Assume the channel H[z] has been identified, with:
$$x[z] = H[z] \cdot s[z] + v[z]$$
An estimate of s[z] is obtained with F[z] x[z]. Possible equalizers F[z]:
Zero-Forcing: $F[z] = H[z]^{-1}$
Matched Filter: $F[z] = H[1/z^*]^H$ (used in MLSE; optimal if the channel is AWGN; maximizes the output SNR)
Minimum Mean Square Error (MMSE): $F[z] = (H[z]\, H[1/z^*]^H + R_v[z])^{-1}\, H[1/z^*]^H$
➽ One can insert soft or hard decisions to stabilize the inverse, or to reduce noise, e.g. Decision Feedback Equalizers (DFE).


Overview

Interest
MA identifiability (second order vs HOS)
SISO: cumulant matching
MIMO: cumulant matching and linear prediction (non-monic MA)
Algebraic approaches, quotient ring
SIMO: subspace approaches
MIMO: subspace, IIR, ...


BE vs BI

If the sources $s_p[k]$ are discrete, it is:
rather easy to define a BE optimization criterion in order to match an output alphabet
difficult to exploit a source alphabet in BI

Example: the constant-modulus property of an alphabet is mainly used in Blind Equalization: CMA (Constant Modulus Algorithm).


Interest of Blind Identification

When the mixture does not have a stable inverse
➽ one may want to control stability by soft/hard decisions in a Feedback Equalizer
When the sources are not of interest (e.g. channel characteristics, localization only)


SISO Cumulant matching (1)

Consider first the SISO case:
$$x[n] = \sum_{k=0}^{L} h[k]\, s[n-k] + v[n]$$
where v[n] is Gaussian stationary, and s[n] is 4th-order white stationary.
Then, by the multilinearity property of cumulants (cf. slide):
$$C_x(i,j) \stackrel{\rm def}{=} {\rm Cum}\{x[t+i], x[t+j], x[t+L], x[t]\} = h[i]\, h[j]\, h[L]\, h[0]\, c_s$$
with $c_s \stackrel{\rm def}{=} {\rm Cum}\{s[n], s[n], s[n], s[n]\}$.
By substitution of the unknown $h[L]\, h[0]\, c_s$, one gets a whole family of equations [SGS94] [Com92b]:
$$h[i]\, h[j]\, C_x(k, \ell) = h[k]\, h[\ell]\, C_x(i, j), \quad \forall\, i, j, k, \ell \qquad (39)$$


SISO Cumulant matching (2)

A solution to the subset of (39) for which j = ℓ is easily obtained:
$$h[i]\, C_x(k, j) - h[k]\, C_x(i, j) = 0, \quad 0 \le i < k \le L, \;\; 0 \le j \le L \qquad (40)$$
This is a linear system of $L(L+1)^2/2$ equations in $L+1$ unknowns
⇒ Least Squares (LS) solution, up to a scale factor (e.g. h(0) = 1).
Since only 4th-order statistics are used, the solution is asymptotically (for large samples) insensitive to Gaussian noise.
A Total Least Squares (TLS) solution is possible as well.
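A sketch of the LS identification from equations (40) (illustrative assumptions: x is zero-mean and long enough for the empirical cumulants to be reliable; the scale is fixed afterwards by h[0] = 1 rather than inside the system):

```python
import numpy as np

def cum4(x, i, j, L):
    """Empirical Cum{x[t+i], x[t+j], x[t+L], x[t]} for zero-mean x."""
    T = len(x) - L
    a, b, c, d = x[i:i+T], x[j:j+T], x[L:L+T], x[:T]
    m = lambda u, v: np.mean(u * v)
    return np.mean(a*b*c*d) - m(a,b)*m(c,d) - m(a,c)*m(b,d) - m(a,d)*m(b,c)

def identify_siso(x, L):
    C = np.array([[cum4(x, i, j, L) for j in range(L+1)] for i in range(L+1)])
    rows = []
    for i in range(L+1):
        for k in range(i+1, L+1):
            for j in range(L+1):
                r = np.zeros(L+1)
                r[i], r[k] = C[k, j], -C[i, j]   # h[i] C(k,j) - h[k] C(i,j) = 0
                rows.append(r)
    _, _, Vt = np.linalg.svd(np.array(rows))     # LS null vector
    h = Vt[-1]
    return h / h[0]                              # scale fixed by h[0] = 1
```

Note that the stacked system has exactly the $L(L+1)^2/2$ rows counted above.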


MIMO Cumulant matching (1)

Indeterminacy: a scale (scalar) factor for SISO, but a ΛP factor for MIMO.

Reduction to a monic model [Com94b]: if H[0] is invertible, one has
$$y[n] = H[0]\, s[n], \qquad (41)$$
$$x[n] = \sum_{k=0}^{L} B[k]\, y[n-k] + w[n] \qquad (42)$$
where $B[k] \stackrel{\rm def}{=} H[k]\, H[0]^{-1}$.
Because B[0] = I, the MA model (42) is said to be monic.
The indeterminacy is only in (41), which is solved by ICA if s[n] is spatially white.


MIMO Cumulant matching (2)

Kronecker notation: store 4th-order cumulant tensors in vector form:
$$c_{a,b,c,d} \stackrel{\rm def}{=} {\rm vec}\{{\rm Cum}\{a, b, c, d\}\}$$
Then we have the property (where ⊡ denotes the term-wise Hadamard product):
$$c_{a,b,c,d} = E\{a \otimes b \otimes c \otimes d\} - E\{a \otimes b\} \otimes E\{c \otimes d\} - E\{a \otimes E\{b \otimes c\} \otimes d\} - E\{a \otimes 1_\beta \otimes c \otimes 1_\delta\} \boxdot E\{1_\alpha \otimes b \otimes 1_\gamma \otimes d\}$$


MIMO Cumulant matching (3)

Assume the monic MA model (42), with s[n] white in time and L fixed, and denote
$$c_x(i,j) \stackrel{\rm def}{=} {\rm vec}\{{\rm Cum}\{x[t+i], x[t+j], x[t+L], x[t]\}\}$$
Then we can prove [Com92b]:
$$C_x(i,j) = C_x(0,j)\, B[i]^T, \quad \forall j, \; 0 \le j \le L$$
where $C_x(i,j) \stackrel{\rm def}{=} {\rm Unvec}_P(c_x)$ is $P^3 \times P$.
For every fixed i, B[i] is obtained by solving, in the LS sense, the system of $(L+1)P^4$ equations in $P^2$ unknowns:
$$[I_P \otimes C_x(0,j)]\, {\rm vec}\{B[i]\} = c_x(i,j) \qquad (43)$$


MIMO Cumulant matching (4)

Summary of the algorithm:
Choose a maximum L
Estimate the cumulants of the observations, $c_x(i,j)$ for $i, j \in \{0, \ldots, L\}$
Solve the (L+1) systems (43) for the B[i]
Compute the residue y[t] (linear prediction)
Solve the ICA problem y[t] = H[0] s[t]

Weaknesses:
H[0] must be invertible
The FIR model needs to have a stable inverse


Algebraic Blind identification (1)

Types of discrete sources studied:
BPSK: $b[k] \in \{-1, 1\}$, i.i.d.
MSK: $m[k+1] = j\, m[k]\, b[k]$
QPSK: $p[k] \in \{-1, -j, 1, j\}$, i.i.d.
π/4-DQPSK: $d[k+1] = e^{j\pi/4}\, d[k]\, p[k]$
8-PSK: $q[k] \in \{e^{jn\pi/4}, n \in Z\}$, i.i.d.
etc.


Algebraic Blind identification (2)

Input/output relations:
For s[k] BPSK: $E\{x[n]\, x[n-\ell]\} = s[0]^2 \sum_{m=0}^{L} h[m]\, h[m+\ell]$
For s[k] MSK: $E\{x[n]\, x[n-\ell]\} = s[0]^2 \sum_{m=0}^{L} (-1)^m\, h[m]\, h[m+\ell]$
For s[k] QPSK: $E\{x[n]^2\, x[n-\ell]^2\} = s[0]^2 \sum_{m=0}^{L} h[m]^2\, h[m+\ell]^2$
etc.


Algebraic Blind identification (3)

Principle:
Compute all the roots of the polynomial system in h[n]. For instance, for MSK sources and a channel of length 2 [GCMT02]:
$$h[0]^2 - h[1]^2 + h[2]^2 = \beta_0$$
$$h[0]\, h[1] - h[1]\, h[2] = \beta_1$$
$$h[0]\, h[2] = \beta_2$$
Choose among these roots the one that best matches the I/O correlation:
$$E\{x[n]\, x[n-\ell]^*\} = \sum_{m=0}^{L} h[m]\, h[m+\ell]^*$$


⎛<br />

⎜<br />

⎝<br />

Intro Inversibility Time Frequ Contrasts Equaliz I<strong>de</strong>ntif Match Subspace ARMA<br />

SIMO mixture (1)<br />

FIR of length L and dimension K:<br />

L<br />

x(n) = h(i)s(n − i) + b(n)<br />

with: E{b(m) b(n) H } = σ2 and<br />

b I δ(m − n)<br />

E{b(m) s(n) ∗ } = 0<br />

For T successive values:<br />

⎞<br />

x(n)<br />

x(n−1) ⎟<br />

. ⎠<br />

x(n−T)<br />

=<br />

⎛<br />

h(0) h(1) . . . h(L)<br />

⎜ 0 h(0) . . . . . .<br />

⎜<br />

⎝<br />

.<br />

. .. . .. . ..<br />

0<br />

h(L)<br />

. . .<br />

. . .<br />

. ..<br />

0<br />

0<br />

.<br />

0 0 . . . h(0) h(1) . . . h(L)<br />

Or in compact form:<br />

i=1<br />

⎞<br />

⎛<br />

⎟<br />

⎟⎜<br />

⎟⎝<br />

⎠<br />

s(n)<br />

.<br />

s(n−T −L)<br />

X(n : n − T ) = HT S(n : n − T − L) (44)<br />

Here, HT is of size (T + 1)K × (T + L + 1)<br />

P.Comon & C.Jutten, Peyresq July 2011 BSS VI.Convolutive Mixtures 216<br />

⎞<br />

⎟<br />
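A sketch of how the block-Toeplitz matrix H_T in (44) can be materialized; the tap array layout (one row per tap h(i), each of dimension K) is an assumption for illustration.

```python
import numpy as np

def block_toeplitz(h, T):
    """h: taps of shape (L+1, K); returns H_T of shape ((T+1)*K, T+L+1)."""
    L1, K = h.shape
    L = L1 - 1
    H = np.zeros(((T + 1) * K, T + L + 1))
    for r in range(T + 1):
        # block row r multiplies s(n-r), ..., s(n-r-L)
        H[r * K:(r + 1) * K, r:r + L + 1] = h.T
    return H
```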


SIMO mixture (2)

"Column" matrix condition

H_T has strictly more rows than columns iff

(T + 1)K > T + L + 1  ⇔  T > L/(K − 1) − 1  ⇐  T ≥ L

It suffices that T exceeds the channel memory.

Disparity condition

The columns of H_T are linearly independent iff h[z] ≠ 0, ∀z

Noise subspace

Under these 2 conditions, there exists a "noise subspace": ∃ v such that v^H H_T = 0


SIMO mixture (3)

Properties of vectors in the null space

Since Rx def= E{X X^H} = H_T H_T^H + σb^2 I, the vectors v^(p) of the noise space can be computed from Rx:

Rx v^(p) = σb^2 v^(p)

And since convolution is commutative:

v^(p)H H_T = h^H V^(p)

where V^(p) is block-Toeplitz, built on v^(p). Thus h^H = [h(0)^H, h(1)^H, . . . , h(L)^H] is obtained by computing the left singular vector common to the V^(p).


SIMO mixture (4)

Summary of the SIMO Subspace Algorithm

Choose T ≥ L
Compute Rx, the correlation matrix of size (T + 1)K
Compute the d = T(K − 1) + K − L − 1 vectors v^(p) of the noise space
Compute the vector h minimizing the quadratic form

h^H [ Σ_{p=1}^{d} V^(p) V^(p)H ] h

Cut h into L + 1 slices h(i) of length K

Under the assumed hypotheses, the solution is unique up to a scalar scale factor [MDCM95]. A sketch follows below.
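The whole algorithm condenses into a few lines of linear algebra. In this sketch, `build_V` (constructing the block-Toeplitz matrix V^(p) from v^(p), in the same pattern as H_T) is assumed to be provided; it is not spelled out on the slides.

```python
import numpy as np

def simo_subspace(Rx, K, L, T, build_V):
    """Subspace channel estimate from the correlation matrix Rx."""
    d = T * (K - 1) + K - L - 1
    w, U = np.linalg.eigh(Rx)            # ascending eigenvalues
    V_noise = U[:, :d]                   # noise-space vectors v^(p)
    # quadratic form Q = sum_p V^(p) V^(p)^H, minimized by its least eigenvector
    Q = sum(build_V(V_noise[:, p]) @ build_V(V_noise[:, p]).conj().T
            for p in range(d))
    wq, Uq = np.linalg.eigh(Q)
    h = Uq[:, 0]                         # minimizer, up to a scalar scale factor
    return h.reshape(L + 1, K)           # slices h(0), ..., h(L) of length K
```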


SIMO mixture (5)

Summary of the SIMO Subspace Algorithm when K = 2

Choose T = L. There is a single vector v in the noise space
Compute Rx, the correlation matrix of size (T + 1)K
Compute the vector v of the noise space
Cut v into L + 1 slices v(i) of length K = 2
Compute h(i) = [ 0 −1 ; 1 0 ] v(i)

In fact xi = hi ⋆ s ⇒ h2 ⋆ x1 − h1 ⋆ x2 = 0 (a numerical check follows below)

This approach is called SRM (Subchannel Response Matching) [XLTK95] [GN95]
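The cross-relation behind SRM is easy to verify numerically in the noiseless case; the subchannels below are arbitrary illustration values.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(1000)
h1, h2 = np.array([1.0, 0.4]), np.array([0.7, -0.2])   # hypothetical subchannels
x1, x2 = np.convolve(h1, s), np.convolve(h2, s)        # x_i = h_i * s
residual = np.convolve(h2, x1) - np.convolve(h1, x2)   # h2*x1 - h1*x2
print(np.max(np.abs(residual)))                        # ~ 0 up to rounding
```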


SISO Identifiability

Second order statistics

αℓ = E{x[n] x[n − ℓ]^*} ➽ allows to estimate |h[m]|
βℓ = E{x[n] x[n − ℓ]} ➽ allows to estimate h[m] if E{s^2} ≠ 0

Fourth order statistics ➽ many (polynomial) additional equations

γ_{0jkℓ} = Cum{x[n], x[n − j], x[n − k], x[n − ℓ]}
γ_{0j}^{kℓ} = Cum{x[n], x[n − j], x[n − k]^*, x[n − ℓ]^*}

If some sources are 2nd order circular, sample statistics of order higher than 2 are mandatory, but otherwise not [GCMT02]!


SIMO Identifiability

With receive diversity, the (deterministic) identifiability conditions are weaker [Hua96] [XLTK95].

Definition. A length-N input sequence s[n] has P modes iff the Hankel matrix below is full row rank:

[ s[1]   s[2]    . . .  s[N − p + 1] ]
[ s[2]   s[3]    . . .  s[N − p + 2] ]
[  .      .      . . .       .       ]
[ s[p]   s[p+1]  . . .  s[N]         ]

Theorem. A K × L FIR channel h is identifiable if:

The channels hk[z] do not have common zeros
The observation length of each xk[n] is at least L + 1
The input sequence has at least L + 1 modes (sufficiently exciting)


Subspace algorithm for MIMO mixtures

Similarly to the SIMO case, we have the compact form:

X(n) = H_T S(n) + B(n)

where H_T is now built on the matrices H(k), 1 ≤ k ≤ L, and is of size (T + 1)K × (T + L + 1)P.

For large enough T, this matrix is "column shaped"
Again Rx = H_T H_T^H + σb^2 I
But now, the vectors of the noise space characterize H[z] only up to a constant post-multiplicative matrix ⇒ ICA must be used afterwards
Foundations of the MIMO subspace algorithm are more complicated [Loubaton'99]
In the MIMO case, HOS are in general mandatory.


MIMO Subspace parameterization

The basic idea is to expand the filtering operations, in which matrix entries are polynomials in z, into large matrices with scalar entries.

Source and observation vectors

Vector of sources: S(n) = (s(n), s(n − 1), . . . , s(n − (L + L′) + 1)), where s(k) = (s1(k), s2(k), . . . , sP(k))
Vector of observations: X(n) = (x(n), x(n − 1), . . . , x(n − L′ + 1)), where x(k) = (x1(k), x2(k), . . . , xK(k))

Subspace parameterization

L and L′ correspond to the channel length and the window size, respectively; they must satisfy KL′ ≥ P(L + L′)
The above condition ensures a determined model, i.e. with an over-determined matrix H, with size KL′ × P(L + L′)
With the above notations, the model x(n) = A[z]s(n) can be written X(n) = H S(n), where H is a huge Sylvester block-Toeplitz mixing matrix:

[ x(n)          ]   [ A(0) A(1) . . . A(L)   0   . . .  0   ] [ s(n)               ]
[ x(n − 1)      ] = [  0   A(0) A(1) . . .  A(L)   0    0   ] [ s(n − 1)           ]
[   .           ]   [  .     .     .     .     .    .    .  ] [   .                ]
[ x(n − L′ + 1) ]   [  0  . . .   0   A(0)  A(1) . . . A(L) ] [ s(n − (L + L′) + 1)]


Subspace parameterization

Source separation

If L and L′ satisfy the above condition and the channel condition, there exists a left inverse of H which provides source separation.
More details on algorithms for estimating the left inverse can be found in [MJL00]
Main drawbacks of the method: huge matrix; the observation number (K) must be strictly larger than the source number


SISO ARMA mixtures

What are the tools when the channel is IIR?

In general, just consider it as a FIR (truncation) → already seen
But it is also possible to assume the presence of a recursive part
Define the I/O relation: Σ_{i=0}^{p} ai x[n − i] = Σ_{j=0}^{q} bj w[n − j], where w[·] is i.i.d. and a0 = b0 = 1
The second order correlation cx(τ) def= E{x[n] x[n + τ]} can be used to identify the ak:

Σ_{k=0}^{p} ak cx(τ − k) = 0, ∀τ > q

Then compute the residue and identify the bℓ with HOS (cf. slide); a sketch of the AR step follows below
Also possible with HOS only for the AR part [SGS94]
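A minimal sketch of the AR step: with a0 = 1, the relations for τ > q form a linear LS system in a1, . . . , ap. The correlation estimates are assumed given, indexed by |τ| (real stationary case); the number of equations is a free choice.

```python
import numpy as np

def ar_from_correlations(cx, p, q, n_eq):
    """cx: array of correlation estimates, cx[t] = E{x[n] x[n+t]}, t >= 0."""
    c = lambda t: cx[abs(t)]                       # symmetry of the correlation
    taus = range(q + 1, q + 1 + n_eq)              # equations for tau > q
    M = np.array([[c(tau - k) for k in range(1, p + 1)] for tau in taus])
    rhs = -np.array([c(tau) for tau in taus])
    a, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return np.concatenate(([1.0], a))              # (a0, a1, ..., ap)
```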

MIMO ARMA mixtures (1)

The results of the SISO case can be extended [Com92b]. Take a K-dimensional ARMA model, with I/O relation:

Σ_{i=0}^{p} Ai x[n − i] = Σ_{j=0}^{q} Bj w[n − j]

where w[·] is i.i.d., A0 = I and B0 is invertible.

For instance at order 4, AR identification is based on:

Σ_{j=1}^{p} Aj c̄x(t, τ − j) = −c̄x(t, τ), ∀τ > q, ∀t

with c̄x(i, j) def= Unvec_K(Cum{x[n], x[n], x[n + i], x[n + j]})


MIMO ARMA mixtures (2)

Limitations

Sources need to be linear processes
B0 needs to be invertible
AR residuals need to be computed (MA filtering) in order to compute the Bi
One can compute MA residuals (AR filtering) if the input s[n] is requested → but this might be unstable

Representation in frequency domain

Fourier domain?

One difficulty in the convolutive model x(t) = A(t) ∗ s(t) is related to ∗
Using the Plancherel theorem, ∗ becomes a simple product in the Fourier domain
Moreover: the Fourier transform preserves linearity, and the information is the same in the time and frequency domains.

Representation in frequency domain (cont'd)

BSS equations in the Fourier domain

Taking the Fourier transform, x(t) = A(t) ∗ s(t) becomes:

X(ν) = A(ν)S(ν)

∗ is cancelled; but one has just one equation: not sufficient for estimating A(ν) or its inverse!

Separation problem in the frequency domain

BSS equations in the Fourier domain with the STFT

Applying a short-term Fourier transform (STFT) on sliding windows:

X(ν, t) = A(ν)S(ν, t)

A(ν) is not modified if the channel impulse response is time invariant. Of course, it is modified if the channel is changing, e.g. if the sources are moving.
For each value ν0 of the variable ν, one has an instantaneous mixture, with complex-valued entries:

X(ν0, t) = A(ν0)S(ν0, t)

In the Fourier domain, the (real-valued) convolutive mixing equation is thus transformed into an (infinite) set of instantaneous (complex-valued) mixing equations. A sketch follows below.
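In code, the frequency-domain approach reduces to an STFT followed by one complex instantaneous ICA per bin. `complex_ica` below is a placeholder for any complex-valued ICA routine; it is not something prescribed by the slides.

```python
import numpy as np
from scipy.signal import stft

def freq_domain_bss(x, fs, nperseg, complex_ica):
    """x: observations, shape (K, n_samples); returns per-bin separated STFT."""
    f, t, X = stft(x, fs=fs, nperseg=nperseg)   # X: (K, n_bins, n_frames)
    S_hat = np.empty_like(X)
    for k in range(X.shape[1]):                 # one instantaneous mixture per bin
        S_hat[:, k, :] = complex_ica(X[:, k, :])
    # permutation and scale across bins still have to be fixed before ISTFT
    return S_hat
```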

Separation problem in the discrete frequency domain

BSS equations in the discrete Fourier domain

With time-discrete data, similar equations are obtained by a discrete FT:

X(νk, t) = A(νk)S(νk, t)

where νk corresponds to the k-th frequency bin of the DFT. Denoting Nν the number of bins, the (real-valued) convolutive mixing equation is then transformed into a set of Nν instantaneous (complex-valued) mixing equations.

Solving the separation problem in the frequency domain

Solving in any frequency bin

Based on the observations X(νk, t),

X(νk, t) = A(νk)S(νk, t)

can be solved using any method for instantaneous linear mixtures.

Source reconstruction

For reconstructing the wideband sources, we just have to add the estimated sources in each frequency bin...
Just??? No, due to the permutation (and scale) indeterminacies!

B(νk) = P(νk)A(νk)^{−1}

Problem if P(νk) ≠ P(νl), k ≠ l

SOS in each frequency bin

SOS in the frequency domain

If the sources are not i.i.d., we can use SOS methods, exploiting either coloration or nonstationarity.
In each frequency band νk, the PSD (Fourier transform of the covariance matrix) satisfies:

E[X(νk, t)X^H(νk, t − τ)] = A(νk) E[S(νk, t)S^H(νk, t − τ)] A^H(νk), ∀τ

With the notation Sx(νk, τ) = E[X(νk, t)X^H(νk, t − τ)], one obtains equations similar to those in the time domain (due to FT linearity):

Sx(νk, τ) = A(νk) Ss(νk, τ) A^H(νk), ∀τ

Joint diagonalization can be used on a set of spectral matrices (a sketch follows below):

Sx(νk, τ), τ = 0, 1, . . . , K, for colored sources,
Sx,Wi(νk, 0) = E_Wi[X(νk, t)X^H(νk, t)], computed on time windows Wi, exploiting source nonstationarity
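A sketch of the nonstationarity variant: one covariance matrix of the STFT coefficients per time window Wi and per bin νk, ready to be fed, bin by bin, to any joint-diagonalization routine (the windowing scheme below is an assumption).

```python
import numpy as np

def spectral_matrices(X, n_windows):
    """X: STFT array of shape (K, n_bins, n_frames)."""
    K, n_bins, n_frames = X.shape
    blocks = np.array_split(np.arange(n_frames), n_windows)
    R = np.empty((n_bins, n_windows, K, K), dtype=complex)
    for k in range(n_bins):
        for i, idx in enumerate(blocks):
            Xw = X[:, k, idx]
            R[k, i] = Xw @ Xw.conj().T / len(idx)   # covariance on window Wi
    return R   # jointly diagonalize R[k, 0], ..., R[k, n_windows-1] per bin
```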

Solving the permutation indeterminacies

Many methods and criteria, exploiting two main ideas:

Continuity of filters

Smooth variation of B(ν):
  short impulse response [PS00]
  initialisation of the B(νk) estimate by B̂(νk−1) [PSB03]
Problems arise with impulse responses containing echoes, e.g. with reverberation

Continuity of sources

Time continuity of the PSD of the estimated sources over neighboring frequency bins [SP03] (a sketch follows below)
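One common heuristic in the spirit of the source-continuity criterion: align the permutation of bin k with that of bin k − 1 by correlating the power envelopes of the estimated sources. A brute-force sketch for small P:

```python
import numpy as np
from itertools import permutations

def align_permutations(S_hat):
    """S_hat: per-bin ICA outputs, shape (P, n_bins, n_frames)."""
    P, n_bins, _ = S_hat.shape
    for k in range(1, n_bins):
        ref = np.abs(S_hat[:, k - 1, :])          # envelopes of previous bin
        cur = np.abs(S_hat[:, k, :])
        best = max(permutations(range(P)),
                   key=lambda pi: sum(np.corrcoef(ref[i], cur[pi[i]])[0, 1]
                                      for i in range(P)))
        S_hat[:, k, :] = S_hat[list(best), k, :]  # apply the best permutation
    return S_hat
```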

Results of permutation regularization

Most of the regularization criteria perform well when the source power or the channel coefficients are large enough
Basically, there remain:
  a global permutation (not important),
  and frequency-block permutations, since a regularization failure in one frequency bin usually implies failure on a whole block!

[Figure reprinted from the PhD defense talk of Bertrand Rivet]

Applications

The main application domains of convolutive mixtures are:

Telecommunications
Acoustics, audio with speech and music processing

For speech applications, refer to the lecture of L. Daudet and the book [MLS07]


VII. Non Linear Mixtures

I II III IV V VI VII VIII IX X


Contents of course VII

25 Identifiability using ICA
   Problem
   General results
26 ICA for NL mixtures
   General NL
   PNL mixtures
   Inversion of Wiener system
27 Discussion

Model and question

Model

K noiseless nonlinear mixtures of P independent sources:

x(t) = A(s(t))

Question

Assuming the mixing mapping A is invertible, is it possible to estimate an inverse mapping B using independence?
In other words: does output independence ⇔ source separation?

Darmois's results

Indeterminacy

Let si and sj be 2 independent random variables; fi(si) and fj(sj) are independent too
Source separation is achieved if yi = hi(sj), i.e. the global mapping G = B ◦ A is diagonal up to a permutation
Such diagonal (up to a permutation) mappings G will be defined as trivial mappings

Nonlinear mixtures are not identifiable using ICA

There always exist non-diagonal nonlinear mixing mappings which preserve independence
Darmois [Dar53] proposed a general method for constructing such mappings. The idea has since been used by Hyvärinen and Pajunen [HP99].

Darmois's construction

Assumptions

Consider the mapping y = G(x) where G is invertible
Then: py(y) = px(x) |det JG(x)|^{−1}
Without loss of generality, one can assume the yi are uniform on [0, 1]

Mapping G preserving independence

We are looking for G such that the random vector y has independent components, i.e.

|det JG(x)| = px(x)

A possible mapping (1/2)

Choosing the non-diagonal mapping G

g1(x) = g1(x1)
g2(x) = g2(x1, x2)
.
gK(x) = gK(x1, . . . , xK)

leads to a lower-triangular Jacobian, whose determinant is the product of the diagonal entries:

JG = [ ∂g1/∂x1      0        . . .     0      ]
     [ ∂g2/∂x1   ∂g2/∂x2     . . .     0      ]
     [    .          .          .      .      ]
     [ ∂gK/∂x1   ∂gK/∂x2     . . .  ∂gK/∂xK   ]

so that |det JG| = Π_i |∂gi/∂xi|

A possible mapping (2/2)

The last condition implies

Π_i ∂gi/∂xi = px(x) = p(x1) p(x2|x1) . . . p(xK|x1 . . . xK−1)

A possible solution is:

g1(x) = F_{x1}(x1)
g2(x) = F_{x2|x1}(x1, x2)
.
gK(x) = F_{xK|x1...xK−1}(x1, . . . , xK)

where Fx is the cumulative distribution function of x


I<strong>de</strong>ntifiability ICA for NL Discussion Problem General results<br />

A simple example [JBZH04]<br />

Consi<strong>de</strong>r 2 in<strong>de</strong>pen<strong>de</strong>nt Gaussian variables x1 and x2 with joint pdf<br />

px (x1, x2) = 1<br />

2π exp − x2 1 +x2 2<br />

2<br />

Consi<strong>de</strong>r the following mapping and its Jacobian<br />

x1 = r cos θ<br />

x2 = r sin θ<br />

J =<br />

<br />

cos θ −r sin θ<br />

sin θ r cos θ<br />

The joint pdf of r and θ is<br />

<br />

prθ(r, θ) =<br />

r<br />

2π exp(−r 2 ) if (r, θ) ∈ R + 0<br />

× [0, 2π]<br />

otherwise<br />

Althought r and θ are <strong>de</strong>pending both of x1 and x2, they are<br />

statistically in<strong>de</strong>pen<strong>de</strong>nt!<br />
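The example is easy to reproduce numerically; near-zero correlations between several (nonlinear) functions of r and θ are a weak but telling proxy for their independence.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.standard_normal(100_000)
x2 = rng.standard_normal(100_000)
r = np.hypot(x1, x2)                           # radius
theta = np.arctan2(x2, x1)                     # angle

print(np.corrcoef(r, theta)[0, 1])             # ~ 0
print(np.corrcoef(r**2, np.cos(theta))[0, 1])  # ~ 0 for nonlinear functions too
```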

Conclusions

General results

Statistical independence is not sufficient for ensuring identifiability of NL mixtures
For any sources, there exist invertible mappings with non-diagonal Jacobian (i.e. mixing mappings) which preserve statistical independence
If the mapping can be identified, the sources can be recovered up to a NL mapping (and a permutation)
To overcome the problem, one can consider approaches which reduce the set of nontrivial mappings preserving independence


Regularization for avoiding nontrivial mappings preserving independence

Smooth mappings

Restrict B to be a smooth mapping [MA99, TWZ01]

Structurally constrained mappings

Restrict B by structural constraints: post-nonlinear mixtures [TJ99b], mixtures satisfying an addition theorem [KLR73, EK02], bilinear and linear-quadratic mixtures [HD03, DH09]

Priors on sources

Consider priors on the sources: bounded sources [BZJN02], Markov [LJH04] or colored sources [HJ03]


Almeida's claim

Smooth mappings

For smooth mappings (e.g. mappings provided by multi-layer perceptrons (MLP)), ICA can lead to separation

Counterexample [JBZH04]

Smoothness of the mapping is not a sufficient prior


Structural constraints: general results

Trivial mappings: definition

Definition: H is a trivial mapping if it transforms any random vector with independent components into another random vector with independent components.
The set of trivial mappings will be denoted Z

Trivial mappings: properties

A trivial mapping is then a mapping preserving independence for any random vector
It can be shown that a trivial mapping satisfies Hi(s1, . . . , sK) = hi(s_σ(i)), ∀i = 1, . . . , K
The Jacobian matrix of a trivial mapping is a diagonal matrix up to a permutation


Structural constraints

There is an infinity of nontrivial mappings preserving independence

Constrained model of mixtures

If the mapping G = B ◦ A is constrained to belong to a model C, the indeterminacies can be reduced, and hopefully cancelled
Consider Ω = {Fs1, . . . , FsP}, the set of source distributions such that ∃ G ∈ C − Z which preserves independence for any ω ∈ Ω
Ω then contains all the (particular) source distributions which cannot be separated by mappings belonging to C.
Separation is then possible (1) for source distributions which do not belong to Ω, and (2) with indeterminacies in G ∈ Z ∩ C


I<strong>de</strong>ntifiability ICA for NL Discussion Problem General results<br />

Structural constraints<br />

Case of linear memoryless mixtures<br />

C is the s<strong>et</strong> of square matrices<br />

Z ∩ C is the s<strong>et</strong> of square matrices which are the product of a<br />

diagonal matrix and a permutation matrix<br />

Ω is the s<strong>et</strong> of distributions which contain at least 2 Gaussian<br />

<strong>sources</strong> (consequence of the Darmois-Skitovich theorem)<br />

Conclusions<br />

For linear memoryless mixtures, source separation is possible<br />

using statistical in<strong>de</strong>pen<strong>de</strong>nce (1) for <strong>sources</strong> which are not in<br />

Ω (i.e. at most one Gaussian) and (2) with scale and<br />

permutation un<strong>de</strong>terminacies.<br />

P.Comon & C.Jutten, Peyresq July 2011 BSS VII.Non Linear Mixtures 252


I<strong>de</strong>ntifiability ICA for NL Discussion Problem General results<br />

Structural constraints: PNL mixtures<br />

Post-nonlinear (PNL) mixtures<br />

PNL are perticular nonlinear mixtures, which structural<br />

constraints : linear part, following by NL componentwise<br />

mappings.<br />

PNL are realistic enough : linear channel, nonlinear sensors<br />

PNL i<strong>de</strong>ntifiability [TJ99a, AJ05] with suited B<br />

if (1) at most one source is Gaussian, (2) the mixing matrix<br />

has at least two nonzero entries per row and per column, and<br />

(3) the NL mappings fi are invertible and satisfy f ′<br />

i (0) = 0,<br />

then y is in<strong>de</strong>pen<strong>de</strong>nt iff gi ◦ fi is linear and BA = DP<br />
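A toy PNL mixture matching the definition above: linear mixing, then componentwise invertible nonlinearities. All values below are illustrative, not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-1, 1, size=(2, 10_000))       # two independent sources
A = np.array([[1.0, 0.6],                      # mixing matrix: at least two
              [0.5, 1.0]])                     # nonzero entries per row/column
f = [np.tanh, lambda u: u + 0.3 * u**3]        # invertible sensor nonlinearities
z = A @ s                                      # linear part
x = np.vstack([f[0](z[0]), f[1](z[1])])        # PNL observations
```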

Structural constraints: PNL mixtures

Comments

PNL mixtures have the same indeterminacies as linear mixtures!
It does not work if A does not mix the sources!
PNL is a special case of mixtures satisfying an addition theorem (see below)

Extension of the D-S theorem

Constrained NL mappings

The D-S theorem has been extended to NL mappings A, continuous at least separately in each variable, satisfying an addition theorem [KLR73]:

f(x + y) = A[f(x), f(y)] = f(x) ◦ f(y)

Example: f(·) = tan(·), A(u, v) = (u + v)/(1 − uv):

tan(x + y) = (tan(x) + tan(y)) / (1 − tan(x) tan(y))

If (E, ◦) is an Abelian group, one can define a multiplication, denoted ∗, satisfying

f(cx) = c ∗ f(x), or equivalently c f^{−1}(u) = f^{−1}(c ∗ u)

c1 f^{−1}(u1) + . . . + ck f^{−1}(uk) = f^{−1}(c1 ∗ u1 ◦ . . . ◦ ck ∗ uk)

Constrained NL mappings

Theorem

Let X1, . . . , Xn be independent variables such that

E1 = a1 ∗ X1 ◦ . . . ◦ an ∗ Xn
E2 = b1 ∗ X1 ◦ . . . ◦ bn ∗ Xn

are independent, where ∗ and ◦ satisfy the above conditions; then f^{−1}(Xi) is normally distributed if ai bi ≠ 0.

The proof is based on applying the D-S theorem to f^{−1}(E1) and f^{−1}(E2):

f^{−1}(E1) = f^{−1}(a1 ∗ X1 ◦ . . . ◦ an ∗ Xn) = a1 f^{−1}(X1) + . . . + an f^{−1}(Xn)
f^{−1}(E2) = f^{−1}(b1 ∗ X1 ◦ . . . ◦ bn ∗ Xn) = b1 f^{−1}(X1) + . . . + bn f^{−1}(Xn)

Constrained NL mappings

Comments

These NL mappings are basically linearisable mappings
PNL mixtures are particular NL mappings satisfying the addition theorem:

A(u1, u2) = [ f1(a11 f1^{−1}(u1) + a12 f2^{−1}(u2)) ]
            [ f2(a21 f1^{−1}(u1) + a22 f2^{−1}(u2)) ]

In fact, denoting s1 = f1^{−1}(u1) and s2 = f2^{−1}(u2), one gets PNL mixtures:

A(f1(s1), f2(s2)) = [ f1(a11 s1 + a12 s2) ]
                    [ f2(a21 s1 + a22 s2) ]

Constraints on sources: bounded sources

PNL with bounded sources [BZJN02]

Theorem: a component-wise mapping preserves boundary linearity iff it is a linear mapping
This gives a new proof of identifiability and separability of PNL mixtures, restricted to bounded sources
Algorithm with independent criteria and estimations for the linear and nonlinear parts of the PNL separating structure

Constraints on sources: temporally correlated sources

Temporally correlated sources

Modelled by AR or Markov models [HJ03]
The set of NL mappings preserving independence is reduced if the sources are not i.i.d., due to a stronger independence criterion (independence of random processes)

Constraints on sources: temporally correlated sources

Darmois's construction

For 2 mixtures of 2 sources:

y1(t) = g1(x1(t), x2(t)) = F_{X1}(x1(t))
y2(t) = g2(x1(t), x2(t)) = F_{X2|X1}(x1(t), x2(t))

If y1(t) and y2(t) are independent,

p_{Y1Y2}(y1(t), y2(t)) = p_{Y1}(y1(t)) p_{Y2}(y2(t))

Since the sources are temporally correlated, one can prove that generally:

p_{Y1Y2}(y1(t + 1), y2(t)) ≠ p_{Y1}(y1(t + 1)) p_{Y2}(y2(t))

i.e. p(F_{X1}(x1(t + 1)), F_{X2|X1}(x1(t), x2(t))) ≠ p(F_{X1}(x1(t + 1))) p(F_{X2|X1}(x1(t), x2(t)))

Constraints on sources: temporally correlated sources

After some computations, one can show the relation becomes:

∫ p_{A|BC}(a, b, c) p_B(b) db = p_A(a)

The left side is a function of the variables a and c, while the right side only depends on a
Consequently, the above mapping is generally not a mixing mapping preserving independence for colored sources.
The set of mixing mappings preserving independence is thus reduced for colored sources.

Conclusions

For NL mixtures, independence generally does not ensure identifiability and separability
Regularization is required for reducing the solutions to trivial mappings:
  Generally, smoothness is not a sufficient constraint
  Structural constraints lead to particular NL identifiable and separable mixtures
  Priors on the sources can reduce the set of solutions
Extension to NL ICA
Are NL mixtures useful? Theoretical curiosity or practical applications?
Mutual information in NL mixtures

Mutual information and its derivative

I(Y) = Σ_i H(Yi) − H(Y) = Σ_i H(Yi) − H(X) − E[log |det JB|]

Estimating equation: ∂I(Y)/∂Θ = 0, where Θ represents the parameter vector.

It then leads to the following equations:

∂I(Y)/∂Θ = Σ_i ∂H(Yi)/∂Θ − ∂E[log |det JB|]/∂Θ
         = −Σ_i E[ ψ_{Yi}(yi) ∂yi/∂Θ ] − ∂E[log |det JB|]/∂Θ = 0

where ψ_{Yi} = d log p_{Yi}/dyi denotes the marginal score function.


I<strong>de</strong>ntifiability ICA for NL Discussion General NL PNL Wiener<br />

Mutual information<br />

Mutual information for PNL mixtures<br />

I (Y) = <br />

H(Yi) − H(Y)<br />

i<br />

= <br />

H(Yi) − H(X) + E[log |<strong>de</strong>t JG|] + E[log |<strong>de</strong>t JB|]<br />

i<br />

= <br />

<br />

<br />

<br />

<br />

∂gi(xi, θi)<br />

<br />

<br />

H(Yi) − H(X) + E[log <br />

]<br />

+ log |<strong>de</strong>t B|]<br />

∂xi <br />

i<br />

i<br />

(45)<br />

P.Comon & C.Jutten, Peyresq July 2011 BSS VII.Non Linear Mixtures 264


I<strong>de</strong>ntifiability ICA for NL Discussion General NL PNL Wiener<br />

Mutual information for NL mixtures<br />

Derivatives of the mutual information<br />

With respect to B<br />

∂I (Y)<br />

∂B<br />

∂H(Yi ) ∂<br />

= i ∂B −<br />

= <br />

i Ed log pY (yi )<br />

i ∂yi<br />

dyi ∂B<br />

∂B E <br />

<br />

log <br />

i ∂xi<br />

<br />

− BT = 0<br />

With respect to param<strong>et</strong>er of gi’s<br />

∂θ E<br />

k <br />

<br />

log <br />

∂gi (xi ,θi )<br />

<br />

<br />

−<br />

∂I (Y)<br />

∂θ =<br />

k ∂H(Yi )<br />

i ∂θ −<br />

k<br />

∂<br />

∂gi (xi ,θi )<br />

i ∂xi<br />

= <br />

i Ed log pY (yi ) <br />

i ∂yi<br />

∂ log|g<br />

dyi ∂θ − E<br />

k<br />

′<br />

i (xi ,θi )|<br />

∂θk ∂ log|<strong>de</strong>t B|<br />

∂B<br />

<br />

<br />

−<br />

<br />

= 0<br />

∂ log|<strong>de</strong>t B|<br />

∂θk P.Comon & C.Jutten, Peyresq July 2011 BSS VII.Non Linear Mixtures 265<br />

= 0<br />

= 0


I<strong>de</strong>ntifiability ICA for NL Discussion General NL PNL Wiener<br />

Separation of PNL mixtures<br />

The algorithm consists in 3 steps:<br />

estimation of marginal score functions of estimated <strong>sources</strong>,<br />

estimation of the nonlinear functions gi,<br />

estimation of the separationg matrix B,<br />
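A skeleton of the resulting alternating loop, under loose assumptions: `score_est` and the parametric `g` (with its hypothetical `update` interface) are placeholders, and the B update uses the classical relative-gradient direction. This is a sketch, not the authors' exact recipe.

```python
import numpy as np

def pnl_separate(x, g, g_step, score_est, B, n_iter=100, mu=0.01):
    """x: observations (K, n); g: parametric componentwise map (assumed API)."""
    for _ in range(n_iter):
        z = g(x)                          # current componentwise compensation
        y = B @ z                         # current source estimates
        psi = score_est(y)                # step 1: marginal score functions
        g.update(g_step(psi, x, B, mu))   # step 2: update the nonlinear parts
        # step 3: relative-gradient update of B
        G = np.eye(B.shape[0]) - (psi @ y.T) / y.shape[1]
        B = B + mu * G @ B
    return B, g
```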

Separation of PNL mixtures

[Figure: sources and mixtures]
[Figure: estimated sources]


I<strong>de</strong>ntifiability ICA for NL Discussion General NL PNL Wiener<br />

Wiener systems<br />

PNL mixtures and NL <strong>de</strong>convolution<br />

Hammerstein systems<br />

Blind inversion of Wiener system

The Wiener system is a usual NL model in biology, in satellite communications, etc.

Classical approaches

Classical identification methods for nonlinear systems are based on higher-order cross-correlations
Usually, the input signal is assumed to be i.i.d. Gaussian
If the distortion input is available, the compensation of the nonlinearities is almost straightforward, after identification of the NL
However, in a real-world situation, we know neither the nonlinear system input nor the input distribution

Blind inversion of Wiener system

Parameterization

source: S(t) = [. . . s(t − k + 1) s(t − k) s(t − k − 1) . . .]^T
observation: X(t) = [. . . x(t − k + 1) x(t − k) x(t − k − 1) . . .]^T
channel:

H = [ . . .    . . .      . . .      . . .    . . . ]
    [ . . .  h(k − 1)   h(k)     h(k + 1)  . . . ]
    [ . . .  h(k − 2)  h(k − 1)   h(k)     . . . ]
    [ . . .    . . .      . . .      . . .    . . . ]

Due to the i.i.d. assumption, S has independent components, and the Wiener system is nothing but an infinite-dimensional PNL mixture, X(t) = f[HS(t)], with a particular (Toeplitz) mixing matrix H.


I<strong>de</strong>ntifiability<br />

I<strong>de</strong>ntifiability ICA for NL Discussion General NL PNL Wiener<br />

Blind inversion of Wiener system<br />

If the Wiener system satisfies:<br />

Subsystems h and f are unknown and invertible ; h can be a<br />

nonminimum phase filter<br />

The input s(t) is an unknown (a priori) non Gaussian iid<br />

process<br />

it is a PNL mixtures, with a particular Toeplitz mixture matrix<br />

H and with the same NL function f on each channel.<br />

PNL i<strong>de</strong>ntifiability implies Wiener systems inversibility based<br />

on (temporal) sample statistical in<strong>de</strong>pen<strong>de</strong>nce.<br />

PNL separability is only proved for finite dimensions. It is<br />

conjectured for infinite dimensions.<br />

Practically, the filter h(k) and its inverse w(k) are truncated.<br />

Blind inversion of Wiener system

Criterion for inversion

Output of the inversion (Hammerstein) structure (a sketch follows below):

y(t) = w ∗ z(t), with z(t) = g(x(t))

Y(t) spatially independent ⇔ the sequence {y(t)} is i.i.d.

The mutual information for infinite-dimensional stationary random vectors is defined from entropy rates [CT91]:

H(Y) = lim_{T→∞} 1/(2T + 1) H(y(−T), . . . , y(T))

I(Y) = lim_{T→∞} 1/(2T + 1) [ Σ_{t=−T}^{+T} H(y(t)) − H(y(−T), . . . , y(T)) ]

I(Y) is always positive and vanishes if and only if {y(t)} is an i.i.d. sequence.
For more details, see [TSJ01]
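The inversion structure itself is two lines of code; g and w are the unknowns to be tuned so that {y(t)} becomes i.i.d. (e.g. by minimizing the rate I(Y) above). A sketch assuming a given componentwise g and truncated filter w:

```python
import numpy as np

def hammerstein_output(x, g, w):
    """y = w * g(x): memoryless compensation g, then (truncated) filter w."""
    z = g(x)
    return np.convolve(w, z, mode="same")

# hypothetical usage: if f were tanh, a candidate g is np.arctanh,
# and w approximates the inverse of the filter h
```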

Other approaches for NL mixtures

Algorithms for PNL

Relatively many algorithms have been designed for PNL mixtures. See a few references in [CJ10, ch.14].

Methods for smooth mappings

MISEP, proposed by Marques and Almeida [MA99]: the nonlinearity is modelled by multi-layer perceptrons (MLP) trained to ensure output independence.
Tan et al. [TWZ01] used radial basis functions.

Bayesian approaches

Nonlinear Factor Analysis (NFA), proposed by the Helsinki team (Valpola, Karhunen, Honkela, etc.). See details and references in [CJ10, ch.14]
A Bayesian approach was also developed by Duarte et al. [DJM10] for chemical NL mixtures, with positivity and other constraints

NL mixtures: conclusions and perspectives

Conclusions

Generally, independence is not sufficient for ensuring identifiability and separability
Independence can ensure identifiability and separability in structured NL mixtures. PNL mixtures are particular separable NL mixtures.
Deconvolution and Wiener system inversion are related to BSS: temporal independence (i.i.d.) is then related to spatial independence.
Mutual information or the information rate can then be used for measuring spatial independence or temporal independence (i.i.d.)
Independence is powerful enough to blindly compensate strong NL distortions (1) in PNL multichannel systems: satellite antennas, sensor arrays, etc., or (2) in Wiener systems (NL dynamic SISO)

NL mixtures: conclusions and perspectives

Other contributions

Extension to MIMO dynamic NL systems: convolutive PNL mixtures [BZJN01]
Bilinear and multi-linear models [HD03]
Separation of NL mixtures with positivity constraints [DJM10]
Practical applications for chemical sensor arrays [DJM10, DJTB+10] or scanned image processing [Alm05, MBBZJ08] (see slide 339 of this course)

Perspectives and open issues

Noisy NL mixtures (problem of noise amplification)
Identifiability for bilinear and multi-linear models
Identifiability based on other priors, criteria and mixing structures
NL dynamic problems, e.g. encountered in gas sensor arrays


VIII. UnderDetermined Mixtures

I II III IV V VI VII VIII IX X


Contents of course VIII

28 Problem formulation
   nth Characteristic Functions
   Identifiability
   Cumulants
29 Algorithms
   Symmetric tensors of dim. 2
   BIOME
   Foobi
   Algorithms based on CF
   Deterministic approaches
30 Bibliography
31 Sparse Component Analysis

Characteristic functions

Again

First c.f.

Real scalar: Φx(t) def= E{e^{jtx}} = ∫_u e^{jtu} dFx(u)
Real multivariate: Φx(t) def= E{e^{j t^T x}} = ∫_u e^{j t^T u} dFx(u)

Second c.f.

Ψ(t) def= log Φ(t)

Properties:

Always exists in a neighborhood of 0
Uniquely defined as long as Φ(t) ≠ 0




Characteristic functions (cont'd)

Properties of the 2nd characteristic function (cont'd):

If a c.f. Ψx(t) is a polynomial, then its degree is at most 2 and x is Gaussian (Marcinkiewicz'1938) [Luka70]
If (x, y) are statistically independent, then

Ψx,y(u, v) = Ψx(u) + Ψy(v)    (46)

Proof.

Ψx,y(u, v) = log[E{exp(j(ux + vy))}] = log[E{exp(jux)} E{exp(jvy)}] = Ψx(u) + Ψy(v).




Problem posed in terms of Characteristic Functions

If the sp are independent and x = A s, we have the core equation:

Ψx(u) = Σ_p Ψ_{sp}( Σ_q uq Aqp )    (47)

Proof.

Plug x = A s into the definition of Φx and get

Φx(u) def= E{exp(j u^T A s)} = E{exp(j Σ_{p,q} uq Aqp sp)}

Since the sp are independent, Φx(u) = Π_p E{exp(j Σ_q uq Aqp sp)}

Taking the log concludes.

Problem: decompose a multivariate function into a sum of univariate ones. A numerical check follows below.
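The core equation (47) can be checked empirically by replacing the expectations with sample averages of e^{j u^T x}; agreement holds up to estimation error, and it holds even in the underdetermined case, as in this sketch (all values illustrative).

```python
import numpy as np

rng = np.random.default_rng(4)
s = rng.uniform(-1, 1, size=(3, 200_000))          # 3 independent sources
A = rng.standard_normal((2, 3))                    # underdetermined: 2 x 3
x = A @ s

u = np.array([0.3, -0.7])
lhs = np.log(np.mean(np.exp(1j * u @ x)))          # Psi_x(u), empirical
w = A.T @ u                                        # w_p = sum_q u_q A_qp
rhs = sum(np.log(np.mean(np.exp(1j * w[p] * s[p]))) for p in range(3))
print(lhs, rhs)                                    # agree up to estimation error
```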


Darmois-Skitovich theorem (1953)

Theorem

Let the si be statistically independent random variables, and consider two linear statistics:

y1 = Σ_i ai si  and  y2 = Σ_i bi si

If y1 and y2 are statistically independent, then the random variables sk for which ak bk ≠ 0 are Gaussian.

NB: holds in both R and C




Sketch of proof

Let the characteristic functions

Ψ1,2(u, v) = log E{exp(j y1 u + j y2 v)}
Ψk(w) = log E{exp(j yk w)}
ϕp(w) = log E{exp(j sp w)}

1. Independence between the sp's implies:

Ψ1,2(u, v) = Σ_{k=1}^{P} ϕk(u ak + v bk)
Ψ1(u) = Σ_{k=1}^{P} ϕk(u ak)
Ψ2(v) = Σ_{k=1}^{P} ϕk(v bk)

2. Independence between y1 and y2 implies:

Ψ1,2(u, v) = Ψ1(u) + Ψ2(v)




Problem Algorithms Biblio SCA nCF Identif CumTens

It does not restrict generality to assume that the vectors [ak, bk] are pairwise non-collinear. To simplify, assume also that the ϕp are differentiable.

3. Hence Σ_{k=1}^P ϕk(u ak + v bk) = Σ_{k=1}^P [ϕk(u ak) + ϕk(v bk)]
Trivial for the terms for which ak bk = 0.
From now on, restrict the sum to the terms ak bk ≠ 0.

4. Write this at u + α/aP and v − α/bP:

Σ_{k=1}^P ϕk( u ak + v bk + α(ak/aP − bk/bP) ) = f(u) + g(v)

5. Subtract to cancel the Pth term, divide by α, and let α → 0:

Σ_{k=1}^{P−1} (ak/aP − bk/bP) ϕk^(1)(u ak + v bk) = f^(1)(u) + g^(1)(v)

for some univariate functions f^(1)(u) and g^(1)(v).

Conclusion: We have one term less.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 284


Problem Algorithms Biblio SCA nCF Identif CumTens

6. Repeat the procedure (P − 1) times and get:

Π_{j=2}^P (a1/aj − b1/bj) ϕ1^(P−1)(u a1 + v b1) = f^(P−1)(u) + g^(P−1)(v)

7. Hence ϕ1^(P−1)(u a1 + v b1) is linear, as a sum of two univariate functions (ϕ1^(P) is a constant because a1 b1 ≠ 0).

8. Eventually ϕ1 is a polynomial.

9. Lastly, invoke the Marcinkiewicz theorem to conclude that s1 is Gaussian.

10. The same is true for any ϕp such that ap bp ≠ 0: sp is Gaussian.

NB: the theorem also holds if the ϕp are not differentiable.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 285


Problem Algorithms Biblio SCA nCF Identif CumTens

Equivalent representations

Let y admit two representations

y = A s  and  y = B z

where s (resp. z) has independent components, and A (resp. B) has pairwise noncollinear columns.

These representations are equivalent if every column of A is proportional to some column of B, and vice versa.

If all representations of y are equivalent, they are said to be essentially unique (permutation & scale ambiguities only).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 286


Problem Algorithms Biblio SCA nCF Identif CumTens

Identifiability & uniqueness theorems S

Let y be a random vector of the form y = A s, where the sp are independent and A has pairwise noncollinear columns.

Identifiability theorem: y can be represented as y = A1 s1 + A2 s2, where s1 is non-Gaussian, s2 is Gaussian and independent of s1, and A1 is essentially unique.

Uniqueness theorem: If in addition the columns of A1 are linearly independent, then the distribution of s1 is unique up to scale and location indeterminacies.

Remark 1: if s2 is 1-dimensional, then A2 is also essentially unique.
Remark 2: the proofs are not constructive [KLR73].

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 287


Problem Algorithms Biblio SCA nCF Identif CumTens

Example of uniqueness S

Let the si be independent with no Gaussian component, and the bi be independent Gaussian. Then the linear model below is identifiable, but A2 is not essentially unique whereas A1 is:

( s1 + s2 + 2 b1 ; s1 + 2 b2 ) = A1 s + A2 ( b1 ; b2 ) = A1 s + A3 ( b1 + b2 ; b1 − b2 )

with

A1 = [ 1 1 ; 1 0 ],  A2 = [ 2 0 ; 0 2 ],  and  A3 = [ 1 1 ; 1 −1 ]

Hence the distribution of s is essentially unique.
But (A1, A2) is not equivalent to (A1, A3).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 288


Problem Algorithms Biblio SCA nCF Identif CumTens

Example of non uniqueness S

Let the si be independent with no Gaussian component, and the bi be independent Gaussian. Then the linear model below is identifiable, but the distribution of s is not unique, because a 2 × 4 matrix cannot be full column rank:

( s1 + s3 + s4 + 2 b1 ; s2 + s3 − s4 + 2 b2 ) = A ( s1 ; s2 ; s3 + b1 + b2 ; s4 + b1 − b2 ) = A ( s1 + 2 b1 ; s2 + 2 b2 ; s3 ; s4 )

with

A = [ 1 0 1 1 ; 0 1 1 −1 ]

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 289


Problem Algorithms Biblio SCA nCF Identif CumTens

CP of the Cumulant tensor

If we take the dth derivative of the core equation (47) at u = 0, we get a relation between cumulants of order d. For instance, at order d = 4 in the real case:

Cx,ijkℓ = Σp Aip Ajp Akp Aℓp κp    (48)

➽ This can be noticed to be of the same form as the CP (20) in slides 55-60.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 290
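The CP structure (48) can be emulated with a single einsum; in the sketch below, A and the kurtoses κp are arbitrary illustrative values:

```python
import numpy as np

K, P = 3, 5                                   # illustrative dimensions
rng = np.random.default_rng(1)
A = rng.standard_normal((K, P))               # hypothetical mixing matrix
kappa = rng.uniform(0.5, 2.0, P)              # hypothetical source 4th-order cumulants

# C_{ijkl} = sum_p A_ip A_jp A_kp A_lp kappa_p, the symmetric CP model (48)
C = np.einsum('ip,jp,kp,lp,p->ijkl', A, A, A, A, kappa)
print(C.shape)                                # (3, 3, 3, 3), fully symmetric
```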


Problem Algorithms Biblio SCA nCF Identif CumTens

Link with polynomials

Symmetric tensors of order d and dimension N are bijectively mapped to homogeneous polynomials of degree d in N variables. For instance, if d = 3:

p(x) = Σijk Tijk xi xj xk

Third order tensors of dimension I × J × K are bijectively related to trilinear forms in I + J + K variables:

p(x, y, z) = Σi,j,k Tijk xi yj zk

➽ Hence, decomposing a polynomial is equivalent to decomposing a tensor.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 291
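A minimal sketch of the first bijection (dimensions and tensor values are illustrative): symmetrize a third-order tensor and evaluate the associated cubic form:

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(2)
N = 3
T = rng.standard_normal((N, N, N))
# symmetrize: average over the 6 permutations of the modes
T = sum(T.transpose(p) for p in permutations(range(3))) / 6.0

x = rng.standard_normal(N)
p_x = np.einsum('ijk,i,j,k->', T, x, x, x)    # p(x) = sum_ijk T_ijk x_i x_j x_k
print(p_x)
```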


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Symmetric tensors of dimension 2

Binary forms of degree d

James Joseph Sylvester (1814–1897)

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 292


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Sylvester's theorem

Sylvester's theorem in R (1886)

A binary quantic t(x1, x2) = Σ_{i=0}^d c(i) γi x1^i x2^{d−i} can be written in R[x1, x2] as a sum of dth powers of r distinct linear forms:

t(x1, x2) = Σ_{j=1}^r λj (αj x1 + βj x2)^d

if and only if:

1. there exists a vector g of dimension r + 1 such that

⎡ γ0    γ1   · · ·  γr   ⎤ ⎡ g0 ⎤
⎢ γ1    γ2   · · ·  γr+1 ⎥ ⎢ g1 ⎥
⎢  ⋮                 ⋮   ⎥ ⎢  ⋮ ⎥  =  0.    (49)
⎣ γd−r        · · ·  γd  ⎦ ⎣ gr ⎦

2. q(x1, x2) def= Σ_{ℓ=0}^r gℓ x1^ℓ x2^{r−ℓ} has r distinct real roots.

Then the factorization q(x1, x2) = Π_{j=1}^r (βj x1 − αj x2) yields the r forms.

Valid even in non generic cases.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 293


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Proof of Sylvester's theorem (1)

Lemma

For homogeneous polynomials of degree d parameterized as p(x) def= Σ_{|i|=d} c(i) γ(i; p) x^i, define the apolar scalar product:

⟨p, q⟩ = Σ_{|i|=d} c(i) γ(i; p) γ(i; q)

Then L(x) def= a^T x  ⇒  ⟨p, L^d⟩ = Σ_{|i|=d} c(i) γ(i; p) a^i = p(a)

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 294


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Proof of Sylvester's theorem (2)

1. Assume the r distinct linear forms Lj = αj x1 + βj x2 are given. Let q(x) def= Π_{j=1}^r (βj x1 − αj x2). Then q(αj, βj) = 0, ∀j.

2. Hence, from the lemma, ∀ m(x) of degree d − r, ⟨mq, Lj^d⟩ = mq(aj) = 0, and ⟨mq, t⟩ = 0.

3. Take for instance the monomials mµ(x) = x1^µ x2^{d−r−µ}, 0 ≤ µ ≤ d − r, and denote gℓ the coefficients of q:

⟨mµ q, t⟩ = 0  ⇒  Σ_{ℓ=0}^r gℓ γℓ+µ = 0

This is exactly (49) expressed in the canonical basis.

4. The roots of q(x1, x2) are real and distinct since the forms Lj are.

5. The reasoning also goes backwards.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 295


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Algorithm for dth order symmetric tensors of dimension 2

Start with r = 1 (d × 2 matrix) and increase r until the Hankel matrix loses its column rank and the roots are distinct:

⎡ 1 2 ⎤      ⎡ 1 2 3 ⎤      ⎡ 1 2 3 4 ⎤
⎢ 2 3 ⎥      ⎢ 2 3 4 ⎥      ⎢ 2 3 4 5 ⎥
⎢ 3 4 ⎥      ⎢ 3 4 5 ⎥      ⎢ 3 4 5 6 ⎥
⎢ 4 5 ⎥  −→  ⎢ 4 5 6 ⎥  −→  ⎢ 4 5 6 7 ⎥
⎢ 5 6 ⎥      ⎢ 5 6 7 ⎥      ⎣ 5 6 7 8 ⎦
⎢ 6 7 ⎥      ⎣ 6 7 8 ⎦
⎣ 7 8 ⎦

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 296
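A minimal Python sketch of this sweep (the function name is ours; degenerate cases such as repeated roots or roots at infinity are deliberately ignored):

```python
import numpy as np

def binary_form_rank(gamma):
    """Sylvester-style sweep: increase r until the Hankel matrix of (49)
    loses full column rank and q has distinct roots. gamma = (gamma_0..gamma_d)."""
    d = len(gamma) - 1
    for r in range(1, d + 1):
        H = np.array([[gamma[mu + l] for l in range(r + 1)]
                      for mu in range(d - r + 1)])      # (d-r+1) x (r+1) Hankel
        if np.linalg.matrix_rank(H) < r + 1:            # column-rank deficiency
            g = np.linalg.svd(H)[2][-1]                 # null vector g of (49)
            roots = np.roots(g[::-1])                   # roots of q(x, 1)
            if np.unique(np.round(roots, 8)).size == r:
                return r, roots                         # rank and the r forms
    return d, None                                      # maximal-rank fallback
```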


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Decomposition of maximal rank: x1 x2^{d−1}

1. Maximal rank r = d, when (49) reduces to a 1-row matrix:

[0, 0, . . . , 0, 1, 0] g = 0

2. Find (αj, βj) such that q(x1, x2) def= Σ_{ℓ=0}^d gℓ x1^ℓ x2^{d−ℓ} = Π_{j=1}^d (βj x1 − αj x2) has d distinct roots.

3. Take αj = 1. The constraint of step 1 then just means Σj βj = 0: choose arbitrarily distinct βj's satisfying it.

4. Compute the λj's by solving the Vandermonde linear system (a worked instance follows this slide):

⎡ 1     . . .  1    ⎤        ⎡ 0 ⎤
⎢ β1    . . .  βd   ⎥        ⎢ ⋮ ⎥
⎢ β1^2  . . .  βd^2 ⎥  λ  =  ⎢ 0 ⎥
⎢ ⋮             ⋮   ⎥        ⎢ 1 ⎥
⎣ β1^d  . . .  βd^d ⎦        ⎣ 0 ⎦

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 297

Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Summary

Algorithm

Sylvester's theorem allows one to compute the decomposition, even in non generic cases: [BCMT10] [CGLM08] [CM96].

Rank of binary quantics

A binary quantic of odd degree 2n + 1 has generic rank n + 1.
A binary quantic of even degree 2n has generic rank n + 1.
A binary quantic of degree d may reach the maximal rank d.
The orbit of maximal rank is x1 x2^{d−1}.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 298


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Sylvester's theorem in C S

Similarly, Sylvester's theorem in C:

A binary quantic t(x1, x2) = Σ_{i=0}^d c(i) γi x1^i x2^{d−i} can be written as a sum of dth powers of r distinct linear forms:

t(x1, x2) = Σ_{j=1}^r λj (αj x1 + βj x2)^d

if and only if:

1. there exists a vector g of dimension r + 1 such that

⎡ γ0    γ1   · · ·  γr   ⎤
⎢ γ1    γ2   · · ·  γr+1 ⎥
⎢  ⋮                 ⋮   ⎥  g  =  0.
⎣ γd−r        · · ·  γd  ⎦

2. q(x1, x2) def= Σ_{ℓ=0}^r gℓ x1^ℓ x2^{r−ℓ} has r distinct roots.

Then the factorization q(x1, x2) = Π_{j=1}^r (βj* x1 − αj* x2) yields the r forms.

Valid even in non generic cases.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 299


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

BIOME algorithms

These algorithms work with a cumulant tensor of even order 2r ≥ 4.

Related to the symmetric flattening introduced in previous lectures.

We take the case 2r = 6 for the presentation, and denote

C_ijk^ℓmn def= Cum{xi, xj, xk, xℓ*, xm*, xn*}    (50)

In that case, we have

C_x,ijk^ℓmn = Σ_{p=1}^P Aip Ajp Akp Aℓp* Amp* Anp* ∆p

where ∆p def= Cum{sp, sp, sp, sp*, sp*, sp*} denotes the diagonal entries of a P × P diagonal matrix, ∆^(6).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 300


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Writing in matrix form

Tensor Cx is of dimensions K × K × K × K × K × K and enjoys symmetries and Hermitian symmetries.

Tensor Cx can be stored in a K^3 × K^3 Hermitian matrix, Cx^(6), called the hexacovariance. With an appropriate storage of the tensor entries, we have

Cx^(6) = A^{⊙3} ∆^(6) A^{⊙3 H}    (51)

Because Cx^(6) is Hermitian, ∃ V unitary such that

(Cx^(6))^{1/2} = A^{⊙3} (∆^(6))^{1/2} V    (52)

Idea: Use the redundancy existing between the blocks of (Cx^(6))^{1/2}.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 301
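The model (51) is easy to emulate with Khatri-Rao products; in this sketch A and the cumulants ∆p are arbitrary illustrative values:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: (K*L) x P from K x P and L x P."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

K, P = 3, 4
rng = np.random.default_rng(3)
A = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))
delta = rng.uniform(0.5, 2.0, P)              # hypothetical 6th-order cumulants

A3 = khatri_rao(khatri_rao(A, A), A)          # A^{(.)3}: K^3 x P
C6 = A3 @ np.diag(delta) @ A3.conj().T        # hexacovariance model (51)
print(C6.shape, np.allclose(C6, C6.conj().T)) # (27, 27) True: Hermitian
```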


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Using the invariance to estimate V

1. Cut the K^3 × P matrix (Cx^(6))^{1/2} into K blocks of size K^2 × P. Each of these blocks, Γ[n], satisfies:

Γ[n] = (A ⊙ A) D[n] (∆^(6))^{1/2} V

where D[n] is the P × P diagonal matrix containing the nth row of A, 1 ≤ n ≤ K. Hence the matrices Γ[n] share the same common right singular space.

2. Compute the joint EVD of the K(K − 1) matrices Θ[m, n] def= Γ[m]† Γ[n] as: Θ[m, n] = V Λ[m, n] V^H (a toy numerical check follows this slide).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 302
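In the exact, noise-free case, two blocks already reveal V; the toy sketch below (synthetic quantities, a single EVD standing in for the joint EVD, real arithmetic) recovers the columns of V^T up to permutation and sign:

```python
import numpy as np

rng = np.random.default_rng(4)
K, P = 4, 3
M = rng.standard_normal((K * K, P))          # plays the role of (A Khatri-Rao A)
d1, d2 = rng.standard_normal(P), rng.standard_normal(P)
V = np.linalg.qr(rng.standard_normal((P, P)))[0]   # orthogonal V (real case)

G1 = M @ np.diag(d1) @ V                     # block Gamma[1]
G2 = M @ np.diag(d2) @ V                     # block Gamma[2]
Theta = np.linalg.pinv(G1) @ G2              # = V^T diag(d2/d1) V, symmetric
w, U = np.linalg.eig(Theta)                  # eigenvectors = columns of V^T
print(np.round(np.abs(U.T @ V.T), 2))        # ~ permutation matrix
```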


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Estimation of A

The matrices Λ[m, n] cannot be used directly because (∆^(6))^{1/2} is unknown. But we use V to obtain an estimate of A^{⊙3} up to a scale factor:

A^{⊙3} = (Cx^(6))^{1/2} V    (53)

One possibility to get A from A^{⊙3} is as follows:

3. Build K^2 matrices Ξ[m] of size K × P from the rows of A^{⊙3}.

4. From the Ξ[m], find A and diagonal matrices D[m] in the LS sense:

Ξ[m] D[m] ≈ A,  1 ≤ m ≤ K^2

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 303


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Estimation of A (details) S

Stationary values of the criterion Σ_{m=1}^M ||Ξm Dm − A||_F^2, with M def= K^2, yield the solution below:

Obtain the vectors dp def= [D1(p, p), D2(p, p), · · · , DM(p, p)]^T by solving the linear systems:

Fp dp = 0

where the matrices Fp are defined as

Fp(m1, m2) = (M − 1) (Ξm1^H Ξm1)(p, p)   if m1 = m2
Fp(m1, m2) = −(Ξm1^H Ξm2)(p, p)          otherwise

Deduce the estimate A = (1/M) Σ_{m=1}^M Ξm Dm

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 304


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Conditions of identifiability of BIOME(2r) S

Source cumulants of order 2r ≥ 4 are nonzero and have the same sign.
The column vectors of the mixing matrix A are pairwise noncollinear.
Matrix A^{⊙(r−1)} is full column rank.

This last condition implies that the tensor rank must be at most K^{r−1} (e.g. P ≤ K^2 for order 2r = 6).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 305


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

FOOBI algorithms

Again the same problem: Given a K^2 × P matrix A^{⊙2}, find a real orthogonal matrix Q such that the P columns of A^{⊙2} Q are of the form a[p] ⊗ a[p]*.

FOOBI: use the K^4 determinantal equations characterizing the rank-1 matrices a[p] a[p]^H, of the form:

φ(X, Y)ijkℓ = xij yℓk − xik yℓj + yij xℓk − yik xℓj

FOOBI2: use the K^2 equations of the form:

Φ(X, Y) = X Y + Y X − trace(X) Y − trace(Y) X

where the matrices X and Y are K × K Hermitian (a numerical check of this characterization follows this slide).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 306
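A quick check of the FOOBI2 device: for X = Y it reads Φ(X, X) = 2(X^2 − trace(X) X), which vanishes exactly on rank-1 Hermitian matrices (values below are illustrative):

```python
import numpy as np

def Phi(X, Y):
    # Phi(X, Y) = X Y + Y X - trace(X) Y - trace(Y) X
    return X @ Y + Y @ X - np.trace(X) * Y - np.trace(Y) * X

rng = np.random.default_rng(5)
a = rng.standard_normal(3) + 1j * rng.standard_normal(3)
X = np.outer(a, a.conj())                     # rank-1 Hermitian a a^H
print(np.allclose(Phi(X, X), 0))              # True: rank-1 is detected

B = X + np.eye(3)                             # higher-rank Hermitian matrix
print(np.allclose(Phi(B, B), 0))              # False
```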


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

FOOBI S

1. Normalize the columns of the K^2 × P matrix A^{⊙2} such that the matrices A[r] def= UnvecK(a^{⊙2}[r]) are Hermitian.

2. Compute the K^2(K − 1)^2 × P(P + 1)/2 matrix P defined by φ(A[r], A[s]), 1 ≤ r ≤ s ≤ P.

3. Compute the P weakest right singular vectors of P, Unvec them and store them in P matrices W[r].

4. Jointly diagonalize the W[r] by a real orthogonal matrix Q.

5. Then compute F def= (Cx^(4))^{1/2} ∆ Q and deduce â[r] as the dominant left singular vector of Unvec(f[r]).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 307


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

FOOBI2 S

1. Normalize the columns of the K^2 × P matrix A^{⊙2} such that the matrices A[r] def= UnvecK(a^{⊙2}[r]) are Hermitian.

2. Compute the K(K + 1)/2 Hermitian matrices B[i, j] of size P × P defined by:

Φ(A[r], A[s])|ij def= B[i, j]|rs

3. Jointly cancel the diagonal entries of the matrices B[i, j] by a real congruent orthogonal transform Q.

4. Then compute F def= (Cx^(4))^{1/2} ∆ Q and deduce â[r] as the dominant left singular vector of Unvec(f[r]).

NB: Better bound than FOOBI and BIOME(4), but the iterative algorithm is sensitive to initialization.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 308


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Algorithms based on characteristic functions

Fit with a model of exact rank

1. Back to the core equation (47):

Ψx(u) = Σp Ψsp( Σq uq Aqp )

2. Goal: Find a matrix A such that the K-variate function Ψx(u) decomposes into a sum of P univariate functions ψp def= Ψsp.

3. Idea: Fit both sides on a grid of values u[ℓ] ∈ G.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 309


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Equations derived from the CAF

Assumption: the functions ψp, 1 ≤ p ≤ P, admit finite derivatives up to order r in a neighborhood of the origin containing G.

Then, taking r = 3 as a working example:

∂^3 Ψx / ∂ui ∂uj ∂uk (u) = Σ_{p=1}^P Aip Ajp Akp ψp^(3)( Σ_{q=1}^K uq Aqp )

With L > 1 points in the grid G, this yields another mode in the tensor.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 310


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Putting the problem in tensor form

A decomposition into a sum of rank-1 terms:

Tijkℓ = Σp Aip Ajp Akp Bℓp

or equivalently

T = Σp a(p) ⊗ a(p) ⊗ a(p) ⊗ b(p)

Tensor T is K × K × K × L, symmetric in all modes except the last.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 311


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Joint use of different derivative orders

Example [CR06]

Derivatives of order 3:

T^(3)_ijkℓ = Σp Aip Ajp Akp Bℓp

Derivatives of order 4:

T^(4)_ijkmℓ = Σp Aip Ajp Akp Amp Cℓp

Derivatives of orders 3 and 4:

Tijkℓ[m] = Σp Aip Ajp Akp Dℓp[m]

with Dℓp[m] = Amp Cℓp and Dℓp[0] = Bℓp.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 312


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Iterative algorithms

Goal: Minimize ||T − Σ_{p=1}^r up ⊗ vp ⊗ . . . ⊗ wp||^2

Gradient with fixed or variable (ad-hoc) stepsize
Alternating Least Squares (ALS) (see the sketch after this slide)
Levenberg-Marquardt
Newton
Conjugate gradient
...

Remarks

The Hessian is generally huge, but sparse.
Problem of local minima: ELS variants exist for all of the above.
Warning: special constraints are needed to ensure the existence of a best low-rank approximation.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 313
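A bare-bones ALS sketch for the third-order case (our own minimal version: no ELS, no stopping rule, and none of the constraints mentioned above):

```python
import numpy as np

def khatri_rao(A, B):
    # column-wise Kronecker product: (I*J) x R from I x R and J x R
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_als(T, r, n_iter=200, seed=0):
    """Fit T ~ sum_p u_p (x) v_p (x) w_p by alternating least squares."""
    I, J, K = T.shape
    rng = np.random.default_rng(seed)
    U, V, W = (rng.standard_normal((n, r)) for n in (I, J, K))
    T1 = T.reshape(I, J * K)                     # mode-1 unfolding
    T2 = T.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding
    T3 = T.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding
    for _ in range(n_iter):
        U = T1 @ np.linalg.pinv(khatri_rao(V, W)).T
        V = T2 @ np.linalg.pinv(khatri_rao(U, W)).T
        W = T3 @ np.linalg.pinv(khatri_rao(U, V)).T
    return U, V, W

# quick check on an exactly rank-2 tensor
rng = np.random.default_rng(1)
U0, V0, W0 = (rng.standard_normal((n, 2)) for n in (4, 5, 6))
T = np.einsum('ip,jp,kp->ijk', U0, V0, W0)
U, V, W = cp_als(T, 2)
print(np.linalg.norm(T - np.einsum('ip,jp,kp->ijk', U, V, W)))  # ~ 0
```

On an exactly low-rank tensor the residual drops to numerical zero; with noise or an underestimated rank, the remarks above about local minima and the possible non-existence of a best low-rank approximation apply.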


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Tensor package

Matlab codes are freely downloadable at:
http://www.i3s.unice.fr/∼pcomon/TensorPackage.html

The codes include:

Alternating Least Squares (ALS) at any order
Gradient
Quasi-Newton (BFGS update)
Levenberg-Marquardt
Conjugate gradient
Enhanced Line Search (ELS) for any of the above
Special codes for NonNegative tensors

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 314


Problem Algorithms Biblio SCA Dim.2 BIOME Foobi nCF Determ

Non symmetric tensors

Tensors with partial symmetries, or no symmetry at all, often need to be approximated by decompositions of lower rank:

1. Characteristic functions: one mode corresponds to u
2. INDSCAL: tensors with positive definite matrix slices
3. Deterministic approaches: no symmetry at all (cf. course on Applications)

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 315


Problem Algorithms Biblio SCA

Bibliography

Existence: [LC09] [CGLM08] [Ste08] [CL11] [LC10]
Uniqueness: [SB00] [BS02]
Computation: [AFCC04] [Ste06] [RCH08] [CL13] [Paa99] [Paa97]
General: [SBG04] [CLDA09]

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 316


Problem Algorithms Biblio SCA

Sparse Component Analysis

Review paper (see [GL06] and Chapter 10 in [CJ10])

Main ideas

Initial underdetermined source separation problem:
x(t) = A s(t), with the number of sources P ≫ K, the number of observations

Transform with a sparsifying mapping T, which preserves linearity (e.g. wavelet transform, short-time Fourier transform, etc.):
x̃(t) = T(x(t)) = T(A s(t)) = A T(s(t)) = A s̃(t)

Solve the source separation problem in the sparse space: this yields an estimate of s̃(t)

Come back to the initial space, with the inverse of T:
ŝ(t) = T^{-1}(estimate of s̃(t))

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 317


Problem Algorithms Biblio SCA

Examples of sources and mixtures

Sources may be sparse in time or in any domain preserving the linear nature of the mixtures.

[Figure: 2 mixtures of 3 sparse sources]

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 318


Problem Algorithms Biblio SCA

Solving the under-determined source separation problem

Since A is not invertible, a direct inversion is not possible. One has to consider (1) identification of A and then (2) restoration of s, i.e.:

A 3-step approach

Step 0 (representation space): if necessary, apply a sparsifying mapping T
Step 1 (identification): Estimate A, e.g. by clustering.
Step 2 (source restoration): At each instant n0, find the sparsest solution of
A s(n0) = x(n0),  n0 = 0, . . . , N

Main question: HOW to find the sparsest solution of an Underdetermined System of Linear Equations (USLE)?

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 319


Problem Algorithms Biblio SCA

Uniqueness of the sparsest solution

x = A s: K equations (observations), P unknowns (sources), P ≫ K

Question: Is the sparse solution unique?

Theorem [Don04, GN03]: If there is a solution ŝ with fewer than K/2 non-zero components, then it is unique with probability 1 (that is, for almost all matrices A).

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 320


Problem Algorithms Biblio SCA

How to find the sparsest solution?

x = A s: K equations (observations), P unknowns (sources), P ≫ K

Goal: Finding the sparsest solution (at least P − K sources are zero)

Direct method (minimum L0 norm method): min ||s||_0 s.t. x = A s

1. Set P − K (arbitrary) sources equal to zero
2. Solve the remaining system of K equations and K unknowns
3. Repeat steps 1 and 2 for all possible choices, and take the sparsest answer (a toy implementation follows this slide).

Not tractable!
Example: P = 50, K = 30; then C(P, K) ≈ 5 × 10^13 choices.
On a basic PC, solving one 30 × 30 system of equations takes about 2 × 10^−4 s.
Total time: (5 × 10^13) × (2 × 10^−4 s) ≈ 300 years!

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 321
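For tiny dimensions the direct method is easy to write down, which makes the combinatorial blow-up concrete; sizes below are illustrative:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
K, P = 3, 5                                   # tiny, hence tractable
A = rng.standard_normal((K, P))
s_true = np.zeros(P); s_true[1] = 1.0         # 1 < K/2 nonzeros: unique solution
x = A @ s_true

best = None
for idx in combinations(range(P), K):         # set P - K sources to zero
    cols = list(idx)
    s = np.zeros(P)
    s[cols] = np.linalg.solve(A[:, cols], x)  # solve the K x K subsystem
    nnz = np.count_nonzero(np.abs(s) > 1e-9)
    if best is None or nnz < best[0]:
        best = (nnz, s)
print(np.round(best[1], 6))                   # recovers s_true
```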


Problem Algorithms Biblio SCA

A few faster algorithms

Method of Frames (MoF) [DDD04]: linear method, fast but not accurate
Matching Pursuit [MZ93]: greedy method, fast but not accurate
Basis Pursuit (minimal L1 norm, Linear Programming) [CDS99]: accurate and tractable method, but still time consuming (see the sketch after this slide)
Smoothed L0 [MBZJ09]: fast and accurate, based on the GNC principle

Theorem [DE03]: under some (sparsity) conditions, argmin L1 = argmin L0.

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 322
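Basis Pursuit reduces to a linear program through the standard split s = s⁺ − s⁻; a minimal scipy sketch (A and x are illustrative) follows:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
K, P = 3, 5
A = rng.standard_normal((K, P))
s_true = np.zeros(P); s_true[2] = 1.0         # sparse ground truth
x = A @ s_true

# min sum(s+) + sum(s-)  s.t.  A (s+ - s-) = x,  s+, s- >= 0
c = np.ones(2 * P)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=x,
              bounds=[(0, None)] * (2 * P))
s_hat = res.x[:P] - res.x[P:]
print(np.round(s_hat, 6))                     # recovers s_true
```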


Problem Algorithms Biblio SCA

Applications of SCA

Among many others, applications in:

Audio and music (see the lecture of L. Daudet)
Astrophysics (see the lecture of Starck)

P.Comon & C.Jutten, Peyresq July 2011 BSS VIII.UnderDetermined mixtures 323


Introduction NMF NTF

IX. NonNegative Mixtures

I II III IV V VI VII VIII IX X

P.Comon & C.Jutten, Peyresq July 2011 BSS IX.NonNegative Mixtures 324


Introduction NMF NTF

Contents of course IX

32 Introduction

33 Non-negative Matrix Factorization
   Condition for NN ICA
   Algorithm principles
   Discussion

34 NonNegative Tensor Factorization

P.Comon & C.Jutten, Peyresq July 2011 BSS IX.NonNegative Mixtures 325


Why non negative mixtures?

In many real-world applications, sources are non-negative (NN):
images, chemical concentrations, note or speech amplitudes in
audio, etc.
In addition, mixtures can sometimes be non-negative too:
superposition of images, mixtures of chemical molecules or ions, etc.

NN for the linear source separation problem

The mixing equation x(t) = As(t), t = 1, . . . , N can be written as:

X = AS

where X is K × N, and S is P × N.

Solving source separation consists in factorizing the observed
mixtures X into two matrices, with non-negativity constraints
on S, on A, or on both S and A.


Matrix and tensor factorization

From matrices to tensors

More complex data (3D, 3D+t, etc.), e.g. sequences of images
(video, PET or MRI, hyperspectral images) or multi-modal data,
can be represented by N-way arrays, N > 2.
Non-negative matrix factorization (NMF) can then be extended to
Non-negative Tensor Factorization (NTF).


Earlier works

In NMF
Paatero and Tapper [PT94] considered positivity constraints on A
and s in environmental modeling,
Lee and Seung [LS99] used NMF for feature extraction in face images.

In Nonnegative ICA
Plumbley et al. addressed the problem for audio and especially
music separation [PAB+02],
Plumbley proposed theoretical conditions for nonnegative
independent component analysis in 2002 [Plu02],
First algorithms were proposed by Plumbley [Plu03, Plu05] and
Plumbley and Oja [PO04].


Conditions for NN ICA [Plu02]

Assumptions and pre-processing

x(t) = As(t), and the sources s(t) are NN and well-grounded.
Definition: A source s is well-grounded if it has a non-zero pdf
in the positive neighborhood of s = 0.
One then applies a whitening matrix W, computed on the zero-mean
data (x(t) − E[x]), but applied to x itself: z = Wx.


Conditions for NN ICA [Plu02]

Theorem [Plu02]
Let z be a random vector of real-valued, non-negative and
well-grounded independent sources, each with unit variance, and
let y = Uz be an orthogonal rotation of z. Then U is a
permutation matrix iff y is nonnegative with probability 1.

Practically, the matrix U can be estimated by minimizing the criterion:

J(U) = (1/2) E{ ||z − Uᵀ y⁺||² }

where y⁺ = (y₁⁺, . . . , y_P⁺), with y_i⁺ = max(y_i, 0).

In [Plu03], Plumbley proposes a few algorithms (a toy sketch of
the criterion follows below), based on
NN PCA,
Givens rotations,
line search,
geodesic search.
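A minimal sketch of this criterion for P = 2, assuming exponential (hence well-grounded) unit-variance sources and estimating the rotation by a simple grid search over the Givens angle; the toy mixing matrix and all names are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.exponential(size=(2, 5000))          # nonnegative, well-grounded sources
s /= s.std(axis=1, keepdims=True)            # unit variance
x = np.array([[1.0, 0.6], [0.3, 1.0]]) @ s   # toy nonnegative mixing

# Whitening computed on zero-mean data but applied to x itself: z = W x
d, E = np.linalg.eigh(np.cov(x))
W = E @ np.diag(d ** -0.5) @ E.T
z = W @ x

def J(theta):
    """Criterion J(U) = 1/2 E||z - Uᵀ y⁺||² for a Givens rotation U(θ)."""
    U = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    y_plus = np.maximum(U @ z, 0.0)          # keep the nonnegative part of y = Uz
    return 0.5 * np.mean(np.sum((z - U.T @ y_plus) ** 2, axis=0))

thetas = np.linspace(0.0, np.pi / 2, 181)
theta_hat = thetas[np.argmin([J(t) for t in thetas])]   # estimated rotation angle
```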


Simple Gradient algorithm

Cost function and its differential

The cost is the squared error, written with the Frobenius norm:

J(A, S) = (1/2) Σ_{i,j} (x_ij − (AS)_ij)² = (1/2) ||X − AS||²_F

Its gradient with respect to the entries of S writes:

∂J(A, S)/∂s_ij = −[Aᵀ X − Aᵀ A S]_ij

Update equations

They are similar for S and A, due to the symmetry between A and S
in the cost function:

s_ij = [ s_ij + µ_ij ([Aᵀ X]_ij − [Aᵀ A S]_ij) ]⁺
a_kl = a_kl + µ_kl ([X Sᵀ]_kl − [A S Sᵀ]_kl)


Gradient with multiplicative updates

To enhance the gradient algorithm, Lee and Seung [LS99] proposed
multiplicative updates (see the sketch below).

Choice of the step size

One chooses µ_ij so that the first and third right-hand-side terms
of the update equation cancel:

s_ij = [ s_ij + µ_ij ([Aᵀ X]_ij − [Aᵀ A S]_ij) ]⁺ , i.e. s_ij = µ_ij [Aᵀ A S]_ij,

hence:

µ_ij = s_ij / [Aᵀ A S]_ij

New update equations

The new (multiplicative) update equations are then:

s_ij = s_ij [Aᵀ X]_ij / [Aᵀ A S]_ij   and   a_ij = a_ij [X Sᵀ]_ij / [A S Sᵀ]_ij
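A compact sketch of these multiplicative updates (assuming numpy; the small eps guard against division by zero and all names are our additions):

```python
import numpy as np

def nmf_multiplicative(X, P, n_iter=500, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for X ≈ A S with A, S >= 0.
    X: K x N nonnegative data matrix, P: number of sources."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], P))
    S = rng.random((P, X.shape[1]))
    for _ in range(n_iter):
        S *= (A.T @ X) / (A.T @ A @ S + eps)     # s_ij update
        A *= (X @ S.T) / (A @ S @ S.T + eps)     # a_ij update
    return A, S
```

Because the factors stay nonnegative whenever they start nonnegative, no explicit projection is needed with this choice of step size.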


Alternating Least Squares

Newton-like method

The gradient of J(A, S) with respect to S writes:

∂J(A, S)/∂S = −(Aᵀ X − Aᵀ A S)

It must be equal to 0 at the minimum, i.e.

(Aᵀ A) S = Aᵀ X

Newton-like update equations

The Newton-like update equations are then (sketched below):

S = [(Aᵀ A)⁻¹ Aᵀ X]⁺   and   A = [X Sᵀ (S Sᵀ)⁻¹]⁺
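These alternating least-squares updates with projection onto the nonnegative orthant can be sketched as follows; the pseudo-inverse implements (AᵀA)⁻¹Aᵀ when A has full column rank, and the names are ours:

```python
import numpy as np

def nmf_als(X, P, n_iter=200, seed=0):
    """Alternating least squares for X ≈ A S with projection [.]⁺ = max(., 0)."""
    rng = np.random.default_rng(seed)
    A = rng.random((X.shape[0], P))
    S = np.zeros((P, X.shape[1]))
    for _ in range(n_iter):
        S = np.maximum(np.linalg.pinv(A) @ X, 0.0)   # [(AᵀA)⁻¹ Aᵀ X]⁺
        A = np.maximum(X @ np.linalg.pinv(S), 0.0)   # [X Sᵀ (S Sᵀ)⁻¹]⁺
    return A, S
```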


Comments

Well-grounded condition

This condition is very important, since one needs samples on the
edges of the scatter plot.
With a uniform distribution it is satisfied; but with a triangular pdf...
A similar problem was noted by Moussaoui (ICASSP 06) in chemical
spectroscopy: he mentioned the problem of trends in chemical spectra.


Noisy mixtures of NN sources

Finally, NMF is not robust to additive noise.
Moreover, is additive noise realistic?
Relevant noise modelling must be done in relation with the physics
of the data.


Beyond NMF

NMF pros and cons
Simple formulation and algorithms
No proof of unique factorization
Requires rather strict conditions
Poor robustness to noisy mixtures

NN Bayesian approach
NN priors are simple to consider
Although Bayesian approaches are computationally costly, they are
efficient for small samples
See the lecture of A. Mohammad-Djafari in this course


NonNegative Tensor Factorization (1/2)

In a CP decomposition

T = (A, B, .., C) · L   (54)

the solution is not unique: constraints are needed (to avoid e.g.
A → ∞).
If T is nonnegative, one can impose that {A, B, .., C, L} be all
nonnegative → concept of nonnegative rank [LC09].
Impose unit-norm factors in (54): ||A|| = ||B|| = .. = ||C|| = 1.
Choice of norm?


NonNegative Tensor Factorization (2/2)

Define ||T||_E = Σ_{ij..k} |T_{ij..k}|, better suited in R⁺ than
the Frobenius norm.
Then the constraint

||a_p||₁ = ||b_p||₁ = .. = ||c_p||₁ = 1   (55)

fixes the scale indeterminacy.
With that constraint, ||u ⊗ v ⊗ .. ⊗ w||_E = 1, and ||T||_E = ||λ||₁.
The function Υ(A, B, .., C, λ) = ||T − Σ_{i=1}^R λ_i a_i ⊗ b_i ⊗ .. ⊗ c_i||_E
is unbounded but coercive and continuous. Hence its infimum is
reached, even if not defined on a compact set.

Conclusion: The best low-rank nonnegative tensor approximation
problem is well posed under constraint (55).


X. Applications


Contents of course X

35 Introduction
36 Deterministic approaches
   Fluorescence
   Antenna Array Processing
   Spread spectrum communications
37 Hyperspectral image of Mars
38 Back to Cumulant-based approaches
   Discrete sources
   Algorithms
39 Smart chemical sensor arrays
   Problem and model
   Method based on silent sources
   Bayesian approach
40 Removing show-through in scanned images


A large range of applications

Source separation is a common problem for multi-dimensional recordings:

Biomedical signal processing: non-invasive methods, localisation
and artefact cancellation in EEG, ECG, MEG, fMRI,
Communications and antenna processing, sonar, radar, mobile phones,
Audio and music processing,
Image processing in astrophysics,
Monitoring,
Pre-processing for classification,
Scanned image processing,
Smart sensor design.


How to use ICA?

ICA: a solution for source separation

ICA is now a usual method for solving blind source separation
problems.
ICA is an estimation method, which requires:
a separating model, suited to the mixing model,
an independence criterion,
an optimisation algorithm.
When running an ICA algorithm, one obtains a solution, which is
the best one under the above constraints.
Are the Independent Components (ICs) relevant?


ICA: relevance of independent components

Relevance requires:
a good model,
sources satisfying the independence assumption,
an optimisation algorithm that does not stop in a bad local minimum.

With real-world problems, the relevance is basically not
guaranteed, since:
the model is often an approximation of the physical system,
the source independence assumption is not guaranteed,
we know neither the sources nor their number.


Use all the possible priors

Before applying ICA
Positivity, sparsity, temporal dependence, periodicity or
cyclostationarity, discrete-valued data, etc.
Priors can lead to better criteria than independence.

Careful modeling of the observations
How does the physical system provide the observations?
Modeling leads to a mixing model, and hence to a suited model of
the separating structure.
Modeling is important for showing what the estimated sources
should be.
The physics of the system is important for component interpretation.

Choose the best representation domain for applying BSS.


Dam monitoring

The dam wall moves according to the water level, the temperature, etc.
Simple pendulums are hung on the wall, at different locations:
the pendulum deviations measure the wall motions, with different
sensitivities to the water level, the temperature, etc.
What is the relation between the deviations and the water level,
temperature, etc.?


Multi-access control

Y. Deville, L. Andry, Application of blind source separation
techniques for multi-tag contactless identification systems (NOLTA
95; IEICE Trans. on Fundamentals of Electronics, 1996; French
patent Sept. 1995, subsequently extended).


ICA: preprocessing for classification

Bayesian classification framework
Relate each multi-dimensional observation x to a class C_i.
Bayesian classification requires the conditional densities p(x|C_i).
Multivariate density estimation is hard (small samples, complexity).

A simple idea [Com95, AAG03], sketched in code below
Transform the observations x into x̃ by ICA, for all observations
associated with C_i.
In the transformed space, the components are close to independence,
so that one can write: p(x̃|C_i) ≈ Π_k p(x̃_k|C_i).
Then, using the inverse transform, one gets p(x|C_i).
The method replaces 1 multivariate (size N) density estimation by
N scalar density estimations.
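A possible per-class sketch with scikit-learn's FastICA and scalar kernel density estimates; the change-of-variables factor |det B| implements the inverse transform back to p(x|C_i). The pipeline and all names are our assumptions, not the course's implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import FastICA

def class_log_density(X_c):
    """Fit ICA on the samples X_c (n_samples x N) of one class and return
    log p(x|C) ≈ log|det B| + Σ_k log p_k(z_k), where z = B (x - mean)."""
    ica = FastICA(n_components=X_c.shape[1], random_state=0).fit(X_c)
    B, mean = ica.components_, ica.mean_
    Z = (X_c - mean) @ B.T                      # ~independent components
    kdes = [gaussian_kde(Z[:, k]) for k in range(Z.shape[1])]
    log_det = np.log(abs(np.linalg.det(B)))
    def log_pdf(x):
        z = B @ (x - mean)
        return log_det + sum(np.log(kde(zk)[0] + 1e-300)
                             for kde, zk in zip(kdes, z))
    return log_pdf
```

Classification then compares log p(x|C_i) + log p(C_i) across the classes, each fitted separately.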


Fluorescence Spectroscopy 1/4

An optical excitation produces several effects:
Rayleigh diffusion,
Raman diffusion,
Fluorescence.

At low concentrations, the Beer-Lambert law can be linearized [LMRB09]:

I(λ_f, λ_e, k) = I_o Σ_ℓ γ_ℓ(λ_f) ε_ℓ(λ_e) c_{k,ℓ}

➽ Hence a 3rd-order array decomposition with real nonnegative
factors [SBG04].


Fluorescence Spectroscopy 2/4

Mixture of 4 solutes (one concentration shown).


Fluorescence Spectroscopy 3/4

Obtained results with R = 4.


With projected ALS [Cichocki'09] and R = 5.


With nonnegative BFGS [Royer et al'10] and R = 5.


Link with Bayesian approaches

A similar nonnegative tensor decomposition appears in Bayesian
approaches:

p(x₁, x₂, . . . , x_p) ≈ Σ_{i=1}^R p(z_i) p(x₁|z_i) . . . p(x_p|z_i)

See [LC09], and the course on Bayesian techniques.


Antenna Array Processing

(Figure: a transmitter and a receiver antenna array, shown over
successive overlay frames.)


Narrow band model in the far field

Modeling the signals received on an array of antennas generally
leads to a matrix decomposition:

T_ij = Σ_q a_iq s_jq

i: space, j: time, q: path/source; A: antenna gains, S: transmitted
signals.

But in the presence of additional diversity, a tensor can be
constructed, thanks to a new index p:

T_ijp = Σ_q a_iq s_jq h_pq


Possible diversities in Signal Processing

space translation (array geometry)
time translation (chip)
frequency/wavenumber (nonstationarity)
excess bandwidth (oversampling)
cyclostationarity
polarization
finite alphabet
...


Space translation diversity (1/2)

(Figure: subarrays deduced from a reference subarray by spatial
translation, shown over successive overlay frames.)


Space translation diversity (2/2)

A_iq: gain between sensor i and source q
H_pq: transfer between reference and subarray p
S_jq: sample j of source q
β_q: path loss, d_q: DOA, b_i: sensor location

Tensor model (NB far field) [SBG00]

Reference subarray: A_iq = β_q exp(j (ω/C) b_iᵀ d_q)

Space translation (from the reference subarray):

β_q exp(j (ω/C) [b_i + ∆_p]ᵀ d_q) =def A_iq H_pq

Trilinear model, with p the index of the subarray (a toy
construction follows below):

T_ijp = Σ_q A_iq S_jq H_pq
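To make the trilinear structure concrete, here is a toy construction of such a tensor with numpy; the dimensions, the array geometry and all numerical values are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, P, Q = 6, 200, 4, 2                  # sensors, samples, subarrays, sources
omega_c = 2 * np.pi                        # ω/C, in arbitrary units
b = np.arange(I)[:, None] * np.array([0.5, 0.0])      # sensor locations b_i
delta = np.arange(P)[:, None] * np.array([0.0, 0.3])  # subarray shifts Δ_p
d = np.stack([[np.cos(a), np.sin(a)] for a in (0.4, 1.1)])  # DOA vectors d_q
beta = np.array([1.0, 0.7])                # path losses β_q

A = beta * np.exp(1j * omega_c * b @ d.T)  # A_iq = β_q exp(j ω/C b_iᵀ d_q)
H = np.exp(1j * omega_c * delta @ d.T)     # H_pq: translation factor
S = rng.standard_normal((J, Q))            # transmitted samples S_jq
T = np.einsum('iq,jq,pq->ijp', A, S, H)    # T_ijp = Σ_q A_iq S_jq H_pq
```

A CP decomposition of T then recovers A, S and H up to scale and permutation, which is the kind of deterministic identification these slides exploit.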


Time translation diversity

DS-CDMA

A_iq: gain between sensor i and user q
C_q(n): spreading sequence of user q (chip number: n)
H_q(n): channel of user q
B_qp = (H_q ⋆ C_q)(p), after removal of ISI (guards)
S_qj: symbol j of user q

Tensor model, with p the index of a chip with no ISI:

T_ijp = Σ_q A_iq S_qj B_qp


Bibliographical comments

NonNegative tensors: projected ALS [CZPA09], exact
parameterization [RTMC11], tensor package: url.
Antenna Array Processing: [SBG00], [CL11], [LC10]
CDMA: Sidiropoulos [SGB00]
Discrete sources and TCOM: [TVP96], [Com04], [ZC05], [ZC08]
Audio (L. Daudet)
Image (D. Fadili)


Hyperspectral image of Mars

Content
Modeling the observations
Applying ICA
IC interpretation
Discussion

with H. Hauksdottir (MSc thesis, Univ. of Iceland), J. Chanussot
(GIPSA-lab, Grenoble), S. Moussaoui (IRCCyN, Nantes), F. Schmidt
and S. Douté (Planetology Lab., Grenoble), D. Brie (CRAN, Nancy),
J. Benediksson (Univ. of Iceland)


Hyperspectral image of Mars: Data

OMEGA instrument: a spectrometer on board Mars Express
256 frequency bands in the visible and IR, from 0.926 µm to
5.108 µm, with a spectral resolution of 0.014 µm,
variable spatial resolution (elliptic orbit), from 300 m to 4 km,
only 174 frequency bands used (the others are too noisy),
portion of image: about 256 × 256 pixels,
one hyperspectral image is a cube of about 10,000,000 data values.

The South Polar Cap of Mars mainly contains 3 endmembers: CO2 ice,
H2O ice and dust.


Physical properties of observations

Light reaching the spectrometer:
sunlight is reflected according to the ground reflectance,
with atmospheric attenuation and direct sensor illumination,
a geometrical mixture occurs when a few different chemicals
directly reflect the sunlight according to their local abundances,
an intimate mixture corresponds to a (nonlinear) mixture at very
small scale,
the solar position with respect to the Polar Cap leads to a
luminance gradient.


Model of observations

Luminance at location (x, y) and wavelength λ

Assuming a geometrical mixture, the luminance writes:

L(x, y, λ) = φ(λ) [ L_a(λ) + Σ_{p=1}^P α_p(x, y) L_p(λ) cos(θ(x, y)) ]

With simple algebra, it leads to a linear model:

L(x, y, λ) = Σ_{p=1}^P α′_p(x, y) L′_p(λ) + E(x, y, λ) + noise

The estimated spectra differ from the original endmember spectra,
since they are modified by the luminance gradient (cos θ(x, y)) and
the atmospheric effect (φ(λ)).
What are the independent sources: abundances or spectra?


Spectral or spatial ICA?

ICA approximations: spatial or spectral decomposition?


Choosing the number of ICs

Solutions
Spatial ICA: the number of ICs must be at least equal to the
number of endmembers, and sufficient for ensuring an accurate
approximation.
3 endmembers: H2O ice, CO2 ice and dust.
Using PCA with Nc = 7 components, one recovers 98% of the initial
image power.


Spatial ICA results

After PCA, JADE provides 7 ICs (images).
How to interpret the ICs?
For endmember ICs, one has ground truth with a reference
classification and reference spectra.
For the others???


IC interpretation: using reference spectra


IC interpretation: using reference classification


Interpretation of other ICs

Labelling through expert interaction

Results with preprocessed data, after removal of luminance and
atmospheric effects and of defective sensors.
On these data, the powers associated with IC1, IC3, IC4 and IC7
are very small.
We deduce that these ICs are related to effects removed by the
preprocessing.
The spectra of these ICs are more or less flat.


Interpretation of the last ICs

Labelling through expert interaction

The spectrum looks like a nonlinear mixture of the CO2 ice and
dust spectra.
Physicists suspect it could be associated with intimate mixtures.


Hyperspectral images of Mars: Discussion

Comments
5 ICs seem related to artefacts, and 2 look like endmembers.
Artefacts cannot be explained with endmembers, and appear as ICs.
The initial model should be improved.
2 endmembers are strongly correlated!
Spectra are sometimes negative!

Conclusions
ICA is not suited to this problem!
Classification cannot be done with these ICs.
One can design a method based on true assumptions, like the
positivity of spectra, e.g. based on Bayesian approaches [MHS+08].


Finite alphabets

Back to contrast criteria: APF
Approximation of the MAP estimate
Semi-Algebraic Blind Extraction algorithm: AMiSRoF
Blind Extraction by ILSP
Convolutive model
Presence of Carrier Offset (in Digital Communications)
Constant Modulus


Contrast for discrete inputs (1)

Hypothesis H5: The sources take their values in a finite alphabet
A defined by the roots in C of some polynomial q(z) = 0.

Theorem [Com04]
Under H5, the following is a contrast over the set H of invertible
P × P FIR filters (a numerical illustration follows below):

Υ(G; z) =def − Σ_n Σ_i |q(z_i[n])|²   (56)

APF: Algebraic Polynomial Fitting
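For a single output, this contrast is easy to evaluate numerically. A toy check with QPSK, whose alphabet is the set of roots of q(z) = z⁴ − 1; the function name, sample size and noise level are our assumptions:

```python
import numpy as np

def apf_contrast(z, alphabet):
    """-Σ_n |q(z[n])|² with q(z) = Π_j (z - a_j): zero iff every output
    sample lies exactly on the alphabet."""
    q = np.prod(np.subtract.outer(z, alphabet), axis=1)   # q(z[n]) per sample
    return -np.sum(np.abs(q) ** 2)

qpsk = np.exp(1j * np.pi / 2 * np.arange(4))              # roots of z^4 = 1
rng = np.random.default_rng(0)
s = qpsk[rng.integers(0, 4, size=500)]                    # a QPSK source
print(apf_contrast(s, qpsk))                              # ≈ 0: perfect extraction
noisy = s + 0.1 * (rng.standard_normal(500) + 1j * rng.standard_normal(500))
print(apf_contrast(noisy, qpsk))                          # < 0: residual mixing/noise
```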


Contrast for discrete inputs (2)

For a given alphabet A, denote G the set of numbers c such that
c A ⊆ A.

Lemma 1: Trivial filters satisfying H5 are of the form P D[z],
with D[z] diagonal and D_pp[z] = c_p z^n, for some n ∈ Z and some
c_p ∈ G.

Because A is finite, any c ∈ G must be of unit modulus, and we
must have c A = A, ∀c ∈ G.
Also, any c ∈ G has an inverse c⁻¹ in G.


Contrast for discrete inputs: summary

Advantages
The previous contrast allows one to separate correlated sources.
But it needs all sources to have the same (known) alphabet.
If sources have different alphabets, one can extract sources in
parallel with different criteria: Parallel Extraction [RZC05].
By deflation with different criteria, one can extract more sources
than sensors: Parallel Deflation [RZC05].


Equivalence between KMA and CMA

Kurtosis:

Υ_KMA = Cum{z, z, z*, z*} / [E{|z|²}]²

Constant modulus:

J_CMA = E{[ |z|² − R ]²}

Assume 2nd order circular sources: E{s²} = 0.
Then KMA and CMA are equivalent [Reg02] [Com04] (see these
references for the proof).


Discussion on Deflation (MISO) criteria

Let z =def f^H x. The criteria below are stationary iff the
differentials of p and q are collinear:

Ratio: Max_f p(f) / q(f)
Example: Kurtosis, with p = E{|z|⁴} − 2E{|z|²}² − |E{z²}|²
and q = E{|z|²}²

Difference: Min_f p(f) − α q(f)
Example: Constant Modulus, with p = E{|z|⁴} and
q = 2a E{|z|²} − a², or Constant Power, with q = 2a ℜ(E{z²}) − a²

Constrained: Max_{q(f)=1} p(f)
Example: Cumulant, with p = E{|z|⁴} − 2E{|z|²}² − |E{z²}|²
Example: Moment, with p = E{|z|⁴}, if standardized and with either
q = ||f||² or q = E{|z|²}²


Parallel Deflation


Parallel Extraction


Parallel extraction

Parallel extraction of 3 sources (QPSK, QAM16, PSK6) from a
3-sensor, length-3 random Gaussian channel [RZC05].
(Figure: Symbol Error Rate (%) versus Signal-to-Noise ratio (dB),
0 to 30 dB, for QAM16, QPSK and PSK-6; Lc = 2, Lh = 13.)


APF extraction

Parallel Deflation from a mixture of 4 sources (2 QPSK and 2
QAM16) received on 3 sensors. Extraction of a QPSK source shown,
compared to MMSE [RZC05].
(Figure: MSE versus SNR (dB), 0 to 20 dB, for the APF extraction
filter and the MMSE filter; underdetermined mixture (4 inputs,
3 outputs), 600 samples.)


Optimal solution

MAP estimate

(Ĥ, ŝ)_MAP = Arg Max_{H, s} p_{s,H|x}(x, s, H)

If s_p ∈ A, and if the noise is Gaussian, then

(Ĥ, ŝ)_MAP = Arg Min_{H, s∈A^P} ||x − H s||²

Less costly to search (inverse filter when it exists):

(F̂, ŝ)_MAP = Arg Min_{F, s∈A^P} ||F x − s||²

or by deflation:

(f̂, ŝ)_MAP = Arg Min_{f, s∈A^P} ||f^H x − s||²   (57)


Approximation of the MAP estimate

For an alphabet of constant modulus, the MAP criterion (57) is
asymptotically equivalent (for large samples of size T) to [GC98]:

Υ_T(f) = (1/T) Σ_{t=1}^T Π_{j=1}^{card A} |f^H x[t] − a_j|²,   a_j ∈ A

We have transformed an exhaustive search into a polynomial
alphabet fit: the product over the alphabet is |q(f^H x[t])|² for
a monic q whose roots are the alphabet points.


Algorithm AMiSRoF

Absolute Minimum Search by Root Finding [GC98]

Initialize f = f_o
For k = 1 to k_max, and while |µ_k| > threshold, do
  Compute the gradient g_k and the Hessian H_k at f_{k−1}
  Compute a search direction v_k, e.g. v_k = H_k⁻¹ g_k
  Normalize v_k to ||v_k|| = 1
  Compute the absolute minimum µ_k of the rational function in µ:
  Φ(µ) =def Υ_T(f_{k−1} + µ v_k)
  Set f_k = f_{k−1} + µ_k v_k


Algorithm ILSP

Iterative Least-Squares with Projection [TVP96]
(a bare-bones sketch follows below)

Assumes that the components s_i[n] ∈ A, a known alphabet
Assumes the columns of H belong to a known array manifold
Initialize H, and start the loop:
  Compute the LS estimate of the matrix S in the equation X = H S
  Project S onto A
  Compute the LS estimate of H in the equation X = H S
  Project H onto the array manifold
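A bare-bones sketch of ILSP with numpy, leaving out the array-manifold projection (we assume no manifold knowledge here, so H is only re-estimated by least squares; names and the initialization are ours). `alphabet` is a 1-D complex array:

```python
import numpy as np

def ilsp(X, P, alphabet, n_iter=50, seed=0):
    """Alternate LS estimates of H and S in X = H S, projecting each
    entry of S onto the nearest alphabet point."""
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((X.shape[0], P)) \
        + 1j * rng.standard_normal((X.shape[0], P))
    for _ in range(n_iter):
        S = np.linalg.pinv(H) @ X                  # LS estimate of S
        S = alphabet[np.argmin(np.abs(S[..., None] - alphabet), axis=-1)]
        H = X @ np.linalg.pinv(S)                  # LS estimate of H
    return H, S
```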


Smart chemical sensor arrays

Content
The Nicolsky-Eisenman model
Method based on source silences (Eusipco 2008) [DJ08]
Method based on positivity (ICA 2009) [DJM09]

These works were done by L. Duarte during his PhD thesis at
GIPSA-lab (2006-2009).


Ion-selective electrodes: the interference problem

Aim: to estimate the concentrations of several ions in a solution.

(Figure: responses of a Ca2+ electrode and a Na+ electrode (mV),
i.e. mixtures 1 and 2, each sensitive to both ions, versus time;
below, the actual Ca2+ and Na+ concentrations (M), i.e. sources 1
and 2.)


Electrode description - The Nicolsky-Eisenman model

x_i(t) = c_i + d_i log( s_i(t) + Σ_{j≠i} a_ij s_j(t)^{z_i/z_j} )   (58)

s_i(t) ⇒ target ion concentration; s_j(t) ⇒ interfering ion
concentrations
c_i, d_i, a_ij ⇒ mixing model parameters;
z_i and z_j ⇒ valences of the ions i and j

When z_i = z_j ⇒ post-nonlinear (PNL) mixing model.
We are interested in the case in which z_i ≠ z_j.
We consider a scenario with two ions and two electrodes.


The Nicolsky-Eisenman model (cont.)

In previous works, we assumed that the component-wise functions
were known in advance.

(Figure: mixing process s1, s2 → nonlinear mixing → p1, p2 →
d1 log(p1), d2 log(p2) → x1, x2; separating system x1, x2 →
exp(x1/u1), exp(x2/u2) → e1, e2 → recurrent network → y1, y2.)

Resulting model

In this work, we consider the complete model

x1(t) = d1 log( s1(t) + a12 s2(t)^k )
x2(t) = d2 log( s2(t) + a21 s1(t)^{1/k} )   (59)

with k = z1/z2.


Assumptions

1 The sources are statistically independent;
2 The sources are positive and bounded, i.e.,
  s_i(t) ∈ [S_i^min, S_i^max], where S_i^max > S_i^min > 0;
3 The mixing system is invertible in the region given by
  [S_1^min, S_1^max] × [S_2^min, S_2^max];
4 k (the ratio between the valences) is known and takes only
  positive integer values.


Basic idea

Additional assumption: there is, at least, a period of time where
one, and only one, source has zero variance.

(Figure: during such a silent period, the scatter plots in the
(s1, s2), (p1, p2) and (x1, x2) planes collapse onto curves, as
developed on the next slide.)


Basic idea in equations

During the silent periods (s1(t) = S1):

p1(t) = S1 + a12 s2(t)^k
p2(t) = s2(t) + a21 S1^{1/k}   (60)

In the (p1, p2) plane, we have a polynomial of order k:

p1(t) = S1 + a12 ( p2(t) − a21 S1^{1/k} )^k   (61)

p1(t) = Σ_{i=0}^k ϕ_i p2(t)^i   (62)


Recovering of polynomials

Mapping between the (p1, p2) and (e1, e2) planes:

e1(t) = exp( d1 log(p1(t)) / u1 ) = p1(t)^{d1/u1}
e2(t) = exp( d2 log(p2(t)) / u2 ) = p2(t)^{d2/u2}   (63)

Using this equation and the polynomial relation in the (p1, p2)
plane:

e1(t) = ( Σ_{i=0}^k ϕ_i e2(t)^{i u2/d2} )^{d1/u1}   (64)

For what values of u1 and u2 is the function (64) a polynomial of
order k?
Ideal solution: u1 = d1 and u2 = d2;
When all the coefficients except ϕ_k in (64) are null, then
(u1 = D d1, u2 = D d2) also culminates in a polynomial of order k.


How to detect the silent periods?

During the silent periods,

x1 = g1(s2), x2 = g2(s2)   (65)

⇒ maximum (nonlinear) correlation between x1 and x2.

Normalized mutual information:

ς(x1, x2) = 1 − exp(−2 I(x1, x2))   (66)

ς(x1, x2) = 0 when x1 and x2 are statistically independent;
ς(x1, x2) → 1 when there is a deterministic relation between x1
and x2.


Result for silent periods detection


Final Algorithm

1 Detection of silent periods (sketched below)
  Estimate the mutual information between the mixtures x1 and x2
  over a moving time window.
  Select the time window in which the mutual information is
  maximum.
2 Estimation of the component-wise functions by the polynomial
  recovering idea
  For the selected time window, minimize the expression

  min_{u1,u2} Σ_t ( e1(t) − Σ_{i=0}^k α_i e2(t)^i )²   (67)

3 Training of the recurrent network
  Apply on e_i the algorithm for the simplified version of the NE
  model.
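Step 1 can be sketched with a histogram estimate of the mutual information slid over the record; the bin count is our assumption and the window length of 151 follows the experiment reported below:

```python
import numpy as np

def mi_hist(x1, x2, bins=16):
    """Histogram estimate of I(x1, x2), in nats."""
    pxy, _, _ = np.histogram2d(x1, x2, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))

def detect_silence(x1, x2, win=151):
    """Return the start of the window maximizing ς = 1 - exp(-2 I),
    where a deterministic link between x1 and x2 pushes ς towards 1."""
    scores = [1.0 - np.exp(-2.0 * mi_hist(x1[t:t + win], x2[t:t + win]))
              for t in range(len(x1) - win + 1)]
    return int(np.argmax(scores))
```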


Results

Estimation of the concentrations of the ions Ca2+ and Na+ using
two electrodes;
Mixing parameters a12 = 0.79 and a21 = 0.40 (extracted from an
IUPAC report), and d1 = 0.0129 and d2 = 0.0258;
Sources artificially generated;
Number of samples: 1000;
Window length for silent period detection: 151;
Performance index (implemented below):

SIR_i = 10 log( E{s_i²} / E{(s_i − y_i)²} )   (68)
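The performance index (68) in code, for reference (assuming a base-10 logarithm, i.e. values in dB, as the figures below report):

```python
import numpy as np

def sir_db(s, y):
    """SIR = 10 log10( E{s²} / E{(s - y)²} ), in dB."""
    return 10.0 * np.log10(np.mean(s ** 2) / np.mean((s - y) ** 2))
```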


Results (cont.)

SIR1 = 40.52 dB, SIR2 = 38.27 dB and SIR = 39.39 dB

(Figure: sources, mixtures, retrieved sources.)


Conclusions

A complete nonlinear BSS method was developed for a chemical
sensing application.
The developed method can avoid time-demanding calibration stages.
There are still several challenging points to be investigated:
the number of samples in a real application may be small;
the independence assumption may be rather strong, e.g. if a
regulatory process between the ions exists.


Motivations for using the Bayesian approach

Prior information is available:

x_it = e_i + d_i log10( s_it + Σ_{j≠i} a_ij s_jt^{z_i/z_j} ) + n_it   (69)

e_i takes its value in the interval [0.05, 0.35]; [Gru07]
theoretical value for the Nernstian slope ⇒ d_i = RT ln(10)/(z_i F)
(0.059 V at room temperature); [FF03]
the selectivity coefficients a_ij are always non-negative, very
often in the interval [0, 1]; [UBU+00]
the sources are positive.

The Bayesian approach also:
takes noise into account;
treats statistical independence rather as a working assumption, in
contrast to ICA [FG06];
may work even if the number of samples is small.


Bayesian source separation method: problem and notations

Problem: given X, estimate the unknown parameters
θ = [S, A, d, e, σ, φ];

S ⇒ sources;
φ ⇒ source hyperparameters;
A ⇒ selectivity coefficients;
d ⇒ Nernstian slopes;
e ⇒ offset parameters;
σ ⇒ noise variances.


Bayesian source separation method: an overview

Problem: given X, estimate the unknown parameters
θ = [S, A, d, e, σ, φ].
In the Bayesian approach, the estimation of θ is based on the
posterior information

p(θ|X) ∝ p(X|θ) p(θ)   (70)

The likelihood function is given by:

p(X|θ) = Π_{t=1}^{nd} Π_{i=1}^{nc} N_{x_it}( e_i + d_i log( Σ_{j=1}^{ns} a_ij s_jt^{z_i/z_j} ), σ_i² )   (71)

since we assume an i.i.d. Gaussian noise which is spatially
independent.


Prior definitions

Log-normal prior distribution for the sources (a non-negative
distribution):

p(s_jt) = 1/( s_jt √(2π σ²_{s_j}) ) exp( −(log(s_jt) − µ_{s_j})² / (2σ²_{s_j}) ) 1_{[0,+∞[}(s_jt)   (72)

Motivations
The estimation of φ_j = [µ_{s_j}, σ²_{s_j}] is not difficult,
since we can define a conjugate pair.
Ionic activities are expected to have a small variation on the
logarithmic scale.

The sources are assumed i.i.d. and statistically mutually
independent:

p(S) = Π_{j=1}^{ns} Π_{t=1}^{nd} p(s_jt)   (73)


Prior definitions (cont.)

Source parameters φ_j = [µ_{s_j}, σ²_{s_j}]:

p(µ_{s_j}) = N(µ̃_{s_j}, σ̃²_{s_j}),   p(1/σ²_{s_j}) = G(α_{σ_{s_j}}, β_{σ_{s_j}})   (74)

Selectivity coefficients a_ij: very often within [0, 1]

p(a_ij) = U(0, 1)   (75)

Nernstian slopes d_i: ideally 0.059 V at room temperature

p(d_i) = N(µ_{d_i} = 0.059/z_i, σ²_{d_i})   (76)

Offset parameters e_i: lie in the interval [0.050, 0.350] V

p(e_i) = N(µ_{e_i} = 0.20, σ²_{e_i})   (77)

Noise variances σ_i:

p(1/σ_i²) = G(α_{σ_i}, β_{σ_i})   (78)


The posterior distribution

The posterior distribution is given by

p(θ|X) ∝ p(X|θ) · Π_{j=1}^{ns} Π_{t=1}^{nd} p(s_jt|µ_{s_j}, σ²_{s_j}) · Π_{j=1}^{ns} p(µ_{s_j}) · Π_{j=1}^{ns} p(σ_{s_j})
         · Π_{i=1}^{nc} Π_{j=1}^{ns} p(a_ij) · Π_{i=1}^{nc} p(e_i) · Π_{i=1}^{nc} p(d_i) · Π_{i=1}^{nc} p(σ_i)   (79)

Bayesian MMSE estimator ⇒ θ̂_MMSE = ∫ θ p(θ|X) dθ
(difficult to calculate!)

Given θ¹, θ², . . . , θ^M (samples drawn from p(θ|X)), the Bayesian
MMSE estimator can be approximated by:

θ̂_MMSE = (1/M) Σ_{i=1}^M θ^i   (80)


Results on synthetic data

Synthetic sources (drawn from log-normal distributions);
Noise level ⇒ SNR1 = SNR2 = 18 dB;
Number of samples: 500;
Performance index:

$\mathrm{SIR}_i = 10 \log\!\left( \frac{E\{s_i(t)^2\}}{E\{(s_i(t) - \hat{s}_i(t))^2\}} \right)$   (81)
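A direct NumPy transcription of the performance index (81), assuming the logarithm is base 10 as usual for decibel figures:

```python
import numpy as np

def sir_db(s, s_hat):
    """Performance index (81): SIR = 10 log10( E{s^2} / E{(s - s_hat)^2} )."""
    return 10.0 * np.log10(np.mean(s**2) / np.mean((s - s_hat)**2))
```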


Results on synthetic data: mixtures and estimated sources

Two ISEs (nc = 2) for detecting the activities of NH₄⁺ and K⁺ (ns = 2) ⇒ post-nonlinear model;
Mixing coefficients a12 = 0.3, a21 = 0.4, d1 = 0.056, d2 = 0.056, e1 = 0.090, e2 = 0.105;
SIR1 = 19.7 dB, SIR2 = 18.9 dB, SIR = 19.3 dB

(Figures: (a) the two mixtures; (b) the original sources; (c) the retrieved sources.)


Results on real data

ISE array (NH₄⁺-ISE and K⁺-ISE)

(Diagram: the sources in solution measured by the ISE array.)


Results on real data (cont.)

nd = 169, SIR1 = 25.1 dB, SIR2 = 23.7 dB, SIR = 24.4 dB

(Figures: (a) ISE array response; (b) actual sources; (c) retrieved signals.)

Since the sources are clearly dependent here, an ICA-based method failed in this case.


Conclusions

A Bayesian nonlinear source separation method was proposed for processing the outputs of an ISE array;
Good results were obtained even in a complicated situation (dependent sources and a reduced number of samples);
Future work includes:
the development of a model that also takes into account the time structure of the sources;
the case where neither the number of ions in the solution nor their valences are available.


Show-through effect

Cooperation with F. Merrikh-Bayat and M. Babaie-Zadeh, Sharif Univ. of Technology [MBBZJ08, MBBZJ11]

Show-through, due to paper transparency and thickness;
Pigment oil penetration;
Vehicle oil component, due to loss of opacity.


State-of-the-art

Often applied to text and handwritten documents: 1-side methods or 2-side methods,
ICA assuming
a linear model of mixtures (Tonazzini et al., 2007; Ophir, Malah, 2007),
a nonlinear model of mixtures (Almeida, 2005; Sharma, 2001).

In this work, we consider:
modelling of the nonlinear mixture,
the blurring effect.


Nonlinearity of show-through: experimental study

Evidence
The sum of luminances is nonlinear (NL).
The whiter a pixel, the stronger the show-through: darker than black is impossible!


Nonlinearity of show-through: mathematical model

Basic equation
Show-through has a gain which depends on the grayscale of the front image.
This leads to the mixture model:

f_r^s(x, y) = a_1 f_r^i(x, y) + b_1 f_v^i(x, y) g_1[f_r^i(x, y)]
f_v^s(x, y) = a_2 f_v^i(x, y) + b_2 f_r^i(x, y) g_2[f_v^i(x, y)]

where the superscripts i and s stand for initial and scanned, the subscripts r and v for recto and verso; a_i and b_i denote unknown mixing parameters, and g_i(·) denote nonlinear gains.
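A minimal simulation of this mixing model may help fix ideas; the parameter values and the default exponential gains below are illustrative placeholders, not values from the slides.

```python
import numpy as np

def scan_pair(f_r, f_v, a=(0.9, 0.9), b=(0.3, 0.3), g1=None, g2=None):
    """Show-through mixing model: scanned recto/verso from the initial images.

    f_r, f_v : initial recto and verso grayscale images (same shape, ndarray);
    g1, g2   : nonlinear gains driven by the front-side image.
    """
    if g1 is None:
        g1 = lambda f: 0.5 * np.exp(-2.0 * f)   # placeholder gamma*exp(beta*f)
    if g2 is None:
        g2 = lambda f: 0.5 * np.exp(-2.0 * f)
    fs_r = a[0] * f_r + b[0] * f_v * g1(f_r)    # scanned recto
    fs_v = a[1] * f_v + b[1] * f_r * g2(f_v)    # scanned verso
    return fs_r, fs_v
```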


Nonlinearity of show-through: mathematical model

Shape of the gain
The gain functions can be estimated by computing

g_1[f_r^i(x, y)] = [f_r^s(x, y) − a_1 f_r^i(x, y)] / (b_1 f_v^i(x, y))
g_2[f_v^i(x, y)] = [f_v^s(x, y) − a_2 f_v^i(x, y)] / (b_2 f_r^i(x, y))


Nonlinearity of show-through: mathematical model

Approximation of the gain function
The gain function can be approximated by an exponential:

g_1[f_r^i(x, y)] = γ_1 exp[β_1 f_r^i(x, y)] ≈ γ_1 (1 + β_1 f_r^i(x, y))
g_2[f_v^i(x, y)] = γ_2 exp[β_2 f_v^i(x, y)] ≈ γ_2 (1 + β_2 f_v^i(x, y))

It leads to the approximated mixing model:

f_r^s(x, y) = a_1 f_r^i(x, y) + b′_1 f_v^i(x, y) [1 + β_1 f_r^i(x, y)]
f_v^s(x, y) = a_2 f_v^i(x, y) + b′_2 f_r^i(x, y) [1 + β_2 f_v^i(x, y)]

And finally:

f_r^s(x, y) = a_1 f_r^i(x, y) − l_1 f_v^i(x, y) − q_1 f_v^i(x, y) f_r^i(x, y)
f_v^s(x, y) = a_2 f_v^i(x, y) + l_2 f_r^i(x, y) − q_2 f_r^i(x, y) f_v^i(x, y)


Recursive structure

Separation structure
Studied by Deville and Hosseini [HD03, DH09], for the bilinear mixing model

f_r^s(x, y) = a_1 f_r^i(x, y) − l_1 f_v^i(x, y) − q_1 f_v^i(x, y) f_r^i(x, y)
f_v^s(x, y) = a_2 f_v^i(x, y) + l_2 f_r^i(x, y) − q_2 f_r^i(x, y) f_v^i(x, y)

A minimal sketch of such a recursion is given below.
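The following fixed-point sketch of a recurrent separating structure assumes the mixing parameters are known; as noted on the next slide, the bilinear model is neither always invertible nor always stable, so convergence is not guaranteed.

```python
import numpy as np

def separate_recursive(fs_r, fs_v, a, l, q, n_iter=50):
    """Fixed-point recurrent separating structure for the bilinear model.

    Inverts
        fs_r = a1*fr - l1*fv - q1*fv*fr
        fs_v = a2*fv + l2*fr - q2*fr*fv
    by iterating the recursions, with a, l, q the (assumed known) mixing
    parameters.  Convergence is not guaranteed in general.
    """
    fr, fv = fs_r.copy(), fs_v.copy()          # initialize with the observations
    for _ in range(n_iter):
        fr = (fs_r + l[0] * fv + q[0] * fv * fr) / a[0]
        fv = (fs_v - l[1] * fr + q[1] * fr * fv) / a[1]
    return fr, fv
```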


Cancellation of show-through: preliminary results

Preliminary results with the NL model
The bilinear model is neither always invertible nor always stable.
Parameters are estimated by maximum likelihood (ML).

Comments
The other-side image is never perfectly removed, especially where there is no superimposition.
This means that the difference between the verso image and the recto image is not a simple gain.
Diffusion in the paper ⇒ blurring effect, modelled by a 2-D filter.


Improved model and recursive structure

Model with filtering
The mixture is not the nonlinear superimposition of the recto image with the verso image itself, but with a filtered version of the verso image; hence the final separation structure.


Cancellation of show-through

Final results with NL modelling and filtering.

(Figure: final separation results with the NL model and 2-D filtering.)


Summary

Scanned images: conclusions and perspectives
Show-through is a NL phenomenon which can be modeled by bilinear mixtures;
In addition, it is necessary to consider the blurring effect, which can be viewed as a 2-D filtering effect.
Experimental results show that the mixture model has to take into account both NL and convolutive effects.

Perspectives
To avoid the registration required by 2-side scanning, one could explore whether two scans of the same side (under different conditions) provide sufficient diversity;
Other priors, like positivity of the images and of the coefficients, could be exploited, e.g. by NMF or Bayesian approaches.


XI. Bibliography

References

[AAG03] U. Amato, A. Antoniadis, and G. Gregoire. Independent component discriminant analysis. Int. J. Mathematics, 3:735–753, 2003.

[ACCF04] L. Albera, P. Comon, P. Chevalier, and A. Ferreol. Blind identification of underdetermined mixtures based on the hexacovariance. In ICASSP'04, volume II, pages 29–32, Montreal, May 17–21 2004.

[ACX07] L. Albera, P. Comon, and H. Xu. SAUD, un algorithme d'ICA par déflation semi-algébrique. In GRETSI, Troyes, September 2007.

[AFCC04] L. Albera, A. Ferreol, P. Comon, and P. Chevalier. Blind identification of overcomplete mixtures of sources (BIOME). Lin. Algebra Appl., 391:1–30, November 2004.

[AFCC05] L. Albera, A. Ferreol, P. Chevalier, and P. Comon. ICAR, a tool for blind source separation using fourth order statistics only. IEEE Trans. Sig. Proc., 53(10):3633–3643, October 2005.

[AJ05] S. Achard and C. Jutten. Identifiability of post nonlinear mixtures. IEEE Signal Processing Letters, 12(5):423–426, May 2005.

[Alm05] L. Almeida. Separating a real-life nonlinear image mixture. Journal of Machine Learning Research, 6:1199–1229, July 2005.

[Alm06] L. Almeida. Nonlinear Source Separation. Synthesis Lectures on Signal Processing, vol. 2. Morgan & Claypool Publishers, 2006.

[AMLM97] K. Abed-Meraim, P. Loubaton, and E. Moulines. A subspace algorithm for certain blind identification problems. IEEE Trans. Inf. Theory, pages 499–511, March 1997.

[AMML97] K. Abed-Meraim, E. Moulines, and P. Loubaton. Prediction error method for second-order blind identification. IEEE Trans. Sig. Proc., 45(3):694–705, March 1997.

[APJ01] S. Achard, D. T. Pham, and C. Jutten. Blind source separation in post nonlinear mixtures. In Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), pages 295–300, San Diego (California, USA), 2001.

[APJ03] S. Achard, D. T. Pham, and C. Jutten. Quadratic dependence measure for nonlinear blind source separation. In S.-I. Amari, A. Cichocki, S. Makino, and N. Murata, editors, Proc. of 4th Int. Symp. on Independent Component Analysis and Blind Source Separation (ICA2003), pages 263–268, Nara, Japan, April 2003.

[APJ04] S. Achard, D. T. Pham, and C. Jutten. Criteria based on mutual information minimization for blind source separation in post-nonlinear mixtures. Signal Processing, 85(5):965–974, 2004.

[BAM93] A. Belouchrani and K. Abed-Meraim. Séparation aveugle au second ordre de sources corrélées. In GRETSI, pages 309–312, Juan-les-Pins, France, September 1993.

[BCA+10] H. Becker, P. Comon, L. Albera, M. Haardt, and I. Merlet. Multiway analysis for source localization and extraction. In EUSIPCO'2010, pages 1349–1353, Aalborg, DK, August 23–27 2010. hal-00501845. Best student paper award.


[BCMT10] J. Brachat, P. Comon, B. Mourrain, and E. Tsigaridas. Symmetric tensor decomposition. Linear Algebra Appl., 433(11/12):1851–1872, December 2010. hal:inria-00355713, arXiv:0901.3706.

[BGR80] A. Benveniste, M. Goursat, and G. Ruget. Robust identification of a non-minimum phase system. IEEE Trans. Auto. Control, 25(3):385–399, June 1980.

[BK83] G. Bienvenu and L. Kopp. Optimality of high-resolution array processing using the eigensystem approach. IEEE Trans. ASSP, 31(5):1235–1248, October 1983.

[BS95] A. J. Bell and T. J. Sejnowski. An information-maximization approach to blind separation and blind deconvolution. Neural Computation, 7(6):1129–1159, November 1995.

[BS02] J. M. F. Ten Berge and N. Sidiropoulos. On uniqueness in Candecomp/Parafac. Psychometrika, 67(3):399–409, September 2002.

[BZJN01] M. Babaie-Zadeh, C. Jutten, and K. Nayebi. Separating convolutive post non-linear mixtures. In Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), pages 138–143, San Diego (California, USA), 2001.

[BZJN02] M. Babaie-Zadeh, C. Jutten, and K. Nayebi. A geometric approach for separating post nonlinear mixtures. In Proc. of the XI European Signal Processing Conf. (EUSIPCO 2002), volume II, pages 11–14, Toulouse, France, September 2002.

[BZJN04] M. Babaie-Zadeh, C. Jutten, and K. Nayebi. Differential of mutual information. IEEE Signal Processing Letters, 11(1):48–51, January 2004.

[CA02] A. Cichocki and S-I. Amari. Adaptive Blind Signal and Image Processing. Wiley, New York, 2002.

[Car97] J. F. Cardoso. Infomax and Maximum Likelihood for source separation. IEEE Sig. Proc. Letters, 4(4):112–114, April 1997.

[Car99] J. F. Cardoso. High-order contrasts for independent component analysis. Neural Computation, 11(1):157–192, January 1999.

[CBDC09] P. Comon, J. M. F. Ten Berge, L. De Lathauwer, and J. Castaing. Generic and typical ranks of multi-way arrays. Linear Algebra Appl., 430(11–12):2997–3007, June 2009. hal-00410058.

[CDS99] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1999.

[CG90] P. Comon and G. H. Golub. Tracking of a few extreme singular values and vectors in signal processing. Proceedings of the IEEE, 78(8):1327–1343, August 1990. (Published from Stanford report 78NA-89-01, Feb. 1989.)

[CGLM08] P. Comon, G. Golub, L-H. Lim, and B. Mourrain. Symmetric tensors and symmetric tensor rank. SIAM Journal on Matrix Analysis Appl., 30(3):1254–1279, September 2008. hal-00327599.


[CJ10] P. Comon and C. Jutten, editors. Handbook of Blind Source Separation, Independent Component Analysis and Applications. Academic Press, Oxford UK, Burlington USA, 2010. ISBN: 978-0-12-374726-6, 19 chapters, 830 pages. hal-00460653.

[CJH91] P. Comon, C. Jutten, and J. Herault. Separation of sources, part II: Problems statement. Signal Processing, Elsevier, 24(1):11–20, July 1991.

[CL96] J. F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans. on Sig. Proc., 44(12):3017–3030, December 1996.

[CL97] A. Chevreuil and P. Loubaton. Modulation and second order cyclostationarity: Structured subspace method and new identifiability conditions. IEEE Sig. Proc. Letters, 4(7):204–206, July 1997.

[CL11] P. Comon and L.-H. Lim. Sparse representations and low-rank tensor approximation. Research Report ISRN I3S//RR-2011-01-FR, I3S, Sophia-Antipolis, France, February 2011.

[CL13] P. Comon and L-H. Lim. Tensor Computations. Cambridge Univ. Press, 2013. In preparation.

[CLDA09] P. Comon, X. Luciani, and A. L. F. De Almeida. Tensor decompositions, alternating least squares and other tales. Jour. Chemometrics, 23:393–405, August 2009. hal-00410057.

[CM96] P. Comon and B. Mourrain. Decomposition of quantics in sums of powers of linear forms. Signal Processing, Elsevier, 53(2):93–107, September 1996. Special issue on High-Order Statistics.

[CM97] P. Comon and E. Moreau. Improved contrast dedicated to blind separation in communications. In ICASSP, pages 3453–3456, Munich, April 20–24 1997.

[Com89a] P. Comon. Séparation de mélanges de signaux. In XIIème Colloque GRETSI, pages 137–140, Juan-les-Pins, June 12–16 1989.

[Com89b] P. Comon. Separation of stochastic processes. In Proc. Workshop on Higher-Order Spectral Analysis, pages 174–179, Vail, Colorado, June 28–30 1989. IEEE-ONR-NSF.

[Com90] P. Comon. Analyse en composantes indépendantes et identification aveugle. Traitement du Signal, 7(3):435–450, December 1990. Numéro spécial non linéaire et non gaussien.

[Com92a] P. Comon. Independent Component Analysis. In J-L. Lacoume, editor, Higher Order Statistics, pages 29–38. Elsevier, Amsterdam, London, 1992.

[Com92b] P. Comon. MA identification using fourth order cumulants. Signal Processing, Elsevier, 26(3):381–388, 1992.

[Com94a] P. Comon. Independent Component Analysis, a new concept? Signal Processing, Elsevier, 36(3):287–314, April 1994. Special issue on Higher-Order Statistics. hal-00417283.

[Com94b] P. Comon. Tensor diagonalization, a useful tool in signal processing. In M. Blanke and T. Soderstrom, editors, IFAC-SYSID, 10th IFAC Symposium on System Identification, volume 1, pages 77–82, Copenhagen, Denmark, July 4–6 1994. Invited session.


[Com95] P. Comon. Supervised classification, a probabilistic approach. In Verleysen, editor, ESANN - European Symposium on Artificial Neural Networks, pages 111–128, Brussels, April 19–21 1995. D facto Publ. Invited paper.

[Com96] P. Comon. Contrasts for multichannel blind deconvolution. IEEE Signal Processing Letters, 3(7):209–211, July 1996.

[Com97] P. Comon. "Cumulant tensors". IEEE SP Workshop on HOS, Banff, Canada, July 21–23 1997. Invited keynote address, www.i3s.unice.fr/~comon/plenaryBanff97.html.

[Com01] P. Comon. From source separation to blind equalization, contrast-based approaches. In Int. Conf. on Image and Signal Processing (ICISP'01), Agadir, Morocco, May 3–5 2001. Invited plenary.

[Com02a] P. Comon. Blind equalization with discrete inputs in the presence of carrier residual. In Second IEEE Int. Symp. Sig. Proc. Inf. Theory, Marrakech, Morocco, December 2002. Invited session.

[Com02b] P. Comon. Tensor decompositions, state of the art and applications. In J. G. McWhirter and I. K. Proudler, editors, Mathematics in Signal Processing V, pages 1–24. Clarendon Press, Oxford, UK, 2002. arXiv:0905.0454v1.

[Com04] P. Comon. Contrasts, independent component analysis, and blind deconvolution. Int. Journal Adapt. Control Sig. Proc., 18(3):225–243, April 2004. Special issue on Signal Separation: http://www3.interscience.wiley.com/cgi-bin/jhome/4508. Preprint: I3S Research Report RR-2003-06.

[Com09] P. Comon. Tensors, usefulness and unexpected properties. In 15th IEEE Workshop on Statistical Signal Processing (SSP'09), pages 781–788, Cardiff, UK, Aug. 31 – Sep. 3 2009. Keynote. hal-00417258.

[Com10a] P. Comon. Tensor decompositions viewed from different fields. Bari, Italy, September 13–17 2010. Workshop on Tensor Decompositions and Applications (TDA 2010), invited.

[Com10b] P. Comon. Tensors in Signal Processing. Aalborg, DK, August 23–27 2010. EUSIPCO'2010, invited plenary.

[Com10c] P. Comon. When Tensor Decomposition meets Compressed Sensing. Saint Malo, France, September 27–30 2010. 9th Int. Conf. Latent Variable Analysis, invited keynote.

[CR06] P. Comon and M. Rajih. Blind identification of under-determined mixtures based on the characteristic function. Signal Processing, 86(9):2271–2281, September 2006. hal-00263668.

[CS93] J. F. Cardoso and A. Souloumiac. Blind beamforming for non-Gaussian signals. IEE Proceedings - Part F, 140(6):362–370, December 1993. Special issue on Applications of High-Order Statistics.

[CT91] T. Cover and J. Thomas. Elements of Information Theory. Wiley Series in Telecommunications, 1991.


[CZA08] A. Cichocki, R. Zdunek, and S-I. Amari. Nonnegative matrix and tensor factorization. IEEE Sig. Proc. Mag., pages 142–145, 2008.

[CZPA09] A. Cichocki, R. Zdunek, A. H. Phan, and S-I. Amari. Nonnegative Matrix and Tensor Factorization. Wiley, 2009.

[DAC02] Y. Deville, O. Albu, and N. Charkani. New self normalized blind source separation networks for instantaneous and convolutive mixtures. Int. J. Adapt. Contr. Sig. Proc., 16:753–761, 2002.

[Dar53] G. Darmois. Analyse générale des liaisons stochastiques. Rev. Inst. Intern. Stat., 21:2–8, 1953.

[DDD04] I. Daubechies, M. Defrise, and C. DeMol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics, 57(11):1413–1457, 2004.

[DE03] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proc. Nat. Aca. Sci., 100(5):2197–2202, March 2003.

[Des01] F. Desbouvries. Problèmes structurés et algorithmes du second ordre en traitement du signal. HdR, Marne-la-Vallée, June 22, 2001.

[DH09] Y. Deville and S. Hosseini. Recurrent networks for separating extractable-target nonlinear mixtures, part I: Non-blind configurations. Signal Processing, 89(4):378–393, 2009.

[DJ08] L. Duarte and C. Jutten. A nonlinear source separation approach for the Nicolsky-Eisenman model. In CD-Rom Proceedings of EUSIPCO 2008, Lausanne, Switzerland, September 2008.

[DJM09] L. T. Duarte, C. Jutten, and S. Moussaoui. Bayesian source separation of linear-quadratic and linear mixtures through a MCMC method. In Machine Learning for Signal Processing (MLSP 2009), 6 pages, Grenoble, France, September 2009.

[DJM10] L. Duarte, C. Jutten, and S. Moussaoui. Bayesian source separation of linear and linear-quadratic mixtures using truncated priors. Journal of Signal Processing Systems, 2010.

[DJTB+10] L. Duarte, C. Jutten, P. Temple-Boyer, A. Benyahia, and J. Launay. A dataset for the design of smart ion-selective electrode arrays for quantitative analysis. IEEE Sensors Journal, 10(12):1891–1892, 2010.

[DL95] N. Delfosse and P. Loubaton. Adaptive blind separation of independent sources: a deflation approach. Signal Processing, 45:59–83, 1995.

[Don04] D. L. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Technical report, 2004.

[EK02] J. Eriksson and V. Koivunen. Blind identifiability of class of nonlinear instantaneous ICA models. In Proc. of the XI European Signal Proc. Conf. (EUSIPCO2002), volume 2, pages 7–10, Toulouse, France, September 2002.


[Fé88] L. Féty. Méthodes de traitement d'antenne adaptées aux radiocommunications. PhD thesis, ENST Paris, 1988.

[FF03] P. Fabry and J. Fouletier, editors. Microcapteurs chimiques et biologiques. Lavoisier, 2003. In French.

[FG86] B. N. Flury and W. Gautsch. An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form. SIAM J. Sci. Stat. Comput., 7(1):169–184, 1986.

[FG06] C. Févotte and S. J. Godsill. A Bayesian approach for blind separation of sparse sources. IEEE Trans. on Audio, Speech and Language Processing, 14:2174–2188, 2006.

[GC98] O. Grellier and P. Comon. Blind separation of discrete sources. IEEE Signal Processing Letters, 5(8):212–214, August 1998.

[GCMT02] O. Grellier, P. Comon, B. Mourrain, and P. Trebuchet. Analytical blind channel identification. IEEE Trans. Signal Processing, 50(9):2196–2207, September 2002.

[GDM97] D. Gesbert, P. Duhamel, and S. Mayrargue. On-line blind multichannel equalization based on mutually referenced filters. IEEE Trans. Sign. Proc., 45(9):2307–2317, September 1997.

[GL99] A. Gorokhov and P. Loubaton. Blind identification of MIMO-FIR systems: a generalized linear prediction approach. Signal Processing, Elsevier, 73:105–124, Feb. 1999.

[GL06] R. Gribonval and S. Lesage. A survey of sparse components analysis for blind source separation: principles, perspectives and new challenges. In Proc. ESANN 2006, Bruges, Belgium, April 2006.

[GN95] M. I. Gürelli and C. L. Nikias. EVAM: An eigenvector-based algorithm for multichannel blind deconvolution of input colored signals. IEEE Trans. Sig. Proc., 43(1):134–149, January 1995.

[GN03] R. Gribonval and M. Nielsen. Sparse decompositions in unions of bases. IEEE Trans. Inform. Theory, 49(12):3320–3325, Dec. 2003.

[Gru07] P. Gruendler. Chemical sensors: an introduction for scientists and engineers. Springer, 2007.

[HD03] S. Hosseini and Y. Deville. Blind separation of linear-quadratic mixtures of real sources using a recurrent structure. In Proc. of the 7th Int. Work-Conference on Artificial and Natural Neural Networks (IWANN2003), Lecture Notes in Computer Science, vol. 2686, pages 241–248, Menorca, Spain, June 2003. Springer-Verlag.

[HJ03] S. Hosseini and C. Jutten. On the separability of nonlinear mixtures of temporally correlated sources. IEEE Signal Processing Letters, 10(2):43–46, February 2003.

[HJA85] J. Hérault, C. Jutten, and B. Ans. Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. In Actes du Xème colloque GRETSI, pages 1017–1022, Nice, France, May 20–24 1985.


[HP99] A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999.

[Hua96] Y. Hua. Fast maximum likelihood for blind identification of multiple FIR channels. IEEE Trans. Sig. Proc., 44(3):661–672, Mar. 1996.

[Hyv99] A. Hyvärinen. Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Networks, 10(3):626–634, 1999.

[JBZH04] C. Jutten, M. Babaie-Zadeh, and S. Hosseini. Three easy ways for separating nonlinear mixtures? Signal Processing, 84(2):217–229, February 2004.

[JT00] C. Jutten and A. Taleb. Source separation: from dusk till dawn. In Proc. 2nd Int. Workshop on Independent Component Analysis and Blind Source Separation (ICA2000), pages 15–26, Helsinki, Finland, 2000.

[KASC08] A. Kachenoura, L. Albera, L. Senhadji, and P. Comon. ICA: a potential tool for BCI systems. Sig. Proc. Magazine, pages 57–68, January 2008. Special Issue on Brain-Computer Interfaces.

[Kie92] H. A. L. Kiers. Tuckals core rotations and constrained Tuckals modelling. Statistica Applicata, 4(4):659–667, 1992.

[KLR73] A. M. Kagan, Y. V. Linnik, and C. R. Rao. Characterization Problems in Mathematical Statistics. Probability and Mathematical Statistics. Wiley, New York, 1973.

[Kol01] T. G. Kolda. Orthogonal tensor decomposition. SIAM Jour. Matrix Ana. Appl., 23(1):243–255, 2001.

[Kro83] P. Kroonenberg. Three-mode Principal Component Analysis. DSWO Press, Leiden, 1983.

[KS77] M. Kendall and A. Stuart. The Advanced Theory of Statistics, Distribution Theory, volume 1. C. Griffin, 1977.

[LAC97] J. L. Lacoume, P. O. Amblard, and P. Comon. Statistiques d'ordre supérieur pour le traitement du signal. Collection Sciences de l'Ingénieur. Masson, 1997. ISBN: 2-225-83118-1.

[LC09] L-H. Lim and P. Comon. Nonnegative approximations of nonnegative tensors. Jour. Chemometrics, 23:432–441, August 2009. hal-00410056.

[LC10] L-H. Lim and P. Comon. Multiarray signal processing: Tensor decomposition meets compressed sensing. Comptes-Rendus Mécanique de l'Académie des Sciences, 338(6):311–320, June 2010. hal-00512271.

[LdAC11] X. Luciani, A. L. F. de Almeida, and P. Comon. Blind identification of underdetermined mixtures based on the characteristic function: The complex case. IEEE Trans. Signal Proc., 59(2):540–553, February 2011.

[Lee78] J. De Leeuw. A new computational method to fit the weighted Euclidean distance model. Psychometrika, 43(4):479–490, December 1978.


[LISC11] R. Llinares, J. Igual, A. Salazar, and A. Camacho. Semi-blind source extraction of atrial activity by combining statistical and spectral features. Digital Signal Processing, 21(2):391–403, 2011.

[LJH04] A. Larue, C. Jutten, and S. Hosseini. Markovian source separation in non-linear mixtures. In C. Puntonet and A. Prieto, editors, Proc. of the 5th Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA2004), Lecture Notes in Computer Science, vol. 3195, pages 702–709, Granada, Spain, September 2004. Springer-Verlag.

[LLCL01] J. S. Lee, D. D. Lee, S. Choi, and D. S. Lee. Application of nonnegative matrix factorization to dynamic positron emission tomography. In Proc. of the 3rd Workshop on Independent Component Analysis and Signal Separation (ICA2001), pages 629–632, San Diego (California, USA), 2001.

[LM00] P. Loubaton and E. Moulines. On blind multiuser forward link channel estimation by subspace method: Identifiability results. IEEE Trans. Sig. Proc., 48(8):2366–2376, August 2000.

[LMRB09] X. Luciani, S. Mounier, R. Redon, and A. Bois. A simple correction method of inner filter effects affecting FEEM and its application to the Parafac decomposition. Chemometrics and Intel. Lab. Syst., 96(2):227–238, 2009.

[LMV01] L. De Lathauwer, B. De Moor, and J. Vandewalle. Independent Component Analysis and (simultaneous) third-order tensor diagonalization. IEEE Trans. Sig. Proc., pages 2262–2271, October 2001.

[LMV04] L. De Lathauwer, B. De Moor, and J. Vandewalle. Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition. SIAM Journal on Matrix Analysis and Applications, 26(2):295–327, 2004.

[LS99] D. D. Lee and H. S. Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401:788–791, 1999.

[LZ07] X. L. Li and X. D. Zhang. Nonorthogonal joint diagonalization free of degenerate solution. IEEE Trans. Signal Process., 55(5):1803–1814, 2007.

[MA99] G. Marques and L. Almeida. Separation of nonlinear mixtures using pattern repulsion. In Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 277–282, Aussois, France, 1999.

[MBBZJ08] F. Merrikh-Bayat, M. Babaie-Zadeh, and C. Jutten. A nonlinear blind source separation solution for removing the show-through effect in the scanned documents. In Proc. of the 16th European Signal Processing Conf. (EUSIPCO2008), Lausanne, Switzerland, August 2008.

[MBBZJ11] F. Merrikh-Bayat, M. Babaie-Zadeh, and C. Jutten. Linear-quadratic blind source separating structure for removing show-through in scanned documents. International Journal of Document Analysis and Recognition, 15 pages, 2011. Published online Oct. 28, 2010; to appear.


[MBZJ09] G. H. Mohimani, M. Babaie-Zadeh, and C. Jutten. A fast approach for overcomplete sparse decomposition based on smoothed ℓ0 norm. IEEE Transactions on Signal Processing, 57(1):289–301, 2009.

[Mcc87] P. McCullagh. Tensor Methods in Statistics. Monographs on Statistics and Applied Probability. Chapman and Hall, 1987.

[MDCM95] E. Moulines, P. Duhamel, J. F. Cardoso, and S. Mayrargue. Subspace methods for the blind identification of multichannel FIR filters. IEEE Trans. Sig. Proc., 43(2):516–525, February 1995.

[MHS+08] S. Moussaoui, H. Hauksdóttir, F. Schmidt, C. Jutten, J. Chanussot, S. Douté, and J. A. Benediktsson. On the decomposition of Mars hyperspectral data by ICA and Bayesian positive source separation. Neurocomputing, 71(10-12):2194–2208, June 2008.

[MJL00] A. Mansour, C. Jutten, and Ph. Loubaton. An adaptive subspace algorithm for blind separation of independent sources in convolutive mixture. IEEE Trans. on Signal Processing, 48(2):583–587, February 2000.

[MLS07] S. Makino, T.-W. Lee, and H. Sawada, editors. Blind Speech Separation. Springer, 2007.

[MM97] O. Macchi and E. Moreau. Self-adaptive source separation, part I: Convergence analysis of a direct linear network controlled by the Herault-Jutten algorithm. IEEE Trans. Sign. Proc., 45(4):918–926, April 1997.

[MOK95] K. Matsuoka, M. Ohya, and M. Kawamoto. A neural net for blind separation of nonstationary signals. Neural Networks, 8(3):411–419, 1995.

[MP97] E. Moreau and J. C. Pesquet. Generalized contrasts for multichannel blind deconvolution of linear systems. IEEE Signal Processing Letters, 4(6):182–183, June 1997.

[MS94] L. Molgedey and H. Schuster. Separation of a mixture of independent signals using time delayed correlation. Physical Review Letters, 72:3634–3636, 1994.

[MvL08] C. D. M. Martin and C. van Loan. A Jacobi-type method for computing orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl., 30(3):1219–1232, 2008.

[MZ93] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. on Signal Proc., 41(12):3397–3415, 1993.

[Paa97] P. Paatero. A weighted non-negative least squares algorithm for three-way Parafac factor analysis. Chemometrics Intell. Lab. Syst., 38:223–242, 1997.

[Paa99] P. Paatero. The multilinear engine: A table-driven, least squares program for solving multilinear problems, including the n-way parallel factor analysis model. Journal of Computational and Graphical Statistics, 8(4):854–888, December 1999.

[Paa00] P. Paatero. Construction and analysis of degenerate Parafac models. Jour. Chemometrics, 14:285–299, 2000.


[PAB+02] M. Plumbley, S. A. Abdallah, J. P. Bello, M. E. Davies, G. Monti, and M. B. Sandler. Automatic music transcription and audio source separation. Cybern. Syst., 33:603–627, 2002.

[PC01] D. T. Pham and J.-F. Cardoso. Blind separation of instantaneous mixtures of nonstationary sources. IEEE Trans. on Signal Processing, 49(9):1837–1848, 2001.

[Pha01] D. T. Pham. Joint approximate diagonalization of positive definite Hermitian matrices. SIAM Journal on Matrix Analysis, 22(4):1136–1152, 2001.

[Plu02] M. Plumbley. Condition for nonnegative independent component analysis. IEEE Signal Processing Letters, 9:177–180, 2002.

[Plu03] M. Plumbley. Algorithms for nonnegative independent component analysis. IEEE Trans. on Neural Networks, 14(3):534–543, 2003.

[Plu05] M. Plumbley. Geometrical methods for non-negative ICA. Neurocomputing, 67:161–197, 2005.

[PO04] M. Plumbley and E. Oja. Blind separation of positive sources by globally convergent gradient search. Neural Computation, 19(9):1811–1825, 2004.

[PPPP01] B. Pesquet-Popescu, J-C. Pesquet, and A. P. Petropulu. Joint singular value decomposition - a new tool for separable representation of images. In ICISP, 2001.

[PS00] L. Parra and C. Spence. Convolutive blind separation of nonstationary sources. IEEE Trans. on Speech and Audio Processing, 8(3):320–327, 2000.

[PSB03] D. T. Pham, C. Servière, and H. Boumaraf. Blind separation of convolutive audio mixtures using nonstationarity. In Independent Component Analysis and Blind Signal Separation (ICA 2003), Nara, Japan, April 2003.

[PT94] P. Paatero and U. Tapper. Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.

[PZL10] R. Phlypo, V. Zarzoso, and I. Lemahieu. Source extraction by maximising the variance in the conditional distribution tails. IEEE Trans. on Signal Processing, 58(1):305–316, 2010.

[RCH08] M. Rajih, P. Comon, and R. Harshman. Enhanced line search: A novel method to accelerate Parafac. SIAM Journal on Matrix Analysis Appl., 30(3):1148–1171, September 2008.

[Reg02] P. A. Regalia. Blind deconvolution and source separation. In J. G. McWhirter and I. K. Proudler, editors, Mathematics in Signal Processing V, pages 25–35. Clarendon Press, Oxford, UK, 2002.

[RK03] P. A. Regalia and E. Kofidis. Monotonic convergence of fixed-point algorithms for ICA. IEEE Transactions on Neural Networks, 14(4):943–949, July 2003.

[RTMC11] J-P. Royer, N. Thirion-Moreau, and P. Comon. Computing the polyadic decomposition of nonnegative third order tensors. Signal Processing, 2011.

[RZC05] L. Rota, V. Zarzoso, and P. Comon. Parallel deflation with alphabet-based criteria for blind source extraction. In IEEE SSP'05, pages 751–756, Bordeaux, France, July 17–20 2005.


[SASZJ11] S. Samadi, L. Amini, H. Soltanian-Zadeh, and C. Jutten. Identification of brain regions involved in epilepsy using common spatial pattern. In C. Richard, A. Ferrari, and P. Djuric, editors, IEEE Workshop on Statistical Signal Processing (SSP2011), pages 829–832, Nice, France, 2011. IEEE.

[SB00] N. D. Sidiropoulos and R. Bro. On the uniqueness of multilinear decomposition of N-way arrays. Jour. Chemo., 14:229–239, 2000.

[SBG00] N. D. Sidiropoulos, R. Bro, and G. B. Giannakis. Parallel factor analysis in sensor array processing. IEEE Trans. Sig. Proc., 48(8):2377–2388, August 2000.

[SBG04] A. Smilde, R. Bro, and P. Geladi. Multi-Way Analysis. Wiley, 2004.

[SC10] A. Stegeman and P. Comon. Subtracting a best rank-1 approximation does not necessarily decrease tensor rank. Linear Algebra Appl., 433(7):1276–1300, December 2010. hal-00512275.

[Sch86] R. O. Schmidt. Multiple emitter location and signal parameter estimation. IEEE Trans. Antenna Propagation, 34(3):276–280, March 1986.

[SGB00] N. D. Sidiropoulos, G. B. Giannakis, and R. Bro. Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans. on Sig. Proc., 48(3):810–823, March 2000.

[SGS94] A. Swami, G. Giannakis, and S. Shamsunder. Multichannel ARMA processes. IEEE Trans. Sign. Proc., 42(4):898–913, April 1994.

[SICD08] M. Sorensen, S. Icart, P. Comon, and L. Deneire. Gradient based approximate joint diagonalization by orthogonal transforms. In 16th European Signal Processing Conference (Eusipco'08), Lausanne, August 25–29 2008.

[SJS10] R. Sameni, C. Jutten, and M. Shamsollahi. A deflation procedure for subspace decomposition. IEEE Transactions on Signal Processing, 58(4):2363–2374, 2010.

[Slo94] D. T. M. Slock. Blind fractionally-spaced equalization, perfect-reconstruction filter banks and multichannel linear prediction. In Proc. ICASSP 94 Conf., Adelaide, Australia, April 1994.

[Sor91] E. Sorouchyari. Separation of sources, part III: Stability analysis. Signal Processing, Elsevier, 24(1):21–29, July 1991.

[Sør10] M. Sørensen. Tensor Tools with Application in Signal Processing. PhD thesis, Université de Nice Sophia Antipolis, 2010.

[SP03] C. Servière and D. T. Pham. A novel method for permutation correction in frequency-domain blind separation of speech mixtures. In Proc. of Independent Component Analysis and Blind Signal Separation (ICA 2004), pages 807–815, Granada, Spain, 2003.

[Ste06] A. Stegeman. Degeneracy in Candecomp/Parafac explained for p × p × 2 arrays of rank p + 1 or higher. Psychometrika, 71(3):483–501, 2006.

[Ste08] A. Stegeman. Low-rank approximation of generic p × q × 2 arrays and diverging components in the Candecomp/Parafac model. SIAM Journal on Matrix Analysis Appl., 30(3):988–1007, 2008.


[TF07] T. Tanaka and S. Fiori. Least squares approximate joint diagonalization on the orthogonal group. In ICASSP, volume II, pages 649–652, Honolulu, April 2007.

[TJ95] H. L. Nguyen Thi and C. Jutten. Blind source separation for convolutive mixtures. Signal Processing, 45:209–229, 1995.

[TJ99a] A. Taleb and C. Jutten. On underdetermined source separation. In Proc. 1999 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1999), volume 3, pages 1445–1448, Washington, DC, USA, 1999. IEEE Computer Society.

[TJ99b] A. Taleb and C. Jutten. Source separation in post-nonlinear mixtures. IEEE Trans. Sign. Proc., 47:2807–2820, October 1999.

[TSJ01] A. Taleb, J. Sole, and C. Jutten. Quasi-nonparametric blind inversion of Wiener systems. IEEE Trans. on Signal Processing, 49(5):917–924, 2001.

[TSLH90] L. Tong, V. Soon, R. Liu, and Y. Huang. AMUSE: a new blind identification algorithm. In Proc. ISCAS, New Orleans, USA, 1990.

[TSS+09] T. Tsalaile, R. Sameni, S. Sanei, C. Jutten, and J. Chambers. Sequential blind source extraction for quasi-periodic signals with time-varying period. IEEE Transactions on Biomedical Engineering, 56(3):646–655, 2009.

[Tug97] J. K. Tugnait. Identification and deconvolution of multichannel non-Gaussian processes using higher order statistics and inverse filter criteria. IEEE Trans. Sig. Proc., 45:658–672, March 1997.

[TVP96] S. Talwar, M. Viberg, and A. Paulraj. Blind estimation of multiple co-channel digital signals arriving at an antenna array: Part I, algorithms. IEEE Trans. Sign. Proc., pages 1184–1197, May 1996.

[TWZ01] Y. Tan, J. Wang, and J. Zurada. Nonlinear blind source separation using a radial basis function network. IEEE Trans. on Neural Networks, 12(1):124–134, 2001.

[TXK94] L. Tong, G. Xu, and T. Kailath. Blind identification and equalization based on second-order statistics: a time domain approach. IEEE Trans. Inf. Theory, 40(2):340–349, March 1994.

[UBU+00] Y. Umezawa, P. Bühlmann, K. Umezawa, K. Tohda, and S. Amemiya. Potentiometric selectivity coefficients of ion-selective electrodes. Pure and Applied Chemistry, 72:1851–2082, 2000.

[VO06] R. Vollgraf and K. Obermayer. Quadratic optimization for simultaneous matrix diagonalization. IEEE Trans. Sig. Proc., 54:3270–3278, September 2006.

[XLTK95] G. Xu, H. Liu, L. Tong, and T. Kailath. A least-squares approach to blind channel identification. IEEE Trans. Sig. Proc., 43(12):813–817, Dec. 1995.

[Yer02] A. Yeredor. Non-orthogonal joint diagonalization in the LS sense with application in blind source separation. IEEE Trans. Sign. Proc., 50(7):1545–1553, 2002.


[YW94] D. Yellin and E. Weinstein. Criteria for multichannel signal separation. IEEE Trans. Signal Processing, pages 2158–2168, August 1994.

[ZC05] V. Zarzoso and P. Comon. Blind and semi-blind equalization based on the constant power criterion. IEEE Trans. Sig. Proc., 53(11):4363–4375, November 2005.

[ZC07] V. Zarzoso and P. Comon. Comparative speed analysis of FastICA. In M. E. Davies, C. J. James, S. A. Abdallah, and M. D. Plumbley, editors, ICA'07, volume 4666 of Lecture Notes in Computer Science, pages 293–300, London, UK, September 9–12 2007. Springer.

[ZC08a] V. Zarzoso and P. Comon. Optimal step-size constant modulus algorithm. IEEE Trans. Com., 56(1):10–13, January 2008.

[ZC08b] V. Zarzoso and P. Comon. Robust independent component analysis for blind source separation and extraction with application in electrocardiography. In 30th Int. IEEE Engineering in Medicine and Biology Conf. (EMBC'08), Vancouver, CA, August 20–24 2008.

[ZC10] V. Zarzoso and P. Comon. Robust independent component analysis. IEEE Trans. Neural Networks, 21(2):248–261, February 2010. hal-00457300.

[ZCK06] V. Zarzoso, P. Comon, and M. Kallel. How fast is FastICA? In XIV European Signal Processing Conference, Eusipco'06, Florence, Italy, Sept. 4–8 2006.

[Ziv95] V. Zivojnovic. Higher-order statistics and Huber's robustness. In IEEE-ATHOS Workshop on Higher-Order Statistics, pages 236–240, Begur, Spain, June 12–14 1995.

[ZNM04] A. Ziehe, G. Nolte, and K. R. Müller. A fast algorithm for joint diagonalization with non orthogonal transformations and its application to blind source separation. Journal of Machine Learning Research, 5:777–800, December 2004.
