Mathematics in Independent Component Analysis

More documents

Recommendations

Info

242 Chapter 17. IEEE SPL 13(2):96-99, 2006 IEEE SIGNAL PROCESSING LETTERS, VOL. X, NO. XX, XXXX 3 4000 3000 2000 1000 2° 74° 104° 135° 0 0 20 40 60 80 100 120 140 160 180 Fig. 2. Approximated probability density function of a mixture of four speech signals using the mixing angles αi = 2 ◦ ,74 ◦ ,104 ◦ ,135 ◦ . Plotted is the approximated density using a histogram with 180 bins; the thick line indicates the density after smoothing with a 5 degree radius polynomial kernel. In the proof we show that the median-condition is equivariant meaning that it does no depend on A. Hence any algorithm based on such a condition is equivariant, as confirmed by figure 1. Set ξ(ω) := (cosω, sin ω) ⊤ ; then θ(ξ(ω)) = ω. Combining both lemmata, we have therefore shown: Theorem 1.3: The set Φ of fixed points of geometric matrix-recovery contains an element (ˆω1, . . . , ˆωn) such that the matrix (ξ(ˆω1)...ξ(ˆωn)) solves the overcomplete BMMR. The stable fixed points in the above set Φ can be found by the geometric matrix-recovery algorithm. Furthermore, experiments confirm that in the special case of unimodal, symmetric and non-Gaussian signals, the set Φ consists of only two elements: a stable and an unstable fixed point, where the stable fixed point will be found by the algorithm. Then, depending on the kurtosis of the sources, either the stable (supergaussian case) or the instable (subgaussian case) fixed point represents the image of the unit vectors. We have partially shown this in the complete case, see [7], theorem 5: Theorem 1.4: If n = 2 and ps1 = ps2, then Φ contains only two elements given that py|[0, π) has exactly 4 local extrema. More elaborate studies of Φ are necessary to show full convergence, however the mathematics can be expected to be difficult. This can already be seen from the complicated and in higher dimensions yet unknown proofs of convergence of the related self-organizing-map algorithm [9]. B. Turning the online algorithm into a batch algorithm The above theory can be used to derive a batch-type learning algorithm, by testing all different receptive fields for the overcomplete GCC after histogrambased estimation of y. For simplicity let us assume that the cumulative distribution Py of y is invertible. For ϕ = (ϕ1, . . . , ϕn−1), define a function µ(ϕ) := ((γ1(ϕ) + γ2(ϕ))/2 − ϕ1, . . . , (γn(ϕ) + γ1(ϕ))/2 − ϕn), where γi(ϕ) := P −1 y ((Py(ϕi) + Py(ϕi+1)/2) is the median of y|[ϕi, ϕi+1] in [ϕi, ϕi+1] for i ∈ [1 : n] and ϕn := (ϕn−1 + ϕ1 + π)/2, ϕn+1 := ϕ1. Lemma 1.5: If µ(ϕ) = 0 then the γi(ϕ) satisfy the GCC. Proof: By definition, the receptive field of γi(ϕ) is given by [(γi−1(ϕ) + γi(ϕ))/2, (γi(ϕ) + γi+1(ϕ))/2]. Since µ(ϕ) = 0 implies γi(ϕ) + γi+1(ϕ))/2 = ϕi, the receptive field of γi(ϕ) is precisely [ϕi, ϕi+1], and by construction γi(ϕ) is the median of y restricted to the above interval. Algorithm (overcomplete FastGeo): Find the zeros of µ(ϕ). Algorithmically, we may simply estimate y using a histogram and search for the zeros exhaustively or by discrete gradient descent. Note that for m = 2 this is precisely the complete FastGeo algorithm [7]. Again µ always has at least two zeros representing the stable and the unstable fixed point of the neural algorithm, so for supergaussian sources we extract the stable fixed point by maximizing �n i=1 py(γi(ϕi)). Similar to the complete case, histogram-based density approximation results in a quite ‘ragged’ distribution. Hence, zeros of µ are split up into multiple close-by zeros. This can be improved by smoothing the distribution using a kernel with sufficiently small radius. Too large kernel radii result in lower accuracy because the calculation of the median is only roughly independent of the kernel radius. In figure 2 we use a polynomial kernel of radius 5 degree (zero everywhere else); one can see that indeed this smoothes the distribution nicely. C. Clustering in higher mixture dimensions We now extend any BSS algorithm working in lower mixing dimension m ′ < m to dimension m using the simple idea of projecting the mixtures x onto different subspaces and then estimating A from the recovered projected matrices. We eliminate scaling indeterminacies by normalization and permutation indeterminacies by comparing the source correlation matrices. 1) Equivalence after projections: Let m ′ ∈ N with 1 < m ′ < m, and let M denote the set of all subsets of [1 : m] of size m ′ . For an element τ ∈ M let τ = {τ1, . . . , τm ′} such that 1 ≤ τ1 < . . . < τm ′ ≤ m and let πτ denote the ordered projection from R m onto those coordinates. Consider the projected mixing matrix Aτ := πτA. We will study how scaling-equivalence behaves under projection, where two matrices A,B are said to be scaling equivalent, A ∼s B if there exists an invertible diagonal matrix L with A = BL. Lemma 1.6: Let τ 1 , . . . , τ k ∈ M such that � i τi = [1 : m] and j ∈ � i τi . If A τ i ∼s B τ i for all i and ajl �= 0 for all l, then A ∼s B. Proof: Fix a column l ∈ [1 : n], and let a := al, b := bl. By assumption, for each i ∈ [1 : k] there exist λi �= 0 such that b τ i = λia τ i. Index j occurs in all projections, so bj = λiaj for all i. Hence all λi =: λ coincide and b = λa. This lemma gives the general idea how to combine matrices; now we will construct specific projections. Assume that the first row of A does not contain any zeros. This is a very mild assumption because A was assumed to be full-rank, and the set of A’s with first row without zeros is dense in the set of full-rank matrices. As usual let ⌈λ⌉ denote the smallest integer larger or equal to λ ∈ R. Then let k := ⌈(m−1)/(m ′ −1)⌉, and define τ i := {1, 2+(m ′ −1)(i−1), . . .,2+(m ′ −1)i−1} for i < k and τ k := {1, m−m ′ +2, . . ., m}. Then � i τi = [1 : m] and 1 ∈ � i τi . Given (m ′ ×n)-matrices B 1 , . . .,B k := (b k i,j ), define A B 1 ,...,B k to be composed of the columns (j ∈ [1 : n]) aj := (1, b 1 2,j /b1 1,j , . . . , b1 m ′ ,j /b1 1,j , b 2 2,j /b21,j , . . . , bk−1 m ′ ,j /bk−1 1,j , b k 3+k(m ′ −1)−m,j /bk1,j , . . . , bkm ′ ,j /bk1,j )⊤ .
Chapter 17. IEEE SPL 13(2):96-99, 2006 243 IEEE SIGNAL PROCESSING LETTERS, VOL. X, NO. XX, XXXX 4 Lemma 1.7: Let B 1 , . . .,B k be (m ′ ×n)-matrices such that Aτ i ∼s Bi for i ∈ [1 : k]. Then AB1 ,...,Bk ∼s A. Proof: By assumption there exist λi l ∈ R \ {0} such that bi j,l = λil (Aτ i)j,l for all i ∈ [1 : k], j ∈ [1 : m ′ ] and l ∈ [1 : n], hence bi j,l /bi 1,l = (Aτ i)j,l/(Aτ i)1,l. One can check that due to the choice of the τi ’s we then have (AB1 ,...,Bk)j,l = aj,l/a1,l for all j ∈ [1 : m] and therefore AB1 ,...,Bk ∼ A. 2) Reduction algorithm: The dimension reduction algorithm now is very simple. Pick k and τ1 , . . . , τk as in the previous section. Perform overcomplete BMMR with the projected mixtures πτ i(x) for i ∈ [1 : k] and get estimated mixing matrices Bi . If this recovery has been carried out without any error, then every Bi is equivalent to Aτ i. Due to permutations, they might however not be scaling-equivalent. Therefore do the following iteratively for each i ∈ [1 : k]: Apply the overcomplete source-recovery, see next section, to πτ i(x) using Bi and get recovered sources si . For all j < i, consider the absolute crosscorrelation matrices (| Cor(si r, sj s)|)r,s. The row positions of the maxima of this matrix are pairwise different because the original sources were chosen to be independent. Thereby we get a permutation matrix Pi indicating how to permute Bi , Ci := BiPi , so that the new source correlation matrices are diagonal. Finally, we have constructed matrices Ci such that there exists a permutation P independent of i with Ci ∼s Aτ iP for all i ∈ [1 : k]. Now we can apply lemma 1.7 and get a matrix AC1 ,...,Ck with AC1 ,...,Ck ∼s AP and therefore AC1 ,...,Ck ∼ A as desired. II. BLIND SOURCE RECOVERY Using the results from the BMMR step, we can assume that an estimate of A has been found. In order to solve the overcomplete BSS problem, we are therefore left with the task of reconstructing the sources using the mixtures x and the estimated matrix (BSR). Since A has full rank, the equation x(t) = As(t) yields the (n − m)-dimensional affine vector space A−1 {x(t)} as solution space for s(t). Hence, if n > m the source-recovery problem is ill-posed without further assumptions. Using a maximum likelihood approach [4], [5] an appropriate assumption can be derived: Given a prior probability p0 s on the sources, it can be seen quickly [4], [10] that the most likely source sample is recovered by s = argmaxx=Asp0 s. Depending on the assumptions on the prior p 0 s of s, we get different optimization criteria. In the experiments we will assume a simple prior p 0 s ∝ exp(−|s|p) with any p-norm |.|p. Then s = argmin x=As |s|p, which can be solved linearly in the Gaussian case p = 2 and by linear programming or a shortest-path decomposition in the sparse, Laplacian case p = 1, see [5], [10]. III. EXPERIMENTAL RESULTS In order to compare the mixture matrix A with the recovered matrix B from the BMMR step, we calculate the generalized crosstalking error E(A,B) of A and B defined by E(A,B) := minM∈Π �A −BM�, where the minimum is taken over the group Π of all invertible matrices having only one non-zero entry per column and �.� denotes some matrix norm. It vanishes if and only if A and B are equivalent [10]. TABLE I PERFORMANCE OF BMMR-ALGORITHMS (n = 3, m = 2, 100 RUNS) algorithm mean E(A, Â) deviation σ FastGeo (kernel r = 5, approx. 0.1) 0.60 0.60 FastGeo (kernel r = 0, approx. 0.5) 0.40 0.46 FastGeo (kernel r = 5, approx. 0.5) 0.29 0.42 Soft-LOST (p = 0.01) 0.68 0.57 The overcomplete FastGeo algorithm is applied to 4 speech signals s, mixed by a (2 × 4)-mixing matrix A with coefficients uniformly drawn from [−1, 1], see figure 2 for their mixture density. The algorithm estimates the matrix well with E(A, Â) = 0.68, and BSR by 1-norm minimization yields recovered sources with a mean SNR of only 2.6dB when compared with the original sources; as noted before [5], [10], without sparsification for instance by FFT, source-recovery is difficult. To analyze the overcomplete FastGeo algorithm more generally, we perform 100 Monte-Carlo runs using highkurtotic gamma-distributed three-dimensional sources with 104 samples, mixed by a (2 × 3)-mixing matrix with weights uniformly chosen from [−1, 1]. In table I, the mean of the performance index depending on various parameters is presented. Noting that the mean error when using random (2×3)matrices with coefficients uniformly taken from [−1, 1] is E = 1.9 ± 0.73, we observe good performance, especially for a larger kernel radius and higher approximation parameter (E = 0.29), also compared with Soft-LOST’s E = 0.68 [6]. As an example in higher mixture dimension three speech signals are mixed by a column-normalized (3 × 3) mixing matrix A. For n = m = 3, m ′ = 2, the projection framework simplifies to k = 2 with projections π {1,2} and π {1,3}. Overcomplete geometric ICA is performed with 5·104 sweeps. The recoveries of the projected matrices π {1,2}A and π {1,3}A are quite good with E(π {1,2}A,B1 ) = 0.084 and E(π {1,3}A,B2 ) = 0.10. Taking out the permutations as described before, we get a recovered mixing matrix AB1 ,B2 with low generalized crosstalking error of E(A,AB1 ,B2) = 0.15 (compared with a mean random error of E = 3.2 ± 0.7). REFERENCES [1] A. Hyvärinen, J. Karhunen, and E. Oja, “Independent component analysis,” John Wiley & Sons, 2001. [2] A. Cichocki and S. Amari, Adaptive blind signal and image processing. John Wiley & Sons, 2002. [3] J. Eriksson and V. Koivunen, “Identifiability, separability and uniqueness of linear ICA models,” IEEE Signal Processing Letters, vol. 11, no. 7, pp. 601–604, 2004. [4] T. Lee, M. Lewicki, M. Girolami, and T. Sejnowski, “Blind source separation of more sources than mixtures using overcomplete representations,” IEEE Signal Processing Letters, vol. 6, no. 4, pp. 87–90, 1999. [5] P. Bofill and M. Zibulevsky, “Underdetermined blind source separation using sparse representations,” Signal Processing, vol. 81, pp. 2353–2362, 2001. [6] P. O’Grady and B. Pearlmutter, “Soft-LOST: EM on a mixture of oriented lines,” in Proc. ICA 2004, ser. Lecture Notes in Computer Science, vol. 3195, Granada, Spain, 2004, pp. 430–436. [7] F. Theis, A. Jung, C. Puntonet, and E. Lang, “Linear geometric ICA: Fundamentals and algorithms,” Neural Computation, vol. 15, pp. 419– 439, 2003. [8] C. Puntonet and A. Prieto, “An adaptive geometrical procedure for blind separation of sources,” Neural Processing Letters, vol. 2, 1995. [9] M. Benaim, J.-C. Fort, and G. Pagés, “Convergence of the onedimensional Kohonen algorithm,” Adv. Appl. Prob., vol. 30, pp. 850– 869, 1998. [10] F. Theis, E. Lang, and C. Puntonet, “A geometric algorithm for overcomplete linear ICA,” Neurocomputing, vol. 56, pp. 381–398, 2004.
Page 1 and 2:
Statistical machine learning of bio
Page 3:
to Jakob
Page 6 and 7:
vi Preface
Page 8 and 9:
viii CONTENTS 3 Signal Processing 8
Page 11 and 12:
Chapter 1 Statistical machine learn
Page 13 and 14:
1.1. Introduction 5 auditory cortex
Page 15 and 16:
1.2. Uniqueness issues in independe
Page 17 and 18:
1.2. Uniqueness issues in independe
Page 19 and 20:
S U M M A R Y T his �le contains
Page 21 and 22:
1.3. Dependent component analysis 1
Page 23 and 24:
Table 1.1: BSS algorithms based on
Page 25 and 26:
Page 27 and 28:
Page 29 and 30:
Page 31 and 32:
Page 33 and 34:
1.4. Sparseness 25 1.4 Sparseness O
Page 35 and 36:
1.4. Sparseness 27 Theorem 1.4.3 (S
Page 37 and 38:
1.4. Sparseness 29 R 3 R 3 A BSRA R
Page 39 and 40:
1.4. Sparseness 31 Sparse projectio
Page 41 and 42:
1.5. Machine learning for data prep
Page 43 and 44:
Page 45 and 46:
Page 47 and 48:
Page 49 and 50:
Page 51 and 52:
1.6. Applications to biomedical dat
Page 53 and 54:
Page 55 and 56:
Page 57 and 58:
Page 59 and 60:
1.7. Outlook 51 noise data signal i
Page 61:
Part II Papers 53
Page 64 and 65:
56 Chapter 2. Neural Computation 16
Page 66 and 67:
Page 68 and 69:
Page 70 and 71:
Page 72 and 73:
Page 74 and 75:
Page 76 and 77:
Page 78 and 79:
Page 80 and 81:
Page 82 and 83:
Page 84 and 85:
Page 86 and 87:
Page 88 and 89:
Page 90 and 91:
82 Chapter 3. Signal Processing 84(
Page 92 and 93:
Page 94 and 95:
Page 96 and 97:
Page 98 and 99:
90 Chapter 4. Neurocomputing 64:223
Page 100 and 101:
Page 102 and 103:
Page 104 and 105:
Page 106 and 107:
Page 108 and 109:
Page 110 and 111:
Page 112 and 113:
104 Chapter 5. IEICE TF E87-A(9):23
Page 114 and 115:
Page 116 and 117:
Page 118 and 119:
Page 120 and 121:
Page 122 and 123:
114 Chapter 6. LNCS 3195:726-733, 2
Page 124 and 125:
116 Chapter 6. LNCS 3195:726-733, 2
Page 126 and 127:
118 Chapter 6. LNCS 3195:726-733, 2
Page 128 and 129:
120 Chapter 6. LNCS 3195:726-733, 2
Page 130 and 131:
122 Chapter 6. LNCS 3195:726-733, 2
Page 132 and 133:
124 Chapter 7. Proc. ISCAS 2005, pa
Page 134 and 135:
Page 136 and 137:
Page 138 and 139:
130 Chapter 8. Proc. NIPS 2006 Towa
Page 140 and 141:
132 Chapter 8. Proc. NIPS 2006 1.3
Page 142 and 143:
134 Chapter 8. Proc. NIPS 2006 stru
Page 144 and 145:
136 Chapter 8. Proc. NIPS 2006 5.5
Page 146 and 147:
138 Chapter 8. Proc. NIPS 2006
Page 148 and 149:
140 Chapter 9. Neurocomputing (in p
Page 150 and 151:
Page 152 and 153:
Page 154 and 155:
Page 156 and 157:
Page 158 and 159:
Page 160 and 161:
Page 162 and 163:
Page 164 and 165:
156 Chapter 10. IEEE TNN 16(4):992-
Page 166 and 167:
Page 168 and 169:
Page 170 and 171:
162 Chapter 11. EURASIP JASP, 2007
Page 172 and 173:
Page 174 and 175:
Page 176 and 177:
Page 178 and 179:
Page 180 and 181:
Page 182 and 183:
Page 184 and 185:
176 Chapter 12. LNCS 3195:718-725,
Page 186 and 187:
178 Chapter 12. LNCS 3195:718-725,
Page 188 and 189:
180 Chapter 12. LNCS 3195:718-725,
Page 190 and 191:
182 Chapter 12. LNCS 3195:718-725,
Page 192 and 193:
184 Chapter 12. LNCS 3195:718-725,
Page 194 and 195:
186 Chapter 13. Proc. EUSIPCO 2005
Page 196 and 197:
Page 198 and 199:
Page 200 and 201: 192 Chapter 14. Proc. ICASSP 2006 S
Page 202 and 203: 194 Chapter 14. Proc. ICASSP 2006 A
Page 204 and 205: 196 Chapter 14. Proc. ICASSP 2006
Page 206 and 207: 198 Chapter 15. Neurocomputing, 69:
Page 238 and 239: 230 Chapter 16. Proc. ICA 2006, pag
Page 248 and 249: 240 Chapter 17. IEEE SPL 13(2):96-9
Page 252 and 253: 244 Chapter 17. IEEE SPL 13(2):96-9
Page 254 and 255: 246 Chapter 18. Proc. EUSIPCO 2006
Page 260 and 261: 252 Chapter 19. LNCS 3195:977-984,
Page 270 and 271: 262 Chapter 20. Signal Processing 8
Page 300 and 301:
292 Chapter 21. Proc. BIOMED 2005,
Page 302 and 303:
Page 304 and 305:
Page 306 and 307:
298 BIBLIOGRAPHY Barber, C., Dobkin
Page 308 and 309:
300 BIBLIOGRAPHY Georgiev, P. (2001
Page 310 and 311:
302 BIBLIOGRAPHY Keck, I., Theis, F
Page 312 and 313:
304 BIBLIOGRAPHY Schießl, I., Sch
Page 314 and 315:
306 BIBLIOGRAPHY Theis, F. and Inou
show all

Mathematics in Independent Component Analysis

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?