Multivariate Gaussianization for Data Processing

Gustavo Camps-Valls (joint work with V. Laparra and J. Malo)
Image Processing Laboratory (IPL), Universitat de València, Spain
gustavo.camps@uv.es
http://www.valencia.edu/gcamps


Outline

1 Introduction
  - Why Gaussianization?
  - How to Gaussianize?
2 Iterative Gaussianization Framework
  - Notation and preliminaries
  - The idea
  - Theoretical convergence
  - Invertibility
  - Differentiability
3 Experimental results
  - Image synthesis
  - Classification
  - Multi-information estimation
4 Conclusions


Gaussianization

"Transform multidimensional data into multivariate Gaussian data"

Why? Achieving statistical independence of the data components is useful to ...
  - process dimensions independently
  - alleviate the curse of dimensionality
  - tackle the PDF estimation problem directly
  - safely apply or design methods that assume Gaussianity
  - gain insight into the data characteristics


Gaussianization, more formally

Given a random variable x ∈ R^d, a Gaussianization transform is an invertible and differentiable transform G(x) such that

  G(x) ∼ N(0, I)

What do we need?
1 Can we 'Gaussianize' each dimension independently?
2 If not, can we look for a (hopefully linear) transform to do it?


Marginal (univariate) Gaussianization

Marginal Gaussianization is trivial [Friedman87]. Gaussianization of each dimension, Ψ_i^(k), can be decomposed into two consecutive equalization transforms:
1 Marginal uniformization, U_i^(k), based on the cdf of the marginal PDF
2 Gaussianization of a uniform variable, G(u), based on the inverse of the cdf of a univariate Gaussian:

  Ψ_i^(k) = G ∘ U_i^(k)

where

  u = U_i^(k)(x_i^(k)) = ∫_{−∞}^{x_i^(k)} p_i(x_i') dx_i'      and      G^{−1}(x_i) = ∫_{−∞}^{x_i} g(x_i') dx_i'

and g(x_i) is just a univariate Gaussian.

[Figure: marginal PDF p_i(x_i), uniformization u = U_i(x_i), density of the uniform variable p(u), Gaussianization G(u) = Ψ_i(x_i), and the resulting Gaussian marginal p_i(Ψ_i(x_i)).]
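For reference, here is a minimal NumPy/SciPy sketch of this marginal Gaussianization step. The rank-based empirical cdf and the eps clipping are implementation choices of this sketch, not prescribed by the slides.

```python
import numpy as np
from scipy.stats import norm

def marginal_gaussianization(x, eps=1e-6):
    """Marginally Gaussianize each column of x: u = U_i(x_i), then G(u) = norm.ppf(u)."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    y = np.empty_like(x)
    for i in range(d):
        # empirical cdf (uniformization) via ranks; +0.5 keeps u strictly inside (0, 1)
        ranks = np.argsort(np.argsort(x[:, i]))
        u = np.clip((ranks + 0.5) / n, eps, 1.0 - eps)
        # Gaussianization of the uniform variable with the inverse Gaussian cdf
        y[:, i] = norm.ppf(u)
    return y
```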


Marginal Gaussianization alone is limited: it may lead to a non-multivariate-Gaussian joint distribution.

[Figure: original data, desired Gaussianization, and the result of marginal Gaussianization.]


Motivation

Idea: rotate and Gaussianize marginally!

✓ An orthogonal transform R does not affect the measure of Gaussianity
✓ Univariate Gaussianization is unique up to a sign flip


Motivation

RBIG: Rotation-Based Iterative Gaussianization
  - Impact of different rotations: ICA, PCA, random, etc.
  - Study of the convergence
  - Study of the essential properties: invertibility, differentiability, convergence, (multivariate) Gaussianity
  - Apply it! Synthesis, classification, dependence estimation


Preliminaries

Definition 1: PDF estimation under an arbitrary transform [Stark86]
Let x ∈ R^d be a r.v. with PDF p_x(x). Given a bijective, differentiable transform of x into y through G : R^d → R^d, y = G(x), the PDFs are related by

  p_x(x) = p_y(G(x)) |dG(x)/dx| = p_y(G(x)) |∇_x G(x)|

where |∇_x G| is the determinant of the Jacobian matrix of the transform.

Remark 1
p_x(x) can be obtained if the Jacobian is known, since

  p_y(y) = p_y(G(x)) = (2π)^{−d/2} |Σ|^{−1/2} exp( −½ (G(x) − µ_y)^⊤ Σ^{−1} (G(x) − µ_y) )

Remark 2
We need differentiable (Gaussianization) transforms for PDF estimation.


Preliminaries

Definition 2: Iterative Gaussianization Transform
Given a d-dimensional random variable x^(0) = [x_1, ..., x_d]^⊤ with PDF p(x^(0)), in each iteration k a two-step processing is performed:

  G : x^(k+1) = R_(k) Ψ_(k)(x^(k))

where
  - Ψ_(k) is the marginal Gaussianization of each dimension of x^(k) for the corresponding iteration,
  - R_(k) is a rotation matrix for the marginally Gaussianized variable Ψ_(k)(x^(k)).
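A minimal sketch of one such iteration, reusing the marginal_gaussianization function from the sketch above. The parameter name rotation and the PCA/random options are assumptions for illustration; the choice of rotation is discussed later in the slides.

```python
import numpy as np

def rbig_iteration(x_k, rotation="pca", rng=None):
    """One iteration of Definition 2: x^(k+1) = R_(k) Psi_(k)(x^(k))."""
    rng = np.random.default_rng(rng)
    z = marginal_gaussianization(x_k)              # Psi_(k): dimension-wise Gaussianization
    if rotation == "pca":
        _, V = np.linalg.eigh(np.cov(z, rowvar=False))
        R = V.T                                    # rows = principal directions of Psi_(k)(x^(k))
    else:
        R, _ = np.linalg.qr(rng.standard_normal((z.shape[1], z.shape[1])))  # random orthogonal
    return z @ R.T, R                              # samples are rows, so X @ R.T applies R

```

Iterating `x, _ = rbig_iteration(x)` for a few tens of iterations implements the full transform G.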


Properties

Property 1: The iterative Gaussianization transform is invertible
Given the Gaussianization transform

  G : x^(k+1) = R_(k) Ψ_(k)(x^(k)),

simple manipulation gives the inverse transform:

  G^{−1} : x^(k) = Ψ_(k)^{−1}( R_(k)^⊤ x^(k+1) )

Remark 1: valid for any rotation transform
Invertibility is possible for any rotation matrix R_(k).

Remark 2: valid for PDFs with connected support
Invertibility of Ψ_(k) is trivially ensured when the PDF support is connected, i.e. there are no disjoint subspaces in the PDF support.
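To make invertibility concrete, here is a fit/transform/inverse sketch in which each iteration stores its marginal Gaussianizer and rotation. The interpolated empirical cdf, the class and function names (MarginalGaussianizer, rbig_fit, rbig_inverse), and the PCA rotation are choices of this sketch, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import norm

class MarginalGaussianizer:
    """Invertible marginal Gaussianization, Psi = G o U, with U estimated by
    linear interpolation of the empirical cdf (an implementation choice)."""

    def fit(self, x, eps=1e-6):
        x = np.asarray(x, dtype=float)
        self.quantiles_ = np.sort(x, axis=0)                       # per-dimension quantiles
        n = x.shape[0]
        self.levels_ = np.clip((np.arange(n) + 0.5) / n, eps, 1 - eps)
        return self

    def transform(self, x):
        x = np.asarray(x, dtype=float)
        u = np.column_stack([np.interp(x[:, i], self.quantiles_[:, i], self.levels_)
                             for i in range(x.shape[1])])          # u = U_i(x_i)
        return norm.ppf(u)                                         # G(u)

    def inverse_transform(self, y):
        u = norm.cdf(np.asarray(y, dtype=float))                   # undo G
        return np.column_stack([np.interp(u[:, i], self.levels_, self.quantiles_[:, i])
                                for i in range(y.shape[1])])       # undo U_i

def rbig_fit(x, n_iter=50):
    """Fit the iterative transform, storing one (Psi_(k), R_(k)) pair per iteration (PCA rotations)."""
    y = np.asarray(x, dtype=float)
    steps = []
    for _ in range(n_iter):
        psi = MarginalGaussianizer().fit(y)
        y = psi.transform(y)
        _, V = np.linalg.eigh(np.cov(y, rowvar=False))
        R = V.T
        y = y @ R.T
        steps.append((psi, R))
    return y, steps

def rbig_inverse(y, steps):
    """Property 1: x^(k) = Psi_(k)^{-1}(R_(k)^T x^(k+1)), applied from the last iteration backwards."""
    x = np.asarray(y, dtype=float)
    for psi, R in reversed(steps):
        x = psi.inverse_transform(x @ R)          # undo rotation, then undo marginal Gaussianization
    return x
```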


Properties

Property 2: The iterative Gaussianization transform is differentiable
The Jacobian of the series of K iterations is the product of the Jacobians:

  ∇_x G = ∏_{k=1}^{K} R_(k) ∇_{x^(k)} Ψ_(k)

The marginal Gaussianization Ψ_(k) is a dimension-wise transform with diagonal Jacobian:

  ∇_{x^(k)} Ψ_(k) = diag( ∂Ψ_(k)^1 / ∂x_1^(k), ..., ∂Ψ_(k)^d / ∂x_d^(k) )

Each element of ∇_{x^(k)} Ψ_(k) is

  ∂Ψ_(k)^i / ∂x_i^(k) = (∂G/∂u)(∂u/∂x_i^(k)) = (∂G^{−1}/∂x_i)^{−1} p_i(x_i^(k)) = g(Ψ_(k)^i(x_i^(k)))^{−1} p_i(x_i^(k))

Remark 1: rotation-independent
The differentiable nature of G is independent of the selected rotations R_(k).
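Since the rotations are orthogonal, |det R_(k)| = 1, so only the diagonal marginal-Gaussianization Jacobians contribute to log|∇_x G|. A sketch of that per-iteration term follows; the Gaussian KDE used to estimate the marginal PDF p_i (from the samples being evaluated) is a simplifying assumption of this sketch.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

def marginal_logdet_jacobian(x_k, y_k):
    """Per-sample log|det(grad Psi_(k))| for one iteration.

    Uses log p_i(x_i) - log g(Psi_i(x_i)) for each dimension, with p_i estimated
    by a Gaussian KDE. x_k: data entering the marginal Gaussianization of this
    iteration; y_k: its output Psi_(k)(x^(k))."""
    logdet = np.zeros(x_k.shape[0])
    for i in range(x_k.shape[1]):
        p_i = gaussian_kde(x_k[:, i])                        # marginal pdf estimate
        logdet += np.log(p_i(x_k[:, i]) + 1e-12) - norm.logpdf(y_k[:, i])
    return logdet
```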


Properties

Q: Does the method converge? At what rate? How do we measure the distance to a unit multivariate Gaussian?

A: Information theory basics
  - Negentropy: distance to a zero-mean, unit-covariance multivariate Gaussian
  - Multi-information: compute the information reduction after each step


Properties

Definition 3: Negentropy
Negentropy measures Gaussianity with the KLD:

  J(x) = D_KL( p(x) | N(µ_x, σ_x² I) )

Remark 1
Negentropy is always non-negative and is zero iff x has a Gaussian distribution.

Definition 4: Negentropy to the unit Gaussian
KLD to a multivariate zero-mean, unit-variance Gaussian distribution:

  J(x) = D_KL( p(x) | N(0, I) )

Definition 5: Marginal negentropy to the unit Gaussian
The marginal negentropy is the sum of the KLDs of each marginal, p_i(x_i), to the univariate zero-mean, unit-variance Gaussian N(0, 1):

  J_m(x) = Σ_{i=1}^{d} D_KL( p_i(x_i) | N(0, 1) )
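The marginal negentropy J_m is what each iteration drives to zero and, by the results below, it equals the per-iteration reductions ΔJ and ΔI. A histogram-based sketch of an estimator; the identity D_KL(p | N(0,1)) = −H(p) + ½ log(2π) + ½ E[x²] is standard, while the histogram entropy estimator and the bin count are choices of this sketch.

```python
import numpy as np

def marginal_negentropy(x, bins=100):
    """Histogram-based estimate of J_m(x) = sum_i D_KL(p_i(x_i) || N(0,1)), in nats."""
    x = np.asarray(x, dtype=float)
    jm = 0.0
    for i in range(x.shape[1]):
        counts, edges = np.histogram(x[:, i], bins=bins, density=True)
        widths = np.diff(edges)
        p, w = counts[counts > 0], widths[counts > 0]
        h = -np.sum(p * np.log(p) * w)            # differential entropy estimate of the marginal
        jm += -h + 0.5 * np.log(2 * np.pi) + 0.5 * np.mean(x[:, i] ** 2)
    return jm
```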


Properties

Definition 6: Multi-information
Multi-information is the KLD between the joint PDF of a multidimensional random variable and the product of its marginal PDFs [Studeny98]:

  I(x) = D_KL( p(x) | ∏_i p_i(x_i) )

Remark 1
It assesses the statistical relations among the dimensions of a random variable.

Remark 2
Multi-information generalizes mutual information to multidimensional vectors. It is zero iff the different components of x are independent.


Properties

Theorem 1: negentropy reduces independently of the rotation

  ∆J = J(x) − J(RΨ(x)) ≥ 0,   ∀ R

Proof.
The divergence to a factorized PDF can be written in terms of its marginal PDFs [Cardoso03]:

  D_KL( p(x) | ∏_i q_i(x_i) ) = D_KL( p(x) | ∏_i p_i(x_i) ) + D_KL( ∏_i p_i(x_i) | ∏_i q_i(x_i) )
                              = I(x) + D_KL( ∏_i p_i(x_i) | ∏_i q_i(x_i) )

If the q_i(x_i) are univariate Gaussian PDFs, ∏_i q_i(x_i) = N(0, I), and then

  J(x) = I(x) + J_m(x)

The negentropy reduction in our transform is

  ∆J = J(x) − J(RΨ(x)) = J(x) − J(Ψ(x))
     = I(x) + J_m(x) − I(Ψ(x)) − J_m(Ψ(x)) = J_m(x) ≥ 0,   ∀ R

since (1) N(0, I) is rotation invariant, (2) I is invariant under dimension-wise transforms [Studeny98], and (3) the J_m of a marginally Gaussianized r.v. is 0.


Properties

Theorem 2: redundancy reduces independently of the rotation
Given a marginally Gaussianized variable, Ψ(x), any rotation reduces the redundancy:

  ∆I = I(Ψ(x)) − I(RΨ(x)) ≥ 0,   ∀ R

Proof.
Remember that

  J(x) = I(x) + J_m(x)   →   I(x) = J(x) − J_m(x)

Apply it to I(Ψ(x)) and I(RΨ(x)):

  ∆I = J(Ψ(x)) − J_m(Ψ(x)) − J(RΨ(x)) + J_m(RΨ(x))
     = J_m(RΨ(x)) ≥ 0,   ∀ R

since (1) negentropy is rotation invariant and (2) the marginal negentropy of a marginally Gaussianized r.v. is 0.


Properties

Corollary 1: convergence ensured
The combination of marginal Gaussianization and rotation gives rise to redundancy reduction, since marginal Gaussianization does not change the redundancy: I(Ψ(x)) = I(x).

Corollary 2: multi-information and negentropy reductions are related
The negentropy reduction at some iteration k equals the redundancy reduction obtained at iteration k − 1:

  ∆J_(k) = J_m(x^(k)) = ∆I_(k−1)


On the suitable rotation

Which rotation is the most suitable?

  Rotation   Closed-form   Theoretical convergence   Convergence rate   Comput. cost
  ICA        ✗             ✓                         Max ∆J             O(2md(d+1)n)
  PCA        ✓             ✓                         ∆J = 2nd order     O(d²(d+1)n)
  RND        ✓             ✓                         ∆J ≥ 0             O(d³)

  (n samples of dimension d; FastICA running m iterations)

- ICA guarantees the theoretical convergence of the Gaussianization process, since it seeks the maximally non-Gaussian marginal PDFs
- PCA leads to a non-optimal convergence rate: it reduces redundancy to a certain extent (it removes correlation), but it does not maximize the marginal non-Gaussianity J_m(x)
- PCA is closed-form and faster than ICA
- PCA requires more iterations than ICA, but Gaussianizes faster in CPU time
- Random (RND) rotations guarantee the theoretical convergence, but converge slowly


On the suitable rotation

Convergence analysis

[Figure: cumulative ∆I (bpp) vs. iteration for G-PCA (solid) and G-ICA (dashed); inset scatter plots show the transformed data at selected iterations.]

- Similar convergence rates when using PCA (solid) or ICA (dashed)
- Using PCA requires more iterations to converge, but it is much faster!


On the suitable rotation

Computational cost analysis

Cumulative ∆I and CPU time for G-ICA and G-PCA on Gaussianized patches of different sizes from the grayscale image 'Barbara':

            G-ICA                  G-PCA
  dim       ∆I [bpp]   Time [s]    ∆I [bpp]   Time [s]
  2 × 2     1.54       865         1.51       14
  3 × 3     2.08       1236        2.05       34
  4 × 4     2.38       2197        2.29       63
  5 × 5     2.50       3727        2.44       99
  6 × 6     2.60       6106        2.56       141
  7 × 7     2.68       9329        2.63       170
  8 × 8     2.69       15085       2.69       233

- More than one order of magnitude gained for similar ∆I reductions
- 8 × 8 patches: about 4 minutes (G-PCA) vs. about 4 hours (G-ICA)!


Experiment 1: Density estimation toy examples

Density estimation in high-dimensional problems

Univariate density estimation is solved:
  - Kernel estimators
  - Radial basis function estimates
  - Gaussian mixture models
  - Wavelet density estimates
  - ...

High-dimensional density estimation is difficult!
  - Non-parametric models (e.g. histogram-based) → many samples needed, curse of dimensionality
  - Parametric models (e.g. Gaussian, GSM, etc.) → require knowledge about the PDF


Experiment 1: Density estimation toy examples

Density estimation with RBIG
1: Input: data x^(0) = [x_1, ..., x_d]^⊤ ∈ R^d
2: Learn the sequence of Gaussianization transforms, G, such that y = G(x)
3: Compute its Jacobian, ∇_x G
4: p_y(y) is a multivariate Gaussian:

  p_y(y) = p_y(G(x)) = (2π)^{−d/2} |Σ|^{−1/2} exp( −½ (G(x) − µ_y)^⊤ Σ^{−1} (G(x) − µ_y) )

5: Compute the probability in the input space with:

  p_x(x) = p_y(y) · |∇_x G|

Advantages
  - Robustness to high-dimensional problems
  - No data distribution assumptions, no parametric model either
  - Low computational cost
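A minimal sketch of steps 3 to 5 in log form, reusing the rbig_fit steps and the marginal_logdet_jacobian sketch from earlier. It assumes the final Gaussianized variable is approximately N(0, I), so the covariance Σ is taken as the identity.

```python
import numpy as np
from scipy.stats import norm

def rbig_log_density(x, steps):
    """log p_x(x) = log N(G(x); 0, I) + log|grad_x G(x)|, per sample."""
    y = np.asarray(x, dtype=float)
    logdet = np.zeros(y.shape[0])
    for psi, R in steps:
        z = psi.transform(y)
        logdet += marginal_logdet_jacobian(y, z)   # diagonal Jacobian of Psi_(k)
        y = z @ R.T                                # orthogonal R_(k): |det R| = 1, no extra term
    return norm.logpdf(y).sum(axis=1) + logdet     # log p_y(G(x)) + log|Jacobian|
```

Usage would be along the lines of `_, steps = rbig_fit(x_train)` followed by `rbig_log_density(x_test, steps)`.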


Experiment 1: Density estimation toy examples

Toy example

[Figure: theoretical PDF, scatter plot of the 10^2 samples used in the estimation, histogram estimate (with a number of bins matching the G-PCA resolution), and G-PCA PDF estimate. The G-PCA estimate is much smoother.]

Problems and limitations
  - We need to ensure connected supports for the PDF
  - For clustered data, estimate individual transforms for each cluster
  - The Jacobian estimation is highly point-dependent


Experiment 1: Density estimation toy examples

Toy example 2: convergence to a multivariate Gaussian with G-PCA

[Figure sequence: scatter plots of the original data and of the transformed data at iterations k = 1, 2, 3, 4, 8, 9, 10, 13, 15, 18, 20 and 24; by k = 24 the data are Gaussianized.]


Experiment 2: Data synthesis

'Data synthesis allows one to generate artificial data with similar statistical properties to real data'
  - Image synthesis
  - Texture synthesis
  - Speech synthesis
  - etc.

[Figure: texture synthesis examples.]


Experiment 2: Data synthesis

Data synthesis with RBIG
1: Input: data x^(0) = [x_1, ..., x_d]^⊤ ∈ R^d
2: Learn the sequence of Gaussianization transforms, G, such that y = G(x)
3: Compute its Jacobian, ∇_x G
4: Sample randomly in the Gaussianized domain
5: Transform back to the original domain

Advantages
  - Robustness to high-dimensional problems
  - No data distribution assumptions, no parametric model either
  - Low computational cost
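A minimal sketch of steps 4 and 5, reusing the rbig_fit / rbig_inverse sketch from the invertibility section; the function name rbig_sample is an assumption of this sketch.

```python
import numpy as np

def rbig_sample(n_samples, steps, dim, rng=None):
    """Draw z ~ N(0, I) in the Gaussianized domain and map it back through the inverse transform."""
    rng = np.random.default_rng(rng)
    z = rng.standard_normal((n_samples, dim))      # step 4: random Gaussian samples
    return rbig_inverse(z, steps)                  # step 5: back to the original data domain
```

For instance, `_, steps = rbig_fit(x)` followed by `x_new = rbig_sample(1000, steps, x.shape[1])` would generate 1000 synthetic samples.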


Experiment 2: Data synthesis

[Figure: original data, Gaussianized data, and synthesized data.]


Experiment 2: Data synthesis

[Figure: image/texture synthesis results.]


Experiment 3: One-class Classification

One-class classification
One-class classification tries to distinguish one class of objects from all other possible objects, by learning from a training set containing only the objects of that class.

Applications
  - Change detection
  - Anomaly detection
  - Target detection

Methods
  - Density estimation methods: more complex in high-dimensional spaces, more accurate with a high number of samples
  - Boundary methods, like the support vector domain description (SVDD): robust to high-dimensional spaces, more accurate with a low number of training samples


Experiment 3: One-class Classification

Toy example: G-PCA vs. SVDD

[Figure: decision boundaries of G-PCA (left) and SVDD (right).]

- Both methods model the class of interest and perfectly reject the outliers
- Two settings are considered: a problem with no non-target samples, and a problem where the available non-target samples are not representative


Experiment 3: One-class Classification

Toy example: G-PCA vs. SVDD

[Figure: decision boundaries of G-PCA (left) and SVDD (right) trained with a restricted set of non-target samples.]

- SVDD has problems when the space is badly sampled and/or the non-target samples are not representative


Experiment 3: One-class Classification

One-class remote sensing image classification
  - Objective: detection of the classes 'urban' vs. 'non-urban'
  - 'Urban Expansion Monitoring (UrbEx) ESA-ESRIN DUP' project
  - Two sensors (ERS2 SAR and Landsat TM) at two dates (1995 and 1999) over Rome

Features and pre-processing
  - Images were co-registered with ISTAT data (at subpixel level, ...)


Experiment 3: One-class Classification

Building the classifiers
  - RBF kernel for the SVDD: K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) ), σ ∈ R⁺
  - Kernel width varied in σ ∈ [10⁻², ..., 10²]
  - Fraction-rejection parameter varied in ν ∈ [10⁻², 0.5]

Experimental setup
  - Free parameters tuned to maximize the kappa statistic
  - Training sets of different sizes for the target class, [100, 2500]
  - Test set of 10⁵ pixels
  - Experiment repeated over 10 runs on 3 images
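A hedged sketch of such a parameter search using scikit-learn. OneClassSVM is the ν-SVM one-class formulation, used here as a stand-in for SVDD (the two coincide for RBF kernels); gamma = 1/(2σ²) matches the kernel above. The labeled validation set (x_val, y_val with +1 target, −1 outlier) is an assumption, since the slides only state that parameters are tuned to maximize kappa.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics import cohen_kappa_score

def tune_one_class_svm(x_target_train, x_val, y_val):
    """Grid search over (sigma, nu) for an RBF one-class SVM, scored by the kappa statistic."""
    best_model, best_kappa = None, -np.inf
    for sigma in np.logspace(-2, 2, 9):                       # sigma in [1e-2, ..., 1e2]
        for nu in np.logspace(-2, np.log10(0.5), 5):          # nu in [1e-2, 0.5]
            model = OneClassSVM(kernel="rbf", gamma=1.0 / (2 * sigma ** 2), nu=nu)
            model.fit(x_target_train)                         # trained on target samples only
            kappa = cohen_kappa_score(y_val, model.predict(x_val))
            if kappa > best_kappa:
                best_model, best_kappa = model, kappa
    return best_model, best_kappa
```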


Experiment 3: One-class Classification

Classification accuracy

[Figure: κ statistic vs. number of training samples for the Naples 1995, Naples 1999 and Rome 1999 images, for SVDD (dashed) and G-PCA (solid).]

- The estimated κ statistic jointly measures precision and recall
- Results on a test set of 10⁵ pixels
- Poor absolute results; very challenging problem: training with few samples and from an independent area, high variance of the spectral signatures
- SVDD outperforms RBIG for small training sets, since G-PCA needs more target samples for an accurate PDF estimation
- RBIG outperforms SVDD for moderate and large training sets; about 750 samples in this 10-dimensional problem are enough for G-PCA to outperform SVDD when very little is known about the non-target class


Experiment 3: One-class Classification

Classification accuracy (II)

[Figure: ground truth, SVDD (κ = 0.62) and G-PCA (κ = 0.65) classification maps for a small region (200 × 200) of the Naples 1995 image; white = urban, black = non-urban.]

- 2000 samples of the target class and only 10 samples of the non-target class for tuning parameters
- Much better results (lower spectral variance within the small, more homogeneous region)
- The SVDD classification map is more homogeneous, but it fails in outlier identification
- RBIG better rejects the 'non-urban' areas (in black)
- Noisy results can be improved by including spatial information


Experiment 4: Natural and remote sensing image Gaussianization

[Figure: Gaussianization results for natural and remote sensing images.]


Experiment 5: Multi-information estimation

Measuring dependence
'Two random variables are independent if the conditional probability distribution of either, given the observed value of the other, is the same as if the other's value had not been observed.'

Applications
  - Feature selection
  - Data analysis
  - Data coding

Many methods
  - Correlation only measures linear dependences
  - Non-linear extensions available
  - Kernel methods can estimate higher-order dependences
  - Mutual information measures dependence between two r.v.'s
  - Multi-information generalizes mutual information


Experiment 5: Multi-information estimation

Measuring independence with RBIG
1: Input: data x^(0) = [x_1, ..., x_d]^⊤ ∈ R^d
2: Learn the sequence of Gaussianization transforms, G, such that y = G(x)
3: Compute the cumulative reduction in multi-information

Advantages
  - Robustness to high-dimensional problems, no need for multidimensional histograms!
  - No data distribution assumptions, no parametric model either
  - Low computational cost
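A minimal sketch of step 3: by Corollary 2, the redundancy reduction of each iteration equals the marginal negentropy of the data after that iteration's rotation, so summing those terms estimates I(x). It reuses the rbig_iteration and marginal_negentropy sketches above; the number of iterations and the histogram estimator are choices of this sketch, and the result is in nats (divide by log 2 for bits).

```python
import numpy as np

def rbig_multi_information(x, n_iter=50):
    """Estimate I(x) as the cumulative redundancy reduction over the iterations."""
    y = np.asarray(x, dtype=float)
    total = 0.0
    for _ in range(n_iter):
        y, _ = rbig_iteration(y)                  # marginal Gaussianization + rotation
        total += marginal_negentropy(y)           # J_m after the rotation = Delta I of this step
    return total
```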


Experiment 5: Multi-information estimation

Mutual information estimation in nonlinear manifolds

Relative error ε_r = 100 × (I − Î)/I [%], averaged over 10 realizations:

  Dim
  3     6.97   7.87   6.30
  4     1.50   0.31   0.48
  5     3.53   2.97   3.28
  8     3.13   3.69   3.72
  10    1.48   1.45   1.32

Problems and limitations
  - Entropy estimators are not perfectly defined
  - More iterations, more error: overfitting?
  - More error with complex manifolds


Conclusions & Further work

Conclusions
1 A framework for multidimensional Gaussianization was presented:
  - General: any orthogonal transform can be plugged in
  - Convergence studied
  - Invertible and differentiable
2 Useful for large-scale real problems:
  - Classification
  - Target detection
  - Density estimation
  - Synthesis
  - Saliency
  - Multi-information estimation

Further work
  - Blind source separation in multiple post-nonlinear stages?
  - Regularization and overfitting studies in DNN?


References

Code and software: http://isp.uv.es/soft.htm

V. Laparra, G. Camps-Valls and J. Malo, "Iterative Gaussianization: from ICA to Random Rotations," IEEE Transactions on Neural Networks, 2011.
V. Laparra, G. Camps-Valls and J. Malo, "PCA Gaussianization for Image Processing," ICIP, 2009.
V. Laparra, J. Muñoz-Marí, G. Camps-Valls and J. Malo, "PCA Gaussianization for One-Class Remote Sensing Image Classification," SPIE Europe Remote Sensing, 2009.
D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley & Sons, 1992.
J. H. Friedman and J. W. Tukey, "A projection pursuit algorithm for exploratory data analysis," IEEE Transactions on Computers, vol. C-23, no. 9, pp. 881-890, 1974.
P. J. Huber, "Projection pursuit," The Annals of Statistics, vol. 13, no. 2, pp. 435-475, 1985.
G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, 313(5786), 504-507, 2006.
