Elements of validity in Multiple Factor Analysis - Marine Cadoret - Free

Elements of validity in Multiple Factor Analysis

Marine Cadoret, Sébastien Lê, Jérôme Pagès

Applied Mathematics Department, Agrocampus Rennes, France

Caserta, June 11th 2008

SFC-CLADAG (Caserta) Elements of validity in MFA 1 / 20


Context

Problem

Selection of the number of dimensions in Principal Component Analysis (PCA):

Bar plot of the eigenvalues

Visual test: Cattell criterion

Stability in spite of perturbations in the dataset


Methods

PCA

Dray, 2007: first dimension

$\hat{X}_1 = \sqrt{\lambda_1}\, v_1 u_1'$, where $v_1$ is an eigenvector of $XX'$ and $u_1$ an eigenvector of $X'X$

Is the table reconstituted from the first dimension ($\hat{X}_1$) closer to the original data ($X$) than it would be for a random table?

Measure of similarity: RV coefficient (Escoufier, 1973)
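
As a minimal sketch of the two quantities on this slide, the Python code below computes the RV coefficient, using its usual definition $RV(X, Y) = \mathrm{tr}(XX'YY') / \sqrt{\mathrm{tr}((XX')^2)\,\mathrm{tr}((YY')^2)}$ for column-centered tables, and the rank-one reconstitution of $X$ from its first dimension. It assumes uniform row weights and no scaling of the variables. The function names (`center`, `cross`, `frob2`, `rv`, `reconstitute_first`), the power-iteration solver and the small data table are illustrative choices, not taken from Dray (2007) or from FactoMineR.

```python
import math

def center(X):
    """Column-center a table stored as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def cross(A, B):
    """Cross-product matrix A'B for two tables with the same rows."""
    return [[sum(a * b for a, b in zip(ca, cb)) for cb in zip(*B)] for ca in zip(*A)]

def frob2(M):
    """Squared Frobenius norm of a matrix."""
    return sum(x * x for row in M for x in row)

def rv(X, Y):
    """RV coefficient, using tr(XX'YY') = ||X'Y||_F^2 and tr((XX')^2) = ||X'X||_F^2."""
    return frob2(cross(X, Y)) / math.sqrt(frob2(cross(X, X)) * frob2(cross(Y, Y)))

def reconstitute_first(X, iters=500):
    """Rank-one reconstitution: sqrt(lambda_1) v1 u1' = (X u1) u1'."""
    S = cross(X, X)
    # Power iteration for u1; the start vector is assumed not orthogonal to u1.
    u = [1.0 + 0.1 * j for j in range(len(S))]
    for _ in range(iters):
        w = [sum(s * x for s, x in zip(row, u)) for row in S]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    scores = [sum(x * c for x, c in zip(row, u)) for row in X]  # X u1 = sqrt(lambda_1) v1
    return [[f * c for c in u] for f in scores]

# Hypothetical 5 x 3 table, for illustration only.
X = center([[2, 1, 0], [4, 3, 1], [6, 4, 1], [8, 7, 3], [10, 8, 2]])
X1 = reconstitute_first(X)
print(rv(X, X), rv(X, X1))
```

The standard library is used here only to keep the sketch self-contained; with a linear algebra library the same quantities follow directly from a singular value decomposition.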

Is the observed RV coefficient large?

$H_0$: absence of structure among variables

Procedure based on permutation tests

First dimension: permutation tests

Calculate the p-value associated with the observed RV:

1. Repeat a large number of times:
   1. Independent row permutations within each column of $X$ → $X^p$
   2. PCA on $X^p$
   3. Reconstitution of $X^p$ from the first dimension of the PCA on $X^p$ → $\hat{X}^p_1$
   4. Calculate $RV(X^p, \hat{X}^p_1)$
2. Distribution of the RV coefficient under $H_0$
3. Identify the observed value in this distribution to get the p-value
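
The steps above can be sketched in Python as follows. This is a hedged illustration rather than Dray's implementation: it assumes column-centered data with uniform row weights, uses power iteration in place of a full PCA, and takes the p-value as (1 + number of permuted RV values at least as large as the observed one) / (1 + number of permutations), which is one common convention and is not stated on the slide. All function names, the number of permutations and the simulated data are hypothetical.

```python
import math
import random

def center(X):
    """Column-center a table stored as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def cross(A, B):
    """Cross-product matrix A'B."""
    return [[sum(a * b for a, b in zip(ca, cb)) for cb in zip(*B)] for ca in zip(*A)]

def frob2(M):
    """Squared Frobenius norm."""
    return sum(x * x for row in M for x in row)

def rv(X, Y):
    """RV coefficient between two column-centered tables."""
    return frob2(cross(X, Y)) / math.sqrt(frob2(cross(X, X)) * frob2(cross(Y, Y)))

def reconstitute_first(X, iters=300):
    """Reconstitution of X from its first dimension, (X u1) u1'."""
    S = cross(X, X)
    u = [1.0 + 0.1 * j for j in range(len(S))]  # assumed not orthogonal to u1
    for _ in range(iters):
        w = [sum(s * x for s, x in zip(row, u)) for row in S]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    scores = [sum(x * c for x, c in zip(row, u)) for row in X]
    return [[f * c for c in u] for f in scores]

def permute_within_columns(X, rng):
    """Step 1.1: shuffle the rows independently within each column."""
    cols = [list(col) for col in zip(*X)]
    for col in cols:
        rng.shuffle(col)
    return [list(row) for row in zip(*cols)]

def permutation_p_value(X, n_perm=199, seed=0):
    """Steps 1 to 3: null distribution of RV, then position of the observed value."""
    rng = random.Random(seed)
    observed = rv(X, reconstitute_first(X))
    null = []
    for _ in range(n_perm):
        Xp = permute_within_columns(X, rng)
        null.append(rv(Xp, reconstitute_first(Xp)))
    p_value = (1 + sum(r >= observed for r in null)) / (1 + n_perm)
    return observed, p_value

# Hypothetical data: 30 individuals, 4 variables driven by one common factor.
rng = random.Random(1)
rows = []
for _ in range(30):
    t = rng.gauss(0, 1)
    rows.append([t + 0.3 * rng.gauss(0, 1) for _ in range(4)])
X = center(rows)
observed, p_value = permutation_p_value(X)
print(round(observed, 3), p_value)
```

Permuting within a column leaves its mean unchanged, so the permuted tables stay centered and need no further preprocessing.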

Evaluation of Dray's procedure

Behavior of the procedure under the alternative hypothesis (Dray)

Behavior of the procedure under the null hypothesis

Behavior of the procedure under $H_0$: first dimension

Simulation algorithm (nested repetitions: ×10000, ×1000)

Simulation of a dataset $X$ under $H_0$

Compute the RV between $X$ and $\hat{X}_1$

Row permutations of $X$ → $X^p$

PCA on $X^p$

Reconstitution of the first dimension of $X^p$ → $\hat{X}^p_1$

Compute the RV between $X^p$ and $\hat{X}^p_1$

Distribution of RV (between 0 and 1)

Compute the p-value associated with the observed RV

Distribution of the p-value under $H_0$ (between 0 and 1)
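
A scaled-down version of this simulation can be sketched in Python. Under the unweighted definitions, the singular value decomposition gives $RV(X, \hat{X}_1) = \lambda_1 / \sqrt{\sum_s \lambda_s^2}$, so the sketch computes the statistic from $X'X$ alone. The assumptions are mine: datasets under $H_0$ are drawn as independent standard normal columns, the loop counts (60 datasets, 99 permutations) are far smaller than the slide's and chosen only to keep the run short, and the p-value convention (1 + exceedances) / (1 + permutations) is one common choice. All names are hypothetical.

```python
import math
import random

def center(X):
    """Column-center a table stored as a list of rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def rv_first_dim(X, iters=50):
    """RV(X, X_hat_1) = lambda_1 / sqrt(sum of squared eigenvalues of X'X)."""
    cols = list(zip(*X))
    S = [[sum(a * b for a, b in zip(ca, cb)) for cb in cols] for ca in cols]
    u = [1.0 + 0.1 * j for j in range(len(S))]  # assumed not orthogonal to u1
    lam = 0.0
    for _ in range(iters):
        w = [sum(s * x for s, x in zip(row, u)) for row in S]
        lam = math.sqrt(sum(x * x for x in w))  # ||S u|| tends to lambda_1
        u = [x / lam for x in w]
    # The squared Frobenius norm of S equals the sum of its squared eigenvalues.
    return lam / math.sqrt(sum(x * x for row in S for x in row))

def p_value(X, n_perm, rng):
    """Permutation p-value of the observed RV for the first dimension."""
    observed = rv_first_dim(X)
    cols = [list(c) for c in zip(*X)]
    exceed = 0
    for _ in range(n_perm):
        for c in cols:
            rng.shuffle(c)  # independent row permutation within each column
        if rv_first_dim([list(r) for r in zip(*cols)]) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

def simulate_null(n_datasets=60, n_perm=99, n=15, p=3, seed=0):
    """Draw datasets without structure and collect their p-values."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_datasets):
        X = center([[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)])
        pvals.append(p_value(X, n_perm, rng))
    return pvals

pvals = simulate_null()
for alpha in (0.05, 0.2, 0.5):
    rate = sum(pv <= alpha for pv in pvals) / len(pvals)
    print(alpha, rate)
```

If the procedure holds its level, the printed rejection rates should be close to each level, up to the sampling noise of such a small simulation.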

[Figure: First dimension. Percentage of datasets with a significant 1st dimension (vertical axis, 0.0 to 1.0) plotted against the level (horizontal axis, 0.0 to 1.0).]

⇒ For a significance level of α%, we observe α% of data tables having a significant first dimension

Dray, 2007: second dimension

We are in the space orthogonal to the first dimension

We use the same methodology as for the first dimension: we calculate the RV coefficient between $X - \hat{X}_1$ and $\hat{X}_2$.
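
A short worked derivation may make this statistic more concrete. It assumes the unweighted singular value decomposition $X = \sum_s \sqrt{\lambda_s}\, v_s u_s'$ (with $v_s$ the eigenvectors of $XX'$ and $u_s$ those of $X'X$) and the usual definition $RV(A, B) = \mathrm{tr}(AA'BB') / \sqrt{\mathrm{tr}((AA')^2)\,\mathrm{tr}((BB')^2)}$; the result is a consequence of these definitions, not a formula shown on the slide.

```latex
X - \hat{X}_1 = \sum_{s \ge 2} \sqrt{\lambda_s}\, v_s u_s' ,
\qquad
\hat{X}_2 = \sqrt{\lambda_2}\, v_2 u_2' ,
\qquad
RV\!\left(X - \hat{X}_1,\ \hat{X}_2\right)
  = \frac{\lambda_2^2}{\sqrt{\sum_{s \ge 2} \lambda_s^2}\;\lambda_2}
  = \frac{\lambda_2}{\sqrt{\sum_{s \ge 2} \lambda_s^2}}
```

Under these assumptions, the statistic for the second dimension compares $\lambda_2$ only with the eigenvalues that remain once the first dimension has been removed.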

Behavior of the procedure under $H_0$: second dimension

Same simulation procedure

[Figure: Second dimension. Percentage of datasets with a significant 2nd dimension (vertical axis, 0.0 to 1.0) plotted against the level (horizontal axis, 0.0 to 1.0).]

Significance level of 20% for the first dimension

Particular case

⇒ Stability ≠ Significant structure


Methods

MFA

Multiple Factor Analysis

Multiple Factor Analysis deals with data tables in which a set of individuals ($I$) is described by several groups of variables ($J$)

MFA highlights a structure common to all the groups, to some groups, or specific to a group.

2 main questions

Does the dimension $s$ correspond to a structure common to several groups?

In this case, which groups contribute to this common structure?

Existence of a common structure in MFA

$H_0$: absence of common structure (no links between groups)

Row permutations within each group

First dimension: calculate the RV coefficient between $X$ and $\hat{X}_1$
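
The permutation scheme differs from the PCA case, and a small Python sketch may help to see how. It reflects my reading of "row permutations within each group": one row permutation is drawn per group and applied to all the columns of that group, so the structure inside a group is kept while the links between groups are broken. The function name, the `group_sizes` argument and the toy table are hypothetical.

```python
import random

def permute_rows_within_groups(X, group_sizes, rng):
    """Apply one independent row permutation to each group of columns."""
    n = len(X)
    out = [[] for _ in range(n)]
    start = 0
    for size in group_sizes:
        order = list(range(n))
        rng.shuffle(order)  # one permutation shared by all columns of the group
        for i, src in enumerate(order):
            out[i].extend(X[src][start:start + size])
        start += size
    return out

# Toy table: 6 individuals, a first group of 2 variables and a second group of 3.
rng = random.Random(0)
X = [[i, 10 * i, 100 + i, 200 + i, 300 + i] for i in range(6)]
Xp = permute_rows_within_groups(X, [2, 3], rng)
for row in Xp:
    print(row)
```

In each printed row the first two values still belong to one individual, as do the last three, but the two blocks generally come from different individuals.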

Contribution of groups to the common structure

$H_0$: no contribution of the group $j$ to the common structure

First dimension: calculate the RV coefficient between $X_j$ and $[\hat{X}_j]_1$


Application

Classical example of MFA (INRA Angers, Agrocampus Rennes, Spad, FactoMineR)

21 wines described by 27 variables gathered into 4 groups:

Olfaction before shaking: 5 variables

Vision: 3 variables

Olfaction after shaking: 10 variables

Gustation: 9 variables

Expected results:

Dim.1 Dim.2 Dim.3 Dim.4

Olfaction before shaking × × ×

Vision ×

Olfaction after shaking × × ×

Gustation × ×

Application: Number of dimensions

         λ      P-value
Dim.1    3.46   < 0.001
Dim.2    1.37   < 0.001
Dim.3    0.62   0.004
Dim.4    0.37   0.15

Application: Contribution of the groups

Contribution

                           Dim.1   Dim.2   Dim.3   Dim.4
Olfaction before shaking   0.78    0.62    0.37    0.17
Vision                     0.85    0.04    0.01    0.05
Olfaction after shaking    0.92    0.47    0.18    0.10
Gustation                  0.90    0.24    0.05    0.05
Sum                        3.46    1.37    0.62    0.37

P-value

                           Dim.1     Dim.2   Dim.3     Dim.4
Olfaction before shaking   0.02      0.174   0.038     0.127
Vision                     0.007     0.104   0.387     0.149
Olfaction after shaking    < 0.001   0.004   < 0.001   0.638
Gustation                  < 0.001   0.002   0.278     0.39


Conclusion, perspective

Dray's procedure extended to MFA

Ambiguity between stability and significant structure

Implementation of systematic simulations in MFA


http://factominer.free.fr

R package dedicated to exploratory analysis

written by the Applied Mathematics Department of Agrocampus
