Elements of validity in Multiple Factor Analysis
Marine Cadoret, Sébastien Lê, Jérôme Pagès
Applied Mathematics Department, Agrocampus Rennes, France
Caserta, June 11th 2008
SFC-CLADAG (Caserta), Elements of validity in MFA
Context
Problem
Selection of the number of dimensions in Principal Component Analysis (PCA):
Bar plot of the eigenvalues
Visual test: Cattell criterion
Stability in spite of perturbations in the dataset
Methods
PCA
Dray, 2007: first dimension
[Figure: X is approximated by X̂1 = √λ1 v1 u1′, where v1 is an eigenvector of XX′ and u1 an eigenvector of X′X]
Is the data reconstituted from the first dimension (X̂1) closer to the original data (X) than a random table would be?
Measure of similarity: RV coefficient (Escoufier, 1973)
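As a reminder, the RV coefficient between two tables with the same rows can be written as below. The symbol Y stands for any second table and is introduced here only for the definition. The second line is a short derivation that is not on the slides: it assumes that X̂1 = √λ1 v1 u1′ is the first term of the singular value decomposition of X, with λ_k the eigenvalues of X′X.

```latex
\[
\mathrm{RV}(X, Y) =
  \frac{\operatorname{tr}(XX'\,YY')}
       {\sqrt{\operatorname{tr}\big((XX')^{2}\big)\,\operatorname{tr}\big((YY')^{2}\big)}}
\]
\[
XX' = \sum_{k} \lambda_k v_k v_k', \qquad
\hat{X}_1 \hat{X}_1' = \lambda_1 v_1 v_1'
\quad\Longrightarrow\quad
\mathrm{RV}(X, \hat{X}_1)
  = \frac{\lambda_1^{2}}{\sqrt{\sum_{k} \lambda_k^{2}}\;\lambda_1}
  = \frac{\lambda_1}{\sqrt{\sum_{k} \lambda_k^{2}}}
\]
```

Under these assumptions the observed statistic depends only on the eigenvalues: it is close to 1 when the first eigenvalue dominates the others.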
Is the observed RV coefficient large?
H0: absence of structure among variables
Procedure based on permutation tests
First dimension: permutation tests
Calculate the p-value associated with the observed RV:
1. Repeat a large number of times:
   1. Independent row permutations within each column of X → X^p
   2. PCA on X^p
   3. Reconstitution of X^p from the first dimension of the PCA on X^p → X̂^p_1
   4. Calculate RV(X^p, X̂^p_1)
2. Distribution of the RV coefficient under H0
3. Locate the observed value in this distribution to get the p-value
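The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: it assumes a normed PCA (columns centred and scaled, no constant column), approximates the first axis by power iteration, and uses the common (count + 1) / (n_perm + 1) convention for the p-value. All function names are hypothetical.

```python
import math
import random

def standardize(X):
    """Centre each column and scale it to unit variance (assumes no constant column)."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - mean) / sd
    return Z

def cross(A, B):
    """Return A'B for two tables with the same rows."""
    return [[sum(ra[a] * rb[b] for ra, rb in zip(A, B)) for b in range(len(B[0]))]
            for a in range(len(A[0]))]

def frob2(M):
    """Squared Frobenius norm of a matrix."""
    return sum(v * v for row in M for v in row)

def rv(A, B):
    """RV coefficient, using tr(AA'BB') = ||A'B||_F^2."""
    return frob2(cross(A, B)) / math.sqrt(frob2(cross(A, A)) * frob2(cross(B, B)))

def reconstitute_first_dim(Z, n_iter=200):
    """Rank-one reconstitution (Z u1) u1', with u1 approximated by power iteration on Z'Z."""
    C = cross(Z, Z)
    p = len(C)
    u = [1.0 + 0.1 * j for j in range(p)]  # arbitrary start vector
    for _ in range(n_iter):
        w = [sum(C[a][b] * u[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    scores = [sum(row[b] * u[b] for b in range(p)) for row in Z]
    return [[s * ub for ub in u] for s in scores]

def permutation_test(X, n_perm=199, seed=0):
    """Return (observed RV, p-value) for the first dimension."""
    rng = random.Random(seed)
    Z = standardize(X)
    observed = rv(Z, reconstitute_first_dim(Z))
    n, p = len(Z), len(Z[0])
    count = 0
    for _ in range(n_perm):
        # Shuffle each column independently: this breaks the links between variables.
        cols = [rng.sample([row[j] for row in Z], n) for j in range(p)]
        Zp = [[cols[j][i] for j in range(p)] for i in range(n)]
        if rv(Zp, reconstitute_first_dim(Zp)) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)
```

Because shuffling a column leaves its mean and variance unchanged, the table is standardized once and the permuted tables are used as they are.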
Evaluation of Dray's procedure
Behavior of the procedure under the alternative hypothesis (Dray)
Behavior of the procedure under the null hypothesis
Behavior of the procedure under H0: first dimension
Simulation algorithm (the two nested loops of the flowchart are labelled ×10000 and ×1000):
1. Simulation of a dataset X under H0
2. Compute the RV between X and X̂1
3. Row permutations of X → X^p
4. PCA on X^p
5. Reconstitution of X^p from its first dimension → X̂^p_1
6. Compute the RV between X^p and X̂^p_1
7. Distribution of RV (between 0 and 1)
8. Compute the p-value associated with the observed RV
9. Distribution of the p-value under H0 (between 0 and 1)
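The simulation loop can be sketched as follows, at a much smaller scale than the loop counts on the slide. This is a sketch under stated assumptions: H0 datasets are simulated as independent standard normal columns, the PCA is normed, and the statistic uses the identity RV(X, X̂1) = λ1 / sqrt(Σ_k λ_k²), which follows from the singular value decomposition, with λ1 approximated by power iteration. Function names and default sizes are hypothetical.

```python
import math
import random

def standardize(X):
    """Centre and scale each column (assumes no constant column)."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [r[j] for r in X]
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
        for i in range(n):
            Z[i][j] = (col[i] - m) / sd
    return Z

def rv_first_dim(Z, n_iter=100):
    """lambda_1 / sqrt(sum_k lambda_k^2), computed from C = Z'Z."""
    p = len(Z[0])
    C = [[sum(r[a] * r[b] for r in Z) for b in range(p)] for a in range(p)]
    u = [1.0 + 0.1 * j for j in range(p)]  # arbitrary start vector
    for _ in range(n_iter):  # power iteration towards the first axis
        w = [sum(C[a][b] * u[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    lam1 = sum(u[a] * C[a][b] * u[b] for a in range(p) for b in range(p))
    # The squared Frobenius norm of C equals the sum of squared eigenvalues.
    return lam1 / math.sqrt(sum(c * c for row in C for c in row))

def permute_columns(Z, rng):
    """Shuffle the rows independently within each column."""
    n, p = len(Z), len(Z[0])
    cols = [rng.sample([r[j] for r in Z], n) for j in range(p)]
    return [[cols[j][i] for j in range(p)] for i in range(n)]

def p_value(Z, n_perm, rng):
    """Permutation p-value of the observed first-dimension RV."""
    observed = rv_first_dim(Z)
    hits = sum(rv_first_dim(permute_columns(Z, rng)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

def rejection_rate(alpha, n_datasets=40, n_perm=39, n=20, p=4, seed=0):
    """Share of H0 datasets whose first dimension is declared significant at level alpha."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_datasets):
        X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
        if p_value(standardize(X), n_perm, rng) <= alpha:
            rejected += 1
    return rejected / n_datasets
```

If the procedure behaves correctly under H0, `rejection_rate(alpha)` should stay close to `alpha` once the number of simulated datasets is large enough.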
[Figure: First dimension. % of datasets with a significant 1st dimension (vertical axis, 0 to 1) against the level (horizontal axis, 0 to 1)]
⇒ For a significance level of α%, we observe α% of data tables having a significant first dimension
Dray, 2007: second dimension
We are in the space orthogonal to the first dimension
We use the same methodology as for the first dimension: we calculate the RV coefficient between X − X̂1 and X̂2.
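The statistic for the second dimension can be worked out in the same way as for the first. The derivation below is not on the slides: it assumes that X̂_s = √λ_s v_s u_s′ is the s-th term of the singular value decomposition of X, with v_s an eigenvector of XX′, u_s an eigenvector of X′X and λ_k the eigenvalues of X′X.

```latex
\[
X - \hat{X}_1 = \sum_{k \ge 2} \sqrt{\lambda_k}\, v_k u_k',
\qquad
(X - \hat{X}_1)(X - \hat{X}_1)' = \sum_{k \ge 2} \lambda_k v_k v_k',
\qquad
\hat{X}_2 \hat{X}_2' = \lambda_2 v_2 v_2'
\]
\[
\mathrm{RV}(X - \hat{X}_1, \hat{X}_2)
  = \frac{\lambda_2^{2}}{\sqrt{\sum_{k \ge 2} \lambda_k^{2}}\;\lambda_2}
  = \frac{\lambda_2}{\sqrt{\sum_{k \ge 2} \lambda_k^{2}}}
\]
```

Under these assumptions, removing X̂1 simply drops λ1 from the denominator, so the second dimension is judged only against what remains in the space orthogonal to the first.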
Behavior of the procedure under H0: second dimension
Same simulation procedure
[Figure: Second dimension. % of datasets with a significant 2nd dimension (vertical axis, 0 to 1) against the level (horizontal axis, 0 to 1)]
Significance level of 20% for the first dimension
Particular case
⇒ Stability ≠ Significant structure
Methods
MFA
Multiple Factor Analysis
Multiple Factor Analysis deals with data tables in which a set of individuals (I) is described by several groups of variables (J).
MFA highlights a structure common to all the groups, to some groups, or specific to a single group.
2 main questions
Does dimension s correspond to a structure common to several groups?
If so, which groups contribute to this common structure?
Existence of a common structure in MFA
H0: absence of common structure (no links between groups)
Row permutations within each group
First dimension: calculate the RV coefficient between X and X̂1
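One reading of "row permutations within each group" is that the rows of each group of columns are shuffled as a block, independently from one group to the next. This keeps the structure inside a group and breaks only the links between groups. The sketch below illustrates that reading; the function name and its arguments are hypothetical, and the MFA itself is not shown.

```python
import random

def permute_within_groups(X, group_sizes, rng):
    """Shuffle the rows of each column group as a block, independently across groups.

    X is a list of rows; group_sizes gives the number of consecutive columns
    in each group and must add up to the number of columns of X.
    """
    n = len(X)
    if sum(group_sizes) != len(X[0]):
        raise ValueError("group sizes must add up to the number of columns")
    out = [[] for _ in range(n)]
    start = 0
    for size in group_sizes:
        order = rng.sample(range(n), n)  # one row permutation per group
        for i in range(n):
            # Row i receives the whole block of this group from row order[i].
            out[i].extend(X[order[i]][start:start + size])
        start += size
    return out
```

For a table with the four groups of the wine example, the call would be `permute_within_groups(X, [5, 3, 10, 9], random.Random(0))`, where `X` is a hypothetical 21 × 27 list of rows.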
Contribution of groups to the common structure
H0: no contribution of group j to the common structure
First dimension: calculate the RV coefficient between X_j and [X̂_j]_1
Application
Classical example of MFA (INRA Angers, Agrocampus Rennes, Spad, FactoMineR)
21 wines described by 27 variables gathered into 4 groups:
Olfaction before shaking: 5 variables
Vision: 3 variables
Olfaction after shaking: 10 variables
Gustation: 9 variables
Expected results:
Dim.1 Dim.2 Dim.3 Dim.4
Olfaction before shaking × × ×
Vision ×
Olfaction after shaking × × ×
Gustation × ×
Application: number of dimensions
         λ      P-value
Dim.1    3.46   < 0.001
Dim.2    1.37   < 0.001
Dim.3    0.62   0.004
Dim.4    0.37   0.15
Application: contribution of the groups
Contribution
                           Dim.1     Dim.2   Dim.3     Dim.4
Olfaction before shaking   0.78      0.62    0.37      0.17
Vision                     0.85      0.04    0.01      0.05
Olfaction after shaking    0.92      0.47    0.18      0.10
Gustation                  0.90      0.24    0.05      0.05
Sum                        3.46      1.37    0.62      0.37
P-value
                           Dim.1     Dim.2   Dim.3     Dim.4
Olfaction before shaking   0.02      0.174   0.038     0.127
Vision                     0.007     0.104   0.387     0.149
Olfaction after shaking    < 0.001   0.004   < 0.001   0.638
Gustation                  < 0.001   0.002   0.278     0.39
Conclusion, perspective
Dray's procedure extended to MFA
Ambiguity between stability and significant structure
Implementation of systematic simulations in MFA
http://factominer.free.fr
R package dedicated to exploratory analysis,
written by the Applied Mathematics Department of Agrocampus