14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 21: Fitting Partial Least Squares Models

Statistical Details

T² Plot

The T² value for the i-th observation is computed as follows:

$$T_i^2 = (n - 1) \sum_{j=1}^{p} \left( t_{ij}^2 \Big/ \sum_{k=1}^{n} t_{kj}^2 \right)$$

where $t_{ij}$ = X score for the i-th row and j-th extracted factor, p = number of extracted factors, and n = number of observations used to train the model. If validation is not used, n = total number of observations.
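As a rough illustration of the T² formula, the following Python sketch computes the value for every training row from a matrix of X scores. The function name `t2_values` and the list-of-rows layout are assumptions made for this example; they are not part of JMP.

```python
def t2_values(scores):
    """Return the T^2 value for each row of a score matrix.

    scores[i][j] is the X score for row i and extracted factor j.
    Only the n rows used to train the model should be passed in.
    (Hypothetical helper for illustration; not a JMP function.)
    """
    n = len(scores)       # number of training observations
    p = len(scores[0])    # number of extracted factors
    # Sum of squared scores for each factor: sum over k of t_kj^2
    col_ss = [sum(row[j] ** 2 for row in scores) for j in range(p)]
    return [
        (n - 1) * sum(row[j] ** 2 / col_ss[j] for j in range(p))
        for row in scores
    ]


example_scores = [[1.0, 2.0], [-1.0, 0.0], [0.0, -2.0]]
example_t2 = t2_values(example_scores)
```

Because each factor's squared scores are divided by their own column total, the T² values over the training rows always sum to (n - 1) * p, which is a convenient sanity check.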

The control limit for the T² Plot is computed as follows:

((n-1)^2/n)*BetaQuantile(0.95, p/2, (n-p-1)/2)

where p = number of extracted factors, and n = number of observations used to train the model. If validation is not used, n = total number of observations.
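The control limit can be reproduced outside JMP with any inverse beta CDF. The sketch below is a pure-Python stand-in, assuming that BetaQuantile(q, a, b) returns the q-th quantile of a Beta(a, b) distribution; the helper names `beta_cdf`, `beta_quantile`, and `t2_control_limit` are invented for this example. It uses the textbook continued-fraction expansion of the incomplete beta function plus bisection, not whatever algorithm JMP uses internally.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even step, then odd step, of the continued fraction.
        for num in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_quantile(q, a, b):
    """Invert beta_cdf by bisection (assumed stand-in for BetaQuantile)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t2_control_limit(n, p, level=0.95):
    """((n-1)^2/n) * BetaQuantile(level, p/2, (n-p-1)/2); requires n > p + 1."""
    return ((n - 1) ** 2 / n) * beta_quantile(level, p / 2.0, (n - p - 1) / 2.0)
```

When p = 2 the first beta parameter equals 1 and the quantile has the closed form 1 - (1 - 0.95)^(2/(n-3)), which gives a simple way to check the numerical routine.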

van der Voet T²

The van der Voet T² test helps determine whether a model with a specified number of extracted factors differs significantly from a proposed optimum model. The test is a randomization test based on the null hypothesis that the squared residuals for both models have the same distribution. Intuitively, one can think of the null hypothesis as stating that both models have the same predictive ability.

The test statistic is

$$C_i = \sum_{jk} \left( R_{i,jk}^2 - R_{\mathrm{opt},jk}^2 \right)$$

where $R_{i,jk}$ is the j-th predicted residual for response k for the model with i extracted factors, and $R_{\mathrm{opt},jk}$ is the corresponding quantity for the model based on the proposed optimum number of factors, opt. The significance level is obtained by comparing $C_i$ with the distribution of values that results from randomly exchanging $R_{i,jk}^2$ and $R_{\mathrm{opt},jk}^2$. A Monte Carlo sample of such values is simulated, and the significance level is approximated as the proportion of simulated critical values that are greater than $C_i$.
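The randomization scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not JMP's implementation: the function name `van_der_voet_p_value` is invented, the exchange is assumed to happen per observation j (all responses k of that observation are swapped together), and each exchange is assumed to occur with probability 1/2. Swapping the two squared residuals for an observation simply flips the sign of that observation's contribution to the statistic.

```python
import random


def van_der_voet_p_value(resid_i, resid_opt, n_sim=1000, seed=0):
    """Monte Carlo randomization test comparing two models' residuals.

    resid_i[j][k] and resid_opt[j][k] are the predicted residuals for
    observation j and response k under the i-factor model and the
    proposed optimum model. Returns (C_i, approximate significance level).
    (Hypothetical helper; the per-observation swap is an assumption.)
    """
    rng = random.Random(seed)
    # Contribution of each observation: sum over k of R_i^2 - R_opt^2
    diffs = [
        sum(ri ** 2 - ro ** 2 for ri, ro in zip(row_i, row_opt))
        for row_i, row_opt in zip(resid_i, resid_opt)
    ]
    c_obs = sum(diffs)
    exceed = 0
    for _ in range(n_sim):
        # Exchanging the two squared residuals negates the difference.
        c_sim = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if c_sim > c_obs:
            exceed += 1
    return c_obs, exceed / n_sim


worse = [[2.0] for _ in range(10)]   # residuals of a clearly worse model
better = [[1.0] for _ in range(10)]  # residuals of the proposed optimum
c_worse, p_worse = van_der_voet_p_value(worse, better)
c_better, p_better = van_der_voet_p_value(better, worse)
```

A small significance level indicates that the i-factor model has markedly larger squared residuals than the proposed optimum; a large one indicates no evidence that it predicts worse.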

Confidence Ellipse for Scatter Scores Plots<br />

The Scatter Scores Plots option produces scatterplots with a confidence ellipse. The coordinates of the top, bottom, left, and right extremes of the ellipse are as follows:

For a scatterplot of score i versus score j:

• the top and bottom extremes are +/-sqrt(var(score i)*z)

• the left and right extremes are +/-sqrt(var(score j)*z)

where z = ((n-1)*(n-1)/n)*BetaQuantile(0.95, 1, (n-3)/2).
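The ellipse extremes are easy to compute by hand because the first beta parameter is 1: a Beta(1, b) distribution has CDF 1 - (1 - x)^b, so its q-th quantile is 1 - (1 - q)^(1/b). The Python sketch below relies on that closed form. The function names are invented for this example, and var is assumed to be the sample variance (n - 1 denominator), which the source does not state.

```python
import math
import statistics


def beta_quantile_a1(q, b):
    """Quantile of Beta(1, b), from the closed-form CDF 1 - (1 - x)**b."""
    return 1.0 - (1.0 - q) ** (1.0 / b)


def ellipse_extremes(score_i, score_j, level=0.95):
    """Half-width and half-height of the confidence ellipse.

    score_i is plotted on the vertical axis and score_j on the
    horizontal axis. Requires more than 3 observations.
    (Hypothetical helper; sample variance is an assumption.)
    """
    n = len(score_i)
    z = ((n - 1) * (n - 1) / n) * beta_quantile_a1(level, (n - 3) / 2.0)
    half_height = math.sqrt(statistics.variance(score_i) * z)  # top/bottom
    half_width = math.sqrt(statistics.variance(score_j) * z)   # left/right
    return half_width, half_height


half_w, half_h = ellipse_extremes(
    [1.0, -1.0, 2.0, -2.0, 0.0],   # score i
    [2.0, -2.0, 4.0, -4.0, 0.0],   # score j
)
```

The extremes are then at plus and minus `half_h` on the vertical axis and plus and minus `half_w` on the horizontal axis.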
