14.03.2014 Views

Modeling and Multivariate Methods - SAS



Chapter 17 Correlations and Multivariate Techniques

Computations and Statistical Details

Inverse Correlation Matrix

The inverse correlation matrix provides useful multivariate information. The diagonal elements of the inverse correlation matrix, sometimes called the variance inflation factors (VIF), are a function of how closely the variable is a linear function of the other variables. Specifically, if the correlation matrix is denoted $R$ and the inverse correlation matrix is denoted $R^{-1}$, the $i$th diagonal element of $R^{-1}$ is denoted $r^{ii}$ and is computed as

$$r^{ii} = \mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ is the coefficient of determination from the model regressing the $i$th explanatory variable on the other explanatory variables. Thus, a large $r^{ii}$ indicates that the $i$th variable is highly correlated with one or more of the other variables.

Distance Measures

The Outlier Distance plot shows the Mahalanobis distance of each point from the multivariate mean (centroid). The Mahalanobis distance takes into account the correlation structure of the data as well as the individual scales. For each observation, the distance is denoted $d_i$ and is computed as

$$d_i = \sqrt{(Y_i - \bar{Y})' S^{-1} (Y_i - \bar{Y})}$$

where:

$Y_i$ is the data for the $i$th row

$\bar{Y}$ is the row of means

$S$ is the estimated covariance matrix for the data
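Both quantities defined so far, the VIF diagonal of the inverse correlation matrix and the Mahalanobis distance, can be computed from the same data matrix. The following is a minimal NumPy sketch, not the software's own implementation; the simulated data, the helper `r_squared`, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical data: 200 rows, 3 columns, the first two correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))
Y = np.column_stack([z[:, 0], z[:, 0] + 0.5 * z[:, 1], z[:, 2]])

# VIF_i read off as the i-th diagonal element of the inverse correlation matrix.
R = np.corrcoef(Y, rowvar=False)
vif = np.diag(np.linalg.inv(R))

def r_squared(y, X):
    """R^2 from an ordinary least squares fit of y on X with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

# Cross-check for column 0: regress it on the other columns, use 1 / (1 - R_i^2).
vif0_from_regression = 1.0 / (1.0 - r_squared(Y[:, 0], Y[:, 1:]))

# Mahalanobis distance of each row from the centroid (row of column means).
centered = Y - Y.mean(axis=0)
S_inv = np.linalg.inv(np.cov(Y, rowvar=False))
d = np.sqrt(np.einsum("ij,jk,ik->i", centered, S_inv, centered))
```

Here `vif[0]` and `vif0_from_regression` should agree up to floating-point error, which is the identity stated in the Inverse Correlation Matrix section. The `einsum` call evaluates the quadratic form row by row without building an n-by-n matrix.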

The reference line (Mason and Young, 2002) drawn on the Mahalanobis Distance plot is computed as $A \times F \times \mathit{nvars}$, where $A$ is related to the number of observations and number of variables, $\mathit{nvars}$ is the number of variables, and the computation for $F$ in formula editor notation is:

F Quantile(0.95, nvars, n - nvars - 1, centered at 0).
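For readers working outside the formula editor, the $F$ term can be approximated with SciPy. This sketch covers only the quantile: the factor $A$ is not specified in the text, so it is left out. The values of `n` and `nvars` are hypothetical, and "centered at 0" is taken to mean a central F distribution (zero noncentrality).

```python
from scipy.stats import f

n = 100      # hypothetical number of observations
nvars = 4    # hypothetical number of variables

# 0.95 quantile of a central F distribution with (nvars, n - nvars - 1)
# degrees of freedom, mirroring F Quantile(0.95, nvars, n - nvars - 1, 0).
F = f.ppf(0.95, dfn=nvars, dfd=n - nvars - 1)
```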

If a variable is an exact linear combination of other variables, then the correlation matrix is singular and the row and the column for that variable are zeroed out. The generalized inverse that results is still valid for forming the distances.
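One way to read the zeroing-out step is sketched below in NumPy. It is an interpretation rather than the software's documented algorithm: it works on the covariance matrix used for the distances, assumes the redundant variable's index is already known (`redundant` is a hypothetical name), and builds the generalized inverse by inverting the remaining block and leaving zeros elsewhere.

```python
import numpy as np

# Hypothetical data: column 2 is an exact linear combination of columns 0 and 1.
rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 50))
Y = np.column_stack([a, b, 2.0 * a - b])

redundant = 2  # assumed to be identified beforehand
keep = [j for j in range(Y.shape[1]) if j != redundant]

# Invert the block for the remaining variables and embed it in a matrix whose
# row and column for the redundant variable stay zero.
S = np.cov(Y, rowvar=False)
block_inv = np.linalg.inv(S[np.ix_(keep, keep)])
S_ginv = np.zeros_like(S)
S_ginv[np.ix_(keep, keep)] = block_inv

# Distances formed with the generalized inverse.
centered = Y - Y.mean(axis=0)
d = np.sqrt(np.einsum("ij,jk,ik->i", centered, S_ginv, centered))

# Distances formed from the non-redundant columns alone, for comparison.
d_reduced = np.sqrt(
    np.einsum("ij,jk,ik->i", centered[:, keep], block_inv, centered[:, keep])
)
```

Under these assumptions `d` and `d_reduced` coincide, which illustrates why the zeroed-out inverse can still be used to form the distances.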

The $T^2$ distance is just the square of the Mahalanobis distance, so $T_i^2 = d_i^2$. The upper control limit on the $T^2$ is

$$\mathrm{UCL} = \frac{(n-1)^2}{n}\, \beta_{\alpha;\, \frac{p}{2};\, \frac{n-p-1}{2}}$$

where

$n$ = number of observations
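The control limit can be evaluated numerically with SciPy. The sketch below rests on assumptions the excerpt does not confirm: that $p$ is the number of variables, that $\beta$ denotes the upper-$\alpha$ quantile (the $1 - \alpha$ quantile) of a Beta distribution with parameters $p/2$ and $(n - p - 1)/2$, and that $\alpha = 0.05$. The values of `n` and `p` are hypothetical.

```python
from scipy.stats import beta

n = 100        # number of observations (hypothetical value)
p = 4          # assumed to be the number of variables
alpha = 0.05   # assumed significance level

# Assumption: the beta term is the (1 - alpha) quantile of
# Beta(p / 2, (n - p - 1) / 2).
beta_quantile = beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)

# UCL = (n - 1)^2 / n * beta quantile
ucl = (n - 1) ** 2 / n * beta_quantile
```

Under these assumptions, a point whose $T_i^2$ exceeds `ucl` lies above the control limit.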
