Modeling and Multivariate Methods - SAS

Chapter 21 Fitting Partial Least Squares Models

Statistical Details

Partial Least Squares

NIPALS

SIMPLS

Partial least squares fits linear models based on linear combinations, called factors, of the explanatory variables (Xs). These factors are obtained in a way that attempts to maximize the covariance between the Xs and the response or responses (Ys). In this way, PLS exploits the correlations between the Xs and the Ys to reveal underlying latent structures. The factors address the combined goals of explaining response variation and predictor variation. Partial least squares is particularly useful when you have more X variables than observations or when the X variables are highly correlated.

The NIPALS method works by extracting one factor at a time. Let X = X0 be the centered and scaled matrix of predictors and Y = Y0 the centered and scaled matrix of response values. The PLS method starts with a linear combination t = X0w of the predictors, where t is called a score vector and w is its associated weight vector. The PLS method predicts both X0 and Y0 by regression on t:

X̂0 = tp´, where p´ = (t´t)⁻¹t´X0

Ŷ0 = tc´, where c´ = (t´t)⁻¹t´Y0
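As a concrete illustration of this regression step, the sketch below (plain NumPy on made-up data; the variable names X0, Y0, t, p, and c follow the text, but the data are arbitrary) regresses both centered blocks on a score vector and confirms the ordinary-least-squares property that the residuals are orthogonal to t:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered and scaled data blocks (20 rows, 5 Xs, 2 Ys).
X0 = rng.standard_normal((20, 5))
X0 = (X0 - X0.mean(axis=0)) / X0.std(axis=0, ddof=1)
Y0 = rng.standard_normal((20, 2))
Y0 = (Y0 - Y0.mean(axis=0)) / Y0.std(axis=0, ddof=1)

# A score vector t = X0 w for an arbitrary unit weight vector w.
w = rng.standard_normal(5)
w /= np.linalg.norm(w)
t = X0 @ w

# Regress X0 and Y0 on t:  p´ = (t´t)⁻¹ t´X0,  c´ = (t´t)⁻¹ t´Y0.
p = X0.T @ t / (t @ t)          # X-loadings
c = Y0.T @ t / (t @ t)          # Y-loadings

X0_hat = np.outer(t, p)         # rank-one prediction of X0
Y0_hat = np.outer(t, c)         # rank-one prediction of Y0

# Residuals from a regression on t are orthogonal to t.
print(np.abs(t @ (X0 - X0_hat)).max())
```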

The vectors p and c are called the X- and Y-loadings, respectively.

The specific linear combination t = X0w is the one that has maximum covariance t´u with some response linear combination u = Y0q. Another characterization is that the X- and Y-weights, w and q, are proportional to the first left and right singular vectors of the covariance matrix X0´Y0 or, equivalently, to the first eigenvectors of X0´Y0Y0´X0 and Y0´X0X0´Y0, respectively.
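This singular-vector characterization is easy to check numerically. In the sketch below (NumPy on synthetic data; the sizes are arbitrary assumptions), w and q are taken as the first left and right singular vectors of X0´Y0, and the score covariance t´u then equals the largest singular value of that matrix, which no other pair of unit weight vectors can exceed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical centered data blocks (30 rows, 4 Xs, 3 Ys).
X0 = rng.standard_normal((30, 4))
X0 -= X0.mean(axis=0)
Y0 = rng.standard_normal((30, 3))
Y0 -= Y0.mean(axis=0)

# First X- and Y-weights: the first left and right singular vectors
# of the cross-product matrix X0´Y0.
U, s, Vt = np.linalg.svd(X0.T @ Y0)
w, q = U[:, 0], Vt[0, :]

t = X0 @ w                      # X-score
u = Y0 @ q                      # Y-score

# t´u equals the largest singular value of X0´Y0.
print(t @ u, s[0])
```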

This accounts for how the first PLS factor is extracted. The second factor is extracted in the same way by replacing X0 and Y0 with the X- and Y-residuals from the first factor:

X1 = X0 − X̂0

Y1 = Y0 − Ŷ0

These residuals are also called the deflated X and Y blocks. The process of extracting a score vector and deflating the data matrices is repeated for as many extracted factors as are desired.
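The full extract-and-deflate cycle can be sketched as follows. This is a minimal NumPy illustration of the NIPALS scheme described above, not JMP's implementation; the data, factor count, and function name are made up for the example:

```python
import numpy as np

def nipals_pls(X, Y, n_factors):
    """Extract PLS factors one at a time, deflating both blocks."""
    X, Y = X.copy(), Y.copy()
    T, W, P, C = [], [], [], []
    for _ in range(n_factors):
        # Weight vector: first left singular vector of the cross-product.
        w = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]
        t = X @ w                       # score vector
        p = X.T @ t / (t @ t)           # X-loadings
        c = Y.T @ t / (t @ t)           # Y-loadings
        X = X - np.outer(t, p)          # deflate the X block
        Y = Y - np.outer(t, c)          # deflate the Y block
        T.append(t); W.append(w); P.append(p); C.append(c)
    return tuple(np.column_stack(M) for M in (T, W, P, C))

rng = np.random.default_rng(2)
X0 = rng.standard_normal((25, 6)); X0 -= X0.mean(axis=0)
Y0 = rng.standard_normal((25, 2)); Y0 -= Y0.mean(axis=0)
T, W, P, C = nipals_pls(X0, Y0, 3)

# Because each deflated block is orthogonal to the earlier scores,
# successive NIPALS score vectors are mutually orthogonal.
G = T.T @ T
print(np.abs(G - np.diag(np.diag(G))).max())
```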

The SIMPLS algorithm was developed to optimize a statistical criterion: it finds score vectors that maximize the covariance between linear combinations of Xs and Ys, subject to the requirement that the X-scores are orthogonal. Unlike NIPALS, where the matrices X0 and Y0 are deflated, SIMPLS deflates the cross-product matrix X0´Y0.

In the case of a single Y variable, these two algorithms are equivalent. However, for multivariate Y, the models differ. SIMPLS was suggested by de Jong (1993).
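A minimal sketch of SIMPLS-style extraction, under the deflation scheme of de Jong (1993) as commonly stated (the variable names and sizes are assumptions for the example): the cross-product matrix S = X0´Y0 is deflated after each factor by projecting out the Gram-Schmidt basis of the loadings, and the resulting X-scores come out orthogonal:

```python
import numpy as np

def simpls_scores(X, Y, n_factors):
    """Sketch of SIMPLS: deflate S = X´Y rather than the data blocks."""
    S = X.T @ Y
    T = np.empty((X.shape[0], n_factors))
    V = np.empty((X.shape[1], 0))       # orthonormal basis of past loadings
    for a in range(n_factors):
        # X-weight: first left singular vector of the deflated S.
        r = np.linalg.svd(S, full_matrices=False)[0][:, 0]
        t = X @ r
        t /= np.linalg.norm(t)          # normalized score vector
        p = X.T @ t                     # X-loadings for this factor
        v = p - V @ (V.T @ p)           # orthogonalize against past loadings
        v /= np.linalg.norm(v)
        V = np.column_stack([V, v])
        S = S - np.outer(v, v @ S)      # deflate the cross-product matrix
        T[:, a] = t
    return T

rng = np.random.default_rng(3)
X0 = rng.standard_normal((30, 5)); X0 -= X0.mean(axis=0)
Y0 = rng.standard_normal((30, 3)); Y0 -= Y0.mean(axis=0)
T = simpls_scores(X0, Y0, 3)

# The X-scores satisfy the SIMPLS orthogonality requirement.
print(np.abs(T.T @ T - np.eye(3)).max())
```

Note that the data blocks X0 and Y0 are never modified here; only S shrinks, which is the structural difference from the NIPALS loop above.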
