Modeling and Multivariate Methods - SAS

Appendix A Statistical Details

The Usual Assumptions

Assumed Model

Most statistics are based on the assumption that the model is correct. To the extent that your model may not be correct, you must temper your confidence in the statistical reports that result from it.

Relative Significance

Many statistical tests do not evaluate the model in an absolute sense. A significant test statistic may say only that the model fits better than some reduced model, such as the mean. The model can appear to fit the data yet describe the underlying physical process poorly.
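This distinction can be illustrated with a small simulation. The sketch below is hypothetical and not part of the manual (it uses NumPy rather than the Fit Model platform): a straight line fit to quadratic data tests as overwhelmingly significant against the mean-only model, even though its functional form is wrong, which the curved residual pattern reveals.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = x**2 + rng.normal(0, 1, x.size)  # the true model is quadratic

# Least-squares straight line: y = b0 + b1*x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Compare the line against the reduced (mean-only) model
sse_line = np.sum(resid**2)
sse_mean = np.sum((y - y.mean()) ** 2)
f_stat = (sse_mean - sse_line) / (sse_line / (x.size - 2))

# The F statistic is huge -- the line "beats the mean" decisively --
# yet the residuals still carry strong systematic curvature, showing
# that the fitted line mis-describes the underlying process.
curvature = np.corrcoef(resid, (x - x.mean()) ** 2)[0, 1]
print(f_stat, curvature)
```

A large F here proves only relative improvement over the mean, not that the model is right; the residual diagnostics carry that second message.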

Multiple Inferences

Often the value of the statistical results is not that you believe them directly, but rather that they provide a key to some discovery. To confirm the discovery, you may need to conduct further studies. Otherwise, you might just be sifting through the data.

For instance, if you conduct enough analyses, you can find effects significant at the 5% level in about five percent of your studies by chance alone, even if the factors have no predictive value. Similarly, to the extent that you use your data to shape your model (instead of testing the correct model for the data), you corrupt the significance levels in your report. Random error then influences your model selection and leads you to believe that your model is better than it really is.
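The five-percent-by-chance point is easy to verify with a simulation. This is a hypothetical sketch, not taken from the manual: it repeatedly tests the correlation between a pure-noise predictor and an independent response, and counts how often the test comes out "significant" at the 0.05 level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n = 2000, 30
hits = 0
for _ in range(n_studies):
    x = rng.normal(size=n)  # predictor with no real effect
    y = rng.normal(size=n)  # response independent of x
    r, p = stats.pearsonr(x, y)
    hits += p < 0.05

rate = hits / n_studies
print(rate)  # roughly 0.05: false positives by chance alone
```

Each individual "significant" result looks just as convincing as a real one, which is why confirmation in a further study matters.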

Validity Assessment

Techniques and patterns to look for in assessing the validity of the model include the following:

• Model validity can be checked against a saturated version of the factors with Lack of Fit tests. The Fit Model platform presents these tests automatically if you have replicated x data in a nonsaturated model.

• You can check the distribution assumptions for a continuous response by looking at plots of residuals and studentized residuals from the Fit Model platform. Or, use the Save commands in the platform popup menu to save the residuals in data table columns, and then use Analyze > Distribution on those columns to look at a histogram with its normal curve and the normal quantile plot. The residuals are not quite independent, but you can informally identify severely non-normal distributions.

• The best all-around diagnostic tool for continuous responses is the leverage plot, because it shows the influence of each point on each hypothesis test. If you suspect that there is a mistaken value in your data, this plot helps determine whether a statistical test is heavily influenced by a single point.

• It is a good idea to scan your data for outlying values and examine them to see if they are valid observations. You can spot univariate outliers in the Distribution platform reports and plots. Bivariate outliers appear in Fit Y by X scatterplots and in the Multivariate scatterplot matrix. You can see trivariate outliers in a three-dimensional plot produced by Graph > Scatterplot 3D. Higher-dimensional outliers can be found with Principal Components or Scatterplot 3D, and with Mahalanobis and jackknifed distances computed and plotted in the Multivariate platform.
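The Mahalanobis distance mentioned above measures how far each observation lies from the multivariate center, relative to the data's covariance structure. The sketch below is a hypothetical illustration of the idea, not the Multivariate platform's own computation: it plants one high-dimensional outlier in otherwise well-behaved data and shows that its distance stands out.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0, 0, 0], np.eye(3), size=100)
X[0] = [6, 6, 6]  # planted high-dimensional outlier

mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mu

# Squared Mahalanobis distance for every row: d_i^2 = (x_i - mu)' S^-1 (x_i - mu)
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)
print(np.argmax(d2))  # index of the most extreme observation
```

Unlike per-variable scans, this distance accounts for correlation among the variables, so it can flag points that look unremarkable in any single column.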
