
Modeling and Multivariate Methods - SAS


214 Performing Logistic Regression on Nominal and Ordinal Responses Chapter 7

Confusion Matrix

A confusion matrix is a two-way classification of the actual response levels and the predicted response levels. For a good model, the predicted response levels should match the actual response levels, so most observations fall on the diagonal of the matrix. The confusion matrix therefore shows how well the predicted responses align with the actual responses.

Profiler

Opens the prediction profiler, which shows the fitted values for a specified response probability as the values of the factors in the model are changed. This feature is available for both nominal and ordinal responses. For detailed information about profiling features, refer to the “Visualizing, Optimizing, and Simulating Response Surfaces” chapter on page 553.
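As an illustration only, the following Python sketch tabulates a confusion matrix from lists of actual and predicted response levels, with actual levels as rows and predicted levels as columns. The function name `confusion_matrix` and the six example observations are hypothetical: they reuse the m and x levels from the detergent example but are not taken from Detergent.jmp, and this is not a description of how JMP builds its report.

```python
def confusion_matrix(actual, predicted, levels=None):
    """Cross-tabulate actual (rows) against predicted (columns) levels.

    Returns (levels, matrix) where matrix[i][j] counts observations whose
    actual level is levels[i] and whose predicted level is levels[j].
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if levels is None:
        levels = sorted(set(actual) | set(predicted))
    index = {level: i for i, level in enumerate(levels)}
    matrix = [[0] * len(levels) for _ in levels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return levels, matrix


# Hypothetical actual and predicted brands for six observations.
actual = ["m", "m", "x", "x", "m", "x"]
predicted = ["m", "x", "x", "x", "m", "m"]
levels, cm = confusion_matrix(actual, predicted)
for level, row in zip(levels, cm):
    print(level, row)  # prints "m [2, 1]" then "x [1, 2]"

# Correct predictions lie on the diagonal; their share is the accuracy.
correct = sum(cm[i][i] for i in range(len(levels)))
print(correct / len(actual))
```

Here four of the six hypothetical observations lie on the diagonal; the two off-diagonal counts are the misclassified rows.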

Validation

Validation is the process of using part of a data set to estimate model parameters, and using the other part to assess the predictive ability of the model.

• The training set is the part that estimates model parameters.
• The validation set is the part that assesses or validates the predictive ability of the model.
• The test set provides a final, independent assessment of the model’s predictive ability. The test set is available only when using a validation column.

The training, validation, and test sets are created by splitting the original data into parts. This is done by specifying a validation column in the Fit Model launch window.

The validation column’s values determine how the data is split, and which validation method is used:
• If the column has two distinct values, then training and validation sets are created.
• If the column has three distinct values, then training, validation, and test sets are created.
• If the column has more than three distinct values, or only one, then no validation is performed.
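A minimal Python sketch of the splitting rule above, assuming the validation column is given as a list with one value per row. The mapping of values to roles is an assumption of this sketch, not a documented JMP rule: the sorted distinct values are taken to mean training, validation, and test in that order (for example 0, 1, 2). The function name `split_by_validation_column` is hypothetical.

```python
def split_by_validation_column(validation):
    """Group row indices by the role implied by a validation column.

    Assumption for this sketch: the sorted distinct values map, in order,
    to "training", "validation", and "test" (e.g. 0, 1, 2). Returns None
    when the column has only one distinct value or more than three, in
    which case no validation is performed.
    """
    values = sorted(set(validation))
    if len(values) not in (2, 3):
        return None
    roles = ("training", "validation", "test")[: len(values)]
    role_of = dict(zip(values, roles))
    sets = {role: [] for role in roles}
    for row, value in enumerate(validation):
        sets[role_of[value]].append(row)
    return sets


print(split_by_validation_column([0, 1, 0, 2, 1]))  # training, validation, and test
print(split_by_validation_column([0, 1, 1, 0]))     # training and validation only
print(split_by_validation_column([0, 0, 0]))        # None: no validation
```

Returning None rather than raising an error mirrors the rule that an unusable validation column simply means no validation is performed.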

When validation is used, model fit statistics are given for the training, validation, and test sets.
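To make the per-set reporting concrete, here is a hedged Python sketch that computes one simple fit statistic, the misclassification rate, separately for each set. It assumes a hypothetical 0/1/2 coding of the validation column for training, validation, and test rows; the function name and the example rows are illustrative, and the statistics JMP actually reports may differ.

```python
from collections import defaultdict

# Assumed coding of the validation column, for this sketch only.
ROLE_NAMES = {0: "training", 1: "validation", 2: "test"}


def misclassification_by_set(actual, predicted, validation):
    """Return {set name: fraction of rows whose predicted level differs
    from the actual level}, computed separately for each set present."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for a, p, v in zip(actual, predicted, validation):
        role = ROLE_NAMES[v]
        totals[role] += 1
        if a != p:
            errors[role] += 1
    return {role: errors[role] / totals[role] for role in totals}


# Hypothetical rows: three training, two validation, one test.
actual = ["m", "m", "x", "x", "m", "x"]
predicted = ["m", "x", "x", "x", "m", "m"]
validation = [0, 0, 0, 1, 1, 2]
print(misclassification_by_set(actual, predicted, validation))
```

Computing the same statistic on each set is what makes the comparison meaningful: a model whose training error is much lower than its validation error is likely overfitting.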

Example of a Nominal Logistic Model

A market research study was undertaken to evaluate preference for a brand of detergent (Ries and Smith 1963). The results are in the Detergent.jmp sample data table. The model is defined by the following:

• the response variable, brand, with values m and x
• an effect called softness (water softness) with values soft, medium, and hard
• an effect called previous use with values yes and no
• an effect called temperature with values high and low
• a count variable, count, which gives the frequency counts for each combination of effect categories
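Because count holds frequencies, each row stands for count identical observations. The Python sketch below illustrates that idea under stated assumptions: it uses only the softness effect and hypothetical counts (not the Detergent.jmp values), weights each row by its count to get the share of brand m within each softness level, and checks that this matches replicating each row count times. The function names `weighted_share` and `expand` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (softness, brand, count) rows for illustration only;
# these are NOT the values in Detergent.jmp.
ROWS = [
    ("soft", "m", 3), ("soft", "x", 1),
    ("hard", "m", 1), ("hard", "x", 2),
]


def weighted_share(rows, level):
    """Share of responses equal to `level` within each softness value,
    weighting every row by its count (frequency) column."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for softness, brand, count in rows:
        totals[softness] += count
        if brand == level:
            hits[softness] += count
    return {s: hits[s] / totals[s] for s in totals}


def expand(rows):
    """Replicate each row `count` times, giving one row per observation
    with a count of 1."""
    return [(s, b, 1) for s, b, c in rows for _ in range(c)]


print(weighted_share(ROWS, "m"))          # frequency-weighted shares
print(weighted_share(expand(ROWS), "m"))  # same shares from replicated rows
```

Storing one row per combination with a count keeps the table small; the sketch shows why that loses no information for summaries like these.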
