Modeling and Multivariate Methods - SAS

Chapter 7 Performing Logistic Regression on Nominal and Ordinal Responses

The Logistic Fit Report

Prob>ChiSq is the probability of obtaining a greater Chi-square value by chance alone if the specified model fits no better than the model that includes only intercepts.

RSquare (U) shows the R², which is the ratio of the Difference to the Reduced negative log-likelihood values. It is sometimes referred to as U, the uncertainty coefficient. RSquare ranges from zero for no improvement to 1 for a perfect fit. A Nominal model rarely has a high RSquare, and it has an RSquare of 1 only when all the probabilities of the events that occur are 1.

AICc is the corrected Akaike Information Criterion.

BIC is the Bayesian Information Criterion.

Observations (or Sum Wgts) is the total number of observations in the sample.

Measure gives several measures of fit to assess model accuracy.<br />

Entropy RSquare is the same as RSquare (U) explained above.

Generalized RSquare is a generalization of the RSquare measure that simplifies to the regular RSquare for continuous normal responses. It is similar to the Entropy RSquare, but instead of using the log-likelihood, it uses the likelihood raised to the 2/n power.

Mean -Log p is the average of -log(p), where p is the fitted probability associated with the event that occurred.

RMSE is the root mean square error, where the differences are between the response and p (the fitted probability for the event that actually occurred).

Mean Abs Dev is the average of the absolute values of the differences between the response and p (the fitted probability for the event that actually occurred).

Misclassification Rate is the proportion of observations for which the response category with the highest fitted probability is not the observed category.

For Entropy RSquare and Generalized RSquare, values closer to 1 indicate a better fit. For Mean -Log p, RMSE, Mean Abs Dev, and Misclassification Rate, smaller values indicate a better fit.
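
As a rough illustration of the measures defined above, the following Python sketch computes Mean -Log p, RMSE, Mean Abs Dev, and Misclassification Rate from a set of fitted probabilities. It is not JMP code. The function name `fit_measures`, the input layout, and the sample data are hypothetical, and the sketch assumes the response is scored as 1 for the category that occurred, so that each difference is 1 - p.

```python
import math

def fit_measures(probs, observed):
    """Sketch of four fit measures.

    probs: one dict per row, mapping each category to its fitted probability.
    observed: the observed category for each row.
    """
    n = len(observed)
    # p is the fitted probability of the category that actually occurred.
    p_obs = [row[y] for row, y in zip(probs, observed)]
    mean_neg_log_p = sum(-math.log(p) for p in p_obs) / n
    # Assumption: the response is 1 for the observed event, so the error is 1 - p.
    rmse = math.sqrt(sum((1.0 - p) ** 2 for p in p_obs) / n)
    mean_abs_dev = sum(abs(1.0 - p) for p in p_obs) / n
    # A row is misclassified when its most probable category is not the observed one.
    wrong = sum(1 for row, y in zip(probs, observed) if max(row, key=row.get) != y)
    return {
        "mean_neg_log_p": mean_neg_log_p,
        "rmse": rmse,
        "mean_abs_dev": mean_abs_dev,
        "misclassification_rate": wrong / n,
    }

# Made-up fitted probabilities for a two-level response.
example_probs = [
    {"a": 0.8, "b": 0.2},
    {"a": 0.4, "b": 0.6},
    {"a": 0.7, "b": 0.3},
    {"a": 0.9, "b": 0.1},
]
example_observed = ["a", "b", "b", "a"]
measures = fit_measures(example_probs, example_observed)
for name, value in measures.items():
    print(name, round(value, 4))
```

In this made-up sample only the third row is misclassified, because category "a" has the highest fitted probability there while "b" was observed.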

After fitting the full model with two regressors in the ingots example, the –LogLikelihood on the Difference line shows a reduction of 5.82 from the Reduced –LogLikelihood of 53.49. The ratio of Difference to Reduced (the proportion of the uncertainty attributed to the fit) is 10.9% and is reported as the RSquare (U).
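
The arithmetic behind this ratio can be checked with a few lines of Python. This is only a sketch using the two values quoted above; the function name `rsquare_u` is hypothetical, and the Full value is derived here as Reduced minus Difference rather than read from a report.

```python
def rsquare_u(difference, reduced):
    """Ratio of the Difference to the Reduced negative log-likelihood."""
    return difference / reduced

difference = 5.82   # -LogLikelihood on the Difference line
reduced = 53.49     # -LogLikelihood of the intercept-only model
full = reduced - difference  # derived value, not quoted in the text

print(round(rsquare_u(difference, reduced), 3))  # 0.109
```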

To test that the regressors as a whole are significant (the Whole Model test), a Chi-square statistic is computed by taking twice the difference in negative log-likelihoods between the fitted model and the reduced model that has only intercepts. In the ingots example, this Chi-square value is 2 × 5.82 = 11.64, and is significant at 0.003.
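
The Whole Model calculation can be reproduced as a short Python sketch. It assumes the test has 2 degrees of freedom (one per regressor for a two-level response), in which case the upper-tail Chi-square probability has the closed form exp(-x/2). The function names are hypothetical and this is not how JMP itself computes the value.

```python
import math

def whole_model_chisq(neg_loglik_reduced, neg_loglik_full):
    """Twice the difference in negative log-likelihoods."""
    return 2.0 * (neg_loglik_reduced - neg_loglik_full)

def chisq_upper_tail_2df(x):
    """Upper-tail probability of a Chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)

reduced = 53.49
full = reduced - 5.82  # Full = Reduced - Difference
chisq = whole_model_chisq(reduced, full)
p_value = chisq_upper_tail_2df(chisq)  # assumes 2 degrees of freedom

print(round(chisq, 2), round(p_value, 3))  # 11.64 0.003
```

For other degrees of freedom the closed form no longer applies, and a general Chi-square distribution function would be needed instead.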

Lack of Fit Test (Goodness of Fit)<br />

The next question that JMP addresses is whether there is enough information using the variables in the current model or whether more complex terms need to be added. The Lack of Fit test, sometimes called a Goodness of Fit test, provides this information. It calculates a pure-error negative log-likelihood by constructing categories for every combination of the regressor values in the data (Saturated line in the Lack Of Fit table), and it tests whether this log-likelihood is significantly better than that of the Fitted model.
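
The pure-error calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not JMP's implementation: the function names and the sample data are hypothetical, each distinct combination of regressor values is given its own observed category proportions, and the test statistic is taken to be twice the difference in negative log-likelihoods, by analogy with the Whole Model test.

```python
import math
from collections import Counter, defaultdict

def saturated_neg_loglik(x_rows, y):
    """Pure-error -log-likelihood.

    Each distinct combination of regressor values forms its own group, and the
    fitted probability of a category is its observed proportion in that group.
    """
    groups = defaultdict(Counter)
    for x, label in zip(x_rows, y):
        groups[tuple(x)][label] += 1
    nll = 0.0
    for counts in groups.values():
        total = sum(counts.values())
        for k in counts.values():
            nll -= k * math.log(k / total)
    return nll

def lack_of_fit_chisq(neg_loglik_fitted, neg_loglik_saturated):
    """Assumed form: twice the difference in negative log-likelihoods."""
    return 2.0 * (neg_loglik_fitted - neg_loglik_saturated)

# Made-up data: one regressor with two distinct values, two-level response.
x_rows = [(1,), (1,), (2,), (2,)]
y = ["a", "b", "a", "a"]
saturated = saturated_neg_loglik(x_rows, y)
print(round(saturated, 4))
```

When every row has a unique combination of regressor values, each group contains one observation and the saturated negative log-likelihood is zero.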
