14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 10: Creating Neural Networks

Overview of Neural Networks

Table 10.2 Description of the Model Launch Dialog (Continued)

Go: Fits the neural network model and shows the model reports.

After you click Go to fit a model, you can reopen the Model Launch Dialog and change the settings to fit another model.

Validation Method

Neural networks are very flexible models and have a tendency to overfit data. When that happens, the model predicts the fitted data very well, but predicts future observations poorly. To mitigate overfitting, the Neural platform does the following:

• applies a penalty on the model parameters

• uses an independent data set to assess the predictive power of the model
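
The first point can be illustrated with a short sketch. This passage does not say which penalty the Neural platform applies, so the squared penalty, the function name `penalized_loss`, and the parameter `penalty_weight` below are assumptions chosen for illustration only.

```python
import numpy as np

def penalized_loss(residuals, params, penalty_weight):
    """Sum of squared errors plus a penalty on the size of the parameters.

    The squared (L2) penalty is an assumed example; other penalty forms
    follow the same pattern of fit term + weight * penalty term.
    """
    fit_term = np.sum(np.asarray(residuals, dtype=float) ** 2)
    penalty_term = penalty_weight * np.sum(np.asarray(params, dtype=float) ** 2)
    return float(fit_term + penalty_term)
```

A larger `penalty_weight` makes large parameter values more costly, which pulls the fit toward a simpler model and so limits overfitting.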

Validation is the process of using part of a data set to estimate model parameters, and using the other part to assess the predictive ability of the model.

• The training set is the part that estimates model parameters.

• The validation set is the part that estimates the optimal value of the penalty, and assesses or validates the predictive ability of the model.

• The test set is a final, independent assessment of the model’s predictive ability. The test set is available only when using a validation column. See Table 10.3.
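
The division of labor among the three sets can be sketched as follows. This is a minimal illustration, not the Neural platform's procedure: a penalized linear model stands in for a neural network, mean squared error stands in for the validation statistic, and every name (`fit_ridge`, `choose_penalty`, the candidate penalty values, the simulated data) is an assumption.

```python
import numpy as np

def fit_ridge(X, y, penalty):
    """Penalized least squares; a simple stand-in for a penalized neural fit."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

def mse(X, y, coef):
    """Mean squared prediction error (assumed assessment statistic)."""
    return float(np.mean((X @ coef - y) ** 2))

def choose_penalty(X_train, y_train, X_valid, y_valid, candidates):
    """Fit on the training set; pick the penalty with the lowest validation error."""
    best = None
    for penalty in candidates:
        coef = fit_ridge(X_train, y_train, penalty)
        err = mse(X_valid, y_valid, coef)
        if best is None or err < best[2]:
            best = (penalty, coef, err)
    return best

# Simulated data, split into training, validation, and test parts.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=90)
X_train, y_train = X[:50], y[:50]
X_valid, y_valid = X[50:70], y[50:70]
X_test, y_test = X[70:], y[70:]

candidates = [0.0, 0.01, 0.1, 1.0, 10.0]
best_penalty, best_coef, valid_error = choose_penalty(
    X_train, y_train, X_valid, y_valid, candidates
)

# The test set plays no part in fitting or in choosing the penalty,
# so its error is an independent assessment of the final model.
test_error = mse(X_test, y_test, best_coef)
```

Because the validation set is used to choose the penalty, its error is slightly optimistic for the chosen model; that is the reason a separate test set gives the final assessment.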

The training, validation, and test sets are created by subsetting the original data into parts. Table 10.3 describes several methods for subsetting a data set.

Table 10.3 Validation Methods

Excluded Rows: Uses row states to subset the data. Rows that are unexcluded are used as the training set, and excluded rows are used as the validation set. For more information about using row states and how to exclude rows, see Using JMP.

Holdback: Randomly divides the original data into the training and validation sets. You can specify the proportion of the original data to use as the validation set (holdback).

KFold: Divides the original data into K subsets. In turn, each of the K sets is used to validate the model fit on the rest of the data, fitting a total of K models. The model giving the best validation statistic is chosen as the final model. This method is best for small data sets, because it makes efficient use of limited amounts of data.
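
The Holdback and KFold methods in Table 10.3 can be sketched in a few lines. This is an illustration under stated assumptions rather than JMP's implementation: the function names are hypothetical, the model is an ordinary least squares fit standing in for a neural network, and mean squared error (lower is better) is assumed as the validation statistic.

```python
import numpy as np

def holdback_split(n_rows, holdback, rng):
    """Randomly assign a proportion `holdback` of the rows to validation."""
    order = rng.permutation(n_rows)
    n_valid = int(round(holdback * n_rows))
    return order[n_valid:], order[:n_valid]  # (training rows, validation rows)

def kfold_splits(n_rows, k, rng):
    """Yield K (training rows, validation rows) pairs, one per fold."""
    folds = np.array_split(rng.permutation(n_rows), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def fit_least_squares(X, y):
    """Stand-in model: ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def mean_squared_error(coef, X, y):
    """Assumed validation statistic; lower is better."""
    return float(np.mean((X @ coef - y) ** 2))

def kfold_select(X, y, k, rng):
    """Fit K models and keep the one with the best validation statistic."""
    best = None
    for train, valid in kfold_splits(len(y), k, rng):
        coef = fit_least_squares(X[train], y[train])
        score = mean_squared_error(coef, X[valid], y[valid])
        if best is None or score < best[1]:
            best = (coef, score)
    return best
```

For example, `holdback_split(10, 0.3, np.random.default_rng(1))` puts 3 rows in the validation set and 7 in the training set. With KFold, every row serves once for validation and K - 1 times for training, which is why the method makes efficient use of a small data set.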
