Indeed, for simple Gaussian models (ordinary linear regression), deviance and R² contain similar information.

There is an accessible way to use deviance to navigate between underfitting and overfitting. It's usually known as CROSS-VALIDATION. There are many varieties of cross-validation, but here's the premise. Suppose we have two samples of the same size. The first sample is usually called the training sample, and the second the testing sample. We use the training sample to fit a model, to estimate its parameters. Then we use the testing sample to evaluate the model's performance. This means using the estimates based on the training sample to compute the deviance for the observations in the testing sample. Once you've done this train-test procedure for all the models you wish to compare, you can use their relative test-sample deviance values to do so.

Cross-validation is common and useful. But often there aren't two samples, or rather we'd like to use all of the available samples to fit the model, so we can use the full strength of the evidence. But what we can do is imagine what would happen if we did have another sample to predict. In other words, we can make a meta-model of forecasting. Then we can ask the meta-model to estimate deviance in a testing sample, from theory alone.

This is the strategy behind the various INFORMATION CRITERIA, such as the renowned AKAIKE INFORMATION CRITERION, abbreviated AIC.⁸⁰ AIC attempts to estimate the test-sample deviance of a model. Here's the AIC gambit.

(1) Suppose there's a training sample of size N.
(2) Fit a model (not necessarily the data-generating model) to the training sample, and compute the deviance on the training sample. Call this deviance D_train.
(3) Suppose another sample of size N from the same process. This is the test sample.
(4) Compute the deviance on the test sample. This means using the MAP estimates from step (2) to compute the deviance for the data in the test sample. Call this deviance D_test.
(5) Compute the difference D_test − D_train. This difference will usually be positive, because the model will tend to perform worse (have a higher deviance) in testing than in training.
(6) Finally, imagine repeating this procedure many times. The average difference then tells us the expected overfitting, how much the training deviance underestimates the divergence of the model.

I call the above logic a gambit because it cannot provide guarantees. But it can provide valuable advice. It turns out that this gambit leads to an astonishingly simple formula for the expected test-sample deviance:

AIC = D_train + 2k ≈ E(D_test)

where k is the number of parameters in the model. The term 2k is often called the penalty term. It is a measure of expected overfitting.

This result depends upon weak priors, a Gaussian posterior distribution, and a number of parameters k much less than the number of cases N. So it's appropriate for ordinary linear regression, and it even works quite well for many non-Gaussian regressions (generalized linear models, GLMs) that we'll examine later in this book.
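To make the gambit concrete, here is a minimal base-R sketch of steps (1) through (6) for a simple one-predictor Gaussian regression. The function and variable names (deviance_gaussian, one_trial, and the true slope of 0.5) are illustrative assumptions for this simulation, not anything from the book's code; the point is only to show that the average gap D_test − D_train lands near 2k, so that D_train + 2k tracks the expected test-sample deviance.

```r
set.seed(1)

N <- 50   # cases per sample
k <- 3    # parameters: intercept, slope, sigma

# Deviance of a Gaussian model: -2 times the summed log-likelihood
deviance_gaussian <- function(y, mu, sigma) {
  -2 * sum(dnorm(y, mean = mu, sd = sigma, log = TRUE))
}

one_trial <- function() {
  # (1) training sample from a simple linear process (illustrative choice)
  x_train <- rnorm(N)
  y_train <- rnorm(N, mean = 0.5 * x_train, sd = 1)

  # (2) fit by ordinary least squares and score the training sample
  fit     <- lm(y_train ~ x_train)
  mu_tr   <- fitted(fit)
  sigma   <- sqrt(mean(residuals(fit)^2))   # MLE of sigma
  D_train <- deviance_gaussian(y_train, mu_tr, sigma)

  # (3) an independent test sample of the same size
  x_test <- rnorm(N)
  y_test <- rnorm(N, mean = 0.5 * x_test, sd = 1)

  # (4) score the test sample with the training-sample estimates
  mu_te  <- coef(fit)[1] + coef(fit)[2] * x_test
  D_test <- deviance_gaussian(y_test, mu_te, sigma)

  # (5) return both deviances; the difference is the overfitting for this trial
  c(D_train = D_train, D_test = D_test)
}

# (6) repeat many times and average
sims <- replicate(1e4, one_trial())

mean(sims["D_test", ] - sims["D_train", ])  # roughly 2k = 6
mean(sims["D_train", ]) + 2 * k             # compare to the next line
mean(sims["D_test", ])                      # average test-sample deviance
```

With N = 50 and k = 3, the average difference should come out close to 6 (a bit above it in small samples), so the training deviance plus 2k is a reasonable stand-in for the test-sample deviance we never observed.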
