we care about prediction, not fitting. So we require an estimate of the deviance of a model when applied to new data. Information criteria like AIC and DIC accomplish this by building a model of forecasting. After establishing AIC and DIC, the chapter presents examples of their use in model selection, comparison, and averaging.

Rethinking: Stargazing. The most common form of model selection among practicing scientists is to search for a model in which every coefficient is statistically significant. Statisticians sometimes call this STARGAZING, as it is embodied by scanning for asterisks (**) trailing after estimates. A colleague of mine once called this approach the "Space Odyssey," in honor of A. C. Clarke's novel and film. The model that is full of stars, the thinking goes, is best.

But such a model is not best. Whatever you think about null hypothesis significance testing in general, using it to select among structurally different models is a mistake—p-values are not designed to help you navigate between underfitting and overfitting. As you'll see once you start using AIC and related measures, it is true that predictor variables that do improve prediction are not always statistically significant. It is also possible for variables that are statistically significant to do nothing useful for prediction. And since the conventional 5% threshold is purely conventional, we shouldn't expect it to optimize anything.

Rethinking: Is AIC Bayesian? AIC is not usually thought of as a Bayesian tool. There are both historical and statistical reasons for this. Historically, AIC was originally derived without reference to Bayesian probability. Statistically, AIC uses MAP estimates instead of the entire posterior, and it requires flat priors. So it doesn't look particularly Bayesian. Reinforcing this impression is the existence of another model comparison metric, the BAYESIAN INFORMATION CRITERION (BIC).
However, BIC also requires flat priors and MAP estimates, although it's not actually an "information criterion." Regardless, AIC has a clear and pragmatic interpretation under Bayesian probability, and Akaike and others have long argued for alternative Bayesian justifications of the procedure.71 And as you'll see later in the book, more obviously Bayesian information criteria like DIC and WAIC provide almost exactly the same results as AIC, when AIC's assumptions are met. In this light, we can fairly regard AIC as a special limit of a Bayesian criterion like WAIC, even if that isn't how AIC was originally derived.

6.1. The problem with parameters

In the previous chapter, we saw how adding variables and parameters to a model can help to reveal hidden effects and improve estimates. You also saw that adding variables can hurt, in particular when predictor variables are highly correlated with one another. But what about when the predictor variables are not highly correlated? Would it be safe to just add them all?

The answer is "no." There are two principal concerns with just adding variables. The first is that adding parameters—making the model more complex—always improves the fit of a model family to the data. By "fit" I mean any reasonable measure of how well the model can retrodict the data used to fit the model. In the context of linear Gaussian models, R² is the most common measure of this kind. Fit always improves, even when the variables you add to a model are just random numbers, with no relation to the outcome. So it's no good to choose among models using fit to the data. Second, while more complex models always fit the data better, they often predict new data worse. Models that have too many parameters tend to overfit more than do simpler models. This means that a complex model will be very sensitive to the exact sample used to fit it, leading to potentially large mistakes when future
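The claim that fit always improves is easy to verify for yourself. The following is a minimal sketch (not from the book; the simulated data and helper function are my own illustration): it fits an ordinary least squares regression, then appends columns of pure noise one at a time and recomputes R² on the training sample. Because each larger model nests the previous one, the residual sum of squares can never increase, so R² can never decrease.

```python
# Sketch: training-sample R^2 never decreases as predictors are added,
# even when the added predictors are pure random noise.
import numpy as np

rng = np.random.default_rng(7)
n = 50
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)      # outcome truly depends only on x

def r2(X, y):
    """R^2 of an OLS fit with an intercept column added."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

X = x.reshape(-1, 1)
scores = [r2(X, y)]
for _ in range(5):                    # append 5 noise columns, one at a time
    X = np.column_stack([X, rng.normal(size=n)])
    scores.append(r2(X, y))

# Weakly monotone: each model nests the last, so fit cannot get worse.
assert all(b >= a - 1e-12 for a, b in zip(scores, scores[1:]))
```

Note that this monotonicity holds only for retrodiction of the training sample; on new data, the noise columns would typically make predictions worse, which is exactly the overfitting problem the chapter addresses.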
