11.07.2015 Views

statisticalrethinkin..

6. MODEL SELECTION, COMPARISON, AND AVERAGING

FIGURE 6.2. Average brain volume in cubic centimeters against body mass in kilograms, for seven hominin species. What model best describes the relationship between brain size and body size? [Plot: body mass (kg) on the horizontal axis, 30 to 60; brain volume (cc) on the vertical axis, 400 to 1200. Points labeled habilis, boisei, afarensis, africanus, sapiens, ergaster, rudolfensis.]

data is not exactly like the past data. But simple models, with too few parameters, tend instead to underfit, systematically over-predicting or under-predicting the data, regardless of how well future data resemble past data. So we can't always favor either simple models or complex models.

Let's examine both of these issues in the context of a simple data example.

6.1.1. More parameters always improve fit. OVERFITTING occurs when a model learns too much from the sample. What this means is that there are both regular and irregular features in every sample. The regular features are the targets of our learning, because they generalize well or answer a question of interest. Regular features are useful, given an objective of our choice. The irregular features are instead aspects of the data that do not generalize well or mislead us.

Overfitting happens automatically, unfortunately. Here's an example. The data displayed in FIGURE 6.2 are average brain volumes and body masses for seven hominin species. [72] Let's get these data into R, so you can work with them. I'm going to build these data from direct input, rather than loading a pre-made data frame, just so you see an example of how to build a data frame from scratch.

R code 6.1
sppnames
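The listing above is cut off in this copy, so here is a minimal sketch of the same two ideas under stated assumptions: building a data frame from direct input, and seeing that fit to the sample never gets worse as parameters are added. The values are simulated stand-ins, not the hominin measurements, and the names `x`, `y`, `d`, and `r2` are chosen here for illustration only.

```r
# Simulated stand-in data: NOT the brain volume / body mass values from the text.
set.seed(6)
n <- 7                                   # same sample size as the seven species
x <- seq(30, 60, length.out = n)         # hypothetical predictor values
y <- 400 + 15 * (x - 30) + rnorm(n, sd = 100)  # hypothetical noisy outcome

# Build the data frame from scratch out of the two vectors
d <- data.frame(x = x, y = y)

# Fit polynomials of degree 1 through 5 and record R^2 for each.
# Each model nests the previous one, so R^2 on the sample cannot decrease.
r2 <- sapply(1:5, function(k) {
  fit <- lm(y ~ poly(x, k), data = d)
  summary(fit)$r.squared
})

print(round(r2, 3))
```

The R^2 values printed are non-decreasing in the polynomial degree, even though the simulated relationship is a straight line plus noise. The extra terms are fitting the irregular features of this particular sample.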
