11.07.2015 Views

statisticalrethinkin..


6. MODEL SELECTION, COMPARISON, AND AVERAGING

FIGURE 6.4. An underfit model of hominin brain volume. This model ignores any association between body mass and brain volume, producing a horizontal line of predictions. As a result, the model fits badly and (presumably) predicts badly. [Plot axes: body mass (kg), horizontal; brain volume (cc), vertical.]

relationships among the data. These summaries compress the data into a simpler form, although with loss of information ("lossy" compression) about the sample. The parameters can then be used to generate new data, effectively decompressing the data.

When a model has a parameter to correspond to each datum, such as m6.6, then there is actually no compression. The model just encodes the raw data in a different form, using parameters instead. As a result, we learn nothing about the data from such a model. Learning about the data requires using a simpler model that achieves some compression, but not too much. This view of model selection is often known as MINIMUM DESCRIPTION LENGTH (MDL). [73]

6.1.2. Too few parameters hurt too. The overfit polynomial models manage to fit the data extremely well, but they suffer for this within-sample accuracy by making nonsensical out-of-sample predictions. In contrast, UNDERFITTING produces models that are inaccurate both within and out of sample. They have learned too little, failing to recover regular features of the sample.

For example, consider this model of brain volume:

    v_i ∼ Normal(µ, σ)
    µ = α

There are no predictor variables here. Just the intercept α. Fit this model with:

R code 6.6
m6.7
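To see why an intercept-only model produces a horizontal line, here is a minimal sketch in Python. It is not the book's R code. The data values are hypothetical placeholders, not the hominin sample, and the helper names `fit_intercept_only`, `predict`, and `r_squared` are invented for illustration. It relies on one standard fact: with µ = α and no predictors, the least-squares estimate of α is the sample mean of the outcome.

```python
from statistics import fmean

# Hypothetical illustrative values, NOT the book's hominin data.
body_mass_kg = [35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
brain_volume_cc = [450.0, 500.0, 620.0, 700.0, 900.0, 1300.0]


def fit_intercept_only(outcomes):
    """Least-squares estimate of alpha for the model mu = alpha."""
    return fmean(outcomes)


def predict(alpha, predictor_values):
    """The model has no predictor term, so every prediction equals alpha."""
    return [alpha for _ in predictor_values]


def r_squared(outcomes, predictions):
    """Proportion of outcome variation explained by the predictions."""
    mean_outcome = fmean(outcomes)
    ss_residual = sum((y - p) ** 2 for y, p in zip(outcomes, predictions))
    ss_total = sum((y - mean_outcome) ** 2 for y in outcomes)
    return 1.0 - ss_residual / ss_total


alpha_hat = fit_intercept_only(brain_volume_cc)
predictions = predict(alpha_hat, body_mass_kg)
fit_quality = r_squared(brain_volume_cc, predictions)

print("alpha estimate:", alpha_hat)
print("predictions:", predictions)
print("R^2:", fit_quality)
```

Every prediction is the same number whatever the body mass, which is the horizontal line in Figure 6.4. Because the model explains none of the variation around the mean, the R² computed this way is zero: the within-sample fit is already poor, before any out-of-sample prediction is attempted.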
