11.07.2015 Views

statisticalrethinkin..


6.1. THE PROBLEM WITH PARAMETERS

[Figure 6.5: two panels, (a) and (b). Horizontal axis: body mass (kg). Vertical axis: brain volume (cc).]

FIGURE 6.5. Underfitting and overfitting as under-sensitivity and over-sensitivity to sample. In both plots, a regression is fit to the seven sets of data made by dropping one row from the original data. (a) An underfit model is insensitive to the sample, changing little as individual points are dropped. (b) An overfit model is sensitive to the sample, changing dramatically as points are dropped.

You can see the truth of this in FIGURE 6.5. In both plots in the figure, what I've done is drop each row of the data, one at a time, and refit the model. So each line plotted in (a) is a first-degree polynomial fit to one of the seven possible sets of data constructed from dropping one row. The curves in (b) are instead different fifth-order polynomials fit to the same seven sets of data. Notice that the straight lines hardly vary, while the curves fly about wildly. This is a general contrast between underfit and overfit models: sensitivity to the exact composition of the sample used to fit the model.

Overthinking: Dropping rows. The calculations needed to produce FIGURE 6.5 are made easy by a trick of R's index notation. To drop a row i from a data frame d, just use:

d.new
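
The drop-one-row-and-refit procedure described above can be sketched in a few lines. This is only a minimal illustration, written in Python rather than R, and it makes no attempt to complete the truncated R expression. The seven data points are made up for the example and are not the values behind FIGURE 6.5. The helper names (drop_row, fit_line, fit_interpolant, max_spread) are hypothetical, and so is the choice to summarize "flying about" as the largest gap between predictions. One simplification is worth naming: a fifth-degree polynomial fit to six points with distinct x values passes through all six exactly, so the sketch builds it by Lagrange interpolation instead of calling a least-squares routine.

```python
# Hypothetical data for illustration only (NOT the values behind Figure 6.5).
mass = [34.0, 38.0, 42.0, 46.0, 50.0, 54.0, 58.0]            # body mass (kg)
brain = [440.0, 450.0, 610.0, 520.0, 750.0, 870.0, 1350.0]   # brain volume (cc)


def drop_row(values, i):
    """Return a copy of `values` without element i (the drop-one-row step)."""
    return values[:i] + values[i + 1:]


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a function x -> prediction."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def fit_interpolant(xs, ys):
    """Polynomial of degree len(xs) - 1 through every point (Lagrange form).

    With six points this coincides with the fifth-degree least-squares fit,
    because such a polynomial can reach all six points with zero residual.
    """
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def max_spread(fits, grid):
    """Largest gap between the highest and lowest prediction at any grid point."""
    return max(max(f(x) for f in fits) - min(f(x) for f in fits) for x in grid)


# Build the seven leave-one-out data sets, then refit both models to each.
n = len(mass)
loo = [(drop_row(mass, i), drop_row(brain, i)) for i in range(n)]
lines = [fit_line(xs, ys) for xs, ys in loo]
curves = [fit_interpolant(xs, ys) for xs, ys in loo]

# Compare how much the fitted predictions disagree at the observed masses.
spread_line = max_spread(lines, mass)
spread_curve = max_spread(curves, mass)
print(f"first-degree fits: max spread = {spread_line:.1f} cc")
print(f"fifth-degree fits: max spread = {spread_curve:.1f} cc")
```

Under these assumed numbers, the seven straight lines stay close together while the seven fifth-degree curves disagree far more, which is the same qualitative contrast the two panels of the figure show. The spread is measured only at the observed masses to keep the sketch short; a finer grid would give a fuller picture of how the curves move between points.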
