11.07.2015 Views

Preface to First Edition - lib

SMOOTHERS AND GENERALISED ADDITIVE MODELS

10.2.3 Variable Selection and Model Choice

Quantifying the influence of covariates on the response variable in generalised additive models does not merely relate to the problem of estimating regression coefficients but more generally calls for careful implementation of variable selection (determination of the relevant subset of covariates to enter the model) and model choice (specifying the particular form of the influence of a variable). The latter task requires choosing between linear and nonlinear modelling of covariate effects. While variable selection and model choice issues are already complicated in linear models (see Chapter 6) and generalised linear models (see Chapter 7) and still receive considerable attention in the statistical literature, they become even more challenging in generalised additive models. Here, variable selection and model choice need to provide an answer to the complicated question: Should a continuous covariate be included in the model at all and, if so, as a linear effect or as a flexible, smooth effect? Methods to deal with this problem are currently actively researched. Two general approaches can be distinguished: One can fit models using a target function incorporating a penalty term which will increase for increasingly complex models (similar to 10.2) or one can iteratively fit simple, univariate models which sum to a more complex generalised additive model. The latter approach is called boosting and requires a careful determination of the stop criterion for the iterative model fitting algorithms.
The technical details are far too complex to be sketched here, and we refer the interested reader to the review paper by Bühlmann and Hothorn (2007).

10.3 Analysis Using R

10.3.1 Olympic 1500m Times

To begin we will construct a scatterplot of winning time against the year the games were held. The R code and the resulting plot are shown in Figure 10.2. There is a very clear downward trend in the times over the years and, in addition, there is a very clear outlier, namely the winning time for 1896. We shall remove this time from the data set and now concentrate on the remaining times. First we will fit a simple linear regression to the data and plot the fit onto the scatterplot. The code and the resulting plot are shown in Figure 10.3. Clearly the linear regression model captures in general terms the downward trend in the times. Now we can add the fits given by the lowess smoother and by a cubic spline smoother; the resulting graph and the extra R code needed are shown in Figure 10.4.

Both non-parametric fits suggest some distinct departure from linearity, and clearly point to a quadratic model being more sensible than a linear model here. And fitting a parametric model that includes both a linear and a quadratic effect for year gives a prediction curve very similar to the non-parametric curves; see Figure 10.5.

Here use of the non-parametric smoothers has effectively diagnosed our

© 2010 by Taylor and Francis Group, LLC
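
The sequence of fits described in Section 10.3.1 (scatterplot, outlier removal, linear regression, lowess, cubic spline, quadratic model) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code behind Figures 10.2 to 10.5: the data frame `toy` is simulated and does not contain the real Olympic winning times, the names `toy`, `toy1900`, `fit_lin`, `fit_spl` and `fit_quad` are hypothetical, and `smooth.spline()` from the stats package is used here as a stand-in for whichever cubic spline smoother the figures actually use.

```r
# Simulated stand-in data (NOT the real winning times): a curved downward
# trend in seconds, plus an artificial outlier in the first year.
set.seed(1)
year <- seq(1896, 2004, by = 4)
time <- 215 + 0.003 * (year - 2004)^2 + rnorm(length(year), sd = 1.5)
time[1] <- time[1] + 25                      # mimic the 1896 outlier
toy <- data.frame(year = year, time = time)

# Scatterplot of all points; the outlier is visible at the far left.
plot(time ~ year, data = toy)

# Remove the outlying first observation and work with the rest.
toy1900 <- subset(toy, year >= 1900)
plot(time ~ year, data = toy1900)

# Simple linear regression, drawn onto the scatterplot.
fit_lin <- lm(time ~ year, data = toy1900)
abline(fit_lin)

# Two non-parametric fits: lowess and a cubic smoothing spline.
lines(lowess(toy1900$year, toy1900$time), lty = 2)
fit_spl <- smooth.spline(toy1900$year, toy1900$time)
lines(toy1900$year, predict(fit_spl, toy1900$year)$y, lty = 3)

# Parametric model with a linear and a quadratic effect of year.
fit_quad <- lm(time ~ year + I(year^2), data = toy1900)
lines(toy1900$year, predict(fit_quad), lty = 4)

legend("topright",
       legend = c("linear", "lowess", "cubic spline", "quadratic"),
       lty = 1:4, bty = "n")
```

On data with a curved trend like this, the dashed and dotted smoother curves bend away from the straight regression line, and the quadratic fit follows them closely, which is the diagnostic use of the smoothers the text describes.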
