an introduction to generalized linear models - GDM@FUDAN ...
2.3.5 Inference and interpretation
It is sometimes useful to think of scientific data as measurements composed of
a message, or signal, that is distorted by noise. For instance, in the example
about birthweight the ‘signal’ is the usual growth rate of babies and the ‘noise’
comes from all the genetic and environmental factors that lead to individual
variation. A goal of statistical modelling is to extract as much information
as possible about the signal. In practice, this has to be balanced against
other criteria such as simplicity. The Oxford Dictionary describes the law of
parsimony (otherwise known as Occam’s Razor) as the principle that no
more causes should be assumed than will account for the effect. Accordingly,
a simpler or more parsimonious model that describes the data adequately
is preferable to a more complicated one which leaves little of the variability
‘unexplained’. To determine a parsimonious model consistent with the data,
we test hypotheses about the parameters.

Hypothesis testing is performed in the context of model fitting by defining
a series of nested models corresponding to different hypotheses. Then the
question about whether the data support a particular hypothesis can be formulated
in terms of the adequacy of fit of the corresponding model relative
to other more complicated models. This logic is illustrated in the examples
earlier in this chapter. Chapter 5 provides a more detailed explanation of
the concepts and methods used, including the sampling distributions for the
statistics used to describe ‘goodness of fit’.

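The comparison of nested models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the group names and helper functions (`centred_sums`, `compare_nested`) are hypothetical, and the models are ordinary least-squares lines with Normal errors. The reduced model (separate intercepts, one common slope) is nested within the full model (separate intercepts and slopes), and the statistic measures how much the fit deteriorates when the simpler model is imposed.

```python
# Illustrative data only (made up for this sketch, not taken from the text):
# gestational age in weeks and birthweight in kg for two groups of babies.
groups = {
    "group_a": ([36, 37, 38, 39, 40, 41], [2.6, 2.9, 2.9, 3.2, 3.2, 3.5]),
    "group_b": ([36, 37, 38, 39, 40, 41], [2.5, 2.6, 2.9, 2.9, 3.2, 3.2]),
}


def centred_sums(x, y):
    """Return (Sxx, Sxy, Syy): sums of squares and products about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy


def compare_nested(groups):
    """Compare a common-slope model with a separate-slopes model."""
    sums = [centred_sums(x, y) for x, y in groups.values()]
    n_total = sum(len(x) for x, _ in groups.values())
    n_groups = len(sums)

    # Full model: each group has its own intercept and its own slope.
    rss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    df_full = n_total - 2 * n_groups

    # Reduced model (the hypothesis): own intercepts, one common slope.
    pooled_sxx = sum(s[0] for s in sums)
    pooled_sxy = sum(s[1] for s in sums)
    rss_reduced = sum(s[2] for s in sums) - pooled_sxy ** 2 / pooled_sxx
    df_reduced = n_total - (n_groups + 1)

    # Extra residual variation per extra parameter, relative to the
    # residual variance estimated from the full model.
    df_diff = df_reduced - df_full
    f_stat = ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
    return rss_reduced, rss_full, f_stat, (df_diff, df_full)


rss_reduced, rss_full, f_stat, dfs = compare_nested(groups)
print(f"RSS common slope = {rss_reduced:.4f}, RSS separate slopes = {rss_full:.4f}")
print(f"F = {f_stat:.3f} on {dfs} degrees of freedom")
```

Under these assumptions, a small value of the statistic relative to the F distribution with the printed degrees of freedom suggests that the simpler common-slope model describes the data adequately.
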
While hypothesis testing is useful for identifying a good model, it is much
less useful for interpreting it. Wherever possible, the parameters in a model
should have some natural interpretation; for example, the rate of growth of
babies, the relative risk of acquiring a disease or the mean difference in profit
from two marketing strategies. The estimated magnitude of the parameter and
the reliability of the estimate as indicated by its standard error or a confidence
interval are far more informative than significance levels or p-values. They
make it possible to answer questions such as: is the effect estimated with
sufficient precision to be useful, or is the effect large enough to be of practical,
social or biological significance?

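As a rough illustration of reporting an estimate with its precision, the sketch below fits a straight line by least squares and returns the slope, its standard error and a 95% confidence interval. The data are made up, the function name `slope_with_interval` is hypothetical, and the critical value 2.306 is the tabulated 97.5th percentile of the t distribution with 8 degrees of freedom, which assumes Normal errors and n = 10 observations.

```python
import math

# Illustrative data only (made up): gestational age in weeks, birthweight in kg.
age = [35, 36, 37, 38, 38, 39, 40, 40, 41, 42]
weight = [2.4, 2.6, 2.7, 2.9, 3.0, 3.0, 3.2, 3.1, 3.4, 3.5]


def slope_with_interval(x, y, t_crit):
    """Least-squares slope, its standard error and a confidence interval."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = my - slope * mx

    # Residual variance estimated with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)

    se = math.sqrt(sigma2 / sxx)
    return slope, se, (slope - t_crit * se, slope + t_crit * se)


T_CRIT_95_DF8 = 2.306  # from standard t tables; valid only for n - 2 = 8
slope, se, (lower, upper) = slope_with_interval(age, weight, T_CRIT_95_DF8)
print(f"growth rate = {slope:.3f} kg/week (s.e. {se:.3f})")
print(f"95% CI: ({lower:.3f}, {upper:.3f}) kg/week")
```

The interval speaks directly to the two questions above: its width shows whether the growth rate is estimated precisely enough to be useful, and its location shows whether the rate is large enough to matter in practice.
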
2.3.6 Further reading<br />
An excellent discussion of the principles of statistical modelling is in the introductory
part of Cox and Snell (1981). The importance of adopting a systematic
approach is stressed by Kleinbaum et al. (1998). The various steps of model
choice, criticism and validation are outlined by Krzanowski (1998). The use of
residuals is described in Neter et al. (1996), Draper and Smith (1998), Belsley
et al. (1980) and Cook and Weisberg (1999).

© 2002 by Chapman & Hall/CRC