The error term $\varepsilon$ can take only one of two values: if $y = 1$ then $\varepsilon = 1 - \pi(x_1, x_2, \ldots, x_q)$ with probability $\pi(x_1, x_2, \ldots, x_q)$, and if $y = 0$ then $\varepsilon = -\pi(x_1, x_2, \ldots, x_q)$ with probability $1 - \pi(x_1, x_2, \ldots, x_q)$. So $\varepsilon$ has a distribution with mean zero, since $\pi(1 - \pi) + (1 - \pi)(-\pi) = 0$, and variance equal to $\pi(x_1, x_2, \ldots, x_q)(1 - \pi(x_1, x_2, \ldots, x_q))$; i.e., the conditional distribution of our binary response variable follows a binomial distribution with probability given by the conditional mean, $\pi(x_1, x_2, \ldots, x_q)$.

So instead of modelling the expected value of the response directly as a linear function of the explanatory variables, a suitable transformation of it is modelled. In this case the most suitable transformation is the logistic or logit function of $\pi$, leading to the model

$$
\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q. \tag{7.1}
$$

The logit of a probability is simply the log of the odds of the response taking the value one. Equation (7.1) can be rewritten as

$$
\pi(x_1, x_2, \ldots, x_q) = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q)}. \tag{7.2}
$$

The logit function can take any real value, but the associated probability always lies in the required $[0, 1]$ interval. In a logistic regression model, the parameter $\beta_j$ associated with explanatory variable $x_j$ is such that $\exp(\beta_j)$ is the factor by which the odds of the response taking the value one are multiplied when $x_j$ increases by one unit, conditional on the other explanatory variables remaining constant. The parameters of the logistic regression model (the vector of regression coefficients $\beta$) are estimated by maximum likelihood; details are given in Collett (2003).
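A small sketch may make the estimation and the odds-ratio interpretation concrete. The following Python code uses the statsmodels package (our choice here, purely for illustration); the data are simulated from model (7.2), and the variable names and coefficient values are hypothetical:

```python
import numpy as np
import statsmodels.api as sm

# Simulate binary responses from model (7.2); the coefficient
# values below are arbitrary choices for the illustration.
rng = np.random.default_rng(1)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
eta = -0.5 + 1.2 * x1 - 0.8 * x2          # linear predictor
pi = np.exp(eta) / (1 + np.exp(eta))      # Equation (7.2)
y = rng.binomial(1, pi)

# Fit the logistic regression model by maximum likelihood.
X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.params)          # estimates of beta_0, beta_1, beta_2
print(np.exp(fit.params))  # odds ratios for a one-unit increase in each x_j
```

With 500 observations the estimated coefficients should lie close to the values used in the simulation, and exponentiating each slope estimate gives the odds ratio associated with a one-unit increase in the corresponding explanatory variable.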
7.2.2 The Generalised Linear Model

The analysis of variance models considered in Chapter 5 and the multiple regression model described in Chapter 6 are essentially equivalent. Both involve a linear combination of a set of explanatory variables (dummy variables in the case of analysis of variance) as a model for the observed response variable, and both include residual terms assumed to have a normal distribution. The equivalence of analysis of variance and multiple regression is spelt out in more detail in Everitt (2001).

The logistic regression model described in this chapter also has similarities to the analysis of variance and multiple regression models. Again a linear combination of explanatory variables is involved, although here the expected value of the binary response is not modelled directly but via a logistic transformation. In fact all three techniques can be unified in the generalised linear model (GLM), first introduced in a landmark paper by Nelder and Wedderburn (1972). The GLM enables a wide range of seemingly disparate problems of statistical modelling and inference to be set in an elegant unifying framework of great power and flexibility. A comprehensive technical account of the model is given in McCullagh and Nelder (1989).
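As a concrete glimpse of this unification, the following sketch (again in Python with statsmodels, with simulated data and hypothetical names) fits both a multiple regression and a logistic regression with the same routine; only the assumed response distribution and link function change:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: a common design matrix, one continuous and one
# binary response; the coefficients are arbitrary illustrative values.
rng = np.random.default_rng(2)
X = sm.add_constant(rng.normal(size=(200, 2)))
y_cont = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=200)
p = 1 / (1 + np.exp(-(X @ np.array([0.5, 1.0, -1.0]))))
y_bin = rng.binomial(1, p)

# Multiple regression as a GLM: normal errors, identity link.
normal_fit = sm.GLM(y_cont, X, family=sm.families.Gaussian()).fit()

# Logistic regression as a GLM: binomial errors, logit link.
logit_fit = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()
```

The Gaussian family with its identity link reproduces the ordinary least squares estimates, while the binomial family with its default logit link gives the logistic regression model of Section 7.2.1.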
Here we describe GLMs only briefly. Essentially, GLMs consist of three main features:
