11.07.2015 Views

Preface to First Edition - lib


LOGISTIC REGRESSION AND GENERALISED LINEAR MODELS

1. An error distribution giving the distribution of the response around its mean. For analysis of variance and multiple regression this will be the normal; for logistic regression it is the binomial. Each of these (and others used in other situations to be described later) comes from the same exponential family of probability distributions, and it is this family that is used in generalised linear modelling (see Everitt and Pickles, 2000).

2. A link function, $g$, that shows how the linear function of the explanatory variables is related to the expected value of the response:

   $$g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q.$$

   For analysis of variance and multiple regression the link function is simply the identity function; in logistic regression it is the logit function.

3. The variance function that captures how the variance of the response variable depends on the mean. We will return to this aspect of GLMs later in the chapter.

Estimation of the parameters in a GLM is usually achieved through a maximum likelihood approach; see McCullagh and Nelder (1989) for details.

Having estimated a GLM for a data set, the question of the quality of its fit arises. Clearly the investigator needs to be satisfied that the chosen model describes the data adequately before drawing conclusions about the parameter estimates themselves. In practice, most interest will lie in comparing the fit of competing models, particularly in the context of selecting subsets of explanatory variables that describe the data in a parsimonious manner. In GLMs a measure of fit is provided by a quantity known as the deviance, which measures how closely the model-based fitted values of the response approximate the observed values.
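The three components listed above can be inspected directly in R, since a family object bundles the error distribution, the link function and the variance function. The following is a minimal sketch rather than code from the text: the data are simulated, and the names `fam`, `mu`, `eta`, `x`, `y` and `fit` are hypothetical choices made for illustration only.

```r
# A family object carries the link function g, its inverse,
# and the variance function for the chosen error distribution.
fam <- binomial(link = "logit")

mu  <- c(0.1, 0.5, 0.9)      # example means (probabilities)
eta <- fam$linkfun(mu)       # logit link: log(mu / (1 - mu))
fam$linkinv(eta)             # inverse link maps back to mu
fam$variance(mu)             # binomial variance function: mu * (1 - mu)

# For the normal (gaussian) family the default link is the identity
# and the variance function does not depend on the mean.
gaussian()$linkfun(mu)
gaussian()$variance(mu)

# Simulated data (an assumption for illustration, not the ESR data).
set.seed(1)
x <- rnorm(100)
y <- rbinom(100, size = 1, prob = fam$linkinv(-0.5 + 1.2 * x))

# Maximum likelihood fit of a logistic regression, then its deviance.
fit <- glm(y ~ x, family = fam)
coef(fit)
deviance(fit)
```

The deviance reported here is for a single model; on its own it is mainly useful as a baseline against which other models fitted to the same data can be compared.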
Comparing the deviance values for two models gives a likelihood ratio test of the two models: the difference in deviance is a statistic that can be referred to a $\chi^2$-distribution with degrees of freedom equal to the difference in the number of parameters estimated under each model. More details are given in Cook (1998).

7.3 Analysis Using R

7.3.1 ESR and Plasma Proteins

We begin by looking at the ESR data from Table 7.1. As always it is good practice to begin with some simple graphical examination of the data before undertaking any formal modelling. Here we will look at conditional density plots of the response variable given the two explanatory variables; such plots describe how the conditional distribution of the categorical variable ESR changes as the numerical variables fibrinogen and gamma globulin change. The required R code to construct these plots is shown with Figure 7.1. It appears that higher levels of each protein are associated with ESR values above 20 mm/hr.

We can now fit a logistic regression model to the data using the glm func-

© 2010 by Taylor and Francis Group, LLC
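The deviance comparison described before Section 7.3 can be sketched as follows. This is an illustration under stated assumptions, not the analysis of the ESR data: the data are simulated, and the names `x1`, `x2`, `fit_small` and `fit_large` are hypothetical. The two models are nested, with the larger one adding a single explanatory variable.

```r
# Simulated data (an assumption for illustration): only x1 affects the response.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.0 * x1))

# Two nested logistic regression models.
fit_small <- glm(y ~ x1,      family = binomial())
fit_large <- glm(y ~ x1 + x2, family = binomial())

# Likelihood ratio statistic: the difference in deviance.
lr_stat <- deviance(fit_small) - deviance(fit_large)

# Degrees of freedom: the difference in the number of estimated parameters.
df_diff <- df.residual(fit_small) - df.residual(fit_large)

# Upper-tail probability from the chi-squared reference distribution.
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# The same comparison via the analysis of deviance table.
anova(fit_small, fit_large, test = "Chisq")
```

Computing the statistic by hand and then calling `anova` with `test = "Chisq"` makes it easy to check that both routes give the same deviance difference and degrees of freedom.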
