11.07.2015 Views

Preface to First Edition - lib

LOGISTIC REGRESSION AND GENERALISED LINEAR MODELS

Table 7.4: backpain data. Number of drivers (D) and non-drivers (D̄), suburban (S) and city inhabitants (S̄) either suffering from a herniated disc (cases) or not (controls).

                        Controls
                    D̄            D
                 S̄      S     S̄      S    Total
Cases   D̄   S̄     9      0     10      7      26
            S      2      2      1      1       6
        D   S̄    14      1     20     29      64
            S     22      4     32     63     121
Total             47      7     63    100     217

The last of the data sets to be considered in this chapter is shown in Table 7.4. These data arise from a study reported in Kelsey and Hardy (1975) which was designed to investigate whether driving a car is a risk factor for low back pain resulting from acute herniated lumbar intervertebral discs (AHLID). A case-control study was used with cases selected from people who had recently had X-rays taken of the lower back and had been diagnosed as having AHLID. The controls were taken from patients admitted to the same hospital as a case with a condition unrelated to the spine. Further matching was made on age and gender and a total of 217 matched pairs were recruited, consisting of 89 female pairs and 128 male pairs. As a further potential risk factor, the variable suburban indicates whether each member of the pair lives in the suburbs or in the city.

7.2 Logistic Regression and Generalised Linear Models

7.2.1 Logistic Regression

One way of writing the multiple regression model described in the previous chapter is as $y \sim N(\mu, \sigma^2)$ where $\mu = \beta_0 + \beta_1 x_1 + \cdots + \beta_q x_q$. This makes it clear that this model is suitable for continuous response variables with, conditional on the values of the explanatory variables, a normal distribution with constant variance. So clearly the model would not be suitable for applying to the erythrocyte sedimentation rate in Table 7.1, since the response variable is binary.
If we were to model the expected value of this type of response, i.e., the probability of it taking the value one, say $\pi$, directly as a linear function of explanatory variables, it could lead to fitted values of the response probability outside the range [0, 1], which would clearly not be sensible. And if we write the value of the binary response as $y = \pi(x_1, x_2, \ldots, x_q) + \varepsilon$ it soon becomes clear that the assumption of normality for $\varepsilon$ is also wrong. In fact here $\varepsilon$ may assume only one of two possible values. If $y = 1$, then $\varepsilon = 1 - \pi(x_1, x_2, \ldots, x_q)$

© 2010 by Taylor and Francis Group, LLC
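The two objections to a straight-line model for a binary response can be checked numerically. The following is a minimal Python sketch, not code from the book: the ten (x, y) pairs are hypothetical toy values invented for illustration (they are not the ESR data of Table 7.1), and the helper names `fit_line` and `possible_errors` are likewise made up here. It fits an ordinary least-squares line to a 0/1 response, lists the fitted "probabilities" that fall outside [0, 1], and shows that, once y = π + ε with y restricted to 0 or 1, the error can only be 1 − π or −π.

```python
# Hypothetical toy data, invented for illustration only
# (NOT the erythrocyte sedimentation rate data of Table 7.1).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # binary response


def fit_line(x, y):
    """Ordinary least squares with one explanatory variable; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def possible_errors(pi):
    """If y = pi + eps and y is 0 or 1, eps is 1 - pi (when y = 1) or -pi (when y = 0)."""
    return 1 - pi, -pi


b0, b1 = fit_line(x, y)

# Treat the fitted line as if it were the response probability.
fitted = [b0 + b1 * xi for xi in x]

# Collect the fitted values that cannot be probabilities.
out_of_range = [(xi, round(f, 3)) for xi, f in zip(x, fitted) if f < 0 or f > 1]

print("intercept, slope:", round(b0, 3), round(b1, 3))
print("fitted values outside [0, 1]:", out_of_range)
print("possible errors when pi = 0.25:", possible_errors(0.25))
```

With these toy values the fitted line dips below 0 at the smallest x and rises above 1 at the largest, and the error term is confined to two values for any given π, so it cannot be normally distributed.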
