Statistical Methods in Medical Research 4ed

Modelling continuous data

predict which patients have a poor prognosis and to consider alternative methods of treatment for them (Armitage et al., 1969).

The appropriate technique is called multiple regression. In general, the approach is to express the mean value of the dependent variable in terms of the values of a set of other variables, usually called independent variables. The nomenclature is confusing, since some of the latter variables may be either closely related to each other logically (e.g. one might be age and another the square of the age) or highly correlated (e.g. height and arm length). It is preferable to use the terms predictor or explanatory variables, or covariates, and we shall usually follow this practice.

The data to be analysed consist of observations on a set of n individuals, each individual providing a value of the dependent variable, y, and a value of each of the predictor variables, x1, x2, ..., xp. The number of predictor variables, p, should preferably be considerably less than the number of observations, n, and the same p predictor variables must be available for each individual in any one analysis.

Suppose that, for particular values of x1, x2, ..., xp, an observed value of y is specified by the linear model:

y = β0 + β1x1 + β2x2 + ... + βpxp + e,   (11.38)

where e is an error term. The various values of e for different individuals are supposed to be independently normally distributed with zero mean and variance σ². The constants β1, β2, ..., βp are called partial regression coefficients; β0 is sometimes called the intercept. The coefficient β1 is the amount by which y changes on the average when x1 changes by one unit and all the other xi remain constant. In general, β1 will be different from the ordinary regression coefficient of y on x1, because the latter represents the effect of changes in x1 on the average values of y with no attempt to keep the other variables constant.

The coefficients β0, β1, β2, ..., βp are idealized quantities, measurable only from an infinite number of observations. In practice, from n observations, we have to obtain estimates of the coefficients and thus an estimated regression equation:

Y = b0 + b1x1 + b2x2 + ... + bpxp.   (11.39)

Statistical theory tells us that a satisfactory method of obtaining the estimated regression equation is to choose the coefficients such that the sum of squares of residuals, Σ(y − Y)², is minimized, that is, by the method of least squares, which was introduced in §7.2. Note that here y is an observed value and Y is the value predicted by (11.39) in terms of the predictor variables. A consequence of this approach is that the regression equation (11.39) is satisfied if all the variables are given their mean values. Thus,

ȳ = b0 + b1x̄1 + b2x̄2 + ... + bpx̄p.
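As a rough illustration (hypothetical simulated data; the coefficient values are invented), the least-squares estimates of (11.39) can be obtained with a standard linear-algebra routine, and the mean-value property checked directly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3

# Hypothetical data: n individuals, each with p predictor values and a response y.
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -0.3, 2.0]) + rng.normal(0.0, 1.0, n)

# Least-squares estimates b0, b1, ..., bp: minimize the residual
# sum of squares, sum (y - Y)^2.
A = np.column_stack([np.ones(n), X])          # design matrix with intercept column
b = np.linalg.lstsq(A, y, rcond=None)[0]

# Predicted values Y from the estimated regression equation (11.39).
Y = A @ b

# Plugging the mean of each predictor into the fitted equation
# returns the mean of y.
Y_at_means = b[0] + X.mean(axis=0) @ b[1:]
print(np.isclose(Y_at_means, y.mean()))       # True
```

The check holds exactly (up to rounding) for any least-squares fit that includes an intercept term, not just for this simulated example.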
