01.06.2013 Views

Statistical Methods in Medical Research 4ed


in group 1 and 0 for all observations in group 2. As a model for the data as a whole, suppose that

$$E(y) = \beta_0 + \delta z + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p. \tag{11.55}$$

Because of the definition of $z$, (11.55) is equivalent to assuming that

$$E(y) = \begin{cases} \beta_0 + \delta + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p & \text{for group 1} \\ \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p & \text{for group 2,} \end{cases} \tag{11.56}$$

which is precisely the model required for the analysis of covariance. According to (11.56) the regression coefficients on the $x$s are the same for both groups, but there is a difference, $\delta$, between the intercepts. The usual significance test in the analysis of covariance tests the hypothesis that $\delta = 0$. Since (11.55) and (11.56) are equivalent, it follows from (11.55) that the whole analysis can be performed by a single multiple regression of $y$ on $z, x_1, x_2, \dots, x_p$. The new variable $z$ is called a dummy, or indicator, variable. The coefficient $\delta$ is the partial regression coefficient of $y$ on $z$, and is estimated in the usual way by the multiple regression analysis, giving an estimate $d$, say. The variance of $d$ is estimated as usual from (11.43) or (11.51), and the appropriate tests and confidence limits follow by use of the $t$ distribution. Note that the Residual MSq has $n - p - 2$ DF (since the introduction of $z$ increases the number of predictor variables from $p$ to $p + 1$), and that this agrees with (v) on p. 348 (putting $k = 2$).
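The single-regression form of the two-group analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the group sizes and "true" coefficients are invented, and the variance of $d$ is taken from the standard least-squares formula (Residual MSq times the inverse of the cross-product matrix), which is assumed here to correspond to the formulae the text cites.

```python
import numpy as np

rng = np.random.default_rng(0)  # simulated data, purely illustrative

n1, n2, p = 20, 25, 2           # assumed group sizes and number of covariates
n = n1 + n2

# Dummy variable: 1 for every observation in group 1, 0 in group 2
z = np.concatenate([np.ones(n1), np.zeros(n2)])
X = rng.normal(size=(n, p))     # covariates x1, ..., xp

# Assumed "true" parameters used only to generate the example data
beta0, delta, beta = 1.0, 2.0, np.array([0.5, -1.0])
y = beta0 + delta * z + X @ beta + rng.normal(scale=0.5, size=n)

# Single multiple regression of y on z, x1, ..., xp (with an intercept)
A = np.column_stack([np.ones(n), z, X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ coef
df_resid = n - p - 2                      # Residual DF, as noted in the text
resid_msq = resid @ resid / df_resid      # Residual mean square
cov = resid_msq * np.linalg.inv(A.T @ A)  # standard least-squares covariance

d = coef[1]                  # estimate of delta (difference in intercepts)
se_d = np.sqrt(cov[1, 1])    # its estimated standard error
t_stat = d / se_d            # refer to t on df_resid degrees of freedom
```

The statistic `t_stat` would be referred to the $t$ distribution on `df_resid` degrees of freedom to test $\delta = 0$.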

When $k > 2$, the procedure described above is generalized by the introduction of $k - 1$ dummy variables. These can be defined in many equivalent ways. One convenient method is as follows. The table shows the values taken by each of the dummy variables for all observations in each group.

          Dummy variables
Group     z_1    z_2    ...    z_{k-1}
1         1      0      ...    0
2         0      1      ...    0
.         .      .             .
.         .      .             .
k-1       0      0      ...    1
k         0      0      ...    0
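The coding in the table can be generated mechanically from a vector of group labels. The helper below is a hypothetical sketch (the name `dummy_columns` is not from the text) and assumes the groups are labelled 1 to k: column j is 1 for observations in group j and 0 otherwise, and group k has zeros throughout.

```python
import numpy as np

def dummy_columns(groups, k):
    """Return an (n, k-1) array of dummy variables z_1, ..., z_{k-1}.

    Column j-1 equals 1 where the group label is j (j = 1, ..., k-1).
    Observations in group k get 0 in every column.
    """
    groups = np.asarray(groups)
    return (groups[:, None] == np.arange(1, k)[None, :]).astype(float)

# Five observations from k = 3 groups
Z = dummy_columns([1, 2, 3, 3, 1], k=3)
```

Group k serves as the reference group in this coding; choosing a different reference group gives one of the equivalent definitions mentioned in the text.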

The model specifies that

$$E(y) = \beta_0 + \delta_1 z_1 + \dots + \delta_{k-1} z_{k-1} + \beta_1 x_1 + \dots + \beta_p x_p \tag{11.57}$$

and the fitted multiple regression equation is

11.7 Multiple regression in groups 349

$$Y = b_0 + d_1 z_1 + \dots + d_{k-1} z_{k-1} + b_1 x_1 + \dots + b_p x_p. \tag{11.58}$$
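As an illustration of fitting an equation of the form (11.58), the sketch below simulates data for $k = 3$ groups and fits one multiple regression on the $k - 1$ dummy variables and the covariates. All numerical values are assumptions made for the example. With the coding above, each $d_j$ estimates the difference in intercept between group $j$ and group $k$, and the residual degrees of freedom are $n - p - k$, which reduces to $n - p - 2$ when $k = 2$.

```python
import numpy as np

rng = np.random.default_rng(1)  # simulated data, purely illustrative

k, p, n_per_group = 3, 2, 20    # assumed numbers of groups, covariates, cases
n = k * n_per_group
groups = np.repeat(np.arange(1, k + 1), n_per_group)  # labels 1, ..., k

# Dummy variables z_1, ..., z_{k-1}; group k is coded 0 in every column
Z = (groups[:, None] == np.arange(1, k)[None, :]).astype(float)
X = rng.normal(size=(n, p))     # covariates x1, ..., xp

# Assumed "true" parameters used only to generate the example data
beta0 = 1.0
delta = np.array([2.0, -1.0])   # intercept differences from group k
beta = np.array([0.5, -1.0])
y = beta0 + Z @ delta + X @ beta + rng.normal(scale=0.3, size=n)

# Regression of y on z_1, ..., z_{k-1}, x_1, ..., x_p with an intercept
A = np.column_stack([np.ones(n), Z, X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

b0 = coef[0]        # fitted intercept (the intercept for group k)
d = coef[1:k]       # d_1, ..., d_{k-1}
b = coef[k:]        # b_1, ..., b_p, common to all groups
df_resid = n - A.shape[1]   # equals n - p - k
```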
