07.02.2014 Views

Introduction to Stata 8 - (GRIPS

12. Regression analysis

Performing regression analysis with Stata is easy. Defining regression models that make sense is more complex. In particular, consider:

• If you are looking for causes, make sure your model is meaningful. Don't include independent variables that represent steps in the causal pathway; doing so may create more confounding than it prevents. Automatic selection procedures are available in Stata (see [R] sw), but they may seduce the user into not thinking. I will not describe them.

• If your hypothesis is non-causal and you are only looking for predictors, the logical requirements are more relaxed. But make sure you really are looking at predictors, not consequences of the outcome.

• Take care with closely associated independent variables, e.g. education and social class. Including both may obscure more than it illuminates.
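Before including two related variables it can help to check how closely they are associated. A minimal sketch, assuming the auto example dataset shipped with Stata; weight and length are only stand-ins for a pair such as education and social class:

```stata
* Assumption: the shipped example dataset auto stands in for your own data
sysuse auto, clear

* How closely are the two independent variables associated?
correlate weight length

* Compare the coefficient of weight without and with length in the model
regress price weight
regress price weight length
```

If the coefficient and its standard error change markedly when the second variable is added, consider whether both variables belong in the model.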

12.1. Linear regression

regress

[R] regress

A standard linear regression with bmi as the dependent variable:

regress bmi sex age
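To try the command without data of your own, here is a minimal sketch that assumes the auto example dataset shipped with Stata; price, weight and foreign are stand-ins for bmi, age and sex and are not the data used in this text. After regress, the coefficients and summary results remain available for later use:

```stata
* Assumption: the shipped example dataset auto stands in for your own data
sysuse auto, clear

* Linear regression with price as the dependent variable
regress price weight foreign

* Results remain available after estimation
display "Observations:    " e(N)
display "R-squared:       " e(r2)
display "Slope of weight: " _b[weight]
```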

xi:

[R] xi

The xi: prefix handles categorical variables in regression models. From a five-level categorical variable xi: generates four indicator variables; in the regression model you refer to them by putting the i. prefix before the original variable name:

xi: regress bmi sex i.agegrp
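To see which indicator variables xi: creates, here is a minimal sketch assuming the auto example dataset shipped with Stata; its variable rep78 has the five levels 1 to 5 (plus some missing values) and stands in for agegrp:

```stata
* Assumption: the shipped example dataset auto stands in for your own data
sysuse auto, clear

* rep78 has five levels
tabulate rep78

* xi: creates the indicators _Irep78_2 to _Irep78_5; level 1 is the reference
xi: regress price weight i.rep78

* List the generated indicator variables
describe _I*
```

The indicators stay in the dataset after the analysis and are replaced the next time you use xi:.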

You may also use xi: to include interaction terms:

xi: regress bmi age i.sex i.treat i.treat*i.sex

Note that i.treat*i.sex generates the main effects of both variables as well as their interaction.

By default the first (lowest) category is omitted, i.e. it becomes the reference group. Before the analysis, you may select agegrp 3 as the reference by defining a 'characteristic':

char agegrp[omit] 3
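A minimal sketch of changing the reference group, assuming the auto example dataset shipped with Stata, with rep78 standing in for agegrp:

```stata
* Assumption: the shipped example dataset auto stands in for your own data
sysuse auto, clear

* Make level 3 of rep78 the reference category
char rep78[omit] 3
xi: regress price weight i.rep78

* The indicators are now _Irep78_1, _Irep78_2, _Irep78_4 and _Irep78_5
describe _I*

* Clear the characteristic to return to the default (lowest level omitted)
char rep78[omit]
```

A characteristic is stored with the dataset when you save it, so clear it if you do not want it to affect later analyses.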

predict

[R] predict

After a regression analysis you may generate predicted values from the regression coefficients and use them to study residuals:

regress bmi sex age

predict pbmi

generate rbmi = bmi - pbmi

scatter rbmi pbmi

Alternatively, use rvfplot; see below.
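predict can also produce the residuals directly with the residuals option, which saves the generate step. A minimal sketch, assuming the auto example dataset shipped with Stata as a stand-in; the variable names pprice, rprice and rmanual are made up for the example:

```stata
* Assumption: the shipped example dataset auto stands in for your own data
sysuse auto, clear
regress price weight foreign

* Predicted values (the default) and residuals, one command each
predict double pprice
predict double rprice, residuals

* The manual calculation gives the same residuals
generate double rmanual = price - pprice
assert abs(rmanual - rprice) < 1e-6

* Residuals against predicted values, with a reference line at zero
scatter rprice pprice, yline(0)
```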

Regression diagnostics

[R] Regression diagnostics

This chapter of the manual is very instructive. Get a residual plot with a horizontal reference line by:

rvfplot, yline(0)
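A minimal sketch of two residual plots, assuming the auto example dataset shipped with Stata as a stand-in for your own data. rvpplot, described in the same chapter, plots the residuals against a single predictor:

```stata
* Assumption: the shipped example dataset auto stands in for your own data
sysuse auto, clear
regress price weight foreign

* Residuals versus fitted values, with a reference line at zero
rvfplot, yline(0)

* Residuals versus one predictor
rvpplot weight, yline(0)
```

Both commands use the most recent regression, so run them directly after regress.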
