28.07.2014 Views

Linear Regression


Re-expressing Data

Often the conditions necessary for performing linear regression aren’t satisfied in a data set. However, it may still be possible to use these methods if we re-express one or both of the variables.

To re-express data, we need to be able to create new variables using STATA. We can do this using the generate command. For example, to create a new variable named logx which is the logarithm of an already existing variable x, we type:

generate logx = log(x)

If we instead wanted to create a variable that is the square root of x, we could type:

generate sqx = sqrt(x)

In general, the command has the form:

generate new_variable = expression(old_variable)

where expression is the mathematical function applied to the old variable.

Note that the log() function in STATA is the natural logarithm (base e); for base 10, use log10() instead.
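The generate pattern above can be tried on a small made-up data set. The following is a minimal sketch, not part of the original tutorial: the values of x (1 through 5) are purely illustrative, and the variable names lnx and log10x are hypothetical. The double storage type is optional and is used here only so that the results are held at full precision.

```stata
* Illustrative data only: x takes the values 1, 2, ..., 5
clear
set obs 5
generate double x = _n

* log() and ln() both return the natural logarithm (base e)
generate double logx = log(x)
generate double lnx = ln(x)

* log10() returns the base-10 logarithm
generate double log10x = log10(x)

* Square root, as in the example above
generate double sqx = sqrt(x)

* Display the original and re-expressed variables side by side
list x logx lnx log10x sqx
```

Listing the variables side by side is a quick way to confirm that logx and lnx agree, and that a base-10 logarithm has to be requested explicitly.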

Linear regression using re-expressed data

In this portion of the tutorial we will be working with the data set discussed in example 10.11 on page 256 of the textbook. The data set gives information on the highest paid baseball players in the period spanning 1980-2001. The data set consists of three variables: player, year and salary. To access the data, type:

use http://www.stat.columbia.edu/~martin/W1111/Data/salary

in the command window.

We begin by making a scatter plot of salary and year.

scatter salary year
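If the scatter plot shows salary curving upward over time rather than following a straight line, one re-expression worth trying is the logarithm of salary. The following is only a sketch under that assumption: the text above does not say which re-expression suits this data set, the variable name logsalary is hypothetical, and the clear option is added so the command also works when other data is already in memory.

```stata
* Load the data set (clear drops any data currently in memory)
use http://www.stat.columbia.edu/~martin/W1111/Data/salary, clear

* Re-express salary on the natural-log scale (logsalary is a hypothetical name)
generate logsalary = log(salary)

* Check whether the re-expressed relationship looks closer to linear
scatter logsalary year

* Fit the linear regression of log salary on year
regress logsalary year
```

Whether the log scale is appropriate should be judged from the second scatter plot. If it still shows clear curvature, a different re-expression, such as the square root, may be a better choice.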
