Linear Regression
Re-expressing Data
Often the conditions necessary for performing linear regression aren’t satisfied in
a data set. However, it may still be possible to use these methods if we re-express
one or both of the variables.
To re-express data, we need to be able to create new variables in STATA. We
can do this using the generate command. For example, to create a new variable
named logx that is the logarithm of an existing variable x, we type:
generate logx = log(x)
If we instead wanted to create a variable that is the square root of x, we could
type:
generate sqx = sqrt(x)
In general, the command has the form:
generate new_variable = expression(old_variable)
where expression is the mathematical function applied to the old variable.
Note that STATA's log() function computes the natural logarithm (base e).
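The re-expressions above can be tried on a small made-up data set before applying them to real data. The following is a minimal sketch, assuming a hypothetical variable x that takes the values 1 through 5; the variable names lnx and log10x are illustrative and not part of the tutorial's data. It uses ln(), which is a synonym for log(), and log10(), which gives the base 10 logarithm.

```stata
* Hypothetical toy data: x takes the values 1, 2, ..., 5
clear
set obs 5
generate x = _n

* log() and ln() both return the natural (base e) logarithm
generate logx = log(x)
generate lnx = ln(x)

* log10() returns the base 10 logarithm
generate log10x = log10(x)

* Square root re-expression
generate sqx = sqrt(x)

* Display the original and re-expressed variables side by side
list x logx lnx log10x sqx
```

The columns logx and lnx should be identical, which is a quick way to confirm which base log() uses. Note that clear removes any data currently in memory, so this sketch should be run before loading the data set used below.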
Linear regression using re-expressed data
In this portion of the tutorial we will be working with the data set discussed in
Example 10.11 on page 256 of the textbook. The data set gives information on
the highest paid baseball players in the period 1980-2001. It consists of three
variables: player, year, and salary. To access the data, type:
use http://www.stat.columbia.edu/~martin/W1111/Data/salary
in the command window.
We begin by making a scatter plot of salary against year:
scatter salary year