08.08.2015 Views

Essentials of Statistics for the Social and ... - Rincón de Paco


CORRELATION AND REGRESSION

[Figure 4.6: Measuring error from a regression line. Axes: Height (inches), marked at 0 and 76; Weight (pounds), marked at 190 and 220. The plot labels Mr. Y, the prediction for Mr. Y, and the residual for Mr. Y.]

that if you add up all of the residuals (i.e., errors from the line), the positives will exactly cancel out the negatives to yield a sum of zero.

The total amount of error around a regression line is determined by squaring all of the residuals and adding them up. The resulting sum of squares (SS) is called variously $SS_{\text{error}}$, $SS_{\text{residual}}$, or $SS_{\text{unexplained}}$. Another important property of the regression line is that it minimizes $SS_{\text{error}}$; it is the best possible line in the scatterplot, because no other line would produce a smaller $SS_{\text{error}}$. Therefore, the regression line has what is called the least squares property. The regression line may remind you of the mean; that's because it is a running mean of sorts. It is approximately the mean of the Y values at each X value (the larger the sample, the better the approximation at each X value). Dividing $SS_{\text{residual}}$ by N (the sample size) gives you $\sigma^2_{\text{residual}}$, the variance of the residuals, which is also the variance of the data points from the regression line (in the vertical direction).

As the correlation gets closer to zero, $\sigma^2_{\text{residual}}$ gets larger, but until the correlation actually equals zero, $\sigma^2_{\text{residual}}$ remains less than the variance of the errors you would make without using regression at all. How much is the variance of your errors without regression? Recall that when r equals zero your best strategy is to guess the mean of Y as the Y value, regardless of X. Using $\bar{Y}$ as your prediction for everybody is the same as drawing a horizontal line through the scatterplot and using it as your regression line. The variance of the Y values around the mean of Y is just the ordinary variance of Y. In the context of regression it is called the total variance.
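The properties described so far can be checked numerically. The following is a minimal Python sketch, not taken from the text: the height and weight values are invented for illustration, and the helper names (`mean`, `fit_line`, `ss_error`) are hypothetical. It fits the least-squares line, confirms that the residuals sum to approximately zero, shows that nudging the slope or the intercept increases the sum of squared errors, and compares the residual variance (SS divided by N) with the total variance around the mean of Y.

```python
# Hypothetical data, invented for illustration (not from the text):
# heights in inches, weights in pounds.
heights = [62, 64, 66, 68, 70, 72, 74, 76]
weights = [120, 136, 148, 150, 172, 168, 196, 190]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares regression line of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum_products / ss_x
    return slope, mean_y - slope * mean_x


def ss_error(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to the given line."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))


n = len(weights)
slope, intercept = fit_line(heights, weights)

# Residual = actual Y minus the Y predicted by the regression line.
residuals = [w - (slope * h + intercept) for h, w in zip(heights, weights)]
residual_sum = sum(residuals)  # zero, up to floating-point rounding

# Least squares property: any other line has a larger sum of squared errors.
ss_residual = ss_error(heights, weights, slope, intercept)
ss_steeper = ss_error(heights, weights, slope + 0.5, intercept)
ss_higher = ss_error(heights, weights, slope, intercept + 2.0)

# Variance of the residuals versus the total variance of Y.
# The total variance is the error variance of a horizontal line at the mean of Y.
var_residual = ss_residual / n
var_total = ss_error(heights, weights, 0.0, mean(weights)) / n

print("sum of residuals:", residual_sum)
print("SS residual:", ss_residual)
print("SS with steeper slope:", ss_steeper)
print("SS with raised intercept:", ss_higher)
print("residual variance:", var_residual)
print("total variance:", var_total)
```

Because the invented data have a clearly nonzero correlation, the residual variance printed here is smaller than the total variance; with uncorrelated data the two would be about equal.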
To the extent that the points tend to rise or fall as you move to the right in the graph, a line that is angled to follow the points will get closer to the points, and the $\sigma^2_{\text{residual}}$ around that line will be less than the $\sigma^2_{\text{residual}}$ around the horizon-
