
Chapter 3 Introduction to regression

Fitting a straight line — least-squares regression

Another method for finding the equation of a straight line fitted to data is known as the method of least-squares regression. It is used when the data show a linear relationship and have no obvious outliers.

To understand the underlying theory behind least-squares, consider the regression line from an earlier section, reproduced here.

[Figure: scatterplot with a fitted straight line; axes labelled x and y, with vertical segments (the 'errors') joining each data point to the line.]

We wish to minimise the total of the vertical lines, or 'errors', in some way. For example, in the earlier section we minimised the 'sum', balancing the errors above and below the line. This is reasonable, but for sophisticated mathematical reasons it is preferable to minimise the sum of the squares of each of these errors. This is the essential mathematics of least-squares regression.
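In symbols: if the data points are $(x_1, y_1), \ldots, (x_n, y_n)$ and the fitted line is $y = mx + b$, least-squares regression chooses $m$ and $b$ to minimise the sum of squared errors

$$\sum_{i=1}^{n} \big( y_i - (m x_i + b) \big)^2,$$

where each term is the squared vertical distance from a data point to the line.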

The calculation of the equation of a least-squares regression line is simple using a graphics calculator. The arithmetic background to its calculation is shown here for interest.

The least-squares formulas<br />

Like other regression methods, we assume we have a set of (x, y) values. The number of such values is n. Let $\bar{x}$ and $\bar{y}$ be the averages (means) of the x- and y-values.
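Written out, these means are simply

$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}.$$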

Recall from an earlier chapter the formulas for the variance ($s_x^2$) and covariance ($s_{xy}$) of bivariate data:

$$s_x^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{or} \quad s_x^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$$

$$s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1} \quad \text{or} \quad s_{xy} = \frac{\sum xy - n\bar{x}\bar{y}}{n - 1}$$

Note that the summations ($\sum$) are over all points in the data set. It is beyond the scope of Further Mathematics to derive the formulas for the slope and intercept of the least-squares regression line, so they are simply stated.

The slope of the regression line $y = mx + b$ is

$$m = \frac{s_{xy}}{s_x^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad \text{or} \quad m = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$$

The y-intercept of the regression line is $b = \bar{y} - m\bar{x}$.
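As a check on the arithmetic, here is a short Python sketch (not part of the textbook; the data values are made up purely for illustration) that applies the computational forms of the formulas above:

```python
# Least-squares regression line y = mx + b from the computational formulas:
#   m = (sum(xy) - n*x_bar*y_bar) / (sum(x^2) - n*x_bar^2),  b = y_bar - m*x_bar

xs = [1, 2, 3, 4, 5]             # hypothetical x-values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical y-values

n = len(xs)
x_bar = sum(xs) / n              # mean of the x-values
y_bar = sum(ys) / n              # mean of the y-values

sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

m = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)  # slope
b = y_bar - m * x_bar                                         # y-intercept

print(f"y = {m:.3f}x + {b:.3f}")  # prints: y = 1.990x + 0.050 for this data
```

A graphics calculator's built-in linear regression should give the same slope and intercept for the same data, which makes this a convenient way to verify the formulas.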
