
7.1.2 Estimating the Regression Function


A popular method of estimating the regression function parameters is to use a least square error (LSE) approach, by minimising the total sum of the squares of the errors (deviations) between the observed values $y_i$ and the estimated values $b_0 + b_1 x_i$:

$E = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$ .    7.2

where $b_0$ and $b_1$ are estimates of $\beta_0$ and $\beta_1$, respectively.
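As a minimal sketch in R (one of the tools covered in this book), assuming a small invented data set, E can be evaluated for any candidate pair $(b_0, b_1)$; the LSE solution is the pair that minimises this function:

x <- c(1, 2, 3, 4, 5)                     # hypothetical predictor values
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)           # hypothetical observed values

# Sum of squared errors E of equation 7.2 for a candidate pair (b0, b1).
sse <- function(b0, b1, x, y) sum((y - b0 - b1 * x)^2)

sse(0, 2, x, y)                           # E for the candidate line y = 0 + 2x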

In order to apply the LSE method one starts by differentiating E with respect to $b_0$ and $b_1$ and setting the derivatives to zero, obtaining the so-called normal equations:

$\begin{cases} \sum y_i = n b_0 + b_1 \sum x_i \\ \sum x_i y_i = b_0 \sum x_i + b_1 \sum x_i^2 \end{cases}$ ,    7.3
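As a sketch with the same hypothetical data as above, the normal equations can be written as a 2-by-2 linear system and solved directly in R with solve():

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 8.1, 9.8)   # hypothetical data as before
n <- length(x)
# Coefficient matrix and right-hand side of the normal equations 7.3.
A   <- matrix(c(n,      sum(x),
                sum(x), sum(x^2)), nrow = 2, byrow = TRUE)
rhs <- c(sum(y), sum(x * y))
solve(A, rhs)                             # first element is b0, second is b1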

where the summations, from now on, are always assumed to be for the n predictor values. By solving the normal equations, the following parameter estimates, $b_0$ and $b_1$, are derived:

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ .    7.4

$b_0 = \bar{y} - b_1 \bar{x}$ .    7.5
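A sketch of these closed-form estimates in R, with the same hypothetical data; the result can be checked against R's built-in lm() function, which fits the same model:

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 8.1, 9.8)   # hypothetical data as before
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)   # equation 7.4
b0 <- mean(y) - b1 * mean(x)                                      # equation 7.5
c(b0, b1)
coef(lm(y ~ x))                           # lm() yields the same two estimates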

The least square estimates of the linear regression parameters enjoy a number of desirable properties (checked numerically in the sketch after this list):

i. The parameters $b_0$ and $b_1$ are unbiased estimates of the true parameters $\beta_0$ and $\beta_1$ ($E[b_0] = \beta_0$, $E[b_1] = \beta_1$), and have minimum variance among all unbiased linear estimates.

ii. The predicted (or fitted) values $\hat{y}_i = b_0 + b_1 x_i$ are point estimates of the true, observed values, $y_i$. The same is valid for the whole relation $\hat{Y} = b_0 + b_1 X$, which is the point estimate of the mean response $E[Y]$.

iii. The regression line always goes through the point $(\bar{x}, \bar{y})$.

iv. The computed errors $e_i = y_i - \hat{y}_i = y_i - b_0 - b_1 x_i$, called the residuals, are point estimates of the error values $\varepsilon_i$. The sum of the residuals is zero: $\sum e_i = 0$.

v. The residuals are uncorrelated with the predictor and the predicted values: $\sum e_i x_i = 0$; $\sum e_i \hat{y}_i = 0$.

vi. $\sum y_i = \sum \hat{y}_i \Rightarrow \bar{y} = \bar{\hat{y}}$, i.e., the predicted values have the same mean as the observed values.
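These properties can be verified numerically; a sketch in R with the same hypothetical data (the stated equalities hold up to floating-point rounding):

x <- c(1, 2, 3, 4, 5); y <- c(2.1, 3.9, 6.2, 8.1, 9.8)   # hypothetical data as before
b <- coef(lm(y ~ x)); b0 <- b[1]; b1 <- b[2]
yhat <- b0 + b1 * x                       # fitted values (property ii)
e    <- y - yhat                          # residuals (property iv)
sum(e)                                    # ~ 0 (property iv)
c(sum(e * x), sum(e * yhat))              # ~ 0 (property v)
c(mean(y), mean(yhat))                    # equal means (property vi)
mean(y) - (b0 + b1 * mean(x))             # ~ 0: line goes through (x-bar, y-bar) (property iii)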
