
Lies, Damned Lies, or Statistics: How to Tell the Truth with Statistics, 2017a


3. LINEAR REGRESSION

DEFINITION 3.1.1. Given a bivariate quantitative dataset $\{(x_1, y_1), \dots, (x_n, y_n)\}$ and a candidate line $\hat{y} = mx + b$ passing through this dataset, a residual is the difference in y-coordinates of an actual data point $(x_i, y_i)$ and the line's y value at the same x-coordinate. That is, if the y-coordinate of the line when $x = x_i$ is $\hat{y}_i = mx_i + b$, then the residual is the measure of error given by $\text{error}_i = y_i - \hat{y}_i$.

Note we use the convention here and elsewhere of writing $\hat{y}$ for the y-coordinate on an approximating line, while the plain $y$ variable is left for actual data values, like $y_i$.
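As a small illustration of Definition 3.1.1, the following Python sketch computes the residuals $y_i - \hat{y}_i$ for one candidate line. The dataset and the candidate values m = 1, b = 1 are invented for this example, and the function name `residuals` is hypothetical.

```python
def residuals(xs, ys, m, b):
    """Return the residuals y_i - (m*x_i + b), one per data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


# Hypothetical dataset, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate line y-hat = 1*x + 1 (an arbitrary guess, not necessarily the best line).
errors = residuals(xs, ys, m=1, b=1)
print(errors)  # [0, 1, 1, -1, -1]
```

A positive residual means the data point lies above the candidate line, a negative one means it lies below, and a residual of zero means the line passes exactly through that point.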

Here is an example of what residuals look like<br />

Now we are in a position to state the following definition.

DEFINITION 3.1.2. Given a bivariate quantitative dataset, the least squares regression line, almost always abbreviated to LSRL, is the line for which the sum of the squares of the residuals is the smallest possible.
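To make the quantity being minimized concrete, here is a minimal Python sketch that evaluates the sum of squared residuals, $\sum_{i=1}^{n} (y_i - (mx_i + b))^2$, for two lines over the same data. The dataset is invented for illustration, the function name `sum_squared_residuals` is hypothetical, and the values m = 0.6, b = 2.2 are what a least squares fit gives for this particular made-up dataset.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of the squares of the residuals y_i - (m*x_i + b)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical dataset, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

guess = sum_squared_residuals(xs, ys, m=1, b=1)     # exactly 4
best = sum_squared_residuals(xs, ys, m=0.6, b=2.2)  # about 2.4
print(guess, best)

# Nudging the slope or intercept away from (0.6, 2.2) only makes the sum larger.
nudges = [(0.05, 0), (-0.05, 0), (0, 0.05), (0, -0.05)]
worse = [sum_squared_residuals(xs, ys, 0.6 + dm, 2.2 + db) for dm, db in nudges]
print(all(w > best for w in worse))  # True
```

Squaring keeps positive and negative residuals from cancelling each other, so a line can only score well by being close to all of the points at once.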

FACT 3.1.3. If a bivariate quantitative dataset $\{(x_1, y_1), \dots, (x_n, y_n)\}$ has LSRL given by $\hat{y} = mx + b$, then

(1) The slope of the LSRL is given by $m = r \frac{s_y}{s_x}$, where $r$ is the correlation coefficient of the dataset.

(2) The LSRL passes through the point $(\bar{x}, \bar{y})$.

(3) It follows that the y-intercept of the LSRL is given by $b = \bar{y} - \bar{x} m = \bar{y} - \bar{x} r \frac{s_y}{s_x}$.
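The three parts of Fact 3.1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dataset is invented, the function name `lsrl` is hypothetical, and $r$ is taken to be the usual sample correlation coefficient, computed with the same $n - 1$ convention as the sample standard deviations $s_x$ and $s_y$.

```python
from statistics import mean, stdev


def lsrl(xs, ys):
    """Return (m, b) for the line y-hat = m*x + b, following Fact 3.1.3."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    # Assumed definition of r: the standard sample correlation coefficient.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    m = r * s_y / s_x      # part (1): slope
    b = y_bar - x_bar * m  # part (3): intercept, since the line passes through (x-bar, y-bar)
    return m, b


# Hypothetical dataset, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = lsrl(xs, ys)
print(round(m, 6), round(b, 6))  # 0.6 2.2
```

Part (3) is just part (2) rearranged: substituting the point $(\bar{x}, \bar{y})$ into $\hat{y} = mx + b$ gives $\bar{y} = m\bar{x} + b$, and solving for $b$ yields the intercept formula.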

It is possible to find the (coefficients of the) LSRL using the above information, but it is often more convenient to use a calculator or other electronic tool. Such tools also make it very easy to graph the LSRL right on top of the scatterplot, although it is often fairly easy to sketch what the LSRL will likely look like by just making a good guess, using
