Lies, Damned Lies, or Statistics- How to Tell the Truth with Statistics, 2017a
3. LINEAR REGRESSION
DEFINITION 3.1.1. Given a bivariate quantitative dataset $\{(x_1,y_1),\dots,(x_n,y_n)\}$ and
a candidate line $\hat{y} = mx + b$ passing through this dataset, a residual is the difference in
y-coordinates of an actual data point $(x_i,y_i)$ and the line's y value at the same x-coordinate.
That is, if the y-coordinate of the line when $x = x_i$ is $\hat{y}_i = mx_i + b$, then the residual is the
measure of error given by $\mathrm{error}_i = y_i - \hat{y}_i$.
Note we use the convention here and elsewhere of writing $\hat{y}$ for the y-coordinate on an
approximating line, while the plain $y$ variable is left for actual data values, like $y_i$.
Here is an example of what residuals look like<br />
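As a numerical illustration of Definition 3.1.1, the following minimal sketch computes the residuals of a small dataset against one candidate line. The data values, the candidate slope and intercept, and the helper name `residuals` are all hypothetical, chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical data and candidate line, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
m, b = 1.0, 1.0  # candidate line: y-hat = 1*x + 1


def residuals(xs, ys, m, b):
    """Return the residuals y_i - (m*x_i + b), one per data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


errors = residuals(xs, ys, m, b)
print(errors)  # [0.0, 0.0, 1.0, 1.0]
```

A positive residual means the data point lies above the candidate line, a negative one means it lies below, and a residual of zero means the line passes exactly through that point.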
Now we are in a position to state the
DEFINITION 3.1.2. Given a bivariate quantitative dataset, the least squares regression
line, almost always abbreviated to LSRL, is the line for which the sum of the squares of
the residuals is the smallest possible.
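To make the quantity being minimized concrete, here is a small sketch that computes the sum of the squares of the residuals for two candidate lines through the same dataset. The data, the two candidate lines, and the function name `sum_squared_residuals` are hypothetical and serve only as an illustration; the line with the smaller sum is the better fit in the least squares sense.

```python
def sum_squared_residuals(xs, ys, m, b):
    """Sum of the squares of the residuals y_i - (m*x_i + b)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]

sse_a = sum_squared_residuals(xs, ys, 1.0, 1.0)  # candidate line A: y-hat = x + 1
sse_b = sum_squared_residuals(xs, ys, 1.4, 0.5)  # candidate line B: y-hat = 1.4x + 0.5

print(sse_a, sse_b)
print("B fits better" if sse_b < sse_a else "A fits better")
```

For this particular dataset, line A gives a sum of 2 and line B a sum of about 0.2, so B is the better of these two candidates.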
FACT 3.1.3. If a bivariate quantitative dataset $\{(x_1,y_1),\dots,(x_n,y_n)\}$ has LSRL given
by $\hat{y} = mx + b$, then
(1) The slope of the LSRL is given by $m = r\,\frac{s_y}{s_x}$, where $r$ is the correlation coefficient
of the dataset.
(2) The LSRL passes through the point $(\bar{x}, \bar{y})$.
(3) It follows that the y-intercept of the LSRL is given by $b = \bar{y} - \bar{x}\,m = \bar{y} - \bar{x}\,r\,\frac{s_y}{s_x}$.
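The three parts of Fact 3.1.3 can be turned into a short computation. The sketch below uses only Python's standard library and assumes that $s_x$ and $s_y$ are sample standard deviations and that $r$ is the usual sample correlation coefficient, computed as the sum of products of standardized values divided by $n - 1$. The function name `lsrl` and the data are hypothetical.

```python
from statistics import mean, stdev


def lsrl(xs, ys):
    """Return (m, b) for the LSRL, using m = r*s_y/s_x and b = y_bar - x_bar*m."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (assumption)
    # Correlation coefficient: sum of products of z-scores, divided by n - 1.
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    m = r * s_y / s_x      # part (1): slope
    b = y_bar - x_bar * m  # part (3): y-intercept
    return m, b


# Hypothetical data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 5.0, 6.0]
m, b = lsrl(xs, ys)
print(round(m, 6), round(b, 6))
```

For this dataset the result is approximately $m = 1.4$ and $b = 0.5$, and one can check part (2) directly: here $\bar{x} = 2.5$ and $\bar{y} = 4$, and $1.4 \cdot 2.5 + 0.5 = 4$.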
It is possible to find the (coefficients of the) LSRL using the above information, but it
is often more convenient to use a calculator or other electronic tool. Such tools also make
it very easy to graph the LSRL right on top of the scatterplot, although it is often fairly
easy to sketch what the LSRL will likely look like by just making a good guess, using