
23C. Approximation and best-fit

Example 1. Consider the system of equations Aβ = w:

    [ 1   1 ]         [  1 ]
    [ 2  -2 ] [α]  =  [ -1 ]
    [ 2   0 ] [β]     [  1 ]

Clearly there is no solution. The problem here is to find a way to identify a "best approximation" to a solution. Use the idea of projection to formulate a geometric condition for this.

One way we have learned to think about this system is that it's asking us to write w as a linear combination of the columns of A. Call these columns u and v. Then write the system as:

    [ u  v ] [α]  =  w
             [β]

Now u and v span a plane and since w cannot be written as a linear combination of u and v, it does not lie in that plane.

Thinking of our work with projections, we might propose that a good candidate for a "best" approximation to a solution would be the linear combination that was closest to w, that is, that minimized the length of the vector ε drawn from the linear combination Aβ to w. And we can find this linear combination by projecting w onto the plane. In that case, the best approximation to β would be the coefficients of the linear combination of u and v that gave us this projection.

A least-squares solution of the equation

    Aβ = w

is a vector β = β^ that minimizes the length of the "error"

    ε = w – Aβ.

Geometrical considerations tell us that this will happen when ε is orthogonal to the columns of A. The vector β^ can be calculated as the set of coefficients of the linear combination of u and v that gives us this projection of w onto the plane spanned by the columns of A.

Note 1. Of course if the system happens to have a solution, then w will lie in the plane, and the minimum ε will be 0.

Note 2. The "hat" on β^ is put there to signal that β^ is not a solution to the original equation but is an approximation.
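The definition above turns the geometry into a small computation: solve the 2×2 Gram system for the projection coefficients. Below is a minimal sketch in plain Python, using the vectors of Example 1 and exact arithmetic from the standard `fractions` module.

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Columns of A and the right-hand side from Example 1.
u, v, w = [1, 2, 2], [1, -2, 0], [1, -1, 1]

# Gram system: [u.u  u.v; v.u  v.v] [alpha; beta] = [u.w; v.w]
guu, guv, gvv = dot(u, u), dot(u, v), dot(v, v)
rw_u, rw_v = dot(u, w), dot(v, w)

det = guu * gvv - guv * guv              # 9*5 - (-3)^2 = 36
alpha = F(gvv * rw_u - guv * rw_v, det)  # Cramer's rule
beta = F(guu * rw_v - guv * rw_u, det)

print(alpha, beta)  # 7/18 5/6   (5/6 = 15/18)
```

Cramer's rule suffices here because the Gram matrix is 2×2; larger systems are handled the same way with any linear solver.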

[Figure: w lies off the plane spanned by u and v; its projection onto the plane is Aβ^ and the error vector is ε.]


Example 2. A matrix computational scheme. In Example 1 we used the simple geometrical idea of projection to specify a "best approximation" β^ to a solution of a linear system such as:

    Aβ = w:
    [ 1   1 ]         [  1 ]
    [ 2  -2 ] [α]  =  [ -1 ]
    [ 2   0 ] [β]     [  1 ]

In Section 22 we used the projection of w on the plane spanned by u and v to find such a solution (α, β) to the above system. To summarize the method, we first wrote the equation as

    Aβ + ε = w

    [ 1   1 ]         [ ε1 ]     [  1 ]
    [ 2  -2 ] [α]  +  [ ε2 ]  =  [ -1 ]
    [ 2   0 ] [β]     [ ε3 ]     [  1 ]

and writing the first term as a linear combination of the columns of A:

    αu + βv + ε = w

      [ 1 ]       [  1 ]     [ ε1 ]     [  1 ]
    α [ 2 ]  +  β [ -2 ]  +  [ ε2 ]  =  [ -1 ]
      [ 2 ]       [  0 ]     [ ε3 ]     [  1 ]

The ε vector serves as an error, and now there is always a solution to the system; in fact there is a solution for every choice of α and β––we can always choose the εi to make it work. Of course we want to choose the εi to be as small as possible, and indeed the criterion we used in Section 22 was to minimize the sum of the squares of the εi (which is the square of the length of ε). This occurs when ε is orthogonal to the plane.

To effectively "use" this orthogonality, we took the dot product of the equation with both u and v:

    α u•u + β u•v + u•ε = u•w
    α v•u + β v•v + v•ε = v•w

Writing these in matrix form:

    [ u•u  u•v ] [α]     [ u•ε ]     [ u•w ]
    [ v•u  v•v ] [β]  +  [ v•ε ]  =  [ v•w ]

Since ε is orthogonal to the plane, u•ε and v•ε are both zero, and we get a "square" system in which the number of equations is the same as the number of unknowns:

    [ u•u  u•v ] [α]     [ u•w ]
    [ v•u  v•v ] [β]  =  [ v•w ]

The dot products (listed with the vectors below) are u•u = 9, u•v = v•u = -3, v•v = 5, u•w = 1, v•w = 3, so the system is:

    [  9  -3 ] [α]     [ 1 ]
    [ -3   5 ] [β]  =  [ 3 ]

We could solve this by elimination, but instead we use the matrix inverse approach:

    [α^]     [  9  -3 ]⁻¹ [ 1 ]        1  [ 5  3 ] [ 1 ]        1  [ 14 ]        1  [  7 ]
    [β^]  =  [ -3   5 ]   [ 3 ]  =  ----- [ 3  9 ] [ 3 ]  =  ----- [ 30 ]  =  ----- [ 15 ]
                                      36                       36               18

We get α^ = 7/18 and β^ = 15/18. In applications, we often put "hats" on α and β to signal that these are not solutions to the original equation.


[Figure: w, its projection Aβ^ onto the plane spanned by u and v, and the error ε.]

    u = (1, 2, 2),  v = (1, -2, 0),  w = (1, -1, 1)

    u•u = 9,  u•v = v•u = -3,  v•v = 5,  u•w = 1,  v•w = 3

The method of section 22 gives a solution which minimizes ||ε||² = ε1² + ε2² + ε3².


As a check we calculate the error:

    ε = w – Aβ^ = [1; -1; 1] – (1/18)[22; -16; 14] = [1; -1; 1] – (1/9)[11; -8; 7] = (1/9)[-2; -1; 2]

and check that it is orthogonal to the vectors u and v. Always do that!
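That orthogonality check is itself a one-liner. This sketch (plain Python, exact fractions from the standard library) recomputes the residual from the solution 7/18, 15/18 found above and verifies both dot products vanish.

```python
from fractions import Fraction as F

u, v, w = [1, 2, 2], [1, -2, 0], [1, -1, 1]
alpha, beta = F(7, 18), F(15, 18)

# Residual e = w - (alpha*u + beta*v), computed entry by entry.
e = [wi - (alpha * ui + beta * vi) for ui, vi, wi in zip(u, v, w)]

dot = lambda a, b: sum(x * y for x, y in zip(a, b))

print(e)                     # [Fraction(-2, 9), Fraction(-1, 9), Fraction(2, 9)]
print(dot(u, e), dot(v, e))  # 0 0  -- e is orthogonal to both columns
```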

A technical note:

In text-books on the subject, you will often see the original equation

    Aβ + ε = w

    [ 1   1 ]         [ ε1 ]     [  1 ]
    [ 2  -2 ] [α]  +  [ ε2 ]  =  [ -1 ]
    [ 2   0 ] [β]     [ ε3 ]     [  1 ]

transformed by multiplying both sides on the left by the transpose of A.

    AᵀAβ + Aᵀε = Aᵀw

    [ 1   2   2 ] [ 1   1 ]         [ 1   2   2 ] [ ε1 ]     [ 1   2   2 ] [  1 ]
    [ 1  -2   0 ] [ 2  -2 ] [α]  +  [ 1  -2   0 ] [ ε2 ]  =  [ 1  -2   0 ] [ -1 ]
                  [ 2   0 ] [β]                   [ ε3 ]                   [  1 ]

This is actually just a wonderfully slick way of getting our 2×2 system. For example, consider the product of the first two matrices. The first is 2×3 and the second is 3×2, so the product will be 2×2. Now if you think carefully about the terms of the matrix products, you will see that they are dot products. Thus, the entries of the matrix AᵀA are the dot products u•u, u•v, v•u and v•v, and the system above is simply:

    (AᵀA)β + Aᵀε = Aᵀw

    [ u•u  u•v ] [α]     [ u•ε ]     [ u•w ]
    [ v•u  v•v ] [β]  +  [ v•ε ]  =  [ v•w ]

No matter how you think of it, what is important is that you can easily arrive at the corresponding square system of equations which will allow you to find the least-squares approximation. The forms of these matrices are given below for the 2×2 and the 3×3 cases.
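The observation that the entries of AᵀA are exactly the pairwise dot products of the columns can be checked directly. A small sketch in plain Python, using the columns of A from Example 2:

```python
u, v = [1, 2, 2], [1, -2, 0]  # columns of A from Example 2
dot = lambda a, b: sum(x * y for x, y in zip(a, b))

# The (i, j) entry of A^T A is (row i of A^T) . (column j of A),
# i.e. the dot product of column i of A with column j of A.
AtA = [[dot(a, b) for b in (u, v)] for a in (u, v)]
print(AtA)  # [[9, -3], [-3, 5]]
```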

Table of matrices for the 2×2 and 3×3 cases

The 2×2 case, A = [u v]:

    AᵀA = [ u•u  u•v ]        Aᵀz = [ u•z ]
          [ v•u  v•v ]              [ v•z ]

The 3×3 case, A = [u v w]:

    AᵀA = [ u•u  u•v  u•w ]   Aᵀz = [ u•z ]
          [ v•u  v•v  v•w ]         [ v•z ]
          [ w•u  w•v  w•w ]         [ w•z ]

The least-squares solution of the equation

    Aβ = w

is the solution β = β^ of the "square" system of equations:

    AᵀAβ = Aᵀw.

Here Aᵀ is the transpose of A, defined as the matrix whose rows are the columns of A. Thus:

    [ 1   1 ]ᵀ     [ 1   2   2 ]
    [ 2  -2 ]   =  [ 1  -2   0 ]
    [ 2   0 ]
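The boxed rule AᵀAβ = Aᵀw is mechanical enough to automate. Below is a minimal exact solver, a sketch assuming nothing beyond the Python standard library: it builds the normal system from the columns of A and solves it by Gauss-Jordan elimination over the rationals.

```python
from fractions import Fraction as F

def normal_solve(cols, w):
    """Solve (A^T A) b = A^T w, where `cols` are the columns of A."""
    dot = lambda a, b: sum(F(x) * F(y) for x, y in zip(a, b))
    n = len(cols)
    # Augmented normal system: entries are dot products of columns.
    M = [[dot(cols[i], cols[j]) for j in range(n)] + [dot(cols[i], w)]
         for i in range(n)]
    # Gauss-Jordan elimination with exact arithmetic.
    for i in range(n):
        p = next(r for r in range(i, n) if M[r][i] != 0)  # pivot row
        M[i], M[p] = M[p], M[i]
        M[i] = [x / M[i][i] for x in M[i]]
        for r in range(n):
            if r != i and M[r][i] != 0:
                M[r] = [a - M[r][i] * b for a, b in zip(M[r], M[i])]
    return [M[i][n] for i in range(n)]

# Example 2's system: columns u, v and right side w.
print(normal_solve([[1, 2, 2], [1, -2, 0]], [1, -1, 1]))
# [Fraction(7, 18), Fraction(5, 6)]   (5/6 = 15/18)
```

The same call handles the 3×3 case: pass three columns and the solver builds the 3×3 table above automatically.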


Example 3. Find the least squares approximation to the solution of the system of equations:

    2x + y = 3
    x – y = 1
    x + y = 2

Solution. In matrix form the system is:

    Ax = b:
    [ 2   1 ]         [ 3 ]
    [ 1  -1 ] [x]  =  [ 1 ]
    [ 1   1 ] [y]     [ 2 ]

The equation for the least-squares solution x is AᵀAx = Aᵀb. We calculate:

    AᵀA = [ u•u  u•v ]  =  [ 6  2 ]        Aᵀb = [ u•b ]  =  [ 9 ]
          [ v•u  v•v ]     [ 2  3 ]              [ v•b ]     [ 4 ]

And the equation to be solved is:

    [ 6  2 ] [x]     [ 9 ]
    [ 2  3 ] [y]  =  [ 4 ]

The solution is:

    [x^]     [ 6  2 ]⁻¹ [ 9 ]        1  [  3  -2 ] [ 9 ]        1  [ 19 ]
    [y^]  =  [ 2  3 ]   [ 4 ]  =  ----- [ -2   6 ] [ 4 ]  =  ----- [  6 ]
                                    14                         14

We get: x^ = 19/14, y^ = 6/14.
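The same hand computation can be replayed in a few lines. A sketch in plain Python with exact fractions, using the columns u = (2,1,1), v = (1,-1,1) and right side b = (3,1,2) of this example:

```python
from fractions import Fraction as F

dot = lambda a, b: sum(p * q for p, q in zip(a, b))

u, v, b = [2, 1, 1], [1, -1, 1], [3, 1, 2]
AtA = [[dot(u, u), dot(u, v)], [dot(v, u), dot(v, v)]]  # [[6, 2], [2, 3]]
Atb = [dot(u, b), dot(v, b)]                            # [9, 4]

det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]     # 6*3 - 2*2 = 14
x = F(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1], det)     # Cramer's rule
y = F(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)

print(x, y)  # 19/14 3/7   (3/7 = 6/14)
```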

Example 4. Use the above approach to find a least squares solution for the equation

    [ 1   6 ]         [ 3 ]
    [ 0   5 ] [α]  =  [ 1 ]
    [ 2  -3 ] [β]     [ 2 ]

Solution. The new system is:

    AᵀAβ = Aᵀw

    [ 1  0   2 ] [ 1   6 ]         [ 1  0   2 ] [ 3 ]
    [ 6  5  -3 ] [ 0   5 ] [α]  =  [ 6  5  -3 ] [ 1 ]
                 [ 2  -3 ] [β]                  [ 2 ]

    [ 5   0 ] [α]     [  7 ]
    [ 0  70 ] [β]  =  [ 17 ]

The equations are uncoupled, the first involving only α and the second only β. They read 5α = 7 and 70β = 17, and the solution is

    [α^]     [  7/5  ]
    [β^]  =  [ 17/70 ]

(Margin data for Example 3:)

    u = (2, 1, 1),  v = (1, -1, 1),  b = (3, 1, 2)
    u•u = 6,  u•v = v•u = 2,  v•v = 3,  u•b = 9,  v•b = 4

Note: This is an example in which the columns of A are orthogonal. Notice that this gives us a diagonal coefficient matrix for our new system, and the solution can be found immediately.
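The note about orthogonal columns can be seen numerically. In this sketch the columns are as reconstructed for Example 4 (the signs are an assumption of this copy; the point being illustrated is only that u•v = 0 makes the normal matrix diagonal).

```python
dot = lambda a, b: sum(x * y for x, y in zip(a, b))

# Columns of A as reconstructed for Example 4 (signs are an assumption
# of this sketch; what matters is that the columns are orthogonal).
u, v = [1, 0, 2], [6, 5, -3]
assert dot(u, v) == 0  # orthogonal columns

# The normal matrix A^T A is then diagonal, so the equations uncouple.
G = [[dot(u, u), dot(u, v)], [dot(v, u), dot(v, v)]]
print(G)  # [[5, 0], [0, 70]]
```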



Example 5. Find the line y = αx + β that "best fits" the data below, and calculate and display the error vector, defined as the difference between the y-value of the point and the height of the approximating line. [Thus in the diagram, ε2 is positive (point above the line) and the other three errors are negative.]

Solution. Plug the data points into the equation. We get:

    yi = αxi + β + εi

    10 = α + β + ε1
    40 = 2α + β + ε2
    20 = 3α + β + ε3
    30 = 4α + β + ε4

where the εi are the "errors" which measure the vertical distance between the ith data point and the line. In vector-matrix form, this is

    Xβ + ε = y:

    [ 1  1 ]         [ ε1 ]     [ 10 ]
    [ 2  1 ] [α]  +  [ ε2 ]  =  [ 40 ]
    [ 3  1 ] [β]     [ ε3 ]     [ 20 ]
    [ 4  1 ]         [ ε4 ]     [ 30 ]

If there were a line passing exactly through all four data points, then we could find a pair (α, β) which satisfied all four equations with εi = 0. But this is not the case, so we try to choose α and β to make the error as small as possible. The ideas above tell us that this will be the case when ε is orthogonal to the columns of the matrix X. In this example, these are vectors in R⁴, but the same dot product analysis works. The system we get is:

    [ u•u  u•v ] [α^]     [ u•y ]          [ 30  10 ] [α^]     [ 270 ]
    [ v•u  v•v ] [β^]  =  [ v•y ]   i.e.   [ 10   4 ] [β^]  =  [ 100 ]

    [α^]        1  [   4  -10 ] [ 270 ]     [  4 ]
    [β^]  =  ----- [ -10   30 ] [ 100 ]  =  [ 15 ]
              20

This tells us that the best fit line is y = 4x + 15. The error is:

    ε = y – Xβ^ = [10; 40; 20; 30] – [19; 23; 27; 31] = [-9; 17; -7; -1]

As usual, check that ε is orthogonal to the columns of X.
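The whole regression fit, using the data of this example, can be sketched in plain Python with exact fractions; the slope and intercept come out integral here, so they are cast to ints at the end for readability.

```python
from fractions import Fraction as F

dot = lambda a, b: sum(p * q for p, q in zip(a, b))

xs, ys = [1, 2, 3, 4], [10, 40, 20, 30]
u, v = xs, [1, 1, 1, 1]      # columns of X for the model y = a*x + b

det = dot(u, u) * dot(v, v) - dot(u, v) ** 2             # 30*4 - 10^2 = 20
a = F(dot(v, v) * dot(u, ys) - dot(u, v) * dot(v, ys), det)  # slope
b = F(dot(u, u) * dot(v, ys) - dot(u, v) * dot(u, ys), det)  # intercept
a, b = int(a), int(b)        # both come out integral for this data
print(a, b)                  # 4 15

# Signed residuals: data value minus height of the fitted line.
errs = [y - (a * x + b) for x, y in zip(xs, ys)]
print(errs)                  # [-9, 17, -7, -1]
```

Note that the residuals sum to zero, a consequence of ε being orthogonal to the all-ones column v.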


[Figure: the four data points and the best-fit line y = 4x + 15.]

    x    y
    1   10
    2   40
    3   20
    4   30

The εi are actually signed errors: when the point is below the line, εi is negative. This is clear from the equations above.

    X = [u v] = [ 1  1 ]       y = [ 10 ]
                [ 2  1 ]           [ 40 ]
                [ 3  1 ]           [ 20 ]
                [ 4  1 ]           [ 30 ]

    u•u = 30,  u•v = v•u = 10,  v•v = 4,  u•y = 270,  v•y = 100

The regression line is often called the least squares line. That's because it is the sum of the squares of the errors εi that is being minimized.


Example 6. The data points below have been observed. It is desired to fit a quadratic regression model:

    y = αx² + βx + γ.

Find the least squares quadratic polynomial and plot its graph along with the data points. Calculate the residual vector ε.

Solution: We write the six equations, one for each data point, as the vector equation:

    Xβ + ε = y:

    [ 0  0  1 ]         [ ε1 ]     [ 0 ]
    [ 0  0  1 ] [α]     [ ε2 ]     [ 1 ]
    [ 1  1  1 ] [β]  +  [ ε3 ]  =  [ 0 ]
    [ 1  1  1 ] [γ]     [ ε4 ]     [ 1 ]
    [ 4  2  1 ]         [ ε5 ]     [ 1 ]
    [ 4  2  1 ]         [ ε6 ]     [ 2 ]

The condition that ε be orthogonal to the columns of X is:

    Xᵀε = 0
    Xᵀ(y – Xβ) = 0
    XᵀXβ = Xᵀy

    [ 34  18  10 ] [α]     [ 13 ]
    [ 18  10   6 ] [β]  =  [  7 ]
    [ 10   6   6 ] [γ]     [  5 ]

We can solve this system by elimination, or use technology to calculate the matrix inverse. A good website is: http://www.bluebit.gr/matrix-calculator/calculate.aspx

The solution is

    [α^]     [ 34  18  10 ]⁻¹ [ 13 ]     [  1/2 ]
    [β^]  =  [ 18  10   6 ]   [  7 ]  =  [ -1/2 ]
    [γ^]     [ 10   6   6 ]   [  5 ]     [  1/2 ]

giving us the best fit parabola

    y = (x² – x + 1)/2.

The residual vector is

    ε = y – Xβ^ = [0; 1; 0; 1; 1; 2] – [1/2; 1/2; 1/2; 1/2; 3/2; 3/2] = (1/2)[-1; 1; -1; 1; -1; 1]

As usual, check that ε is orthogonal to the columns of X.

    x   y
    0   0
    0   1
    1   0
    1   1
    2   1
    2   2


[Figure: the six data points and the parabola y = (x² – x + 1)/2.]

    X = [u v w] = [ 0  0  1 ]       y = [ 0 ]
                  [ 0  0  1 ]           [ 1 ]
                  [ 1  1  1 ]           [ 0 ]
                  [ 1  1  1 ]           [ 1 ]
                  [ 4  2  1 ]           [ 1 ]
                  [ 4  2  1 ]           [ 2 ]

    u•u = 34,  v•v = 10,  w•w = 6
    u•v = v•u = 18,  u•w = w•u = 10,  v•w = w•v = 6

Having seen this picture, can you see a way you might have deduced that this would have to be the answer without doing any work at all?
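The 3×3 normal system of this example is a good test for a small exact solver. This sketch (plain Python, standard `fractions` module) builds the columns x², x, 1 from the data, forms XᵀX and Xᵀy as tables of dot products, and eliminates.

```python
from fractions import Fraction as F

def solve(M, r):
    """Gauss-Jordan solve of a small exact linear system M b = r."""
    n = len(M)
    A = [[F(x) for x in row] + [F(y)] for row, y in zip(M, r)]
    for i in range(n):
        p = next(k for k in range(i, n) if A[k][i] != 0)  # pivot row
        A[i], A[p] = A[p], A[i]
        A[i] = [x / A[i][i] for x in A[i]]
        for k in range(n):
            if k != i and A[k][i] != 0:
                A[k] = [a - A[k][i] * b for a, b in zip(A[k], A[i])]
    return [row[n] for row in A]

xs = [0, 0, 1, 1, 2, 2]
ys = [0, 1, 0, 1, 1, 2]
cols = [[x * x for x in xs], xs, [1] * 6]       # columns x^2, x, 1

dot = lambda a, b: sum(p * q for p, q in zip(a, b))
M = [[dot(a, b) for b in cols] for a in cols]   # [[34,18,10],[18,10,6],[10,6,6]]
r = [dot(c, ys) for c in cols]                  # [13, 7, 5]

coeffs = solve(M, r)
print(coeffs)  # [Fraction(1, 2), Fraction(-1, 2), Fraction(1, 2)]
```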


Example 7. Find the plane z = αx + βy that "best fits" the data below, and calculate and display the error vector ε.

Solution. Plug the data points into the equation. We get:

    zi = αxi + βyi + εi

    10 = α + β + ε1
    30 = α + 2β + ε2
    20 = 2α + β + ε3
    50 = 2α + 2β + ε4

where the εi are the "errors" which measure the vertical distance between the ith data point and the plane. In vector-matrix form, this is

    Xβ + ε = z:

    [ 1  1 ]         [ ε1 ]     [ 10 ]
    [ 1  2 ] [α]  +  [ ε2 ]  =  [ 30 ]
    [ 2  1 ] [β]     [ ε3 ]     [ 20 ]
    [ 2  2 ]         [ ε4 ]     [ 50 ]

The least-squares condition is that ε be orthogonal to the columns of the matrix X, that is, Xᵀε = 0. And the algebraic condition for that is:

    (XᵀX)β = Xᵀz

    [ x•x  x•y ] [α^]     [ x•z ]          [ 10   9 ] [α^]     [ 180 ]
    [ y•x  y•y ] [β^]  =  [ y•z ]   i.e.   [  9  10 ] [β^]  =  [ 190 ]

    [α^]        1  [ 10  -9 ] [ 180 ]        1  [  90 ]
    [β^]  =  ----- [ -9  10 ] [ 190 ]  =  ----- [ 280 ]
              19                           19

This tells us that the best fit plane is

    z = (90x + 280y)/19.

The error is:

    ε = z – Xβ^ = [10; 30; 20; 50] – (1/19)[370; 650; 460; 740] = (1/19)[-180; -80; -80; 210]

As usual, check that ε is orthogonal to the columns of X.


    x   y    z
    1   1   10
    1   2   30
    2   1   20
    2   2   50

    X = [x y] = [ 1  1 ]       z = [ 10 ]
                [ 1  2 ]           [ 30 ]
                [ 2  1 ]           [ 20 ]
                [ 2  2 ]           [ 50 ]

    x•x = 10,  x•y = y•x = 9,  y•y = 10,  x•z = 180,  y•z = 190
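The plane fit follows the identical dot-product recipe as the line fit; only the columns change. A sketch in plain Python with exact fractions, using the data of this example:

```python
from fractions import Fraction as F

dot = lambda a, b: sum(p * q for p, q in zip(a, b))

x, y = [1, 1, 2, 2], [1, 2, 1, 2]   # columns of X
z = [10, 30, 20, 50]

det = dot(x, x) * dot(y, y) - dot(x, y) ** 2                # 10*10 - 9^2 = 19
a = F(dot(y, y) * dot(x, z) - dot(x, y) * dot(y, z), det)   # coefficient of x
b = F(dot(x, x) * dot(y, z) - dot(x, y) * dot(x, z), det)   # coefficient of y

print(a, b)  # 90/19 280/19
```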


Example 8. Tabulated below are the mid-term test mark (out of 10) and the final marks for four of Jason's friends who took the course last year. Jason's mid-term test mark is 7. He decides to use a linear model

    y = mx + b

to predict his final mark. What does he discover?

Solution. We use a "least squares" model. The equations are:

    90 = 8m + b
    85 = 7m + b
    95 = 9m + b
    75 = 7m + b

In matrix form:

    Xb = y:

    [ 8  1 ]         [ 90 ]
    [ 7  1 ] [m]  =  [ 85 ]
    [ 9  1 ] [b]     [ 95 ]
    [ 7  1 ]         [ 75 ]

The least squares solution b^ is the solution of the equation:

    (XᵀX)b^ = Xᵀy.

    [ 243  31 ] [m^]     [ 2695 ]
    [  31   4 ] [b^]  =  [  345 ]

    [m^]        1  [   4  -31 ] [ 2695 ]        1  [  85 ]
    [b^]  =  ----- [ -31  243 ] [  345 ]  =  ----- [ 290 ]
              11                              11

The best fit regression equation is

    y = (85x + 290)/11.

Jason's estimate of his final mark is y = (85×7 + 290)/11 = 80.4.

    student   Mid   Final
    Brenda     8     90
    Alnoor     7     85
    Kit        9     95
    Tim        7     75
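Jason's whole calculation, fit plus prediction, takes only a few lines. A sketch in plain Python with exact fractions, using the marks of this example:

```python
from fractions import Fraction as F

dot = lambda a, b: sum(p * q for p, q in zip(a, b))

mids = [8, 7, 9, 7]
finals = [90, 85, 95, 75]
u, v = mids, [1, 1, 1, 1]   # columns of X for the model y = m*x + b

det = dot(u, u) * dot(v, v) - dot(u, v) ** 2   # 243*4 - 31^2 = 11
m = F(dot(v, v) * dot(u, finals) - dot(u, v) * dot(v, finals), det)  # 85/11
b = F(dot(u, u) * dot(v, finals) - dot(u, v) * dot(u, finals), det)  # 290/11

pred = m * 7 + b    # predicted final mark for a mid-term mark of 7
print(float(pred))  # about 80.45
```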

