08.10.2016 Views

Foundations of Data Science


3.11 Exercises

Exercise 3.1 (Least squares vertical error) In many experiments one collects the value of a parameter at various instances of time. Let $y_i$ be the value of the parameter $y$ at time $x_i$. Suppose we wish to construct the best linear approximation to the data in the sense that we wish to minimize the mean square error. Here error is measured vertically rather than perpendicular to the line. Develop formulas for $m$ and $b$ to minimize the mean square error of the points $\{(x_i, y_i) \mid 1 \le i \le n\}$ to the line $y = mx + b$.
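A derived pair of formulas for $m$ and $b$ can be checked numerically. The sketch below is one possible check, not part of the exercise: it uses the standard closed form for ordinary least squares (slope equal to the covariance of $x$ and $y$ divided by the variance of $x$), and the function names and sample data are made up for illustration.

```python
def fit_line_vertical(points):
    """Return (m, b) minimizing the mean squared vertical error to y = m*x + b.

    Assumes the standard ordinary-least-squares closed form.
    """
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    # Sums of centered squares and cross products.
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


def mean_square_error(points, m, b):
    """Mean of the squared vertical distances from the points to y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)


# Hypothetical sample data, chosen only for illustration.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0), (3.0, 8.0)]
m, b = fit_line_vertical(points)
print(m, b)  # approximately 2.2 and 0.7
print(mean_square_error(points, m, b))
```

Nudging `m` or `b` away from the returned values and recomputing `mean_square_error` should never lower the error, which gives a quick sanity check on any formula derived by hand.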

Exercise 3.2 Given five observed variables, height, weight, age, income, and blood pressure of $n$ people, how would one find the best least squares fit affine subspace of the form

$$a_1(\text{height}) + a_2(\text{weight}) + a_3(\text{age}) + a_4(\text{income}) + a_5(\text{blood pressure}) = a_6$$

Here $a_1, a_2, \ldots, a_6$ are the unknown parameters. If there is a good best fit 4-dimensional affine subspace, then one can think of the points as lying close to a 4-dimensional sheet rather than points lying in 5 dimensions. Why might it be better to use the perpendicular distance to the affine subspace rather than vertical distance, where vertical distance is measured along the coordinate axis corresponding to one of the variables?

Exercise 3.3 Manually find the best fit lines (not subspaces which must contain the origin) through the points in the sets below. Subtract the center of gravity of the points in the set from each of the points in the set and find the best fit line for the resulting points. Does the best fit line for the original data go through the origin?

1. (4,4) (6,2)

2. (4,2) (4,4) (6,2) (6,4)

3. (3,2.5) (3,5) (5,1) (5,3.5)
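The centering step described above can be sketched in a few lines of Python for checking hand calculations. This is an illustrative sketch under stated assumptions: "best fit" is taken to mean minimizing the sum of squared perpendicular distances, the direction is obtained from the closed-form angle for a 2×2 symmetric matrix, and the function name and sample points are hypothetical rather than taken from the exercise.

```python
import math


def best_fit_line(points, tol=1e-9):
    """Return (centroid, unit direction, passes_through_origin).

    The line minimizing the sum of squared perpendicular distances passes
    through the centroid, with direction along the top eigenvector of the
    centered scatter matrix [[sxx, sxy], [sxy, syy]].
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Subtract the center of gravity, then accumulate the scatter matrix.
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Angle maximizing the sum of squared projections onto the direction.
    # If sxx == syy and sxy == 0, every direction ties and this is arbitrary.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # The line contains the origin iff the centroid is parallel to the direction.
    through_origin = abs(cx * dy - cy * dx) < tol
    return (cx, cy), (dx, dy), through_origin


# Hypothetical sample data, not one of the sets in the exercise.
centroid, direction, through_origin = best_fit_line([(0, 1), (1, 2), (2, 3)])
print(centroid, direction, through_origin)
```

For the sample points the centroid is (1, 2) and the direction is along (1, 1), so the fitted line is y = x + 1 and does not contain the origin.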

Exercise 3.4 Manually determine the best fit line through the origin for each of the following sets of points. Is the best fit line unique? Justify your answer for each of the subproblems.

1. {(0, 1), (1, 0)}

2. {(0, 1), (2, 0)}
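Uniqueness of a best fit line through the origin can be tested numerically by comparing the two eigenvalues of the uncentered 2×2 matrix of sums of squares and products: the line is unique exactly when they differ. The sketch below assumes perpendicular-distance fitting; the function name, tolerance, and sample points are illustrative assumptions, not from the text.

```python
import math


def origin_line_fit(points, tol=1e-12):
    """Return (unit direction, is_unique) for the best-fit line through the origin.

    Uses the uncentered matrix [[sxx, sxy], [sxy, syy]]. Its eigenvalues are
    half_trace + radius and half_trace - radius, so they coincide exactly
    when radius is zero, in which case every line through the origin ties.
    """
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    radius = math.hypot((sxx - syy) / 2, sxy)
    is_unique = radius > tol
    # Direction of the top eigenvector (arbitrary when is_unique is False).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), is_unique


# Hypothetical sample sets, not the ones in the exercise.
direction, unique = origin_line_fit([(3, 1), (1, 3)])
print(direction, unique)   # direction along (1, 1); unique is True
_, unique_tie = origin_line_fit([(1, 2), (2, -1)])
print(unique_tie)          # False: the two points are orthogonal with equal length
```

The absolute tolerance is a simplification; for data on a very different scale a relative comparison against the trace would be more robust.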

Exercise 3.5 Manually find the left and right-singular vectors, the singular values, and the SVD decomposition of the matrices in Figure 3.5.

Exercise 3.6 Consider the matrix

$$A = \begin{pmatrix} 1 & 2 \\ -1 & 2 \\ 1 & -2 \\ -1 & -2 \end{pmatrix}$$
