
ECE 174 Homework # 5 Solutions Spring 2010

FINAL EXAM: Thursday, June 10, 7-10pm, in regular classroom.

The final exam is closed notes and book. Bring only a pen (no pencils). I can ask vocabulary questions on the final exam and expect you to know technical terms used in the problem statements.[1] The best way to study for the exam is to understand concepts and study all of the homework solutions, lecture notes, projects, and lecture supplements.

[1] Thus, during the exam I will not answer questions asking for clarification of terms or concepts that you are expected to already know.

Solutions

1. Note that a singular value decomposition (SVD) is not fully unique. However, the singular values are unique. Additional unique quantities are the projection matrices onto the four fundamental subspaces and the pseudoinverse, all of which can be computed from knowledge of the SVD.

(a) We first determine that m = 3, n = 1, r = 1, ν = 0, and µ = 2. Once the dimensions of the various subspaces are known, we can systematically work the matrix A into the appropriate factorization,[2]

\[
A = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix} (\sqrt{2})\,(1)
  = U_1 S V_1^T
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
    \begin{pmatrix} \sqrt{2} \\ 0 \\ 0 \end{pmatrix} (1)
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S \\ 0 \end{pmatrix} V_1^T
  = U \Sigma V^T .
\]

[2] Note that these matrices have been hand-crafted to be amenable to this approach. Generally, numerical techniques must be used to obtain the SVD.

All the columns of U provide an orthonormal basis for Y = R^3. The column of U1 spans R(A) and the two columns of U2 provide an orthonormal basis for N(A^T). The single element of V^T = V1^T spans R(A^T) = R = X. The nullspace of A is trivial. With the SVD in hand, it is readily determined that σ1 = √2 > 0 and

\[
A^+ = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \end{pmatrix}; \qquad
P_{R(A)} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}; \qquad
P_{N(A^T)} = \frac{1}{2}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & 0 \\ -1 & 0 & 1 \end{pmatrix};
\]
\[
P_{R(A^T)} = 1 ; \qquad P_{N(A)} = 0 .
\]

Note that because A has full column rank we can also compute A^+ directly as $A^+ = (A^T A)^{-1} A^T$.
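For readers who want to check the numbers, here is a minimal NumPy sketch (not part of the original solution; variable names are arbitrary) that verifies the singular value, the pseudoinverse, and the projection matrices for the matrix of part (a):

```python
# Verification sketch for part (a), assuming NumPy is available.
import numpy as np

A = np.array([[1.0], [0.0], [1.0]])              # m = 3, n = 1, r = 1

U, s, Vt = np.linalg.svd(A)                      # full SVD: U is 3x3, Vt is 1x1
print(s)                                         # [1.4142...] -> sigma_1 = sqrt(2)

# Pseudoinverse two ways: from the SVD and from the full-column-rank formula.
A_pinv_svd = Vt.T @ np.diag(1.0 / s) @ U[:, :1].T
A_pinv_formula = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(A_pinv_svd, A_pinv_formula))   # True; both equal (1/2)[1 0 1]

# Projection matrices onto the four fundamental subspaces.
P_RA  = A @ A_pinv_formula                       # onto R(A)
P_NAt = np.eye(3) - P_RA                         # onto N(A^T)
P_RAt = A_pinv_formula @ A                       # onto R(A^T) (here the 1x1 identity)
P_NA  = np.eye(1) - P_RAt                        # onto N(A) (here 0)
```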

(b) We first determine that m = 3, n = 2, r = 2, ν = 0, and µ = 1. Again, once the dimensions of the various subspaces are known, we can systematically work the matrix A into the appropriate factorization,

\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
    \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  = U_1 S V_1^T
\]
\[
\phantom{A}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
    \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S \\ 0 \end{pmatrix} V_1^T
  = U \Sigma V^T .
\]

All the columns of U provide an orthonormal basis for Y = R^3. The two columns of U1 give an orthonormal basis for R(A) and the column of U2 spans N(A^T). Both rows of V^T = V1^T provide an orthonormal basis for R(A^T) = R^2 = X. The nullspace of A is trivial. It can now be readily determined that σ1 = √2 > σ2 = 1 > 0 and

\[
A^+ = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \end{pmatrix}; \qquad
P_{R(A)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix};
\]
\[
P_{R(A^T)} = I ; \qquad P_{N(A)} = 0 .
\]


Note that because A has full column rank we can also compute A^+ directly as $A^+ = (A^T A)^{-1} A^T$.
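A similar sketch for part (b), again assuming NumPy and intended only as a numerical check, confirms that the full-column-rank formula agrees with the SVD-based pseudoinverse and that the row-space projector is the identity:

```python
# Verification sketch for part (b), assuming NumPy is available.
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])                    # m = 3, n = 2, r = 2

print(np.linalg.svd(A, compute_uv=False))     # [sqrt(2), 1]

A_pinv = np.linalg.inv(A.T @ A) @ A.T         # (A^T A)^{-1} A^T
print(np.allclose(A_pinv, np.linalg.pinv(A))) # True: [[1/2, 0, 1/2], [0, 1, 0]]

P_RAt = A_pinv @ A                            # projection onto R(A^T) = R^2
print(np.allclose(P_RAt, np.eye(2)))          # True, consistent with P_{R(A^T)} = I
```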

(c) Here m = n = 3 while r = 2. For this case, the matrix A is rank-deficient (i.e., it is neither one-to-one nor onto), and hence its pseudoinverse cannot be computed from an analytic full-rank formula as was done in the previous cases. We also have ν = µ = 1. The matrix A factors as

\[
A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
    \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \end{pmatrix}
  = U_1 S V_1^T
\]
\[
\phantom{A}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
    \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}
  = U \Sigma V^T .
\]

All the columns of U provide an orthonormal basis for Y = R^3. The two columns of U1 give an orthonormal basis for R(A) and the column of U2 spans N(A^T). All the rows of V^T provide an orthonormal basis for X = R^3. The two rows of V1^T provide an orthonormal basis for R(A^T). The nullspace of A is spanned by the row of V2^T. It can now be readily determined that σ1 = 2 > σ2 = 1 > 0 and

\[
P_{R(A)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix};
\]
\[
P_{R(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.
\]


Note that because A is symmetric it just happens to be the case that P_{R(A)} = P_{R(A^T)} and P_{N(A)} = P_{N(A^T)};[3] however, this is not true for a general nonsymmetric matrix A.

[3] The special properties P_{R(A)} = P_{R(A*)} and P_{N(A)} = P_{N(A*)} hold for any self-adjoint matrix A, A = A*.
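The following NumPy sketch (a verification aid only, not part of the original solution) illustrates the remark: for the symmetric matrix of part (c) the range and row-space projectors coincide, as do the two nullspace projectors.

```python
# Verification sketch for part (c), assuming NumPy is available.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])               # symmetric, rank 2

A_pinv = np.linalg.pinv(A)                    # SVD-based pseudoinverse
P_RA  = A @ A_pinv                            # onto R(A)
P_RAt = A_pinv @ A                            # onto R(A^T)
P_NA  = np.eye(3) - P_RAt                     # onto N(A)
P_NAt = np.eye(3) - P_RA                      # onto N(A^T)

print(np.allclose(P_RA, P_RAt), np.allclose(P_NA, P_NAt))   # True True
```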

(d) Here m = 2, n = 3, and r = 1. Thus A is rank-deficient and non-square (and definitely nonsymmetric).

\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix}
    \begin{pmatrix} \sqrt{5} & \sqrt{5} & \sqrt{5} \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix} (\sqrt{5})
    \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix} (\sqrt{15})
    \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}
  = U_1 S V_1^T
\]
\[
\phantom{A}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}
    \begin{pmatrix} \sqrt{15} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{pmatrix}
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}
  = U \Sigma V^T .
\]

Both columns of U provide an orthonormal basis for Y = R^2. The column of U1 spans R(A) and the column of U2 spans N(A^T). All the rows of V^T provide an orthonormal basis for X = R^3. The row of V1^T spans R(A^T). The nullspace of A is spanned by the two rows of V2^T. It can now be readily determined that σ1 = √15 > 0 and

\[
A^+ = \frac{1}{15}\begin{pmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{pmatrix}; \qquad
P_{R(A)} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}; \qquad
P_{N(A^T)} = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix};
\]
\[
P_{R(A^T)} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}; \qquad
P_{N(A)} = \frac{1}{3}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.
\]
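Because A in part (d) is rank-deficient, $(A^T A)^{-1}$ does not exist and the pseudoinverse must come from the reduced SVD. The sketch below (NumPy assumed; a numerical check only) confirms the quantities above:

```python
# Verification sketch for the rank-deficient case of part (d), assuming NumPy.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])                # m = 2, n = 3, r = 1

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-12)                          # numerical rank = 1
print(s[0], np.sqrt(15))                       # sigma_1 = sqrt(15)

# Reduced-SVD pseudoinverse A^+ = V_1 S^{-1} U_1^T = (1/15) [[1,2],[1,2],[1,2]]
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True

P_RA = A @ A_pinv                              # (1/5) [[1,2],[2,4]]
P_NA = np.eye(3) - A_pinv @ A                  # (1/3) [[2,-1,-1],[-1,2,-1],[-1,-1,2]]
```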

2. The MLE, x_MLE, is determined as
\[
x_{\mathrm{MLE}} = \arg\min_x \,\{ -\ln p(y \,|\, x) \}
 = \arg\min_x \sum_{i=1}^{2} \frac{1}{\sigma_i^2}\,(y_i - x)^2
 = \arg\min_x \| y - Ax \|_W^2 ,
\]
where
\[
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
W = \begin{pmatrix} \frac{1}{\sigma_1^2} & 0 \\ 0 & \frac{1}{\sigma_2^2} \end{pmatrix} .
\]
We have that the adjoint operator is given by
\[
A^* = A^T W = \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{1}{\sigma_2^2} \end{pmatrix} ,
\]
so that
\[
A^* A = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} ,
\]
which is invertible (as we already knew, since A has full column rank). Continuing, we obtain
\[
(A^* A)^{-1} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
\]
and
\[
A^+ = (A^* A)^{-1} A^* = \begin{pmatrix} \dfrac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} & \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \end{pmatrix} .
\]
Thus,
\[
x_{\mathrm{MLE}} = A^+ y
 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, y_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, y_2
 = \alpha_1 y_1 + \alpha_2 y_2 ,
\]



where
\[
\alpha_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad
\alpha_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad
0 \le \alpha_i \le 1 \ \text{ for } i = 1, 2 ,
\]
and α1 + α2 = 1. This shows that the MLE is a so-called convex combination (i.e., a normalized weighted average) of the measurements y1 and y2.
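As a quick numerical illustration (NumPy assumed; the noise levels and measurement values below are made up, not from the problem), the convex-combination formula and the general weighted least-squares pseudoinverse give the same estimate:

```python
# Numerical sketch of the weighted-least-squares MLE derived above.
import numpy as np

sigma1, sigma2 = 1.0, 2.0                     # hypothetical noise standard deviations
y1, y2 = 3.0, 7.0                             # hypothetical measurements

# Closed form from the solution: a convex combination of y1 and y2.
a1 = sigma2**2 / (sigma1**2 + sigma2**2)
a2 = sigma1**2 / (sigma1**2 + sigma2**2)
x_mle_closed = a1 * y1 + a2 * y2

# Same answer from the weighted pseudoinverse (A^T W A)^{-1} A^T W y.
A = np.array([[1.0], [1.0]])
W = np.diag([1.0 / sigma1**2, 1.0 / sigma2**2])
x_mle_wls = (np.linalg.inv(A.T @ W @ A) @ A.T @ W @ np.array([y1, y2])).item()

print(x_mle_closed, x_mle_wls)                # both 3.8; the weights sum to 1
```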

(a) In the limit that σ1 → 0, we see from the general solution derived above that x_MLE → y1. This makes sense, as σ1 → 0 corresponds to y1 becoming a perfect, non-noisy measurement of the unknown quantity x. In the limit that σ1 → ∞, the general solution shows that x_MLE → y2. This makes sense, as σ1 → ∞ corresponds to the first measurement becoming so noisy that it is worthless relative to the second measurement.

(b) In the case that σ1 = σ2 = σ we have
\[
x_{\mathrm{MLE}} = \arg\min_x \| y - Ax \|_W^2
 = \arg\min_x \frac{1}{\sigma^2}\,\| y - Ax \|^2
 = \arg\min_x \| y - Ax \|^2 ,
\]
the last expression being an unweighted least-squares problem. Using the condition σ1 = σ2 = σ, the general solution derived above becomes
\[
x_{\mathrm{MLE}} = \tfrac{1}{2}\,(y_1 + y_2) .
\]

This solution makes sense because the condition σ1 = σ2 = σ means that we have no rational reason to prefer one measurement over the other. The errors in both measurements are additive, independent, zero-mean, and identically (symmetrically) distributed, and this symmetric situation yields a linear estimator which is the symmetric sample-mean solution.[4]

[4] In advanced courses we would say that "the solution to the optimal linear estimator is obvious from symmetry."

3. Note that $E\{y_i\} = \alpha t_i$.

(a) The least-squares problem to be solved is
\[
\min_{\alpha} \| y - A\alpha \|^2 ,
\]
where
\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\quad \text{and} \quad
A = \begin{pmatrix} t_1 \\ \vdots \\ t_m \end{pmatrix} .
\]
The least-squares solution is
\[
\hat{\alpha}(m) = A^+ y = (A^T A)^{-1} A^T y
 = \frac{\sum_{i=1}^{m} t_i y_i}{\sum_{i=1}^{m} t_i^2} .
\]


(b)
\[
E\{\hat{\alpha}(m)\}
 = \frac{\sum_{i=1}^{m} t_i\, E\{y_i\}}{\sum_{i=1}^{m} t_i^2}
 = \alpha\, \frac{\sum_{i=1}^{m} t_i^2}{\sum_{i=1}^{m} t_i^2}
 = \alpha .
\]

(c)
\[
\operatorname{Var}\{\hat{\alpha}(m)\} = \sigma^2 (A^T A)^{-1}
 = \frac{\sigma^2}{\sum_{i=1}^{m} t_i^2}
 \;\to\; 0 \quad \text{as } m \to \infty .
\]
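A short Monte-Carlo sketch (NumPy assumed; the true slope, noise level, and sample times are illustrative choices, not values from the problem) shows the estimate concentrating around α as m grows, with the predicted variance σ²/Σ tᵢ²:

```python
# Monte-Carlo sketch of problem 3: the least-squares slope estimate is
# unbiased and its variance sigma^2 / sum(t_i^2) shrinks as m grows.
import numpy as np

rng = np.random.default_rng(0)
alpha_true, sigma = 2.5, 1.0                  # hypothetical true slope and noise level

for m in (10, 100, 1000):
    t = np.arange(1, m + 1, dtype=float)      # sample times t_1, ..., t_m
    y = alpha_true * t + sigma * rng.standard_normal(m)
    alpha_hat = np.sum(t * y) / np.sum(t**2)  # (A^T A)^{-1} A^T y with A = t
    var_pred = sigma**2 / np.sum(t**2)        # predicted Var{alpha_hat}
    print(m, alpha_hat, var_pred)             # estimate -> 2.5, variance -> 0
```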

4. (a) Note that for A one-to-one, $A^+ A = (A^T A)^{-1} A^T A = I$. Let e be the vector of all ones, $e = (1 \cdots 1)^T$. Note that
\[
E\{n\} = b \quad \text{where} \quad b = \beta e .
\]
We have that
\[
\tilde{x} = \hat{x} - x = A^+ y - x = A^+ (Ax + n) - x = x + A^+ n - x = A^+ n .
\]
Thus
\[
E\{\tilde{x}\} = A^+ E\{n\} = A^+ b \ne 0 ,
\]
so the estimate $\hat{x}$ is biased.

(b) We have
\[
y = Ax + n = Ax + b + (n - b) = Ax + b + \nu ,
\]
where ν = n − b = n − βe has zero mean. Continuing,
\[
y = \begin{pmatrix} A & e \end{pmatrix} \begin{pmatrix} x \\ \beta \end{pmatrix} + \nu
  = \bar{A} \bar{x} + \nu ,
\]
where
\[
\bar{A} = \begin{pmatrix} A & e \end{pmatrix}
\quad \text{and} \quad
\bar{x} = \begin{pmatrix} x \\ \beta \end{pmatrix} .
\]
Assuming that $\bar{A}$ is one-to-one, we have
\[
\hat{\bar{x}} = \bar{A}^+ y = (\bar{A}^T \bar{A})^{-1} \bar{A}^T y
\]
and
\[
\tilde{\bar{x}} = \bar{A}^+ \nu .
\]
Since ν has zero mean, $\hat{\bar{x}}$ is unbiased.
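The effect can be seen numerically. In the sketch below (NumPy assumed; the dimensions, true parameters, and noise level are made up for illustration), the plain least-squares estimate of x is biased by approximately A⁺b, while the estimate from the augmented model [A e] is essentially unbiased:

```python
# Sketch of problem 4: an unknown constant offset beta biases plain least
# squares; augmenting A with the all-ones column e removes the bias.
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 2
A = rng.standard_normal((m, n))               # hypothetical one-to-one A
x_true, beta = np.array([1.0, -2.0]), 3.0     # hypothetical parameters
trials = 2000

err_plain = np.zeros(n)
err_aug = np.zeros(n)
for _ in range(trials):
    y = A @ x_true + beta + 0.1 * rng.standard_normal(m)   # n = beta*e + nu
    err_plain += np.linalg.lstsq(A, y, rcond=None)[0] - x_true
    A_bar = np.hstack([A, np.ones((m, 1))])                 # A_bar = [A  e]
    err_aug += np.linalg.lstsq(A_bar, y, rcond=None)[0][:n] - x_true

print(err_plain / trials)   # average error approx A^+ b, clearly nonzero
print(err_aug / trials)     # average error approx 0: augmented estimate unbiased
```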

5. Vocabulary. The relevant definitions can be found in your class lecture notes and/or in the lecture supplement handouts.



6. Scalar Generalized Gradient Descent Algorithms.

Note that in the scalar case the loss function is
\[
\ell(x) = \tfrac{1}{2}\,(y - h(x))^2 .
\]
Also, the gradient of ℓ(x) is just the derivative of ℓ(x) with respect to x,
\[
\ell'(x) = \nabla \ell(x) = \frac{d}{dx}\,\ell(x) = -h'(x)\,(y - h(x)) ,
\]
where $h'(x) = \frac{d}{dx} h(x)$. Note also that the second derivative of the loss function is
\[
\ell''(x) = h'^2(x) - h''(x)\,(y - h(x)) .
\]

(a) Gradient Descent Method.
\[
x_{k+1} = x_k - \alpha_k\, \ell'(x_k) = x_k + \alpha_k\, h'(x_k)\,(y - h(x_k)) ,
\]
where the step size αk > 0 is used for convergence control. It is evident that Q_k = 1 for all k.

(b) Gauss' Method (Gauss-Newton Method).

To find a correction to the current estimate $\hat{x}$, we linearize the nonlinear inverse problem y = h(x) about the point $\hat{x}$,
\[
y \approx h(\hat{x}) + h'(\hat{x})\,(x - \hat{x}) = h(\hat{x}) + h'(\hat{x})\,\Delta x ,
\]
where $\Delta x = x - \hat{x}$, and find a solution which is optimal in the least-squares sense for this linearized problem. Note that the loss function for this linearized problem is
\[
\ell_{\mathrm{gauss}}(x) = \tfrac{1}{2}\,\bigl(y - [\,h(\hat{x}) + h'(\hat{x})\,(x - \hat{x})\,]\bigr)^2
 = \tfrac{1}{2}\,\bigl(\Delta y - h'(\hat{x})\,\Delta x\bigr)^2
 = \ell_{\mathrm{gauss}}(\Delta x) ,
\]
where $\Delta y = y - h(\hat{x})$. Also note that because
\[
x = \hat{x} + \Delta x ,
\]
where $\hat{x}$ is given and fixed, it is evident that optimizing over x is equivalent to optimizing over Δx. An equivalent problem, then, is to find the correction Δx which is optimal in the least-squares sense by minimizing $\ell_{\mathrm{gauss}}(\Delta x)$ with respect to Δx.

The solution is
\[
\Delta x = \frac{1}{h'(\hat{x})}\,\Delta y = \frac{1}{h'(\hat{x})}\,\bigl(y - h(\hat{x})\bigr) ,
\]
assuming that $h'(\hat{x}) \ne 0$. Incorporating a step size α > 0 for convergence control yields
\[
\Delta x = \alpha(\hat{x})\,\frac{1}{h'(\hat{x})}\,\bigl(y - h(\hat{x})\bigr)
 = \alpha(\hat{x})\,Q(\hat{x})\,h'(\hat{x})\,\bigl(y - h(\hat{x})\bigr) ,
\quad \text{where} \quad
Q(\hat{x}) = \frac{1}{h'^2(\hat{x})} .
\]
This yields the iterative algorithm
\[
x_{k+1} = x_k + \alpha_k\,\frac{1}{h'(x_k)}\,\bigl(y - h(x_k)\bigr)
 = x_k + \alpha_k\,Q_k\,h'(x_k)\,\bigl(y - h(x_k)\bigr) ,
\quad \text{where} \quad
Q_k = \frac{1}{h'^2(x_k)} .
\]

(c) Newton's Method.

To find a correction to the current estimate $\hat{x}$, we expand the loss function ℓ(x) about $\hat{x}$ to second order and then minimize this quadratic approximation to ℓ(x). With $\Delta x = x - \hat{x}$, the quadratic approximation is
\[
\ell(x) \approx \ell_{\mathrm{quad}}(x)
 = \ell(\hat{x}) + \ell'(\hat{x})\,\Delta x + \tfrac{1}{2}\,\ell''(\hat{x})\,(\Delta x)^2
 = \ell_{\mathrm{quad}}(\Delta x) .
\]
Note that $\ell_{\mathrm{quad}}(x) = \ell_{\mathrm{quad}}(\Delta x)$ can be equivalently minimized either with respect to x or with respect to Δx. If we take the derivative of $\ell_{\mathrm{quad}}(\Delta x)$ with respect to Δx and set it equal to zero, we get
\[
\Delta x = -\frac{\ell'(\hat{x})}{\ell''(\hat{x})}
 = \frac{h'(\hat{x})\,\bigl(y - h(\hat{x})\bigr)}{h'^2(\hat{x}) - h''(\hat{x})\,\bigl(y - h(\hat{x})\bigr)} . \tag{1}
\]
If we multiply the correction Δx by a step size α > 0 in order to control convergence, we obtain the iterative algorithm
\[
x_{k+1} = x_k + \alpha_k\,Q_k\,h'(x_k)\,\bigl(y - h(x_k)\bigr) ,
\quad \text{where} \quad
Q_k = \frac{1}{h'^2(x_k) - h''(x_k)\,\bigl(y - h(x_k)\bigr)} .
\]

Comparing the values of Q_k for the Gauss method and the Newton method, we see that when y − h(x_k) ≈ 0 the two methods become essentially equivalent.[5] Also note that when h'' ≡ 0 the two methods become exactly equivalent.

[5] This fact also holds for the more general vector case. For example, because this latter situation holds for the GPS computer assignment, the Gauss-Newton method used in that assignment is essentially equivalent to the Newton method and therefore has the very fast convergence rate associated with the Newton method.



7. Simple Nonlinear Inverse Problem Example.

Let y denote the known number for which you wish to determine the cube root. Let x denote the unknown cube root of y. We need a relationship y = h(x), which is obviously given by h(x) = x^3. This relationship defines an inverse problem that we wish to solve. (I.e., given y = h(x) = x^3, determine x.) The relevant derivatives needed to implement the descent algorithms are
\[
h'(x) = 3x^2 \quad \text{and} \quad h''(x) = 6x .
\]
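As a concrete illustration, the short sketch below (plain Python; the target value y = 27, the starting point, the step size, and the iteration count are made-up choices, not part of the assignment) runs all three iterations from problem 6 on this cube-root problem:

```python
# Running the three scalar iterations from problem 6 on h(x) = x^3.
y = 27.0                      # seek x with x^3 = 27, i.e. x = 3
h   = lambda x: x**3
dh  = lambda x: 3 * x**2      # h'(x)
d2h = lambda x: 6 * x         # h''(x)

def iterate(update, x0=4.0, steps=30):
    x = x0
    for _ in range(steps):
        x = update(x)
    return x

# (a) Gradient descent: Q_k = 1; a small fixed step size keeps the iteration stable.
gd = iterate(lambda x: x + 1e-3 * dh(x) * (y - h(x)))

# (b) Gauss: Q_k = 1 / h'(x_k)^2, so x_{k+1} = x_k + (y - h(x_k)) / h'(x_k).
gauss = iterate(lambda x: x + (y - h(x)) / dh(x))

# (c) Newton: Q_k = 1 / (h'(x_k)^2 - h''(x_k) (y - h(x_k))).
newton = iterate(lambda x: x + dh(x) * (y - h(x))
                             / (dh(x)**2 - d2h(x) * (y - h(x))))

print(gd, gauss, newton)      # all three approach 3.0; (b) and (c) converge fastest
```

Note that for this particular h the Gauss step x + (y − x³)/(3x²) coincides with the classical Newton root-finding update for x³ − y = 0, which is consistent with the near-equivalence of the Gauss and Newton methods discussed in problem 6.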
