ECE 174 Homework # 5 Solutions Spring 2010 (UCSD DSP Lab)
FINAL EXAM: Thursday, June 10, 7-10pm, in regular classroom.

The final exam is closed notes and book. Bring only a pen (no pencils). I can ask vocabulary questions on the final exam and expect you to know technical terms used in the problem statements.^1 The best way to study for the exam is to understand concepts and study all of the homework solutions, lecture notes, projects, and lecture supplements.

Solutions
1. Note that a singular value decomposition (SVD) is not fully unique. However, the singular values are unique. Additional unique quantities are the projection matrices onto the four fundamental subspaces and the pseudoinverse, all of which can be computed from knowledge of the SVD.
(a) We first determine that m = 3, n = 1, r = 1, ν = 0, and µ = 2. Once the dimensions of the various subspaces are known, we can systematically work the matrix A into the appropriate factorization,^2
$$A = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix} (\sqrt{2})(1) = U_1 S V_1^T
= \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} \sqrt{2} \\ 0 \\ 0 \end{pmatrix} (1)
= \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} S \\ 0 \end{pmatrix} V_1^T
= U \Sigma V^T .$$
All the columns of U provide an orthonormal basis for Y = R^3. The column of U_1 spans R(A) and the two columns of U_2 provide an orthonormal basis for N(A^T). The single element of V^T = V_1^T spans R(A^T) = R^1 = X. The nullspace of A is trivial.

^1 Thus during the exam I will not answer questions asking for clarification of terms or concepts that you are expected to already know.

^2 Note that these matrices have been hand crafted to be amenable to this approach. Generally numerical techniques must be used to obtain the SVD.

With the SVD in hand, it is readily determined that σ1 = √2 > 0 and
$$A^+ = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \end{pmatrix}; \qquad
P_{R(A)} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}; \qquad
P_{N(A^T)} = \frac{1}{2}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & 0 \\ -1 & 0 & 1 \end{pmatrix};$$
$$P_{R(A^T)} = 1 ; \qquad P_{N(A)} = 0 .$$
Note that because A has full column rank we can also compute A^+ directly as A^+ = (A^T A)^{-1} A^T.
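As footnote 2 notes, the SVD is generally obtained numerically. The hand computations of part (a) can be cross-checked with a few lines of NumPy (an illustrative tool choice, not part of the assignment):

```python
import numpy as np

# The matrix from part (a): A = (1, 0, 1)^T, so m = 3, n = 1, r = 1.
A = np.array([[1.0], [0.0], [1.0]])

# numpy returns the full U (3x3), the singular values, and V^T (1x1).
U, s, Vt = np.linalg.svd(A)
print(s)                       # single singular value sigma_1 = sqrt(2)

# Pseudoinverse: should equal (1/2)(1 0 1).
A_pinv = np.linalg.pinv(A)
print(A_pinv)

# Projector onto R(A) is A A^+; projector onto N(A^T) is its complement.
P_RA = A @ A_pinv
P_NAT = np.eye(3) - P_RA
print(np.allclose(P_RA, 0.5 * np.array([[1, 0, 1], [0, 0, 0], [1, 0, 1]])))    # True
print(np.allclose(P_NAT, 0.5 * np.array([[1, 0, -1], [0, 2, 0], [-1, 0, 1]]))) # True
```

The same check applies verbatim to parts (b)-(d) by swapping in the corresponding matrix.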
(b) We first determine that m = 3, n = 2, r = 2, ν = 0, and µ = 1. Again, once the dimensions of the various subspaces are known, we can systematically work the matrix A into the appropriate factorization,
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
\begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = U_1 S V_1^T$$
$$= \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} S \\ 0 \end{pmatrix} V_1^T
= U \Sigma V^T .$$
All the columns of U provide an orthonormal basis for Y = R^3. The two columns of U_1 give an orthonormal basis for R(A) and the column of U_2 spans N(A^T). Both rows of V^T = V_1^T provide an orthonormal basis for R(A^T) = R^2 = X. The nullspace of A is trivial. It can now be readily determined that σ1 = √2 > σ2 = 1 > 0 and
$$A^+ = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \end{pmatrix}; \qquad
P_{R(A)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix};$$
$$P_{R(A^T)} = I ; \qquad P_{N(A)} = 0 .$$
Note that because A has full column rank we can also compute A^+ directly as A^+ = (A^T A)^{-1} A^T.
(c) Here m = n = 3 while r = 2. For this case the matrix A is rank-deficient (i.e., it is neither one-to-one nor onto), and hence A^+ cannot be computed using an analytic formula as can be done for the full-rank cases. We also have ν = µ = 1. The matrix A factors as
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
\begin{pmatrix} \sqrt{2} & 0 & \sqrt{2} \\ 0 & 1 & 0 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \end{pmatrix}
= U_1 S V_1^T$$
$$= \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \end{pmatrix}
= \begin{pmatrix} U_1 & U_2 \end{pmatrix}
\begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}
= U \Sigma V^T .$$
All the columns of U provide an orthonormal basis for Y = R^3. The two columns of U_1 give an orthonormal basis for R(A) and the column of U_2 spans N(A^T). All the rows of V^T provide an orthonormal basis for X = R^3. The two rows of V_1^T provide an orthonormal basis for R(A^T). The nullspace of A is spanned by the row of V_2^T. It can now be readily determined that σ1 = 2 > σ2 = 1 > 0 and
$$P_{R(A)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix};$$
$$P_{R(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.$$
Note that because A is symmetric it just happens to be the case that P_{R(A)} = P_{R(A^T)} and P_{N(A)} = P_{N(A^T)};^3 however, this is not true for a general nonsymmetric matrix A.
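The claimed coincidence of the projectors is easy to check numerically. The following sketch (using NumPy, an assumption on available tooling) verifies it for the symmetric A of part (c) and contrasts it with the nonsymmetric A of part (d):

```python
import numpy as np

# The symmetric matrix from part (c).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

A_pinv = np.linalg.pinv(A)
P_RA  = A @ A_pinv          # projector onto R(A)
P_RAT = A_pinv @ A          # projector onto R(A^T)

# For this symmetric A the two projectors coincide.
print(np.allclose(P_RA, P_RAT))                 # True

# For the nonsymmetric matrix of part (d) they cannot coincide:
# they do not even act on the same space.
B = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])
B_pinv = np.linalg.pinv(B)
print((B @ B_pinv).shape, (B_pinv @ B).shape)   # (2, 2) (3, 3)
```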
(d) Here m = 2, n = 3, and r = 1. Thus A is rank-deficient and non-square (and definitely nonsymmetric).
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix}
\begin{pmatrix} \sqrt{5} & \sqrt{5} & \sqrt{5} \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix} (\sqrt{5}) \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix} (\sqrt{15}) \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}
= U_1 S V_1^T$$
$$= \begin{pmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}
\begin{pmatrix} \sqrt{15} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{pmatrix}
= U \Sigma V^T .$$
Both columns of U provide an orthonormal basis for Y = R^2. The column of U_1 spans R(A) and the column of U_2 spans N(A^T). All the rows of V^T provide an orthonormal basis for X = R^3. The row of V_1^T spans R(A^T). The nullspace of A is spanned by the two rows of V_2^T. It can now be readily determined that
^3 The special properties P_{R(A)} = P_{R(A*)} and P_{N(A)} = P_{N(A*)} hold for any self-adjoint matrix A, A = A*.
σ1 = √15 > 0 and
$$A^+ = \frac{1}{15}\begin{pmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{pmatrix}; \qquad
P_{R(A)} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}; \qquad
P_{N(A^T)} = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix};$$
$$P_{R(A^T)} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}; \qquad
P_{N(A)} = \frac{1}{3}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.$$

2. The MLE, x_MLE, is determined as
$$x_{\rm MLE} = \arg\min_x \{-\ln p(y \,|\, x)\} = \arg\min_x \sum_{i=1}^{2} \frac{1}{\sigma_i^2}(y_i - x)^2 = \arg\min_x \|y - Ax\|_W^2 ,$$
where
$$y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
W = \begin{pmatrix} \frac{1}{\sigma_1^2} & 0 \\ 0 & \frac{1}{\sigma_2^2} \end{pmatrix}.$$
We have that the adjoint operator is given by
$$A^* = A^T W = \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{1}{\sigma_2^2} \end{pmatrix},$$
so that
$$A^* A = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} ,$$
which is invertible (as we already knew, since A has full column rank). Continuing, we obtain
$$(A^* A)^{-1} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}$$
and
$$A^+ = (A^* A)^{-1} A^* = \begin{pmatrix} \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} & \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \end{pmatrix}.$$
Thus,
$$x_{\rm MLE} = A^+ y = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, y_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, y_2 = \alpha_1 y_1 + \alpha_2 y_2 ,$$
where
$$0 \le \alpha_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} \le 1 , \qquad
0 \le \alpha_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \le 1 ,$$
and α1 + α2 = 1. This shows that the MLE is a so-called convex combination (i.e., a normalized weighted average) of the measurements y1 and y2.
(a) In the limit σ1 → 0, we see from the general solution derived above that x_MLE → y1. This makes sense, as σ1 → 0 corresponds to y1 becoming a perfect, non-noisy measurement of the unknown quantity x. In the limit σ1 → ∞, the general solution shows that x_MLE → y2. This makes sense, as σ1 → ∞ corresponds to the first measurement becoming so noisy that it is worthless relative to the second measurement.
(b) In the case that σ1 = σ2 = σ we have
$$x_{\rm MLE} = \arg\min_x \|y - Ax\|_W^2 = \arg\min_x \frac{1}{\sigma^2}\|y - Ax\|^2 = \arg\min_x \|y - Ax\|^2 ,$$
the last expression being an unweighted least-squares problem. Using the condition σ1 = σ2 = σ, the general solution derived above becomes
$$x_{\rm MLE} = \frac{1}{2}(y_1 + y_2) .$$
This solution makes sense because the condition σ1 = σ2 = σ means that we have no rational reason to prefer one measurement over the other. The errors in both measurements are additive, independent, zero-mean, and identically distributed, and this symmetric condition yields a linear estimator which is the symmetric sample-mean solution.^4
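The limiting behavior in (a) and the equal-variance case in (b) can both be sanity-checked numerically. In this sketch the helper name x_mle is ours, not from the assignment:

```python
def x_mle(y1, y2, s1, s2):
    """Weighted least-squares MLE of x from y_i = x + n_i, n_i ~ N(0, s_i^2)."""
    a1 = s2**2 / (s1**2 + s2**2)   # weight on y1
    a2 = s1**2 / (s1**2 + s2**2)   # weight on y2; a1 + a2 = 1
    return a1 * y1 + a2 * y2

# Equal variances: the MLE reduces to the sample mean.
print(x_mle(2.0, 4.0, 1.0, 1.0))    # 3.0

# A nearly noiseless first measurement dominates (sigma_1 -> 0 gives y1).
print(x_mle(2.0, 4.0, 1e-6, 1.0))   # approximately 2.0

# A very noisy first measurement is ignored (sigma_1 -> inf gives y2).
print(x_mle(2.0, 4.0, 1e6, 1.0))    # approximately 4.0
```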
3. Note that E{y_i} = α t_i.

(a) The least-squares problem to be solved is
$$\min_\alpha \|y - A\alpha\|^2 ,$$
where
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\quad \text{and} \quad
A = \begin{pmatrix} t_1 \\ \vdots \\ t_m \end{pmatrix}.$$
The least-squares solution is
$$\alpha(m) = A^+ y = (A^T A)^{-1} A^T y = \frac{\sum_{i=1}^m t_i y_i}{\sum_{i=1}^m t_i^2} .$$
^4 In advanced courses we would say that "the solution to the optimal linear estimator is obvious from symmetry."
(b)
$$E\{\alpha(m)\} = \frac{\sum_{i=1}^m t_i E\{y_i\}}{\sum_{i=1}^m t_i^2} = \alpha\, \frac{\sum_{i=1}^m t_i^2}{\sum_{i=1}^m t_i^2} = \alpha .$$
(c)
$$\mathrm{Var}\{\alpha(m)\} = \sigma^2 (A^T A)^{-1} = \frac{\sigma^2}{\sum_{i=1}^m t_i^2} \to 0 \ \text{as} \ m \to \infty .$$
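These two facts, unbiasedness and vanishing variance, can be illustrated with a small simulation. The times t_i = i and all parameter values below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma = 2.5, 1.0
t = np.arange(1, 201, dtype=float)      # measurement times t_1, ..., t_m

def alpha_hat(t, y):
    """Closed-form least-squares estimate alpha(m) = sum(t_i y_i) / sum(t_i^2)."""
    return np.sum(t * y) / np.sum(t * t)

# Noise-free data recovers alpha exactly (unbiasedness, deterministic check).
print(alpha_hat(t, alpha * t))

# The variance sigma^2 / sum(t_i^2) shrinks as m grows.
for m in (10, 50, 200):
    print(m, sigma**2 / np.sum(t[:m] ** 2))

# One noisy realization: the estimate lands close to the true alpha.
y = alpha * t + rng.normal(0.0, sigma, size=t.size)
print(alpha_hat(t, y))
```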
4. (a) Note that for A one-to-one, A^+ A = (A^T A)^{-1} A^T A = I. Let e be the vector of all ones, e = (1 · · · 1)^T. Note that
$$E\{n\} = b \quad \text{where} \quad b = \beta e .$$
We have that
$$\widetilde{x} = \widehat{x} - x = A^+ y - x = A^+(Ax + n) - x = x + A^+ n - x = A^+ n .$$
Thus
$$E\{\widetilde{x}\} = A^+ E\{n\} = A^+ b \ne 0 .$$
(b) We have
$$y = Ax + n = Ax + b + (n - b) = Ax + b + \nu$$
where ν = n − b = n − βe has zero mean. Continuing,
$$y = \begin{pmatrix} A & e \end{pmatrix} \begin{pmatrix} x \\ \beta \end{pmatrix} + \nu = \bar{A}\bar{x} + \nu ,$$
where
$$\bar{A} = \begin{pmatrix} A & e \end{pmatrix}
\quad \text{and} \quad
\bar{x} = \begin{pmatrix} x \\ \beta \end{pmatrix}.$$
Assuming that \bar{A} is one-to-one, we have
$$\widehat{\bar{x}} = \bar{A}^+ y = (\bar{A}^T \bar{A})^{-1} \bar{A}^T y$$
and
$$\widetilde{\bar{x}} = \bar{A}^+ \nu .$$
Since ν has zero mean, \widehat{\bar{x}} is unbiased.
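A small simulation illustrates the point: the naive estimate inherits the bias A^+ b, while the augmented model recovers both x and β. The dimensions and parameter values below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
m, beta = 200, 3.0
A = rng.normal(size=(m, 2))
x_true = np.array([1.0, -2.0])

# Noise with an unknown constant bias beta: n = beta*e + nu, E{nu} = 0.
noise = beta + rng.normal(0.0, 0.1, size=m)
y = A @ x_true + noise

# Naive estimate: biased, since E{x_naive - x_true} = A^+ (beta e) is
# generally nonzero.
x_naive = np.linalg.pinv(A) @ y

# Augmented model y = [A e][x; beta] + nu, solved by least squares.
Abar = np.column_stack([A, np.ones(m)])
xbar = np.linalg.pinv(Abar) @ y
print(x_naive)   # may be visibly off from [1, -2]
print(xbar)      # first two entries near [1, -2], last entry near beta = 3
```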
5. Vocabulary. The relevant definitions can be found in your class lecture notes and/or in the lecture supplement handouts.
6. Scalar Generalized Gradient Descent Algorithms.

Note that in the scalar case the loss function is
$$\ell(x) = \frac{1}{2}(y - h(x))^2 .$$
Also, the gradient of ℓ(x) is just the derivative of ℓ(x) with respect to x,
$$\ell'(x) = \nabla \ell(x) = \frac{d}{dx}\ell(x) = -h'(x)\,(y - h(x)) ,$$
where h'(x) = (d/dx) h(x). Note also that the second derivative of the loss function is
$$\ell''(x) = h'^2(x) - h''(x)\,(y - h(x)) .$$

(a) Gradient Descent Method.
$$x_{k+1} = x_k - \alpha_k \ell'(x_k) = x_k + \alpha_k h'(x_k)\,(y - h(x_k)) ,$$
where the step size α_k > 0 is used for convergence control. It is evident that Q_k = 1 for all k.
(b) Gauss' Method (Gauss-Newton Method).

To find a correction to the current estimate \widehat{x}, we linearize the nonlinear inverse problem y = h(x) about the point \widehat{x},
$$y \approx h(\widehat{x}) + h'(\widehat{x})(x - \widehat{x}) = h(\widehat{x}) + h'(\widehat{x})\Delta x ,$$
where Δx = x − \widehat{x}, and find a solution which is optimal in the least-squares sense for this linearized problem. Note that the loss function for this linearized problem is
$$\ell_{\rm gauss}(x) = \frac{1}{2}\big(y - [h(\widehat{x}) + h'(\widehat{x})(x - \widehat{x})]\big)^2 = \frac{1}{2}\big(\Delta y - h'(\widehat{x})\Delta x\big)^2 = \ell_{\rm gauss}(\Delta x) ,$$
where Δy = y − h(\widehat{x}). Also note that because
$$x = \widehat{x} + \Delta x ,$$
where \widehat{x} is given and fixed, it is evident that optimizing over x is equivalent to optimizing over Δx. An equivalent problem, then, is to find the correction Δx which is optimal in the least-squares sense by minimizing ℓ_gauss(Δx) with respect to Δx.

The solution is
$$\Delta x = \frac{1}{h'(\widehat{x})}\,\Delta y = \frac{1}{h'(\widehat{x})}\,(y - h(\widehat{x})) ,$$
assuming that h'(\widehat{x}) ≠ 0. Incorporating a step size α > 0 for convergence control yields
$$\Delta x = \alpha(\widehat{x})\,\frac{1}{h'(\widehat{x})}\,(y - h(\widehat{x})) = \alpha(\widehat{x})\, Q(\widehat{x})\, h'(\widehat{x})\,(y - h(\widehat{x})) ,$$
where
$$Q(\widehat{x}) = \frac{1}{h'^2(\widehat{x})} .$$
This yields the iterative algorithm
$$x_{k+1} = x_k + \alpha_k\,\frac{1}{h'(x_k)}\,(y - h(x_k)) = x_k + \alpha_k Q_k h'(x_k)\,(y - h(x_k)) ,$$
where
$$Q_k = \frac{1}{h'^2(x_k)} .$$

(c) Newton's Method.
To find a correction to the current estimate \widehat{x}, we expand the loss function ℓ(x) about \widehat{x} to second order and then minimize this quadratic approximation to ℓ(x). With Δx = x − \widehat{x}, the quadratic approximation is
$$\ell(x) \approx \ell_{\rm quad}(x) = \ell(\widehat{x}) + \ell'(\widehat{x})\Delta x + \frac{1}{2}\ell''(\widehat{x})(\Delta x)^2 = \ell_{\rm quad}(\Delta x) .$$
Note that ℓ_quad(x) = ℓ_quad(Δx) can be equivalently minimized either with respect to x or with respect to Δx. If we take the derivative of ℓ_quad(Δx) with respect to Δx and set it equal to zero we get
$$\Delta x = -\frac{\ell'(\widehat{x})}{\ell''(\widehat{x})} = \frac{h'(\widehat{x})\,(y - h(\widehat{x}))}{h'^2(\widehat{x}) - h''(\widehat{x})\,(y - h(\widehat{x}))} . \tag{1}$$
If we multiply the correction Δx by a step size α > 0 in order to control convergence, we obtain the iterative algorithm
$$x_{k+1} = x_k + \alpha_k Q_k h'(x_k)\,(y - h(x_k)) ,$$
where
$$Q_k = \frac{1}{h'^2(x_k) - h''(x_k)\,(y - h(x_k))} .$$
Comparing the values of Q_k for the Gauss method and the Newton method, we see that when y − h(x_k) ≈ 0 the two methods become essentially equivalent.^5 Also note that when h'' ≡ 0 the two methods become exactly equivalent.

^5 This fact also holds for the more general vector case. For example, because this latter situation holds for the GPS computer assignment, the Gauss-Newton method used in that assignment is essentially equivalent to the Newton method and therefore has the very fast convergence rate associated with the Newton method.
7. Simple Nonlinear Inverse Problem Example.

Let y denote the known number for which you wish to determine the cube root. Let x denote the unknown cube root of y. We need a relationship y = h(x), which is obviously given by h(x) = x^3. This relationship defines an inverse problem that we wish to solve. (I.e., given y = h(x) = x^3, determine x.) The relevant derivatives needed to implement the descent algorithms are
$$h'(x) = 3x^2 \quad \text{and} \quad h''(x) = 6x .$$
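The three scalar algorithms of Problem 6 can be tried directly on this cube-root problem. In the sketch below, the starting point x0 = 2.5 and the gradient step size α = 0.002 are ad hoc choices that happen to give convergence for y = 27:

```python
# Cube-root example: solve y = h(x) = x^3 using the generic update
# x <- x + alpha * Q(x) * h'(x) * (y - h(x)) with the three Q choices.
y = 27.0                        # true answer: x = 3
h   = lambda x: x**3
dh  = lambda x: 3 * x**2        # h'(x)  = 3x^2
d2h = lambda x: 6 * x           # h''(x) = 6x

def iterate(q, x0=2.5, steps=200, alpha=1.0):
    """Run the generalized gradient descent update with weighting Q(x)."""
    x = x0
    for _ in range(steps):
        x = x + alpha * q(x) * dh(x) * (y - h(x))
    return x

grad   = iterate(lambda x: 1.0, alpha=0.002)                          # Q_k = 1
gauss  = iterate(lambda x: 1.0 / dh(x)**2)                            # Gauss
newton = iterate(lambda x: 1.0 / (dh(x)**2 - d2h(x) * (y - h(x))))    # Newton
print(grad, gauss, newton)      # all three approach 3.0
```

Note that gradient descent needs a small α to converge at all, while the Gauss and Newton iterations take full steps; with a small residual y − h(x_k), the Gauss and Newton updates are nearly identical, as observed in Problem 6.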