
ECE 174 Homework # 5 Solutions Spring 2010

FINAL EXAM: Thursday, June 10, 7-10pm, in regular classroom.

The final exam is closed notes and book. Bring only a pen (no pencils). I can ask vocabulary questions on the final exam and expect you to know technical terms used in the problem statements.[1] The best way to study for the exam is to understand concepts and study all of the homework solutions, lecture notes, projects, and lecture supplements.

[1] Thus, during the exam I will not answer questions asking for clarification of terms or concepts that you are expected to already know.

Solutions

1. Note that a singular value decomposition (SVD) is not fully unique. However, the singular values are unique. Additional unique quantities are the projection matrices onto the four fundamental subspaces and the pseudoinverse, all of which can be computed from knowledge of the SVD.

(a) We first determine that m = 3, n = 1, r = 1, ν = 0, and µ = 2. Once the dimensions of the various subspaces are known, we can systematically work the matrix A into the appropriate factorization,[2]

\[
A = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ 0 \\ \frac{1}{\sqrt{2}} \end{pmatrix} (\sqrt{2})\,(1)
  = U_1 S V_1^T
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
    \begin{pmatrix} \sqrt{2} \\ 0 \\ 0 \end{pmatrix} (1)
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S \\ 0 \end{pmatrix} V_1^T
  = U \Sigma V^T .
\]

[2] Note that these matrices have been hand-crafted to be amenable to this approach. Generally, numerical techniques must be used to obtain the SVD.

All the columns of U provide an orthonormal basis for Y = R^3. The column of U1 spans R(A) and the two columns of U2 provide an orthonormal basis for N(A^T). The single element of V^T = V1^T spans R(A^T) = R = X. The nullspace of A is trivial. With the SVD in hand, it is readily determined that σ1 = √2 > 0 and

\[
A^+ = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \end{pmatrix}; \qquad
P_{R(A)} = \frac{1}{2}\begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}; \qquad
P_{N(A^T)} = \frac{1}{2}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & 0 \\ -1 & 0 & 1 \end{pmatrix};
\]
\[
P_{R(A^T)} = 1 ; \qquad P_{N(A)} = 0 .
\]

Note that because A has full column rank we can also compute A^+ directly as $A^+ = (A^T A)^{-1} A^T$.
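For readers who want to check the numbers, here is a minimal NumPy sketch (not part of the original solution; variable names are arbitrary) that verifies the singular value, the pseudoinverse, and the projection matrices for the matrix of part (a):

```python
# Verification sketch for part (a), assuming NumPy is available.
import numpy as np

A = np.array([[1.0], [0.0], [1.0]])              # m = 3, n = 1, r = 1

U, s, Vt = np.linalg.svd(A)                      # full SVD: U is 3x3, Vt is 1x1
print(s)                                         # [1.4142...] -> sigma_1 = sqrt(2)

# Pseudoinverse two ways: from the SVD and from the full-column-rank formula.
A_pinv_svd = Vt.T @ np.diag(1.0 / s) @ U[:, :1].T
A_pinv_formula = np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(A_pinv_svd, A_pinv_formula))   # True; both equal (1/2)[1 0 1]

# Projection matrices onto the four fundamental subspaces.
P_RA  = A @ A_pinv_formula                       # onto R(A)
P_NAt = np.eye(3) - P_RA                         # onto N(A^T)
P_RAt = A_pinv_formula @ A                       # onto R(A^T) (here the 1x1 identity)
P_NA  = np.eye(1) - P_RAt                        # onto N(A) (here 0)
```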

(b) We first determine that m = 3, n = 2, r = 2, ν = 0, and µ = 1. Again, once the dimensions of the various subspaces are known, we can systematically work the matrix A into the appropriate factorization,

\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
    \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  = U_1 S V_1^T
\]
\[
\phantom{A}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
    \begin{pmatrix} \sqrt{2} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S \\ 0 \end{pmatrix} V_1^T
  = U \Sigma V^T .
\]

All the columns of U provide an orthonormal basis for Y = R^3. The two columns of U1 give an orthonormal basis for R(A) and the column of U2 spans N(A^T). Both rows of V^T = V1^T provide an orthonormal basis for R(A^T) = R^2 = X. The nullspace of A is trivial. It can now be readily determined that σ1 = √2 > σ2 = 1 > 0 and

\[
A^+ = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \end{pmatrix}; \qquad
P_{R(A)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix};
\]
\[
P_{R(A^T)} = I ; \qquad P_{N(A)} = 0 .
\]


Note that because A has full column rank we can also compute A^+ directly as $A^+ = (A^T A)^{-1} A^T$.
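A similar sketch for part (b), again assuming NumPy and intended only as a numerical check, confirms that the full-column-rank formula agrees with the SVD-based pseudoinverse and that the row-space projector is the identity:

```python
# Verification sketch for part (b), assuming NumPy is available.
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])                    # m = 3, n = 2, r = 2

print(np.linalg.svd(A, compute_uv=False))     # [sqrt(2), 1]

A_pinv = np.linalg.inv(A.T @ A) @ A.T         # (A^T A)^{-1} A^T
print(np.allclose(A_pinv, np.linalg.pinv(A))) # True: [[1/2, 0, 1/2], [0, 1, 0]]

P_RAt = A_pinv @ A                            # projection onto R(A^T) = R^2
print(np.allclose(P_RAt, np.eye(2)))          # True, consistent with P_{R(A^T)} = I
```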

(c) Here m = n = 3 while r = 2. For this case, the matrix A is rank-deficient (i.e., it is neither one-to-one nor onto), and hence its pseudoinverse cannot be computed from an analytic full-rank formula as was done in the previous cases. We also have ν = µ = 1. The matrix A factors as

\[
A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 \\ 0 & 1 \\ \frac{1}{\sqrt{2}} & 0 \end{pmatrix}
    \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
    \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \end{pmatrix}
  = U_1 S V_1^T
\]
\[
\phantom{A}
  = \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
    \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{pmatrix}
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}
  = U \Sigma V^T .
\]

All the columns of U provide an orthonormal basis for Y = R^3. The two columns of U1 give an orthonormal basis for R(A) and the column of U2 spans N(A^T). All the rows of V^T provide an orthonormal basis for X = R^3. The two rows of V1^T provide an orthonormal basis for R(A^T). The nullspace of A is spanned by the row of V2^T. It can now be readily determined that σ1 = 2 > σ2 = 1 > 0 and

\[
P_{R(A)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix};
\]
\[
P_{R(A^T)} = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ 0 & 1 & 0 \\ \frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}; \qquad
P_{N(A)} = \begin{pmatrix} \frac{1}{2} & 0 & -\frac{1}{2} \\ 0 & 0 & 0 \\ -\frac{1}{2} & 0 & \frac{1}{2} \end{pmatrix}.
\]


Note that because A is symmetric it just happens to be the case that P_{R(A)} = P_{R(A^T)} and P_{N(A)} = P_{N(A^T)};[3] however, this is not true for a general nonsymmetric matrix A.

[3] The special properties P_{R(A)} = P_{R(A*)} and P_{N(A)} = P_{N(A*)} hold for any self-adjoint matrix A, A = A*.
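The following NumPy sketch (a verification aid only, not part of the original solution) illustrates the remark: for the symmetric matrix of part (c) the range and row-space projectors coincide, as do the two nullspace projectors.

```python
# Verification sketch for part (c), assuming NumPy is available.
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])               # symmetric, rank 2

A_pinv = np.linalg.pinv(A)                    # SVD-based pseudoinverse
P_RA  = A @ A_pinv                            # onto R(A)
P_RAt = A_pinv @ A                            # onto R(A^T)
P_NA  = np.eye(3) - P_RAt                     # onto N(A)
P_NAt = np.eye(3) - P_RA                      # onto N(A^T)

print(np.allclose(P_RA, P_RAt), np.allclose(P_NA, P_NAt))   # True True
```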

(d) Here m = 2, n = 3, and r = 1. Thus A is rank-deficient and non-square (and definitely nonsymmetric).

\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix}
    \begin{pmatrix} \sqrt{5} & \sqrt{5} & \sqrt{5} \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix} (\sqrt{5})
    \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{pmatrix} (\sqrt{15})
    \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \end{pmatrix}
  = U_1 S V_1^T
\]
\[
\phantom{A}
  = \begin{pmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}
    \begin{pmatrix} \sqrt{15} & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \end{pmatrix}
  = \begin{pmatrix} U_1 & U_2 \end{pmatrix}
    \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix}
    \begin{pmatrix} V_1^T \\ V_2^T \end{pmatrix}
  = U \Sigma V^T .
\]

Both columns of U provide an orthonormal basis for Y = R^2. The column of U1 spans R(A) and the column of U2 spans N(A^T). All the rows of V^T provide an orthonormal basis for X = R^3. The row of V1^T spans R(A^T). The nullspace of A is spanned by the two rows of V2^T. It can now be readily determined that σ1 = √15 > 0 and

\[
A^+ = \frac{1}{15}\begin{pmatrix} 1 & 2 \\ 1 & 2 \\ 1 & 2 \end{pmatrix}; \qquad
P_{R(A)} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}; \qquad
P_{N(A^T)} = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix};
\]
\[
P_{R(A^T)} = \frac{1}{3}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}; \qquad
P_{N(A)} = \frac{1}{3}\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix}.
\]
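Because A in part (d) is rank-deficient, $(A^T A)^{-1}$ does not exist and the pseudoinverse must come from the reduced SVD. The sketch below (NumPy assumed; a numerical check only) confirms the quantities above:

```python
# Verification sketch for the rank-deficient case of part (d), assuming NumPy.
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [2.0, 2.0, 2.0]])                # m = 2, n = 3, r = 1

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-12)                          # numerical rank = 1
print(s[0], np.sqrt(15))                       # sigma_1 = sqrt(15)

# Reduced-SVD pseudoinverse A^+ = V_1 S^{-1} U_1^T = (1/15) [[1,2],[1,2],[1,2]]
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True

P_RA = A @ A_pinv                              # (1/5) [[1,2],[2,4]]
P_NA = np.eye(3) - A_pinv @ A                  # (1/3) [[2,-1,-1],[-1,2,-1],[-1,-1,2]]
```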

2. The MLE, x_MLE, is determined as
\[
x_{\mathrm{MLE}} = \arg\min_x \,\{ -\ln p(y \,|\, x) \}
 = \arg\min_x \sum_{i=1}^{2} \frac{1}{\sigma_i^2}\,(y_i - x)^2
 = \arg\min_x \| y - Ax \|_W^2 ,
\]
where
\[
y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \qquad
A = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
W = \begin{pmatrix} \frac{1}{\sigma_1^2} & 0 \\ 0 & \frac{1}{\sigma_2^2} \end{pmatrix} .
\]
We have that the adjoint operator is given by
\[
A^* = A^T W = \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{1}{\sigma_2^2} \end{pmatrix} ,
\]
so that
\[
A^* A = \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} ,
\]
which is invertible (as we already knew, since A has full column rank). Continuing, we obtain
\[
(A^* A)^{-1} = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}
\]
and
\[
A^+ = (A^* A)^{-1} A^* = \begin{pmatrix} \dfrac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} & \dfrac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \end{pmatrix} .
\]
Thus,
\[
x_{\mathrm{MLE}} = A^+ y
 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\, y_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\, y_2
 = \alpha_1 y_1 + \alpha_2 y_2 ,
\]



where
\[
\alpha_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \qquad
\alpha_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}, \qquad
0 \le \alpha_i \le 1 \ \text{ for } i = 1, 2 ,
\]
and α1 + α2 = 1. This shows that the MLE is a so-called convex combination (i.e., a normalized weighted average) of the measurements y1 and y2.
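As a quick numerical illustration (NumPy assumed; the noise levels and measurement values below are made up, not from the problem), the convex-combination formula and the general weighted least-squares pseudoinverse give the same estimate:

```python
# Numerical sketch of the weighted-least-squares MLE derived above.
import numpy as np

sigma1, sigma2 = 1.0, 2.0                     # hypothetical noise standard deviations
y1, y2 = 3.0, 7.0                             # hypothetical measurements

# Closed form from the solution: a convex combination of y1 and y2.
a1 = sigma2**2 / (sigma1**2 + sigma2**2)
a2 = sigma1**2 / (sigma1**2 + sigma2**2)
x_mle_closed = a1 * y1 + a2 * y2

# Same answer from the weighted pseudoinverse (A^T W A)^{-1} A^T W y.
A = np.array([[1.0], [1.0]])
W = np.diag([1.0 / sigma1**2, 1.0 / sigma2**2])
x_mle_wls = (np.linalg.inv(A.T @ W @ A) @ A.T @ W @ np.array([y1, y2])).item()

print(x_mle_closed, x_mle_wls)                # both 3.8; the weights sum to 1
```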

(a) In the limit that σ1 → 0, we see from the general solution derived above that x_MLE → y1. This makes sense, as σ1 → 0 corresponds to y1 becoming a perfect, non-noisy measurement of the unknown quantity x. In the limit that σ1 → ∞, the general solution shows that x_MLE → y2. This makes sense, as σ1 → ∞ corresponds to the first measurement becoming so noisy that it is worthless relative to the second measurement.

(b) In the case that σ1 = σ2 = σ we have
\[
x_{\mathrm{MLE}} = \arg\min_x \| y - Ax \|_W^2
 = \arg\min_x \frac{1}{\sigma^2}\,\| y - Ax \|^2
 = \arg\min_x \| y - Ax \|^2 ,
\]
the last expression being an unweighted least-squares problem. Using the condition σ1 = σ2 = σ, the general solution derived above becomes
\[
x_{\mathrm{MLE}} = \tfrac{1}{2}\,(y_1 + y_2) .
\]

This solution makes sense because the condition σ1 = σ2 = σ means that we have no rational reason to prefer one measurement over the other. The errors in both measurements are additive, independent, zero-mean, and identically (symmetrically) distributed, and this symmetric situation yields a linear estimator which is the symmetric sample-mean solution.[4]

[4] In advanced courses we would say that "the solution to the optimal linear estimator is obvious from symmetry."

3. Note that $E\{y_i\} = \alpha t_i$.

(a) The least-squares problem to be solved is
\[
\min_{\alpha} \| y - A\alpha \|^2 ,
\]
where
\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
\quad \text{and} \quad
A = \begin{pmatrix} t_1 \\ \vdots \\ t_m \end{pmatrix} .
\]
The least-squares solution is
\[
\hat{\alpha}(m) = A^+ y = (A^T A)^{-1} A^T y
 = \frac{\sum_{i=1}^{m} t_i y_i}{\sum_{i=1}^{m} t_i^2} .
\]


(b)
\[
E\{\hat{\alpha}(m)\}
 = \frac{\sum_{i=1}^{m} t_i\, E\{y_i\}}{\sum_{i=1}^{m} t_i^2}
 = \alpha\, \frac{\sum_{i=1}^{m} t_i^2}{\sum_{i=1}^{m} t_i^2}
 = \alpha .
\]

(c)
\[
\operatorname{Var}\{\hat{\alpha}(m)\} = \sigma^2 (A^T A)^{-1}
 = \frac{\sigma^2}{\sum_{i=1}^{m} t_i^2}
 \;\to\; 0 \quad \text{as } m \to \infty .
\]
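A short Monte-Carlo sketch (NumPy assumed; the true slope, noise level, and sample times are illustrative choices, not values from the problem) shows the estimate concentrating around α as m grows, with the predicted variance σ²/Σ tᵢ²:

```python
# Monte-Carlo sketch of problem 3: the least-squares slope estimate is
# unbiased and its variance sigma^2 / sum(t_i^2) shrinks as m grows.
import numpy as np

rng = np.random.default_rng(0)
alpha_true, sigma = 2.5, 1.0                  # hypothetical true slope and noise level

for m in (10, 100, 1000):
    t = np.arange(1, m + 1, dtype=float)      # sample times t_1, ..., t_m
    y = alpha_true * t + sigma * rng.standard_normal(m)
    alpha_hat = np.sum(t * y) / np.sum(t**2)  # (A^T A)^{-1} A^T y with A = t
    var_pred = sigma**2 / np.sum(t**2)        # predicted Var{alpha_hat}
    print(m, alpha_hat, var_pred)             # estimate -> 2.5, variance -> 0
```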

4. (a) Note that for A one-to-one, $A^+ A = (A^T A)^{-1} A^T A = I$. Let e be the vector of all ones, $e = (1 \cdots 1)^T$. Note that
\[
E\{n\} = b \quad \text{where} \quad b = \beta e .
\]
We have that
\[
\tilde{x} = \hat{x} - x = A^+ y - x = A^+ (Ax + n) - x = x + A^+ n - x = A^+ n .
\]
Thus
\[
E\{\tilde{x}\} = A^+ E\{n\} = A^+ b \ne 0 ,
\]
so the estimate $\hat{x}$ is biased.

(b) We have
\[
y = Ax + n = Ax + b + (n - b) = Ax + b + \nu ,
\]
where ν = n − b = n − βe has zero mean. Continuing,
\[
y = \begin{pmatrix} A & e \end{pmatrix} \begin{pmatrix} x \\ \beta \end{pmatrix} + \nu
  = \bar{A} \bar{x} + \nu ,
\]
where
\[
\bar{A} = \begin{pmatrix} A & e \end{pmatrix}
\quad \text{and} \quad
\bar{x} = \begin{pmatrix} x \\ \beta \end{pmatrix} .
\]
Assuming that $\bar{A}$ is one-to-one, we have
\[
\hat{\bar{x}} = \bar{A}^+ y = (\bar{A}^T \bar{A})^{-1} \bar{A}^T y
\]
and
\[
\tilde{\bar{x}} = \bar{A}^+ \nu .
\]
Since ν has zero mean, $\hat{\bar{x}}$ is unbiased.
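The effect can be seen numerically. In the sketch below (NumPy assumed; the dimensions, true parameters, and noise level are made up for illustration), the plain least-squares estimate of x is biased by approximately A⁺b, while the estimate from the augmented model [A e] is essentially unbiased:

```python
# Sketch of problem 4: an unknown constant offset beta biases plain least
# squares; augmenting A with the all-ones column e removes the bias.
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 2
A = rng.standard_normal((m, n))               # hypothetical one-to-one A
x_true, beta = np.array([1.0, -2.0]), 3.0     # hypothetical parameters
trials = 2000

err_plain = np.zeros(n)
err_aug = np.zeros(n)
for _ in range(trials):
    y = A @ x_true + beta + 0.1 * rng.standard_normal(m)   # n = beta*e + nu
    err_plain += np.linalg.lstsq(A, y, rcond=None)[0] - x_true
    A_bar = np.hstack([A, np.ones((m, 1))])                 # A_bar = [A  e]
    err_aug += np.linalg.lstsq(A_bar, y, rcond=None)[0][:n] - x_true

print(err_plain / trials)   # average error approx A^+ b, clearly nonzero
print(err_aug / trials)     # average error approx 0: augmented estimate unbiased
```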

5. Vocabulary. The relevant definitions can be found in your class lecture notes and/or in the lecture supplement handouts.



6. Scalar Generalized Gradient Descent Algorithms.

Note that in the scalar case the loss function is
\[
\ell(x) = \tfrac{1}{2}\,(y - h(x))^2 .
\]
Also, the gradient of ℓ(x) is just the derivative of ℓ(x) with respect to x,
\[
\ell'(x) = \nabla \ell(x) = \frac{d}{dx}\,\ell(x) = -h'(x)\,(y - h(x)) ,
\]
where $h'(x) = \frac{d}{dx} h(x)$. Note also that the second derivative of the loss function is
\[
\ell''(x) = h'^2(x) - h''(x)\,(y - h(x)) .
\]

(a) Gradient Descent Method.
\[
x_{k+1} = x_k - \alpha_k\, \ell'(x_k) = x_k + \alpha_k\, h'(x_k)\,(y - h(x_k)) ,
\]
where the step size αk > 0 is used for convergence control. It is evident that Q_k = 1 for all k.

(b) Gauss' Method (Gauss-Newton Method).

To find a correction to the current estimate $\hat{x}$, we linearize the nonlinear inverse problem y = h(x) about the point $\hat{x}$,
\[
y \approx h(\hat{x}) + h'(\hat{x})\,(x - \hat{x}) = h(\hat{x}) + h'(\hat{x})\,\Delta x ,
\]
where $\Delta x = x - \hat{x}$, and find a solution which is optimal in the least-squares sense for this linearized problem. Note that the loss function for this linearized problem is
\[
\ell_{\mathrm{gauss}}(x) = \tfrac{1}{2}\,\bigl(y - [\,h(\hat{x}) + h'(\hat{x})\,(x - \hat{x})\,]\bigr)^2
 = \tfrac{1}{2}\,\bigl(\Delta y - h'(\hat{x})\,\Delta x\bigr)^2
 = \ell_{\mathrm{gauss}}(\Delta x) ,
\]
where $\Delta y = y - h(\hat{x})$. Also note that because
\[
x = \hat{x} + \Delta x ,
\]
where $\hat{x}$ is given and fixed, it is evident that optimizing over x is equivalent to optimizing over Δx. An equivalent problem, then, is to find the correction Δx which is optimal in the least-squares sense by minimizing $\ell_{\mathrm{gauss}}(\Delta x)$ with respect to Δx.

The solution is
\[
\Delta x = \frac{1}{h'(\hat{x})}\,\Delta y = \frac{1}{h'(\hat{x})}\,\bigl(y - h(\hat{x})\bigr) ,
\]
assuming that $h'(\hat{x}) \ne 0$. Incorporating a step size α > 0 for convergence control yields
\[
\Delta x = \alpha(\hat{x})\,\frac{1}{h'(\hat{x})}\,\bigl(y - h(\hat{x})\bigr)
 = \alpha(\hat{x})\,Q(\hat{x})\,h'(\hat{x})\,\bigl(y - h(\hat{x})\bigr) ,
\quad \text{where} \quad
Q(\hat{x}) = \frac{1}{h'^2(\hat{x})} .
\]
This yields the iterative algorithm
\[
x_{k+1} = x_k + \alpha_k\,\frac{1}{h'(x_k)}\,\bigl(y - h(x_k)\bigr)
 = x_k + \alpha_k\,Q_k\,h'(x_k)\,\bigl(y - h(x_k)\bigr) ,
\quad \text{where} \quad
Q_k = \frac{1}{h'^2(x_k)} .
\]

(c) Newton's Method.

To find a correction to the current estimate $\hat{x}$, we expand the loss function ℓ(x) about $\hat{x}$ to second order and then minimize this quadratic approximation to ℓ(x). With $\Delta x = x - \hat{x}$, the quadratic approximation is
\[
\ell(x) \approx \ell_{\mathrm{quad}}(x)
 = \ell(\hat{x}) + \ell'(\hat{x})\,\Delta x + \tfrac{1}{2}\,\ell''(\hat{x})\,(\Delta x)^2
 = \ell_{\mathrm{quad}}(\Delta x) .
\]
Note that $\ell_{\mathrm{quad}}(x) = \ell_{\mathrm{quad}}(\Delta x)$ can be equivalently minimized either with respect to x or with respect to Δx. If we take the derivative of $\ell_{\mathrm{quad}}(\Delta x)$ with respect to Δx and set it equal to zero, we get
\[
\Delta x = -\frac{\ell'(\hat{x})}{\ell''(\hat{x})}
 = \frac{h'(\hat{x})\,\bigl(y - h(\hat{x})\bigr)}{h'^2(\hat{x}) - h''(\hat{x})\,\bigl(y - h(\hat{x})\bigr)} . \tag{1}
\]
If we multiply the correction Δx by a step size α > 0 in order to control convergence, we obtain the iterative algorithm
\[
x_{k+1} = x_k + \alpha_k\,Q_k\,h'(x_k)\,\bigl(y - h(x_k)\bigr) ,
\quad \text{where} \quad
Q_k = \frac{1}{h'^2(x_k) - h''(x_k)\,\bigl(y - h(x_k)\bigr)} .
\]

Comparing the values of Q_k for the Gauss method and the Newton method, we see that when y − h(x_k) ≈ 0 the two methods become essentially equivalent.[5] Also note that when h'' ≡ 0 the two methods become exactly equivalent.

[5] This fact also holds for the more general vector case. For example, because this latter situation holds for the GPS computer assignment, the Gauss-Newton method used in that assignment is essentially equivalent to the Newton method and therefore has the very fast convergence rate associated with the Newton method.



7. Simple Nonlinear Inverse Problem Example.

Let y denote the known number for which you wish to determine the cube root. Let x denote the unknown cube root of y. We need a relationship y = h(x), which is obviously given by h(x) = x^3. This relationship defines an inverse problem that we wish to solve. (I.e., given y = h(x) = x^3, determine x.) The relevant derivatives needed to implement the descent algorithms are
\[
h'(x) = 3x^2 \quad \text{and} \quad h''(x) = 6x .
\]
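As a concrete illustration, the short sketch below (plain Python; the target value y = 27, the starting point, the step size, and the iteration count are made-up choices, not part of the assignment) runs all three iterations from problem 6 on this cube-root problem:

```python
# Running the three scalar iterations from problem 6 on h(x) = x^3.
y = 27.0                      # seek x with x^3 = 27, i.e. x = 3
h   = lambda x: x**3
dh  = lambda x: 3 * x**2      # h'(x)
d2h = lambda x: 6 * x         # h''(x)

def iterate(update, x0=4.0, steps=30):
    x = x0
    for _ in range(steps):
        x = update(x)
    return x

# (a) Gradient descent: Q_k = 1; a small fixed step size keeps the iteration stable.
gd = iterate(lambda x: x + 1e-3 * dh(x) * (y - h(x)))

# (b) Gauss: Q_k = 1 / h'(x_k)^2, so x_{k+1} = x_k + (y - h(x_k)) / h'(x_k).
gauss = iterate(lambda x: x + (y - h(x)) / dh(x))

# (c) Newton: Q_k = 1 / (h'(x_k)^2 - h''(x_k) (y - h(x_k))).
newton = iterate(lambda x: x + dh(x) * (y - h(x))
                             / (dh(x)**2 - d2h(x) * (y - h(x))))

print(gd, gauss, newton)      # all three approach 3.0; (b) and (c) converge fastest
```

Note that for this particular h the Gauss step x + (y − x³)/(3x²) coincides with the classical Newton root-finding update for x³ − y = 0, which is consistent with the near-equivalence of the Gauss and Newton methods discussed in problem 6.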
