
On Subspace Methods in System Identification and Sensor Array Signal Processing

Magnus Jansson

TRITA–S3–REG–9701
ISSN 0347–1071

Automatic Control
Department of Signals, Sensors and Systems
Royal Institute of Technology (KTH)
Stockholm, Sweden

Submitted to the School of Electrical Engineering, Royal Institute of Technology, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 1997 by Magnus Jansson
Printed by KTH Tryck & Kopiering 1997


Abstract

The common theme of the thesis is the derivation and analysis of subspace parameter estimation methods. In particular, this thesis is concerned with the identification of linear time-invariant dynamical systems and the estimation of the directions of arrival (DOAs) using sensor arrays.

The first part focuses on sensor array signal processing. The forward-backward approach has proved quite useful for improving the performance of many suboptimal array processing methods. Herein, it is shown that this approach should not be used with a statistically optimal method such as MODE or WSF.

A robust weighted subspace fitting method for a wide class of array perturbation models is derived. The method takes the first order errors due to finite sample effects of the noise and the model perturbation into account in an optimal way to give minimum variance estimates of the DOAs. Interestingly enough, the method reduces to the WSF (MODE) estimator if no model errors are present. On the other hand, when model errors dominate, the proposed method turns out to be equivalent to the "model-errors-only subspace fitting method". Furthermore, an alternative method, known as MAPprox, is also shown to yield minimum variance estimates.

A novel subspace-based algorithm for the estimation of the DOAs of uncorrelated emitter signals is derived. The asymptotic variance of the estimates is shown to coincide with the relevant Cramér-Rao lower bound. This implies that the method in certain cases has significantly better performance than methods that do not exploit the a priori information about the signal correlation. Unlike previous techniques, the estimator can be implemented by making use of standard matrix operations only, if the array is uniform and linear.

The second part of the thesis deals with the analysis and interpretation of subspace methods for system identification. It is shown that several existing subspace estimates result from a linear regression multistep-ahead prediction error minimization under certain rank constraints. Furthermore, it is found that the pole estimates obtained by two subspace system identification methods are asymptotically invariant to a row-weighting in the cost functions. The consistency of a large class of subspace estimates is analyzed. Persistence of excitation conditions on the input signal are given which guarantee consistency for systems with only measurement noise. For systems with process noise, an example where subspace methods fail to yield a consistent estimate of the transfer function is given. This failure occurs even though the input is persistently exciting. Such problems do not arise if the input is a white noise process or an ARMA process of sufficiently low order.


Acknowledgments

The years at the Department of Signals, Sensors & Systems have been most enjoyable and stimulating. I would like to express my gratitude to my supervisor, Professor Bo Wahlberg, who convinced me to proceed from the master studies towards the Ph.D. degree. Once started, I have never regretted that choice. Thank you Bo for always being positive and very encouraging.

It has been a pleasure to go to the department (almost) every morning in the last five years. Thanks to all colleagues at the department, former and present, for being who you are. I have enjoyed our discussions at lunches and coffee breaks, playing "innebandy", and ...

To all friends I apologize for often being absent-minded during these years. I appreciate your support.

Many thanks to my collaborators and co-authors, the Professors Björn Ottersten, Petre Stoica, Lee Swindlehurst, Bo Wahlberg, and Tekn. Lic. Bo Göransson. The interaction has been very stimulating for me and rewarding for my work.

The financial support that made this thesis project possible is gratefully acknowledged.

Finally, I am very grateful to my family for their continuous support and encouragement. I dedicate this thesis to my family.


Contents

1 Introduction  1
  1.1 Sensor Array Signal Processing  1
    1.1.1 Data Model  2
    1.1.2 Estimation Methods  6
    1.1.3 Discussion  14
  1.2 System Identification  17
    1.2.1 Background  17
    1.2.2 Subspace Methods  19
    1.2.3 Discussion  24
  1.3 Outline and Contributions  25
  1.4 Topics for Future Research  29

I Sensor Array Signal Processing  33

2 On Forward-Backward MODE for Array Signal Processing  35
  2.1 Introduction  35
  2.2 Data Model  37
  2.3 Preliminary Results  38
    2.3.1 Results on A and B  41
    2.3.2 Results on R  42
    2.3.3 Results on P and R  43
    2.3.4 Results on R0 and R  44
    2.3.5 Results on ABC Estimation  46
  2.4 Brief Review of F-MODE  47
  2.5 Derivation of FB-MODE  48
  2.6 Analysis of FB-MODE  52
  2.7 Numerical Examples  56
  2.8 Conclusions  58

3 Weighted Subspace Fitting for General Array Error Models  61
  3.1 Introduction  62
  3.2 Problem Formulation  64
  3.3 Robust Estimation  66
    3.3.1 MAP  66
    3.3.2 MAPprox  68
    3.3.3 MAP-NSF  68
    3.3.4 Cramér-Rao Lower Bound  70
  3.4 Generalized Weighted Subspace Fitting (GWSF)  71
  3.5 Performance Analysis  74
    3.5.1 Consistency  74
    3.5.2 Asymptotic Distribution  74
  3.6 Implementation  76
    3.6.1 Implementation for General Arrays  76
    3.6.2 Implementation for Uniform Linear Arrays  79
  3.7 Performance Analysis of MAPprox  83
  3.8 Simulation Examples  87
    3.8.1 Example 1  88
    3.8.2 Example 2  90
    3.8.3 Example 3  91
    3.8.4 Example 4  91
  3.9 Conclusions  91

4 A Direction Estimator for Uncorrelated Emitters  95
  4.1 Introduction  95
  4.2 Data Model and Assumptions  97
  4.3 Estimating the Parameters  97
  4.4 The CRB for Uncorrelated Sources  100
  4.5 Asymptotic Analysis  101
    4.5.1 Consistency  101
    4.5.2 Asymptotic Distribution  102
    4.5.3 Optimal Weighting  105
  4.6 Implementation  106
  4.7 Numerical Examples  109
  4.8 Conclusions  112

II Subspace System Identification  115

5 A Linear Regression Approach to Subspace System Identification  117
  5.1 Introduction  118
  5.2 Model and Assumptions  119
    5.2.1 4SID Basic Equation  120
    5.2.2 General Assumptions  121
  5.3 Linear Regression  122
    5.3.1 Motivation  122
    5.3.2 4SID Linear Regression Model  124
  5.4 Subspace Estimation  125
    5.4.1 Weighted Least Squares (WLS)  128
    5.4.2 Approximate Maximum Likelihood (AML)  129
    5.4.3 Discussion  130
  5.5 Pole Estimation  132
    5.5.1 The Shift Invariance Method  132
    5.5.2 Subspace Fitting  133
  5.6 Performance Analysis  134
    5.6.1 Consistency  134
    5.6.2 Analysis of the Shift Invariance Method  136
    5.6.3 Analysis of the Subspace Fitting Method  138
  5.7 Examples  142
  5.8 Conclusions  147

6 On Consistency of Subspace Methods for System Identification  151
  6.1 Introduction  152
  6.2 Preliminaries  153
  6.3 Background  155
    6.3.1 Basic 4SID  156
    6.3.2 IV-4SID  157
  6.4 Analysis of Basic 4SID  158
  6.5 Analysis of IV-4SID  162
  6.6 Counterexample  169
  6.7 Numerical Examples  173
  6.8 Conclusions  175

A Proofs and Derivations  177
  A.1 Proofs of (2.34) and (2.35)  177
  A.2 Asymptotic Covariance of the Residual  179
  A.3 Proof of Theorem 3.2  183
  A.4 Some Properties of the Block Transpose  185
  A.5 Proof of Theorem 4.1  186
  A.6 Rank of B(θ)  190
  A.7 Derivation of the Asymptotic Distribution  191
  A.8 Proof of Theorem 4.3  193
  A.9 Proof of Expression (5.22)  197
  A.10 Proof of Expression (5.27)  197
  A.11 Proof of Theorem 5.1  198
  A.12 Proof of Theorem 5.3  201
  A.13 Proof of Theorem 5.5  207
  A.14 Rank of Coefficient Matrix  208
  A.15 Rank of Sylvester Matrix  209

B Notations  211

C Abbreviations and Acronyms  215

Bibliography  217


Chapter 1

Introduction

The subject of this thesis is so-called subspace parameter estimation methods. Two areas of application are considered. This implies that the thesis naturally divides into two main parts. Part I considers the sensor array signal processing problem and Part II is concerned with the identification of linear time-invariant dynamical systems. This chapter contains a short common introduction to these two parts.

1.1 Sensor Array Signal Processing

The basic goal of sensor array signal processing is to extract information from data obtained from multiple sensors. Consider the situation depicted in Figure 1.1. The emitters transmit energy (typically electromagnetic or acoustic) from different locations. The wavefield measured by the sensors contains information about, for example, the directions to the sources. By combining the information from the different sensors in a clever way, one may obtain accurate estimates of the directions. Applications of sensor array techniques include radar systems, underwater surveillance by hydrophones, seismology, and radio or satellite communications. Currently, there is also a large interest in using so-called smart antennas for mobile communication. The idea is to obtain diversity by making use of multiple antennas.

[Figure 1.1: The setup for the direction estimation problem. Sources 1, 2, ..., d transmit from directions θ_1, θ_2, ..., θ_d toward an array of m sensors.]

The research in the field of array signal processing is by now quite mature. Over the last thirty years numerous articles have been published and there are excellent reviews of the obtained results (see, e.g., [KV96, VB88, RA92, VDS93]). There are also several books written, including [Pil89, JD93, Hay91, SM97]. The motivation of this section is not to give a comprehensive and detailed introduction (the curious reader is referred to the cited material), but rather to give a hint of where the research presented in the thesis starts.

1.1.1 Data Model

Consider again the situation in Figure 1.1. To develop a convenient mathematical model, one has to make several simplifications. A most important assumption made in the following is that the source signals are narrowband. This means that the signals have bandpass characteristics around a given center or carrier frequency. Such signals are common in communications, where the informative (low-frequency) signal is modulated by a high-frequency carrier before transmission. A complex representation of a narrowband signal s(t) is given by

    s(t) = s̄(t) e^{jω_c t} ,

where ω_c is the carrier angular frequency. The narrowband assumption implies that the amplitude or envelope s̄(t) varies much more slowly than s(t). Hence, for small time delays τ,

    s(t − τ) = s̄(t − τ) e^{jω_c (t−τ)} ≈ s̄(t) e^{jω_c t} e^{−jω_c τ} = s(t) e^{−jω_c τ} .

That is, for narrowband signals we may model a small time delay by a phase shift.

Let H_k(ω,θ) denote the (complex) response of the kth sensor at the frequency ω and from a wavefield in the direction θ. Furthermore, assume that H_k(ω,θ) ≈ H_k(ω_c,θ) over the passband of interest around ω_c. Let τ_k denote the time delay of the signal s(t) at the kth sensor relative to a reference point. Then the output of sensor k can be written as

    x_k(t) = H_k(ω_c,θ) s(t − τ_k) + n_k(t) = H_k(ω_c,θ) s(t) e^{−jω_c τ_k} + n_k(t) ,    (1.1)

where θ is the direction to the source emitting the signal s(t). In (1.1), n_k(t) is an additive noise term capturing sensor thermal noise, background noise, model errors, etc.

In essence, the narrowband assumption is used to make s̄(t) approximately constant during the time it takes for the wave to propagate across the array, and to approximate the sensor response by a constant.

The sources are further assumed to be in the far field so that the signal wavefronts are planar when impinging on the array. This makes it possible to relate the time delay to the direction-of-arrival θ. To see this, consider the following example.

Example 1.1. A common sensor array is the uniform linear array (ULA). In the ULA configuration, the sensors are distributed along a line with equal distance between the sensor elements. Figure 1.2 illustrates the geometry when a plane wave impinges on a ULA with a separation of ∆ between two adjacent sensors. Here, we consider the sensors and the sources to be in the same plane. In Figure 1.2, τ is the time it takes for the wavefront to propagate between two sensors and c is the speed of propagation. The signal impinges from the direction θ measured relative to the normal of the array. As shown in Figure 1.2, the relation between τ and θ is given by

    cτ = ∆ sin(θ) .

[Figure 1.2: A plane wave from the direction θ impinges on a ULA of 5 sensors separated by the distance ∆. The figure illustrates geometrically the propagation delay between two adjacent sensors.]

If there are d signals {s_k(t)}, k = 1, ..., d, then the total contribution at the ith sensor is (cf. (1.1))

    x_i(t) = Σ_{k=1}^{d} H_i(ω_c,θ_k) s_k(t) e^{jω_c (i−1)∆ sin(θ_k)/c} + n_i(t) ,

where the first sensor is taken as the reference point. If the sensors are omni-directional, then H_i(ω_c,θ_k) does not depend on θ_k. Furthermore, if the sensors are assumed to be identical, then the signal can be redefined to also include the sensor response. If the array comprises m sensors, then the outputs can be written in a vector,

    x(t) = [x_1(t), x_2(t), ..., x_m(t)]^T
         = Σ_{k=1}^{d} [1, e^{jω_c∆ sin(θ_k)/c}, ..., e^{jω_c(m−1)∆ sin(θ_k)/c}]^T s_k(t) + [n_1(t), n_2(t), ..., n_m(t)]^T
         = Σ_{k=1}^{d} a(θ_k) s_k(t) + n(t)
         = [a(θ_1) ... a(θ_d)] [s_1(t), ..., s_d(t)]^T + n(t)
         = A(θ) s(t) + n(t) ,

where θ = [θ_1, θ_2, ..., θ_d]^T. The vector a(θ) is often referred to as the array response or the steering vector in the direction θ, and the matrix A(θ) is called the steering matrix.
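To make the ULA steering matrix concrete, the following NumPy sketch (not part of the thesis; the function name and the half-wavelength default are illustrative assumptions) builds A(θ) for an m-element ULA, using ω_c∆/c = 2π∆/λ with the spacing ∆ expressed in carrier wavelengths:

```python
import numpy as np

def ula_steering(theta_deg, m, spacing_wavelengths=0.5):
    """Steering matrix A(theta) for a ULA; columns are a(theta_k).

    theta_deg : DOAs in degrees relative to broadside
    m         : number of sensors
    spacing_wavelengths : element spacing Delta in carrier wavelengths
    """
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    # omega_c * Delta / c = 2*pi*Delta/lambda; sensor i carries phase (i-1)*phi_k.
    phase = 2j * np.pi * spacing_wavelengths * np.sin(theta)
    return np.exp(np.arange(m)[:, None] * phase[None, :])   # m x d matrix

A = ula_steering([-7.5, 7.5], m=6)   # the six-element array used in Example 1.2
print(A.shape)                       # (6, 2)
```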

Similar to the development in Example 1.1, the array output can in general be modeled as

    x(t) = A(θ)s(t) + n(t) .    (1.2)

The parameter vector θ can be associated with several parameters for each signal, for example, elevation, azimuth, range, polarization and carrier frequency. Typically, we will simplify matters and only consider one parameter per signal, for example, the azimuth (the direction-of-arrival) as in the example above.

In the receiver equipment of the sensor array, the modulated signals in (1.2) are demodulated. The demodulation process extracts the (low-frequency) baseband signal that contains the information. By a slight abuse of notation, we still use the model (1.2) for the demodulated signals. Even though the derivation of the model above is somewhat sloppy, the idea should be clear. The interested reader is referred to [Tre71] for more details on the complex signal representation and to [Vib89, Ott89, SM97] for derivations of the array model.

Equation (1.2) constitutes one vector sample of the array output; that is, the m array outputs are sampled simultaneously. One vector sample or one observation of the vector process {x(t)} is sometimes called a snapshot of the array output. Assume that N snapshots are available; the data may then be written in matrix form as

    X_N = [x(1), ..., x(N)] = A(θ)S_N + N_N .

Here, S_N and N_N are the collection of signal and noise vectors, respectively, defined analogously to X_N. The objective of the sensor array estimation problem can be stated as follows:

    Based on the data, X_N, and the model for the observations, (1.2), estimate the number of signals, d, the signal parameters, θ, the signal waveforms, S_N (or the signal covariance matrix P = E{s(t)s*(t)}), and the noise variance, σ². (See the next section for the stochastic model assumptions.)

The major concern in this thesis is the estimation of the signal parameters, θ. Once θ is known, it is relatively straightforward to estimate the other parameters of interest (see, e.g., [Böh86, ORK89]). The number of signals, d, is assumed to be known. Literature addressing the estimation of d includes, for example, [Fuc92, Wax91, VOK91b, VOK91a] and the references therein.
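As an illustration of this setup, the following sketch (an assumption-laden example, not from the thesis: NumPy, a six-element half-wavelength ULA, and the circular complex Gaussian signal and noise models introduced in the next section, with the P of Example 1.2) simulates N snapshots X_N = A(θ)S_N + N_N and forms the sample covariance that the estimators below start from:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, N, sigma2 = 6, 2, 100, 1.0
theta = np.deg2rad([-7.5, 7.5])
A = np.exp(2j * np.pi * 0.5 * np.arange(m)[:, None] * np.sin(theta)[None, :])

# Signal covariance of Example 1.2; circular complex Gaussian snapshots:
# real and imaginary parts each carry half of the variance.
P = np.array([[1.0, 0.8 * np.exp(1j * np.pi / 4)],
              [0.8 * np.exp(-1j * np.pi / 4), 1.0]])
L = np.linalg.cholesky(P)
S = L @ (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
Noise = np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N)))
X = A @ S + Noise                  # X_N = A(theta) S_N + N_N, an m x N matrix

R_hat = X @ X.conj().T / N         # sample covariance (1/N) sum_t x(t) x*(t)
```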

1.1.2 Estimation Methods

The early approaches to make use of the information obtained by the sensor array involved so-called beamforming (see [VB88]). In short, these methods form a weighted sum of the outputs of the array elements to give a measure of the received energy from different directions. This gives a spatial spectrum from which θ can be estimated. These methods can be classified as "non-parametric" since they do not make any assumptions on the covariance of the data [SM97]. In the remainder of this section, some of the "parametric" array processing methods will be described. The methods to be presented are only those that serve as a background to the material in the thesis.

Maximum Likelihood

The maximum likelihood (ML) method is a powerful tool for parameter estimation. It provides a systematic approach to many estimation problems. The ML estimates have, under mild conditions, strong statistical properties. For example, if the ML estimate is consistent (a parameter estimate is said to be (strongly) consistent if it converges to the true value with probability one as the number of observations tends to infinity), then it is also asymptotically statistically efficient. Here, asymptotically statistically efficient means that the ML estimator attains the Cramér-Rao lower bound (CRB) asymptotically in the number of samples. The CRB is a lower bound on the estimation error variance for any unbiased estimator [Cra46].

A common assumption on the noise, n(t), is that it is a zero-mean stationary complex Gaussian random process [Goo63] with second order moments

    E{n(t)n*(s)} = σ² I δ_{t,s} ,    E{n(t)n^T(s)} = 0 .

Here, the superscript * denotes complex conjugate transpose, δ_{t,s} is the Kronecker delta and I denotes the identity matrix. Regarding the signal waveforms, {s(t)}, two common assumptions exist. Sometimes {s(t)} are treated as completely unknown complex parameters; this is called the deterministic approach. However, for estimator design purposes it has been shown that a stochastic model for the signals may be useful, whether or not the true signals are stochastic in nature. In what follows, the signal vector s(t) is assumed to be a zero-mean stationary complex Gaussian random process with second order moments

    E{s(t)s*(s)} = P δ_{t,s} ,    E{s(t)s^T(s)} = 0 .

Under the assumption that the noise and the emitter signals are uncorrelated, the array output is complex Gaussian with zero mean and covariance matrix

    R = E{x(t)x*(t)} = A(θ)PA*(θ) + σ² I .    (1.3)

The assumption of Gaussian emitter signals may be commented upon. In [OVK90, SN90] it is shown that the exact distribution is immaterial; the important quantity that determines the asymptotic properties of the signal parameter estimates is lim_{N→∞} S_N S_N* / N.

From the assumptions on the signals and noise, the array response data, x(t), will be complex m-variate Gaussian with zero mean and the covariance matrix given in (1.3). The probability density function is given by [Goo63]

    f_X(x(t)) = (1 / (π^m det{R})) e^{−x*(t) R^{−1} x(t)} ,    (1.4)

where det{R} denotes the determinant of R. The distribution depends on the unknown parameters in θ, P and σ². Introduce the parameter vector p with the d² real-valued parameters in the Hermitian signal covariance matrix P; that is,

    p = [P_11, ..., P_dd, P̄_12, P̃_12, ..., P̄_(d−1)d, P̃_(d−1)d]^T ,

where P_ij denotes the (i,j)th element of P and where ā = Re{a} and ã = Im{a}. Furthermore, let η be the vector of all unknown parameters,

    η = [θ^T, p^T, σ²]^T .

The probability density function of one observation x(t) is given by (1.4). Since the snapshots are independent and identically distributed, the joint probability density of the N observations is

    f(x(1), ..., x(N) | η) = Π_{t=1}^{N} (1 / (π^m det{R})) e^{−x*(t) R^{−1} x(t)} .    (1.5)

Equation (1.5) is a measure of the probability that the outcome should take on the values x(1), ..., x(N) given η. When the observed values of x(t) are inserted in (1.5), a function of η is obtained. This function is called the likelihood function and is denoted by f(η). The ML estimate of η is given by the maximizing argument of f(η); that is, the value that makes the likelihood of the outcome as large as possible. Maximizing f(η) is equivalent to minimizing the negative logarithm of f(η),

    −log f(η) = − Σ_{t=1}^{N} log( (1 / (π^m det{R})) e^{−x*(t) R^{−1} x(t)} )
              = mN log(π) + N log(det{R}) + Σ_{t=1}^{N} x*(t) R^{−1} x(t) .

Ignoring the constant term and normalizing by N, the criterion function is given by

    l(η) = log(det{R}) + (1/N) Σ_{t=1}^{N} x*(t) R^{−1} x(t)
         = log(det{R}) + (1/N) Σ_{t=1}^{N} Tr{R^{−1} x(t) x*(t)}
         = log(det{R}) + Tr{R^{−1} R̂} ,    (1.6)

where Tr{·} is the trace operator [Gra81] and R̂ is the sample covariance matrix,

    R̂ = (1/N) Σ_{t=1}^{N} x(t) x*(t) .

The ML estimate is thus given by the following optimization problem:

    η̂ = arg min_η l(η) .    (1.7)

This is a high-dimensional nonlinear optimization problem with, at least, d² + d + 1 parameters. It is possible to decrease this dimension through a separation of the problem as shown in [Böh86, Jaf88]. Therein, Equation (1.7) is solved explicitly with respect to the signal and noise covariance parameters. For a given θ, this results in

    P̂(θ) = A†(θ) (R̂ − σ̂²(θ) I) A†*(θ) ,    (1.8)
    σ̂²(θ) = (1 / (m − d)) Tr{Π⊥_A(θ) R̂} ,    (1.9)

where the notations A† = (A*A)^{−1}A* and Π⊥_A = I − AA† are introduced. If (1.8) and (1.9) are substituted back in (1.6), the following concentrated criterion function is obtained (after some algebra):

    V(θ) = log( det( A(θ) P̂(θ) A*(θ) + σ̂²(θ) I ) ) ,

and the ML estimate of the DOAs is

    θ̂ = arg min_θ V(θ) .    (1.10)

Then, the problem is reduced to a d-dimensional non-linear search over the DOAs in the vector θ. The ML estimates of P and σ² are obtained by inserting (1.10) in (1.8) and (1.9), respectively.
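For concreteness, the concentrated criterion (1.8)-(1.10) can be evaluated as in the following sketch (illustrative NumPy code, not from the thesis; in practice it would be combined with a numerical search over θ):

```python
import numpy as np

def ml_criterion(A, R_hat):
    """Concentrated stochastic ML cost V(theta) of (1.8)-(1.10), for a given A = A(theta)."""
    m, d = A.shape
    A_dag = np.linalg.solve(A.conj().T @ A, A.conj().T)            # A† = (A*A)^{-1} A*
    Pi_perp = np.eye(m) - A @ A_dag                                # Π⊥_A = I − A A†
    sigma2 = np.trace(Pi_perp @ R_hat).real / (m - d)              # (1.9)
    P_hat = A_dag @ (R_hat - sigma2 * np.eye(m)) @ A_dag.conj().T  # (1.8)
    R_model = A @ P_hat @ A.conj().T + sigma2 * np.eye(m)
    sign, logdet = np.linalg.slogdet(R_model)
    return logdet                                                  # V(theta) = log det(...)

# With R_hat from data and a steering-matrix function A_of(theta), the estimate is
# the minimizer of ml_criterion(A_of(theta), R_hat) over the d-dimensional theta grid.
```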

We will not further elaborate on the direct ML approach, but the presentation above serves as comparison and background to the methods used in the following chapters. A final remark about the derivations above concerns the name ML. The presentation given here results in an ML estimator sometimes called the stochastic or unconditional ML. There also exists a different ML estimator called the deterministic or the conditional ML. This is due to the two different choices of model for the emitter signals mentioned above. For a discussion about the differences between these approaches, see [SN90, OVSN93].


Subspace Methods

Since the introduction of an eigenstructure technique by [Pis73] for the harmonic retrieval problem, there has been great interest in such methods in the signal processing literature. However, the major breakthrough came a few years later with the development of the MUSIC (multiple signal classification) algorithm by Schmidt [Sch81, Sch79] (see also [Bie79]). The eigenstructure or subspace methods try to exploit a geometrical formulation of the problem by utilizing the eigendecomposition of the array covariance matrix. Here, we just mention that the ideas of the subspace methods are closely tied to principal component and factor analysis [Hot33, Law40]. The popularity of the subspace methods is due to the fact that they have a much better resolution compared to classical beamforming techniques. That is, they have a good ability to resolve sources that are close in space (angle). The ML method presented in the previous section also has high resolution, but it is computationally more involved than the subspace methods. Three well known subspace methods will be described below. First, a description of the eigendecomposition of the array output covariance matrix is needed.

Consider the covariance matrix

    R = A(θ)PA*(θ) + σ² I .

Recall that there are m sensors and d signals. Let d′ denote the rank of P. If d′ < d, then this indicates that some of the signals are fully correlated or coherent (two signals are coherent if the first signal is a scaled, by a complex constant, version of the other). Let the eigendecomposition of R be

    R = Σ_{i=1}^{m} λ_i e_i e_i* ,    (1.11)

where the eigenvalues are assumed to be ordered as λ_1 ≥ λ_2 ≥ ... ≥ λ_m. Notice that, since R is Hermitian, the eigenvectors can be chosen orthonormal (e_i* e_k = δ_{i,k}). The rank of A(θ)PA*(θ) is d′, which implies that there are m − d′ eigenvectors in its nullspace. These eigenvectors all have a corresponding eigenvalue equal to σ². The remaining d′ eigenvalues will be larger than the noise variance. With the observations made, the relation (1.11) may be partitioned as

    R = E_s Λ_s E_s* + E_n Λ_n E_n* ,

where Λ_s = diag[λ_1, ..., λ_d′] and Λ_n = diag[λ_{d′+1}, ..., λ_m] = σ² I. The notation diag[·] is used to denote a diagonal matrix with specified diagonal elements. The matrix E_s is m × d′ and contains the d′ "signal" eigenvectors corresponding to the eigenvalues in Λ_s. Similarly, E_n is m × (m − d′) and consists of the "noise" eigenvectors corresponding to Λ_n. The following geometrical observation is at the heart of all subspace methods:

    R{E_s} ⊆ R{A(θ)} ,    (1.12)

where R{·} denotes the range of the matrix in question. The relation (1.12) says that the subspace spanned by the columns of E_s is contained in or is equal to the subspace spanned by the columns of A(θ). Equality in (1.12) holds if and only if there are no coherent signals (d′ = d). If d′ = d, then (1.12) implies that

    N{E_n*} = R{A(θ)} ,    (1.13)

where N{E_n*} denotes the null space of E_n*. That is, the columns of A(θ) span the null space of E_n*. This observation forms the basis of the MUSIC algorithm presented next.

MUSIC

Let a(θ) denote a generic column of A(θ). In view of (1.13),

    a*(θ) E_n E_n* a(θ) = 0    (1.14)

for θ = θ_k (k = 1, ..., d), where {θ_k} are the true DOAs. In the MUSIC algorithm, the noise eigenvectors are estimated from the sample covariance matrix and the function

    1 / ( a*(θ) Ê_n Ê_n* a(θ) )

is computed over the sector of interest. The DOAs are estimated as the locations of the d highest peaks. For uniform linear arrays (actually, this can be easily extended to hold for all linear arrays where the distances between the sensors are rational), MUSIC can be implemented in a "rooting" form (root-MUSIC) [Bar83]. In that case, the solutions to (1.14) are computed from the roots of a polynomial. The performance of MUSIC is in many cases excellent. However, for scenarios with strongly correlated and closely spaced emitters or for small samples it may be far from optimal.
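A minimal spectral-MUSIC sketch in NumPy (illustrative, not from the thesis; the grid resolution and the half-wavelength spacing convention are assumptions), given the sample covariance R̂ and a known number of signals d:

```python
import numpy as np

def music_spectrum(R_hat, d, grid_deg, spacing=0.5):
    """Spectral MUSIC: 1 / (a*(θ) Ên Ên* a(θ)) evaluated over a DOA grid."""
    m = R_hat.shape[0]
    # Eigendecomposition of the Hermitian sample covariance (ascending eigenvalues).
    eigval, eigvec = np.linalg.eigh(R_hat)
    En = eigvec[:, : m - d]               # noise eigenvectors: m - d smallest eigenvalues
    theta = np.deg2rad(np.asarray(grid_deg))
    A = np.exp(2j * np.pi * spacing * np.arange(m)[:, None] * np.sin(theta)[None, :])
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)   # a* Ên Ên* a per grid point
    return 1.0 / denom                    # DOA estimates: the d highest peaks

# spec = music_spectrum(R_hat, d=2, grid_deg=np.linspace(-90, 90, 721))
```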


ESPRIT

ESPRIT (estimation of signal parameters by rotational invariance techniques) [PRK86, RPK86, RK89] (see also [Kun81, KAB83]) exploits the geometry of a particular type of arrays. It is assumed that the array consists of two identical subarrays displaced by a known distance and in a known direction. A simple example of such an array is the ULA (see Example 1.1). Define

    φ_k = ω_c ∆ sin(θ_k)/c  (k = 1, ..., d),    (1.15)
    A_1 = [I_{m−1}  0] A ,
    A_2 = [0  I_{m−1}] A ,

where 0 is an (m − 1) × 1 vector of zeros, and notice that

    A_2 = A_1 Φ ,    (1.16)

where

    Φ = diag[e^{jφ_1}, ..., e^{jφ_d}] .

Clearly, the eigenvalues of Φ determine the DOAs (under some identifiability conditions not discussed here). If d′ = d then the relation (1.12) implies that there exists a nonsingular matrix T such that A = E_s T. By defining

    E_{s1} = [I_{m−1}  0] E_s ,
    E_{s2} = [0  I_{m−1}] E_s ,

and making use of (1.16), the following relation is obtained:

    E_{s2} = E_{s1} Ψ ,    (1.17)

where Ψ = TΦT^{−1}. Observe that the matrices Φ and Ψ have the same eigenvalues since they are related by a similarity transformation. Hence, given an estimate of E_s, one can solve (1.17) to give an estimate of Ψ. The directions can then be estimated through the relation between (the angles of) the eigenvalues of Ψ and the DOAs given above. The performance of ESPRIT is in most cases similar to the one for MUSIC.
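A least-squares ESPRIT sketch for a ULA, following (1.15)-(1.17) (illustrative NumPy code, not from the thesis; the maximum-overlap subarray choice and the LS solution of (1.17) are one of several possible variants):

```python
import numpy as np

def esprit_doas(R_hat, d, spacing=0.5):
    """LS ESPRIT for a ULA: solve Es2 = Es1 Ψ and map eigenvalue angles to DOAs."""
    m = R_hat.shape[0]
    eigval, eigvec = np.linalg.eigh(R_hat)
    Es = eigvec[:, m - d:]                 # signal eigenvectors: d largest eigenvalues
    Es1, Es2 = Es[:-1, :], Es[1:, :]       # the two overlapping (m-1)-element subarrays
    Psi, *_ = np.linalg.lstsq(Es1, Es2, rcond=None)   # least squares solution of (1.17)
    phi = np.angle(np.linalg.eigvals(Psi))            # φ_k = 2π · spacing · sin(θ_k), cf. (1.15)
    return np.rad2deg(np.arcsin(phi / (2 * np.pi * spacing)))

# doa_est = esprit_doas(R_hat, d=2)
```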


Weighted Subspace Fitting

In [VO91], a general framework for subspace methods is presented. It is shown that many of the existing subspace-based techniques can be formulated as a subspace fitting problem. Indeed, the relation (1.12) indicates that, given an estimate of the "signal subspace", Ê_s, the parameters θ should be chosen such that R{Ê_s} is contained in R{A(θ)}. (It can be shown that this uniquely determines θ under weak conditions [WZ89].) In the presence of noise, it may not be possible to find an exact solution and a measure of closeness has to be specified. The signal subspace fitting formulation relies on a criterion which gives the best weighted least squares fit of the subspaces in (1.12). Thus, consider the criterion

    min_{θ,T} ‖Ê_s W^{1/2} − A(θ)T‖²_F ,    (1.18)

where ‖A‖²_F = Tr{AA*} denotes the square of the Frobenius matrix norm, Ê_s is an estimate of E_s, and T is an arbitrary d × d′ matrix. The weighting matrix W^{1/2} is d′ × d′ and assumed to be Hermitian and positive definite. It is possible to explicitly solve (1.18) with respect to T. The solution is given by T̂ = (A*A)^{−1}A* Ê_s W^{1/2} = A† Ê_s W^{1/2}, where the argument θ has been suppressed for notational convenience. Inserting the solution for T in (1.18) yields

    V(θ) = ‖Ê_s W^{1/2} − AA† Ê_s W^{1/2}‖²_F = ‖Π⊥_A Ê_s W^{1/2}‖²_F = Tr{Π⊥_A Ê_s W Ê_s*} ,    (1.19)

where the facts that Π⊥_A Π⊥_A = Π⊥_A and W^{1/2} W^{*/2} = W are used. The estimate of θ is obtained as the minimizing argument of V(θ),

    θ̂ = arg min_θ V(θ) .    (1.20)

As shown in [VO91], different choices of W lead to a whole class of estimators. It is also shown that there exists a specific weighting which minimizes the asymptotic parameter estimation error covariance matrix (in terms of matrix inequalities). The optimal weighting is given by

    W_WSF = (Λ_s − σ² I)² Λ_s^{−1} = Λ̃_s² Λ_s^{−1} .

With this specific choice of weighting, the algorithm is usually referred to as the weighted subspace fitting (WSF) method. Furthermore, it is shown in [VO91] that the asymptotic covariance attains the Cramér-Rao lower bound and, hence, WSF is asymptotically statistically efficient. This implies that WSF and ML have the same asymptotic performance.

The solution of (1.20) requires a nonlinear optimization in d dimensions (if there is only one parameter associated with each signal). However, due to the form of the criterion, the implementation of WSF can be made more efficient than the implementation of ML [OVSN93].

In [SS90a], the WSF criterion (1.19) is derived in a different way than described here. In that paper, the criterion is obtained by a maximum likelihood approach based on the statistics of the sample eigenvectors, and the method is referred to as MODE (method of direction estimation). For uniform linear arrays with identical omni-directional sensors, the minimization in (1.20) can be implemented in a very appealing manner [SS90b, SS90a]. The simplification is possible through a linear reparameterization of the nullspace of A*. See Section 2.4 for some more details on this.
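The WSF/MODE cost (1.19) with the optimal weighting W_WSF can be evaluated as in this sketch (illustrative NumPy code, not from the thesis; Ê_s, the signal eigenvalues and σ̂² are assumed to come from the eigendecomposition of R̂, with σ̂² estimated as the average of the m − d smallest eigenvalues):

```python
import numpy as np

def wsf_cost(A, Es_hat, lam_s, sigma2):
    """WSF/MODE cost (1.19): Tr{Π⊥_A Ês W Ês*} with W = (Λs − σ²I)² Λs^{-1}."""
    m = A.shape[0]
    A_dag = np.linalg.solve(A.conj().T @ A, A.conj().T)
    Pi_perp = np.eye(m) - A @ A_dag                    # Π⊥_A = I − A A†
    W = np.diag((lam_s - sigma2) ** 2 / lam_s)         # optimal weighting W_WSF
    return np.trace(Pi_perp @ Es_hat @ W @ Es_hat.conj().T).real

# θ̂ is the minimizer of wsf_cost(A_of(theta), Es_hat, lam_s, sigma2) over θ,
# where lam_s holds the d largest eigenvalues of R̂ and Es_hat their eigenvectors.
```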

1.1.3 Discussion

In this section, the different topics of the thesis are related to the background material presented above.

Chapter 2 concerns the application of different DOA estimators to the forward-backward (FB) sample covariance matrix of the output from a ULA. For the current discussion, it suffices to say that the FB approach uses the reflectional symmetry of the ULA to, in principle, "decorrelate" the signals (see Chapter 2 for a more detailed description). By this operation, it is expected that estimation methods that work well in scenarios with low signal correlation should improve their performance when applied to the FB preprocessed covariance matrix. To show that this indeed may be the case, consider the following example.

Example 1.2. Consider a ULA consisting of six omni-directional and identical sensors separated by half of the carrier's wavelength. The two signals impinge on the array from θ_1 = −7.5° and θ_2 = 7.5° relative to broadside. The signal covariance is

    P = [ 1             0.8e^{iπ/4}
          0.8e^{−iπ/4}  1           ] .

The noise variance is fixed to σ² = 1. The DOAs are estimated in four different ways: by applying the root-MUSIC and ESPRIT methods to either the standard sample covariance matrix, or to the forward-backward sample covariance. The acronyms FB-MUSIC and FB-ESPRIT will be used when root-MUSIC and, respectively, ESPRIT are applied to the eigenvectors of the FB sample covariance matrix. The root-mean-square (RMS) errors for the estimates obtained by the four methods are compared for different sample lengths in Figure 1.3. (Only the RMS values for θ_1 are shown; the plot corresponding to θ_2 is similar.) The sample RMS values are based on 500 independent trials. From the figure, it can be seen that the FB approach gives a significant improvement in performance for both MUSIC and ESPRIT.

[Figure 1.3: Root-mean-square errors for θ_1 (in degrees) for root-MUSIC (*), FB-MUSIC (o), ESPRIT (x) and FB-ESPRIT (+) versus the number of snapshots N. The solid line represents the CRB.]

While the performance of MUSIC and ESPRIT can be enhanced by using the FB approach, it is shown in Chapter 2 that this is not true for WSF, or equivalently, MODE. On the contrary, in general the performance of WSF/MODE degrades when applied to the FB sample covariance. Hence, the conclusion of Chapter 2 is that the FB approach should not be used with a statistically efficient method such as WSF/MODE.
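Chapter 2 describes the FB preprocessing in detail; the standard construction of the FB sample covariance for a ULA, consistent with the description above (an assumption about the exact convention used in the thesis), is sketched here:

```python
import numpy as np

def fb_covariance(R_hat):
    """Forward-backward averaged covariance: R_fb = (R̂ + J conj(R̂) J) / 2,
    where J is the exchange (row-reversal) matrix. The construction exploits
    the reflectional symmetry of the ULA and tends to 'decorrelate' the signals."""
    m = R_hat.shape[0]
    J = np.eye(m)[::-1]
    return 0.5 * (R_hat + J @ R_hat.conj() @ J)

# FB-MUSIC / FB-ESPRIT as in Example 1.2: apply the estimators to the
# eigenvectors of fb_covariance(R_hat) instead of those of R_hat.
```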

A number of idealizations and assumptions were made when deriving the array model. In practice these may be more or less invalid. For example, the array model depends on exact knowledge of the sensor positions, the gain and phase of the independent sensors, etc. Sometimes the array response is obtained by so-called array calibration. This simply consists of measuring the response of the array when it is excited by signals in known locations. Needless to say, this does not give a perfect error-free model. In Chapter 3, a method that takes small model errors into account when estimating the directions of arrival is proposed. By compensating for the model errors in the estimator, a more robust method is obtained.

It is well known that the MUSIC estimator performs well when the emitter signals are uncorrelated. Often it is even considered to be close to optimal in that case. However, it seems meaningful to ask: what is the best possible performance that can be obtained if it is a priori known that the signals are uncorrelated? In Chapter 4, the corresponding CRB is derived. As expected, this CRB is lower than the "standard" CRB, in which the signal correlation is considered to be arbitrary. This implies that if one can derive a method that optimally exploits the a priori knowledge that the emitter signals are uncorrelated, then it will perform better than, for example, MUSIC and WSF. One efficient method is maximum likelihood (ML). The ML estimate for the current problem is obtained by minimizing the criterion in (1.6) under the constraint that the signal covariance is diagonal. However, this seems to be a quite complicated minimization problem of high dimension (since the unknowns are the DOAs, the diagonal elements of P and σ²). A second alternative is to use the generalized least squares (GLS) method (also known as the method of moments or the covariance matching method) in which the sample and the model covariance are matched in a weighted least squares sense. The GLS method is known to be asymptotically statistically efficient for the case under study (see [And84] and the references therein). Using the GLS method on the problem at hand, it is possible to separate the criterion so that the direction estimates can be obtained as the result of a d-dimensional non-linear minimization. Hence, compared to ML, this leads to a less complicated optimization problem. In Chapter 4, we propose a novel method for direction estimation in cases when it is a priori known that the emitter signals are uncorrelated. The method is shown to be asymptotically statistically efficient. Furthermore, the method allows for a relatively simple implementation when the array is uniform and linear. Then, the estimates are obtained by using standard matrix operations (like an eigendecomposition) only. Hence, there is no need for an iterative search procedure.

More precise descriptions of the contributions of the thesis are given in Section 1.3.

1.2 System Identification

The second part of this thesis concerns subspace system identification. System identification deals with the topic of obtaining mathematical models of dynamic systems from measured data. The applications of system identification techniques are widespread in almost all engineering disciplines. The theory is firmly established and covered in several books including [Boh91, Eyk74, GP77, HD88, Joh93, Lju87, SS89].

1.2.1 Background

Consider the situation in Figure 1.4. In the figure, u(t) and y(t) are observed input and output signals, respectively. The signals w(t) and v(t) represent unknown inputs to the system. All signals may very well be vector valued. Assume that the system is linear and time-invariant and, hence, that the observed output can be described as

    y(t) = G(q)u(t) + H(q)e(t) .    (1.21)

[Figure 1.4: Linear finite dimensional time-invariant dynamic system. The measured input u(t) drives the system; the unknown inputs w(t) and v(t) enter at the system and at the output summation, producing y(t).]


In (1.21), the transfer operators, or transfer functions, H(q) and G(q) are functions of the forward shift operator q (qu(t) = u(t+1)). Moreover, H(q) is monic, which means that its first impulse response coefficient is the identity, I. The part H(q)e(t), where {e(t)} is a white noise process, encompasses the effect of both w(t) and v(t) in Figure 1.4.

System identification aims at estimating the transfer functions H(q) and G(q) from measurements of u(t) and y(t) at the time instances t = 1, 2, ..., N. A very general approach which has proved successful is the prediction error method (PEM). The idea of PEM is to minimize the difference between the measured output y(t) and the prediction of y(t) based on past data. The best linear one-step ahead predictor is given by [Lju87]

ŷ(t,θ) = H⁻¹(q,θ)G(q,θ)u(t) + [I − H⁻¹(q,θ)]y(t),

where G(q,θ) and H(q,θ) are parameterized models of G(q) and H(q), respectively. The PEM estimate of θ is defined by

θ̂ = arg min_θ V(θ),

where a typical choice of the criterion is

V(θ) = (1/N) Σ_{t=1}^{N} [y(t) − ŷ(t,θ)]^T W [y(t) − ŷ(t,θ)]

for a positive definite weighting matrix W.
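To make the criterion concrete, the following minimal numerical sketch (not part of the original text; the model structure, data, and all names are our own illustrative assumptions) fits a scalar first-order ARX model by PEM with W = 1:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical example: PEM for the scalar ARX model
#   y(t) + a*y(t-1) = b*u(t-1) + e(t),  theta = (a, b),
# whose one-step predictor is yhat(t, theta) = -a*y(t-1) + b*u(t-1).
rng = np.random.default_rng(0)
N, a0, b0 = 500, -0.7, 1.5
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a0 * y[t - 1] + b0 * u[t - 1] + 0.1 * rng.standard_normal()

def V(theta):
    a, b = theta
    eps = y[1:] - (-a * y[:-1] + b * u[:-1])   # prediction errors
    return np.mean(eps**2)                     # scalar criterion, W = 1

theta_hat = minimize(V, x0=np.zeros(2)).x      # iterative search over theta
print(theta_hat)                               # close to (a0, b0)
```

Even in this simple case the estimate is found by an iterative numerical search, which illustrates the role of a good initial estimate discussed next.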

It should be remarked that the use of PEM is not limited to the case where there exists a "true system" represented by (1.21). The system can be of higher dimension than the model and it can be non-linear. The identified model will then be an approximation of the system with certain properties [Lju87, SS89].

In general, the minimization of the prediction errors requires a non-linear iterative search. The success of such procedures depends heavily on whether a sufficiently good initial estimate exists. It is also important to have a numerically reliable (canonical) parameterization of G(q,θ) and H(q,θ). This is an issue that is most difficult for multi-variable systems (see, for example, [VL82, Lju87, GW84, GW74]).

The subspace identification methods have in many cases been shown to give accurate models. Those models can either be used as they are, or they may serve as an initial estimate for PEM to give a refined model. There exist alternative methods to provide initial model estimates, such as instrumental variable and two-stage least-squares methods [Lju87, SS89]. One reason for the interest in the subspace methods is computational: many subspace methods can be implemented very efficiently by making use of numerical linear algebra tools such as the QR-factorization and the singular value decomposition [CXK94, CK95, VD91, VD96].

1.2.2 Subspace Methods

Subspace methods for system identification have their origin in realization theory (see, for example, [HK66, Aka74a, Aka74b, Fau76]). The deterministic realization algorithm of Ho and Kalman [HK66] was turned into an identification algorithm in [ZM74, Kun78], where the singular value decomposition (SVD) was used as a tool to suppress the effects of noise. Later extensions include [Bay92, KDS88, LS91, Lju91, JP85]. These approaches are based on impulse response data. Often it is difficult to get accurate estimates of the impulse response coefficients from data. The subspace methods considered here directly utilize the measured time domain data. Some basic approaches include [Gop69, De 88, DVVM88, MDVV89, Ver91]. More general methods are presented in [Lar83, VD94b, Ver94, PSD96]. These methods consider combined deterministic-stochastic systems where the output is due to both observed and unobserved inputs. There are also subspace methods for the pure time series case [Aka75, Aok87, AK90, VD93, Lar83, DPS95] and for the errors-in-variables problem [SCE95, CS96]. However, the main interest in what follows is in the combined methods. We refer to the survey papers [Vib95, RA92, VDS93] and the book [VD96] for a more comprehensive presentation of (the history of) the subspace identification methods.

In the following, a brief introduction to the basic ideas that underlie the subspace identification methods is given. A linear time-invariant discrete-time system of order n can be described by the state-space equations

x(t+1) = Ax(t) + Bu(t) + w(t),  (1.22a)
y(t) = Cx(t) + Du(t) + v(t).  (1.22b)

Here, u(t) ∈ R^m and y(t) ∈ R^l represent the observed input and output data, respectively. The state-vector x(t) ∈ R^n is an internal variable. The state equation is perturbed by the process noise w(t) ∈ R^n, and the output equation contains the measurement noise v(t) ∈ R^l. Both w(t) and v(t) are white noise processes. We assume that the description is minimal; that is, there is no other state-space description with lower state-dimension than n. Observe that, if the state-sequence {x(t)} were known, then it would be straightforward to estimate the state-space matrices {A, B, C, D} in (1.22) by linear regression. The question is how to estimate the state from the measured data {u(t), y(t)}.

Before presenting the data model that is used for state estimation, we need to define the following block matrices:

Γα = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{α−1} \end{bmatrix},
Φα = \begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ CA^{α−2}B & CA^{α−3}B & \cdots & D \end{bmatrix},
Ψα = \begin{bmatrix} 0 & \cdots & 0 & 0 \\ C & \ddots & & 0 \\ \vdots & \ddots & \ddots & \vdots \\ CA^{α−2} & \cdots & C & 0 \end{bmatrix}.

Notice that Γα is the extended observability matrix if α > n, and it has full column rank since the system is minimal. Introduce the notation

yα(t) = [y^T(t)  y^T(t+1)  ...  y^T(t+α−1)]^T,  (1.23)

which is of dimension αl. Similarly, define the vectors uα(t), wα(t), and vα(t) made from inputs, process noises, and measurement noises, respectively.

We are now ready to give the basic data model that is used in the subspace identification methods. From the state-space equations (1.22), it is easily seen that

yα(t) = Γα x(t) + Φα uα(t) + nα(t),  (1.24)

where

nα(t) = Ψα wα(t) + vα(t).
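As a concrete illustration of these definitions, the following sketch (our own; all helper names are hypothetical) builds Γα, Φα, and Ψα from a given state-space model and verifies the stacked data model (1.24) numerically:

```python
import numpy as np

def stacked_matrices(A, B, C, D, alpha):
    """Build Gamma_alpha, Phi_alpha, Psi_alpha of (1.24) by direct stacking."""
    n, m, l = A.shape[0], B.shape[1], C.shape[0]
    Gamma = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(alpha)])
    Phi = np.zeros((alpha * l, alpha * m))
    Psi = np.zeros((alpha * l, alpha * n))
    for i in range(alpha):                 # block row i
        Phi[i*l:(i+1)*l, i*m:(i+1)*m] = D
        for j in range(i):                 # block columns j < i
            Apow = np.linalg.matrix_power(A, i - j - 1)
            Phi[i*l:(i+1)*l, j*m:(j+1)*m] = C @ Apow @ B
            Psi[i*l:(i+1)*l, j*n:(j+1)*n] = C @ Apow
    return Gamma, Phi, Psi

# Numerical check of (1.24) on a random system (v(t) = 0 for brevity)
rng = np.random.default_rng(1)
n, m, l, alpha = 3, 1, 2, 4
A = 0.5 * rng.standard_normal((n, n)); B = rng.standard_normal((n, m))
C = rng.standard_normal((l, n));       D = rng.standard_normal((l, m))
Gamma, Phi, Psi = stacked_matrices(A, B, C, D, alpha)
x0 = rng.standard_normal(n)
u = rng.standard_normal((alpha, m)); w = rng.standard_normal((alpha, n))
y, x = [], x0.copy()
for k in range(alpha):
    y.append(C @ x + D @ u[k])
    x = A @ x + B @ u[k] + w[k]
print(np.allclose(np.concatenate(y),
                  Gamma @ x0 + Phi @ u.ravel() + Psi @ w.ravel()))  # True
```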


Observe that the vectors yα(t), uα(t), and nα(t) contain data from time t and onwards. The state x(t) contains the information about the past necessary to predict the future behavior of the system. In fact,

ŷα(t) = Γα x(t)

is the optimal (in the mean-square sense) prediction of yα(t) given data up to time t. Assume for the moment that ŷα(t) is known for t = 1, 2, ..., N. Define the matrices

Ŷα = [ŷα(1)  ŷα(2)  ...  ŷα(N)],  (1.25)
X = [x(1)  x(2)  ...  x(N)].  (1.26)

Clearly,

Ŷα = Γα X  (1.27)

and it is of dimension αl × N. If N > n and αl > n, then⁴

rank{Ŷα} = n  (1.28)

and, hence, Ŷα is low-rank. The factorization in (1.27) is not unique since (Γα T)(T⁻¹ X) = Γα X for any non-singular matrix T. This non-uniqueness corresponds to a change of state-space basis (x(t) → T⁻¹x(t)) and changes only the internal structure of the system description. The input-output relation does not change. This implies that if Ŷα is known, then Γα and X (in some state-space basis) can be obtained by a low-rank factorization of Ŷα.

It remains to show how to estimate Ŷα, and how to factorize that estimate. Recall the relation (1.24) and observe that the state-vector x(t) contains the past information. In other words, x(t) can be written as a linear combination of the past data {u(t−k), y(t−k)} for k = 1, 2, ..., ∞. From (1.24), we can write

yα(t) = Γα x(t) + Φα uα(t) + nα(t)
      ≈ Γα (K1 yβ(t−β) + K2 uβ(t−β)) + Φα uα(t) + nα(t)
      ≜ L1 yβ(t−β) + L2 uβ(t−β) + Φα uα(t) + nα(t),  (1.29)

⁴ Here, we assume that X has full row-rank (= n), which is generically true for large N if the inputs are sufficiently exciting. This is actually at the basis of the realization theory presented in [Aka74a, Aka74b, HK66].


where the notation uβ(t−β) and yβ(t−β) is analogous to (1.23):

uβ(t−β) = [u^T(t−β)  u^T(t−β+1)  ...  u^T(t−1)]^T,
yβ(t−β) = [y^T(t−β)  y^T(t−β+1)  ...  y^T(t−1)]^T.

In (1.29), L1 = Γα K1 and L2 = Γα K2 are two unknown matrices. The approximation in (1.29) is due to the fact that the amount of past input-output data is truncated: only β past samples are used to estimate x(t).⁵ Since Γα is unknown, it is actually Γα x(t) that is estimated from the past data in (1.29). The idea is to use (1.29) to estimate ŷα(t) as

ŷα(t) ≈ L1 yβ(t−β) + L2 uβ(t−β).

If Φα were known, then L1 and L2 could be estimated by linear regression. However, since Φα is unknown, it is estimated as well. Let L̂1, L̂2 and Φ̂α denote the estimates of the respective quantities obtained by linear regression in (1.29). An estimate of the prediction matrix Ŷα in (1.25) is then

Ŷα = L̂1 [yβ(1) ... yβ(N)] + L̂2 [uβ(1) ... uβ(N)].

For finite data (finite N), Ŷα typically has full rank. Since the rank of Ŷα ideally equals n, it is sensible to somehow reduce the rank of Ŷα. The singular value decomposition (SVD) is a powerful tool in numerical linear algebra. One of its strong properties is that it provides the best (in spectral or Frobenius norm) low-rank approximation of a full rank matrix [GV96]. Let

Ŷα = Q̂ Ŝ V̂^T = [Q̂s  Q̂n] \begin{bmatrix} Ŝs & 0 \\ 0 & Ŝn \end{bmatrix} \begin{bmatrix} V̂s^T \\ V̂n^T \end{bmatrix}  (1.30)

be the SVD of Ŷα. In (1.30), Q̂ and V̂ are orthonormal matrices and Ŝ is a diagonal matrix having the singular values arranged in non-increasing order along the diagonal. The partitioning in (1.30) is made so that Ŝs is n × n and Ŝn contains the remaining (small) singular values.

⁵ For simplicity, we have taken the same number, β, of past inputs and outputs. Of course, it is possible to take different numbers for the input and the output, respectively.


Notice that Ŝn = 0 in the ideal case (N → ∞). Retaining the part of the SVD corresponding to the n largest singular values gives

Ŷα ≈ Q̂s Ŝs V̂s^T,  (1.31)

which is the desired rank-n approximation of Ŷα. In view of (1.27) and (1.31), we obtain the following estimates of the extended observability matrix and the state-sequence:

Γ̂α = Q̂s Ŝs^{1/2} T,
X̂ = T⁻¹ Ŝs^{1/2} V̂s^T.

Here, Ŝs^{1/2} denotes the square root of Ŝs, and T is an arbitrary non-singular n × n matrix. Recall that T merely determines the state-space basis of the realization and does not affect the input-output relation. We may, for example, choose T = I. Thus, by the manipulations above, we have an estimate of X which can be used to estimate the system matrices by linear regression in the state-space equations (1.22), as mentioned before. Furthermore, the sample covariance of the residuals in the regression gives an estimate of the covariance of the noise process [w^T(t)  v^T(t)]^T.
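The following compact sketch (ours, with simplified choices throughout: scalar input and output, T = I, pseudo-inverse-based regression) strings the steps above together; it illustrates the idea rather than any particular published algorithm:

```python
import numpy as np

def subspace_id(u, y, n, alpha=5, beta=5):
    """Illustrative subspace step: regress future outputs on past data and
    future inputs, form an estimate of Yhat_alpha, and factor it by a
    rank-n SVD, cf. (1.29)-(1.31)."""
    N = len(y) - alpha - beta
    Yf = np.vstack([y[beta + i : beta + i + N] for i in range(alpha)])
    Uf = np.vstack([u[beta + i : beta + i + N] for i in range(alpha)])
    Zp = np.vstack([np.vstack([y[i : i + N] for i in range(beta)]),
                    np.vstack([u[i : i + N] for i in range(beta)])])
    # Linear regression in (1.29): Yf ≈ [L1 L2] Zp + Phi_alpha Uf
    coef = Yf @ np.linalg.pinv(np.vstack([Zp, Uf]))
    Yhat = coef[:, : 2 * beta] @ Zp           # estimate of Yhat_alpha
    Q, s, Vt = np.linalg.svd(Yhat)            # cf. (1.30)
    Gamma = Q[:, :n] * np.sqrt(s[:n])         # extended observability, T = I
    X = np.sqrt(s[:n])[:, None] * Vt[:n]      # state-sequence estimate
    return Gamma, X

u = np.random.default_rng(2).standard_normal(400)
y = np.convolve(u, [0.0, 0.5, 0.25])[:400]    # a simple noise-free "system"
Gamma_hat, X_hat = subspace_id(u, y, n=2)
```

The columns of X_hat can then be used for a final linear regression in (1.22) to estimate {A, B, C, D}, as described above.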

The procedure presented above summarizes the general ideas that, more or less explicitly, are used in the subspace system identification methods. Clearly, the crucial step is how to estimate the states. The approximations made above are the truncation of past data (β) and the estimation of Ŷα by linear regression. It can be shown [PSD96] that the state estimate derived above is consistent if β → ∞. The existing subspace methods contain several modifications of the above procedure. One of the reasons why these modifications are made is that it is then possible to get consistent estimates of the system matrices for a finite β.

Conceptually, the idea of first estimating the state and then the system matrices by linear regression is very appealing. However, as mentioned above, it is complicated to obtain a consistent estimate of the state. A common approach to estimate A and C is instead to use the shift-invariance structure in Γα, as in the classical realization algorithm of Ho and Kalman [HK66] (see Section 5.5). Recall that an estimate of Γα is obtained from the factorization of Ŷα in (1.30). An advantage of methods based on Γ̂α is that it is much easier to obtain a consistent estimate of Γα than it is to estimate the states (see Chapter 6). Given estimates of A and C, the B and D matrices can be estimated by linear regression, since the transfer function from u(t) to y(t) is linear in B and D. (There are also other possibilities [VD96, Ver94].)
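A sketch of the shift-invariance step (ours; l denotes the number of outputs and the helper name is hypothetical): the top and bottom parts of Γα are related by A, so A follows from a least-squares fit and C is the first block row.

```python
import numpy as np

def ac_from_gamma(Gamma, l=1):
    """Shift invariance: Gamma[l:] = Gamma[:-l] @ A, so A is obtained by
    least squares; C is the first block row of Gamma."""
    A = np.linalg.pinv(Gamma[:-l]) @ Gamma[l:]
    C = Gamma[:l]
    return A, C
```

The eigenvalues of the so-obtained A are the pole estimates whose asymptotic variance is analyzed in Chapter 5.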

Another approach is presented by Van Overschee and De Moor [VD94b]. In [VD94b], it is shown that the state estimate obtained for fixed β can be interpreted as the output of a bank of non-steady state Kalman filters. This observation is used to rewrite the state-space equations into a slightly modified linear regression from which the system matrices {A, B, C, D} can be estimated by least-squares.

1.2.3 Discussion

Many of the subspace system identification methods have been derived in an algebraic framework involving orthogonal and oblique projections of the data. In the system identification literature, such operations have traditionally been described in terms of cross-correlations instead. The subspace methods also make extensive use of tools from numerical linear algebra, which is beneficial at the implementation level, but which complicates the understanding of the underlying ideas. Among other things, this has somewhat obscured the relation between classical methods (like PEM and instrumental variable methods) and the 4SID methods. It is still a topic of research to bring these two approaches closer. One of the aims of Chapter 5 is to point to similarities and differences by formulating the subspace estimation problem in a linear regression framework. (By subspace estimation we mean the estimation of the extended observability matrix Γα.) This formulation is similar to the one in the previous section and gives some further insights into the relation between the different subspace estimates that have been proposed in the literature. The linear regression formulation unifies the subspace estimation step of most existing methods. Weighting matrices that appear rather ad hoc in many previous derivations fall naturally into the formulation of the subspace estimation in terms of a linear regression cost function minimization. As seen above, there are a number of truncation parameters that are to be chosen by the user (α and β). Some effects of different choices of these parameters are studied in Chapter 5 by means of asymptotic variance expressions for the pole estimates. It can be shown that most of the common choices of subspace estimates lead to the same asymptotic variance of the poles when the observed input is white noise. For colored inputs, there are differences though. This is illustrated in a numerical example.

The subspace identification methods rely on the fact that a certain cross-correlation matrix (Ŷα) is low-rank. The rank should be equal to the system order to guarantee identifiability. Given the "true" system and the signal properties, it is of course possible to check this condition. If only the system order n is known, it is also possible to test whether the data quantity is "sufficiently" close to a rank-n matrix by, for example, a hypothesis test. If the rank is too small, then this indicates that the data are not informative enough, and the experiment should be re-designed. In Chapter 6, excitation conditions on the data that ensure consistency are discussed. For more or less restrictive assumptions on the system, explicit persistence of excitation conditions on the observed input signal are given. An interesting observation made in Chapter 6 is the fact that a persistence of excitation condition on the input does not in general guarantee consistency. Indeed, there exist examples for which the aforementioned rank condition does not hold. This means that, in such cases, the subspace methods under consideration do not give a consistent estimate of the true system. One should, however, remember that such counterexamples are rather "pathological", and generically the rank condition holds under the persistence of excitation conditions given in the thesis. This result is perhaps most relevant from a performance point of view: it is expected that the accuracy of the estimated model is poor if the identification conditions are close to a "counterexample".

1.3 Outline and Contributions

This section gives a chapter by chapter description of the contributions made in the thesis. It also serves as a brief overview of what will follow.

Chapter 2

The chapter studies the application of the MODE (method of direction estimation) or weighted subspace fitting (WSF) principle to the principal (signal) eigenvectors of the forward-backward (FB) sample covariance matrix. The MODE principle consists of matching the signal eigenvectors to the array model in a statistically sound way. Somewhat surprisingly, it turns out that the best one can do is to apply the standard MODE estimator to the eigenelements of the FB-covariance. It is shown that the accuracy of the so-obtained estimates is in general inferior to that of the standard MODE estimate. The FB approach has been shown to be beneficial for other (suboptimal) array processing algorithms such as MUSIC and ESPRIT. However, in view of the abovementioned result, forward-backward averaging is not recommended for use in conjunction with MODE or WSF.

An article version of this chapter has been accepted for publication as

Petre Stoica and Magnus Jansson, "On Forward-Backward MODE for Array Signal Processing", Digital Signal Processing – A Review Journal, Aug. 1997.

This work is also related to the material presented in

Magnus Jansson and Björn Ottersten, "Covariance Preprocessing in Weighted Subspace Fitting", In Proc. IEEE/IEE Workshop on Signal Processing Methods in Multipath Environments, pp. 23-32, Glasgow, Scotland, Apr. 1995.

That paper analyzes the performance of MODE and WSF when the sample covariance matrix is linearly preprocessed before the estimator is applied. The forward-backward preprocessing is a special case for which that analysis holds.

Chapter 3

In many cases it is not the errors due to noise that are the limiting factor when high resolution direction estimators are applied in real situations. The performance limitations may rather be due to errors in the array model. The problem of obtaining a robust direction of arrival estimator is considered in this chapter. The robustness refers to insensitivity to errors in the array model. A signal subspace fitting approach is taken, and a weighting that accounts for both the finite sample effects of the noise and the model errors is derived. It is shown that the weighting can be written as the sum of two terms. The first is nothing but the standard WSF weighting that is optimal in the case with no model errors. The second term compensates for the first order effects of the errors in the array model. The asymptotic performance is shown to coincide with the CRB for the problem under study. There exist other methods (MAPprox and MAP-NSF) that have the same asymptotic optimality properties, but the proposed method (GWSF) has some important advantages. Compared to MAP-NSF, GWSF has a much better small sample performance when the sources are closely spaced and/or when they are highly correlated. The GWSF formulation also allows for a relatively simple and efficient implementation compared to both MAPprox and MAP-NSF. The chapter also contains a performance analysis of MAPprox which establishes the asymptotic optimality of that method, previously observed only in simulations.

The results presented in Chapter 3 have been submitted for possible publication in the following form:

Magnus Jansson, A. Lee Swindlehurst and Björn Ottersten, "Weighted Subspace Fitting for General Array Error Models", Submitted to IEEE Trans. on Signal Processing, Jul. 1997.

Chapter 4

In many applications it is reasonable to assume that the signals that impinge on the array are independent of one another. In general, array processing methods do not presuppose any correlation among the signals. In case it is a priori known that the signals are uncorrelated, it seems judicious to use this information to improve the performance of the estimator. A method which makes use of this a priori information in an optimal way is presented in Chapter 4; that is, it provides minimum variance direction estimates. Natural alternative approaches are to use the maximum likelihood or the covariance matching method to impose the additional structure in the estimation problem. However, the proposed method appears to have computational advantages compared to the two aforementioned methods.

A version of Chapter 4 has been submitted as

Magnus Jansson, Bo Göransson and Björn Ottersten, "A Subspace Method for Direction of Arrival Estimation of Uncorrelated Emitter Signals", Submitted to IEEE Trans. on Signal Processing, Aug. 1997.

Some of the results given in Chapter 4 have also appeared in

Bo Göransson, Magnus Jansson and Björn Ottersten, "Spatial and Temporal Frequency Estimation of Uncorrelated Signals Using Subspace Fitting", In Proceedings of the 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, pp. 94-96, Corfu, Greece, June 1996.

Magnus Jansson, Bo Göransson and Björn Ottersten, "Analysis of a Subspace-Based Spatial Frequency Estimator", In Proceedings of ICASSP'97, pp. 4001-4004, Munich, Germany, Apr. 1997.

Chapter 5

The chapter contains the linear regression interpretation of the subspace system identification methods. The linear regression formulation is used to interpret existing estimates of the observability matrix as the minimizing arguments of different cost functions. It is shown that the accuracy of the estimates strongly depends on the choice of a user specified parameter. An important observation is that the choice of this parameter is system dependent, which makes the task non-trivial. It is also shown that, asymptotically in the number of samples, the pole estimates are invariant to a specific weighting matrix employed in the identification schemes. Numerical examples are given to further illustrate the effect of different choices of weighting matrices.

With minor changes, this chapter contains the article

Magnus Jansson and Bo Wahlberg, "A Linear Regression Approach to State-Space Subspace System Identification", Signal Processing, Special Issue on Subspace Methods, Part II: System Identification, Vol. 52, No. 2, pp. 103-129, July 1996.

Parts of this material have also appeared as

Bo Wahlberg and Magnus Jansson, "4SID Linear Regression", In Proc. IEEE 33rd Conf. on Decision and Control, pp. 2858-2863, Orlando, USA, Dec. 1994.

Magnus Jansson and Bo Wahlberg, "On Weighting in State-Space Subspace System Identification", In Proc. European Control Conference, ECC'95, pp. 435-440, Rome, Italy, Sep. 1995.

Magnus Jansson, "On Performance Analysis of Subspace Methods in System Identification and Sensor Array Processing", Licentiate Thesis (TRITA-REG-9503, ISSN 0347-1071), Automatic Control, Dept. of Signals, Sensors and Systems, KTH, Stockholm, June 1995.


Chapter 6

The consistency of the subspace estimate (or, equivalently, of the estimate of the extended observability matrix) is studied. Persistence of excitation conditions on the input signal are given which prove consistency for systems where the output is perturbed by white measurement noise. If the system includes process noise, then it is shown that the subspace methods under consideration may fail to give a consistent estimate even though the input is persistently exciting to any order. An explicit example is given in which such a failure occurs. It is also shown that this problem is eliminated if the input is a white noise process (or a low-order ARMA process).

A version of Chapter 6 has been submitted as

Magnus Jansson and Bo Wahlberg, "On Consistency of Subspace Methods for System Identification", Submitted to Automatica, Jun. 1997.

Parts of the material in Chapter 6 have been presented at the following conferences:

Magnus Jansson and Bo Wahlberg, "On Consistency of Subspace System Identification Methods", In Preprints of the 13th IFAC World Congress, pp. 181-186, vol. I, San Francisco, California, USA, Jul. 1996.

Magnus Jansson and Bo Wahlberg, "Counterexample to General Consistency of Subspace System Identification Methods", In Proc. 11th IFAC Symposium on System Identification, pp. 1677-1682, Fukuoka, Japan, Jul. 1997.

1.4 Topics for Future Research

Forward-backward (FB) averaging is a topic that has received considerable attention in the array processing literature. Despite this, it is not really known when FB averaging should or should not be used. For example, should FB averaging always be used when employing the MUSIC or ESPRIT estimators? It has been shown that, in the case of uncorrelated emitter signals, the asymptotic estimation error variance is not improved, but the resolution threshold is decreased [KP89]. There are also some results for the case of correlated emitter signals, but they seem to be less explicit [RH93, PK89]. If the true covariance matrix is centro-Hermitian, then it is relatively straightforward to show that the FB sample covariance in (2.16) is the corresponding (structured) maximum likelihood (ML) estimate [Qua84]. Recall that the standard sample covariance is the unstructured ML estimate. It is then reasonable to expect that the FB sample estimate is a better estimate than the standard sample covariance, with lower estimation error variance. This does not directly follow from the ML properties, but the author has proved it explicitly and a report is in progress. In view of this, the result of Chapter 2 becomes even more surprising: although the FB covariance estimate is better, this does not necessarily lead to better direction estimates, as proved in Chapter 2 (see also [KP89]).

Similar to the development in [VS94b], one can extend the derivation of GWSF in Chapter 3 to also include particular errors in the noise model. In [VS94b], a statistical model that allows for the compensation of first order errors in the noise covariance is given. Using this model, an additional term in the weighting matrix for GWSF would result. It would also be interesting to evaluate GWSF on real array data.

The method (DEUCE) for direction estimation of uncorrelated emitter signals that is derived in Chapter 4 is admittedly quite cumbersome to implement. It would be interesting to investigate whether there exists a simpler estimator preserving the performance of DEUCE. Some preliminary results indicate that this really is conceivable. Indeed, consider the case of uncorrelated emitter signals impinging on a uniform linear array. The covariance of the array output vector is then a Toeplitz matrix. This also implies that the matrix employed in DEUCE (see (4.3)) ideally is Toeplitz. The idea is to first perform a simple averaging along the diagonals of the sample estimate of the matrix in (4.3) to obtain a sufficient statistic of lower dimension. Compared to DEUCE, this will lead to an estimator which is computationally less complicated. The results of Chapter 4 may also find applications in "structured covariance estimation": since DEUCE provides asymptotically maximum likelihood estimates of the parameters that parameterize the Toeplitz covariance matrix, a structured covariance estimate can be constructed. More detailed results in this direction are expected in the near future. The ideas in Chapter 4 may also be useful in other similar estimation problems such as, for example, blind channel identification.
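To fix the idea of diagonal averaging, here is a small sketch (our own illustrative helper, not the DEUCE algorithm itself) that projects a sample matrix onto the set of Toeplitz matrices by averaging along each diagonal:

```python
import numpy as np

def toeplitz_average(R):
    """Replace each diagonal of R by its mean (the least-squares
    Frobenius-norm projection onto the Toeplitz matrices)."""
    m = R.shape[0]
    T = np.zeros_like(R)
    for k in range(-(m - 1), m):
        d = np.diagonal(R, offset=k).mean()
        i = np.arange(m - abs(k))
        if k >= 0:
            T[i, i + k] = d
        else:
            T[i - k, i] = d
    return T
```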

The analysis of the state-space subspace system identification (4SID) methods in the thesis concerns the consistency of the methods and the asymptotic variance of the pole estimates. So far we have not been able to extend the analysis to obtain asymptotic variance expressions for the estimated transfer function. We have only derived the asymptotic variance for the estimates of the A and C matrices obtained by the shift invariance method. Given estimates of A and C, the B and D matrices are often estimated by linear regression in the input-output relation. The estimates of A and C appear in the regression equations for estimating B and D, which makes it very complicated to compute the asymptotic variance of the estimates. Deriving asymptotic variance expressions for the estimated transfer function is an important topic for future research. Such variance expressions can be used to show the relation between the subspace methods and the prediction error methods. They can also be used to assess the quality of the identified model. There exist several alternatives for estimating the system matrices; a topic for research is to evaluate their relative performance. Another question that can be addressed is how the future inputs really should be treated in the subspace estimation step. In the elaborations in this chapter and in Chapter 5, the matrix Φα is estimated by linear regression without imposing any structure on the estimate. Recall that Φα is a lower triangular (block) Toeplitz matrix. The simulation results in [PSD96] suggest that the performance can be improved by using a structured estimate of Φα. As pointed out in the thesis, the subspace methods for system identification are quite sensitive to the input excitation. It would be interesting to find ways to decrease this sensitivity. Using more of the structure in the problem is likely the best approach to achieve this goal.


Part I

Sensor Array Signal Processing


Chapter 2

On Forward-Backward MODE for Array Signal Processing

In this chapter we apply the MODE (method of direction estimation) principle to the forward-backward (FB) covariance of the output vector of a sensor array to obtain what we call the FB-MODE procedure. The derivation of FB-MODE is an interesting exercise in matrix analysis, the outcome of which was somewhat unexpected: FB-MODE simply consists of applying the standard MODE approach to the eigenelements of the FB sample covariance matrix. By using an asymptotic expansion technique we also establish the surprising result that FB-MODE is outperformed, from a statistical standpoint, by the standard MODE applied to the forward-only sample covariance (F-MODE). We believe this to be an important result: it shows that the FB approach, which has proved quite useful for improving the performance of many suboptimal array processing methods, should not be used with a statistically optimal method such as F-MODE.

2.1 Introduction

MODE is a method of direction estimation by means of a sensor array, which is known to be statistically efficient in cases when either the number of data samples (N) or the signal-to-noise ratio (SNR) is sufficiently large [VO91, SS90a, SN90]. For uniform and linear arrays (ULAs), the implementation of MODE requires only standard matrix operations (such as an eigendecomposition) and is thus very straightforward. In fact, in the ULA case, MODE is a clear candidate for the best possible (i.e., computationally simple and statistically efficient) array processing method [KV96].

We stress that the aforementioned statistical efficiency of MODE holds asymptotically (in N or SNR). In short-sample or low-SNR cases, other methods may provide better performance. The forward-backward (FB) approach is a well-known methodology that has been successfully used to enhance the performance of a number of array signal processing algorithms [RH93, KP89, Pil89]. Recently this approach has been used in conjunction with MODE, probably in an attempt to obtain enhanced (finite sample or SNR) performance [Haa97b, Haa97a]. More exactly, in the cited publications the standard MODE approach was applied to the eigenelements of the FB sample covariance matrix.

The first problem we deal with in this chapter concerns the formal application of the MODE principle to the FB sample covariance (see also [JO95]). After an exercise in matrix analysis we prove the somewhat unexpected result that FB-MODE should indeed consist of applying the standard MODE to the eigenelements of the FB sample covariance matrix. Then we go on to establish the asymptotic statistical performance of FB-MODE. Because the standard F-MODE is asymptotically statistically efficient (in the sense that it achieves the Cramér-Rao bound (CRB)) [VO91, SS90a, SN90], we cannot expect FB-MODE to perform better in asymptotic regimes. What one might expect is that FB-MODE outperforms F-MODE in non-asymptotic regimes and that the two methods have the same asymptotic performance. We show that the conjecture of identical asymptotic performances, even though quite natural, is false: FB-MODE's asymptotic performance is usually inferior to that of F-MODE. Regarding the comparison of the two methods in non-asymptotic regimes, because the finite-sample/SNR analysis is intractable, we resort to Monte-Carlo simulations to show that F-MODE typically outperforms FB-MODE also in such cases.

In conclusion, the FB approach – which has been successfully employed to enhance the performance of several array signal processing algorithms – cannot be recommended for use with MODE. This result also shows that the related subspace fitting and asymptotic maximum likelihood approaches of [VO91] and [SS90a], which produced the statistically efficient MODE in the forward-only case, may fail to yield statistically optimal estimators in other cases.

2.2 Data Model

Let x(t) ∈ C^{m×1} denote the output vector of a ULA that comprises m elements. The variable t denotes the sampling point; hence, t = 1, 2, ..., N, where N is the number of available samples. Under the assumption that the signals impinging on the array are narrowband with the same center frequency, the array output can be described by the equation (see, e.g., [SM97])

x(t) = As(t) + n(t).  (2.1)

Here, s(t) ∈ C^{d×1} is the vector of the d signals (translated to baseband) that impinge on the array, n(t) ∈ C^{m×1} is a noise term, and A is the Vandermonde matrix

A = \begin{bmatrix} 1 & \cdots & 1 \\ e^{iω_1} & \cdots & e^{iω_d} \\ \vdots & & \vdots \\ e^{i(m−1)ω_1} & \cdots & e^{i(m−1)ω_d} \end{bmatrix},

where {ωk}_{k=1}^{d} are the so-called spatial frequencies (ωk = φk defined in (1.15)). We assume that

ωk ≠ ωj for k ≠ j

and that m > d, which implies that

rank(A) = d.

The following assumption, frequently used in the array processing literature, is also made here: the signal vector and the noise are temporally white, zero-mean, circular Gaussian random variables that are independent of one another; additionally, the noise is spatially white and has the same power in all sensors. Mathematically, this assumption implies that

E{s(t)s*(s)} = P0 δ_{t,s},  (2.2)
E{s(t)s^T(s)} = 0,  (2.3)
E{n(t)n*(s)} = σ² I δ_{t,s},  (2.4)
E{n(t)n^T(s)} = 0,  (2.5)

where E is the statistical expectation operator, P0 denotes the signal covariance matrix, σ² is the noise power in each element of the array, δ_{t,s} is the Kronecker delta, and the superscripts * and T denote the conjugate transpose and the transpose, respectively. We also assume that

rank(P0) = d,

which is only done to simplify the notation (indeed, most of the results presented in what follows carry over to the case when rank(P0) < d).

A principal goal of array processing consists of estimating {ωk}_{k=1}^{d} from {x(t)}_{t=1}^{N}, without assuming knowledge of the incoming signals {s(t)} or (of course) the noise {n(t)}. This is precisely the problem we will consider in this chapter. Once {ωk} are estimated, the directions (also called angles-of-arrival) of the d signals can be easily obtained [VO91, SS90a, SN90, RH93, KP89, Pil89, SM97].
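For reference, a small simulation sketch (ours; the numbers are arbitrary) that generates array snapshots according to (2.1) with the statistics (2.2)-(2.5):

```python
import numpy as np

rng = np.random.default_rng(3)
m, d, N, sigma = 8, 2, 200, 0.1
omega = np.array([0.5, 1.1])                     # spatial frequencies {omega_k}
A = np.exp(1j * np.outer(np.arange(m), omega))   # Vandermonde matrix A of (2.1)
# Circular Gaussian signals and spatially white noise, cf. (2.2)-(2.5)
S = (rng.standard_normal((d, N)) + 1j * rng.standard_normal((d, N))) / np.sqrt(2)
E = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
X = A @ S + sigma * E                            # snapshots x(1), ..., x(N)
```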

2.3 Preliminary Results

In this section, we present most of the mathematical results that are used in the next sections to derive and analyze FB-MODE. Some of these results are known, yet we provide simple proofs for all results, for completeness.

It follows from (2.1) and (2.2)-(2.5) that the array output x(t) is a temporally white, zero-mean, circular Gaussian random variable with the covariance properties

E{x(t)x*(t)} = A P0 A* + σ² I ≜ R0,  (2.6)
E{x(t)x^T(t)} = 0.

Let J denote the reversal matrix of dimension m × m,

J = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & ⋰ & & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}.

Similarly, J̃ will denote the (m−d) × (m−d) reversal matrix. It is readily checked that

J A^c = A Φ,  (2.7)


where the superscript c stands for the complex conjugate, and

Φ = \begin{bmatrix} e^{−i(m−1)ω_1} & & 0 \\ & \ddots & \\ 0 & & e^{−i(m−1)ω_d} \end{bmatrix}.

Hence, we have

J x^c(t) = A Φ s^c(t) + J n^c(t)  (2.8)

and, accordingly,

J R0^c J = A Φ P0^c Φ* A* + σ² I.

Let

R = (1/2)(R0 + J R0^c J) = A P A* + σ² I,  (2.9)

where

P = (1/2)(P0 + Φ P0^c Φ*).

In the equations above, R0 is the so-called "forward" covariance matrix, whereas R is called the "forward-backward" covariance. The reason for this terminology is that, while the indices of the elements of x(t) run forward (1, 2, ..., m), those of the elements of J x^c(t) run backward (m, m−1, ..., 1). It will be useful for what follows to observe that R0 and R have the same structure: the only difference between the expressions in (2.6) and (2.9) for these two matrices is that P0 in R0 is replaced by P in R.

Next, let us introduce the (m−d) × m Toeplitz matrix

B* = \begin{bmatrix} b_0 & \cdots & b_d & & 0 \\ & \ddots & & \ddots & \\ 0 & & b_0 & \cdots & b_d \end{bmatrix},  (2.10)

where the complex-valued coefficients {bk} are defined through

b_0 + b_1 z + ⋯ + b_d z^d = b_d \prod_{k=1}^{d} (z − e^{iω_k}).  (2.11)

Because the polynomial in (2.11) has all its zeros on the unit circle, it can be written such that its coefficients satisfy the so-called "conjugate symmetry constraint" (see, e.g., [SS90b]):

b_k = b^c_{d−k}  (for k = 0, 1, ..., d).  (2.12)

It follows easily from (2.10) and (2.11) that

B* A = 0,  (2.13)

which, along with the fact that rank(A) = d and rank(B) = m − d, implies that

R(B) = N(A*).  (2.14)

Hereafter, R(·) and N(·) denote the range and the null space, respectively, associated with the matrix in question.
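The construction of B can be checked numerically. The sketch below (our own; b_d is fixed by an arbitrary unit-modulus rescaling) forms the coefficients via (2.11), rotates them so that the conjugate symmetry constraint (2.12) holds, and verifies (2.13):

```python
import numpy as np

def build_Bstar(omega, m):
    """Banded (m-d) x m matrix B* of (2.10) from spatial frequencies omega."""
    d = len(omega)
    b = np.poly(np.exp(1j * omega))[::-1]        # b_0, ..., b_d (monic case)
    # Unit-modulus rescaling so that b_k = conj(b_{d-k}), cf. (2.12);
    # this does not move the zeros, so (2.13) is preserved.
    b = b * np.exp(-0.5j * np.angle(b[0] / np.conj(b[-1])))
    Bstar = np.zeros((m - d, m), dtype=complex)
    for r in range(m - d):
        Bstar[r, r : r + d + 1] = b
    return Bstar

omega, m = np.array([0.5, 1.1]), 8
A = np.exp(1j * np.outer(np.arange(m), omega))
Bstar = build_Bstar(omega, m)
print(np.allclose(Bstar @ A, 0))                 # (2.13) holds: True
```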

To conclude these notational preparations, let

R = [Es  En] \begin{bmatrix} Λs & 0 \\ 0 & σ²I \end{bmatrix} [Es  En]*  (2.15)

denote the eigenvalue decomposition (EVD) of the matrix R, where Es has d columns, En has m−d columns, the matrix of eigenvectors [Es En] is unitary,

Es* Es = I,  En* En = I,  Es* En = 0,

and where

Λs = \begin{bmatrix} λ_1 & & 0 \\ & \ddots & \\ 0 & & λ_d \end{bmatrix}.

Here, {λk} are the d largest eigenvalues of R arranged in decreasing order: λ_1 > λ_2 > ⋯ > λ_d. We assume that λk ≠ λp (for k ≠ p; k, p = 1, 2, ..., d), which is generically true. Note from (2.15) that the smallest (m−d) eigenvalues of R are identical and equal to σ², a fact that follows easily from (2.9). The EVD of R0 is similarly defined, but with Es, En, and Λs replaced by Es0, En0, and Λs0.

The sample covariance matrices corresponding to R0 and R are defined as

R̂0 = (1/N) Σ_{t=1}^{N} x(t) x*(t)

and

R̂ = (1/2)(R̂0 + J R̂0^c J).  (2.16)

It is well known that, under the assumptions made, R̂0 and R̂ converge (with probability one and in mean square) to R0 and R, respectively, as N tends to infinity. Hence R̂0 and R̂ are consistent estimates of the corresponding theoretical covariance matrices [SM97, And84, SS89]. In fact, R̂0 can be shown to be the (unstructured) maximum likelihood estimate (MLE) of R0 [And84, SS89]. By the invariance principle of ML estimation, it then follows that R̂ is the MLE of R. We let

R̂ = [Ês  Ên] \begin{bmatrix} Λ̂s & 0 \\ 0 & Λ̂n \end{bmatrix} [Ês  Ên]*

denote the EVD of R̂, with Ês having d columns, Ên having m−d columns, and the eigenvalues in diag(Λ̂s, Λ̂n) arranged in decreasing order. The EVD of R̂0 is similarly defined, but with a subscript 0 attached to Ês, etc., to distinguish it from the EVD of R̂.
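These definitions translate directly into code. The following sketch (ours; the helper names are hypothetical) forms R̂0 and R̂ from a snapshot matrix X, e.g., the one generated in the simulation sketch of Section 2.2, and computes EVDs with decreasing eigenvalue order:

```python
import numpy as np

def fb_covariances(X):
    """Forward sample covariance and its forward-backward average, cf. (2.16)."""
    m, N = X.shape
    R0 = X @ X.conj().T / N
    J = np.fliplr(np.eye(m))                  # reversal matrix J
    return R0, 0.5 * (R0 + J @ R0.conj() @ J)

def evd_decreasing(R):
    """Hermitian EVD with eigenvalues sorted in decreasing order."""
    lam, E = np.linalg.eigh(R)                # eigh returns ascending order
    return lam[::-1], E[:, ::-1]
```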

2.3.1 Results on A and B

R1. The matrix B satisfies

J B^c = B J̃  (2.17)

and

J̃ (B* B) J̃ = (B* B)^c.  (2.18)

Proof. Recall that the polynomial coefficients {bk} satisfy the conjugate symmetry constraint (2.12). The jth column of B contains the conjugated coefficients b^c_0, ..., b^c_d, starting in its jth row. Multiplying B^c by J reverses the order of the rows of each column, and by (2.12) the result is B with the order of its columns reversed:

JB^c = B ˜J,

which proves (2.17). Equation (2.18) is a simple corollary of (2.17).
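A small numerical sanity check of R1 (illustrative, not from the original text) is the following sketch, which builds B from conjugate-symmetric coefficients and verifies (2.17) and (2.18):

import numpy as np

m, d = 6, 2
rng = np.random.default_rng(1)
b = rng.standard_normal(d + 1) + 1j * rng.standard_normal(d + 1)
b = b + np.conj(b[::-1])                            # enforce b_k = conj(b_{d-k})

# Column j of B holds conj(b_0..b_d) in rows j..j+d (so that B* A = 0
# for a Vandermonde A built from the roots of sum_k b_k z^k).
B = np.zeros((m, m - d), dtype=complex)
for j in range(m - d):
    B[j:j + d + 1, j] = np.conj(b)

J = np.eye(m)[::-1]
Jt = np.eye(m - d)[::-1]
assert np.allclose(J @ np.conj(B), B @ Jt)                                # (2.17)
assert np.allclose(Jt @ (B.conj().T @ B) @ Jt, np.conj(B.conj().T @ B))   # (2.18)
print("R1 verified")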

R2. Let

a(ω) = [1  e^{iω}  ...  e^{i(m−1)ω}]^T

denote a generic column of the matrix A, and let

d(ω) ≜ da(ω)/dω = i [0  e^{iω}  2e^{i2ω}  ...  (m − 1)e^{i(m−1)ω}]^T.

Then

B∗ J d^c(ω) = e^{−i(m−1)ω} B∗ d(ω)  for ω = ωj (j = 1, ..., d).        (2.19)

Proof. From (2.7), we have

J a^c(ω) = e^{−i(m−1)ω} a(ω).

Differentiating both sides with respect to ω gives

J d^c(ω) = e^{−i(m−1)ω} d(ω) − i(m − 1) e^{−i(m−1)ω} a(ω).             (2.20)

In view of (2.13), B∗a(ω) = 0 at ω = ωj (j = 1, ..., d), and hence (2.19) is obtained by pre-multiplying (2.20) by B∗.

2.3.2 Results on R

R3. Let e denote a generic eigenvector of R, associated with a distinct eigenvalue λ. Then

Je^c = e^{iψ} e  for some ψ ∈ (−π, π].                                 (2.21)

In particular, Equation (2.21) implies that

JEs^c = Es Γ,

where Es is as defined in (2.15), and

Γ = diag(e^{iψ1}, ..., e^{iψd});   ψk ∈ (−π, π] (k = 1, ..., d).

Proof. Because λ is real-valued and JR^cJ = R, we have the following equivalences:

Re = λe ⇐⇒ JR^cJe = λe ⇐⇒ R(Je^c) = λ(Je^c).

Hence, Je^c is also an eigenvector associated with λ. Then it must be related to e as shown in (2.21), as λ is distinct by assumption.

The above result is a consequence of the centro-Hermitian property of R. Hence, a similar result holds for ˆR as well.
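The property in (2.21) is easy to verify numerically. The sketch below (illustrative, not from the original text) builds a random centro-Hermitian matrix and checks R3 for every eigenvector:

import numpy as np

rng = np.random.default_rng(2)
m = 5
J = np.eye(m)[::-1]
M = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
R0 = M @ M.conj().T                                 # Hermitian PSD
R = 0.5 * (R0 + J @ R0.conj() @ J)                  # centro-Hermitian: J R^c J = R

lam, E = np.linalg.eigh(R)
for k in range(m):
    e = E[:, k]
    f = J @ np.conj(e)                              # should equal exp(i*psi)*e
    psi = np.angle(np.vdot(e, f))                   # vdot conjugates 1st argument
    assert np.allclose(f, np.exp(1j * psi) * e, atol=1e-8)
print("R3 verified")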

2.3.3 Results on P and R

R4. The condition number of P,

cond(P) ≜ λmax(P)/λmin(P),

is never greater than that of P0. Here, λmax(P) and λmin(P) denote the largest and smallest eigenvalues of P, respectively. A similar result holds for R and R0.

Proof. For a Hermitian matrix it holds that

λmax(P) = max_x (x∗Px)/(x∗x),    λmin(P) = min_x (x∗Px)/(x∗x).

Hence, making use of the fact that P0 and P0^c have the same eigenvalues, we obtain

λmax(P0 + ΦP0^cΦ∗) = max_x [x∗P0x + x∗ΦP0^cΦ∗x]/(x∗x)
                   ≤ max_x (x∗P0x)/(x∗x) + max_y (y∗P0^cy)/(y∗y) = 2λmax(P0),

λmin(P0 + ΦP0^cΦ∗) = min_x [x∗P0x + x∗ΦP0^cΦ∗x]/(x∗x)
                   ≥ min_x (x∗P0x)/(x∗x) + min_y (y∗P0^cy)/(y∗y) = 2λmin(P0),

from which the asserted result follows.
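The inequality can again be checked numerically; the following sketch (illustrative, not from the original text) compares the two condition numbers for a random positive definite P0 and a random unitary diagonal Φ:

import numpy as np

rng = np.random.default_rng(3)
d = 3
M = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
P0 = M @ M.conj().T                                 # Hermitian positive definite
Phi = np.diag(np.exp(1j * rng.uniform(-np.pi, np.pi, d)))
P = P0 + Phi @ np.conj(P0) @ Phi.conj().T

# For Hermitian positive definite matrices, np.linalg.cond (2-norm) equals
# the eigenvalue ratio lambda_max / lambda_min.
print(np.linalg.cond(P), "<=", np.linalg.cond(P0))
assert np.linalg.cond(P) <= np.linalg.cond(P0) + 1e-9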



Based on the above result we can say that the signals are less correlated in R than in R0. In particular, P may be nonsingular even though P0 is singular. This observation, along with the fact that the performance of many array processing algorithms deteriorates as the signal correlation increases, lies at the basis of the belief that the FB approach should outperform the F-only approach. This would indeed be true if one had two sets of independent data vectors with covariance matrices R0 and R, respectively. However, only R0 and ˆR0 correspond to such a data set, whereas R and ˆR do not. In fact, the statistical properties of ˆR are quite different from those of ˆR0, and this may very well counterbalance the desirable property that P in R is better conditioned than P0 in R0. Consequently, the FB approach may not lead to any performance enhancement. This discussion provides an intuitive motivation for the superiority of F-MODE over FB-MODE, which will be shown later in the chapter.

2.3.4 Results on R0 <strong>and</strong> R<br />

The follow<strong>in</strong>g results are valid for both R0 <strong>and</strong> R. However, for the sake<br />

of simplicity, we state them only for R.<br />

R5. The columns of Es, {ek} d k=1 , belong to the range of A:<br />

More exactly,<br />

where<br />

ek ∈ R(A). (2.22)<br />

Es = A � PA ∗ Es ˜ Λ −1� ,<br />

˜Λ = Λs − σ 2 I. (2.23)<br />

Proof. A simple calculation shows that<br />

REs = EsΛs = APA ∗ Es + σ 2 Es ⇒ Es ˜ Λ = A � PA ∗ �<br />

Es ,<br />

from which (2.23) follows.<br />

R6. Under the assumption made that rank(P) = d,

En∗A = 0 ⇐⇒ R(En) = N(A∗).

Proof. This result follows at once from R5 and the fact that En∗Es = 0.

R7. The following equality holds:

PA∗R^{-1}AP = (A∗A)^{-1} A∗Es ˜Λ² Λs^{-1} Es∗A (A∗A)^{-1}.             (2.24)

Proof. From

R = APA∗ + σ²I = EsΛsEs∗ + σ²EnEn∗ = Es ˜Λ Es∗ + σ²I

we obtain

P = (A∗A)^{-1} A∗Es ˜Λ Es∗A (A∗A)^{-1}.

Inserting the above expression for P in the left-hand side of (2.24), and making use of R5 and R6 (note that R^{-1} = EsΛs^{-1}Es∗ + (1/σ²)EnEn∗), yields

(A∗A)^{-1}A∗Es ˜Λ Es∗A(A∗A)^{-1}A∗ [EsΛs^{-1}Es∗ + (1/σ²)EnEn∗] A(A∗A)^{-1}A∗Es ˜Λ Es∗A(A∗A)^{-1}
  = (A∗A)^{-1} A∗Es ˜Λ² Λs^{-1} Es∗A (A∗A)^{-1},

which is what we had to prove.

R8. Let ≃ denote an asymptotically valid equality, that is, an equality in which only the asymptotically dominant terms (in the mean square sense) have been retained. Then,

B∗Ês ≃ B∗ ˆREs ˜Λ^{-1}.                                                (2.25)

Proof. From (2.3), we have

B∗ ˆRÊs = B∗Ês ˆΛs,
B∗(ˆR − R)Ês + B∗RÊs = B∗Ês ˆΛs,
B∗(ˆR − R)Ês = B∗Ês(ˆΛs − σ²I),
B∗Ês ≃ B∗(ˆR − R)Es ˜Λ^{-1} = B∗ ˆREs ˜Λ^{-1}.

The second and the third steps follow from the facts that B∗REs = 0 and B∗R = σ²B∗. Since Ês → Es and ˆΛs → Λs as N → ∞, and since B∗Es = 0, the first order approximation in the last step results.


2.3.5 Results on ABC Estimation

Let z be a complex-valued random vector that depends on an unknown real-valued parameter vector ω, and whose mean and covariance matrix tend to zero (typically as 1/N) as N increases. Then an asymptotically best (in the mean square sense) consistent (ABC) estimate of the parameter vector ω, based on z, is given by the global minimizer of the function [SS89]

μ^T Cμ^{-1} μ,                                                         (2.26)

where

μ = [ Re(z)
      Im(z) ],

and where

Cμ = E{μμ^T}

is assumed to be invertible. The ABC estimate above is the extension of the Markov linear estimator to a class of nonlinear regression equations. The minimization of (2.26) can also be given an approximate (asymptotically valid) ML interpretation (see, e.g., [SS90a]). The estimating criterion (2.26) also results in certain estimation problems, the solutions of which are derived by weighted subspace fitting techniques [VO91].

In most applications we derive E{zz∗} and E{zz^T}, but not Cμ directly. It is hence more convenient to work with the vector

ρ = [ z
      z^c ]                                                            (2.27)

rather than with μ. The next result obtains the function of ρ whose minimization is equivalent to that of (2.26).

R9. Let Cρ = E{ρρ∗}. Then,

μ^T Cμ^{-1} μ = ρ∗ Cρ^{-1} ρ.                                          (2.28)

Proof. We have

μ = Lρ,

where

L = (1/2) [ I    I
            −iI  iI ].

As L is nonsingular,

μ^T Cμ^{-1} μ = ρ∗L∗(LCρL∗)^{-1}Lρ = ρ∗ Cρ^{-1} ρ,

which concludes the proof.
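The equality (2.28) is easy to confirm numerically. In the sketch below (illustrative, not from the original text) z = Mw for a real Gaussian vector w, so that exact covariances of both μ and ρ are available in closed form:

import numpy as np

rng = np.random.default_rng(4)
n = 3
M = rng.standard_normal((n, 2 * n)) + 1j * rng.standard_normal((n, 2 * n))

# Exact covariances for mu = [Re z; Im z] and rho = [z; z^c], with E{w w^T} = I:
Mmu = np.vstack([M.real, M.imag])                   # mu = Mmu @ w
Mrho = np.vstack([M, np.conj(M)])                   # rho = Mrho @ w
Cmu = Mmu @ Mmu.T
Crho = Mrho @ Mrho.conj().T

w = rng.standard_normal(2 * n)                      # one realization
mu, rho = Mmu @ w, Mrho @ w
q_mu = mu @ np.linalg.solve(Cmu, mu)
q_rho = np.real(rho.conj() @ np.linalg.solve(Crho, rho))
assert np.isclose(q_mu, q_rho)
print("R9 verified:", q_mu, q_rho)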

2.4 Brief Review of F-MODE

Consider the ABC estimator (see Section 2.3) obtained from

z = vec(B∗Ês0),                                                        (2.29)

where vec denotes column-wise vectorization. The ABC criterion corresponding to z above can be shown to be

Tr{(B∗B)^{-1} B∗Ês0 ˜Λ0² Λs0^{-1} Ês0∗ B},                             (2.30)

where Tr denotes the trace operator. The F-MODE estimate of ω ≜ [ω1 ... ωd]^T is obtained as an asymptotically valid approximation to the minimizer of (2.30), in the following steps [SS90a]:

1. Obtain Ês0, ˆΛs0, and ˆ˜Λ0 from the EVD of ˆR0. Derive an initial estimate of {bk} by minimizing the quadratic function

   Tr{B∗Ês0 ˆ˜Λ0² ˆΛs0^{-1} Ês0∗ B}.

2. Let (ˆB∗ˆB) be the estimate of (B∗B) made from the previously obtained estimates of {bk}. Derive refined estimates of {bk} by minimizing the quadratic function

   Tr{(ˆB∗ˆB)^{-1} B∗Ês0 ˆ˜Λ0² ˆΛs0^{-1} Ês0∗ B}.

   Possibly repeat this operation once more, using the latest estimate of {bk} to obtain (ˆB∗ˆB). Finally, derive estimates of {ωk} by rooting the polynomial Σ_{k=0}^{d} ˆbk z^k (see (2.11)).

The minimization in steps 1 and 2 above should be conducted under an appropriate constraint on {bk} (to prevent the trivial solution {bk = 0}). Typically, one uses

Σ_{k=0}^{d} |bk|² = 1.

Additionally, the {bk} should satisfy (2.12). A sketch of this procedure is given below.
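The following Python/NumPy sketch is a simplified interpretation of the two-step procedure above, not the original implementation. For brevity it imposes only the norm constraint ||b|| = 1 (not the conjugate-symmetry constraint (2.12)); the frequencies are therefore read out by taking the angles of the polynomial roots, whereas the full algorithm parameterizes b to satisfy (2.12) exactly. The function names mode_step and mode are mine:

import numpy as np

def mode_step(Es, W, BtB_inv=None):
    # One quadratic MODE step: minimize the criterion over b with ||b|| = 1.
    # Es : m x d matrix of signal eigenvectors (of R0_hat or R_hat).
    # W  : d x d diagonal weighting (Lambda_tilde^2 Lambda_s^{-1}).
    # BtB_inv : optional (m-d) x (m-d) weighting (B_hat* B_hat)^{-1}.
    m, d = Es.shape
    # B* e_k = H(e_k) b, with the Hankel matrix H(e)[j, l] = e[j + l]:
    H = [np.array([[e[j + l] for l in range(d + 1)]
                   for j in range(m - d)]) for e in Es.T]
    if BtB_inv is None:
        BtB_inv = np.eye(m - d)
    # Criterion = sum_k W[k,k] * b^H H_k^H BtB_inv H_k b (quadratic form in b)
    Q = sum(W[k, k] * H[k].conj().T @ BtB_inv @ H[k] for k in range(d))
    w, V = np.linalg.eigh(Q)
    return V[:, 0]                                  # eigvec of smallest eigval

def mode(Es, lam_s, sigma2):
    m, d = Es.shape
    W = np.diag((lam_s - sigma2) ** 2 / lam_s)
    b = mode_step(Es, W)                            # step 1
    B = np.zeros((m, m - d), dtype=complex)         # build B_hat from b
    for j in range(m - d):
        B[j:j + d + 1, j] = np.conj(b)
    BtB_inv = np.linalg.inv(B.conj().T @ B)
    b = mode_step(Es, W, BtB_inv)                   # step 2
    roots = np.roots(b[::-1])                       # roots of sum_k b_k z^k
    return np.sort(np.angle(roots))                 # omega estimates

# Usage: omega_hat = mode(Es_hat, lam_s_hat, sigma2_hat), where Es_hat and
# lam_s_hat hold the d dominant eigenvectors/eigenvalues of R0_hat (F-MODE)
# or of R_hat (FB-MODE), and sigma2_hat is, e.g., the mean of the remaining
# eigenvalues.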

Asymptotically (as N increases) the F-MODE estimate is Gaussian distributed with mean equal to the true parameter vector ω and covariance matrix

CF = (σ²/2N) {Re[U ⊙ Q0^T]}^{-1},                                      (2.31)

where ⊙ denotes the Hadamard product (i.e., element-wise multiplication), and where

U = D∗ Π⊥A D,                                                          (2.32)
Q0 = P0A∗R0^{-1}AP0,                                                   (2.33)

and

D = [d(ω1) ... d(ωd)];   d(ω) = da(ω)/dω,
Π⊥A = I − A(A∗A)^{-1}A∗

is the orthogonal projector onto N(A∗). Because (2.31) is also the CRB matrix for the estimation problem under consideration, it follows that F-MODE is asymptotically statistically efficient [VO91, SS90a, SN90].

2.5 Derivation of FB-MODE

Similarly to (2.29), the FB-MODE is associated with the ABC estimate derived from

z = vec(B∗Ês) = [ B∗ê1
                  ⋮
                  B∗êd ].

As shown in Appendix A.1, asymptotically in N,

E{zz∗} = (σ²/2N) (˜Λ^{-2}Λs) ⊗ (B∗B)                                   (2.34)

and

E{zz^T} = (σ²/2N) (˜Λ^{-2}ΛsΓ∗) ⊗ (B∗B ˜J),                            (2.35)


where ⊗ is the Kronecker matrix product operator. Consequently, see (2.27),

Cρ ≜ E{ [z; z^c] [z∗ z^T] }
   = (σ²/2N) [ (˜Λ^{-2}Λs) ⊗ (B∗B)        (˜Λ^{-2}ΛsΓ∗) ⊗ (B∗B ˜J)
               (˜Λ^{-2}ΛsΓ) ⊗ (˜JB∗B)     (˜Λ^{-2}Λs) ⊗ (B∗B)^c ].     (2.36)

Using the fact that (B∗B)^c = ˜JB∗B˜J (see R1), we can factor Cρ as (we omit the scalar factor σ²/2N in (2.36) since it has no influence on the estimate obtained by minimizing the ABC criterion)

Cρ = diag{(˜Λ^{-2}Λs) ⊗ I, (˜Λ^{-2}Λs) ⊗ ˜J}
     × [ I ⊗ I   Γ∗ ⊗ I
         Γ ⊗ I   I ⊗ I ]
     × diag{I ⊗ (B∗B), I ⊗ (B∗B ˜J)}.                                  (2.37)

The middle matrix in (2.37) can be rewritten as

[ I ⊗ I   Γ∗ ⊗ I ]  =  [ I ⊗ I ] [ I ⊗ I   Γ∗ ⊗ I ],
[ Γ ⊗ I   I ⊗ I  ]     [ Γ ⊗ I ]

and it is evidently singular. Consequently Cρ is singular and hence the ABC estimating criterion in (2.28) cannot be used as it stands. There are several possibilities for overcoming this difficulty; see, e.g., [SS89, Gor97]. Here we adopt the approach of [Gor97], where it was shown that if Cρ(ε) denotes a nonsingular matrix obtained by a slight perturbation of Cρ, then we can use (2.28) with Cρ(ε) in lieu of Cρ. The minimizer of the so-obtained criterion,

ρ∗ Cρ^{-1}(ε) ρ,                                                       (2.38)

is then an ABC estimate, as the perturbation parameter ε goes to zero. If the limit of (2.38), as ε → 0, cannot be evaluated analytically then we can obtain an (almost) ABC estimate by minimizing (2.38) with a “very small” ε value. Interestingly enough, in the case under discussion, we can obtain a closed-form expression for the limit criterion. To this end, let

Γε = (1 − ε)Γ
Γε = (1 − ε)Γ


and let Cρ(ε) be given by (2.37) where Γ is replaced by Γε. It can be readily checked that (for ε ≠ 0)

[ I ⊗ I    Γε∗ ⊗ I ]^{-1}  =  1/(ε(2 − ε)) [ I ⊗ I     −Γε∗ ⊗ I ],
[ Γε ⊗ I   I ⊗ I   ]                        [ −Γε ⊗ I   I ⊗ I    ]

which implies that

Cρ^{-1}(ε) = 1/(ε(2 − ε)) diag{I ⊗ (B∗B)^{-1}, I ⊗ (˜J(B∗B)^{-1})}
             × [ I ⊗ I     −Γε∗ ⊗ I
                 −Γε ⊗ I   I ⊗ I ]
             × diag{(˜Λ²Λs^{-1}) ⊗ I, (˜Λ²Λs^{-1}) ⊗ ˜J}

           = 1/(ε(2 − ε)) [ (˜Λ²Λs^{-1}) ⊗ (B∗B)^{-1}          −(Γε∗ ˜Λ²Λs^{-1}) ⊗ ((B∗B)^{-1}˜J)
                            −(Γε ˜Λ²Λs^{-1}) ⊗ (˜J(B∗B)^{-1})   (˜Λ²Λs^{-1}) ⊗ (˜J(B∗B)^{-1}˜J) ].   (2.39)

Inserting (2.39) into (2.38) yields

ε(2 − ε) ρ∗Cρ^{-1}(ε)ρ = z∗ [(˜Λ²Λs^{-1}) ⊗ (B∗B)^{-1}] z              (≜ z∗Ω1z)
                       − z^T [(Γε ˜Λ²Λs^{-1}) ⊗ (˜J(B∗B)^{-1})] z      (≜ z^TΩ2z)
                       − z∗ [(Γε∗ ˜Λ²Λs^{-1}) ⊗ ((B∗B)^{-1}˜J)] z^c    (≜ z∗Ω3z^c)
                       + z^T [(˜Λ²Λs^{-1}) ⊗ (˜J(B∗B)^{-1}˜J)] z^c.    (≜ z^TΩ4z^c)   (2.40)

Using R1 we obtain (observe the definition of Ω1, ..., Ω4 above)

Ω4 = Ω1^c  and  Ω3 = Ω2^c.
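Before proceeding, the block inverse identity above is easy to check numerically. The following sketch (illustrative, not from the original text) does so for a random unitary diagonal Γ:

import numpy as np

rng = np.random.default_rng(5)
d, eps = 3, 0.1
Gamma = np.diag(np.exp(1j * rng.uniform(-np.pi, np.pi, d)))
Ge = (1 - eps) * Gamma                              # Gamma_eps = (1 - eps) Gamma
I = np.eye(d)

M = np.block([[np.kron(I, I), np.kron(Ge.conj().T, I)],
              [np.kron(Ge, I), np.kron(I, I)]])
M_inv = np.block([[np.kron(I, I), -np.kron(Ge.conj().T, I)],
                  [-np.kron(Ge, I), np.kron(I, I)]]) / (eps * (2 - eps))
assert np.allclose(M @ M_inv, np.eye(2 * d * d))
print("block inverse identity verified")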


Hence we can rewrite (2.40) in a more compact form:

(ε(2 − ε)/2) ρ∗Cρ^{-1}(ε)ρ = z∗Ω1z − Re{z^TΩ2z}.                       (2.41)

Next, we make use of the fact that, for conformable matrices A, X, B, and Y,

Tr{AX∗BY} = [vec(X)]∗ (A^T ⊗ B) vec(Y),

to obtain

z∗Ω1z = Tr{˜Λ²Λs^{-1} Ês∗B(B∗B)^{-1}B∗Ês}                              (2.42)

and

z^TΩ2z = Tr{Γε ˜Λ²Λs^{-1} Ês^T B^c ˜J(B∗B)^{-1}B∗Ês}.                  (2.43)

Finally, we use R1 and R3 to rewrite (2.43) as

z^TΩ2z = (1 − ε) Tr{˜Λ²Λs^{-1} Γ Ês^T JB(B∗B)^{-1}B∗Ês}
       = (1 − ε) Tr{˜Λ²Λs^{-1} Ês∗B(B∗B)^{-1}B∗Ês} = (1 − ε) z∗Ω1z.    (2.44)

Combining (2.41), (2.42), and (2.44) leads to the function

ρ∗Cρ^{-1}(ε)ρ = 2 (z∗Ω1z)/(2 − ε),

the limit of which, as ε → 0, yields the FB-MODE criterion:

Tr{(B∗B)^{-1}B∗Ês ˜Λ²Λs^{-1} Ês∗B}.                                    (2.45)

Comparing (2.45) with (2.30) we see that the FB-MODE criterion has exactly the same structure as the F-MODE criterion, the only difference being that the former criterion depends on the eigenelements of the FB sample covariance matrix. Owing to this neat result, the FB-MODE estimate of ω can be obtained by applying to ˆR the two- (or three-) step algorithm outlined in Section 2.4.


2.6 Analysis of FB-MODE

The FB-MODE criterion in (2.45) can be parameterized directly via the parameters of interest, {ωk}. Indeed, it follows from (2.14) that

B(B∗B)^{-1}B∗ = Π⊥A,

and hence that (2.45) can be rewritten as

f(ω) ≜ Tr{ˆW Ês∗ Π⊥A Ês},                                              (2.46)

where

ˆW = ˆ˜Λ² ˆΛs^{-1}.

Later we will also use the notation W = ˜Λ²Λs^{-1}. The reason that ˆW appears in (2.46), rather than W as in (2.45), is that the estimator should be based on the sample data. It will, however, be shown that it is only the limit matrix W that affects the asymptotic performance. Let

g = ∂f(ω)/∂ω

denote the gradient vector of (2.46), let

H = ∂²f(ω)/∂ω∂ω^T

denote the corresponding Hessian matrix, and let

H∞ = lim_{N→∞} H

denote the asymptotic Hessian. Using this notation and a standard Taylor series expansion technique (the details of which are omitted in the interest of brevity; see [VO91, SS90a] and [SN90] for similar analyses), we can show that the covariance matrix of the FB-MODE estimate of ω is asymptotically (for N ≫ 1) given by

CFB = H∞^{-1} [E(gg^T)] H∞^{-1}.                                       (2.47)

It remains to evaluate H∞ and the middle matrix in (2.47).
It rema<strong>in</strong>s to evaluate H∞ <strong>and</strong> the middle matrix <strong>in</strong> (2.47).


Let Ai denote the derivative of A with respect to ωi:

Ai = ∂A/∂ωi = di ui∗.                                                  (2.48)

Here, di is short notation for d(ωi), and ui = [0, ..., 0, 1, 0, ..., 0]^T (with 1 in the ith position). A simple calculation shows that

∂Π⊥A/∂ωi = −Ai(A∗A)^{-1}A∗ + A(A∗A)^{-1}(Ai∗A + A∗Ai)(A∗A)^{-1}A∗ − A(A∗A)^{-1}Ai∗
         = −Π⊥A Ai(A∗A)^{-1}A∗ − A(A∗A)^{-1}Ai∗ Π⊥A.                   (2.49)

Next note that – for conformable matrices X and Y, with Y Hermitian – we have

Tr{(X + X∗)Y} = 2Re{Tr{XY}}.                                           (2.50)

Using (2.49) and (2.50) we obtain

gi = −2Re{Tr[ˆW Ês∗A(A∗A)^{-1}Ai∗ Π⊥A Ês]}.                            (2.51)

Inserting (2.48) in (2.51) gives

gi ≃ −2Re{di∗ Π⊥A ÊsWEs∗A(A∗A)^{-1}ui}.

Taking the derivative of the right hand side of (2.51) with respect to ωj, we obtain

Hij ≃ 2Re{di∗ Π⊥A Aj(A∗A)^{-1}A∗EsWEs∗A(A∗A)^{-1}ui}
    = 2Re{(di∗ Π⊥A dj) [((A∗A)^{-1}A∗EsWEs∗A(A∗A)^{-1})^T]_{ij}}
    = [H∞]_{ij}.

Making use of R7 yields the following expression for the asymptotic Hessian matrix,

H∞ = 2Re{U ⊙ Q^T},


where U is as defined in (2.32), and Q is defined similarly to Q0 in (2.33),

Q = PA∗R^{-1}AP.

To derive an expression for the middle matrix in (2.47) we need the additional notation

αi∗ = di∗B(B∗B)^{-1},
βi = WEs∗A(A∗A)^{-1}ui,

with the use of which we can write

−gi ≃ 2Re{αi∗(B∗Ês)βi} = 2Re{(βi^T ⊗ αi∗)z}.

It follows that

E[gigj] ≃ 2Re{(βi^T ⊗ αi∗) E[zz∗] (βj^c ⊗ αj) + (βi^T ⊗ αi∗) E[zz^T] (βj ⊗ αj^c)}
        ≜ 2Re(T1 + T2).                                                (2.52)

Next, we note that (cf. (2.34))

(2N/σ²) T1 = (βi^T ˜Λ^{-2}Λs βj^c)(αi∗B∗Bαj)
           = (βj∗ ˜Λ^{-2}Λs βi)(αi∗B∗Bαj) = Q^T_{ij} U_{ij}.           (2.53)

A convenient expression for T2 is slightly more difficult to obtain. First, we note that (cf. (2.35))

(2N/σ²) T2 = (βi^T ˜Λ^{-2}ΛsΓ∗ βj)(αi∗ B∗B ˜J αj^c).                   (2.54)

Next, we make use of R3 and (2.7) to write

Γ∗βj = W(Γ∗Es∗)A(A∗A)^{-1}uj = WEs^T(JA)(A∗A)^{-1}uj
     = WEs^T A^c Φ∗(A∗A)^{-1}uj = WEs^T A^c (A∗JA^c)^{-1}uj
     = WEs^T A^c (ΦA^TA^c)^{-1}uj = WEs^T A^c (A^TA^c)^{-1} e^{i(m−1)ωj} uj
     = βj^c e^{i(m−1)ωj},
= β c je i(m−1)ωj ,


which gives

βi^T ˜Λ^{-2}ΛsΓ∗βj = βi^T ˜Λ^{-2}Λs βj^c e^{i(m−1)ωj}
                  = (βj∗ ˜Λ^{-2}Λs βi) e^{i(m−1)ωj} = Q^T_{ij} e^{i(m−1)ωj}.   (2.55)

Finally, we use R1 and R2 to obtain

˜Jαj^c = (B^TB^c˜J)^{-1}B^Tdj^c = (˜JB∗B)^{-1}B^Tdj^c
       = (B∗B)^{-1}B∗Jdj^c = (B∗B)^{-1}B∗dj e^{−i(m−1)ωj},

which yields

αi∗(B∗B)˜Jαj^c = di∗B(B∗B)^{-1}B∗dj e^{−i(m−1)ωj} = U_{ij} e^{−i(m−1)ωj}.   (2.56)

Inserting (2.55) and (2.56) into (2.54) gives the desired expression for T2:

(2N/σ²) T2 = Q^T_{ij} U_{ij}.                                          (2.57)

Combining (2.52), (2.53), and (2.57) leads to the following expression for the asymptotic covariance matrix of FB-MODE:

CFB = (σ²/2N) {Re[U ⊙ Q^T]}^{-1}.                                      (2.58)

Comparing (2.58) and (2.31) we see that the only difference between CFB and CF is that Q0 in CF is replaced by Q in CFB. If P0 is diagonal then Q = Q0 and hence CFB = CF. However, if P0 is non-diagonal (i.e., the signals are correlated) then, in general, Q ≠ Q0. To see this, consider for example a case where P0 is singular, which implies that Q0 is singular as well; however, P, and hence Q, may well be nonsingular. This example also shows that the difference matrix Q0 − Q is not necessarily positive semi-definite, in spite of the fact that, by the CRB inequality, we must have (A ≥ B below means that the matrix A − B is positive semi-definite)

CFB ≥ CF                                                               (2.59)
⇐⇒
Re[U ⊙ (Q0 − Q)] ≥ 0.                                                  (2.60)

(Of course, this is no contradiction because Q0 ≥ Q is sufficient for (2.60) to hold, but it is not necessary.)
to hold, but it is not necessary.)


As already indicated above, the inequality in (2.59) is usually “strict” in the sense that CFB ≠ CF. Whenever this happens, the FB-MODE is asymptotically statistically less efficient than the F-MODE. To study the difference (CFB − CF) quantitatively, as well as the extent to which the asymptotic results derived above hold in samples of practical lengths, we resort to numerical simulations (see the next section).

2.7 Numerical Examples

In this section we will study the difference in performance between F-MODE and FB-MODE by means of a two-signal scenario. It is a simple exercise to verify that the covariance matrices P and P0 are identical for a certain correlation phase between the two signals. Indeed, let the signal covariance matrix be given by

P0 = [ p11   p12
       p12∗  p22 ]

with p12 = ξe^{iφ}. If the correlation phase is

φ = (1/2)(m − 1)(ω2 − ω1) + kπ,

where k is any integer, then P = P0. This implies that R = R0 and hence that CF = CFB. Thus, in this special case, the large sample performance is the same for F-MODE and FB-MODE. However, as previously mentioned, in general CFB > CF. In the following, we choose φ so as to emphasize the difference between F-MODE and FB-MODE.

Consider a ULA consisting of four omni-directional and identical sensors separated by half of the carrier’s wavelength. The two signals impinge on the array from θ1 = −7.5° and θ2 = 7.5° relative to broadside. These directions correspond to the spatial (angular) frequencies ωi = π sin(θi) (i = 1, 2). The signal covariance is

P0 = 10^{SNR/10} [ 1               0.99e^{iπ/4}
                   0.99e^{−iπ/4}   1            ],

where SNR is expressed in decibels (dB). The correlation phase of π/4 = 45° is (close to) the “worst case,” that is, the case where the difference between CFB and CF is most significant (given the other parameters). The noise variance is fixed to σ² = 1. The sketch below evaluates the theoretical covariances (2.31) and (2.58) for this scenario.
,


Figure 2.1: Mean square errors for θ1 (in degrees²) for F-MODE (+) and FB-MODE (o) versus the number of snapshots N. The solid line is the theoretical (asymptotic) MSE for F-MODE as given in (2.31), and the dashed line is the theoretical MSE for FB-MODE given in (2.58).

In the first example, SNR = 0 dB. The mean-square errors (MSEs) for F-MODE and FB-MODE are compared for different sample lengths in Figure 2.1. (Only the MSE values for θ1 are shown; the MSE plot corresponding to θ2 is similar.) The sample MSEs are based on 1000 independent trials. In the simulations, two steps of the algorithm given in Section 2.4 were used both for F-MODE and FB-MODE. The MSE predicted by the large sample analysis is also depicted in Figure 2.1. It can be seen that the theoretical and simulation results agree well even for quite small sample lengths. Clearly, there is no advantage in using FB-MODE in this case, not even for small samples.

In the second example, the number of samples is fixed to N = 100 and the SNR is varied. All other parameters are as before. The MSEs (computed from 1000 trials) are shown in Figure 2.2. We see that F-MODE outperforms FB-MODE for all SNR values.



Figure 2.2: Mean square errors for θ1 (in degrees²) for F-MODE (+) and FB-MODE (o) versus the SNR. The solid line is the theoretical (asymptotic) MSE for F-MODE as given in (2.31), and the dashed line is the theoretical MSE for FB-MODE given in (2.58).

2.8 Conclusions

MODE is an angle-of-arrival estimator which is asymptotically statistically efficient. The standard MODE algorithm is used with the forward-only sample covariance matrix of the output of a sensor array. Here, we have applied the MODE principle to the forward-backward sample covariance matrix to obtain the FB-MODE estimator. We have shown that FB-MODE simply consists of applying the standard MODE algorithm to the eigenelements of the forward-backward covariance matrix. We have also shown that the standard MODE algorithm outperforms FB-MODE, from a statistical standpoint, in large samples. Numerical examples were presented to show that these results most often hold for small samples or low signal-to-noise ratios as well. Thus, forward-backward averaging should in general not be used in conjunction with MODE.


Chapter 3

Weighted Subspace Fitting for General Array Error Models

Model error sensitivity is an issue common to all high resolution direction of arrival estimators. Much attention has been directed to the design of algorithms for minimum variance estimation taking only finite sample errors into account. Approaches to reduce the sensitivity due to array calibration errors have also appeared in the literature. Herein, one such approach is adopted which assumes that the errors due to finite samples and model errors are of comparable size. Minimum variance estimators have previously been proposed for this case, but these estimators typically lead to non-linear optimization problems and are not in general consistent if the source signals are fully correlated. In this chapter, a weighted subspace fitting method for general array perturbation models is derived. This method provides minimum variance estimates under the assumption that the prior distribution of the perturbation model is known. Interestingly, the method reduces to the WSF (MODE) estimator if no model errors are present. Vice versa, assuming that model errors dominate, the method specializes to the corresponding “model-errors-only subspace fitting method”. Unlike previous techniques for model errors, the estimator can be implemented using a two-step procedure if the nominal array is uniform and linear, and it is also consistent even if the signals are fully correlated. The chapter also contains a large sample analysis of one of the alternative methods, namely, MAPprox. It is shown that MAPprox also provides minimum variance estimates under reasonable assumptions.

3.1 Introduction

All signal parameter estimation methods in array signal processing rely on information about the array response, and assume that the signal wavefronts impinging on the array have perfect spatial coherence (e.g., perfect plane waves). The array response may be determined either by empirical measurements, a process referred to as array calibration, or by making certain assumptions about the sensors in the array and their geometry (e.g., identical sensors in known locations).

Unfortunately, an array cannot be perfectly calibrated, and analytically derived array responses relying on the array geometry and wave propagation are at best good approximations. Due to changes in antenna location, temperature, and the surrounding environment, the response of the array may be significantly different than when it was last calibrated. Furthermore, the calibration measurements themselves are subject to gain and phase errors, and they can only be obtained for discrete angles, thus necessitating interpolation techniques for uncalibrated directions. For the case of analytically calibrated arrays of nominally identical, identically oriented elements, errors result since the elements are not really identical and their locations are not precisely known.

Because of the many sources of error listed above, the limiting factor in the performance of array signal processing algorithms is most often not measurement noise, but rather perturbations in the array response model. Depending on the size of such errors, estimates of the directions of arrival (DOAs) and the source signals may be significantly degraded. A number of studies have been conducted to quantify the performance degradation due to model errors for both DOA [ZW88, WWN88, SK90, Fri90a, Fri90b, SK92, SK93, LV92, VS94b, SH92, KSS94, KSS96] and signal estimation [RH80, Com82, Qua82, FW94, YS95]. These analyses have assumed either a deterministic model for the errors, using bias as the performance metric, or a stochastic model, where variance is typically calculated.

A number of techniques have also been considered for improving the robustness of array processing algorithms. In one such approach, the array response is parameterized not only by the DOAs of the signals, but also by perturbation or “nuisance” parameters that describe deviations


of the response from its nominal value. These parameters can include, for example, displacements of the antenna elements from their nominal positions, uncalibrated receiver gain and phase offsets, etc. With such a model, a natural approach is to attempt to estimate the unknown nuisance parameters simultaneously with the signal parameters. Such methods are referred to as auto-calibration techniques, and have been proposed by a number of authors, including [PK85, RS87a, RS87b, WF89, WOV91, WRM94, VS94a, FFLC95, Swi96].

When auto-calibration techniques are employed, it is critical to determine whether both the signal and nuisance parameters are identifiable. In certain cases they are not; for example, one cannot uniquely estimate both DOAs and sensor positions unless additional information is available, such as sources in known locations [LM87, WF89, NN93, KF93], cyclostationary signals with two or more known cycle frequencies [YZ95], or partial information about the phase response of the array [MR94]. The identifiability problem can be alleviated if the perturbation parameters are assumed to be drawn from some known a priori distribution. While this itself represents a form of additional information, it has the advantage of allowing an optimal maximum a posteriori (MAP) solution to the problem to be formulated [VS94a]. In [VS94a] it is shown that, by using an asymptotically equivalent approximation to the resulting MAP criterion, the estimation of the signal and nuisance parameters can be decoupled, leading to a significant simplification of the problem.

Another technique for reducing the sensitivity of array processing algorithms to model errors is the use of statistically optimal weightings based on probabilistic models for the errors. Optimal weightings that ignore the finite sample effects of noise have been developed for MUSIC [SK92] and the general subspace fitting technique [SK93]. A more general analysis in [VS94b] included the effects of noise, and presented a weighted subspace fitting (WSF) method whose DOA estimates asymptotically achieve the Cramér-Rao bound (CRB), but only for a very special type of array error model. On the other hand, the MAP approach in [VS94a] is asymptotically statistically efficient for very general error models. However, since it is implemented by means of noise subspace fitting [OVSN93], its finite sample performance may be poor if the sources are highly correlated or closely spaced in angle. In fact, the method of [VS94a] is not a consistent estimator of the DOAs if the signals are perfectly coherent.

In this chapter, we develop a statistically efficient weighted signal subspace fitting (SSF) algorithm that holds for very general array perturbation models, that has much better finite sample performance than [VS94a] when the signals are highly correlated or closely spaced, and that yields consistent estimates when coherent sources are present. An additional advantage of our SSF formulation is that if the array is nominally uniform and linear, a two-step procedure similar to that for MODE [SS90a] can be used to eliminate the search for the DOAs. Furthermore, as part of our analysis we show that the so-called MAPprox technique described in [WOV91] also asymptotically achieves the CRB, and we draw some connections between it and the new SSF algorithm presented herein.

The weighted SSF method presented here differs from other WSF techniques like [VO91, SK93, VS94b] in that all of the elements of the vectors spanning the signal subspace have their own individual weighting [OAKP97]. In these earlier algorithms, the form of the weighting forced certain subsets of the elements to share a particular weight. It is the generalization to individual weights that is the key to the algorithm’s optimality for very general array response perturbation models. For this reason, we will refer to the algorithm as generalized weighted subspace fitting (GWSF).

The chapter is organized as follows: In the next section, the data model is briefly introduced. In Section 3.3, more insight into the studied problem is given by presenting previous approaches. In Section 3.4, the proposed GWSF estimator is introduced, and its statistical properties are analyzed in Section 3.5. Details about the implementation of the algorithm are given in Section 3.6. In Section 3.7, we show that the MAPprox estimator has very close connections with GWSF and inherits the same asymptotic properties as GWSF. Finally, the theoretical observations are illustrated by numerical examples in Section 3.8.

3.2 Problem Formulation

Assume that the output of an array of m sensors is given by the model

x(t) = A(θ, ρ)s(t) + n(t),

where s(t) is a complex d-vector containing the emitted signal waveforms and n(t) is an additive noise vector. The array steering matrix is defined as

A(θ, ρ) = [ā(θ1, ρ) ... ā(θd, ρ)],


where ā(θi, ρ) denotes the array response to a unit waveform associated with the signal parameter θi (possibly vector valued, although we will specialize to the scalar case). The parameters in the real n-vector ρ are used to model the uncertainty in the array steering matrix. It is assumed that A(θ, ρ) is known for a nominal value ρ = ρ0 and that the columns in A(θ, ρ0) are linearly independent as long as θi ≠ θj, i ≠ j. The model for A(θ, ρ0) can be obtained for example by physical insight, or it could be the result of a calibration experiment. One may then have knowledge about the sensitivity of the nominal response to certain variations in ρ, which can be modeled by considering ρ as a random vector. The a priori information is then in the form of a probability distribution which can be used in the design of an estimation algorithm to make it robust with regard to the model errors.

We will use a stochastic model for the signals; more precisely, s(t) is considered to be a zero-mean Gaussian random vector with second order moments

E{s(t)s∗(s)} = Pδt,s,
E{s(t)s^T(s)} = 0,

where (·)∗ denotes complex conjugate transpose, (·)^T denotes transpose and δt,s is the Kronecker delta. Let d′ denote the rank of the signal covariance matrix P. Special attention will be given to cases where d′ < d, i.e., cases in which the signals may be fully correlated (coherent). In particular, we show that the proposed method is consistent even under such severe conditions. The noise is modeled as a zero-mean spatially and temporally white complex Gaussian vector with second order moments

E{n(t)n∗(s)} = σ²Iδt,s,
E{n(t)n^T(s)} = 0.

The perturbation parameter vector ρ is modeled as a Gaussian random variable with mean E{ρ} = ρ0 and covariance

E{(ρ − ρ0)(ρ − ρ0)^T} = Ω.

It is assumed that both ρ0 and Ω are known. Similarly to [VS94b, VS94a], we consider small perturbations in ρ and more specifically assume that the effect of the array errors on the estimates of θ is of comparable size to those due to the finite sample effects of the noise. To model this, we


assume that

Ω = ¯Ω/N,

where N is the number of samples and ¯Ω is independent of N. This somewhat artificial assumption concerning the model errors is made only for convenience in showing the statistical optimality of the algorithm presented later. An identical result can be obtained for small ρ − ρ0 if a first order perturbation analysis is used, but this approach is somewhat less elegant. The assumption that ρ − ρ0 = Op(1/√N) implies that the calibration errors and finite sample effects are of comparable size. (The notation ξ = Op(1/√N) means that √Nξ tends to a constant in probability as N → ∞.) Despite this fact, we will see that the resulting algorithm is optimal also for the following limiting cases (a simulation sketch of the data model follows this list):

• N → ∞ for fixed (ρ − ρ0) – (model errors only),

• (ρ − ρ0) → 0 for fixed N – (finite sample errors only).

In other words, for each of the above cases, the proposed technique converges to an algorithm that has previously been shown to be statistically efficient for that case. We can thus conclude that the algorithm is globally optimal, not just under the assumption ρ − ρ0 = Op(1/√N).
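The following minimal simulation sketch illustrates the data model above. It is an illustration, not the thesis code; as a hypothetical perturbation model, ρ collects the sensor positions of a nominal ULA (in half-wavelengths), drawn from N(ρ0, ¯Ω/N). Any other differentiable ā(θ, ρ) could be used instead:

import numpy as np

rng = np.random.default_rng(7)
m, d, N, sigma2 = 6, 2, 200, 1.0
theta = np.deg2rad([-10.0, 15.0])
rho0 = np.arange(m, dtype=float)                    # nominal sensor positions
Omega_bar = 0.01 * np.eye(m)

def steering(theta, rho):
    # Hypothetical model: a(theta)_k = exp(i * pi * rho_k * sin(theta))
    return np.exp(1j * np.pi * np.outer(rho, np.sin(theta)))

# One realization of the perturbed array, with rho - rho0 = Op(1/sqrt(N)):
rho = rng.multivariate_normal(rho0, Omega_bar / N)
A = steering(theta, rho)

P = np.array([[1.0, 0.9], [0.9, 1.0]])              # here d' = d; a rank-
L = np.linalg.cholesky(P)                           # deficient (coherent) P
S = L @ (rng.standard_normal((d, N))                # would need an eigen-based
         + 1j * rng.standard_normal((d, N))) / np.sqrt(2)  # factor instead
X = A @ S + np.sqrt(sigma2 / 2) * (rng.standard_normal((m, N))
                                   + 1j * rng.standard_normal((m, N)))
R_hat = X @ X.conj().T / N                          # sample covariance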

3.3 Robust Estimation

This section introduces some of the previous approaches taken to solve the problem considered in this chapter. The Cramér-Rao lower bound is also restated from [VS94a].

3.3.1 MAP

As the problem has been formulated, it is natural to use a maximum a posteriori (MAP) estimator. Following [WOV91], the cost function to be minimized for simultaneous estimation of the signal and perturbation parameters is

VMAP(θ, ρ) = VML(θ, ρ) + (1/2)(ρ − ρ0)^T Ω^{-1} (ρ − ρ0).

VML(θ, ρ) is the concentrated negative log-likelihood function and is given by (see, e.g., [Böh86, Jaf88])

VML(θ, ρ) = N log det{A ˆP A∗ + ˆσ²I},


where for brevity we have suppressed the arguments and where constant terms have been neglected. Here, ˆP = ˆP(θ, ρ) and ˆσ² = ˆσ²(θ, ρ) denote the maximum likelihood estimates of P and σ², respectively. They are explicitly given by:

ˆσ²(θ, ρ) = (1/(m − d)) Tr{Π⊥A(θ, ρ) ˆR},
ˆP(θ, ρ) = A†(θ, ρ) [ˆR − ˆσ²(θ, ρ)I] A†∗(θ, ρ),                       (3.1)

where

ˆR = (1/N) Σ_{t=1}^{N} x(t)x∗(t),
A† = (A∗A)^{-1}A∗,
Π⊥A = I − AA†.

For large N, it is further argued in [WOV91] that VML(θ, ρ) can be replaced with any other cost function that has the same large sample behavior as ML. In particular, it was proposed that the weighted subspace fitting (WSF) cost function [VO91] be used. This results in

VMAP-WSF(θ, ρ) = (N/ˆσ²) VWSF(θ, ρ) + (1/2)(ρ − ρ0)^T Ω^{-1} (ρ − ρ0),   (3.2)

where

VWSF(θ, ρ) = Tr{Π⊥A(θ, ρ) Ês ˆWWSF Ês∗}.                               (3.3)

Here, Ês and ˆWWSF are defined via the eigendecomposition of ˆR,

ˆR = Ês ˆΛs Ês∗ + Ên ˆΛn Ên∗,                                          (3.4)

where ˆΛs is a diagonal matrix containing the d′ largest eigenvalues and Ês is composed of the corresponding eigenvectors. The WSF weighting matrix is defined as

ˆWWSF = ˆ˜Λ² ˆΛs^{-1},

where ˆ˜Λ = ˆΛs − ˆσ²I and ˆσ² is a consistent estimate of σ² (for example, the average of the m − d′ smallest eigenvalues of ˆR). The normalizing factor N/ˆσ² in (3.2) is important to make the partial derivatives of
factor N/ˆσ 2 <strong>in</strong> (3.2) is important to make the partial derivatives of


N VWSF/ˆσ² and VML with respect to θ and ρ of comparable size locally around {θ0, ρ}. More precisely,

∂VML/∂η |_{θ0,ρ} = (N/ˆσ²) ∂VWSF/∂η |_{θ0,ρ} + op(√N),

where η denotes a generic component of the vector [θ^T ρ^T]^T, and where op(·) represents a term that tends to zero faster than its argument in probability. The minimization of either VMAP or VMAP-WSF is over both θ and ρ. The idea of the methods presented in the following two sections is to eliminate the need for an explicit minimization over ρ.

3.3.2 MAPprox

In [WOV91] it is suggested that the minimization of VMAP-WSF over ρ can be done implicitly. Assuming that ρ − ρ0 is small, a second order Taylor expansion of VMAP-WSF around ρ0 is performed, and then the result is analytically minimized with regard to ρ. This leads to a concentrated form of the cost function

VMAPprox(θ) = (N/ˆσ²) { VWSF − (1/2) ∂ρV^T_WSF [∂ρρVWSF + (ˆσ²/N)Ω^{-1}]^{-1} ∂ρVWSF },   (3.5)

where ∂ρVWSF denotes the gradient of VWSF with respect to ρ and ∂ρρVWSF denotes the Hessian. The right hand side of (3.5) is evaluated at ρ0 and θ. The method is referred to as MAPprox. In Section 3.7 we analyze the asymptotic properties of the MAPprox estimates and point out its relationship to the GWSF estimator derived in Section 3.4. A numerical sketch of (3.5) is given below.
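The following sketch is my construction (not the thesis code) of how (3.5) can be evaluated numerically for a given θ: the gradient and Hessian of VWSF with respect to ρ are approximated by central finite differences at ρ0. The function v_wsf(theta, rho) is assumed to implement (3.3) for the array model at hand; Omega, rho0, sigma2_hat, and N are as defined in the text:

import numpy as np

def grad_hess_fd(f, rho0, h=1e-5):
    # Central-difference gradient and Hessian of f at rho0.
    n = len(rho0)
    g = np.zeros(n)
    H = np.zeros((n, n))
    e = np.eye(n)
    for i in range(n):
        g[i] = (f(rho0 + h * e[i]) - f(rho0 - h * e[i])) / (2 * h)
        for j in range(i, n):
            fpp = f(rho0 + h * e[i] + h * e[j])
            fpm = f(rho0 + h * e[i] - h * e[j])
            fmp = f(rho0 - h * e[i] + h * e[j])
            fmm = f(rho0 - h * e[i] - h * e[j])
            H[i, j] = H[j, i] = (fpp - fpm - fmp + fmm) / (4 * h * h)
    return g, H

def v_mapprox(theta, v_wsf, rho0, Omega, sigma2_hat, N):
    f = lambda rho: v_wsf(theta, rho)
    g, H = grad_hess_fd(f, rho0)
    inner = H + (sigma2_hat / N) * np.linalg.inv(Omega)   # bracket in (3.5)
    return (N / sigma2_hat) * (f(rho0) - 0.5 * g @ np.linalg.solve(inner, g))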

3.3.3 MAP-NSF

Another method for approximating the MAP estimator is the MAP-NSF approach of [VS94a]. In MAP-NSF, one uses the fact that the noise subspace fitting (NSF) method is also asymptotically equivalent to ML [OVSN93]. Thus, in lieu of using the WSF criterion in (3.2), the NSF cost function

VNSF = Tr{A∗ÊnÊn∗A Û}                                                  (3.6)

is employed. Here, Û is a consistent estimate of the matrix

U = A†Es ˜Λ² Λs^{-1} Es∗A†∗.

As shown below, this also leads to an explicit solution for ρ when ρ − ρ0 is small.

The NSF cost function can be rewritten using the following general identities:

Tr{AB} = vec^T(A^T) vec(B),                                            (3.7)
vec(ABC) = (C^T ⊗ A) vec(B),                                           (3.8)

where vec(·) is the vectorization operation and ⊗ denotes the Kronecker matrix product [Bre78, Gra81]. An equivalent form of (3.6) is then given by

VNSF = a∗ ˆM a,

where

a = vec(A),
ˆM = Û^T ⊗ (ÊnÊn∗).                                                    (3.9)
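The identities (3.7) and (3.8) are easy to confirm numerically; the sketch below (illustrative, not from the original text) does so. Note that NumPy stores arrays row-major, so the column-stacking vec(X) is X.flatten(order='F'):

import numpy as np

rng = np.random.default_rng(8)
vec = lambda X: X.flatten(order="F")

A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
C = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

assert np.isclose(np.trace(A @ B), vec(A.T) @ vec(B))            # (3.7)
assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))     # (3.8)
print("identities (3.7) and (3.8) verified")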

The next idea is to approximate the steering matrix around ρ0. A first-order Taylor expansion gives

a(θ, ρ) ≈ a0 + Dρ(ρ − ρ0),                                             (3.10)

where

a0 = a(θ, ρ0),
Dρ = [ ∂a(θ, ρ)/∂ρ1  ...  ∂a(θ, ρ)/∂ρn ] |_{θ,ρ0}.                     (3.11)

Since the derivatives of both a and the approximation in (3.10) with respect to any parameter in θ or ρ are identical when evaluated at ρ0, it can be shown that this approximation will not affect the asymptotic properties of the estimates. Inserting (3.10) in (3.6) and using the resulting cost function to replace VWSF in (3.2) leads to

(N/ˆσ²) (a0 + Dρ˜ρ)∗ ˆM (a0 + Dρ˜ρ) + (1/2) ˜ρ^T Ω^{-1} ˜ρ,            (3.12)


where ˜ρ = ρ − ρ0. This cost function is quadratic in ρ and can thus, for fixed θ, be explicitly minimized with respect to ρ. Substituting the solution for ρ back into (3.12) gives

VMAP-NSF(θ) = a0∗ ˆM a0 − ˆf^T ˆΓ^{-1} ˆf,                             (3.13)

where

ˆf = Re{Dρ∗ ˆM a0},
ˆΓ = Re{Dρ∗ ˆM Dρ + (ˆσ²/2) ¯Ω^{-1}}.                                  (3.14)

The function in (3.13) depends on θ via a0 and Dρ. It is noted in [VS94a] that Dρ (and ˆM) can be replaced with a consistent estimate without affecting the large sample second order properties.

The following procedure is suggested by the derivations above:

1. Compute the eigendecomposition of ˆR. Obtain a consistent estimate of θ using, for example, MUSIC.

2. Based on the estimate from the first step, compute consistent estimates of Dρ and ˆM. Insert these estimates in the cost function (3.13), which thus depends on θ only via a0. Take the minimizing argument of the resulting cost function as the MAP-NSF estimate of θ.

In [VS94a], the statistical properties of the MAP-NSF estimate of θ are analyzed. It is shown that MAP-NSF is a consistent and asymptotically efficient estimator if d′ = d. This means that the asymptotic covariance matrix of the estimation error in θ is equal to the Cramér-Rao lower bound given in the next section.
bound given <strong>in</strong> the next section.<br />
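For concreteness, the concentrated cost (3.13)-(3.14) may be evaluated as in the following sketch, where all inputs are assumed to be the consistent first-step estimates described above (a minimal illustration, not an optimized implementation):

    import numpy as np

    # Sketch of the concentrated MAP-NSF cost (3.13)-(3.14). The inputs
    # a0 = vec(A) at (theta, rho0), Drho, M_hat, sigma2_hat and Omega_bar
    # are assumed to be consistent estimates from the first step.
    def map_nsf_cost(a0, Drho, M_hat, sigma2_hat, Omega_bar):
        f = np.real(Drho.conj().T @ (M_hat @ a0))            # f-hat in (3.14)
        Gamma = np.real(Drho.conj().T @ M_hat @ Drho
                        + 0.5 * sigma2_hat * np.linalg.inv(Omega_bar))
        V = np.real(a0.conj() @ (M_hat @ a0)) - f @ np.linalg.solve(Gamma, f)
        return float(V)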

3.3.4 Cramér-Rao Lower Bound

In [WOV91, VS94a] an asymptotically valid Cramér-Rao lower bound is derived for the problem of interest herein. Below we give the lower bound on the signal parameters only.

Theorem 3.1. Assume that θ̂ is an asymptotically unbiased estimate of θ0. Then, for large N,

E{(θ̂ − θ0)(θ̂ − θ0)ᵀ} ≥ CRBθ ≜ (σ²/2N)(C − FθᵀΓ⁻¹Fθ)⁻¹,  (3.15)

where

C = Re{Dθ∗MDθ},
M = Uᵀ ⊗ Π⊥A,  (3.16)
Dθ = [∂a(θ,ρ)/∂θ1 ⋯ ∂a(θ,ρ)/∂θd],
Fθ = Re{Dρ∗MDθ},  (3.17)
Γ = Re{Dρ∗MDρ + (σ²/2)Ω̄⁻¹}.  (3.18)

The above expressions are evaluated at θ0 and ρ0. Notice that M is the limit of M̂ defined in (3.9) if d′ = d.

Proof. See [WOV91, VS94a].

It is important to notice that the CRB for the case with no calibration errors, σ²C⁻¹/2N, is clearly a lower bound for the MAP-CRB in (3.15): since FθᵀΓ⁻¹Fθ is non-negative definite, C − FθᵀΓ⁻¹Fθ ≤ C and hence σ²C⁻¹/2N ≤ CRBθ.

3.4 Generalized Weighted Subspace Fitting (GWSF)

It was shown in [VS94a] that the MAP-NSF method is asymptotically statistically efficient. One drawback with MAP-NSF is that it is not consistent if the sources are fully correlated, that is, if d′ < d. This problem can be overcome in the signal subspace formulation. In this section, we derive a new method in the signal subspace fitting (SSF) class of algorithms. The method will be shown to possess the same large sample performance as MAP-NSF and is thus also asymptotically efficient. A further advantage of the proposed SSF approach is that the cost function depends on the parameters in a relatively simple manner and, if the nominal array is uniform and linear, the non-linear minimization can be replaced by a rooting technique (see Section 3.6).

According to the problem formulation, R̂ − R = Op(1/√N), where

R = E{x(t)x∗(t)} = A0PA0∗ + σ²I

and A0 = A(θ0,ρ0). This implies that Ês → Es = A0T at the same rate, where T is a d × d′ full rank matrix. The idea of the proposed method is to minimize a suitable norm of the residuals

ε = vec(B∗(θ)Ês),

where B(θ) is an m × (m − d) full rank matrix whose columns span the null-space of A∗(θ). This implies that B∗(θ)A(θ) = 0 and B∗(θ0)Es = 0.

Remark 3.1. For general arrays, there is no known closed form parameterization of B in terms of θ. However, this will not introduce any problems since, as we show in Section 3.6, the final criterion can be written as an explicit function of θ.

For general array error models, we have to consider the real and imaginary parts of ε separately to get optimal (minimum variance) performance. Equivalently, we may study ε and its complex conjugate, which we denote by ε^c (cf. Section 2.3.5). We believe that the analysis of the method is clearer if ε and ε^c are used instead of the real and imaginary parts. However, from an implementational point of view, it may be more suitable to use real arithmetic. Before presenting the estimation criterion, we define (block matrices are written with ";" separating block rows):

ês = [vec(Ês); vec(Ês^c)],

Ā = [(Id′ ⊗ A) 0; 0 (Id′ ⊗ A^c)],

B̄ = [(Id′ ⊗ B) 0; 0 (Id′ ⊗ B^c)],

and

ε̄ = [ε; ε^c] = B̄∗ês.

Notice that the columns of B̄ span the null-space of Ā∗.

We propose to estimate the signal parameters θ as follows:

θ̂ = arg minθ VGWSF(θ),  (3.19)
VGWSF(θ) = ε̄∗(θ)Wε̄(θ),  (3.20)

where W is a positive definite weighting matrix. The method will in the sequel be referred to as the generalized weighted subspace fitting (GWSF) method. To derive the weighting that leads to minimum variance estimates of θ, we need to compute the residual covariance matrix. As shown in Appendix A.2, the (asymptotic) second order moment of the residual vector ε̄ at θ0 can be written as

Cε̄ = limN→∞ N E{ε̄ε̄∗}  (3.21)
    = L̄ + ḠḠ∗,  (3.22)

where we have defined

L̄ = [L 0; 0 L^c],  (3.23)
L = σ²(Λ̃⁻²Λs ⊗ B∗B),
Ḡ = [G; G^c],  (3.24)
G = (Tᵀ ⊗ B∗)DρΩ̄^(1/2).

Here, Ω̄^(1/2) is a (symmetric) square root of Ω̄ and T = A0†Es. It is easy to see that Cε̄ is positive definite since L is. It is then well known that (in the class of estimators based on ε̄) the optimal choice of the weighting in terms of minimizing the parameter estimation error variance is

WGWSF = Cε̄⁻¹  (3.25)

(see, e.g., Section 2.3.5 and [SS89]). In the next section it is proven that this weighting in fact also gives asymptotically (in N) minimum variance estimates (in the class of unbiased estimators) for the estimation problem under consideration. The implementation of GWSF is discussed in Section 3.6. We conclude this section with two remarks regarding the GWSF formulation.

Remark 3.2. If there are no model errors, then GWSF reduces to the WSF estimator (3.3). This can easily be verified by setting Ω̄ = 0 and rewriting the GWSF criterion (3.20).

Remark 3.3. For the case of model errors only (N → ∞ or σ² → 0), GWSF becomes the "model errors only" algorithm [SK93]. The results obtained for GWSF are also consistent with the weightings given in [VS94b] for special array error models.

Thus, the GWSF method can be considered to be optimal in general, and not just when the model errors and the finite sample effects are of the same order.
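At the level of shapes, the residual ε̄ and the optimal weighting (3.25) can be assembled as in the sketch below; all inputs (B, T, Dρ, the diagonal matrices Λs and Λ̃, σ² and a square root of Ω̄) are assumed given:

    import numpy as np

    vec = lambda X: X.reshape(-1, order='F')

    def gwsf_residual(B, Es_hat):
        # eps = vec(B*(theta) Es_hat); eps_bar = [eps; eps^c], cf. (3.20)
        eps = vec(B.conj().T @ Es_hat)
        return np.concatenate([eps, eps.conj()])

    def gwsf_weight(B, T, Drho, Lam_s, Lam_tilde, sigma2, Omega_bar_sqrt):
        BB = B.conj().T @ B
        D = np.linalg.solve(Lam_tilde @ Lam_tilde, Lam_s)  # Lam_tilde^-2 Lam_s
        L = sigma2 * np.kron(D, BB)                        # L in (3.23)
        Z = np.zeros_like(L)
        Lbar = np.block([[L, Z], [Z, L.conj()]])           # L_bar in (3.23)
        G = np.kron(T.T, B.conj().T) @ Drho @ Omega_bar_sqrt  # G in (3.24)
        Gbar = np.vstack([G, G.conj()])
        C_eps = Lbar + Gbar @ Gbar.conj().T                # C_eps in (3.22)
        return np.linalg.inv(C_eps)                        # W_GWSF, (3.25)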



3.5 Performance Analysis

The asymptotic properties of the GWSF estimates are analyzed in this section. It is shown that the estimates are consistent and have a limiting Gaussian distribution. It is also shown that the asymptotic covariance matrix of the estimation error is equal to the CRB.

3.5.1 Consistency

The cost function (3.20) converges uniformly in θ with probability one (w.p.1) to the limit

V̄GWSF(θ) = [vec(B∗(θ)Es); vec^c(B∗(θ)Es)]∗ W [vec(B∗(θ)Es); vec^c(B∗(θ)Es)].  (3.26)

This implies that θ̂ converges to the minimizing argument of (3.26). Since W is positive definite, V̄GWSF(θ) ≥ 0 with equality if and only if B∗(θ)Es = 0. However, it is proven in [WZ89] that this uniquely determines θ0 if d < (m + d′)/2. We have thus proven that θ̂ in (3.19) tends to θ0 w.p.1 as N → ∞.

3.5.2 Asymptotic Distribution

In this section we derive the asymptotic distribution of the GWSF estimates (3.19). As already mentioned, there exists a certain weighting matrix (3.25) that minimizes the estimation error variance. This matrix depends on the true parameters and, in practice, has to be replaced with a consistent estimate. However, this will not change the second order asymptotic properties considered herein. In fact, it is easy to see below that the derivative of the cost function with respect to W only leads to higher order terms. Thus, whether W is considered as a fixed matrix or a consistent estimate thereof does not matter for the analysis in this section.

The estimate that minimizes (3.20) satisfies by definition V′(θ̂) = 0, and since θ̂ is consistent, a Taylor series expansion of V′(θ̂) around θ0 leads to

θ̃ = −H⁻¹V′ + op(1/√N),  (3.27)

where θ̃ = θ̂ − θ0, H is the limiting Hessian, and V′ is the gradient of V(θ). Both V′ and H are evaluated at θ0. In the following, let the index i denote the partial derivative with respect to θi. The derivative of V(θ) is given by

V′i = ε̄i∗Wε̄ + ε̄∗Wε̄i = 2Re{ε̄i∗Wε̄}.  (3.28)

It is straightforward to verify that ε̄i∗Wε̄ is real for any weighting matrix having the form

W = [W11 W12; W12^c W11^c].  (3.29)

In particular, the matrix WGWSF in (3.25) has this structure. Next notice that ε̄ = B̄∗ês and thus

ε̄i = B̄i∗ês = B̄i∗es + op(1),  (3.30)

where es denotes the limit of ês. Define the vector

ki = B̄i∗es = [vec(Bi∗Es); vec^c(Bi∗Es)],

and insert (3.30) in (3.28) to show that

V′i ≃ 2ki∗Wε̄,  (3.31)

for any W satisfying (3.29), where ≃ represents an equality up to first order; that is, terms of order op(1/√N) are omitted. Recall the relation (A.5) and that √N(R̂ − R) is asymptotically Gaussian by the central limit theorem. By (3.27) and (3.31) we thus conclude that the normalized estimation error has a limiting Gaussian distribution

√N θ̃ ∈ AsN(0, H⁻¹QH⁻¹),

where

Q = limN→∞ N E{V′V′ᵀ},  (3.32)

and where the notation AsN means asymptotically Gaussian distributed. In summary we have the following result.

Theorem 3.2. If d < (m + d′)/2, then the GWSF estimate (3.19) tends to θ0 w.p.1 as N → ∞, and

√N(θ̂ − θ0) ∈ AsN(0, H⁻¹QH⁻¹).

If W = WGWSF given in (3.25), then

H⁻¹QH⁻¹ = N·CRBθ.

Expressions for H and Q can be found in Appendix A.3 and CRBθ is given in (3.15). All quantities are evaluated at θ0.

Proof. The result follows from the analysis above and the calculations in Appendix A.3.

This result is similar to what was derived for the MAP-NSF method in [VS94a]. It implies that GWSF and MAP-NSF are asymptotically equivalent and efficient for large N and small Ω (= Ω̄/N). An advantage of GWSF as compared to MAP-NSF is that GWSF is consistent even if P is rank deficient (d′ < d). Another advantage is that the minimization of VGWSF(θ) can be performed without a search if the nominal array is uniform and linear, as we show in Section 3.6. In short, GWSF is a generalization of WSF [VO91] and MODE [SS90a] that allows one to include a priori knowledge of the array perturbations into the estimation criterion.

3.6 Implementation<br />

We will <strong>in</strong> this section discuss <strong>in</strong> some detail the implementation of the<br />

GWSF estimator. Recall that the GWSF criterion is written <strong>in</strong> terms<br />

of the matrix ¯ B(θ) <strong>and</strong> that the parameterization of ¯ B <strong>in</strong> general is<br />

not known. In this section we first rewrite the GWSF criterion so that<br />

it becomes explicitly parameterized by θ. This function can then be<br />

m<strong>in</strong>imized numerically to give the GWSF estimate of θ. Next, <strong>in</strong> Section<br />

3.6.2, we present the implementation for the common special case<br />

when the nom<strong>in</strong>al array is uniform <strong>and</strong> l<strong>in</strong>ear. For this case, it is possible<br />

to implement GWSF without the need for gradient search methods.<br />

Thus, GWSF is an <strong>in</strong>terest<strong>in</strong>g direction of arrival estimator which comb<strong>in</strong>es<br />

statistical efficiency with computational simplicity (relative to other<br />

efficient estimators).<br />

3.6.1 Implementation for General Arrays<br />

The GWSF cost function <strong>in</strong> (3.20) reads<br />

VGWSF(θ) = ¯ε ∗ (θ)W¯ε(θ) = ê ∗ s ¯ B(θ)W ¯ B ∗ (θ)ês. (3.33)


3.6 Implementation 77<br />

What complicates matters is that the functional form of ¯ B(θ) is not<br />

known for general array geometries. However, if the optimal weight<strong>in</strong>g<br />

<strong>in</strong> (3.25) is used <strong>in</strong> (3.33), then it is possible to rewrite (3.33) so that it<br />

depends explicitly on θ. To see how this can be done, we will use the<br />

follow<strong>in</strong>g two lemmas.<br />

Lemma 3.1. Let X and Y be two full column rank matrices of dimension m × n and m × (m − n), respectively. If X∗Y = 0 then

[X Y]⁻¹ = [X†; Y†],

where (·)† denotes the Moore-Penrose pseudo-inverse of the matrix in question (X† = (X∗X)⁻¹X∗).

Proof. By direct verification we have

[X†; Y†][X Y] = [X†X X†Y; Y†X Y†Y] = Im,

since X†X = In, Y†Y = Im−n, and X†Y = 0 = Y†X by the assumption X∗Y = 0.

Lemma 3.2. Let X (m × n) and Y (m × (m − n)) be two full column rank matrices. Furthermore, assume that X∗Y = 0, and let C and D be two non-singular matrices. Then

XC⁻¹X∗ = ΠX(X†∗CX† + Y†∗DY†)⁻¹ΠX,

where ΠX = X(X∗X)⁻¹X∗ is the projection matrix projecting orthogonally onto the range-space of X.

Proof. Note that ΠXY = 0. The lemma is proven by the following set of equalities:

XC⁻¹X∗ = ΠX XC⁻¹X∗ ΠX
        = ΠX [X Y][C⁻¹ 0; 0 D⁻¹][X Y]∗ ΠX
        = ΠX ([X Y]∗⁻¹[C 0; 0 D][X Y]⁻¹)⁻¹ ΠX
        = ΠX ([X†∗ Y†∗][C 0; 0 D][X†; Y†])⁻¹ ΠX
        = ΠX (X†∗CX† + Y†∗DY†)⁻¹ ΠX,

where use was made of Lemma 3.1 in the fourth step.
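Both lemmas are readily confirmed numerically; in the following sketch, Y is taken as an orthonormal basis of the orthogonal complement of the range of a random X (illustration only):

    import numpy as np

    rng = np.random.default_rng(1)
    m, n = 7, 3
    X = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    Y = np.linalg.svd(X)[0][:, n:]       # basis of the complement, X*Y = 0
    pinv = np.linalg.pinv
    Pi_X = X @ pinv(X)                   # orthogonal projector onto range(X)

    # Lemma 3.1: [X Y]^{-1} = [X_dagger; Y_dagger]
    assert np.allclose(np.linalg.inv(np.hstack([X, Y])),
                       np.vstack([pinv(X), pinv(Y)]))

    # Lemma 3.2 with random non-singular C and D
    C = rng.standard_normal((n, n)) + n * np.eye(n)
    D = rng.standard_normal((m - n, m - n)) + m * np.eye(m - n)
    lhs = X @ np.linalg.inv(C) @ X.conj().T
    mid = pinv(X).conj().T @ C @ pinv(X) + pinv(Y).conj().T @ D @ pinv(Y)
    assert np.allclose(lhs, Pi_X @ np.linalg.inv(mid) @ Pi_X)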

Now apply Lemma 3.2 to the GWSF criterion (3.33). With reference to Lemma 3.2, take X = B̄, Y = Ā, C = W⁻¹ and, for example, D = (Ā∗Ā)² to show that

VGWSF(θ) = ês∗Π⊥Ā W̃ Π⊥Ā ês,  (3.34)

where

W̃ = (B̄†∗W⁻¹B̄† + ĀĀ∗)⁻¹.  (3.35)

In (3.34), we used the fact that ΠB̄ = Π⊥Ā. Let W be chosen according to (3.25). It is then readily verified that

B̄†∗W⁻¹B̄† = B̄†∗Cε̄B̄† = [L̃ 0; 0 L̃^c] + [G̃; G̃^c][G̃∗ G̃ᵀ],  (3.36)

where

L̃ = σ²(Λ̃⁻²Λs ⊗ Π⊥A),
G̃ = (Tᵀ ⊗ Π⊥A)DρΩ̄^(1/2).

Combining (3.36), (3.35) and (3.34), we see that the criterion is now explicitly parameterized by θ. The GWSF algorithm for general array geometries can be summarized as follows:

1. Based on R̂, compute Ês, Λ̂s, and

σ̂² = (Tr(R̂) − Tr(Λ̂s))/(m − d′),
Λ̃̂ = Λ̂s − σ̂²I.

(A sketch of this step is given after the list.)

2. The GWSF estimate of θ is then given by

θ̂ = arg minθ ês∗Π⊥Ā W̃̂ Π⊥Ā ês,

where W̃̂ is given by (3.35) and (3.36) with the sample estimates from the first step inserted in place of the true quantities.
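The first step can be sketched as follows (the assumed inputs are the sample covariance R̂ and the signal subspace dimension d′; the second step requires a numerical minimizer and is not shown):

    import numpy as np

    def gwsf_step1(R_hat, dprime):
        # Eigendecomposition of the sample covariance; d' signal eigenpairs.
        lam, E = np.linalg.eigh(R_hat)              # ascending eigenvalues
        lam, E = lam[::-1], E[:, ::-1]              # sort descending
        Es, lam_s = E[:, :dprime], lam[:dprime]
        m = R_hat.shape[0]
        sigma2 = (np.trace(R_hat).real - lam_s.sum()) / (m - dprime)
        lam_tilde = lam_s - sigma2                  # diagonal of Lam_tilde
        return Es, np.diag(lam_s), sigma2, np.diag(lam_tilde)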


This results in a non-linear optimization problem where θ enters in a complicated way. From the analysis in Section 3.5 it follows that the asymptotic second order properties of θ̂ are not affected if the weighting matrix W̃ is replaced by a consistent estimate. Hence, if a consistent estimate of θ0 is used to compute W̃, then the criterion in the second step only depends on θ via Π⊥Ā. This may lead to some computational savings, but it still results in a non-linear optimization problem. In the next section, we turn to the common special case of a uniform linear array, for which it is possible to avoid gradient search methods.

3.6.2 Implementation for Uniform Linear Arrays

In this section, we present a way to significantly simplify the implementation of GWSF if the nominal array is linear and uniform. The form of the estimator is in the spirit of IQML [BM86] and MODE [SS90b, SS90a]. Many of the details concerning the implementation are borrowed from [SS90b]. Define the following polynomial with d roots on the unit circle corresponding to the DOAs:

b0z^d + b1z^(d−1) + ⋯ + bd = b0 ∏k=1..d (z − e^(jωk)),  b0 ≠ 0,  (3.37)

where ωk = 2πδ sin(θk) and where δ is the separation between the sensors measured in wavelengths. The idea is to re-parameterize the minimization problem in terms of the polynomial coefficients instead of θ. As will be seen below, this leads to a considerable computational simplification. Observe that the roots of (3.37) lie on the unit circle. A common way to impose this constraint on the coefficients in the estimation procedure is to enforce the conjugate symmetry property

bk = b^c_(d−k),  k = 0, …, d.  (3.38)

This only approximately ensures that the roots are on the unit circle. However, notice that if zi is a root of (3.37) with conjugate symmetric coefficients, then so is 1/zi^c. Assume now that we have a set of perturbed polynomial coefficients satisfying (3.38). As long as the perturbation is small enough, the d roots of (3.37) will be on the unit circle since the true DOAs are distinct. That is, for large enough N, the conjugate symmetry property in fact guarantees that the roots are on the unit circle and the asymptotic properties of the estimates will not be affected.
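The correspondence between the DOAs and the polynomial coefficients in (3.37) is easily exercised numerically; a small sketch for an assumed half-wavelength spacing (δ = 0.5):

    import numpy as np

    # DOAs -> polynomial coefficients of (3.37) -> DOAs, with delta = 0.5.
    delta = 0.5
    theta = np.deg2rad([0.0, 5.0])                 # example DOAs
    omega = 2 * np.pi * delta * np.sin(theta)
    b = np.poly(np.exp(1j * omega))                # [b0, ..., bd] with b0 = 1

    # Back from the coefficients to the DOAs via the roots:
    z = np.roots(b)
    theta_rec = np.rad2deg(np.arcsin(np.angle(z) / (2 * np.pi * delta)))
    print(np.sort(theta_rec))                      # approximately [0, 5]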



Next, define the m × (m − d) Toeplitz matrix B as

B∗ = [bd ⋯ b1 b0 0  ⋯ 0;
      0  bd ⋯ b1 b0 ⋱ ⋮;
      ⋮     ⋱  ⋱    ⋱ 0;
      0  ⋯ 0  bd ⋯ b1 b0].  (3.39)

It is readily verified that B∗A(θ) = 0, and since the rank of B is m − d (by construction, since b0 ≠ 0), it follows that the columns of B span the null-space of A∗. This implies that we can use B in (3.39) in the GWSF criterion, minimize with respect to the polynomial coefficients, and then compute θ from the roots of the polynomial (3.37). In the following we give the details on how this can be performed.
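The matrix B∗ in (3.39) and the property B∗A(θ) = 0 can be checked as follows (the nominal half-wavelength ULA with unit-gain sensors is an assumption of this illustration):

    import numpy as np
    from scipy.linalg import toeplitz

    m, delta = 8, 0.5
    theta = np.deg2rad([-10.0, 0.0, 20.0])
    omega = 2 * np.pi * delta * np.sin(theta)
    b = np.poly(np.exp(1j * omega))                  # [b0, ..., bd], b0 = 1
    d = len(b) - 1

    # Toeplitz B* of (3.39): first column [b_d, 0, ...], first row reversed b.
    Bs = toeplitz(np.r_[b[d], np.zeros(m - d - 1)],
                  np.r_[b[::-1], np.zeros(m - d - 1)])
    A = np.exp(1j * np.outer(np.arange(m), omega))   # nominal ULA steering
    print(np.abs(Bs @ A).max())                      # ~ 0 (machine precision)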

It is useful to rewrite the GWSF criterion as follows (cf. (3.20) and (3.25)):

VGWSF = ε̄∗WGWSFε̄ = [vec(B∗ÊW); vec(BᵀÊW^c)]∗ W̄ [vec(B∗ÊW); vec(BᵀÊW^c)],  (3.40)

where

ÊW = (1/√σ̂²) ÊsΛ̃̂Λ̂s^(−1/2),  (3.41)

W̄⁻¹ = [(W̄⁻¹)11 (W̄⁻¹)12; (W̄⁻¹)12^c (W̄⁻¹)11^c].  (3.42)

The blocks in W̄⁻¹ are given by

(W̄⁻¹)11 = (Id′ ⊗ B∗B) + (T̄ᵀ ⊗ B∗)DρΩ̄Dρ∗(T̄^c ⊗ B),
(W̄⁻¹)12 = (T̄ᵀ ⊗ B∗)DρΩ̄Dρᵀ(T̄ ⊗ B^c),

where T̄ = TΛ̃̂Λ̂s^(−1/2)/√σ̂² = A†ÊW. Let ÊW•k denote column k and ÊWi,j the (i,j)th element of ÊW, and define the vector

b = [b0 b1 ⋯ bd]ᵀ.

Observe that

B∗ÊW•k = [ÊWd+1,k ÊWd,k   ⋯ ÊW1,k;
          ÊWd+2,k ÊWd+1,k ⋯ ÊW2,k;
          ⋮;
          ÊWm,k   ÊWm−1,k ⋯ ÊWm−d,k] b ≜ S̃kb

for k = 1, …, d′, and define

S = [S̃1; ⋮; S̃d′].

Then we have

vec(B∗ÊW) = Sb.
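The matrices S̃k are formed directly from the entries of ÊW; continuing the previous sketch (with the nominal A reused as a stand-in for ÊW), the relation vec(B∗ÊW) = Sb can be verified as follows:

    import numpy as np

    def S_matrix(E_W, d):
        # Entry (r, l) of S_tilde_k is E_W[d + r - l, k] (0-based r and l).
        m, dprime = E_W.shape
        blocks = [np.array([[E_W[d + r - l, k] for l in range(d + 1)]
                            for r in range(m - d)])
                  for k in range(dprime)]
        return np.vstack(blocks)

    E_W = np.exp(1j * np.outer(np.arange(m), omega))  # stand-in for E_W-hat
    S = S_matrix(E_W, d)
    assert np.allclose(S @ b, (Bs @ E_W).reshape(-1, order='F'))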

Let β = [b0 ⋯ b⌊(d−1)/2⌋]ᵀ, where ⌊·⌋ denotes the largest integer less than or equal to the argument. In view of the conjugate symmetry constraint (3.38),

b = [β; (bd/2); Ĩβ^c],

where the real valued parameter bd/2 only appears if d is even. Here, Ĩ is the exchange matrix with ones along the anti-diagonal and zeros elsewhere. Partition S = [S1 (s) S2] conformably with b; the middle column s likewise appears only if d is even. Clearly, we have

Sb = [S1 (s) S2][βr + jβi; (bd/2); Ĩ(βr − jβi)]
   = [S1 + S2Ĩ (s) j(S1 − S2Ĩ)][βr; (bd/2); βi]
   ≜ Fμ,

where the subscripts r and i denote real and imaginary parts, respectively. With the notation introduced above, the criterion (3.40) can be written as

VGWSF = μᵀ[F; F^c]∗ W̄ [F; F^c]μ.

To obtain a proper parameterization we should also include a non-triviality constraint to avoid the solution b = 0 (μ = 0). In [SS90b] it is proposed that either b0r or b0i can be set to one. In what follows we assume, for simplicity, that it is possible to put b0r = μ1 = 1. For particular cases this may not be possible; in such cases one should instead use the constraint b0i = 1. As an alternative to these linear constraints, one can use a norm constraint on μ (see, e.g., [NK94]). The key advantage of the reformulation of the problem in terms of μ is that VGWSF is quadratic in μ, and thus it can be solved explicitly.

A consistent estimate of θ0 can be obtained by minimizing ‖vec(B∗ÊW)‖²₂, since B∗Es = 0 uniquely determines θ0. Furthermore, this also implies that replacing the weighting matrix W̄ in (3.40) by a consistent estimate does not change the asymptotic distribution of the estimates. In summary, we have the following GWSF algorithm for uniform linear arrays:

1. Based on R̂, compute Ês, Λ̂s, and

σ̂² = (Tr(R̂) − Tr(Λ̂s))/(m − d′),
Λ̃̂ = Λ̂s − σ̂²I.

2. Using the notation introduced above, an initial estimate can be obtained by minimizing

‖vec(B∗ÊW)‖²₂ = ‖Sb‖²₂ = ‖Fμ‖²₂ = ‖Jη + c‖²₂  (3.43)

over η ∈ R^d, where η, J and c are defined via

Fμ = [c J][1; η] = Jη + c.

The norm should be interpreted as ‖x‖²₂ = x∗x. Using the minimizer η̂ of (3.43), an estimate of b can be constructed. The initial estimate of θ0 is then found via the roots of (3.37). Notice that η̂ is the least-squares solution to the over-determined set of equations given in (3.43).

3. Given Ω̄ and an initial estimate θ̂ (and b̂), compute T̄̂ = A†(θ̂)ÊW, B(b̂) and Dρ(θ̂). Use these quantities to obtain a consistent estimate of W̄⁻¹ in (3.42). Let W̄^(−1/2) denote the Cholesky factor of W̄⁻¹ such that W̄⁻¹ = W̄^(−1/2)W̄^(−∗/2). Solve the following linear system of equations for Fw:

W̄^(−1/2)Fw = [F; F^c],

and define Jw and cw via

Fwμ = [cw Jw][1; η] = Jwη + cw.

The solution of this step is given by

η̂ = arg minη∈R^d ‖Jwη + cw‖²₂.

The GWSF estimate θ̂ is then constructed from η̂ as described before.

In finite samples and difficult scenarios it may be useful to reiterate the third step several times; however, this does not improve the asymptotic statistical properties of the estimate.
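Both the second and the third step thus reduce to an ordinary least-squares problem in the real vector η. A sketch, under the assumption that the constraint b0r = μ1 = 1 is admissible:

    import numpy as np

    def solve_step(F):
        # F mu = c + J eta with mu = [1; eta]; stacking real and imaginary
        # parts enforces eta in R^d in the least-squares solution.
        c, J = F[:, 0], F[:, 1:]
        Jr = np.vstack([J.real, J.imag])
        cr = np.concatenate([c.real, c.imag])
        eta, *_ = np.linalg.lstsq(Jr, -cr, rcond=None)
        return np.r_[1.0, eta]                     # mu-hat = [1; eta-hat]

    # Step 2 uses F itself. Step 3 first whitens with the Cholesky factor
    # W_bar_inv_half of a consistent estimate of W_bar^{-1}:
    #     F_w = np.linalg.solve(W_bar_inv_half, np.vstack([F, F.conj()]))
    #     mu_hat = solve_step(F_w)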

In this section, we have shown that GWSF can be implemented in a very attractive manner if the nominal array is uniform and linear. In particular, the solution can be obtained in "closed form" by solving a set of linear systems of equations and rooting the polynomial in (3.37). Thus, there is no need for an iterative optimization procedure like that necessary in, for example, MAP-NSF.

3.7 Performance Analysis of MAPprox

The MAPprox cost function (3.5) was derived by a second order approximation of the MAP-WSF cost function (3.2) around ρ0. The MAPprox estimates are thus expected to have a performance similar to ML. This has also been observed in simulations [WOV91, VS94a]. However, a formal analysis of the asymptotic properties of MAPprox has not been conducted. In this section we show that, using asymptotic approximations, the MAPprox cost function in fact can be rewritten to coincide with the GWSF cost function. This establishes the observed asymptotic efficiency of the MAPprox method.

Consider the normalized MAPprox cost function (cf. (3.5))

(1/N)VMAPprox(θ) = V(θ) − (1/2)∂ρVᵀ(θ)(∂ρρV(θ) + Ω̄⁻¹)⁻¹∂ρV(θ),  (3.44)

where

V(θ) = V(θ,ρ0)  (3.45)

and

V(θ,ρ) = (1/σ̂²)Tr{Π⊥A(θ,ρ)ÊsŴWSFÊs∗}.  (3.46)

The gradient and the Hessian in (3.44) are evaluated at ρ0. One problem with VMAPprox(θ) is that the inverse (∂ρρV(θ) + Ω̄⁻¹)⁻¹ may not exist for all θ. This implies that the consistency of the MAPprox estimate cannot be guaranteed. However, locally around θ0 the function is well behaved. In the following we will approximate (3.44) around θ0. The approximation retains the second order behavior of VMAPprox(θ) around θ0 and is well defined for all θ, implying that the consistency problem is eliminated. More precisely, we will study (3.44) for θ such that θ − θ0 = Op(1/√N).

Let us start by deriving the gradient ∂ρV. In what follows we write Π⊥ in place of Π⊥A, for simplicity of notation. The derivative of (3.46) with respect to ρi is

∂ρiV = (1/σ̂²)Tr{Π⊥i ÊsŴWSFÊs∗} = −(2/σ̂²)Re[Tr{Π⊥AiA†ÊsŴWSFÊs∗}],

where we have used the following result:

Π⊥i = −Π⊥AiA† − A†∗Ai∗Π⊥.

Using the definition (3.11) and the formula (3.7), we can write

∂ρV(θ) = ∂ρV|θ,ρ0 = −(2/σ̂²)Re{Dρ∗vec(Π⊥ÊsŴWSFÊs∗A†∗)} = Op(1/√N).  (3.47)

Consider next the (i,j)th element of the Hessian matrix

∂ρiρjV = −(2/σ̂²)Re[Tr{Π⊥jAiA†ÊsŴWSFÊs∗ + Π⊥AijA†ÊsŴWSFÊs∗ + Π⊥AiA†jÊsŴWSFÊs∗}],  (3.48)

where, locally around θ0, the first term is Op(1) while the second and the third terms are Op(1/√N).


Notice that

(1/N)VMAPprox = V − (1/2)∂ρVᵀ(∂ρρV + Ω̄⁻¹)⁻¹∂ρV = Op(1/N),

since V = Op(1/N), ∂ρV = Op(1/√N), ∂ρρV = Op(1) and (∂ρρV + Ω̄⁻¹)⁻¹ = O(1). All of the order relations above are valid locally around θ0. Next, study the derivative of (1/N)VMAPprox with respect to θi (here, an index i denotes a partial derivative with respect to θi):

∂/∂θi [(1/N)VMAPprox] = Vi + (1/2)∂ρVᵀ(∂ρρV + Ω̄⁻¹)⁻¹∂ρρVi(∂ρρV + Ω̄⁻¹)⁻¹∂ρV
                         − (1/2)∂ρVᵀ(∂ρρV + Ω̄⁻¹)⁻¹∂ρVi
                         − (1/2)∂ρViᵀ(∂ρρV + Ω̄⁻¹)⁻¹∂ρV.  (3.49)

Here Vi and ∂ρV are Op(1/√N), while ∂ρVi, ∂ρρV and ∂ρρVi are Op(1) and (∂ρρV + Ω̄⁻¹)⁻¹ = O(1).

Since the term containing ∂ρρVi is of order Op(1/N), we conclude that ∂ρρV can be replaced by a consistent estimate without affecting the asymptotic second order properties. This implies in particular that we can neglect the last two terms in (3.48) and only retain the first. (Notice that all three terms in ∂ρρVi are of order Op(1). If the second term in (3.49) were Op(1/√N), then all three terms in (3.48) would have contributed.) Similarly, it can be shown that asymptotically the two last terms in (3.48) do not affect the second order derivatives of VMAPprox/N with respect to θ. Recall (3.48) and rewrite it as

∂ρiρjV = −(2/σ̂²)Re[Tr{Π⊥jAiA†ÊsŴWSFÊs∗}] + op(1)
        = −(2/σ̂²)Re[Tr{(−Π⊥AjA† − A†∗Aj∗Π⊥)AiA†ÊsŴWSFÊs∗}] + op(1)
        = (2/σ̂²)Re[Tr{A†∗Aj∗Π⊥AiA†ÊsŴWSFÊs∗}] + op(1)
        = (2/σ̂²)Re[vec∗(Aj)((A†ÊsŴWSFÊs∗A†∗)ᵀ ⊗ Π⊥)vec(Ai)] + op(1).  (3.50)

The dominating term of (3.50) can be written in matrix form as

Ṽρρ(θ) ≜ (2/σ̂²)Re{Dρ∗((A†ÊsŴWSFÊs∗A†∗)ᵀ ⊗ Π⊥)Dρ} = (2/σ̂²)Re{Dρ∗M̂Dρ}.  (3.51)

If Ṽρρ(θ) is used in lieu of ∂ρρV(θ) in (3.44), we get the following alternate cost function:

VMAPprox2(θ) = V(θ) − (1/2)∂ρVᵀ(θ)(Ṽρρ(θ) + Ω̄⁻¹)⁻¹∂ρV(θ).

It can be seen that (3.51) is non-negative definite, which ensures that the inverse (Ṽρρ(θ) + Ω̄⁻¹)⁻¹ always exists. Furthermore, we have (cf. (3.14))

Ṽρρ(θ) + Ω̄⁻¹ = (2/σ̂²)Γ̂.

It can also be verified that V(θ) and ∂ρV(θ), as defined in (3.45) and (3.47), can be written as

V(θ) = (1/2)ε̄∗L̄̂⁻¹ε̄,
∂ρV(θ) = −Ω̄^(−1/2)Ḡ̂∗L̄̂⁻¹ε̄,

where L̄̂ and Ḡ̂ are defined similarly to (3.23) and (3.24) but computed with sample data quantities and in θ (not in θ0). This implies that

VMAPprox2(θ) = (1/2)ε̄∗ŴM2ε̄,

where

ŴM2 = L̄̂⁻¹ − L̄̂⁻¹Ḡ̂Ω̄^(−1/2)(σ̂²/2)Γ̂⁻¹Ω̄^(−1/2)Ḡ̂∗L̄̂⁻¹.  (3.52)

The MAPprox2 cost function can thus be written in the same form as the GWSF criterion (see (3.20)). The weighting matrix in (3.52) depends on θ and on sample data quantities. However, in Section 3.5 it was shown that any two weighting matrices related as W1 = W2 + op(1) give the same asymptotic performance. This implies that in the following we can consider the "limit" of (3.52) in lieu of ŴM2. In what follows, let WM2 denote ŴM2 evaluated in θ0 and for infinite data. Thus, if we can show that

WM2 ∝ WGWSF,

then we have proven that MAPprox2 has the same asymptotic performance as GWSF, and hence that MAPprox2 is asymptotically efficient. (The symbol ∝ denotes equality up to a multiplicative scalar constant.) Combining (3.25), (A.19), and (A.21) it is immediately clear that

WM2 = WGWSF.

In this section, the MAPprox cost function was approximated so as to retain the local properties that affect the estimation error variance. The approximation (MAPprox2) was then shown to coincide with the GWSF cost function (with a "consistent" estimate of the weighting matrix), and hence it can be concluded that MAPprox(2) provides minimum variance estimates of the DOAs. However, the MAPprox cost function depends on θ in quite a complicated manner. In Section 3.8 we compare MAPprox2 and GWSF by means of several simulations, and the strong similarities between these two approaches are clearly visible. Since the implementation of GWSF is much easier, its use is preferred.
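The detailed verification of WM2 = WGWSF combines (3.25) with (A.19) and (A.21); at its core, however, the equality rests on the matrix inversion lemma applied to Cε̄ = L̄ + ḠḠ∗, which is easily checked numerically (random stand-ins; illustration only):

    import numpy as np

    # (L + G G*)^{-1} = L^{-1} - L^{-1} G (I + G* L^{-1} G)^{-1} G* L^{-1}
    rng = np.random.default_rng(2)
    p, q = 8, 3
    Lh = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
    L = Lh @ Lh.conj().T + p * np.eye(p)      # Hermitian positive definite
    G = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))

    Li = np.linalg.inv(L)
    lhs = np.linalg.inv(L + G @ G.conj().T)   # cf. W_GWSF in (3.25)
    rhs = Li - Li @ G @ np.linalg.inv(np.eye(q) + G.conj().T @ Li @ G) \
             @ G.conj().T @ Li                # cf. the form of (3.52)
    assert np.allclose(lhs, rhs)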

3.8 Simulation Examples

In this section, we illustrate the findings of this chapter by means of simulation examples. The MAP-NSF, MAPprox2 and GWSF methods are compared with one another, as well as with methods like WSF (MODE) that do not take the array perturbations into account. In [VS94a], the asymptotic performance of WSF was analyzed for the case under study. Unfortunately, the result given in that paper contains a printing error and, therefore, we provide the following result.



Theorem 3.3. Let θ̂WSF be the minimizing argument of VWSF(θ) = VWSF(θ,ρ0), where VWSF(θ,ρ) is given in (3.3). Then

√N(θ̂WSF − θ0) ∈ AsN(0, CWSF),

where

CWSF = (σ²/2)C⁻¹ + C⁻¹FθᵀΩ̄FθC⁻¹.  (3.53)

Proof. Since WSF is a special case of GWSF, the result follows from the analysis in Section 3.5. Indeed, if in GWSF we use the weighting W = L̄⁻¹ (which makes VGWSF ∝ VWSF), then CWSF = H⁻¹QH⁻¹, where H and Q are given in (A.16) and (A.17), respectively.

It can be seen from the expression (3.53) that the first term corresponds to the CRB for the case with no model errors. The second term in (3.53) reflects the performance degradation of WSF due to model errors.
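The theoretical WSF curves shown in the figures below follow from (3.53); a sketch of their computation, assuming that C, Fθ and Ω̄ have been evaluated at the true parameters (angles in radians, converted to degrees):

    import numpy as np

    def wsf_rms_deg(C, F_theta, Omega_bar, sigma2, N):
        # C_WSF from (3.53); returns the per-DOA RMS values in degrees.
        Ci = np.linalg.inv(C)
        C_wsf = 0.5 * sigma2 * Ci + Ci @ F_theta.T @ Omega_bar @ F_theta @ Ci
        return np.rad2deg(np.sqrt(np.diag(C_wsf) / N))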

3.8.1 Example 1

Consider a uniform linear array consisting of m = 10 sensors separated by a half wavelength. Two signals impinge from the directions θ0 = [0° 5°]ᵀ relative to broadside. The signals are uncorrelated and the SNR is 5 dB: P/σ² = 10^(5/10)·I2. The nominal unit-gain sensors are perturbed by additive Gaussian random variables with variance 0.01, which corresponds to an uncertainty in the gain with a standard deviation of 10%. Figure 3.1 depicts the root-mean-square (RMS) error versus the sample size for the different methods. (Only the RMS values for θ1 are displayed; the results corresponding to θ2 are similar.) In Figure 3.1 and in the graphs that follow, simulation results are shown using different symbols, while theoretical results are displayed by lines. The empirical RMS values are computed from 1000 independent trials in all simulations. The MAP-NSF and MAPprox2 cost functions are minimized with Newton-type methods initialized by the WSF estimate. WSF (or, equivalently, MODE [SS90a]) is implemented in the "rooting form" described in [SS90b] (see Section 2.4). The GWSF method is implemented as described in Section 3.6.2. In the examples presented here, we reiterated the third step of the algorithm three times.

[Figure 3.1: RMS errors for θ1 versus N.]

The poor performance of MAP-NSF in Figure 3.1 is due to two main reasons. First, MAP-NSF fails to resolve the two signals in many cases. Second, the numerical search is sometimes trapped in a local minimum. The performance of GWSF and MAPprox2 is excellent and close to the accuracy predicted by the asymptotic analysis. However, the numerical minimization of the MAPprox2 cost function is complicated; in this example, MAPprox2 produced "outliers" in about five percent of the 1000 trials, and these outliers were removed before the RMS value for MAPprox2 was calculated. For small N, GWSF and WSF have similar performance since the finite sample effects dominate. However, for larger N (when calibration errors affect the performance), it can be seen that GWSF and MAPprox2 are significantly better than the standard WSF method.


3.8.2 Example 2

Here, we have the same set-up as in the previous example, but the sample size is fixed to N = 200 and the variance of the gain uncertainty is varied. In Figure 3.2, the RMS errors for θ1 are plotted versus the variance of the sensor gains. The curve denoted by CRB in Figure 3.2 is the Cramér-Rao lower bound for the case with no model errors; that is, CRB = σ²C⁻¹/2N. Notice that GWSF and MAPprox2 begin to differ for large model errors. Indeed, the equivalence of GWSF and MAPprox shown in Section 3.7 holds for large N and small model errors. The three methods (GWSF, MAPprox2 and MAP-NSF) are known to be asymptotically efficient estimators for this case only; the performance for large model errors is not predicted by the theory in this chapter.

[Figure 3.2: RMS errors for θ1 versus the variance of the uncertainty in the gain of the sensors.]



3.8.3 Example 3

This example involves uncertainty in the phase response of the sensors and is taken from [WOV91]. Two plane waves are incident on a uniform linear array of m = 6 sensors with half-wavelength inter-element spacing. The signals are uncorrelated and 10 dB above the noise, and have DOAs θ0 = [0° 10°]ᵀ relative to broadside. The uncertainty in the phase of the sensors is modeled by adding uncorrelated zero-mean Gaussian random variables with covariance

Ω = 10⁻³ · diag(10, 1, 1, 1, 1, 10)

to the nominal phase, which is zero for all sensors. RMS errors for θ1 are shown in Figure 3.3. It appears that MAP-NSF has a higher small sample threshold than GWSF and MAPprox2. For large samples, the performance of the methods is similar. Again, it is clearly useful to take the array uncertainty into account when designing the estimator.

[Figure 3.3: RMS errors for θ1 versus the number of snapshots, N.]

3.8.4 Example 4

Here, we consider the same set-up as in the previous example, but the signals are correlated. More precisely, the signal covariance matrix is

P = 10 [1 −0.99; −0.99 1]

and the noise variance is σ² = 1. The RMS errors for θ1 are shown in Figure 3.4. It can be seen that the two signal subspace fitting methods GWSF and WSF can resolve the two signals, while the noise subspace fitting method MAP-NSF has a much higher threshold.

[Figure 3.4: RMS errors for θ1 versus the number of snapshots, N.]

3.9 Conclusions

This chapter has studied the problem of developing robust weighted signal subspace fitting algorithms for arrays with calibration errors. It was shown that if the second-order statistics of the calibration errors are known, then asymptotically statistically efficient weightings can be derived for very general perturbation models. Earlier research had resulted in optimal weightings that were applicable only in very special cases. The GWSF technique derived herein unifies earlier work and also enjoys several advantages over the MAP-NSF approach, another statistically efficient technique developed for calibration errors. In particular, since GWSF is based on the signal rather than the noise subspace, it is a consistent estimator even when the signals are perfectly coherent, and has better finite sample performance than MAP-NSF when the signals are highly correlated or closely spaced. In addition, unlike MAP-NSF, when the array is nominally uniform and linear, the GWSF criterion can be re-parameterized in such a way that the directions of arrival may be solved for by rooting a polynomial rather than via a gradient search. As a byproduct of our analysis, we also demonstrated the asymptotic statistical efficiency of the MAPprox algorithm, another robust DOA estimator for perturbed arrays. While the MAPprox and GWSF estimators have very similar performance, GWSF depends on the parameters in a simpler way than MAPprox and this facilitates the implementation. Our simulations indicate that, in the presence of calibration errors with known statistics, both algorithms can yield a significant performance improvement over techniques that ignore such errors.


Chapter 4

A Direction Estimator for Uncorrelated Emitters

In this chapter, a novel eigenstructure-based method for direction estimation is presented. The method assumes that the emitter signals are uncorrelated. Ideas from subspace and covariance matching methods are combined to yield a non-iterative estimation algorithm when a uniform linear array is employed. The large sample performance of the estimator is analyzed. It is shown that the asymptotic variance of the direction estimates coincides with the relevant Cramér-Rao lower bound (CRB). A compact expression for the CRB is derived for the case when it is known that the signals are uncorrelated, and it is lower than the CRB usually used in the array processing literature (assuming no particular structure for the signal covariance matrix). The difference between the two CRBs can be large in difficult scenarios. This implies that, in such scenarios, the proposed method has significantly better performance than existing subspace methods such as, for example, WSF, MUSIC and ESPRIT. Numerical examples are provided to illustrate the obtained results.

4.1 Introduction

The problem of estimating the directions of arrival (DOAs) of signals from array data is well documented in the literature. A number of high resolution algorithms or eigenstructure methods have been presented and analyzed (see, e.g., [VO91, Sch81, RK89, SS90a, SS91, BM86, RH89, SN89b]). In some applications, such as radio astronomy and communications, it is reasonable to assume that the signals are spatially uncorrelated. One disadvantage with eigenstructure or so-called subspace-based methods is that it is difficult to incorporate prior knowledge of the signal correlation into the eigendecomposition. Hence, it is difficult to use this prior information to increase the estimation accuracy.

Herein, we propose an estimator which combines ideas from subspace and covariance matching methods [TO96, WSW95], which makes it possible to incorporate prior knowledge of the signal correlation into the estimator. It should be noted that the Cramér-Rao lower bound (CRB) usually used when comparing estimators in array signal processing is not the proper one in this scenario. As will be shown here, the CRB that should be used to compare estimators that use priors on the source correlation is in general lower than that presented in, e.g., [SN89a, VO91]. We derive the relevant CRB for the case under study and give a compact matrix expression for the CRB on the DOA parameters. It is also shown that the proposed method yields estimates that asymptotically attain this CRB on the estimation error covariance.

The proposed estimator is based on a weighted least squares fit of certain elements in the eigendecomposition of the output covariance of the array. In general, this leads to a multi-modal cost function which has to be minimized with some multi-dimensional search technique. This will, of course, lead to a computationally rather unattractive problem. However, for the common special case of a uniform linear array (ULA), the cost function can be reparameterized in a similar manner as in MODE and IQML [SS90b, SS90a, BM86] (see also [JGO97b]). In this case, the estimates of the directions can be found from the roots of a certain polynomial. This leads to a significant computational simplification. Note that a similar reparameterization and "non-iterative" scheme has not yet been found for maximum-likelihood (ML) or covariance matching techniques. Thus, the presented method has the important property of yielding a non-iterative solution with the same asymptotic accuracy as ML.

This chapter is organized as follows. In Section 4.2, the model and assumptions are given. In Section 4.3, we present the DOA estimator. In Section 4.4, the Cramér-Rao bound for uncorrelated sources is derived. The proposed estimator is analyzed in Section 4.5, where the large sample properties of the estimates are derived. In Section 4.5, we also derive the optimal weighting matrix leading to minimum variance estimates of the DOAs. The numerical implementation of the method when a ULA is employed is discussed in Section 4.6. Numerical examples are provided in Section 4.7 to demonstrate the performance of the proposed algorithm. Finally, some conclusions are given in Section 4.8.

4.2 Data Model <strong>and</strong> Assumptions<br />

Consider d uncorrelated, narrow-band plane waves impinging on an unambiguous array consisting of m sensors. The spatial response of the array is assumed to be parameterized by the directions of arrival, θ = [θ1, ..., θd]^T. The functional form of the array response vector, a(θ), is assumed to be known. This scenario is described by the model

x(t) = A(θ) s(t) + n(t),

where the array response matrix, A(θ) = [a(θ1), ..., a(θd)], is the collection of the array response vectors. The source signals, s(t), and the additive noise, n(t), are assumed to be zero-mean Gaussian random sequences with second-order moments given by

E{s(t) s^T(τ)} = 0,   E{n(t) n^T(τ)} = 0,
E{s(t) s*(τ)} = P δ_{t,τ},   E{n(t) n*(τ)} = σ² I δ_{t,τ},

where δ is the Kronecker delta function, and where the superscripts T and * stand for transposition and complex conjugate transposition, respectively. For later use, we also introduce the superscript c for complex conjugation. Notice that, since the sources are assumed to be spatially uncorrelated, the signal covariance matrix, P, is diagonal. With the definitions above and under the assumption that s(t) and n(t) are uncorrelated, the output covariance matrix of the array is given by

R = E{x(t) x*(t)} = A(θ) P A*(θ) + σ² I . (4.1)

It is further assumed that the number of signals is known, or has been consistently estimated [VOK91a, Wax92].
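To make the data model concrete, the following minimal Python/NumPy sketch (not part of the original development; all names and numerical values are hypothetical) simulates snapshots from the model above for a ULA and forms the sample covariance used in Section 4.3:

```python
import numpy as np

rng = np.random.default_rng(0)

def ula_response(theta, m, delta=0.5):
    """Steering vector of an m-element ULA; theta in radians, delta in wavelengths."""
    k = np.arange(m)
    return np.exp(1j * 2 * np.pi * delta * np.sin(theta) * k)

m, N = 6, 500
theta0 = np.deg2rad([0.0, 20.0])           # true DOAs (hypothetical)
A = np.column_stack([ula_response(t, m) for t in theta0])
p = np.array([2.0, 100.0])                 # diagonal of P (uncorrelated sources)
sigma2 = 1.0

# x(t) = A s(t) + n(t), circular complex Gaussian signals and noise
S = (rng.standard_normal((2, N)) + 1j * rng.standard_normal((2, N))) * np.sqrt(p[:, None] / 2)
Nn = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) * np.sqrt(sigma2 / 2)
X = A @ S + Nn

R_hat = X @ X.conj().T / N                 # sample estimate of R in (4.1)
```

With these snapshots, R_hat plays the role of the sample covariance introduced below.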

4.3 Estimating the Parameters

The problem under consideration is that of estimating the unknown parameters in (4.1) from sensor array data. Of main interest are the DOAs, θ. The estimation procedure should use the prior knowledge that the signals are uncorrelated to increase the estimation accuracy.



The subspace estimation techniques rely on the properties of the eigendecomposition of the output covariance matrix R. Let

R = Es Λs Es* + En Λn En* = Es Λs Es* + σ² En En*   (4.2)

be a partitioned eigendecomposition of R. Here, Λs is a diagonal matrix containing the d < m largest eigenvalues and the columns of Es are the corresponding eigenvectors. We will assume that the eigenvalues in Λs are distinct, which is generically true. Similarly, Λn contains the m − d smallest eigenvalues and En is composed of the remaining eigenvectors. The fact that Λn = σ² I follows from the assumption that A is full rank and since P is positive definite. Comparing (4.1) and (4.2) gives

A P A* = Es Λs Es* + σ² En En* − σ² I = Es Λ Es* , (4.3)

where the definition

Λ = Λs − σ² I

has been used. The last step in (4.3) follows from the fact that En En* = I − Es Es*. Applying the vectorization operator (vec) [Gra81] to (4.3) and using the identity vec(ABC) = (C^T ⊗ A) vec(B), we obtain

(A^c ⊗ A) vec(P) = (Es^c ⊗ Es) vec(Λ), (4.4)

where ⊗ denotes the Kronecker matrix product. Since the matrices P and Λ are diagonal, there exists a (d² × d) selection matrix L such that

vec(P) = L p   and   vec(Λ) = L λ ,

where p and λ are (d × 1) vectors consisting of the diagonal entries of P and Λ, respectively. Notice that

(A^c ⊗ A) L = A^c ◦ A ,

where ◦ is the Khatri-Rao matrix product, which is a column-wise Kronecker product [Bre78]. The relation in (4.4) will now be used to formulate an estimator of the DOA.
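As a quick numerical sanity check of the identities above (a hedged illustration, not from the original text), one can verify vec(APA*) = (A^c ⊗ A) vec(P) and (A^c ⊗ A) L = A^c ◦ A in NumPy:

```python
import numpy as np

m, d = 4, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((m, d)) + 1j * rng.standard_normal((m, d))
p = np.array([2.0, 5.0])
P = np.diag(p)

# selection matrix L (d^2 x d): vec(diag(p)) = L p (column-major vec)
L = np.zeros((d * d, d))
for k in range(d):
    L[k * d + k, k] = 1.0

# column-wise Kronecker (Khatri-Rao) product A^c ◦ A
khatri_rao = np.column_stack([np.kron(A[:, k].conj(), A[:, k]) for k in range(d)])

vec = lambda M: M.reshape(-1, order="F")   # column-major vec(.)
assert np.allclose(vec(A @ P @ A.conj().T), np.kron(A.conj(), A) @ L @ p)
assert np.allclose(np.kron(A.conj(), A) @ L, khatri_rao)
```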

Let R̂ denote the sample estimate of the covariance matrix in (4.1); i.e.,

R̂ = (1/N) Σ_{t=1}^{N} x(t) x*(t),


where N is the number of snapshots. Let

R̂ = Ês Λ̂s Ês* + Ên Λ̂n Ên*

be an eigendecomposition of R̂, similar to (4.2). A consistent estimate of the noise variance is given by

σ̂² = (1/(m − d)) Tr{Λ̂n} = (1/(m − d)) ( Tr{R̂} − Tr{Λ̂s} ) .

Replacing Es and Λ in (4.4) by Ês and Λ̂ = Λ̂s − σ̂² I yields

(A^c ◦ A) p ≈ (Ês^c ◦ Ês) λ̂ . (4.5)

Using the definitions

B(θ) = A^c(θ) ◦ A(θ) = [ a^c(θ1) ⊗ a(θ1)  ···  a^c(θd) ⊗ a(θd) ]
     = [ vec(a(θ1) a*(θ1))  ···  vec(a(θd) a*(θd)) ] , (4.6)

f̂ = (Ês^c ◦ Ês) λ̂ , (4.7)

we can, from (4.5), formulate the least squares problem

min_{θ, p} ‖f̂ − B(θ) p‖²₂ . (4.8)

For fixed θ, the solution of (4.8) with respect to p is

p̂ = B†(θ) f̂ , (4.9)

where B† = (B*B)⁻¹B* denotes the Moore-Penrose pseudo-inverse of B. By substituting p̂ back into (4.8), we get

min_θ ‖f̂ − B(θ) B†(θ) f̂‖²₂ = min_θ ‖Π⊥_B(θ) f̂‖²₂ , (4.10)

where Π⊥_B(θ) = I − B(θ) B†(θ). In general, we will use the notation Π⊥_X for the orthogonal projection onto the null-space of X*. Similarly, Π_X = I − Π⊥_X will denote the orthogonal projection onto the range-space of X. An estimate of θ can be obtained by minimizing the criterion in (4.10) (see [GJO96]). However, it is useful to introduce a weighting in the criterion to improve the accuracy of the estimates. More precisely, we suggest estimating θ as

θ̂ = arg min_θ V(θ) , (4.11)

V(θ) = ‖Π⊥_B(θ) f̂‖²_W = f̂* Π⊥_B(θ) W Π⊥_B(θ) f̂ , (4.12)

where W is a Hermitian positive definite weighting matrix yet to be determined (see Section 4.5).

Remark 4.1. It is readily shown that p̂ in (4.9) is real valued by making use of the results in Appendix A.8. However, p̂ is not guaranteed to be positive for small N. For large N, this is not a problem since p̂ is a consistent estimate of p. This implies that p̂ is positive for sufficiently large N and the large sample performance of the estimates is not affected.
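The following sketch (hypothetical helper names; NumPy assumed) shows how f̂ in (4.7) and the unweighted criterion in (4.10) could be computed from a sample covariance:

```python
import numpy as np

def f_hat_from_R(R_hat, d):
    """f-hat of (4.7) from the eigendecomposition of the sample covariance."""
    w, E = np.linalg.eigh(R_hat)            # eigenvalues in ascending order
    m = R_hat.shape[0]
    Es, lam_s = E[:, m - d:], w[m - d:]     # signal eigenvectors / eigenvalues
    sigma2_hat = w[: m - d].mean()          # average of the m-d smallest eigenvalues
    lam_hat = lam_s - sigma2_hat
    cols = [np.kron(Es[:, k].conj(), Es[:, k]) for k in range(d)]
    return np.column_stack(cols) @ lam_hat  # (Es^c ◦ Es) lambda-hat

def B_of_theta(theta, m, delta=0.5):
    """B(theta) of (4.6) for a ULA."""
    cols = []
    for t in theta:
        a = np.exp(1j * 2 * np.pi * delta * np.sin(t) * np.arange(m))
        cols.append(np.kron(a.conj(), a))
    return np.column_stack(cols)

def criterion(theta, f_hat, m):
    """Unweighted criterion (4.10): squared norm of the projected residual."""
    B = B_of_theta(theta, m)
    r = f_hat - B @ np.linalg.pinv(B) @ f_hat
    return np.real(r.conj() @ r)
```

For a single source (d = 1), this criterion can simply be evaluated over a grid of angles; for d > 1 a search or the rooting technique of Section 4.6 is needed.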

Minimizing (4.12) to obtain the estimates θ̂ can be accomplished with a Newton-type algorithm. The gradient and Hessian of the cost function, needed in such an implementation, are given as by-products in the statistical analysis of the estimator in Section 4.5. The criterion is in general multi-modal, rendering the multi-dimensional search for a global extremum computationally expensive. However, when a ULA is employed it is possible to reparameterize the criterion and convert it to a linear problem, where the estimate is obtained by rooting a polynomial. This is further discussed in Section 4.6, where the reparameterization is shown in detail.

4.4 The CRB for Uncorrelated Sources

The Cramér-Rao bound (CRB) is a commonly used lower bound on the estimation error covariance of any unbiased estimator [Cra46]. Let η = [θ^T, p^T, σ²]^T denote the vector that parameterizes the covariance matrix R. Notice that we here will derive the CRB when it is known that the emitter signals are uncorrelated. When the asymptotic estimation error covariance of an estimator is equal to the CRB, it is referred to as being asymptotically statistically efficient. We are mainly interested in the bound on the direction parameters θ. Let us denote this bound by CRBθ. We can now formulate the following result:

Theorem 4.1. Under the assumptions given in Section 4.2, the Cramér-Rao lower bound on the estimation error covariance for any unbiased estimate of θ is

CRBθ = ( P D* G (G* C̃ G)⁻¹ G* D P )⁻¹ . (4.13)

Here,

C̃ = (R^T ⊗ R) + (σ⁴/(m − d)) vec(Π_A) vec*(Π_A), (4.14)

D = (D̄^c ◦ A) + (A^c ◦ D̄) , (4.15)

D̄ = [ ∂a(θ1)/∂θ1  ···  ∂a(θd)/∂θd ] , (4.16)

and G is any matrix whose columns span the null-space of B* (see also Section 4.6).

Proof. The proof is given in Appendix A.5.

The CRB for the "standard" case, when P is an arbitrary Hermitian positive definite matrix, is given in, e.g., [SN89a, VO91]. A comparison of (4.13) with the corresponding CRB expressions in [SN89a, VO91] can be used to quantify the gain in performance of using the prior information about the signal correlation. An example of such a comparison is given in Section 4.7.
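For readers who want to reproduce such comparisons, the following sketch evaluates (4.13) numerically for a ULA. It is a hedged illustration: the function names are hypothetical, and treating (4.13) as the bound for the √N-normalized errors (so that the raw variance is obtained by dividing by N, cf. Theorem 4.2) is an assumption of this sketch.

```python
import numpy as np

def crb_uncorrelated(theta, p, sigma2, m, delta=0.5):
    """Numerical evaluation of CRB-theta in (4.13)-(4.16) for a ULA."""
    d = len(theta)
    k = np.arange(m)
    a = lambda t: np.exp(1j * 2 * np.pi * delta * np.sin(t) * k)
    da = lambda t: 1j * 2 * np.pi * delta * np.cos(t) * k * a(t)
    A = np.column_stack([a(t) for t in theta])
    Dbar = np.column_stack([da(t) for t in theta])
    kr = lambda X, Y: np.column_stack(
        [np.kron(X[:, i].conj(), Y[:, i]) for i in range(X.shape[1])])
    B = kr(A, A)                               # (4.6)
    D = kr(Dbar, A) + kr(A, Dbar)              # (4.15)
    U, _, _ = np.linalg.svd(B)                 # G: basis for null-space of B*
    G = U[:, d:]
    R = A @ np.diag(p) @ A.conj().T + sigma2 * np.eye(m)
    PiA = A @ np.linalg.solve(A.conj().T @ A, A.conj().T)
    vPiA = PiA.reshape(-1, order="F")
    Ct = np.kron(R.T, R) + sigma2**2 / (m - d) * np.outer(vPiA, vPiA.conj())  # (4.14)
    P = np.diag(p)
    mid = G @ np.linalg.solve(G.conj().T @ Ct @ G, G.conj().T)
    M = P @ D.conj().T @ mid @ D @ P
    return np.linalg.inv(M).real               # (4.13)
```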

4.5 Asymptotic Analysis

The asymptotic (in N) behavior of the estimates given by (4.11) is analyzed in this section. The derivation of the estimator in Section 4.3 is rather ad hoc. However, below we show that the estimator is in fact asymptotically statistically efficient. We will start the asymptotic analysis by establishing the consistency of the estimates.

4.5.1 Consistency

In what follows, we use θ0 to distinguish the true DOA vector from a generic vector θ. Notice that f̂ converges to

f = (Es^c ◦ Es) λ = (A^c(θ0) ◦ A(θ0)) p = B(θ0) p

with probability one as N → ∞. Since the norm of Π⊥_B(θ) is bounded, the criterion (4.12) tends to

V̄ = f* Π⊥_B(θ) W Π⊥_B(θ) f


uniformly in θ with probability one. The consistency of θ̂ in (4.11) then follows if θ = θ0 is the unique solution to V̄ = 0. Notice that V̄ = 0 if and only if Π⊥_B(θ) f = Π⊥_B(θ) B(θ0) p = 0, since W > 0 by assumption. Assume, without loss of generality, that exactly the first n trial DOAs coincide with the true ones; that is, θi = θ0i for i = 1, ..., n, where 0 ≤ n < d. Then

Π⊥_B(θ) f = Π⊥_B(θ) [B1 B2] [p1^T p2^T]^T = Π⊥_B(θ) B2 p2
          = [B2 B(θ)] [ I_{d−n} ; −B†(θ) B2 ] p2 ,

where [B2 B(θ)] has 2d − n columns, B1 and B2 contain the first n and the last d − n columns of B(θ0), respectively, and p has been partitioned accordingly (the B1-term vanishes since the columns of B1 are also columns of B(θ)). In Appendix A.6 it is proved that [B2 B(θ)] is full rank if m > (2d − n)/2. This condition is satisfied for all admissible n if m > d. Clearly, the only solution to Π⊥_B(θ) f = 0 is then p2 = 0, which contradicts the fact that p consists of the (strictly positive) signal powers. Thus, the only valid solution is θ = θ0, and strong consistency of the estimates follows.

Remark 4.2. The identifiability condition d < m is natural if no additional assumptions are made. Consider, for example, the case of a ULA for which A(θ) is a Vandermonde matrix. The array covariance matrix (4.1) is then a Toeplitz matrix since P is diagonal. It is well known from the Carathéodory parameterization that any Hermitian, positive definite, Toeplitz matrix can be written in the form

R = A P A* + σ² I,

with d < m (see, e.g., [SM97]). This implies that if d ≥ m signals are received by the array, then it cannot be distinguished from another scenario with d < m (based on the information in R). On the other hand, for particular array geometries, it is possible to estimate more signals than the number of sensors (d ≥ m) under the assumption of uncorrelated emitter signals (see, e.g., [PPL90]).

4.5.2 Asymptotic Distribution

After establishing consistency, the asymptotic distribution of the estimates from (4.11) can be derived through a Taylor series expansion approach (see, e.g., [VO91, SS89]). Let V′(θ) and V″(θ) denote the gradient and the Hessian of V(θ), respectively. By definition V′(θ̂) = 0, and since θ̂ is consistent, a first order Taylor series expansion of V′(θ̂) around θ0 leads to

θ̃ = θ̂ − θ0 ≃ −H⁻¹ V′(θ0), (4.17)

where ≃ denotes equality in probability up to first order. The notation H refers to the limiting Hessian matrix,

H = lim_{N→∞} V″(θ0) . (4.18)

The derivative of (4.12) with respect to θk (i.e., the kth element in the parameter vector θ) is given by

V′_k = f̂* {Π⊥_B}_k W Π⊥_B f̂ + f̂* Π⊥_B W {Π⊥_B}_k f̂
     ≃ −2 Re{ f* B†* B_k* Π⊥_B W Π⊥_B f̂ }, (4.19)

where, for notational simplicity, the argument (θ0) is omitted. In (4.19), we made use of the following expression for the derivative of the projection matrix:

{Π⊥_B}_k = −Π⊥_B B_k B† − B†* B_k* Π⊥_B . (4.20)

We have also used the fact that f̂ → f as N → ∞ and Π⊥_B f = 0. Next, we will relate f̂ to the sample covariance. From the definition of f̂ in (4.7), we have

f̂ = vec( Ês (Λ̂s − σ̂² I) Ês* )
   = vec( R̂ − σ̂² I − Ên (Λ̂n − σ̂² I) Ên* )
   = vec( R̂ − σ̂² I − Ên Ên* (R̂ − σ̂² I) Ên Ên* )
   ≃ vec( R̂ − σ̂² I − Π⊥_A (R̂ − σ̂² I) Π⊥_A )
   = vec( R̂ − Π⊥_A R̂ Π⊥_A − σ̂² Π_A ) , (4.21)

where Π⊥_A = I − Π_A = Π⊥_A(θ0). In (4.21), use is made of the facts that Ên Ên* → En En* = Π⊥_A and Π⊥_A (R̂ − σ̂² I) → 0 as N → ∞. Recall that

the noise variance is estimated as the average of the noise eigenvalues in Λ̂n, and notice that

(m − d)(σ̂² − σ²) = Tr{ Λ̂n − σ² I_{m−d} } = Tr{ Ên* (R̂ − σ² I_m) Ên }
    = Tr{ Ên Ên* (R̂ − σ² I_m) Ên Ên* } ≃ Tr{ En En* (R̂ − σ² I_m) En En* }
    = Tr{ Π⊥_A R̂ } − (m − d) σ² ,

where I_k denotes a k × k identity matrix. Together with the identity Tr{AB} = vec*(A*) vec(B), this implies that

σ̂² ≃ (1/(m − d)) Tr{ Π⊥_A R̂ } = (1/(m − d)) vec*(Π⊥_A) vec(R̂) . (4.22)

Using (4.22) in (4.21), we get

f̂ = [ I − Π⊥_A^T ⊗ Π⊥_A − (1/(m − d)) vec(Π_A) vec*(Π⊥_A) ] vec(R̂) ≜ M vec(R̂) . (4.23)

From the central limit theorem it follows that the elements in √N (R̂ − R) are asymptotically zero-mean Gaussian distributed. The relations (4.19), (4.17) and (4.23) then show that this holds for √N V′(θ0) and √N θ̃ as well. We have

√N V′(θ0) ∈ AsN(0, Q),

where

Q = lim_{N→∞} N E{ V′ V′^T }, (4.24)

and where the notation AsN means asymptotically Gaussian distributed.

The asymptotic second-order properties of θ̃ are given by the following result:

Theorem 4.2. If m > d, then the estimate θ̂ obtained from (4.11) is a consistent estimate of θ0 and the normalized estimation error is asymptotically Gaussian distributed according to

√N (θ̂ − θ0) ∈ AsN(0, Γ),

where

Γ = H⁻¹ Q H⁻¹ .

The matrices H and Q are given by

H = 2 Re{ P D* Π⊥_B W Π⊥_B D P }, (4.25)
Q = 2 Re{ U* C̄ U^c + U* C U }, (4.26)

where

U = Π⊥_B W Π⊥_B D P, (4.27)

and where D is defined in (4.15). The two covariance matrices C and C̄ are defined as

C = lim_{N→∞} N E{ f̃ f̃* }, (4.28)
C̄ = lim_{N→∞} N E{ f̃ f̃^T }, (4.29)

where f̃ = f̂ − f. Explicit expressions for C and C̄ are given in Appendix A.7. All quantities are evaluated at the true DOA θ0.

Proof. The derivation of H and Q can be found in Appendix A.7.

4.5.3 Optimal Weighting

The result in Theorem 4.2 is valid for any positive definite weighting W. However, it is of course of interest to find a weighting such that the asymptotic covariance matrix Γ is minimized (in the sense of positive definiteness). A lower bound on Γ is given by the CRB in Theorem 4.1. Next, we show that there indeed exists a weighting matrix W such that Γ attains the CRB. This implies that the estimate in (4.11) is asymptotically statistically efficient.

Theorem 4.3. If

W = Wopt = ( Π⊥_B C̃ Π⊥_B + B B* )⁻¹ , (4.30)

where C̃ is defined in (4.14), then

Γ = CRBθ .

Here, CRBθ is the Cramér-Rao lower bound on the estimation error covariance of θ corresponding to the prior knowledge that the signals are uncorrelated (see Theorem 4.1).



Proof. The proof can be found in Appendix A.8.

Note that the optimal weighting matrix in (4.30) depends on the true parameters. However, since Π⊥_B(θ0) f = 0, it can be shown that Wopt can be replaced by a consistent estimate without changing the asymptotic analysis given above. In particular, this implies that the estimator can be implemented as a two-step procedure as further explained in Section 4.6.
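A consistent plug-in version of (4.30), as used in such a two-step procedure, might look as follows (a sketch; B_hat and Ct_hat are assumed to be built from a first-step DOA estimate, e.g., via the helpers sketched earlier):

```python
import numpy as np

def w_opt(B_hat, Ct_hat):
    """Plug-in estimate of Wopt in (4.30)."""
    m2 = B_hat.shape[0]
    PiB_perp = np.eye(m2) - B_hat @ np.linalg.pinv(B_hat)
    # BB* restores invertibility on the range space of B
    return np.linalg.inv(PiB_perp @ Ct_hat @ PiB_perp + B_hat @ B_hat.conj().T)
```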

Remark 4.3. By making use of the results in Appendix A.5 and Appendix A.8, it is possible to show that f̂ can be replaced by vec{R̂ − σ̂² I} in the criterion (4.12) without changing the asymptotic second order properties of θ̂ [GO97]. We will not elaborate more on this here.

4.6 Implementation

This section deals with the implementation of the estimator (4.11) when a uniform linear array is employed. For general arrays, the criterion in (4.12) can be minimized by a Newton-type method. In the following we will discuss a way to avoid gradient search algorithms. For ULAs, a technique similar to the one used in MODE [SS90b, SS90a] can be utilized. Note that, when the array consists of uniformly spaced omnidirectional sensors, the steering matrix, A, takes the form

A = [ 1              ···  1
      e^{iω1}        ···  e^{iωd}
      ⋮                   ⋮
      e^{i(m−1)ω1}   ···  e^{i(m−1)ωd} ] ,

where ωk (k = 1, ..., d) are the so-called spatial frequencies. The relationship between ωk and θk is given by

ωk = 2π∆ sin(θk),

where ∆ is the element spacing measured in wavelengths, and where θk is measured relative to array broadside.
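In code, the ULA steering vector and its spatial frequency read, for instance (a minimal sketch, names hypothetical):

```python
import numpy as np

def ula_steering(theta, m, delta=0.5):
    """m-element ULA steering vector; theta relative to broadside (radians)."""
    omega = 2 * np.pi * delta * np.sin(theta)     # spatial frequency
    return np.exp(1j * omega * np.arange(m))
```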

The idea is to find a basis for the null-space of B* that depends linearly on a minimal set of parameters. For that purpose, introduce the following polynomial:

g0 z^d + g1 z^{d−1} + · · · + gd = g0 ∏_{k=1}^{d} (z − e^{iωk}) ,   g0 ≠ 0 . (4.31)



From the definition of B in (4.6), it follows that the kth column of B is given by

B•k = [ 1, zk, ..., z_k^{m−1} | z_k^{−1}, 1, ..., z_k^{m−2} | z_k^{−2}, z_k^{−1}, ..., z_k^{m−3} | ··· | z_k^{−(m−1)}, ..., 1 ]^T , (4.32)

where zk = e^{iωk} (the bars separate the m blocks z_k^{−j} [1, zk, ..., z_k^{m−1}]^T, j = 0, ..., m − 1, of a^c(θk) ⊗ a(θk)). The goal is to find a full rank matrix G of dimension m² × (m² − d) such that

G* B = 0 .

In this regard, it is useful to permute the rows of B, and correspondingly the columns of G*, such that the permuted version of (4.32) reads

B̃•k = [ z_k^{−m+1} | z_k^{−m+2}, z_k^{−m+2} | z_k^{−m+3}, z_k^{−m+3}, z_k^{−m+3} | ··· | z_k^{m−2}, z_k^{m−2} | z_k^{m−1} ]^T ,

where the power z_k^j appears m − |j| times.

Notice that the permutation is made such that elements containing zk of the same power are collected next to each other. This permutation will simplify the construction of G̃*, the permuted version of G*. The parameterization is best described by examples; its generalization will be straightforward from the examples given next.

Example 4.1. Let m = 3 and d = 1, which implies that we need m² − d = 8 independent rows to form G̃*. One such G̃* is given by

G̃* = [ g1  g0  0   0   0   0   0   0   0
       g1  0   g0  0   0   0   0   0   0
       0   g1  0   g0  0   0   0   0   0
       0   g1  0   0   g0  0   0   0   0
       0   g1  0   0   0   g0  0   0   0
       0   0   0   g1  0   0   g0  0   0
       0   0   0   g1  0   0   0   g0  0
       0   0   0   0   0   0   g1  0   g0 ] , (4.33)

which is easily seen to be full rank and to satisfy the key relation G̃* B̃ = 0. The column partitioning in (4.33) corresponds to the different powers of zk in B̃•k. Notice that there exist several alternative positions for g1 in the last six rows.

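The key relation G̃* B̃ = 0 in Example 4.1 is easy to verify numerically; the following sketch (NumPy; the value of ω is chosen arbitrarily) checks both the null-space property and the rank claim:

```python
import numpy as np

# coefficients of g0 z + g1 = g0 (z - e^{i omega}) for m = 3, d = 1
omega = 0.7
g0, g1 = 1.0, -np.exp(1j * omega)

Gt = np.array([
    [g1, g0, 0,  0,  0,  0,  0,  0,  0],
    [g1, 0,  g0, 0,  0,  0,  0,  0,  0],
    [0,  g1, 0,  g0, 0,  0,  0,  0,  0],
    [0,  g1, 0,  0,  g0, 0,  0,  0,  0],
    [0,  g1, 0,  0,  0,  g0, 0,  0,  0],
    [0,  0,  0,  g1, 0,  0,  g0, 0,  0],
    [0,  0,  0,  g1, 0,  0,  0,  g0, 0],
    [0,  0,  0,  0,  0,  0,  g1, 0,  g0],
], dtype=complex)

# permuted column B~_k: powers z^{-2}, z^{-1}, z^{-1}, 1, 1, 1, z, z, z^2
z = np.exp(1j * omega)
Bt = z ** np.array([-2, -1, -1, 0, 0, 0, 1, 1, 2])

assert np.allclose(Gt @ Bt, 0)                    # null-space relation
assert np.linalg.matrix_rank(Gt) == 8             # full (row) rank
```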


Example 4.2. In this example, let m = 3 and d = 2; i.e., m² − d = 7 independent rows are needed. In this case we may take

G̃* = [ 0   −1  1   0   0   0   0   0   0
       g2  g1  0   g0  0   0   0   0   0
       g2  g1  0   0   g0  0   0   0   0
       g2  g1  0   0   0   g0  0   0   0
       0   g2  0   g1  0   0   g0  0   0
       0   g2  0   g1  0   0   0   g0  0
       0   0   0   g2  0   0   g1  0   g0 ] . (4.34)

Clearly, the matrix G̃* in (4.34) is full rank and G̃* B̃ = 0. Again, notice that there are several alternative positions for g1 and g2 in many rows. The first row containing ±1 appears because the pattern of the last six rows in (4.34) cannot be continued since the degree of the polynomial is "too large".
“too large”.<br />

By extending the pattern given in the examples above, it can be seen that there are d(d − 1)/2 rows with ±1 and m² − d²/2 − d/2 rows with g-coefficients in the general case. Altogether this gives m² − d rows, and by construction these rows are linearly independent. The matrix G* is obtained by a column permutation of G̃*.

Observe that the polynomial (4.31) should have its roots on the unit circle. By imposing the conjugate symmetry constraint gk = g^c_{d−k} (k = 0, ..., d), it is more likely that the roots will lie on the unit circle (see [SS90b] for details). Notice that the parameterization remains linear with this constraint. To get a minimal (invertible) reparameterization we need one additional constraint. It is common to use either a linear or a norm constraint on the polynomial coefficients (see [SS90b, NK94]).

Next, notice that Π⊥_B = Π_G = G G† and rewrite the criterion (4.12) as (see (A.46))

f̂* Π⊥_B Wopt Π⊥_B f̂ = f̂* G (G* C̃ G)⁻¹ G* f̂ . (4.35)

Since G* f = 0, it is possible to show that the inverse in (4.35) can be replaced with a consistent estimate without altering the asymptotic properties of the estimates. The proposed procedure for estimating θ can be summarized as follows:

1. Compute a consistent estimate of θ0 by minimizing the quadratic function ‖G* f̂‖²₂. Observe that there are only d free real parameters in G under the constraints mentioned above. Alternatively, one



can use any other consistent estimator, for example, root-MUSIC or ESPRIT.

2. Based on sample data and the initial estimate of θ0 from the first step, calculate consistent estimates of C̃ and G, denoted C̃̂ and Ĝ. Minimize the following quadratic criterion

   f̂* G (Ĝ* C̃̂ Ĝ)⁻¹ G* f̂

over the d free real parameters in G. Once the polynomial coefficients are given, θ̂ can be retrieved from the phase angles of the roots of the polynomial (4.31).

Exactly how the minimization of the two quadratic functions above is performed depends on the constraint on the polynomial coefficients. For the constraints previously mentioned, the minimization can be accomplished either by solving an eigenvalue problem or by solving an over-determined system of linear equations (see [SS90b, NK94] for details).
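The final step of the procedure, recovering the DOAs from the rooted polynomial, can be sketched as follows (hypothetical helper; projecting the roots onto the unit circle is a common practical safeguard, not a step spelled out in the text):

```python
import numpy as np

def doas_from_coeffs(g, delta=0.5):
    """Map estimated polynomial coefficients of (4.31), g = [g0, ..., gd],
    to DOA estimates in radians."""
    roots = np.roots(g)
    roots = roots / np.abs(roots)        # project roots onto the unit circle
    omega = np.angle(roots)              # spatial frequencies
    return np.arcsin(omega / (2 * np.pi * delta))
```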

4.7 Numerical Examples

In this section, some numerical examples and simulations are provided to illustrate the performance of the proposed estimator. Henceforth, the proposed method will be called DEUCE (direction estimator for uncorrelated emitters).

Example 4.3. First, we will study the relative improvement in estimation accuracy which can be obtained by making use of the prior knowledge that the signals are uncorrelated. We will consider a ULA with m = 3 sensor elements separated by a half wavelength. There are two uncorrelated plane waves impinging on the array from different directions. The first source is located at 0° while the direction to the other is varied. The SNRs of the sources are varied, but the second source is always 10 decibels (dB) stronger than the first source. That is, if the SNR of the first source is denoted by SNR1, then

P/σ² = [ SNR1   0
         0      10 SNR1 ] .

The CRB on the estimation error variance on θ2 is calculated with and without the prior on the signal correlation. The CRB for the case with uncorrelated signals is given in (4.13). The expression for the CRB on θ without any prior on P is given by the following expression (see [SN89a, VO91]):

(σ² / (2N)) { Re[ (D̄* Π⊥_A D̄) ⊙ (P A* R⁻¹ A P)^T ] }⁻¹ , (4.36)

where ⊙ denotes elementwise multiplication. In Figure 4.1, the ratio between the two CRBs on θ2 given by (4.36) and (4.13) is shown for different θ2 and SNR1. Clearly, the ratio between the two CRBs can be significantly larger than one in certain scenarios. Hence, there is motivation for a method that can utilize the prior information about the signal correlation. In the next example, we show that the performance of DEUCE predicted by the large sample analysis also holds for samples of practical lengths.

Figure 4.1: The ratio between the two CRBs for θ2, versus θ2 in degrees and SNR1 in dB.

Example 4.4. In the second example, we study the small sample behavior of DEUCE. Again, we consider a ULA with m = 3 sensor elements separated by a half wavelength. Two uncorrelated sources are located at θ = [0°, 20°]^T relative to array broadside. The signal covariance matrix is

P = [ 10^(3/10)   0
      0           10^(20/10) ]

and the noise variance is σ² = 1. The root-mean-square (RMS) error of the second source is depicted in Figure 4.2, along with the corresponding RMS error for the root-MUSIC estimates [Bar83, RH89]. The RMS errors

are based on 200 independent trials. For comparison, the corresponding CRBs are plotted as well. The solid line represents the CRB when the prior is incorporated (see (4.13)), and the dashed line denotes the CRB without the prior (see (4.36)). It can be seen that the RMS error for DEUCE reaches the appropriate CRB at rather small sample sizes. In this example, we made use of the root-MUSIC estimate in the first step of the algorithm given in Section 4.6. A linear constraint (Re(g0) = 1) on the polynomial coefficients was employed in the second step of the algorithm. The finite sample performance can be improved slightly by iterating the second step several times. However, the RMS errors in Figure 4.2 are calculated for the estimates obtained from the two-step procedure outlined in Section 4.6.

Figure 4.2: RMS error in degrees for θ2 versus the number of snapshots, N: ’x’ – DEUCE, ’o’ – root-MUSIC. The solid line represents the CRB when it is known that the sources are uncorrelated and the dashed line is the CRB without this knowledge.

Example 4.5. Another important issue is the robustness. That is, what happens if the sources are correlated? The following example is provided to partially answer this question. The setup is the same as in Example 4.4 except for the signal correlation. The signal covariance matrix is in this example given by

P = [ p1     p12
      p12^c  p2 ] ,

where p1 = 10^(3/10), p2 = 10^(20/10), and where the cross-correlation is defined as p12 = √(p1 p2) ρ e^{iφ}. In the simulations, the correlation coefficient,

ρ, as well as the correlation phase, φ, are varied. The DOAs are estimated from a batch of 100 snapshots. The RMS error for θ2, computed from 200 independent trials, is plotted as a function of the correlation coefficient in Figure 4.3. For comparison, the corresponding RMS error for the root-MUSIC estimates is shown as well. The CRB given by (4.36) is also displayed in Figure 4.3. It is seen that DEUCE has a better performance than root-MUSIC for small ρ, as expected. It is interesting to observe that the two methods have a similar performance when the correlation increases.

4.8 Conclusions

In this chapter, a method (DEUCE) for estimating the locations of uncorrelated emitter signals from array data was proposed and analyzed. The asymptotic distribution of the direction estimates was derived. The asymptotic variance was shown to coincide with the Cramér-Rao lower bound that incorporates the knowledge that the signals are uncorrelated. This means that DEUCE is asymptotically statistically efficient. It was also shown that DEUCE can be implemented as a two-step procedure for uniform linear arrays. This makes the method quite attractive since it provides minimum variance estimates without having to resort to non-linear iterative minimization. Some computer simulated examples indicate that DEUCE is rather robust against violations of the assumption that the emitters are uncorrelated. For correlated emitters, DEUCE behaves similarly to root-MUSIC. However, since DEUCE is statistically efficient, it outperforms root-MUSIC for the case of uncorrelated emitters.


Figure 4.3: The RMS error (in degrees) for θ2 versus the correlation coefficient, ρ: ’x’ – DEUCE, ’o’ – root-MUSIC. The dashed line represents the CRB without prior knowledge given in (4.36). Panels (a)–(d) show correlation phases φ = 0°, 60°, 120°, and 180°, respectively.


Part II

Subspace System Identification

Chapter 5

A Linear Regression Approach to Subspace System Identification

Recently, state-space subspace system identification (4SID) has been suggested as an alternative to the more traditional prediction error system identification. The aim of this chapter is to analyze the connections between these two different approaches to system identification. The conclusion is that 4SID can be viewed as a linear regression multistep-ahead prediction error method with certain rank constraints. This allows us to describe 4SID methods within the standard framework of system identification and linear regression estimation. For example, this observation is used to compare different cost-functions which occur rather implicitly in the ordinary framework of 4SID. From the cost-functions, estimates of the extended observability matrix are derived and related to previous work. Based on the estimates of the observability matrix, the asymptotic properties of two pole estimators, namely the shift invariance method and a weighted subspace fitting method, are analyzed. Expressions for the asymptotic variances of the pole estimation error are given. From these expressions, difficulties in choosing user-specified parameters are pointed out. Furthermore, it is found that a row-weighting in the subspace estimation step does not affect the pole estimation error asymptotically.
step does not affect the pole estimation error asymptotically.


5.1 Introduction

Algorithms for state-space subspace system identification (4SID) have attracted a large interest recently [VD94b, Lar90, Ver94, VOWL93]. An overview of 4SID methods can be found in [VD94a, Vib94]. The schemes for subspace system identification considered are very attractive since they estimate a state-space representation directly from input-output data. This is done without requiring a canonical parameterization as is often necessary in traditional prediction error methods [Lju87, SS89]. This requirement has been relaxed in [McK94b, McK94a], where it is shown that an over-parameterization of the state-space model does not necessarily lead to performance degradation. However, the estimation method analyzed therein requires a large parameter search. The 4SID methods use reliable numerical tools such as the QR-decomposition and the singular value decomposition (SVD), which lead to numerically efficient implementations. The nonlinear iterative optimization schemes that are usually necessary in the traditional framework are avoided. However, at the same time, this introduces new problems in terms of understanding the properties of the algorithms. Perhaps the most attractive property of 4SID is that the computations for multi-variable systems are just as easy as for single input single output systems. Another interesting property is the connection to model reduction [VD95a].

In the last few years, several new 4SID algorithms have been tested both in practice and by computer simulations. The results have been compared with results obtained with classical tools. In general, 4SID methods seem to perform very well, even close to optimal. The motivation for our work is to further increase the understanding of 4SID methods. The objective is to study the quality of the estimated models, to try to understand the performance limitations and capabilities of 4SID methods and, if possible, to suggest improvements and give user guidelines.

In this chapter, 4SID methods are phrased in a linear regression framework. This allows us to analyze the problem in a more traditional way. One advantage is that this explains more clearly the partition of the data into past and future which is done in 4SID. Also, the linear regression formulation is useful to relate different approaches to 4SID. Specifically, the estimation of the extended observability matrix (the subspace estimation) is discussed in some detail based on different cost-functions. The estimates are shown to be identical to estimates used in existing algorithms. To compare different methods, we consider the asymptotic performance of two pole estimation techniques which are based on an estimate of the extended observability matrix. The methods are the shift invariance method [Kun78, HK66] and the weighted subspace fitting method proposed in [OV94]. We give explicit expressions for the asymptotic covariance of the estimates. These expressions are rather involved, and it is difficult to interpret the results. However, one interesting general observation is that a row-weighting in the subspace estimation step does not affect the pole estimation error asymptotically. This result holds both for the shift invariance method and for the optimally weighted subspace fitting method. In order to gain further insight, we specialize to certain numerical examples which show that the choice of method and user-provided parameters is not obvious.

The rest of this chapter is organized as follows. In Section 5.2 the problem description is given along with the general assumptions made. In Section 5.3, we present the linear regression interpretation of 4SID. Based on the linear regression model, subspace estimation is discussed in some detail in Section 5.4. Section 5.5 introduces the two pole estimation algorithms considered. In Section 5.6, we analyze the asymptotic performance of these two methods and give numerical illustrations of the results in Section 5.7. Section 5.8 concludes the chapter.
results <strong>in</strong> Section 5.7. Section 5.8 concludes the chapter.<br />

5.2 Model <strong>and</strong> Assumptions<br />

Consider a l<strong>in</strong>ear time-<strong>in</strong>variant system <strong>and</strong> assume that the dynamics<br />

of the system can be described by an nth order state-space realization<br />

x(t + 1) = Ax(t) + Bu(t) + w(t) (5.1a)<br />

y(t) = Cx(t) + Du(t) + v(t) . (5.1b)<br />

Here, x(t) is an n-dimensional state-vector, u(t) is a vector of m <strong>in</strong>put<br />

signals, w(t) represents the process noise, the vector y(t) conta<strong>in</strong>s l output<br />

signals, <strong>and</strong> v(t) is additive measurement noise. The system matrices<br />

have appropriate dimensions. The system is assumed to be asymptotically<br />

stable, i.e., the eigenvalues of A lie strictly <strong>in</strong>side the unit circle.<br />

Furthermore, for certa<strong>in</strong> theoretical derivations, A is assumed to be diagonalizable.<br />

It is well known that the realization <strong>in</strong> (5.1) is not unique. In fact,<br />

any realization characterized by the set<br />

{A, B, C, D}


120 5 A L<strong>in</strong>ear Regression Approach to <strong>Subspace</strong> <strong>System</strong> <strong>Identification</strong><br />

has the same <strong>in</strong>put-output description as the realization given by the set<br />

{T −1 AT, T −1 B, CT, D}<br />

(if the process noise is properly scaled), where T is any full rank matrix.<br />

This is known as a similarity transformation, <strong>and</strong> the two realizations<br />

are said to be similar. The objective of an identification of the system<br />

description <strong>in</strong> (5.1) can be stated as: estimate the system order n, a<br />

matrix set {A, B, C, D} which is similar to the true description <strong>and</strong><br />

possibly the noise covariances from the measured data {u(t), y(t)} N t=1.<br />
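For concreteness, a minimal Python/NumPy simulation of the model (5.1) might look as follows (all numerical values are hypothetical; the chosen A is stable since its eigenvalues have magnitude √0.7 < 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# a hypothetical second-order SISO realization of (5.1)
A = np.array([[1.5, -0.7], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
C = np.array([[1.0, 0.5]])
D = np.array([[0.0]])

n, N = 2, 1000
u = rng.standard_normal((1, N))             # persistently exciting input
w = 0.1 * rng.standard_normal((n, N))       # process noise
v = 0.1 * rng.standard_normal((1, N))       # measurement noise

x = np.zeros(n)
y = np.zeros((1, N))
for t in range(N):
    y[:, t] = C @ x + D @ u[:, t] + v[:, t]
    x = A @ x + B @ u[:, t] + w[:, t]
```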

5.2.1 4SID Basic Equation

Subspace estimation techniques rely on a model that has an informative part which inherits a low-rank property plus a possible noise part. In 4SID, the model is obtained through a stacking of input-output data. Introduce an lα × 1 vector of stacked consecutive outputs as

yα(t) = (1/√(N − α − β + 1)) [ y(t)^T, y(t + 1)^T, ..., y(t + α − 1)^T ]^T , (5.2)

where α is a user-specified integer. To assure a low-rank property, α is required to be larger than the system order n. The scaling factor 1/√(N − α − β + 1) in (5.2) is introduced in order to simplify the definition of sample covariance matrices in what follows. The parameter β is introduced below. Similarly, define vectors of stacked inputs, process noise, and measurement noise as uα(t), wα(t), and vα(t), respectively. Now it is straightforward to iterate the system equations in (5.1) to obtain the equation for the stacked quantities

yα(t) = Γα x(t) + Φα uα(t) + nα(t) , (5.3)


where nα(t) = Ψα wα(t) + vα(t) contains the noise and where the following block matrices have been introduced:

Γα = [ C
       CA
       ⋮
       CA^{α−1} ] , (5.4)

Φα = [ D           0           ···  0
       CB          D           ⋱    ⋮
       ⋮           ⋱           ⋱    0
       CA^{α−2}B   CA^{α−3}B   ···  D ] ,

Ψα = [ 0          0    ···  0
       C          0    ···  0
       ⋮          ⋱    ⋱    ⋮
       CA^{α−2}   ···  C    0 ] .

Equation (5.3) is the basic model in 4SID. Notice that Γα is the extended observability matrix and is of rank n if the system is observable.
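In code, forming the stacked (scaled) output vectors of (5.2) amounts to building a block-Hankel matrix; a sketch with 0-based indexing (names hypothetical):

```python
import numpy as np

def stack_hankel(y, alpha, beta):
    """Matrix whose columns are the stacked vectors y_alpha(t) of (5.2),
    for t = beta, ..., beta + Nbar - 1; y is (l x N)."""
    l, N = y.shape
    Nbar = N - alpha - beta + 1
    cols = []
    for t in range(beta, beta + Nbar):
        # stack y(t), y(t+1), ..., y(t+alpha-1) on top of each other
        cols.append(y[:, t:t + alpha].reshape(-1, order="F"))
    return np.column_stack(cols) / np.sqrt(Nbar)
```

The same helper applied to the input sequence yields the stacked inputs uα(t), and with depth β the "past" quantities used below.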

5.2.2 General Assumptions

Some general assumptions are stated here. More specific assumptions will be made when needed.

• The process and measurement noise are assumed to be stationary zero-mean ergodic white Gaussian random processes with covariance given by

  E{ [w(t); v(t)] [w(t)^T  v(t)^T] } = [ rw     rwv
                                         rwv^T  rv ] ,

  where E is the expectation operator.

• The measurable input, u(t), is assumed to be a quasi-stationary deterministic sequence [Lju87] uncorrelated with w(t) and v(t). The correlation sequence of u(t) is defined as

  ru(s) = Ē{u(t + s) u^T(t)} ,

  where Ē is defined as

  Ē{·} = lim_{N→∞} (1/N) Σ_{t=1}^{N} E{·} .

  Furthermore, assume that the input is persistently exciting (PE) of order α + β [Lju87], where the parameter β is described below.

• The system is assumed to be observable, which implies that the (extended) observability matrix (5.4) has full rank, equal to the system order n.

5.3 Linear Regression

As a first step, the key idea of subspace identification is to estimate the n-dimensional range space of Γα from Equation (5.3). This can be done with more or less clever projections on the data. The traditional framework of 4SID can be a bit difficult in terms of advanced numerical linear algebra, concealing the statistical properties. Here, 4SID will be formulated in a linear regression framework. First we will provide motivation before introducing a linear regression model.

5.3.1 Motivation

Consider the basic equation (5.3). The idea is to replace the unknown state x(t) with an estimate from an optimal stochastic state observer. Let

x̂(t) = K1 yβ(t − β) + K2 uβ(t − β) + K3 uα(t) (5.5)

be the linear mean square error optimal estimate of x(t), given yβ(t − β), uβ(t − β) and uα(t). Here, β is a user-specified parameter which clearly can be interpreted as the size of the "memory" of past data used for estimating the state. The estimation error x(t) − x̂(t) is then orthogonal to yβ(t − β), uβ(t − β) and uα(t). Rewrite (5.3) to give the input-output relation

yα(t) = Γα x̂(t) + Φα uα(t) + Γα (x(t) − x̂(t)) + nα(t)
      = Γα (K1 yβ(t − β) + K2 uβ(t − β) + K3 uα(t)) + Φα uα(t) + eα(t)
      = L1 yβ(t − β) + L2 uβ(t − β) + L3 uα(t) + eα(t) , (5.6)


where we have introduced the notation

[L1 L2] = Γα [K1 K2] ,   L3 = Γα K3 + Φα , (5.7)

and where eα(t) is defined as

eα(t) = Γα (x(t) − x̂(t)) + nα(t) .

Thus, the "future" outputs, yα(t), are given by certain linear combinations of the data. For example, the combination of past inputs and outputs is constrained to lie in the range space of the extended observability matrix. Also notice that, by construction, the noise term eα(t) is uncorrelated with the other terms on the right hand side of (5.6). From the construction above it is also clear that yα(t) − eα(t) contains the mean-square optimal multistep ahead predictions of yα(t) given yβ(t − β), uβ(t − β) and uα(t). This is in general an approximation of the true predictions since we only have finite memory. As will be apparent later, another approximation made is that we consider time-independent K-matrices. In [VD94b], the relation between subspace projections and Kalman filtering is discussed in detail.
is discussed <strong>in</strong> detail.<br />

Remark 5.1. It should be remarked that even though we introduce an estimate of the states in (5.5), this is merely a help quantity in the elaborations to follow. In this chapter, we consider methods based on an estimate of Γα, and the matrices K1, K2, K3 will be eliminated along the way (see Section 5.4).

Special Cases

Deterministic systems: For the deterministic case with no noise, a dead-beat observer can be constructed which estimates the state exactly in at most n steps. Hence, the state vector can be obtained as

x(t) = K1 yβ(t − β) + K2 uβ(t − β) , (5.8)

for a specific choice of K1 and K2 (K3 = 0).

ARX systems: In [WJ94], the properties of ARX models in 4SID are studied in some detail. Since the predictor for ARX models has finite memory, this implies that even in this case the state can be exactly recovered in n steps, i.e.,

x(t) = K1 yβ(t − β) + K2 uβ(t − β),   β ≥ n,

even if we have noise! Observe that we have the same expression as in the deterministic case, so L3 = Φα. This also implies that eα(t) = nα(t), which is precisely the prediction error in the ARX case (see [WJ94]). Thus, for ARX systems the 4SID output vector, yα(t), consists of the true multistep-ahead predictions plus the prediction errors, i.e.,

yα(t) = [ ŷ^T_{t|t−1}, ..., ŷ^T_{t+α−1|t−1} ]^T + eα(t) ,

where ŷ_{t+k|t−1} denotes the prediction of y(t + k) given data up to time t − 1.

5.3.2 4SID Linear Regression Model

Inspired by the elaborations made above, a linear regression form of the 4SID output equation (5.3) will be introduced. The explicit relation to linear regression will hopefully clarify certain things in 4SID.

Definition. The 4SID linear regression model is defined by

yα(t) = L1 yβ(t − β) + L2 uβ(t − β) + L3 uα(t) + eα(t) . (5.9)

As seen above, the linear regression model inherits certain properties. The noise vector eα(t) is orthogonal to yβ(t − β), uβ(t − β) and uα(t), and

rank[L1 L2] = n , (5.10a)
R[L1 L2] = R[Γα] , (5.10b)

where R[·] denotes the range space of [·].

Remark 5.2. While it is obvious that rank[L1 L2] ≤ n, it is not easy to establish explicit conditions under which equality holds. Loosely speaking, this requires the states to be sufficiently excited. We will not elaborate further on this here, but assume (5.10a) to hold throughout this chapter. A thorough discussion of the rank condition is given in the next chapter.

In [VD94b], a similar idea of replacing the state with an estimate is elaborated. They consider the estimated state sequence as obtained from a bank of non-steady-state Kalman filters and derive explicit expressions for the matrices Li, i = 1, 2, 3, in the case of infinite data.

In 4SID algorithms, it is common to use α = β, and no general rule exists for how to choose them. It is also a quite common belief that the performance improves if α and β are increased. However, in this chapter a simple example is provided to show that this is not always the case. The numerical complexity of 4SID methods increases with the size of α and β; thus these parameters should be as small as possible. The linear regression interpretation (think of the state observers in (5.5) and (5.8), and of the ARX case) suggests that β most often can be chosen close to the system order, n, as is also indicated in the numerical examples in Section 5.7. While α is required to be greater than the system order, it is not that obvious what the corresponding condition on β is. The interpretation in terms of state observers suggests that β ≥ n is a necessary condition. However, an instrumental variable interpretation of 4SID (see [VOWL93, Vib94]) indicates that it may suffice to take β ≥ n/(l + m). This can also be seen by inspection of the necessary conditions on the dimensions of the quantities used in the subspace estimation in Section 5.4. The question of the required size of β is closely tied to the rank condition (5.10a); see Remark 5.2 and Chapter 6.

5.4 Subspace Estimation

Most 4SID methods start the estimation procedure by estimating the extended observability matrix. This section discusses this so-called subspace estimation. To be able to compare the effects of different weightings (different algorithms), the estimation of the system poles is considered in Section 5.5.

The idea in this section is to estimate Γα by minimizing two cost functions which arise naturally from the linear regression formulation. The so-obtained estimates are then related to previously published estimates. We believe that this comparison/unification aids in understanding the nature of the subspace estimation procedure.

Using (5.9), the observations can be modeled by the matrix equation

    Yα = L1 Yβ + L2 Uβ + L3 Uα + Eα,    (5.11)

where the block Hankel matrices of outputs are defined as

    Yβ = [yβ(1), ..., yβ(N − α − β + 1)],    (5.12a)
    Yα = [yα(1 + β), ..., yα(N − α + 1)],    (5.12b)

and similarly for Uα, Uβ, and Eα. The 4SID linear regression model (5.9) corresponds to an approximate multistep-ahead prediction error model.
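To fix ideas, the block Hankel matrices in (5.12) are easily formed from recorded data. The following NumPy sketch is not from the thesis (the function names are hypothetical); it assumes zero-based array indexing, with y of shape (l, N) and u of shape (m, N):

    import numpy as np

    def block_hankel(z, start, rows, cols):
        # Column j of the result stacks z[:, start+j], ..., z[:, start+j+rows-1].
        return np.vstack([z[:, start + r : start + r + cols] for r in range(rows)])

    def hankel_blocks(y, u, alpha, beta):
        # Build Y_alpha, Y_beta, U_alpha, U_beta as in (5.12).
        N = y.shape[1]
        cols = N - alpha - beta + 1
        Yb = block_hankel(y, 0, beta, cols)       # Y_beta
        Ub = block_hankel(u, 0, beta, cols)       # U_beta
        Ya = block_hankel(y, beta, alpha, cols)   # Y_alpha (starts beta samples later)
        Ua = block_hankel(u, beta, alpha, cols)   # U_alpha
        return Ya, Yb, Ua, Ub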

It is quite natural to base an estimation procedure on the minimization of the prediction error¹ in some suitable norm. Thus, consider the prediction error covariance matrix

    Θ(L1, L2, L3) = Eα Eα^T = Σ_{t=1+β}^{N−α+1} eα(t) eα(t)^T,

where only the explicit dependence on the regression coefficients has been stressed. In general, the matrices L1, L2 and L3 depend on the system matrices, the noise covariances, and the initial state. The structure of these matrices could be incorporated in the minimization of Θ; however, the resulting nonlinear minimization problem is quite complicated.
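In matrix form, evaluating Θ for candidate regression matrices is a two-liner; a sketch with hypothetical names, reusing the Hankel blocks above:

    # Residuals and their (unnormalized) covariance for given L1, L2, L3.
    E = Ya - L1 @ Yb - L2 @ Ub - L3 @ Ua
    Theta = E @ E.T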

We will obtain estimates by minimizing two different norms of Θ. In the minimization, the structure of the problem will be relaxed in the sense that L3 is considered to be a completely unknown matrix, and the only restriction on [L1 L2] is the rank-n constraint (5.10a). Thus the minimization is a rank-constrained problem; however, it can be transformed to an unconstrained one by the reparameterization (cf. (5.7))

    [L1 L2] = Γα K̄.    (5.13)

The factorization in (5.13) is clearly not unique since Γα K̄ = (Γα T)(T^{-1} K̄) for any invertible T. However, it turns out that this non-uniqueness is not a problem. The minimization problem to be solved in this section can now be formulated as

    min_{Γα, K̄, L3} V(Γα, K̄, L3)

for one of the following two functions:

    V(Γα, K̄, L3) = Tr{W Θ(Γα, K̄, L3)}    (5.14)

or

    V(Γα, K̄, L3) = det{Θ(Γα, K̄, L3)}.    (5.15)

Here Tr{·} denotes the trace operator, W is any symmetric positive definite weighting matrix, and det{·} is the determinant function. In an obvious notation, the argument of Θ has been altered according to the reparameterization.

¹ For simplicity, in the sequel we will refer to the residuals in the linear regression model as the prediction error.


As already mentioned, we are only interested in an estimate of the extended observability matrix Γα; hence K̄ and L3 can be seen as auxiliary variables.

The next idea is to minimize Θ with respect to L3 and Γα in terms of matrix inequalities. (For two matrices A and B, the inequality A − B ≥ 0 means that A − B is positive semi-definite.) The reason is that these results are valid for any nondecreasing function of Θ, and thus specifically for the two choices (5.14) and (5.15) (see [SS89]). To minimize with respect to K̄, however, we have to consider the two functions separately. This is done in two subsequent sections.

In order to proceed, it is convenient to introduce the notation

    pβ(t) = [yβ^T(t)  uβ^T(t)]^T,    Pβ = [Yβ^T  Uβ^T]^T,
    L̄ = [L1 L2] = Γα K̄,    U†α = Uα^T (Uα Uα^T)^{-1}.

Rewrite Θ as

    Θ(Γα, K̄, L3) = (Yα − L̄Pβ − L3Uα)(Yα − L̄Pβ − L3Uα)^T
        = (L3 − (Yα − L̄Pβ)U†α) UαUα^T (L3 − (Yα − L̄Pβ)U†α)^T
          + (Yα − L̄Pβ)(Yα − L̄Pβ)^T − (Yα − L̄Pβ)U†α Uα (Yα − L̄Pβ)^T.    (5.16)

Since UαUα^T is positive definite by assumption, the unconstrained L3 minimizing any nondecreasing function of Θ is given by

    ˆL3 = (Yα − L̄Pβ) U†α.    (5.17)

Let

    Π⊥α = I − U†α Uα

denote the projection matrix onto the null space of Uα. Insert (5.17) in (5.16) and again rewrite in a similar way, using the factorization L̄ = Γα K̄:

    Θ(Γα, K̄) = (Yα − Γα K̄ Pβ) Π⊥α (Yα − Γα K̄ Pβ)^T
        = [Γα − YαΠ⊥αPβ^T K̄^T (K̄PβΠ⊥αPβ^T K̄^T)^{-1}] K̄PβΠ⊥αPβ^T K̄^T
          × [Γα − YαΠ⊥αPβ^T K̄^T (K̄PβΠ⊥αPβ^T K̄^T)^{-1}]^T
          + YαΠ⊥αYα^T − YαΠ⊥αPβ^T K̄^T (K̄PβΠ⊥αPβ^T K̄^T)^{-1} K̄PβΠ⊥αYα^T.    (5.18)



The factor K̄PβΠ⊥αPβ^T K̄^T is positive definite by assumption. (This is because K̄ has full row rank and PβΠ⊥αPβ^T is positive definite under weak conditions on the noise and if u(t) is persistently exciting of order α + β; see [SS89], Complement C5.1.) Hence, if K̄ is given, the estimate of the extended observability matrix is

    ˆΓα = YαΠ⊥αPβ^T K̄^T (K̄PβΠ⊥αPβ^T K̄^T)^{-1}    (5.19)

and the concentrated form of the covariance matrix equals

    Θ(K̄) = YαΠ⊥αYα^T − YαΠ⊥αPβ^T K̄^T (K̄PβΠ⊥αPβ^T K̄^T)^{-1} K̄PβΠ⊥αYα^T.    (5.20)

Until now, the estimates have been independent of the precise norm of Θ chosen. However, to proceed and minimize with respect to K̄, we have to consider the two cost functions (5.14) and (5.15) separately.

5.4.1 Weighted Least Squares (WLS)

In this section, we derive the estimate of Γα by minimizing the cost function (5.14), i.e., the weighted sum of the residuals in the linear regression model. This estimate will be called the weighted least squares (WLS) estimate. Using the concentrated form of Θ, (5.20), the minimizing K̄ is given by

    ˆK̄ = arg min_{K̄} Tr{W Θ(K̄)}.    (5.21)

The details of the optimization are deferred to Appendix A.9. The solution to (5.21) is

    ˆK̄ = T̄^T ˆVs^T (PβΠ⊥αPβ^T)^{-1/2},    (5.22)

where T̄ is any n × n full rank matrix and where ˆVs is given by the SVD

    ˆQ ˆS ˆV^T = [ˆQs  ˆQn] [ˆSs 0; 0 ˆSn] [ˆVs^T; ˆVn^T] = W^{1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1/2}.    (5.23)

Here ˆSs is a diagonal matrix containing the n largest singular values, the columns of ˆVs are the corresponding right singular vectors, and W^{1/2} is a symmetric square root of W. Inserting (5.22) in (5.19) yields the WLS subspace estimate

    ˆΓα = W^{-1/2} W^{1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1/2} ˆVs T̄ (T̄^T ˆVs^T ˆVs T̄)^{-1}
        = W^{-1/2} ˆQ ˆS ˆV^T ˆVs T̄^{-T} = W^{-1/2} ˆQs ˆSs T̄^{-T}.    (5.24)

As mentioned previously, the estimate of the extended observability matrix concludes the first step in many 4SID techniques. The appearance of the factor ˆSs T̄^{-T} in (5.24) and T̄^T in (5.22) reflects the fact that different state-space realizations are related by a similarity transformation (cf. the non-uniqueness in the factorization (5.13)). T̄ does not affect the minimization but determines which realization is obtained. Some common choices are T̄ = ˆSs^{1/2} and T̄ = ˆSs.
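Collecting the steps (5.21)–(5.24), a compact NumPy sketch of the WLS subspace estimate follows. It is written for clarity rather than numerical efficiency (a QR-based implementation would avoid forming the large projector explicitly); T̄ = I, and the names are hypothetical:

    from scipy.linalg import sqrtm

    def wls_subspace(Ya, Ua, Pb, W, n):
        # Projection onto the null space of Ua: Pi = I - Ua^T (Ua Ua^T)^{-1} Ua.
        Nc = Ua.shape[1]
        Pi = np.eye(Nc) - Ua.T @ np.linalg.solve(Ua @ Ua.T, Ua)
        Wh = np.real(sqrtm(W))                        # symmetric square root W^{1/2}
        M = np.real(sqrtm(Pb @ Pi @ Pb.T))            # (Pb Pi Pb^T)^{1/2}
        G = Wh @ (Ya @ Pi) @ Pb.T @ np.linalg.inv(M)  # right hand side of (5.23)
        Q, S, Vt = np.linalg.svd(G, full_matrices=False)
        # WLS estimate (5.24) with Tbar = I: keep the n dominant directions.
        return np.linalg.inv(Wh) @ Q[:, :n] @ np.diag(S[:n])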

5.4.2 Approximate Maximum Likelihood (AML)

It is well known that, under certain assumptions, prediction error methods have the same performance as maximum likelihood (ML) estimators. More precisely, it is required that the prediction error is Gaussian noise, white in time, and that the cost function is properly chosen, e.g., as the determinant of the prediction error covariance matrix [Lju87, SS89]. However, in the present problem the prediction errors are not white, and minimizing the determinant cost function will not give the ML estimate. Instead, it will be called the Approximate ML (AML) estimate. Actually, there are more profound considerations in an ML approach. This is due to the rank constraint on the regression coefficients (5.10a). For the derivation of the maximum likelihood estimator for a reduced-rank linear regression model, we refer to [SV95, SV96], which also served as inspiration for the presentation in this section.

The criterion in this section will be chosen as the determinant of the prediction error covariance matrix, i.e., (5.15). Due to the general “minimization” of Θ in Section 5.4, the minimizing ˆL3 and ˆΓα are given by (5.17) and (5.19), respectively, with the minimizing K̄ inserted. Using (5.20), the estimate of K̄ is given by

    ˆK̄ = arg min_{K̄} det{Θ(K̄)}.    (5.25)

The details of the minimization are deferred to Appendix A.10. Introducing the SVD similar to (5.23),

    ˆQ ˆS ˆV^T = [ˆQs  ˆQn] [ˆSs 0; 0 ˆSn] [ˆVs^T; ˆVn^T] = (YαΠ⊥αYα^T)^{-1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1/2},    (5.26)

the solution to (5.25) is given by

    ˆK̄ = T̄^T ˆVs^T (PβΠ⊥αPβ^T)^{-1/2},    (5.27)

where T̄ is any n × n full rank matrix. The AML estimate of Γα is obtained by using (5.27) in (5.19):

    ˆΓα = (YαΠ⊥αYα^T)^{1/2} (YαΠ⊥αYα^T)^{-1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1/2} ˆVs T̄ (T̄^T ˆVs^T ˆVs T̄)^{-1}
        = (YαΠ⊥αYα^T)^{1/2} ˆQ ˆS ˆV^T ˆVs T̄^{-T} = (YαΠ⊥αYα^T)^{1/2} ˆQs ˆSs T̄^{-T}.    (5.28)

5.4.3 Discussion

The weighting matrix introduced in the weighted least squares method is a user choice. A couple of different choices correspond to estimates previously suggested in the literature.

PO-MOESP: The PO-MOESP method was introduced in [Ver94]. It is straightforward to verify that choosing W = I gives the same estimate of Γα (and the same singular values) as in PO-MOESP (cf. [VD94a, Vib94]).

CVA: The choice W = (YαΠ⊥αYα^T)^{-1} results in the canonical variate analysis (CVA) solution [Lar83, Lar90]. A natural interpretation of the algorithmic details of Larimore's method is given in [VD94a]. Here it is called the CVA solution even though Larimore does not explicitly use the extended observability matrix. Larimore does actually propose a generalization of the canonical variate analysis method and introduces a general weighting matrix corresponding to W in our notation.

Comparing (5.23) and (5.24) with (5.26) and (5.28), it is clear that if W = (YαΠ⊥αYα^T)^{-1}, the two estimates are identical and equal to the CVA estimate. This means that AML is a special case of WLS, and it suffices to consider WLS in the sequel.
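In terms of the sketch above, the AML/CVA estimate is therefore obtained merely by a data-dependent choice of W (a sketch, reusing the earlier hypothetical quantities):

    # AML = WLS with the CVA weighting W = (Ya Pi Ya^T)^{-1}.
    Nc = Ua.shape[1]
    Pi = np.eye(Nc) - Ua.T @ np.linalg.solve(Ua @ Ua.T, Ua)
    W_cva = np.linalg.inv(Ya @ Pi @ Ya.T)
    Gamma_cva = wls_subspace(Ya, Ua, Pb, W_cva, n)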

In [VD94a], an interpretation of W as a shaping filter in the frequency domain is given. More precisely, W contains impulse response coefficients of a shaping filter specifying important frequency bands. These ideas are further extended in [VD95a].

A further observation on the selection of the weighting matrix W concerns the model order selection. The estimate of the model order is often obtained by studying the singular values in (5.23), and W clearly affects the singular value distribution. Thus, it would be desirable if W could increase the gap between the n dominating singular values and the rest. In [Lar94], it is claimed, by means of a numerical example, that CVA is optimal (ML) for model order selection.

A popular subspace identification algorithm is the N4SID method [VD94b]. Hence, a short note on the relation of the linear regression framework to the estimate in N4SID is warranted. The N4SID estimate of Γα cannot be obtained directly from our WLS solution because of the rank constraint on L̄. In N4SID, this constraint is invoked at a later stage. This may be seen as follows. Consider the unconstrained least squares estimates of L̄ and L3 from the cost function Tr{Θ}:

    ˆL̄ = YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1},    (5.29a)
    ˆL3 = YαΠ⊥_{Pβ} Uα^T (UαΠ⊥_{Pβ} Uα^T)^{-1},    (5.29b)

where Π⊥_{Pβ} = I − Pβ^T (PβPβ^T)^{-1} Pβ is the projection onto the null space of Pβ. Define the least squares residuals by

    ˆEα = Yα − ˆL̄Pβ − ˆL3Uα,

where ˆEα is orthogonal to Pβ and Uα. Rewrite the true residuals as

    Eα = Yα − L̄Pβ − L3Uα = ˆEα − (L̄ − ˆL̄)Pβ − (L3 − ˆL3)Uα.

Due to the orthogonality, the cost function reads

    Tr{Θ} = Tr{EαEα^T} = ‖Eα‖²_F = ‖ˆEα‖²_F + ‖(L̄ − ˆL̄)Pβ + (L3 − ˆL3)Uα‖²_F,



where ‖·‖_F denotes the Frobenius norm. Hence, if L3 is taken as the (unconstrained) least squares estimate in (5.29b) and we minimize under the constraint that the rank of L̄Pβ should be n, the approximation is obtained from the SVD of ˆL̄Pβ, which is exactly what is used in N4SID. Clearly, the difference between N4SID and our derivation lies in the stage at which the rank constraint is invoked. It is expected that the estimate should be more accurate if the constraint is invoked as early as possible.
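A sketch of this N4SID-style route (unconstrained least squares first, rank constraint afterwards), reusing the quantities from the earlier sketches; the square-root scaling at the end is one common realization choice, not necessarily the one used in [VD94b]:

    # Unconstrained LS estimate (5.29a), then rank-n truncation of Lbar*Pb by SVD.
    Lbar_hat = (Ya @ Pi) @ Pb.T @ np.linalg.inv(Pb @ Pi @ Pb.T)
    U_, s_, Vt_ = np.linalg.svd(Lbar_hat @ Pb, full_matrices=False)
    Gamma_n4sid = U_[:, :n] @ np.diag(np.sqrt(s_[:n]))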

5.5 Pole Estimation

Given an estimate of the extended observability matrix Γα, different approaches for deriving the system matrices exist; see [Kun78, Ver94, McK94b, VD94b, Van95]. This chapter focuses on the estimation of the poles. The reasons for this are two-fold:

• For comparing the quality of the estimates from different methods, we need to consider a canonical parameter set. Perhaps the simplest and most useful such set is the system poles.

• Initial work on the performance of 4SID methods has been reported in [VOWL91, VOWL93, OV94], and the results presented therein rely on the pole estimation error. We will draw heavily on their results.

In this section, we introduce two pole estimation methods appearing in the literature. The asymptotic (in the number of data samples N) properties of these two methods are then analyzed in Section 5.6.

5.5.1 The Shift Invariance Method

For deriving the system matrix A, a straightforward method is to use the shift invariance structure of Γα [Kun78, HK66]. Define the selection matrices

    J1 = [I_{(α−1)l}  0_{(α−1)l×l}],
    J2 = [0_{(α−1)l×l}  I_{(α−1)l}],

where In denotes the n × n identity matrix and 0_{n×m} denotes an n × m matrix of zeros. Then the estimate of the system matrix is

    Â = (J1 ˆΓα)† (J2 ˆΓα),    (5.30)
J2 ˆ Γα


where (·)† denotes the Moore–Penrose pseudo-inverse of (·). Let µ be a vector with components µk, k = 1, ..., n, where µk denotes the kth pole of the true system. The estimated poles are then, of course, given by the eigenvalues of Â, i.e.,

    ˆµ = the eigenvalues of Â.    (5.31)
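In code, the shift invariance estimate (5.30)–(5.31) is only a couple of lines (a sketch; Gamma_hat as produced by the earlier subspace sketches, l the number of outputs):

    def shift_invariance_poles(Gamma_hat, l):
        # Solve (J1 Gamma) A = (J2 Gamma) in the least squares sense, cf. (5.30).
        A_hat = np.linalg.pinv(Gamma_hat[:-l, :]) @ Gamma_hat[l:, :]
        return np.linalg.eigvals(A_hat)              # the pole estimates (5.31)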

5.5.2 Subspace Fitting

The second pole estimation technique considered was proposed in [OV94] (see also [SROK95]). Let Γα(θ) denote the parameterized observability matrix of some realization. The estimated observability matrix ˆΓα may then be written as

    ˆΓα = Γα(θ)T + Γ̃α,    (5.32)

where T denotes an unknown similarity transformation matrix and Γ̃α is the error. The idea in the subspace fitting approach is to obtain parameter estimates so as to achieve the best fit between the two subspaces spanned by the columns of Γα(θ) and ˆΓα. To be more specific, the unknown parameters (θ and the elements in T) can be estimated by minimizing some suitable norm of Γ̃α, for example,

    ‖vec(ˆΓα − Γα(θ)T)‖²_Φ.    (5.33)

Here, vec(A) denotes the vector obtained by stacking the columns of A, Φ is a positive definite weighting matrix, and ‖x‖²_Φ = x^T Φ x. In [OV94] it was shown that, for minimum variance parameter estimates (of θ), we may equivalently study the following minimization problem:

    ˆθ = arg min_θ V(θ),    (5.34)

where V(θ) is defined by²

    V(θ) = vec^T(Π⊥_{W^{1/2}Γα} ˆQs) Λ vec(Π⊥_{W^{1/2}Γα} ˆQs)    (5.35)

for some symmetric positive semi-definite weighting matrix Λ. Here

    Π⊥_{W^{1/2}Γα} = I − W^{1/2} Γα (Γα* W Γα)^{-1} Γα* W^{1/2}    (5.36)

is a projection matrix onto the null space of (W^{1/2}Γα)*, and (·)* denotes the complex conjugate transpose. The argument (θ) has been suppressed for notational convenience.

² We use the notation V(·) to denote different cost functions; we believe that this will not cause any confusion.

Remark 5.3. A few comments about the formulation in (5.34) follow. The cost function (5.35) can be formulated in different ways, leading to similar (or identical) results. The choice made herein differs slightly from [OV94]. Instead of Π⊥_{W^{1/2}Γα} ˆQs, we could equally well consider Π⊥_{Γα} W^{-1/2} ˆQs or P^T W^{-1/2} ˆQs, where the columns of P are a parameterized basis for the null space of Γα*(θ); see [OV94, VWO97]. See Section 5.6.3 for further comments.

Regarding the parameterization, Γα(θ) is considered to be parameterized in any way such that the true d-dimensional parameter vector θ0 is uniquely given by the identity Π⊥_{W^{1/2}Γα}(θ0) Qs = 0. In particular, Γα in the diagonal realization can be parameterized by the system poles and, for multi-output systems, possibly some of the coefficients in C; see [OV94]. Notice that the parameterization should be such that the projection matrix in (5.36) is real valued. We also refer to [VWO97] for an interesting discussion of the parameterization issue. Therein, it is shown that the null space of Γα* can be linearly parameterized, which implies that the estimator (5.34) can be implemented in a non-iterative (two-step) fashion (see [OV94, VWO97]).
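For intuition, a sketch of evaluating the subspace fitting criterion (5.35) for the scalar-pole parameterization used in Example 5.1 below (C fixed to 1, so Γα(λ) = [1, λ, ..., λ^{α−1}]^T), with Λ = I for simplicity and hypothetical names:

    def subspace_fit_cost(lam, Qs, W_sqrt):
        # V(theta) in (5.35) with Lambda = I, for a single real pole lam.
        alpha = Qs.shape[0]
        Gamma = lam ** np.arange(alpha).reshape(-1, 1)   # Gamma_alpha(lam), C = 1
        G = W_sqrt @ Gamma                               # W^{1/2} Gamma_alpha
        Proj = np.eye(alpha) - G @ np.linalg.pinv(G)     # the projector of (5.36)
        r = (Proj @ Qs).ravel(order="F")                 # vec(Proj Qs)
        return float(r @ r)

    # e.g., a crude grid search over candidate poles:
    # lam_hat = min(np.linspace(-0.99, 0.99, 199),
    #               key=lambda a: subspace_fit_cost(a, Qs, W_sqrt))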

5.6 Performance Analysis

In this section, the asymptotic performance of the two pole estimation methods described in Section 5.5 is analyzed. A short discussion on consistency is given, whereafter the asymptotic distribution of the pole estimates is derived.

5.6.1 Consistency

Consider the subspace estimate obtained through the WLS minimization in Section 5.4.1 (as mentioned previously, this covers the AML case as well). Thus, we assume the estimate of the extended observability matrix to be given by

    ˆΓα = W^{-1/2} ˆQs,    (5.37)


where ˆQs is obtained from the SVD (cf. (5.23))

    ˆQs ˆSs ˆVs^T + ˆQn ˆSn ˆVn^T = W^{1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1/2}.    (5.38)

In the asymptotic analysis, we consider W to be a fixed full rank matrix or, in the data-dependent case, a consistent estimate of a full rank matrix.

Let Γ•α denote the extended observability matrix of a fixed realization of (5.1) and let

    X = [x(1 + β)  x(2 + β)  ...  x(N − α + 1)]

be the corresponding state trajectory. In the analysis of the shift invariance method we let Γ•α be the extended observability matrix of the diagonal realization (Γ•α = Γdα), whereas Γ•α = Γα(θ0) is the natural choice in the analysis of the subspace fitting method. Next, notice that the block Hankel matrix of outputs from a fixed realization of (5.1) is given by (cf. (5.3))

    Yα = Γ•α X + Φα Uα + Nα,    (5.39)

where Nα is defined similarly to (5.12b). By assumption, the noise is uncorrelated with the past inputs and outputs and with the future inputs. By ergodicity, this implies that

    lim_{N→∞} Nα Uα^T = 0,    (5.40)
    lim_{N→∞} Nα Pβ^T = 0,    (5.41)

where the limits are interpreted in terms of convergence w.p.1. Observe that all signals are assumed to be properly scaled such that products as in (5.41) converge to the corresponding covariances.

Using these observations, the following limit is obtained (w.p.1):

    lim_{N→∞} W^{1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1/2} = W^{1/2} Γ•α R•,    (5.42)

where, for notational convenience, we have defined

    R• = [Rxp(β) − Rxu(β) Ru^{-1} Rup][Rp − Rup^T Ru^{-1} Rup]^{-1/2}

and where

    Rxp(τ) = Ē{x(t + τ) pβ^T(t)},
    Rxu(τ) = Ē{x(t + τ) uα^T(t + β)},
    Ru = Ē{uα(t) uα^T(t)},
    Rup = Ē{uα(t + β) pβ^T(t)},
    Rp = Ē{pβ(t) pβ^T(t)}.    (5.43)



Notice that the covariances involving system states are well defined since the realization is fixed in the analysis. The notation R• is in correspondence with the notation for Γ•α (Γdα ↔ Rd, Γα(θ0) ↔ Rθ0). Let Qs Ss Vs^T be the SVD of the right hand side of (5.42) or, equivalently, the limit of the SVD in (5.38). Assuming the matrix R• in (5.42) has full rank equal to n, it is clear that the range space of Qs (ˆQs → Qs as N → ∞) is a consistent estimate of the range space of W^{1/2} Γ•α. Thus, (5.37) is a consistent estimate of the extended observability matrix of some realization. Any reasonable choice of estimator for C and A would then provide consistent estimates. In particular, the two pole estimation methods introduced in Section 5.5 will give consistent estimates. Observe that the consistency result in [PSD95] does not apply here since we consider a fixed β. For a more complete analysis along the lines of this section, see Chapter 6.

5.6.2 Analysis of the Shift Invariance Method

In this section, the asymptotic properties of the pole estimates from the shift invariance method are analyzed. The results are a straightforward extension of the analysis in [VOWL91, VOWL93] (see also [SS91, RH89]). The modification is needed because the subspace estimate is different and because a weighting (W) is included. A direct application of the shift invariance method can, in some cases, lead to quite strange behavior of the accuracy of the estimates as a function of the dimension parameter α (see [WJ94]). However, in [WJ94] the weighting matrix W was not included in the analysis. It is therefore interesting to investigate the effect of the weighting on the pole estimation error. One could conjecture that a proper choice of weighting (for instance, one that makes the residuals white) would improve the accuracy of the pole estimates. However, a quite surprising result obtained below shows that this is not the case: the weighting does not affect the asymptotic pole estimation error.

In order to state the results of this section, we need the following notation:

    µ̃k = ˆµk − µk,
    Rnα(τ) = Ē{nα(t + τ) nα^T(t)},
    ζ(t) = [yβ^T(t)  uβ^T(t)  uα^T(t + β)]^T = [pβ^T(t)  uα^T(t + β)]^T,
    Rζ(τ) = Ē{ζ(t + τ) ζ^T(t)},
    f*k = [(J1 Γdα)†]k• (J2 − J1 µk),
    H = [I; −Ru^{-1} Rup] (Rp − Rup^T Ru^{-1} Rup)^{-1/2},    (5.44)
    Ωd = H (Rd)†,    (5.45)

where Ak• (A•k) denotes row (column) k of the matrix A, and [I; −Ru^{-1}Rup] denotes the block matrix with I stacked on top of −Ru^{-1}Rup. The notation (·)^c will be used to denote complex conjugation.

Theorem 5.1. Let the subspace estimate be given by (5.37)–(5.38) and let the estimates of the system poles be as in (5.31). Then √N µ̃k is asymptotically zero-mean Gaussian with second moments

    lim_{N→∞} N E{µ̃k µ̃l^c} = Σ_{|τ|<α} [(Ωd)•k]^T Rζ(τ) [(Ωd)•l]^c · f*k Rnα(τ) fl.    (5.46a)



Remark 5.4. Corollary 5.2 is in fact true for a larger class of subspace estimates than considered in this chapter, e.g., the set of subspace estimates obtained from the unifying theorem in [VD94a].

Remark 5.5. One should keep in mind that Corollary 5.2 holds for large data records. For finite data, the weighting may indeed improve the estimates. Furthermore, the weighting can be useful for other reasons, e.g., model order selection, finite sample bias shaping, and for coping with un-modeled dynamics. We also remind the reader that the result only concerns the pole estimates.

5.6.3 Analysis of the Subspace Fitting Method

In this section, we analyze the asymptotic properties of the parameter estimates from the subspace fitting method (5.34). In [OV94], this estimation problem is solved by invoking the general theory of asymptotically best consistent estimation of nonlinear regression parameters (see, for instance, [SS89]). Furthermore, the projection matrix is reparameterized in the coefficients of the characteristic polynomial of A, and a non-iterative procedure for solving (5.34) is described. To some extent, the results of the analysis below parallel those in [OV94]. Specifically, the optimal weighted method derived below is the same as in [OV94], though phrased slightly differently and tailored to our subspace estimate, (5.37)–(5.38), with the weight W included. Our derivation also provides an expression for the parameter estimation error covariance with a general weighting matrix, not just for the optimal weighting. This makes it possible to compare different weightings. Moreover, the results concern parameters in the extended observability matrix explicitly. This makes it possible to directly apply the result to the pole estimation error (see Section 5.7).

In Appendix A.12, the asymptotic distribution of the estimates from (5.34) is derived through a standard Taylor series expansion approach [Lju87, SS89]. Before presenting the main results of this analysis, we
[Lju87, SS89]. Before present<strong>in</strong>g the ma<strong>in</strong> results of this analysis we


define

    G = [((W^{1/2}Γα)†)^T Qs ⊗ Π⊥_{W^{1/2}Γα} W^{1/2}] [vec(Γα1)  vec(Γα2)  ...  vec(Γαd)],    (5.47)

    Σ = Σ_{|τ|<α} [H^T Rζ(τ) H] ⊗ [Π⊥_{W^{1/2}Γα} W^{1/2} Rnα(τ) W^{1/2} Π⊥_{W^{1/2}Γα}],    (5.48)

where Γαi denotes the derivative of Γα(θ) with respect to the ith component of θ, evaluated at θ0.


The asymptotic covariance of the parameter estimates can then be written as

    lim_{N→∞} N E{θ̃ θ̃^T} = (G^T Σ† G)^{-1},

which, using (5.47) and (5.48), can be expanded in terms of [vec(Γα1)  vec(Γα2)  ···  vec(Γαd)] and the sum over the lag τ of the covariances Rζ(τ) and Rnα(τ).



Remark 5.8. As the optimally weighted subspace fitting method has been formulated here, it is not particularly well suited for implementation. We refer to [OV94, VWO97] for further comments and details about the implementation.

In the analysis above, it was assumed that a subspace estimate was given, from which the parameters were estimated in a second step. One could ask: why not introduce the parameterization of Γα directly in (5.14), before a specific subspace estimate is derived? The rest of this section is concerned with the analysis of the accuracy of the so-obtained estimates. The derived results also lead to an interesting connection to a previously published 4SID parameter estimation algorithm [SROK95] (see Remark 5.10).

If the cost function (5.14) is minimized with respect to L3 and then with respect to K̄, the following estimate is obtained:

    ˆK̄ = (Γα^T W Γα)^{-1} Γα^T W YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1}.

The concentrated cost function, omitting terms independent of Γα(θ), reads

    V(θ) = Tr{Π⊥_{W^{1/2}Γα} W^{1/2} YαΠ⊥αPβ^T (PβΠ⊥αPβ^T)^{-1} PβΠ⊥αYα^T W^{1/2}}.    (5.54)

Consider the parameter estimates obtained through

    ˆθ1 = arg min_θ V(θ).    (5.55)

An interesting question is how this estimate is related to the one from (5.34). This can be answered (asymptotically) by using the following result.

Theorem 5.5. The estimator (5.55) is asymptotically equivalent to

    ˆθ2 = arg min_θ Tr{Π⊥_{W^{1/2}Γα} ˆQs Ss² ˆQs^T},    (5.56)

i.e., √N(ˆθ1 − ˆθ2) → 0 in probability as N → ∞, implying that ˆθ1 and ˆθ2 have the same asymptotic distribution.

Proof. The proof is given in Appendix A.13.



Now, it is possible to rewrite the cost function in (5.56) to yield

    vec^T(Π⊥_{W^{1/2}Γα} ˆQs) (Ss² ⊗ I) vec(Π⊥_{W^{1/2}Γα} ˆQs),    (5.57)

where the facts that Tr{AB} = vec^T(A^T) vec(B) and (A.76) have been used. Comparing (5.57) and (5.35), it is obvious that (5.57) is a special case with

    Λ = Ss² ⊗ I.    (5.58)
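A sketch of the criterion (5.56) for the same scalar-pole parameterization as before (Ss holding the n dominant singular values from (5.38); hypothetical names):

    def theta2_cost(lam, Qs, Ss, W_sqrt):
        # Tr{Proj Qs Ss^2 Qs^T}, cf. (5.56), for a single real pole lam.
        alpha = Qs.shape[0]
        Gamma = lam ** np.arange(alpha).reshape(-1, 1)
        G = W_sqrt @ Gamma
        Proj = np.eye(alpha) - G @ np.linalg.pinv(G)
        return float(np.trace(Proj @ Qs @ np.diag(Ss ** 2) @ Qs.T))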

The following theorem is then immediate.

Theorem 5.6. The estimator in (5.55) yields less accurate estimates than (5.34) with the optimal weighting (5.52).

Remark 5.9. The performance comparison in Theorem 5.6 is not completely proper, since the optimally weighted estimator is considerably more involved (the norms are different). Of course, if we decide to estimate the parameters directly from (5.11), we can do better than (5.55). However, it is interesting that by this we have an expression for the asymptotic covariance of the estimates from (5.55).

Remark 5.10. The last two results should also be compared with the subspace formulation in [SROK95]. They consider a weighted subspace fitting method applied to a different subspace estimate than the one used here. However, if their approach is applied to the estimate herein, the estimator would be (5.56), and the performance is given by (5.51) with the weight (5.58).

Remark 5.11. It is shown in [Jan95] that the estimator (5.56) also has the same asymptotic performance as when only the term τ = 0 in (5.52) is used as the weight in (5.34) and W = (YαΠ⊥αYα^T)^{-1}.

Remark 5.12. Our experience is that the accuracy of the estimates from (5.56) is in general the same as or worse than what is obtained by the shift invariance method. This is also the reason why this method is not included in the numerical examples in Section 5.7.

5.7 Examples

In this section, we provide two examples in order to compare the shift invariance and the optimally weighted subspace fitting methods.



Example 5.1. As briefly mentioned in Section 5.5, a rather strange behavior of the pole estimation accuracy as a function of the prediction horizon parameter α has been noted in [WJ94]. Therein, the system poles were estimated by the shift invariance method, i.e., by (5.31). It was noted that the accuracy does not necessarily increase monotonically with larger α; it can, in fact, even decrease. With the example below, the importance of proper weighting when recovering the poles from the subspace estimate is clearly visible. The theoretical expression for the estimation error covariance for the weighted subspace fitting estimator (5.53) will be compared with the corresponding result for the shift invariance method (5.46).

Consider the same example as in [WJ94, JW95]. The true system is described by the first-order state-space model

    x(t + 1) = 0.7 x(t) + u(t) + w(t),    (5.59a)
    y(t) = x(t) + v(t).    (5.59b)

The measurable input, u(t), is a realization of a white Gaussian process with variance ru = 1. The process noise, w(t), and the measurement noise, v(t), are mutually independent white Gaussian processes, both uncorrelated with u(t). The variance of v(t) is fixed at rv = 1, while the variance of the process noise takes on two different values, namely rw = 0.2 and rw = 2. The first choice makes the system approximately of output error type, while the second makes it approximately ARX.

In order to use (5.34), a parameterization of Γα has to be chosen. If C is fixed to 1, Γα takes the form

    Γα = [1  λ  λ²  ···  λ^{α−1}]^T,

where λ is the pole to be estimated. For details about the implementation of the subspace fitting method, we refer to [OV94].
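The following simulation sketch ties the earlier code sketches together for the model (5.59); the sample size, seed, and routine names are assumptions, not taken from the thesis:

    # Simulate (5.59) and estimate the pole via the shift invariance route.
    rng = np.random.default_rng(0)
    N, alpha, beta, n, l = 1000, 4, 2, 1, 1
    rw = 0.2
    u = rng.standard_normal((1, N))                  # r_u = 1
    w = np.sqrt(rw) * rng.standard_normal(N)
    v = rng.standard_normal(N)                       # r_v = 1
    x = np.zeros(N + 1)
    for t in range(N):
        x[t + 1] = 0.7 * x[t] + u[0, t] + w[t]
    y = (x[:N] + v)[None, :]                         # y(t) = x(t) + v(t)

    Ya, Yb, Ua, Ub = hankel_blocks(y, u, alpha, beta)
    Pb = np.vstack([Yb, Ub])
    Gamma_hat = wls_subspace(Ya, Ua, Pb, np.eye(alpha * l), n)  # W = I (PO-MOESP)
    print(shift_invariance_poles(Gamma_hat, l))      # should be close to 0.7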

The theoretical expressions (5.46) and (5.53) will be compared for different variances of the process noise and different combinations of α and β. Fig. 5.1 shows the normalized (by √N) asymptotic standard deviation of the pole estimate for the shift invariance method. The two subplots correspond to different variances of the process noise. Next, the normalized asymptotic standard deviation of the pole estimate from (5.53) is plotted in Fig. 5.2. It is seen that the degradation in accuracy when increasing α for high process noise variance in Fig. 5.1 is overcome in the optimal weighting case, Fig. 5.2.


Figure 5.1: The basic shift invariance approach. Normalized asymptotic standard deviation of the estimated pole as a function of α and β. In the first plot, the variance of the process noise is rw = 0.2. In the second, it is rw = 2. For comparison, the Cramér–Rao lower bounds are 0.506 and 0.830, respectively.


Figure 5.2: The optimally weighted subspace fitting approach. Normalized asymptotic standard deviation of the estimated pole as a function of α and β. In the first plot, the variance of the process noise is rw = 0.2. In the second, it is rw = 2. The Cramér–Rao lower bounds are 0.506 and 0.830, respectively.
5


The shift invariance method does not have any inherent weighting compensating for bad scaling, and thus it is impossible to give any rule for the choice of the prediction horizon α. In the weighted subspace fitting method, the weighting ensures a more robust and consistent behavior. The performance clearly improves with α, but the computational burden increases, and a trade-off is necessary. For the choice of the observer dimension, β, it is observed that the performance is quite independent of β. Thus, a choice close to the system order is recommended, since the computational cost increases with β. The choice of variance for the input is of no major importance for the point to be made; this value determines (approximately) the level of the curves, not their shape. The shapes of the curves are determined by the relation between process and measurement noise, in combination with how well damped the system is.

For this example, it is possible to calculate a lower bound on the variance of the estimation error for any unbiased estimator of the pole, the Cramér–Rao lower bound (CRB). This may be done by converting the system (5.59) to a first-order ARMAX system with second-order properties identical to those of the model (5.59). The CRB for the parameters of this ARMAX model is then readily calculated (see, e.g., [SS89]). For this example, the bounds on the pole estimation error standard deviation were calculated to be 0.506 and 0.830 for the low and high process noise, respectively. An interesting observation is that the CRB seems to be attained by the method which uses the complete structure of Γα, for large α and β.

Example 5.2. For white input signals, the pole estimators based on ˆΓα from either N4SID or PO-MOESP give the same asymptotic accuracy. This equivalence does not hold for colored input signals. In this example, we study the difference between the subspace estimates (ˆΓα) from N4SID and PO-MOESP by comparing the asymptotic pole estimation error variances. The expressions for the variance in the N4SID case are given in [VOWL93] for the shift invariance method and in [OV94] for the weighted subspace fitting method. (Observe that the N4SID subspace estimate is equivalent to the instrumental variable subspace estimate in [VOWL93, OV94]; see [VD94a].)

Let the system be as in Example 5.1. We consider the low process noise scenario, i.e., rv = 1 and rw = 0.2. The difference in this example is that the input is a colored signal. More precisely, the input signal u(t)


is generated as

    u(t) = 0.3 (q² + 2q + 1)/(q² − 0.17) η(t) + ν(t),

where q is the forward shift operator. The zero-mean white noise processes η(t) and ν(t) are independent and have variances rη = 10 and rν = 10^{-4}, respectively.
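A sketch of generating this colored input with scipy.signal.lfilter, rewriting the filter in negative powers of q as 0.3(1 + 2q⁻¹ + q⁻²)/(1 − 0.17q⁻²):

    from scipy.signal import lfilter

    rng = np.random.default_rng(1)
    N = 1000
    eta = np.sqrt(10) * rng.standard_normal(N)       # r_eta = 10
    nu = np.sqrt(1e-4) * rng.standard_normal(N)      # r_nu = 1e-4
    u = lfilter(0.3 * np.array([1.0, 2.0, 1.0]),     # numerator 0.3(q^2 + 2q + 1)
                np.array([1.0, 0.0, -0.17]),         # denominator q^2 - 0.17
                eta) + nu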

Consider the pole estimates from the shift invariance and the optimally weighted subspace fitting methods based on the PO-MOESP estimate of Γα. In Fig. 5.3, the normalized asymptotic standard deviations of the pole estimates are shown as functions of α and β. We see that the variances are still approximately constant with respect to β. Also notice that the variance of the weighted subspace fitting estimates does not decay monotonically with increasing α.

The corresponding results for the pole estimates based on the N4SID estimate of Γα are given in Fig. 5.4. The performance is dramatically affected by the choice of α and β. To avoid misunderstanding, it should be noted that the standard deviations in Fig. 5.4 are not what is obtained by the N4SID algorithm. In [VD94b], the system matrices, and thus the poles, are estimated in a completely different way. However, the results presented here indicate that the N4SID estimate of Γα may not be the best choice, as was also alluded to when the subspace estimation was discussed in Section 5.4.3. Monte Carlo simulations indicate that the N4SID algorithm [VD94b] has performance similar to the shift invariance method based on the PO-MOESP subspace estimate.

We may also compare the standard deviations in Fig. 5.3 and Fig. 5.4 with the CRB. The CRB for this example was calculated to be 0.176. Clearly, all the surfaces are above this limit.

In closing this example, we remind the reader of Corollary 5.2 and Corollary 5.4, which imply that the asymptotic accuracy of the pole estimates based on either the PO-MOESP or the CVA subspace estimate is identical.

5.8 Conclusions

We have used a linear regression formulation to link state-space subspace system identification (4SID) to traditional identification methods. This has been used to compare some cost functions which arise implicitly in the ordinary framework of 4SID.


Figure 5.3: The normalized asymptotic standard deviation of the estimated pole as a function of α and β. The first plot is for the shift invariance method and the second for the optimally weighted subspace fitting method. The two pole estimation algorithms are based on the PO-MOESP estimate of Γα.


Figure 5.4: The normalized asymptotic standard deviation of the estimated pole as a function of α and β. The first plot is for the shift invariance method and the second for the optimally weighted subspace fitting method. The two pole estimation algorithms are based on the N4SID estimate of Γα.
8



From these cost functions, estimates of the extended observability matrix have been derived and related to existing 4SID methods. Based on these estimates, the estimation of certain parameters was analyzed; in particular, the problem of estimating the poles was considered. The asymptotic properties of the estimates from two different methods were analyzed. The first method is common in 4SID and makes use of the shift invariance structure of the observability matrix, while the second is a recently proposed weighted subspace fitting method. From these results, the choice of user-specified parameters was discussed, and it was shown that this choice may indeed not be obvious for the shift invariance method. This problem can be mitigated by using more of the structure of the system. It was also shown that a proposed row-weighting matrix in the subspace estimation step does not affect the asymptotic properties of the estimates.


Chapter 6

On Consistency of Subspace Methods for System Identification

Subspace methods for the identification of linear time-invariant dynamical systems typically consist of two main steps. First, a so-called subspace estimate is constructed; this step usually consists of estimating the range space of the extended observability matrix. Second, an estimate of the system parameters is obtained, based on the subspace estimate. In this chapter, the consistency of a large class of methods for estimating the extended observability matrix is analyzed. Persistence of excitation conditions on the input signal are given which guarantee consistency for systems with only measurement noise. For systems including process noise, it is shown that a persistence of excitation condition on the input is not sufficient. More precisely, an example is given for which the subspace methods fail to give a consistent estimate of the transfer function, even though the input is persistently exciting to any order. It is also shown that this problem is eliminated if stronger conditions are imposed on the input signal.


6.1 Introduction

The large interest in subspace methods for system identification is motivated by the need for useful engineering tools to model linear multivariable dynamical systems [VD94b, Lar90, Ver94, Vib95]. These methods, sometimes referred to as 4SID (subspace state-space system identification) methods, are easy to apply to the identification of multivariable systems, and reliable numerical algorithms form the basis of these methods. The statistical properties, however, are not yet fully understood. Initial work in this direction is exemplified by [VOWL91, PSD96, JW96a, VWO97, BDS97].

Conditions for consistency of the methods in the literature have often involved assumptions on the rank of certain matrices used in the schemes. It would be more useful to have explicit conditions on the input signals and on the system. The consistency of combined deterministic and stochastic subspace identification methods has been analyzed in [PSD96]. In that paper, consistency is proven under two assumptions. The first is that the input is an ARMA process. The second is that a dimension parameter tends to infinity at a certain rate as the number of samples approaches infinity; this assumption is needed to consistently estimate the stochastic sub-system and the system order. Subspace methods are, however, typically applied with fixed dimension parameters. A consistency analysis under these conditions is given in [JW96b, JW97a], and that analysis is further extended in this chapter. The analysis focuses on the crucial first step of the subspace identification methods; that is, on estimating the extended observability matrix, the "subspace estimation" step. The methods analyzed herein are called Basic 4SID and IV-4SID.

The Basic 4SID method [DVVM88, Liu92] has been analyzed in [Gop69, VOWL91, Liu92, VD92]. Herein, we extend the analysis and give persistence of excitation conditions on the input signal. The more recent IV-4SID class of methods includes N4SID [VD94b], MOESP [Ver94] and CVA [Lar90]. An open question regarding the IV-4SID methods has been whether a persistence of excitation condition on the input alone is sufficient for consistency. The current contribution employs a counterexample to show that this is not the case. For systems with only measurement noise, it is possible to show that a persistently exciting input of a given order ensures consistency of the IV-4SID subspace estimate. For systems including process noise, IV-4SID methods may suffer from consistency problems; this is explicitly shown in the aforementioned counterexample. A simple proof shows that a white noise (or a low order ARMA) input signal guarantees consistency in this case.

The rest of the chapter is organized as follows. In Section 6.2, the system model is defined, general assumptions are stated, and some preliminary results are collected. In Section 6.3, the ideas of the Basic 4SID and IV-4SID methods are presented. In Section 6.4, the consistency of the Basic 4SID method is analyzed. In Section 6.5, the critical relation for consistency of IV-4SID methods is presented, together with conditions for this relation to hold. In Section 6.6, the counterexample is given, describing a system for which the critical relation does not hold. In Section 6.7, some simulations are presented to illustrate the theoretical results. Finally, conclusions are provided in Section 6.8.

6.2 Preliminaries

Assume that the true system is linear, time-invariant and described by the following state-space equations:

$$x(t+1) = A x(t) + B u(t) + w(t), \qquad (6.1a)$$
$$y(t) = C x(t) + D u(t) + e(t). \qquad (6.1b)$$

Here, x(t) ∈ Rⁿ is the state vector, u(t) ∈ Rᵐ consists of the observed input signals, and y(t) ∈ Rˡ contains the observed output signals. The system is perturbed by the process noise w(t) ∈ Rⁿ and the measurement noise e(t) ∈ Rˡ.

Let us introduce the following general assumptions:

A1: The noise process [wᵀ(t) eᵀ(t)]ᵀ is a stationary, ergodic, zero-mean, white noise process with covariance matrix

$$E\left\{\begin{bmatrix} w(t) \\ e(t) \end{bmatrix}\begin{bmatrix} w(t) \\ e(t) \end{bmatrix}^T\right\} = \begin{bmatrix} r_w & r_{we} \\ r_{we}^T & r_e \end{bmatrix}.$$

It will be convenient to introduce an n × p matrix K such that r_w = K Kᵀ, where p is the rank of r_w. This implies that w(t) = K v(t), where E{v(t)vᵀ(t)} = I_p and I_p denotes a p × p identity matrix.

A2: The input u(t) is a quasi-stationary signal [Lju87]. The correlation sequence is defined by

$$r_u(\tau) = \bar{E}\{u(t+\tau) u^T(t)\},$$

where Ē is defined as

$$\bar{E}\{\cdot\} = \lim_{N\to\infty} \frac{1}{N} \sum_{t=1}^{N} E\{\cdot\}.$$

The input u(t) and the noise process are assumed to be uncorrelated.

A3: The system (6.1) is asymptotically stable; that is, the eigenvalues of A lie strictly inside the unit circle.

A4: The description (6.1) is minimal in the sense that the pair (A, C) is observable and the pair (A, [B K]) is reachable.
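As a concrete illustration (not part of the original development), the model class (6.1) under Assumption A1 can be simulated along the following lines. This is a minimal Python sketch; it assumes, for simplicity, that the process noise w(t) = K v(t) and the measurement noise e(t) are mutually uncorrelated (r_we = 0) and that e(t) has covariance r_e I, and all matrices are placeholders to be supplied by the user.

```python
import numpy as np

def simulate(A, B, C, D, K, re, u, rng):
    """Simulate (6.1): x(t+1) = A x + B u + K v, y = C x + D u + e,
    with cov{v} = I (so r_w = K K^T) and cov{e} = re * I.
    u is an m x N input array; returns the l x N output array."""
    n, N = A.shape[0], u.shape[1]
    l = C.shape[0]
    x = np.zeros(n)
    y = np.empty((l, N))
    for t in range(N):
        y[:, t] = C @ x + D @ u[:, t] + np.sqrt(re) * rng.standard_normal(l)
        x = A @ x + B @ u[:, t] + K @ rng.standard_normal(K.shape[1])
    return y
```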

The concept of persistently exciting signals is defined in the following way:

Definition 6.1. A (quasi-stationary) signal u(t) is said to be persistently exciting of order n, denoted by PE(n), if the following matrix is positive definite:

$$\begin{bmatrix}
r_u(0) & r_u(1) & \cdots & r_u(n-1) \\
r_u(-1) & r_u(0) & \cdots & r_u(n-2) \\
\vdots & & \ddots & \vdots \\
r_u(1-n) & r_u(2-n) & \cdots & r_u(0)
\end{bmatrix}.$$
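For a finite data record, this definition can be tested approximately by replacing r_u(τ) with sample correlations. The following is a minimal sketch (the eigenvalue tolerance is an arbitrary choice, not from the original text):

```python
import numpy as np

def is_pe(u, n, tol=1e-8):
    """Approximate PE(n) test: build the block Toeplitz matrix of sample
    correlations r_u(tau), tau = -(n-1), ..., n-1, and check positive
    definiteness via the smallest eigenvalue. u is m x N."""
    m, N = u.shape
    # sample correlations r_u(tau) = (1/N) sum_t u(t+tau) u(t)^T
    r = {tau: (u[:, tau:] @ u[:, :N - tau].T) / N for tau in range(n)}
    r.update({-tau: r[tau].T for tau in range(1, n)})
    T = np.block([[r[j - i] for j in range(n)] for i in range(n)])
    return np.linalg.eigvalsh(T).min() > tol
```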

The following two lemmas will be useful in the consistency analysis.

Lemma 6.1. Consider the (block) matrix

$$S = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad (6.2)$$

where A is n × n and D is a non-singular m × m matrix. Then

$$\operatorname{rank}\{S\} = m + \operatorname{rank}\{A - B D^{-1} C\}.$$

Proof. The result follows directly from the identity

$$\begin{bmatrix} I & -BD^{-1} \\ 0 & I \end{bmatrix}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} I & 0 \\ -D^{-1}C & I \end{bmatrix}
= \begin{bmatrix} \Delta & 0 \\ 0 & D \end{bmatrix}, \qquad (6.3)$$

where Δ = A − B D⁻¹ C.

Using (6.3), the next lemma is readily proven.

Lemma 6.2. Let S be as in (6.2) but symmetric (C = Bᵀ), and let D be positive definite. Conclude then, that S is positive (semi) definite if and only if (A − B D⁻¹ Bᵀ) is positive (semi) definite.
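Lemma 6.1 is easy to verify numerically. A toy check with random matrices (a sketch added here for illustration; it deliberately makes the Schur complement rank deficient to exercise the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m))            # generically non-singular

# Construct A so that the Schur complement A - B D^{-1} C has rank 1.
A = B @ np.linalg.solve(D, C) + np.outer(rng.standard_normal(n),
                                         rng.standard_normal(n))
S = np.block([[A, B], [C, D]])
Delta = A - B @ np.linalg.solve(D, C)      # Schur complement of D in S
assert np.linalg.matrix_rank(Delta) == 1
assert np.linalg.matrix_rank(S) == m + 1   # rank{S} = m + rank{Delta}
```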


6.3 Background

In this section, the basic equations used in subspace state-space system identification (4SID) methods are reviewed. Define the lα × 1 vector of stacked outputs as

$$y_\alpha(t) = \left[ y^T(t),\, y^T(t+1),\, \ldots,\, y^T(t+\alpha-1) \right]^T,$$

where α is a user-specified parameter which is required to be greater than the observability index (or, for simplicity, the system order n). In a similar manner, define vectors of stacked inputs, process noises and measurement noises as u_α(t), w_α(t) and e_α(t), respectively. By iteration of the state equations in (6.1), it is straightforward to verify the following equation for the stacked quantities:

$$y_\alpha(t) = \Gamma_\alpha x(t) + \Phi_\alpha u_\alpha(t) + n_\alpha(t), \qquad (6.4)$$

where

$$n_\alpha(t) = \Psi_\alpha w_\alpha(t) + e_\alpha(t),$$

and where the following block matrices have been introduced:

$$\Gamma_\alpha = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\alpha-1} \end{bmatrix}, \qquad
\Phi_\alpha = \begin{bmatrix}
D & 0 & \cdots & 0 \\
CB & D & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
CA^{\alpha-2}B & CA^{\alpha-3}B & \cdots & D
\end{bmatrix}, \qquad
\Psi_\alpha = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
C & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
CA^{\alpha-2} & CA^{\alpha-3} & \cdots & 0
\end{bmatrix}.$$

The key idea of subspace identification methods is to first estimate the n-dimensional range space of the (extended) observability matrix Γα. Based on the first step, the system parameters are subsequently estimated in various ways. For consistency of 4SID methods it is crucial that the estimate of Γα is consistent.
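The block matrices above are straightforward to assemble from a state-space realization. A minimal sketch (the function names are ours, not from the text):

```python
import numpy as np

def gamma_alpha(A, C, alpha):
    """Extended observability matrix: [C; CA; ...; CA^(alpha-1)]."""
    blocks, M = [], C
    for _ in range(alpha):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def phi_alpha(A, B, C, D, alpha):
    """Lower block triangular Toeplitz of Markov parameters D, CB, CAB, ...
    (Psi_alpha is obtained analogously, with B replaced by I and D by 0)."""
    l, m = D.shape
    markov = [D] + [C @ np.linalg.matrix_power(A, k) @ B
                    for k in range(alpha - 1)]
    Phi = np.zeros((alpha * l, alpha * m))
    for i in range(alpha):
        for j in range(i + 1):
            Phi[i*l:(i+1)*l, j*m:(j+1)*m] = markov[i - j]
    return Phi
```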

There have been several suggestions in the literature for estimating Γα. Most of them, however, are closely related to one another [VD95c, Vib95]. To describe the methods considered, additional notation is needed. Assume there are N + α + β − 1 data samples available and introduce the block Hankel matrices

$$Y_\alpha = [y_\alpha(1+\beta), \ldots, y_\alpha(N+\beta)], \qquad (6.5)$$
$$Y_\beta = [y_\beta(1), \ldots, y_\beta(N)]. \qquad (6.6)$$

The parameter β is to be chosen by the user. Intuitively, β can be interpreted as the dimension of the observer used to estimate the current state [WJ94] (see also Chapter 1 and Chapter 5). The partitioning of data into Y_β and Y_α is sometimes referred to as "past" and "future". Define U_α and U_β as block Hankel matrices of inputs, similar to (6.5) and (6.6), respectively. Comparing (6.5) and (6.4), it is clear that Y_α is given by the relation

$$Y_\alpha = \Gamma_\alpha X_f + \Phi_\alpha U_\alpha + N_\alpha, \qquad (6.7)$$

where

$$X_f = \begin{bmatrix} x(1+\beta) & x(2+\beta) & \ldots & x(N+\beta) \end{bmatrix},$$

and N_α is defined from n_α(t) similarly to (6.5). Two methods of estimating the range space of Γα from data are presented next. The methods are based on (6.7).
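A sketch of how the block Hankel matrices (6.5)-(6.6) can be formed from a data record (0-based time indexing in code; the helper name is ours):

```python
import numpy as np

def block_hankel(y, start, rows, N):
    """Columns are the stacked vectors y_rows(start + j), j = 0, ..., N-1,
    where y is an l x (total samples) array."""
    l = y.shape[0]
    H = np.empty((rows * l, N))
    for i in range(rows):
        H[i*l:(i+1)*l, :] = y[:, start + i : start + i + N]
    return H

# With y holding N + alpha + beta - 1 samples:
#   Y_beta  = block_hankel(y, 0,    beta,  N)   # "past"
#   Y_alpha = block_hankel(y, beta, alpha, N)   # "future"
```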

6.3.1 Basic 4SID

The method described in this section is sometimes referred to as the Basic 4SID method [DVVM88, Liu92]. When considering this method, it is not necessary to partition the data into past and future; hence it can be assumed that β = 0. The idea is to study the residuals in the regression of y_α(t) on u_α(t) in (6.4). Introduce the orthogonal projection matrix onto the null space of U_α as

$$\Pi^{\perp}_{U_\alpha^T} = I - U_\alpha^T (U_\alpha U_\alpha^T)^{-1} U_\alpha.$$

If the input u(t) is PE(α), then the inverse of U_α U_αᵀ exists for sufficiently large N. The residual vectors in the regression of y_α(t) on u_α(t) are now given by the columns of the following expression:

$$\frac{1}{\sqrt{N}} Y_\alpha \Pi^{\perp}_{U_\alpha^T}
= \frac{1}{\sqrt{N}} \Gamma_\alpha X_f \Pi^{\perp}_{U_\alpha^T}
+ \frac{1}{\sqrt{N}} N_\alpha \Pi^{\perp}_{U_\alpha^T}, \qquad (6.8)$$

where the normalizing factor 1/√N has been introduced for later convenience. The approach taken in Basic 4SID to estimate the range space of Γα is the following: retain the n principal left singular vectors in the singular value decomposition (SVD) of (6.8). The Basic 4SID estimate of Γα is then given by

$$\hat{\Gamma}_\alpha = \hat{Q}_s, \qquad (6.9)$$

where Q̂_s is obtained from the SVD

$$\begin{bmatrix} \hat{Q}_s & \hat{Q}_n \end{bmatrix}
\begin{bmatrix} \hat{S}_s & 0 \\ 0 & \hat{S}_n \end{bmatrix}
\begin{bmatrix} \hat{V}_s^T \\ \hat{V}_n^T \end{bmatrix}
= \frac{1}{\sqrt{N}}\, Y_\alpha \Pi^{\perp}_{U_\alpha^T}.$$

Here, Ŝ_s is a diagonal matrix containing the n largest singular values and Q̂_s contains the corresponding left singular vectors.
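In code, the Basic 4SID estimate (6.9) amounts to a projection followed by an SVD. A minimal numpy sketch (the linear solve assumes U_α U_αᵀ is invertible, i.e., that u is PE(α); the projector is applied without forming the N × N matrix):

```python
import numpy as np

def basic_4sid(Y, U, n):
    """Basic 4SID estimate of Gamma_alpha: the n principal left singular
    vectors of (1/sqrt(N)) Y Pi, where Pi projects onto null(U)."""
    N = Y.shape[1]
    # Y Pi = Y - Y U^T (U U^T)^{-1} U
    YPi = Y - (Y @ U.T) @ np.linalg.solve(U @ U.T, U)
    Q, s, _ = np.linalg.svd(YPi / np.sqrt(N), full_matrices=False)
    return Q[:, :n]
```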

6.3.2 IV-4SID

The second type of estimator that will be analyzed is called IV-4SID [VD94b, Lar90, Ver94]. The reason for this name is that this approach is most easily explained in terms of instrumental variables (IVs) (see e.g. [Vib95, Gus97]). The advantage of IV-4SID compared to Basic 4SID is that IV-4SID can handle more general noise correlations. The main idea here is to use an instrumental variable which not only removes the effect of u_α(t) in (6.4) (as in Basic 4SID), but also reduces the effect of n_α(t). The first step is to remove the future inputs; this is the same as in Basic 4SID. The second step is to remove the noise term in (6.8). This is accomplished by multiplying from the right by the "instrumental variable matrix"

$$P_\beta = \begin{bmatrix} Y_\beta \\ U_\beta \end{bmatrix}. \qquad (6.10)$$

Post-multiplying both sides of (6.8) by the matrix in (6.10), and normalizing by 1/√N, yields

$$\frac{1}{N} Y_\alpha \Pi^{\perp}_{U_\alpha^T} P_\beta^T
= \frac{1}{N} \Gamma_\alpha X_f \Pi^{\perp}_{U_\alpha^T} P_\beta^T
+ \frac{1}{N} N_\alpha \Pi^{\perp}_{U_\alpha^T} P_\beta^T,$$


where, due to the assumptions, the second term on the right tends to zero with probability one (w.p.1) as N tends to infinity (since the input and the noise process are uncorrelated, and since there is a "time shift" between N_α and the noise in Y_β). A quite general estimate of Γα is then given by

$$\hat{\Gamma}_\alpha = \hat{W}_r^{-1} \hat{Q}_s, \qquad (6.11)$$

where Q̂_s is the αl × n matrix made of the n left singular vectors of

$$\frac{1}{N} \hat{W}_r Y_\alpha \Pi^{\perp}_{U_\alpha^T} P_\beta^T \hat{W}_c \qquad (6.12)$$

corresponding to the n largest singular values. The (data dependent) weighting matrices Ŵ_r and Ŵ_c can be chosen in different ways to yield a class of estimators. This class includes most of the known methods appearing in the literature; see Table 6.1, cf. [VD95c, Vib95]. The methods PO-MOESP, N4SID, IVM and CVA appear in [Ver94, VD94b, Vib95, Lar90], respectively. For some results regarding the effect of the choice of the weighting matrices, see [JW95, JW96a, VD95b].

Method      Ŵ_r                                     Ŵ_c
PO-MOESP    I                                       ((1/N) Pβ Π⊥_{UαT} Pβᵀ)^(−1/2)
N4SID       I                                       ((1/N) Pβ Π⊥_{UαT} Pβᵀ)^(−1) ((1/N) Pβ Pβᵀ)^(1/2)
IVM         ((1/N) Yα Π⊥_{UαT} Yαᵀ)^(−1/2)          ((1/N) Pβ Pβᵀ)^(−1/2)
CVA         ((1/N) Yα Π⊥_{UαT} Yαᵀ)^(−1/2)          ((1/N) Pβ Π⊥_{UαT} Pβᵀ)^(−1/2)

Table 6.1: Weighting matrices corresponding to specific methods. Notice that some of the weightings may not be as they appear in the referred papers. These weightings, however, give estimates of Γα identical to those obtained using the original choice of weighting.
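As an illustration, the estimate (6.11) with the PO-MOESP row of Table 6.1 (Ŵ_r = I) can be sketched as follows. The inverse square root is formed via an eigendecomposition, which presumes that (1/N) Pβ Π⊥ Pβᵀ is positive definite (cf. the conditions derived in Section 6.5); all names here are ours.

```python
import numpy as np

def inv_sqrt(M):
    """Symmetric inverse square root M^(-1/2), M assumed positive definite."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def iv_4sid_pomoesp(Ya, Ua, Pb, n):
    """IV-4SID subspace estimate with W_r = I and
    W_c = ((1/N) P_beta Pi P_beta^T)^(-1/2)."""
    N = Ya.shape[1]
    proj = lambda M: M - (M @ Ua.T) @ np.linalg.solve(Ua @ Ua.T, Ua)
    G = (proj(Ya) @ Pb.T) / N               # (1/N) Y_alpha Pi P_beta^T
    Wc = inv_sqrt((proj(Pb) @ Pb.T) / N)    # column weighting
    Q, s, _ = np.linalg.svd(G @ Wc, full_matrices=False)
    return Q[:, :n]                         # W_r = I, so Gamma_hat = Q_s
```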

6.4 Analysis of Basic 4SID

In this section, conditions are established under which the Basic 4SID estimate Γ̂α in (6.9) becomes consistent. Here, consistency means that the column span of Γ̂α tends to the column span of Γα as N tends to infinity (w.p.1). This method has been analyzed previously (see [Gop69, VOWL91, Liu92, VD92]). The problem is reconsidered here, however, for two reasons. First, the proof to be presented appears to be more explicit than previous ones (known to the authors). Second, these results are useful in the analysis of the more interesting IV-based methods given in the next section.

Recall Equation (6.4). No correlation exists between the noise n_α(t) and u_α(t) or x(t). It is therefore relatively straightforward to show that the sample covariance matrix of the residuals in (6.8),

$$\hat{R}_\varepsilon = \frac{1}{N} Y_\alpha \Pi^{\perp}_{U_\alpha^T} Y_\alpha^T,$$

tends to

$$R_\varepsilon = \Gamma_\alpha (r_x - r_{xu} R_u^{-1} r_{xu}^T) \Gamma_\alpha^T + R_n \qquad (6.13)$$

with probability one as N → ∞. Here, the various correlation matrices are defined as

$$r_x = \bar{E}\{x(t)x^T(t)\}, \quad
r_{xu} = \bar{E}\{x(t)u_\alpha^T(t)\}, \quad
R_u = \bar{E}\{u_\alpha(t)u_\alpha^T(t)\}, \quad
R_n = \bar{E}\{n_\alpha(t)n_\alpha^T(t)\}.$$

Recall that u(t) is assumed to be PE(α). This implies that R_u is positive definite. From (6.13) it can be concluded that the estimate of Γα in (6.9) is consistent if and only if¹

$$R_n = \sigma^2 I, \qquad (6.14)$$
$$r_x - r_{xu} R_u^{-1} r_{xu}^T > 0 \qquad (6.15)$$

for some scalar σ². If (6.14) and (6.15) hold, then the limit of Γ̂α (as defined in (6.9)) is given by ΓαT for some non-singular n × n matrix T. The condition (6.14) is essentially only true for output error systems perturbed by a measurement noise of equal power in each output channel.

¹There may exist particular cases when Basic 4SID is consistent even when R_n ≠ σ²I. However, these cases depend on the system parameters and are not of interest here.


That is, if K = 0 and r_e = σ²I. In the following, it is shown under what conditions on the input signals the condition (6.15) is fulfilled.

In view of Lemma 6.2, the condition (6.15) is equivalent to

$$\bar{E}\left\{ \begin{bmatrix} x(t) \\ u_\alpha(t) \end{bmatrix}
\begin{bmatrix} x(t) \\ u_\alpha(t) \end{bmatrix}^T \right\}
= \begin{bmatrix} r_x & r_{xu} \\ r_{xu}^T & R_u \end{bmatrix} > 0. \qquad (6.16)$$

Next, the state is decomposed into two terms, x(t) = x^d(t) + x^s(t), where the "deterministic" term x^d(t) is due to the observed inputs and the "stochastic" term x^s(t) is driven by the process noise w(t). Due to Assumption A2, no correlation exists between x^s(t) and x^d(t) or u_α(t). Furthermore, introduce a transfer function (operator) description of x^d(t) as

$$x^d(t) = \frac{F^u(q^{-1})}{a(q^{-1})}\, u(t),$$

where q⁻¹ is the backward shift operator; that is, q⁻¹u(t) = u(t−1). The polynomials F^u(q⁻¹) and a(q⁻¹) are defined by the relations

$$F^u(q^{-1}) = \frac{\mathrm{Adj}\{qI - A\} B}{q^n}
= F_1^u q^{-1} + F_2^u q^{-2} + \cdots + F_n^u q^{-n}, \qquad (6.17)$$
$$a(q^{-1}) = \frac{\det\{qI - A\}}{q^n}
= a_0 + a_1 q^{-1} + \cdots + a_n q^{-n}, \quad (a_0 = 1), \qquad (6.18)$$

where Adj{A} denotes the adjugate matrix of A and det{·} is the determinant function. The polynomial coefficients a_i (i = 0, ..., n) are scalars, while all F_i^u (i = 1, ..., n) are n × m matrices. Analogously to F^u(q⁻¹), define the matrix polynomial F^w(q⁻¹), with K replacing B:

$$F^w(q^{-1}) = F_1^w q^{-1} + F_2^w q^{-2} + \cdots + F_n^w q^{-n},$$

where F_i^w (i = 1, ..., n) are n × p matrices and K is defined in Assumption A1. Thus, x^s(t) is given by the relation

$$x^s(t) = \frac{F^w(q^{-1})}{a(q^{-1})}\, v(t).$$


Also define the Sylvester-like matrix

$$S = \begin{bmatrix}
F_n^u & \cdots & F_1^u & 0 & \cdots & 0 & F_n^w & \cdots & F_1^w \\
a_n I_m & \cdots & a_1 I_m & a_0 I_m & & 0 & 0 & \cdots & 0 \\
 & \ddots & & \ddots & \ddots & & \vdots & & \vdots \\
0 & & a_n I_m & \cdots & & a_0 I_m & 0 & \cdots & 0
\end{bmatrix}, \qquad (6.19)$$

which is of dimension (n + αm) × ((α + n)m + np). Using the notation introduced above, it can be verified that

$$\begin{bmatrix} x(t) \\ u_\alpha(t) \end{bmatrix}
= S\, \frac{1}{a(q^{-1})}
\begin{bmatrix} u(t-n) \\ \vdots \\ u(t+\alpha-1) \\ v(t-n) \\ \vdots \\ v(t-1) \end{bmatrix}.$$

It is shown in Appendix A.14 that S has full row rank under Assumption A4. This implies that the condition (6.16) is satisfied if the following matrix is positive definite:

$$\bar{E}\left\{
\frac{1}{a(q^{-1})}
\begin{bmatrix} u(t-n) \\ \vdots \\ u(t+\alpha-1) \\ v(t-n) \\ \vdots \\ v(t-1) \end{bmatrix}
\;\frac{1}{a(q^{-1})}
\begin{bmatrix} u(t-n) \\ \vdots \\ u(t+\alpha-1) \\ v(t-n) \\ \vdots \\ v(t-1) \end{bmatrix}^T
\right\}. \qquad (6.20)$$

Since u(t) and v(t) are uncorrelated, the covariance matrix in (6.20) will be block diagonal. The lower diagonal block is positive definite since E{v(t)vᵀ(τ)} = I δ_{t,τ}. The upper diagonal block can be shown to be positive definite if u(t) is PE(α + n); see Lemma A.3.7 in [SS83], modified to quasi-stationary signals. In conclusion, the following theorem has been proven.

Theorem 6.1. Under the general assumptions A1-A4, the Basic 4SID subspace estimate (6.9) is consistent if the following conditions are true:

1. The covariance matrix of the stacked noise is proportional to the identity matrix; that is, R_n = σ²I.

2. The input signal u(t) is persistently exciting of order α + n.

Remark 6.1. As previously mentioned, condition (1) of the theorem is essentially only true if K = 0 and if r_e = σ²I. Furthermore, K = 0 implies that the system must be reachable from u(t).

6.5 Analysis of IV-4SID

In this section, conditions for the IV-4SID estimate Γ̂α in (6.11) to be consistent are established. As N → ∞, the limit of the quantity in (6.12) is w.p.1 given by

$$W_r \Gamma_\alpha (r_{xp} - r_{xu} R_u^{-1} R_{up}) W_c, \qquad (6.21)$$

where W_r and W_c are the limits of Ŵ_r and Ŵ_c, respectively. The correlation matrices appearing in (6.21) are defined as follows:

$$r_{xp} = \bar{E}\{x(t+\beta) p_\beta^T(t)\}, \quad
r_{xu} = \bar{E}\{x(t) u_\alpha^T(t)\}, \quad
R_u = \bar{E}\{u_\alpha(t) u_\alpha^T(t)\}, \quad
R_{up} = \bar{E}\{u_\alpha(t+\beta) p_\beta^T(t)\},$$

where p_β(t), t = 1, ..., N, are defined as the columns of P_β in (6.10). If the weighting matrices W_r and W_c are positive definite, the estimate (6.11) is consistent if and only if the following matrix has full rank equal to n:

$$r_{xp} - r_{xu} R_u^{-1} R_{up}. \qquad (6.22)$$

This is because the limiting principal left singular vectors of (6.12) will be equal to W_rΓαT for some non-singular n × n matrix T; hence, the limit of Γ̂α in (6.11) is ΓαT. In view of Lemma 6.1, the condition that (6.22) should have full rank can be reformulated as

$$\operatorname{rank}\, \bar{E}\left\{
\begin{bmatrix} x(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} = n + \alpha m. \qquad (6.23)$$

This is the critical relation for consistency of IV-4SID methods.


Let us begin by deriving conditions which ensure that the weighting matrices in Table 6.1 exist (i.e., are positive definite). Consider for example the limit of the column weighting W_c for the PO-MOESP method. We need to establish that Pβ Π⊥_{UαT} Pβᵀ / N is non-singular for large N. Using Lemma 6.2, the condition W_c⁻² > 0 can be written as

$$\bar{E}\left\{
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} > 0. \qquad (6.24)$$

This is recognized as a similar requirement appearing in least-squares linear regression estimation [SS89, Complement C5.1]. The condition (6.24) holds if u(t) is PE(α + β) and if E{n_β(t)n_βᵀ(t)} > 0. Similarly, conditions for all cases in Table 6.1 can be obtained. The result is summarized in Table 6.2.

Method      Order of PE       Condition on noise
PO-MOESP    α + β             E{n_β(t) n_βᵀ(t)} > 0
N4SID       α + β             E{n_β(t) n_βᵀ(t)} > 0
IVM         ξ ≜ max(α, β)     E{n_ξ(t) n_ξᵀ(t)} > 0
CVA         α + β             E{n_ξ(t) n_ξᵀ(t)} > 0

Table 6.2: Sufficient conditions for the weighting matrices W_r and W_c to be positive definite.

Having established that the weighting matrices are positive definite, condition (6.23) may now be examined. The correlation matrix in (6.23) can be written as the sum of two terms:

$$\bar{E}\left\{
\begin{bmatrix} x^d(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
+ \bar{E}\left\{
\begin{bmatrix} x^s(t+\beta) \\ 0 \end{bmatrix}
\begin{bmatrix} y_\beta^s(t) \\ 0 \\ 0 \end{bmatrix}^T
\right\}, \qquad (6.25)$$

where the superscripts (·)^d and (·)^s denote the deterministic part and the stochastic part, respectively. Below we show that the first term in (6.25) has full row rank under a PE condition on u(t). This implies that (6.23) is fulfilled for systems with no process noise (since x^s(t) ≡ 0 in that case). For systems with process noise, one may in exceptional cases lose rank when adding the two terms in (6.25). One example of such a situation is given in the next section. Generally, loss of rank is unlikely to occur (at least if βl ≫ n); if βl ≫ n, the matrix in (6.25) has many more columns than rows, and hence it is more likely to have full rank. This last point can be made more precise (see Theorem 6.5 and [PSD96]).

In the following, it is shown that the first term in (6.25) has full row rank under certain conditions. It can be verified that the correlation matrix is equivalent to

$$\begin{bmatrix} A^\beta & C_\beta(A,B) & 0 \\ 0 & 0 & I_{\alpha m} \end{bmatrix}
\bar{E}\left\{
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
\begin{bmatrix} \Gamma_\beta^T & 0 & 0 \\ \Phi_\beta^T & I_{\beta m} & 0 \\ 0 & 0 & I_{\alpha m} \end{bmatrix}, \qquad (6.26)$$

where C_β(A,B) denotes the reversed reachability matrix; that is,

$$C_\beta(A,B) = \begin{bmatrix} A^{\beta-1}B & A^{\beta-2}B & \ldots & B \end{bmatrix}. \qquad (6.27)$$

Clearly, the first factor has full row rank (n + αm) if the system (6.1) is reachable from u(t) and if β is greater than or equal to the controllability index. Notice that the last factor has full row rank provided β is greater than or equal to the observability index. A sufficient condition is then obtained if the covariance matrix in the middle of (6.26) is positive definite. We have

$$\bar{E}\left\{
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} x^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
= S\, \bar{E}\left\{
\left(\frac{1}{a(q^{-1})} u_{\alpha+\beta+n}(t-n)\right)
\left(\frac{1}{a(q^{-1})} u_{\alpha+\beta+n}^T(t-n)\right)
\right\} S^T, \qquad (6.28)$$

where

$$S = \begin{bmatrix}
F_n^u & \cdots & F_1^u & 0 & \cdots & 0 \\
a_n I_m & \cdots & a_1 I_m & a_0 I_m & & 0 \\
 & \ddots & & \ddots & \ddots & \\
0 & & a_n I_m & \cdots & & a_0 I_m
\end{bmatrix} \qquad (6.29)$$

is of dimension (n + αm + βm) × ((α + β + n)m). If the system is reachable from u(t), then S has full row rank according to Appendix A.14. The matrix in (6.28) is then positive definite if u(t) is PE(α + β + n). The result is summarized in the following theorem.

Theorem 6.2. The IV-4SID subspace estimate Γ̂α given by (6.11) is consistent if the general assumptions A1-A4 apply in addition to the following conditions:

1. (A, B) is reachable.
2. β ≥ max(observability index, controllability index).
3. r_w = 0.
4. r_e > 0.
5. The input is persistently exciting of order α + β + n.

Remark 6.2. The condition r_w = 0 ensures that the second term in (6.25) is zero. Furthermore, the conditions for the weighting matrices in Table 6.2 are fulfilled if the last two conditions hold.

We conclude this section by giving three alternate consistency theorems for IV-4SID. The first relaxes the order of PE on u(t) for single input systems, while the second strengthens the conditions on u(t) in order to prove consistency for systems with process noise. Finally, the third theorem considers single input ARMA (autoregressive moving average) signals.

Theorem 6.3. For single input systems, the IV-4SID subspace estimate Γ̂α given by (6.11) is consistent if the general assumptions A1-A4 apply in addition to the following conditions:

1. (A, B) is reachable.
2. β ≥ the observability index.
3. r_w = 0.
4. r_e > 0.
5. The input is persistently exciting of order α + n.


Proof. Consider a single input system and define the vector polynomial in q⁻¹

$$F^y(q^{-1}) = \frac{C\,\mathrm{Adj}\{qI - A\} B}{q^n} + D
= F_0^y + F_1^y q^{-1} + F_2^y q^{-2} + \cdots + F_n^y q^{-n},$$

where F_i^y (i = 0, ..., n) are l × 1 vectors. This implies that

$$y^d(t) = \frac{\sum_{k=0}^{n} F_k^y q^{-k}}{\sum_{k=0}^{n} a_k q^{-k}}\, u(t), \qquad (6.30)$$

where {a_i} is defined in (6.18). Next, define the Sylvester matrix

$$S_y = \begin{bmatrix}
F_n^y & \cdots & F_0^y & 0 & 0 & \cdots & 0 \\
 & \ddots & & \ddots & & & \vdots \\
0 & F_n^y & \cdots & F_0^y & 0 & \cdots & 0 \\
a_n & \cdots & a_0 & 0 & 0 & \cdots & 0 \\
 & \ddots & & \ddots & & & \vdots \\
0 & a_n & \cdots & a_0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & a_n & \cdots & a_0 & 0 \\
\vdots & & & & \ddots & & \ddots \\
0 & \cdots & 0 & 0 & a_n & \cdots & a_0
\end{bmatrix}, \qquad (6.31)$$

where the first group of βl rows contains shifted coefficients of F^y, the next β rows contain shifted a-coefficients, and the last α rows contain further-shifted a-coefficients; the total dimension is (α + β + βl) × (n + α + β). It can be shown that the rank of S_y is n + α + β if the system is observable and reachable from u(t), and if β is greater than or equal to the observability index [AJ76, BKAK78]. A straightforward proof of this fact is given in Appendix A.15, using the notation introduced in this chapter. The first term in (6.25) can now be written as

$$\bar{E}\left\{
\begin{bmatrix} x^d(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta^d(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} = S R_U S_y^T, \qquad (6.32)$$

where

$$R_U = \bar{E}\left\{
\frac{1}{a(q^{-1})} u_{n+\alpha}(t+\beta-n)\;
\frac{1}{a(q^{-1})} u_{\alpha+\beta+n}^T(t-n)
\right\}.$$

The covariance matrix R_U has full row rank if u is PE(α + n). It is then clear that (6.32) has full rank, since S is non-singular and S_y has full column rank.

Therefore, for single input systems we can relax the assumption on the input signal. The proof of Theorem 6.3 does not carry over to the multi-input case because S_y is not full column rank if m > 1 (see Appendix A.15).

Next, we give the following consistency result concerning the case when the input is a white noise process.

Theorem 6.4. Let the input u(t) to the system in (6.1) be a zero-mean white sequence in the sense that r_u(τ) = r_u(0)δ_{τ,0} and r_u(0) > 0. Assume that A1-A4 hold, that the weighting matrices W_r and W_c are positive definite, and that

$$\operatorname{rank}\left\{\begin{bmatrix} C_\beta(A,B) & C_\beta(A,G) \end{bmatrix}\right\} = n. \qquad (6.33)$$

Here, C_β(A,B) is defined in (6.27),

$$C_\beta(A,G) = \begin{bmatrix} A^{\beta-1}G & A^{\beta-2}G & \ldots & G \end{bmatrix},$$

and the matrix G is defined as

$$G = \bar{E}\{x(t+1) y^T(t)\}.$$

Given the above assumptions, the IV-4SID subspace estimate Γ̂α defined in (6.11) is consistent.

Proof. Consider the correlation matrix in (6.23):

$$\bar{E}\left\{
\begin{bmatrix} x(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} y_\beta(t) \\ u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\}
= \begin{bmatrix}
\bar{E}\{x(t+\beta) y_\beta^T(t)\} & \bar{E}\{x(t+\beta) u_\beta^T(t)\} & 0 \\
0 & 0 & R_{u_\alpha}
\end{bmatrix}, \qquad (6.34)$$

where

$$R_{u_\alpha} = \bar{E}\{u_\alpha(t) u_\alpha^T(t)\} = I_\alpha \otimes r_u(0).$$

The symbol ⊗ denotes the Kronecker product. In (6.34), we used the fact that Ē{x(t)uᵀ(t+k)} = 0 for k ≥ 0 since u(t) is a white sequence. Notice that R_{u_α} > 0 since r_u(0) > 0. Under the given assumptions it is readily proven that

$$\bar{E}\{x(t+\beta) y_\beta^T(t)\} = C_\beta(A,G), \qquad
\bar{E}\{x(t+\beta) u_\beta^T(t)\} = C_\beta(A,B)(I_\beta \otimes r_u(0)).$$

We conclude that the matrix in (6.34) has full row rank under the reachability assumption given in (6.33). This concludes the proof.

Notice that Theorem 6.4 guarantees consistency even for systems with process noise. Also observe that we were here able to relax the reachability condition on (A, B) compared to the previous theorems. Some other results for the case with a white noise input signal can be found in [Ver93].
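The rank condition (6.33) is easy to evaluate for a given realization. A sketch (the helper names are ours; in practice G would be computed from the system and noise covariances, here it is simply an input to the check):

```python
import numpy as np

def reversed_reachability(A, B, beta):
    """C_beta(A, B) = [A^(beta-1) B, A^(beta-2) B, ..., B]."""
    return np.hstack([np.linalg.matrix_power(A, beta - 1 - k) @ B
                      for k in range(beta)])

def condition_633_holds(A, B, G, beta):
    """Check rank [C_beta(A,B)  C_beta(A,G)] = n, cf. (6.33)."""
    n = A.shape[0]
    M = np.hstack([reversed_reachability(A, B, beta),
                   reversed_reachability(A, G, beta)])
    return np.linalg.matrix_rank(M) == n
```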

Finally, we give a result for single input ARMA signals.

Theorem 6.5. Assume that the scalar (m = 1) input u(t) to the system in (6.1) is generated by the ARMA model

$$F(q^{-1}) u(t) = G(q^{-1}) \varepsilon(t),$$

where ε(t) is a white noise process. Here, F(z) and G(z) are relatively prime polynomials of degree n_F and n_G, respectively, and they have all zeros outside the unit circle. Furthermore, assume that A1-A4 hold, that the weighting matrices W_r and W_c are positive definite, and that

1. (A, B) is reachable;
2. β ≥ n;
3. n_G ≤ β − n;
4. n_F ≤ α + β − n.

Given the above assumptions, the IV-4SID subspace estimate Γ̂α in (6.11) is consistent.

Proof. Study the last two block columns of (6.23). Clearly, if

$$\operatorname{rank}\, \bar{E}\left\{
\begin{bmatrix} x(t+\beta) \\ u_\alpha(t+\beta) \end{bmatrix}
\begin{bmatrix} u_\beta(t) \\ u_\alpha(t+\beta) \end{bmatrix}^T
\right\} = n + \alpha m \qquad (6.35)$$

holds, then so does (6.23). Using the notation introduced in (6.29), we can write the correlation matrix in (6.35) as

$$S\, \bar{E}\left\{
\left(\frac{1}{a(q^{-1})} u_{n+\alpha}(t+\beta-n)\right) u_{\alpha+\beta}^T(t)
\right\}, \qquad (6.36)$$

where S is (n + α) × (n + α) and non-singular since (A, B) is reachable (see Appendix A.14). Lemma A3.8 in [SS83] shows that the correlation matrix appearing in (6.36) has full row rank if

$$\max(\alpha + n + n_G,\; n + n_F) \le \alpha + \beta, \qquad (6.37)$$

which is easily transformed into the conditions of the theorem.

Notice that the theorem holds for arbitrarily colored noise as long as it is uncorrelated with the input. With some more effort the theorem can be extended to the multi-input case by modifying Lemma A3.8 in [SS83]; for example, the theorem holds if the different inputs are independent ARMA signals. The result of the theorem is very interesting since it gives a quantification of the statement that "for large enough β, the matrix in (6.25) is full rank". Also observe that an AR input of arbitrarily high order is allowed if α is chosen large enough.

6.6 Counterexample

The purpose of this section is to present an example of a system for which the IV-4SID methods are not consistent. We have shown that a necessary condition for consistency is the validity of (6.23). In the discussion following (6.25), it was also pointed out that consistency problems may occur for systems with process noise. In this section, we give an explicit example for which the rank drops when the two terms in (6.25) are added.

Consider the cross correlation matrix appearing in (6.23), which, with obvious notation, reads

$$\begin{bmatrix}
R_{x^d y_\beta^d} + R_{x^s y_\beta^s} & R_{x^d u_\beta} & R_{x^d u_\alpha} \\
R_{u_\alpha y_\beta^d} & R_{u_\alpha u_\beta} & R_{u_\alpha u_\alpha}
\end{bmatrix}.$$

The construction of a system for which this matrix loses rank is divided into two major steps:


1. Find a vector v such that

$$v^T \begin{bmatrix} R_{x^d u_\beta} & R_{x^d u_\alpha} \\ R_{u_\alpha u_\beta} & R_{u_\alpha u_\alpha} \end{bmatrix} = 0.$$

2. Design R_{x^s y_β^s} such that

$$v^T \begin{bmatrix} R_{x^d y_\beta^d} + R_{x^s y_\beta^s} \\ R_{u_\alpha y_\beta^d} \end{bmatrix} = 0. \qquad (6.38)$$

The first step turns out to be similar to a construction used in the analysis of instrumental variable methods (see [SS81, SS83] and the references therein). Once a solution to the first step is found, the second step is a matter of investigating whether or not a stochastic subsystem exists that generates a valid R_{x^s y_β^s}.

Let us study the first step in more detail. For notational convenience, define the covariance matrix

$$P_{\eta\times\nu}(t_1, t_2) = \bar{E}\left\{
\left(\frac{1}{a(q^{-1})} u_\eta(t - t_1)\right) u_\nu^T(t - t_2)
\right\}.$$

The correlation matrix of interest in the first step can be written as

$$\begin{bmatrix} R_{x^d u_\beta} & R_{x^d u_\alpha} \\ R_{u_\alpha u_\beta} & R_{u_\alpha u_\alpha} \end{bmatrix}
= S\, P_{(\alpha+n)\times(\alpha+\beta)}(n-\beta,\, 0),$$

where S is defined in (6.29); here, however, it is of dimension (n + αm) × ((α + n)m). The rank properties of P_{η×ν}(0,0) have been studied in [SS83]: P is full rank either if a is positive real and u(t) is persistently exciting, or if u(t) is an ARMA process of sufficiently low order (e.g., white noise, cf. Theorem 6.5). Here, we consider an a(q⁻¹) that is not positive real, in order to make P rank deficient; this implies that n ≥ 2. Let us fix the following parameters: n = 2, m = 1, l = 1, β = 2 and α = 3. If (A, B) is reachable, then S is non-singular (see Appendix A.14). The first step can then be solved if there exists a vector g such that gᵀ P_{5×5}(0,0) = 0. A solution can be found by using the same setup as in the counterexample ([SS83]) for an instrumental variable method. Choose a pole polynomial and an input process according to

$$a(q^{-1}) = (1 - \gamma q^{-1})^2, \qquad (6.39)$$
$$u(t) = (1 - \gamma q^{-1})^2 (1 + \gamma q^{-1})^2\, \varepsilon(t), \qquad (6.40)$$


where ε(t) is a white noise process of unit variance. Notice that γ must have modulus less than one for stability, and that the input is persistently exciting to any order. Now, a straightforward calculation leads to

$$P_{5\times 5}(0,0) = \begin{bmatrix}
1-2\gamma^4 & -4\gamma^3 & \gamma^6-2\gamma^2 & 2\gamma^5 & \gamma^4 \\
2\gamma & 1-2\gamma^4 & -4\gamma^3 & \gamma^6-2\gamma^2 & 2\gamma^5 \\
\gamma^2 & 2\gamma & 1-2\gamma^4 & -4\gamma^3 & \gamma^6-2\gamma^2 \\
0 & \gamma^2 & 2\gamma & 1-2\gamma^4 & -4\gamma^3 \\
0 & 0 & \gamma^2 & 2\gamma & 1-2\gamma^4
\end{bmatrix}.$$

This matrix is singular if γ is a solution to

$$\det(P_{5\times 5}(0,0)) = 1 + 4\gamma^4 + 10\gamma^8 - 8\gamma^{12} - 15\gamma^{16} - 12\gamma^{20} = 0. \qquad (6.41)$$

The polynomial has two real roots, γ ≈ ±0.91837. Choose, for example, the positive root for γ, and choose g as the left eigenvector corresponding to the zero eigenvalue of P. This provides a solution to the first step (namely, vᵀ = gᵀ S⁻¹). The solution depends on the matrix B. Until now, we have fixed the poles of the system and a specific input process.
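The real roots of (6.41) are easy to obtain numerically. A small sketch using numpy: since the polynomial only contains powers that are multiples of four, we solve in z = γ⁴ and take fourth roots (the negative root follows by symmetry, γ → −γ):

```python
import numpy as np

# det(P) = 1 + 4 g^4 + 10 g^8 - 8 g^12 - 15 g^16 - 12 g^20; let z = g^4:
coeffs = [-12, -15, -8, 10, 4, 1]     # -12 z^5 - 15 z^4 - 8 z^3 + 10 z^2 + 4 z + 1
z = np.roots(coeffs)
real_z = z[(abs(z.imag) < 1e-10) & (z.real > 0)].real
print(real_z ** 0.25)                 # positive real root, approx 0.91837
```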

Remark 6.3. Observe that a subspace method that only uses the inputs U_β as instrumental variables (cf. (6.10)) is not consistent for systems satisfying the first step. In other words, the past input MOESP method (see [Ver94]) is not consistent if the poles and the input process are related as in (6.39)-(6.40) and if γ is a solution to (6.41). It is readily checked that the conditions of Theorem 6.5 are violated.

The free parameters in the second step are B, C, D, and those specifying the stochastic subsystem, with constraints on minimality of the system description. Somewhat arbitrarily,² we make the choices B = [1 −2]ᵀ, C = [2 −1] and D = 0. Consider the following description of the system in state space form:

$$x(t+1) = \begin{bmatrix} 2\gamma & -\gamma^2 \\ 1 & 0 \end{bmatrix} x(t)
+ \begin{bmatrix} 1 \\ -2 \end{bmatrix} u(t)
+ \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} e(t), \qquad (6.42a)$$
$$y(t) = \begin{bmatrix} 2 & -1 \end{bmatrix} x(t) + e(t), \qquad (6.42b)$$

where the variance of the noise process e(t) is r_e. With the choices made, the system is observable and reachable from u(t).

²For some choices of B, C and D we were not able to find a solution to the second step. For other choices of C, making the system "less observable", we found solutions with lower noise variance.


For a given γ, (6.38) now consists of two non-linear equations in k₁, k₂ and r_e. The equations are, however, linear in k₁r_e, k₂r_e, k₁²r_e, k₂²r_e and k₁k₂r_e. If we change variables according to

$$k_{rel} = \frac{k_1}{k_2}, \qquad \eta_1 = k_2^2 r_e, \qquad \eta_2 = k_2 r_e,$$

then the two non-linear equations become linear in η₁ and η₂. Thus, given γ and k_rel, we can easily solve for η₁ and η₂. Observe that η₁ must be positive to be a valid solution. A plot of η₁ as a function of γ and k_rel is shown in Figure 6.1. The solution η₁ is positive when k_rel is between ≈ 0.7955 and (4γ³ − 3γ² + 1)/(2γ² − 2γ + 2) ≈ 0.8475 (η₁ is discontinuous at this point). Let us, for example, choose the solution corresponding to k_rel = 0.825. The parameters become γ ≈ 0.9184, r_e ≈ 217.1, k₁ ≈ −0.1542 and k₂ ≈ −0.1869. Thus, feeding the system (6.42) with the input (6.40) and using α = 3 and β = 2 will make the matrix in (6.23) rank deficient.

[Figure 6.1: The graph shows η₁ as a function of γ and k_rel. (Surface plot not reproduced here.)]
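For reference, a non-authoritative sketch of how the data of this counterexample could be generated in Python; scipy.signal.lfilter applies the MA filter in (6.40), the numbers are those chosen above, and the sample count and seed are arbitrary:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
gamma, k1, k2, re = 0.9184, -0.1542, -0.1869, 217.1
A = np.array([[2 * gamma, -gamma**2], [1.0, 0.0]])
B, K, C = np.array([1.0, -2.0]), np.array([k1, k2]), np.array([2.0, -1.0])

Ntot = 1005                       # a little margin beyond N = 1000 for stacking
eps = rng.standard_normal(Ntot)
# u(t) = (1 - gamma q^-1)^2 (1 + gamma q^-1)^2 eps(t): a pure MA filter
b = np.convolve(np.convolve([1, -gamma], [1, -gamma]),
                np.convolve([1, gamma], [1, gamma]))
u = lfilter(b, [1.0], eps)

x = np.zeros(2)
y = np.empty(Ntot)
for t in range(Ntot):             # innovations form (6.42): same e(t) in state and output
    e = np.sqrt(re) * rng.standard_normal()
    y[t] = C @ x + e
    x = A @ x + B * u[t] + K * e
```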

We end this section with some short comments about the counterexample. The example presented in this section is indeed very special. However, the existence of such examples indicates that subspace methods may perform very poorly in certain situations. As should be clear from the derivations above, the consistency of 4SID methods is related to the consistency of some instrumental variable schemes (see [SS81, SS83] and the references therein). When using 4SID methods, it is a good idea to estimate a set of models for different choices of α and β. These models should then be validated separately to see which gives the best result. This reduces the performance degradation due to a bad choice of α and β. In the counterexample presented above, such a procedure would probably have led to another choice of the dimension parameters (given that the system order is known). While this may lead to consistent estimates, the accuracy may still be poor (as indicated in the next section).

6.7 Numerical Examples

In this section, the previously obtained results are illustrated by means of numerical examples. First, the counterexample given in the previous section is simulated. Thus, identification data are generated according to (6.40) and (6.42). The extended observability matrix is estimated by (6.11) using the weighting matrices for the CVA method (see Table 6.1); similar results are obtained with other choices of the weighting matrices. The system matrix is then estimated as

$$\hat{A} = \hat{\Gamma}_1^{\dagger} \hat{\Gamma}_2,$$

where (·)† denotes the Moore-Penrose pseudo-inverse, Γ̂₁ denotes the matrix made of the first α − 1 rows, and Γ̂₂ contains the last α − 1 rows of Γ̂α. To measure the estimation accuracy, we consider the estimated system poles (i.e., the eigenvalues of Â).
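The pole estimation step is a one-liner given a subspace estimate. A sketch (here l = 1, so block rows coincide with rows; the function name is ours, and it can be fed with the output of either subspace sketch above):

```python
import numpy as np

def estimate_poles(Gamma_hat, l=1):
    """A_hat = Gamma_1^+ Gamma_2 (shift invariance); poles = eig(A_hat).
    Gamma_1 drops the last l rows of Gamma_hat, Gamma_2 drops the first l."""
    G1, G2 = Gamma_hat[:-l, :], Gamma_hat[l:, :]
    A_hat = np.linalg.lstsq(G1, G2, rcond=None)[0]   # least-squares pseudo-inverse
    return np.linalg.eigvals(A_hat)
```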

has two poles at γ = 0.9184. Table 6.3 shows the absolute value of<br />

the bias <strong>and</strong> the st<strong>and</strong>ard deviation of the pole estimates. The sample<br />

statistics are based on 1000 <strong>in</strong>dependent trials. Each trial consists of 1000<br />

<strong>in</strong>put-output data samples. The results for two different choices of α <strong>and</strong><br />

β are given <strong>in</strong> the table. The first corresponds to the choice made <strong>in</strong> the<br />

counterexample. It is clearly seen that the estimates are not useful. For


174 6 <strong>On</strong> Consistency of <strong>Subspace</strong> <strong>Methods</strong> for <strong>System</strong> <strong>Identification</strong><br />

α = 3, β = 2 α = 6, β = 5<br />

pole 1 5.9572 (69.8826) 0.4704 (0.4806)<br />

pole 2 2.6049 (30.9250) 0.1770 (0.3526)<br />

Table 6.3: The figures shown <strong>in</strong> the table are the absolute values of the<br />

bias <strong>and</strong>, with<strong>in</strong> brackets, the st<strong>and</strong>ard deviations.<br />

α = 3, β = 2 α = 6, β = 5<br />

pole 1 0.1161 (0.2405) 0.0316 (0.0452)<br />

pole 2 0.2266 (0.5043) 0.0282 (0.0290)<br />

Table 6.4: Same as Table 6.3 but for the system with K =<br />

[−0.2100, −0.5590] T .<br />

the second choice (α = 6, β = 5), it can be shown that the algorithm<br />

is consistent. This is <strong>in</strong>dicated by the much more reasonable results <strong>in</strong><br />

Table 6.3. The accuracy, however, is still very poor s<strong>in</strong>ce the matrix <strong>in</strong><br />

(6.23) is nearly rank deficient.<br />
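The shift-invariance step above is easy to reproduce numerically. The sketch below (Python/NumPy) is a minimal stand-in for the Monte Carlo experiment: instead of estimating $\hat{\Gamma}_\alpha$ from data by (6.11), it perturbs the true observability matrix of an illustrative system with a double pole at $\gamma$ (an assumption made for the example, not the exact system (6.42)), and then applies $\hat{A} = \hat{\Gamma}_1^{\dagger}\hat{\Gamma}_2$ and extracts the pole estimates.

```python
import numpy as np

# Hypothetical second-order system with a double pole at gamma (cf. the
# counterexample); the matrices are illustrative, not the exact system (6.42).
gamma = 0.9184
A = np.array([[2 * gamma, -gamma**2], [1.0, 0.0]])   # companion form, poles at gamma
C = np.array([[1.0, 0.0]])

def extended_observability(A, C, alpha):
    """Stack C, CA, ..., CA^(alpha-1)."""
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(alpha)])

alpha = 6
rng = np.random.default_rng(0)
Gamma = extended_observability(A, C, alpha)

pole_estimates = []
for trial in range(1000):
    # Stand-in for the subspace estimate: the true observability matrix plus a
    # small perturbation (the real experiment estimates it from data by (6.11)).
    Gamma_hat = Gamma + 0.01 * rng.standard_normal(Gamma.shape)
    G1, G2 = Gamma_hat[:-1], Gamma_hat[1:]           # first / last alpha-1 rows
    A_hat = np.linalg.pinv(G1) @ G2                  # shift-invariance estimate
    pole_estimates.append(np.sort(np.linalg.eigvals(A_hat)))

est = np.array(pole_estimates)
print("bias (abs):", np.abs(est.mean(axis=0) - gamma))
print("std dev:   ", est.std(axis=0))
```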

Next, we study the influence of the stochastic sub-system on the estimation accuracy. Recall that in Section 6.6 the stochastics of the system were designed to cause a cancellation in (6.25). The zeros of the stochastic sub-system are given by the eigenvalues of $A - KC$, which are approximately $0.9791 \pm i0.1045$. In the next simulation, the locations of these zeros are changed such that they keep the same distance to the unit circle, while the angle relative to the real axis is increased by a factor of five. This is accomplished by changing $K$ in (6.42) to $[-0.2100, -0.5590]^T$; the zeros become $0.8489 \pm i0.4990$. The simulation results are presented in Table 6.4. It can be seen that the estimation accuracy is more reasonable in this case.

            α = 3, β = 2        α = 6, β = 5
pole 1      0.1161 (0.2405)     0.0316 (0.0452)
pole 2      0.2266 (0.5043)     0.0282 (0.0290)

Table 6.4: Same as Table 6.3 but for the system with $K = [-0.2100, -0.5590]^T$.

Finally, we change the input to be realizations of a zero-mean white noise process. The power of the white noise input is chosen to be equal to the power of the colored process (6.40). Except for the input, the parameters are the same as in the counterexample. Recall Theorem 6.4, which shows that the estimates are consistent in this case. Table 6.5 presents the result of the simulations. A comparison with the results of Table 6.3 clearly shows a great difference.

            α = 3, β = 2        α = 6, β = 5
pole 1      0.0464 (0.1146)     0.0184 (0.0271)
pole 2      0.0572 (0.0523)     0.0182 (0.0171)

Table 6.5: Same as Table 6.3 but with a white noise input signal.

6.8 Conclusions

Persistence of excitation conditions on the input signal have been given. These conditions ensure consistency of the subspace estimate used in subspace system identification methods. For these conditions to hold, we had to assume that the system does not contain any process noise. With process noise, a persistence of excitation condition alone is not sufficient. In fact, an example was given of a system for which subspace methods are not consistent: the system is reachable from the input and the input is persistently exciting of any order, yet the subspace methods fail to provide a consistent model of the system transfer function. It was also shown that a white noise input (or an ARMA input signal of sufficiently low order) guarantees consistency under weak conditions. This shows that the performance of subspace methods may be very sensitive to the input excitation.


Appendix A

Proofs and Derivations

A.1 Proofs of (2.34) and (2.35)

By R8,

$$(\lambda_k - \sigma^2)\,B^*\hat{e}_k \simeq B^*\hat{R}e_k, \tag{A.1}$$

where (see (2.8) and (2.13))

$$B^*\hat{R} = \frac{1}{2N}\,B^*\sum_{t=1}^{N}\big(n(t)x^*(t) + Jn^c(t)x^T(t)J\big). \tag{A.2}$$

Using (A.2) and the standard formula for the fourth-order moment of circular Gaussian random variables, we obtain

$$\begin{aligned}
E\big\{[B^*\hat{R}e_i][B^*\hat{R}e_j]^*\big\}
&= \frac{1}{4N^2}B^*\sum_{t=1}^{N}\sum_{s=1}^{N} E\big\{ n(t)x^*(t)e_ie_j^*x(s)n^*(s) \\
&\qquad + n(t)x^*(t)e_ie_j^*Jx^c(s)n^T(s)J + Jn^c(t)x^T(t)Je_ie_j^*x(s)n^*(s) \\
&\qquad + Jn^c(t)x^T(t)Je_ie_j^*Jx^c(s)n^T(s)J \big\}B \\
&= \frac{1}{4N^2}B^*\big[ N^2\sigma^4 e_ie_j^* + N\sigma^2(e_j^*R_0e_i)I + \sigma^4N^2 e_ie_j^* + \sigma^4N Je_j^ce_i^TJ \\
&\qquad + \sigma^4N^2 e_ie_j^* + \sigma^4N Je_j^ce_i^TJ + N^2\sigma^4 e_ie_j^* + \sigma^2N(e_j^*JR_0^cJe_i)I \big]B \\
&= \frac{\sigma^2}{2N}(e_j^*Re_i)\,B^*B = \frac{\sigma^2}{2N}\lambda_i\delta_{i,j}(B^*B),
\end{aligned}\tag{A.3}$$

where to derive the fourth equality use was made of the fact that $B^*e_i = 0$ (see (2.13) and (2.22)) and that $B^*Je_i^c = 0$ (see R3). The expression for $E[zz^*]$ in (2.34) follows from (A.1) and (A.3).

Next, consider (2.35). A straightforward calculation gives

$$\begin{aligned}
E\big\{[B^*\hat{R}e_i][B^*\hat{R}e_j]^T\big\}
&= \frac{1}{4N^2}B^*\sum_{t=1}^{N}\sum_{s=1}^{N} E\big\{ n(t)x^*(t)e_ie_j^Tx^c(s)n^T(s) \\
&\qquad + n(t)x^*(t)e_ie_j^TJx(s)n^*(s)J + Jn^c(t)x^T(t)Je_ie_j^Tx^c(s)n^T(s) \\
&\qquad + Jn^c(t)x^T(t)Je_ie_j^TJx(s)n^*(s)J \big\}B^c \\
&= \frac{1}{4N^2}B^*\big[ N^2\sigma^4 e_ie_j^T + \sigma^4N e_je_i^T + \sigma^4N^2 e_ie_j^T + \sigma^2N J(e_j^TJR_0e_i) \\
&\qquad + \sigma^4N^2 e_ie_j^T + \sigma^2N J(e_j^TR_0^cJe_i) + \sigma^4N^2 e_ie_j^T + \sigma^4N e_je_i^T \big]B^c \\
&= \frac{\sigma^2}{2N}(B^*JB^c)(e_j^TJRe_i) \\
&= \frac{\sigma^2}{2N}(B^*B\tilde{J})(e_j^*Re_i)e^{-i\psi_j} \\
&= \frac{\sigma^2}{2N}(B^*B\tilde{J})\lambda_i\delta_{i,j}\Gamma^*_{ii},
\end{aligned}\tag{A.4}$$

where the last but one equality follows from R1 and R3. The expression in (2.35) for $E[zz^T]$ follows from (A.1) and (A.4).

A.2 Asymptotic Covariance of the Residual

In order to compute the asymptotic covariance in (3.21), the first order perturbations in the residuals are related to the error in the sample covariance matrix. Since $\tilde{R} = \hat{R} - R = O_p(1/\sqrt{N})$, it is straightforward to obtain the following first order relation from (3.4):

$$B^*\hat{E}_s = B^*\tilde{R}E_s\bar{\Lambda}^{-1} + o_p(1/\sqrt{N}), \tag{A.5}$$

where $B = B(\theta_0)$ (see Section 2.3.4 or [CTO89]). Next notice that the observations are given by the model

$$x(t) = A(\theta_0,\rho)s(t) + n(t) = A_0s(t) + n(t) + \tilde{A}s(t), \tag{A.6}$$

where $\tilde{A} = A(\theta_0,\rho) - A_0$ denotes the error in the steering matrix due to $\rho$. It is important to observe that the first two terms in (A.6) represent the observation from the nominal array, whereas the third term is due to model errors. The sample covariance matrix can be partitioned accordingly as

$$\hat{R} = \frac{1}{N}\sum_{t=1}^{N} x(t)x^*(t) = \hat{R}_{FS} + \hat{R}_{ME}, \tag{A.7}$$

where

$$\hat{R}_{FS} = \frac{1}{N}\sum_{t=1}^{N}\big(A_0s(t) + n(t)\big)\big(A_0s(t) + n(t)\big)^*, \tag{A.8}$$

$$\hat{R}_{ME} = \frac{1}{N}\sum_{t=1}^{N}\Big(\big(A_0s(t) + n(t)\big)s^*(t)\tilde{A}^* + \tilde{A}s(t)\big(A_0s(t) + n(t)\big)^* + \tilde{A}s(t)s^*(t)\tilde{A}^*\Big), \tag{A.9}$$

and the subscripts FS and ME stand for finite sample and model error, respectively. The in-probability orders of the quantities in (A.8)-(A.9) are

$$\tilde{A} = O_p(1/\sqrt{N}), \qquad \frac{1}{N}\sum_{t=1}^{N}s(t)s^*(t) - P = O_p(1/\sqrt{N}), \qquad \frac{1}{N}\sum_{t=1}^{N}s(t)n^*(t) = O_p(1/\sqrt{N}).$$

The first follows from the assumptions on $\rho$, the second is a standard result from ergodicity theory, and the last follows from the central limit theorem (it is a sum of zero-mean independent random variables). A first order approximation of $B^*\hat{R}$ is then

$$B^*\hat{R} = B^*\hat{R}_{FS} + B^*\tilde{A}PA_0^* + o_p(1/\sqrt{N}). \tag{A.10}$$

It is interesting to note that the residuals are, to first order, due to two independent terms: the first is the finite sample errors from the nominal array, and the second is the model error contribution. Since the terms are independent, they can be considered one at a time. In view of (A.7), we denote the errors in the principal eigenvectors due to finite samples and model errors by $\tilde{E}_{s_{FS}}$ and $\tilde{E}_{s_{ME}}$, respectively; that is,

$$\hat{E}_s = E_s + \tilde{E}_{s_{FS}} + \tilde{E}_{s_{ME}}.$$

Let us start by deriving the part of the residual covariance matrix due to finite samples. Define $\tilde{R}_{FS} = \hat{R}_{FS} - R$. By straightforward calculations it is possible to show that

$$E\{\mathrm{vec}(\tilde{R}_{FS})\,\mathrm{vec}^*(\tilde{R}_{FS})\} = \frac{1}{N}\big(R^T \otimes R\big), \tag{A.11}$$

$$E\{\mathrm{vec}(\tilde{R}_{FS})\,\mathrm{vec}^T(\tilde{R}_{FS})\} = \frac{1}{N}\big[\mathrm{vec}(R)\,\mathrm{vec}^T(R)\big]^{BT}, \tag{A.12}$$

since $\{x(t)\}_{t=1}^{N}$ is a sequence of independent circular Gaussian random vectors. Here, the notation $(\cdot)^{BT}$ means that every "block" in $(\cdot)$ is transposed. (It should be clear from the context what "block" is.)
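The finite sample formulas above can be checked by direct simulation. The sketch below (Python/NumPy) verifies (A.11) by Monte Carlo for an arbitrary example covariance; the dimensions, seed and trial count are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, trials = 3, 100, 20000

# An arbitrary Hermitian positive definite covariance for illustration.
X = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
R = X @ X.conj().T + m * np.eye(m)
L = np.linalg.cholesky(R)

acc = np.zeros((m * m, m * m), dtype=complex)
for _ in range(trials):
    # Circular complex Gaussian data with covariance R.
    x = L @ (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
    R_tilde = x @ x.conj().T / N - R
    v = R_tilde.flatten(order="F")            # vec(.) stacks columns
    acc += np.outer(v, v.conj())
acc /= trials

target = np.kron(R.T, R) / N                  # (A.11): E{vec vec*} = (R^T kron R)/N
print(np.max(np.abs(acc - target)) / np.max(np.abs(target)))   # small relative error
```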


Recalling the first order equality (A.5) and making use of (A.11), we get

$$\begin{aligned}
C_{\varepsilon_{FS}} &\triangleq \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{E}_{s_{FS}})\,\mathrm{vec}^*(B^*\tilde{E}_{s_{FS}})\} \\
&= \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{R}_{FS}E_s\bar{\Lambda}^{-1})\,\mathrm{vec}^*(B^*\tilde{R}_{FS}E_s\bar{\Lambda}^{-1})\} \\
&= \lim_{N\to\infty} N\,\big[(E_s\bar{\Lambda}^{-1})^T \otimes B^*\big]\,E\{\mathrm{vec}(\tilde{R}_{FS})\,\mathrm{vec}^*(\tilde{R}_{FS})\}\,\big[(E_s\bar{\Lambda}^{-1})^c \otimes B\big] \\
&= \big[(E_s\bar{\Lambda}^{-1})^T \otimes B^*\big]\big(R^T \otimes R\big)\big[(E_s\bar{\Lambda}^{-1})^c \otimes B\big] \\
&= \bar{\Lambda}^{-1}E_s^TR^TE_s^c\bar{\Lambda}^{-1} \otimes B^*RB \\
&= \bar{\Lambda}^{-2}\Lambda_s \otimes \sigma^2B^*B.
\end{aligned}\tag{A.13}$$

In the third equality we used the formula in (3.8). Similarly, from (A.5) and (A.12), we have

$$\begin{aligned}
\bar{C}_{\varepsilon_{FS}} &\triangleq \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{E}_{s_{FS}})\,\mathrm{vec}^T(B^*\tilde{E}_{s_{FS}})\} \\
&= \lim_{N\to\infty} N\,\big[(E_s\bar{\Lambda}^{-1})^T \otimes B^*\big]\,E\{\mathrm{vec}(\tilde{R}_{FS})\,\mathrm{vec}^T(\tilde{R}_{FS})\}\,\big[(E_s\bar{\Lambda}^{-1}) \otimes B^c\big] \\
&= \big[(E_s\bar{\Lambda}^{-1})^T \otimes B^*\big]\big[\mathrm{vec}(R)\,\mathrm{vec}^T(R)\big]^{BT}\big[(E_s\bar{\Lambda}^{-1}) \otimes B^c\big] \\
&= \big[(E_s\bar{\Lambda}^{-1})^T \otimes B^*\big]\big[\mathrm{vec}(\sigma^2I_m)\,\mathrm{vec}^T(\sigma^2I_m)\big]^{BT}\big[(E_s\bar{\Lambda}^{-1}) \otimes B^c\big] \\
&= \sigma^4\big[(E_s\bar{\Lambda}^{-1})^T \otimes B^*\big]\big[B^c \otimes (E_s\bar{\Lambda}^{-1})\big]\big[\mathrm{vec}(I_m)\,\mathrm{vec}^T(I_{d'})\big]^{BT} \\
&= 0.
\end{aligned}$$

The last two equalities follow from property 3 and property 2 of the block-transpose operation derived in Appendix A.4.

Remark A.1. The fact that $\bar{C}_{\varepsilon_{FS}} = 0$ may be more easily realized using the asymptotic distribution of the eigenvectors (e.g. [KB86]). It is then easy to verify that $E\{B^*\hat{e}_k\hat{e}_l^TB^c\} = 0$, where $\hat{e}_k$ is column $k$ of $\hat{E}_s$.

Next, we proceed and study the model error contribution to the residual covariance matrix. Recall (A.10), (A.5), and that

$$\mathrm{vec}(\tilde{A}) = D_\rho\tilde{\rho} + o_p(1/\sqrt{N}).$$

We have

$$\begin{aligned}
C_{\varepsilon_{ME}} &\triangleq \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{E}_{s_{ME}})\,\mathrm{vec}^*(B^*\tilde{E}_{s_{ME}})\} \\
&= \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{A}PA_0^*E_s\bar{\Lambda}^{-1})\,\mathrm{vec}^*(B^*\tilde{A}PA_0^*E_s\bar{\Lambda}^{-1})\} \\
&= \lim_{N\to\infty} N\,\big(T^T \otimes B^*\big)E\{\mathrm{vec}(\tilde{A})\,\mathrm{vec}^*(\tilde{A})\}\big(T^c \otimes B\big) \\
&= \big(T^T \otimes B^*\big)D_\rho\bar{\Omega}D_\rho^*\big(T^c \otimes B\big),
\end{aligned}\tag{A.14}$$

where

$$T = A_0^{\dagger}E_s$$

and the following relation is used (cf. (3.1)):

$$P = A_0^{\dagger}E_s\bar{\Lambda}E_s^*A_0^{\dagger*}.$$

Similarly,

$$\begin{aligned}
\bar{C}_{\varepsilon_{ME}} &\triangleq \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{E}_{s_{ME}})\,\mathrm{vec}^T(B^*\tilde{E}_{s_{ME}})\} \\
&= \lim_{N\to\infty} N\,E\{\mathrm{vec}(B^*\tilde{A}PA_0^*E_s\bar{\Lambda}^{-1})\,\mathrm{vec}^T(B^*\tilde{A}PA_0^*E_s\bar{\Lambda}^{-1})\} \\
&= \lim_{N\to\infty} N\,\big(T^T \otimes B^*\big)E\{\mathrm{vec}(\tilde{A})\,\mathrm{vec}^T(\tilde{A})\}\big(T \otimes B^c\big) \\
&= \big(T^T \otimes B^*\big)D_\rho\bar{\Omega}D_\rho^T\big(T \otimes B^c\big).
\end{aligned}\tag{A.15}$$

Hence, the asymptotic covariance matrix of the residuals in (3.21) is given by

$$C_{\bar{\varepsilon}} = \begin{bmatrix} C_\varepsilon & \bar{C}_\varepsilon \\ \bar{C}_\varepsilon^c & C_\varepsilon^c \end{bmatrix},$$

with

$$C_\varepsilon = C_{\varepsilon_{FS}} + C_{\varepsilon_{ME}}, \qquad \bar{C}_\varepsilon = \bar{C}_{\varepsilon_{ME}},$$

where $C_{\varepsilon_{FS}}$, $C_{\varepsilon_{ME}}$ and $\bar{C}_{\varepsilon_{ME}}$ are given by (A.13), (A.14) and (A.15), respectively.


A.3 Proof of Theorem 3.2

Recall the expression (3.31),

$$V_i' \simeq 2k_i^*W\bar{\varepsilon}.$$

Differentiating with respect to $\theta_j$ and letting $N$ tend to infinity leads to

$$H_{ij} = 2k_i^*Wk_j.$$

Next, study the $(i,j)$th element of $Q$ in (3.32):

$$Q_{ij} = \lim_{N\to\infty} N\,E\{V_i'V_j'\} = \lim_{N\to\infty} N\,E\{4k_i^*W\bar{\varepsilon}\bar{\varepsilon}^*Wk_j\} = 4k_i^*WC_{\bar{\varepsilon}}Wk_j.$$

Defining the matrix $K = [k_1\ \ldots\ k_d]$, the matrices $H$ and $Q$ can be written in compact form as

$$H = 2K^*WK, \tag{A.16}$$
$$Q = 4K^*WC_{\bar{\varepsilon}}WK. \tag{A.17}$$

These expressions are valid for any $W$ satisfying (3.29). It remains to prove that $H^{-1}QH^{-1}$ equals $N\,\mathrm{CRB}_\theta$ for the choice $W = W_{GWSF}$ given in (3.25). Inserting $W = C_{\bar{\varepsilon}}^{-1}$ in (A.16) and (A.17) gives

$$H^{-1}QH^{-1} = \big(K^*C_{\bar{\varepsilon}}^{-1}K\big)^{-1}.$$

Next, we rewrite $k_i$ as

$$k_i = \begin{bmatrix}\mathrm{vec}(B_i^*E_s) \\ \mathrm{vec}(B_i^TE_s^c)\end{bmatrix}
= \begin{bmatrix}\mathrm{vec}(B_i^*AT) \\ \mathrm{vec}(B_i^TA^cT^c)\end{bmatrix}
= -\begin{bmatrix}\mathrm{vec}(B^*A_iT) \\ \mathrm{vec}(B^TA_i^cT^c)\end{bmatrix}
= -\begin{bmatrix}\big(T^T \otimes B^*\big)\mathrm{vec}(A_i) \\ \big(T^T \otimes B^*\big)^c\mathrm{vec}(A_i^c)\end{bmatrix},$$

and hence

$$K = -\begin{bmatrix}\big(T^T \otimes B^*\big)D_\theta \\ \big(T^T \otimes B^*\big)^cD_\theta^c\end{bmatrix}. \tag{A.18}$$

In (A.18) we made use of the fact that $B_i^*A = -B^*A_i$, which follows from $B^*A = 0$. We also rewrite the inverse of $C_{\bar{\varepsilon}}$ using the matrix inversion lemma:

$$C_{\bar{\varepsilon}}^{-1} = (\bar{L} + \bar{G}\bar{G}^*)^{-1} = \bar{L}^{-1} - \bar{L}^{-1}\bar{G}\big(\bar{G}^*\bar{L}^{-1}\bar{G} + I\big)^{-1}\bar{G}^*\bar{L}^{-1}. \tag{A.19}$$

In view of this, the product $K^*C_{\bar{\varepsilon}}^{-1}K$ contains two terms. In the sequel we consider these two separately and show that they correspond to the two terms in the expression for $\mathrm{CRB}_\theta^{-1}/N$ in (3.15).

Let us first study

$$\begin{aligned}
K^*\bar{L}^{-1}K &= \begin{bmatrix}\big(T^T \otimes B^*\big)D_\theta \\ \big(T^T \otimes B^*\big)^cD_\theta^c\end{bmatrix}^*\begin{bmatrix}L^{-1} & 0 \\ 0 & L^{-c}\end{bmatrix}\begin{bmatrix}\big(T^T \otimes B^*\big)D_\theta \\ \big(T^T \otimes B^*\big)^cD_\theta^c\end{bmatrix} \\
&= 2\,\mathrm{Re}\Big\{D_\theta^*\Big(T^c\sigma^{-2}\bar{\Lambda}^2\Lambda_s^{-1}T^T \otimes B(B^*B)^{-1}B^*\Big)D_\theta\Big\} \\
&= 2\sigma^{-2}\,\mathrm{Re}\{D_\theta^*MD_\theta\} = 2\sigma^{-2}C.
\end{aligned}\tag{A.20}$$

Here, $M$ is defined in (3.16) and we used the fact that $B(B^*B)^{-1}B^* = \Pi_A^\perp$. Thus, we have shown that (A.20) equals the first term in $\mathrm{CRB}_\theta^{-1}/N$.

To simplify the second term, we first need some preliminary calculations. Consider the inverse appearing in (A.19),

$$\begin{aligned}
\big(\bar{G}^*\bar{L}^{-1}\bar{G} + I\big)^{-1}
&= \bigg(\begin{bmatrix}G^* & G^T\end{bmatrix}\begin{bmatrix}L^{-1} & 0 \\ 0 & L^{-c}\end{bmatrix}\begin{bmatrix}G \\ G^c\end{bmatrix} + I\bigg)^{-1} \\
&= \big(2\,\mathrm{Re}\{G^*L^{-1}G\} + I\big)^{-1} \\
&= \Big(2\,\mathrm{Re}\Big\{\bar{\Omega}^{1/2}D_\rho^*\big(T^c \otimes B\big)\big(\sigma^{-2}\bar{\Lambda}^2\Lambda_s^{-1} \otimes (B^*B)^{-1}\big)\big(T^T \otimes B^*\big)D_\rho\bar{\Omega}^{1/2}\Big\} + I\Big)^{-1} \\
&= \frac{\sigma^2}{2}\bar{\Omega}^{-1/2}\Big(\mathrm{Re}\{D_\rho^*MD_\rho\} + \frac{\sigma^2}{2}\bar{\Omega}^{-1}\Big)^{-1}\bar{\Omega}^{-1/2} \\
&= \frac{\sigma^2}{2}\bar{\Omega}^{-1/2}\Gamma^{-1}\bar{\Omega}^{-1/2},
\end{aligned}\tag{A.21}$$

where $\Gamma$ is defined in (3.18). We also need to compute

$$\begin{aligned}
K^*\bar{L}^{-1}\bar{G} &= -\begin{bmatrix}\big(T^T \otimes B^*\big)D_\theta \\ \big(T^T \otimes B^*\big)^cD_\theta^c\end{bmatrix}^*\begin{bmatrix}L^{-1} & 0 \\ 0 & L^{-c}\end{bmatrix}\begin{bmatrix}G \\ G^c\end{bmatrix} \\
&= -2\,\mathrm{Re}\Big\{D_\theta^*\Big(T^c\sigma^{-2}\bar{\Lambda}^2\Lambda_s^{-1}T^T \otimes B(B^*B)^{-1}B^*\Big)D_\rho\bar{\Omega}^{1/2}\Big\} \\
&= -2\sigma^{-2}F_\theta^T\bar{\Omega}^{1/2},
\end{aligned}\tag{A.22}$$

where $F_\theta$ is defined in (3.17). Combining (A.22), (A.21) and (A.19), we see that the second term in the product $K^*C_{\bar{\varepsilon}}^{-1}K$ contributes with

$$-2\sigma^{-2}F_\theta^T\bar{\Omega}^{1/2}\,\frac{\sigma^2}{2}\bar{\Omega}^{-1/2}\Gamma^{-1}\bar{\Omega}^{-1/2}\,\bar{\Omega}^{1/2}F_\theta\,2\sigma^{-2} = -2\sigma^{-2}F_\theta^T\Gamma^{-1}F_\theta,$$

which is exactly equal to the second term in $\mathrm{CRB}_\theta^{-1}/N$ (see (3.15)).

A.4 Some Properties of the Block Transpose

This section contains three useful properties related to the block transpose operation. Let $A_{i\times j}$ denote an arbitrary $i\times j$ matrix and define the operator

$$L_{mn} = \big[\mathrm{vec}(I_m)\,\mathrm{vec}^T(I_n)\big]^{BT},$$

which has dimension $mn \times mn$. Observe that $L_{mn} \neq L_{nm}$ unless $m = n$.

Property A.1.

$$L_{mn}\,\mathrm{vec}(A_{m\times n}) = \mathrm{vec}(A_{m\times n}^T)$$

Proof. This is easily proven by direct verification.

Property A.2.

$$L_{mn}\big(A_{n\times d} \otimes B_{m\times p}\big) = \big(B_{m\times p} \otimes A_{n\times d}\big)L_{pd}$$

Proof. Let $e$ be an arbitrary $dp$-vector and let $E$ be a $p\times d$ matrix defined via $\mathrm{vec}(E) = e$. Then

$$L_{mn}\big(A_{n\times d} \otimes B_{m\times p}\big)e = L_{mn}\,\mathrm{vec}(BEA^T) = \mathrm{vec}(AE^TB^T) = (B \otimes A)\,\mathrm{vec}(E^T) = (B \otimes A)L_{pd}\,\mathrm{vec}(E),$$

and the result follows.

Property A.3.

$$\big[\mathrm{vec}(A_{m\times n}B_{n\times p})\,\mathrm{vec}^T(C_{q\times r}D_{r\times s})\big]^{BT} = (I_p \otimes C)\big[\mathrm{vec}(B)\,\mathrm{vec}^T(D)\big]^{BT}\big(I_s \otimes A^T\big)$$

Proof. Let $A_{\bullet i}$ denote column $i$ of $A$. Consider block $ij$ ($i = 1,\ldots,p$; $j = 1,\ldots,s$) of the left hand side:

$$\big[AB_{\bullet i}(CD_{\bullet j})^T\big]^T = C\big(B_{\bullet i}D_{\bullet j}^T\big)^TA^T,$$

which is readily seen to equal block $ij$ of the right hand side.
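The block-transpose operator and $L_{mn}$ are easily implemented, and the properties above can be verified numerically. A minimal sketch follows (Python/NumPy; the block sizes are passed explicitly, since the text leaves the block partitioning to context):

```python
import numpy as np

def block_transpose(M, p, q):
    """Transpose every p x q block of M (dimensions must be multiples of p, q)."""
    rows, cols = M.shape
    out = np.empty((rows // p * q, cols // q * p), dtype=M.dtype)
    for i in range(rows // p):
        for j in range(cols // q):
            out[i*q:(i+1)*q, j*p:(j+1)*p] = M[i*p:(i+1)*p, j*q:(j+1)*q].T
    return out

def L(m, n):
    # L_mn = [vec(I_m) vec^T(I_n)]^BT, with m x n blocks.
    M = np.outer(np.eye(m).flatten(order="F"), np.eye(n).flatten(order="F"))
    return block_transpose(M, m, n)

m, n, d, p = 3, 4, 2, 5
rng = np.random.default_rng(2)
A = rng.standard_normal((m, n))
# Property A.1: L_mn vec(A) = vec(A^T)
assert np.allclose(L(m, n) @ A.flatten(order="F"), A.T.flatten(order="F"))
# Property A.2: L_mn (A_{n x d} kron B_{m x p}) = (B kron A) L_pd
An = rng.standard_normal((n, d))
Bm = rng.standard_normal((m, p))
assert np.allclose(L(m, n) @ np.kron(An, Bm), np.kron(Bm, An) @ L(p, d))
print("Properties A.1 and A.2 verified")
```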

A.5 Proof of Theorem 4.1

It is well known that the CRB is given by the inverse of the Fisher information matrix (FIM). Element $(k,l)$ of the normalized FIM is given by Bangs' formula [Ban71]

$$\frac{1}{N}\mathrm{FIM}_{kl} = \mathrm{Tr}\{R^{-1}R_kR^{-1}R_l\},$$

where the notation $R_k$ refers to the partial derivative of $R$ with respect to the $k$:th parameter in $\eta$. Using the formula $\mathrm{Tr}\{ABCD\} = \mathrm{vec}^*(B^*)\big(A^T \otimes C\big)\mathrm{vec}(D)$ we get

$$\frac{1}{N}\mathrm{FIM}_{kl} = \mathrm{vec}^*(R_k)\big(R^{-T} \otimes R^{-1}\big)\mathrm{vec}(R_l).$$

The covariance matrix is given by

$$R(\eta) = A(\theta)PA^*(\theta) + \sigma^2I = \sum_{i=1}^{d}a(\theta_i)p_ia^*(\theta_i) + \sigma^2I$$

and, hence, the partial derivatives are

$$R_{\theta_i} \triangleq \frac{\partial R}{\partial\theta_i} = \big(d_ia_i^* + a_id_i^*\big)p_i, \qquad
R_{p_i} \triangleq \frac{\partial R}{\partial p_i} = a_ia_i^*, \qquad
R_{\sigma^2} \triangleq \frac{\partial R}{\partial\sigma^2} = I,$$

where the notation $a_i = a(\theta_i)$ and $d_i = \partial a_i/\partial\theta_i$ has been introduced. Now, define the matrices

$$D_\theta = \big[\mathrm{vec}(R_{\theta_1})\ \cdots\ \mathrm{vec}(R_{\theta_d})\big] = DP, \tag{A.23}$$
$$D_p = \big[\mathrm{vec}(R_{p_1})\ \cdots\ \mathrm{vec}(R_{p_d})\big] = (A^c \circ A) = B,$$
$$D_{\sigma^2} = \mathrm{vec}(R_{\sigma^2}) = \mathrm{vec}(I) \triangleq i,$$
$$D_n = \big[D_p\ \ D_{\sigma^2}\big], \tag{A.24}$$

and also

$$W_R = \big(R^{-T} \otimes R^{-1}\big), \qquad
X = D_\theta^*W_RD_\theta, \qquad
Y = D_\theta^*W_RD_n, \qquad
Z = D_n^*W_RD_n,$$

where the matrix $D$ in (A.23) is defined in (4.15). With the notation introduced above, note that the Fisher information matrix reads

$$\mathrm{FIM} = \begin{bmatrix}X & Y \\ Y^* & Z\end{bmatrix}.$$

The bound on the DOAs is given by the upper left $d\times d$ corner of the CRB matrix; that is, the upper left corner of the inverse FIM. Using the formulas for inverting partitioned matrices, the following expression is obtained:

$$\begin{aligned}
\mathrm{CRB}_\theta^{-1} &= X - YZ^{-1}Y^* \\
&= D_\theta^*W_RD_\theta - D_\theta^*W_RD_n\big(D_n^*W_RD_n\big)^{-1}D_n^*W_RD_\theta \\
&= D_\theta^*W_R^{1/2}\Pi^\perp_{W_R^{1/2}D_n}W_R^{1/2}D_\theta.
\end{aligned}\tag{A.25}$$
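As a concrete illustration of the computations above, the following sketch (Python/NumPy) evaluates $\mathrm{CRB}_\theta$ directly from the partitioned FIM for a uniform linear array; the array geometry, DOAs, powers and noise level are arbitrary assumptions made for the example, not quantities used elsewhere.

```python
import numpy as np

m, d = 6, 2                                  # sensors, sources (assumed values)
theta = np.array([0.2, 0.6])                 # DOAs (electrical angles, radians)
p = np.array([1.0, 0.5])                     # powers of uncorrelated sources
sigma2, N = 0.1, 100

k = np.arange(m)[:, None]
A = np.exp(1j * k * theta[None, :])          # ULA steering matrix
D = 1j * k * A                               # column-wise derivative d a/d theta
R = A @ np.diag(p) @ A.conj().T + sigma2 * np.eye(m)
Ri = np.linalg.inv(R)

# Derivatives of R with respect to eta = [theta; p; sigma^2]
dR = [p[i] * (np.outer(D[:, i], A[:, i].conj()) +
              np.outer(A[:, i], D[:, i].conj())) for i in range(d)]
dR += [np.outer(A[:, i], A[:, i].conj()) for i in range(d)]
dR += [np.eye(m, dtype=complex)]

n_par = len(dR)
FIM = np.empty((n_par, n_par))
for a in range(n_par):
    for b in range(n_par):
        # Bangs' formula: FIM_ab / N = Tr{R^-1 dR_a R^-1 dR_b}
        FIM[a, b] = np.real(np.trace(Ri @ dR[a] @ Ri @ dR[b]))
CRB = np.linalg.inv(N * FIM)[:d, :d]         # upper-left d x d block bounds the DOAs
print(np.sqrt(np.diag(CRB)))                 # CRB standard deviations for theta
```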


In what follows, the expression for $\mathrm{CRB}_\theta^{-1}$ is rewritten in a form suitable for establishing the equality of the asymptotic estimation error covariance of the proposed estimator and $\mathrm{CRB}_\theta$.

Recall from (A.24) that $W_R^{1/2}D_n = W_R^{1/2}\big[B\ \ i\big]$ and use the projection decomposition theorem to write

$$\Pi^\perp_{W_R^{1/2}D_n} = \Pi^\perp_{B_W} - \Pi_{\Pi^\perp_{B_W}i_W}, \tag{A.26}$$

where

$$B_W = W_R^{1/2}B, \qquad i_W = W_R^{1/2}i.$$

Let the columns of $G$ form a basis for the null-space of $B^*$ (this matter is further dwelled upon in Section 4.6). Observe that

$$\Pi^\perp_{B_W} = \Pi_{W_R^{-1/2}G}. \tag{A.27}$$

Using (A.26) and (A.27), the expression (A.25) can be rewritten to yield

$$\begin{aligned}
\mathrm{CRB}_\theta^{-1} &= D_\theta^*W_R^{1/2}\Big(\Pi_{W_R^{-1/2}G} - \Pi_{\Pi_{W_R^{-1/2}G}i_W}\Big)W_R^{1/2}D_\theta \\
&= D_\theta^*G\big(G^*W_R^{-1}G\big)^{-1}G^*D_\theta
- \frac{D_\theta^*G\big(G^*W_R^{-1}G\big)^{-1}G^*ii^*G\big(G^*W_R^{-1}G\big)^{-1}G^*D_\theta}{i^*G\big(G^*W_R^{-1}G\big)^{-1}G^*i}.
\end{aligned}\tag{A.28}$$

Next it is proven that

$$D_\theta^*G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A^\perp) = 0 \tag{A.29}$$

and that

$$i^*G\big(G^*W_R^{-1}G\big)^{-1}G^*i = \frac{m-d}{\sigma^4} + \mathrm{vec}^*(\Pi_A)G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A). \tag{A.30}$$

Starting with (A.30):

$$\begin{aligned}
i^*G\big(G^*W_R^{-1}G\big)^{-1}G^*i &= i^*W_R^{1/2}\Pi_{W_R^{-1/2}G}W_R^{1/2}i \\
&= i^*W_R^{1/2}\Pi^\perp_{W_R^{1/2}B}W_R^{1/2}i \\
&= i^*W_Ri - i^*W_RB\big(B^*W_RB\big)^{-1}B^*W_Ri.
\end{aligned}\tag{A.31}$$

Observe that $i = \mathrm{vec}(\Pi_A) + \mathrm{vec}(\Pi_A^\perp)$ and

$$B^*W_R\,\mathrm{vec}(\Pi_A^\perp) = B^*\mathrm{vec}\big(R^{-1}\Pi_A^\perp R^{-1}\big) = \sigma^{-4}L^*\big(A^T \otimes A^*\big)\mathrm{vec}(\Pi_A^\perp) = 0. \tag{A.32}$$

Using (A.32) in (A.31) leads to

$$\begin{aligned}
i^*W_Ri &- \mathrm{vec}^*(\Pi_A)W_RB\big(B^*W_RB\big)^{-1}B^*W_R\,\mathrm{vec}(\Pi_A) \\
&= i^*W_Ri - \mathrm{vec}^*(\Pi_A)W_R^{1/2}\Pi^\perp_{W_R^{-1/2}G}W_R^{1/2}\mathrm{vec}(\Pi_A) \\
&= i^*W_Ri - \mathrm{vec}^*(\Pi_A)W_R\,\mathrm{vec}(\Pi_A) + \mathrm{vec}^*(\Pi_A)G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A) \\
&= \mathrm{Tr}\{R^{-2} - R^{-1}\Pi_AR^{-1}\Pi_A\} + \mathrm{vec}^*(\Pi_A)G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A) \\
&= \frac{m-d}{\sigma^4} + \mathrm{vec}^*(\Pi_A)G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A),
\end{aligned}$$

which is (A.30). Consider next (A.29):

$$\begin{aligned}
D_\theta^*G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A^\perp)
&= D_\theta^*W_R^{1/2}\Pi^\perp_{B_W}W_R^{1/2}\mathrm{vec}(\Pi_A^\perp) \\
&= \underbrace{D_\theta^*W_R\,\mathrm{vec}(\Pi_A^\perp)}_{\sigma^{-4}D_\theta^*\mathrm{vec}(\Pi_A^\perp)}
- \underbrace{D_\theta^*W_RB\big(B^*W_RB\big)^{-1}B^*W_R\,\mathrm{vec}(\Pi_A^\perp)}_{=0,\ \text{see (A.32)}} \\
&= \sigma^{-4}PL^*\Big(\big(\bar{D}^T \otimes A^*\big) + \big(A^T \otimes \bar{D}^*\big)\Big)\mathrm{vec}(\Pi_A^\perp) = 0,
\end{aligned}$$

where $\bar{D}$ is defined in (4.16).

Making use of (A.29) and (A.30) in (A.28) yields

$$\begin{aligned}
\mathrm{CRB}_\theta^{-1}
&= D_\theta^*G\bigg(\big(G^*W_R^{-1}G\big)^{-1} \\
&\qquad - \frac{\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A)\,\mathrm{vec}^*(\Pi_A)G\big(G^*W_R^{-1}G\big)^{-1}}{\frac{m-d}{\sigma^4} + \mathrm{vec}^*(\Pi_A)G\big(G^*W_R^{-1}G\big)^{-1}G^*\mathrm{vec}(\Pi_A)}\bigg)G^*D_\theta \\
&= D_\theta^*G\Big(G^*W_R^{-1}G + \frac{\sigma^4}{m-d}G^*\mathrm{vec}(\Pi_A)\,\mathrm{vec}^*(\Pi_A)G\Big)^{-1}G^*D_\theta \\
&= D_\theta^*G\Big(G^*\Big(W_R^{-1} + \frac{\sigma^4}{m-d}\mathrm{vec}(\Pi_A)\,\mathrm{vec}^*(\Pi_A)\Big)G\Big)^{-1}G^*D_\theta,
\end{aligned}$$

where the matrix inversion lemma has been used to obtain the last but one equality. This concludes the proof.
one equality. This concludes the proof.<br />

A.6 Rank of B(θ)

In this appendix it is shown that $\mathrm{rank}\{B(\theta)\} = d$ as long as $m > d/2$. Let $\theta = [\theta_1,\ldots,\theta_d]^T$, and consider the following general form of the steering matrix:

$$A(\theta) = \begin{bmatrix} 1 & \cdots & 1 \\ f_1(\theta_1) & \cdots & f_1(\theta_d) \\ \vdots & & \vdots \\ f_{m-1}(\theta_1) & \cdots & f_{m-1}(\theta_d) \end{bmatrix},$$

where, without loss of generality, the response of the first sensor is normalized to one. The array is assumed to be unambiguous, which implies that $A$ has full rank for $d = m$ and $\theta_i \neq \theta_j$, $i \neq j$. It follows from (4.6) that $B(\theta)$ can be written as

$$B(\theta) = (A^c \circ A) = \begin{bmatrix} A \\ A\Phi_1 \\ \vdots \\ A\Phi_{m-1} \end{bmatrix}, \tag{A.33}$$

where $\Phi_i = \mathrm{diag}[f_i^c(\theta_1)\ \ldots\ f_i^c(\theta_d)]$. The notation $\mathrm{diag}[v]$ refers to a matrix with the entries of the vector $v$ as diagonal elements. From (A.33) it is obvious that $\mathrm{rank}\{B\} = d$ if $d \leq m$, since $A$ has full rank. Let us next study (non-zero) solutions $x \in \mathbb{C}^{d\times 1}$ to $B(\theta)x = 0$. Any such vector must satisfy

$$Ax = 0, \quad A\Phi_1x = 0, \quad \ldots, \quad A\Phi_{m-1}x = 0. \tag{A.34}$$

A necessary condition for the existence of a solution $x \neq 0$ is $d > m$. It is also important to notice that at least $m+1$ elements of $x$ have to be different from zero, since the columns of $A$ are linearly independent. Since the dimension of the null-space of $A$ is $d-m$, there are no solutions to (A.34) if

$$\mathrm{rank}\{[x\ \ \Phi_1x\ \cdots\ \Phi_{m-1}x]\} > d - m. \tag{A.35}$$

However, it is straightforward to verify that the matrix in (A.35) can be written as $\mathrm{diag}\{x\}A^*$. The rank of $\mathrm{diag}\{x\}A^*$ is $m$, since $d > m$ and since $x$ has at least $m+1$ elements different from zero. Hence, no solution $x \neq 0$ exists if $m > d-m$, or equivalently $m > d/2$, and we conclude that $\mathrm{rank}\{B(\theta)\} = d$ if $m > d/2$.
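The rank condition can be illustrated numerically; the sketch below (Python/NumPy) uses a uniform linear array as an arbitrary example of an unambiguous array and checks $\mathrm{rank}\{B\}$ for increasing $d$.

```python
import numpy as np

def khatri_rao_B(A):
    # B = A^c (column-wise Khatri-Rao product) A: columns a_i^c kron a_i
    return np.column_stack([np.kron(A[:, i].conj(), A[:, i])
                            for i in range(A.shape[1])])

rng = np.random.default_rng(5)
m = 4
for d in range(1, 2 * m + 1):
    theta = np.sort(rng.uniform(-np.pi, np.pi, d))       # distinct DOAs (w.p. 1)
    A = np.exp(1j * np.outer(np.arange(m), theta))       # ULA steering matrix
    B = khatri_rao_B(A)
    full = np.linalg.matrix_rank(B) == d
    print(f"d={d}: rank{{B}}=d is {full} (theory predicts {m > d/2})")
```

For a ULA the columns of $B$ are determined by the $2m-1$ distinct lags of $a_ia_i^*$, so the rank indeed drops once $d \geq 2m$, in line with the $m > d/2$ condition above.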

A.7 Derivation of the Asymptotic Distribution

We will start with the derivation of $Q$ defined in (4.24). From (4.19) and the fact that $B^\dagger f = p$, we have

$$V_k' = -2\,\mathrm{Re}\{p^*B_k^*\Pi_B^\perp W\Pi_B^\perp\tilde{f}\} = -2\,\mathrm{Re}\{u_k^*\tilde{f}\},$$

where $u_k$ is recognized as the $k$:th column of $U$ in (4.27). From (4.24) it is evident that element $(k,l)$ of $Q$ is given by

$$\begin{aligned}
Q_{kl} &= \lim_{N\to\infty} 4N\,E\{\mathrm{Re}\{u_k^*\tilde{f}\}\,\mathrm{Re}\{u_l^*\tilde{f}\}\} \\
&= \lim_{N\to\infty} 2N\,E\{\mathrm{Re}\{u_k^*\tilde{f}\tilde{f}^*u_l + u_k^*\tilde{f}\tilde{f}^Tu_l^c\}\} \\
&= 2\,\mathrm{Re}\{u_k^*Cu_l + u_k^*\bar{C}u_l^c\},
\end{aligned}$$

which is element $(k,l)$ of the expression in (4.26). Next, consider $H$ as defined in (4.18). From (4.19) we get, for large $N$,

$$V_{kl}'' = -2\,\mathrm{Re}\{p^*B_{kl}^*\Pi_B^\perp W\Pi_B^\perp\hat{f}\} - 2\,\mathrm{Re}\{p^*B_k^*\{\Pi_B^\perp\}_lW\Pi_B^\perp\hat{f}\} - 2\,\mathrm{Re}\{p^*B_k^*\Pi_B^\perp W\{\Pi_B^\perp\}_l\hat{f}\},$$

and using (4.20) we obtain

$$H_{kl} = 2\,\mathrm{Re}\{p^*B_k^*\Pi_B^\perp W\Pi_B^\perp B_lp\},$$

which is element $(k,l)$ of the matrix in (4.25).

It remains to compute the covariance matrices $C$ and $\bar{C}$ defined in (4.28) and (4.29), respectively. Since $\{x(t)\}$ is assumed to be zero-mean Gaussian and temporally white, it can be shown that

$$E\{\hat{R}_{kl}\hat{R}_{pq}\} = R_{kl}R_{pq} + \frac{1}{N}R_{kq}R_{pl}. \tag{A.36}$$

From (A.36), the following two expressions are readily checked:

$$E\{\mathrm{vec}(\tilde{R})\,\mathrm{vec}^*(\tilde{R})\} = \frac{1}{N}\big(R^T \otimes R\big), \tag{A.37}$$
$$E\{\mathrm{vec}(\tilde{R})\,\mathrm{vec}^T(\tilde{R})\} = \frac{1}{N}\big[\mathrm{vec}(R)\,\mathrm{vec}^T(R)\big]^{BT}, \tag{A.38}$$

where the superscript 'BT' denotes block-transpose, which means that each $m\times m$ block is transposed. By making use of (4.23), (A.37) and (A.38), it is straightforward to derive $C$ and $\bar{C}$:

$$C = \lim_{N\to\infty} N\,E\{M\,\mathrm{vec}(\tilde{R})\,\mathrm{vec}^*(\tilde{R})M^*\} = M\big(R^T \otimes R\big)M^*, \tag{A.39}$$
$$\bar{C} = \lim_{N\to\infty} N\,E\{M\,\mathrm{vec}(\tilde{R})\,\mathrm{vec}^T(\tilde{R})M^T\} = M\big[\mathrm{vec}(R)\,\mathrm{vec}^T(R)\big]^{BT}M^T. \tag{A.40}$$

Inserting the definition of $M$ given in (4.23) into (A.39) leads, after some tedious calculations, to the following expression for $C$:

$$C = \big(R^T \otimes R\big) - \sigma^4\big(\Pi_A^{\perp T} \otimes \Pi_A^\perp\big) + \frac{\sigma^4}{m-d}\,\mathrm{vec}(\Pi_A)\,\mathrm{vec}^*(\Pi_A).$$

Similarly, we can obtain an expression for $\bar{C}$ by inserting $M$ into (A.40). Alternatively, one can use the following observation. Let $L_m = L_m^T$ be the permutation matrix such that

$$\mathrm{vec}(X^T) = L_m\,\mathrm{vec}(X) \tag{A.41}$$

for any $(m\times m)$ matrix $X$ [Gra81]. It is then easy to see that $f$ can be related to its conjugate $f^c$ as

$$f^c = \mathrm{vec}\big([E_s\Lambda E_s^*]^c\big) = \mathrm{vec}\big([E_s\Lambda E_s^*]^T\big) = L_mf. \tag{A.42}$$

It is obvious that a similar relation holds for any vector defined as $\mathrm{vec}(X)$ when $X$ is Hermitian. This implies that

$$\bar{C} = \lim_{N\to\infty} N\,E\{\tilde{f}\tilde{f}^T\} = \lim_{N\to\infty} N\,E\{\tilde{f}\tilde{f}^*L_m\} = CL_m.$$

Hence, $\bar{C}$ is related to $C$ through a column permutation.

Remark A.2. Observe that the matrix $C$ (and $\bar{C}$) is singular. The null-space of $C$ is spanned by the columns of $(E_n^c \otimes E_n)$. The dimension of the null-space is $(m-d)^2$ and consequently the dimension of the range-space of $C$ is $m^2 - (m-d)^2 = 2md - d^2$. Notice that this equals the number of real parameters in the Cholesky factorization of $APA^*$.
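The permutation matrix $L_m$ in (A.41) and the relation (A.42) are straightforward to verify numerically; a minimal sketch (Python/NumPy):

```python
import numpy as np

def commutation(m):
    """L_m with vec(X^T) = L_m vec(X) for m x m matrices; note L_m = L_m^T."""
    L = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            L[j * m + i, i * m + j] = 1.0   # entry (i,j) of X moves to (j,i)
    return L

m = 3
rng = np.random.default_rng(6)
Lm = commutation(m)
X = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
assert np.allclose(Lm @ X.flatten(order="F"), X.T.flatten(order="F"))  # (A.41)
assert np.allclose(Lm, Lm.T)

# For Hermitian X, vec(X)^c = vec(X^T) = L_m vec(X), which is (A.42).
H = X + X.conj().T
v = H.flatten(order="F")
assert np.allclose(v.conj(), Lm @ v)
print("(A.41) and (A.42) verified")
```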

A.8 Proof of Theorem 4.3

Recalling the definition of $B$ in (4.6) and making use of (A.41) leads to

$$L_mB_k = B_k^c,$$

which in turn implies that

$$B^c = L_mB, \qquad B^\dagger L_m = B^{\dagger c}, \qquad L_m\Pi_B^\perp = \Pi_B^{\perp c}L_m = \Pi_B^{\perp T}L_m.$$

In what follows, we will prove that the two terms in the gradient (4.19) are real valued if the weighting matrix has a special form satisfied by (4.30). Recall (A.42) and consider the first term in (4.19),

$$\hat{f}^*\{\Pi_B^\perp\}_kW\Pi_B^\perp\hat{f} = \hat{f}^TL_m\{\Pi_B^\perp\}_kWL_m\Pi_B^{\perp c}\hat{f}^c = \hat{f}^T\{\Pi_B^\perp\}_k^cL_mWL_m\Pi_B^{\perp c}\hat{f}^c.$$

If $L_mWL_m = W^c = W^T$, then the two terms in (4.19) are real valued and we have

$$V_k' = 2\hat{f}^*\{\Pi_B^\perp\}_kW\Pi_B^\perp\hat{f}.$$

This implies that

$$H = 2PD^*\Pi_B^\perp W\Pi_B^\perp DP, \tag{A.43}$$
$$Q = 4U^*CU, \tag{A.44}$$

where $U$ is defined in (4.27). This is, as mentioned above, only valid if $L_mWL_m = W^c$. We will now prove that this is true for the optimal weighting $W_{opt}$. This can be shown as follows:

$$\begin{aligned}
L_mW_{opt}L_m &= L_m\big(\Pi_B^\perp\tilde{C}\Pi_B^\perp + BB^*\big)^{-1}L_m \\
&= \Big(L_m\big(\Pi_B^\perp\tilde{C}\Pi_B^\perp + BB^*\big)L_m\Big)^{-1} \\
&= \Big(\Pi_B^{\perp c}L_m\tilde{C}L_m\Pi_B^{\perp c} + B^cB^T\Big)^{-1}.
\end{aligned}\tag{A.45}$$

It can be shown that

$$L_m\big(X \otimes Y\big) = \big(Y \otimes X\big)L_m$$

for any arbitrary $(m\times m)$ matrices $X$ and $Y$ [Gra81] (see also Section A.4). Hence, we get

$$L_m\tilde{C}L_m = \tilde{C}^c = \tilde{C}^T,$$

which, in view of (A.45), proves that $L_mW_{opt}L_m = W_{opt}^c$.

Next, we show that $Q = 2H$ for $W = W_{opt}$. In order to prove this, the following calculations are needed. Let $G$ be an $m^2 \times (m^2 - d)$ matrix whose columns span the null-space of $B^*$; then (cf. Lemma 3.1)

$$\begin{bmatrix}G & B\end{bmatrix}^{-1}
= \bigg(\begin{bmatrix}G^* \\ B^*\end{bmatrix}\begin{bmatrix}G & B\end{bmatrix}\bigg)^{-1}\begin{bmatrix}G^* \\ B^*\end{bmatrix}
= \begin{bmatrix}(G^*G)^{-1} & 0 \\ 0 & (B^*B)^{-1}\end{bmatrix}\begin{bmatrix}G^* \\ B^*\end{bmatrix}
= \begin{bmatrix}G^\dagger \\ B^\dagger\end{bmatrix}.$$

Using this result in (4.30) and observing that $\Pi_B^\perp = \Pi_G$ gives

$$W_{opt} = \big(\Pi_G\tilde{C}\Pi_G + BB^*\big)^{-1}
= \bigg(\begin{bmatrix}G & B\end{bmatrix}\begin{bmatrix}G^\dagger\tilde{C}G^{\dagger*} & 0 \\ 0 & I\end{bmatrix}\begin{bmatrix}G^* \\ B^*\end{bmatrix}\bigg)^{-1}
= \begin{bmatrix}G^{\dagger*} & B^{\dagger*}\end{bmatrix}\begin{bmatrix}\big(G^\dagger\tilde{C}G^{\dagger*}\big)^{-1} & 0 \\ 0 & I\end{bmatrix}\begin{bmatrix}G^\dagger \\ B^\dagger\end{bmatrix},$$

and

$$\Pi_B^\perp W_{opt}\Pi_B^\perp
= \Pi_G\begin{bmatrix}G^{\dagger*} & B^{\dagger*}\end{bmatrix}\begin{bmatrix}\big(G^\dagger\tilde{C}G^{\dagger*}\big)^{-1} & 0 \\ 0 & I\end{bmatrix}\begin{bmatrix}G^\dagger \\ B^\dagger\end{bmatrix}\Pi_G
= G^{\dagger*}\big(G^\dagger\tilde{C}G^{\dagger*}\big)^{-1}G^\dagger = G\big(G^*\tilde{C}G\big)^{-1}G^*. \tag{A.46}$$

From Remark A.2 at the end of Appendix A.7, we know that the rank of $C$ is $2md - d^2$. Let the $m^2 \times (2md - d^2)$ matrix $T$ be the Cholesky factor of $C$ and define $E = \sigma^2(E_n^c \otimes E_n)$. Recall that the columns of $E$ span the null-space of $C$ (see Remark A.2). Let $\mathcal{R}(\cdot)$ and $\mathcal{N}(\cdot)$ denote the range- and the null-space of a matrix, respectively. Study the decompositions

$$\mathbb{C}^{m^2} = \underbrace{\mathcal{R}(T)}_{2md-d^2} \oplus \underbrace{\mathcal{N}(T^*)}_{(m-d)^2} = \underbrace{\mathcal{R}(G)}_{m^2-d} \oplus \underbrace{\mathcal{N}(G^*)}_{d},$$

where the dimension of each of the subspaces is indicated, and where $\oplus$ denotes direct sum. Since it is known that $\mathcal{N}(T^*) = \mathcal{N}(C) = \mathcal{R}(E) \subset \mathcal{R}(G)$, it follows that $\mathcal{N}(G^*) \subset \mathcal{R}(T)$. This means that a $d$-dimensional subspace of $\mathcal{R}(T)$ forms $\mathcal{N}(G^*)$. Define the $m^2 \times (2md - d^2 - d)$ matrix $\bar{T}$ via

$$G^*CG = G^*TT^*G = G^*\bar{T}\bar{T}^*G. \tag{A.47}$$

Notice that $\tilde{C} = C + EE^*$ and, hence,

$$G^*\tilde{C}G = G^*\begin{bmatrix}\bar{T} & E\end{bmatrix}\begin{bmatrix}\bar{T} & E\end{bmatrix}^*G = G^*\begin{bmatrix}\bar{T} & E & B\end{bmatrix}\begin{bmatrix}\bar{T} & E & B\end{bmatrix}^*G.$$

For simplicity, introduce the notation

$$F = \begin{bmatrix}\bar{T} & E & B\end{bmatrix}.$$

It is important to realize that the blocks in $F$ are mutually orthogonal and that $\mathcal{R}([\bar{T}\ E]) = \mathcal{R}(G)$. This implies in particular that

$$F^{-1} = \begin{bmatrix}\bar{T}^\dagger \\ E^\dagger \\ B^\dagger\end{bmatrix}
\qquad\text{and}\qquad
\Pi_{F^*G} = \begin{bmatrix}I_{2md-d^2-d} & 0 & 0 \\ 0 & I_{(m-d)^2} & 0 \\ 0 & 0 & 0_{d\times d}\end{bmatrix}.$$

The fact that $\mathcal{R}([\bar{T}\ E]) = \mathcal{R}(G)$ also proves that $G^*\tilde{C}G$ is invertible.

Using the observations made above we obtain

$$\begin{aligned}
D^*G\big(G^*\tilde{C}G\big)^{-1}G^*
&= D^*F^{-*}F^*G\big(G^*FF^*G\big)^{-1}G^*FF^{-1}
= D^*F^{-*}\Pi_{F^*G}F^{-1} \\
&= D^*\begin{bmatrix}\bar{T}^{\dagger*} & E^{\dagger*} & B^{\dagger*}\end{bmatrix}
\begin{bmatrix}I_{2md-d^2-d} & 0 & 0 \\ 0 & I_{(m-d)^2} & 0 \\ 0 & 0 & 0_{d\times d}\end{bmatrix}
\begin{bmatrix}\bar{T}^\dagger \\ E^\dagger \\ B^\dagger\end{bmatrix} \\
&= D^*\bar{T}^{\dagger*}\bar{T}^\dagger = D^*\big(\bar{T}\bar{T}^*\big)^\dagger,
\end{aligned}\tag{A.48}$$

where in the fourth equality the fact that $D^*E = 0$ was used, and the last equality follows from a property of pseudoinverses for full rank matrices (see Lemma A.1 in Appendix A.11). Now, combining (A.46), (A.47) and (A.48) in the equations (A.43) and (A.44) we get the desired result; that is,

$$H = 2PD^*\Pi_B^\perp W_{opt}\Pi_B^\perp DP = 2PD^*G\big(G^*\tilde{C}G\big)^{-1}G^*DP,$$

$$\begin{aligned}
Q &= 4U^*CU = 4PD^*\Pi_B^\perp W_{opt}\Pi_B^\perp C\Pi_B^\perp W_{opt}\Pi_B^\perp DP \\
&= 4PD^*G\big(G^*\tilde{C}G\big)^{-1}G^*CG\big(G^*\tilde{C}G\big)^{-1}G^*DP \\
&= 4PD^*G\big(G^*\tilde{C}G\big)^{-1}G^*\bar{T}\bar{T}^*G\big(G^*\tilde{C}G\big)^{-1}G^*DP \\
&= 4PD^*\big(\bar{T}\bar{T}^*\big)^\dagger\bar{T}\bar{T}^*\big(\bar{T}\bar{T}^*\big)^\dagger DP \\
&= 4PD^*\big(\bar{T}\bar{T}^*\big)^\dagger DP \\
&= 4PD^*G\big(G^*\tilde{C}G\big)^{-1}G^*DP,
\end{aligned}$$

and, hence, $Q = 2H$.

We are now ready to prove Theorem 4.3. It follows from the above that

$$\Gamma = H^{-1}QH^{-1} = \Big(PD^*G\big(G^*\tilde{C}G\big)^{-1}G^*DP\Big)^{-1} \tag{A.49}$$

when $W = W_{opt}$ is used. The expression in (A.49) is recognized as the CRB in Theorem 4.1. This concludes the proof that the weighting matrix in (4.30) gives minimum variance performance in large samples.


A.9 Proof of Expression (5.22)

In this appendix it is shown that the solution to (5.21) is given by (5.22). Inserting (5.20) in (5.21) we may equivalently write

$$\max_{\bar{K}}\ \mathrm{Tr}\Big\{W^{1/2}Y_\alpha\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big(\bar{K}P_\beta\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big)^{-1}\bar{K}P_\beta\Pi_\alpha^\perp Y_\alpha^TW^{1/2}\Big\}. \tag{A.50}$$

The trick is now to change variables. Let

$$\bar{K} = \bar{T}^TE^T\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}, \tag{A.51}$$

where $\bar{T}$ is an $n\times n$ matrix of full rank and $E$ is an $(m+l)\beta \times n$ matrix with orthonormal columns. Using the SVD (5.23) and (A.51) in (A.50) we have

$$\max_{E,\ E^TE=I}\ \mathrm{Tr}\big\{\hat{Q}\hat{S}\hat{V}^TEE^T\hat{V}\hat{S}\hat{Q}^T\big\} = \max_{E,\ E^TE=I}\ \mathrm{Tr}\big\{E^T\hat{V}\hat{S}^2\hat{V}^TE\big\},$$

which clearly is solved by $E = \hat{V}_s$. Inserting this solution in (A.51) gives (5.22).
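The variable change thus reduces the problem to a standard trace maximization over orthonormal matrices, whose solution is given by principal singular vectors. A small numerical sketch of that final step (Python/NumPy; the matrices are random stand-ins for the quantities in (5.23)):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 8, 3
M = rng.standard_normal((p, p))
V, s, _ = np.linalg.svd(M @ M.T)         # symmetric PSD: V eigenvectors, s eigenvalues

def objective(E):
    # Tr{E^T V diag(s) V^T E} over E with orthonormal columns
    return np.trace(E.T @ V @ np.diag(s) @ V.T @ E)

E_opt = V[:, :n]                          # principal singular (eigen)vectors
E_rand = np.linalg.qr(rng.standard_normal((p, n)))[0]
assert objective(E_opt) >= objective(E_rand) - 1e-12
print(objective(E_opt), s[:n].sum())      # the maximum equals the sum of n largest
```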

A.10 Proof of Expression (5.27)

In this appendix we show that the solution to (5.25) is given by (5.27). Using (5.20), the minimization problem reads

$$\begin{aligned}
\min_{\bar{K}}\ \det\{\Theta\}
&= \min_{\bar{K}}\ \det\big\{Y_\alpha\Pi_\alpha^\perp Y_\alpha^T - Y_\alpha\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big(\bar{K}P_\beta\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big)^{-1}\bar{K}P_\beta\Pi_\alpha^\perp Y_\alpha^T\big\} \\
&= \min_{\bar{K}}\ \det\big\{I - \big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1/2}Y_\alpha\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big(\bar{K}P_\beta\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big)^{-1} \\
&\qquad\qquad \times \bar{K}P_\beta\Pi_\alpha^\perp Y_\alpha^T\big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1/2}\big\} \\
&= \min_{\bar{K}}\ \prod_{k=1}^{n}(1-\lambda_k),
\end{aligned}\tag{A.52}$$

where the $\bar{K}$-independent factor $\det\{Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\}$ has been dropped (it does not affect the minimizer), and where $\{\lambda_k\}_{k=1}^{n}$ are the nonzero eigenvalues of

$$M = \big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1/2}Y_\alpha\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big(\bar{K}P_\beta\Pi_\alpha^\perp P_\beta^T\bar{K}^T\big)^{-1}\bar{K}P_\beta\Pi_\alpha^\perp Y_\alpha^T\big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1/2}. \tag{A.53}$$

From the following series of implications,

$$\Theta \geq 0\ \Rightarrow\ I - M \geq 0\ \Rightarrow\ 1 - \lambda_k \geq 0,\quad k = 1,\ldots,n,$$

it is clear that the minimum of (A.52) is attained if we can find a $\bar{K}$ which maximizes all $n$ nonzero eigenvalues of $M$. Next it is shown that there indeed exists such a $\bar{K}$.

If the variable $\bar{K}$ is changed according to (A.51), the expression (A.53) reads

$$M = \big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1/2}Y_\alpha\Pi_\alpha^\perp P_\beta^T\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}EE^T\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}P_\beta\Pi_\alpha^\perp Y_\alpha^T\big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1/2}.$$

The $n$ nonzero eigenvalues of $M$ are then equivalently given by the eigenvalues of

$$E^T\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}P_\beta\Pi_\alpha^\perp Y_\alpha^T\big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1}\big(Y_\alpha\Pi_\alpha^\perp P_\beta^T\big)\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}E,$$

which by Poincaré's separation theorem are bounded from above by the $n$ largest eigenvalues of (see, e.g., [Bel70])

$$\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}P_\beta\Pi_\alpha^\perp Y_\alpha^T\big(Y_\alpha\Pi_\alpha^\perp Y_\alpha^T\big)^{-1}\big(Y_\alpha\Pi_\alpha^\perp P_\beta^T\big)\big(P_\beta\Pi_\alpha^\perp P_\beta^T\big)^{-1/2}. \tag{A.54}$$

The bound is attained if $E$ is taken as the eigenvectors corresponding to the $n$ largest eigenvalues of (A.54). Since $M$ is positive semi-definite and symmetric, the eigendecomposition of $M$ is given by $\hat{V}\hat{S}^2\hat{V}^T$, where $\hat{V}$ and $\hat{S}$ are defined through the SVD (5.26). Hence, the upper bound on the eigenvalues of $M$ is attained if $E = \hat{V}_s$. The sought $\bar{K}$ in (A.52) is thus given by (A.51) with $E = \hat{V}_s$, which is (5.27).

A.11 Proof of Theorem 5.1

Let us first introduce the following lemma.

Lemma A.1. Let $A$ be $m\times r$ and $B$ be $r\times n$, both of rank $r$; then

$$(AB)^\dagger = B^\dagger A^\dagger. \tag{A.55}$$

Proof. It is straightforward to verify that the expression in (A.55) satisfies the Moore-Penrose conditions. For a reference, see [Alb72].
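Both the identity (A.55) and the necessity of the rank assumptions are easy to illustrate numerically (Python/NumPy; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
m, r, n = 6, 3, 5
A = rng.standard_normal((m, r))          # full column rank r (with probability 1)
B = rng.standard_normal((r, n))          # full row rank r (with probability 1)
assert np.allclose(np.linalg.pinv(A @ B),
                   np.linalg.pinv(B) @ np.linalg.pinv(A))   # (A.55) holds

# Without the rank conditions the identity generally fails:
A2 = rng.standard_normal((m, r)) @ rng.standard_normal((r, r + 2))  # rank-deficient
B2 = rng.standard_normal((r + 2, n))
print(np.allclose(np.linalg.pinv(A2 @ B2),
                  np.linalg.pinv(B2) @ np.linalg.pinv(A2)))         # typically False
```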

Consider the pole estimates from (5.30)-(5.31), where $\hat{\Gamma}_\alpha$ is given by (5.37). Notice that the analysis below is valid even if the right hand side of (5.37) is multiplied from the right by any full rank matrix, since this does not alter the pole estimates; any such multiplication just corresponds to a similarity transformation and does not affect the eigenvalues of $\hat{A}$. The error in the estimation of $A$ may be written

$$\tilde{A} = \hat{A} - A = \big(J_1\hat{\Gamma}_\alpha\big)^\dagger\big(J_2\hat{\Gamma}_\alpha - J_1\hat{\Gamma}_\alpha A\big) \simeq \big(J_1\Gamma_\alpha\big)^\dagger\big(J_2\hat{\Gamma}_\alpha - J_1\hat{\Gamma}_\alpha A\big), \tag{A.56}$$

where $\simeq$ denotes first order approximation. Assume that $A$ has distinct eigenvalues and thus can be diagonalized as

$$A = T^{-1}A_dT,$$

where $A_d$ is a diagonal matrix with the poles, $\{\mu_k\}_{k=1}^{n}$, on the diagonal and $T^{-1}$ contains the corresponding right eigenvectors of $A$. It can be shown that errors in the system matrix transform to first order perturbations of the eigenvalues as

$$\tilde{\mu}_k \simeq T_{k\bullet}\tilde{A}T^{-1}_{\bullet k}; \tag{A.57}$$

see, e.g., [Bel70]. Inserting (A.56) in (A.57) yields

$$\tilde{\mu}_k \simeq T_{k\bullet}\big(J_1\Gamma_\alpha\big)^\dagger\big(J_2\hat{\Gamma}_\alpha - J_1\hat{\Gamma}_\alpha A\big)T^{-1}_{\bullet k} = T_{k\bullet}\big(J_1\Gamma_\alpha\big)^\dagger\big(J_2 - J_1\mu_k\big)\hat{\Gamma}_\alpha T^{-1}_{\bullet k}. \tag{A.58}$$
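The first order perturbation formula (A.57) can be illustrated numerically; the sketch below (Python/NumPy) compares it with the exactly computed eigenvalues of a randomly perturbed matrix (random test matrices, not the system studied here):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
A = rng.standard_normal((n, n))
mu, Tinv = np.linalg.eig(A)               # columns of Tinv: right eigenvectors
T = np.linalg.inv(Tinv)                   # rows of T: (scaled) left eigenvectors

A_tilde = 1e-6 * rng.standard_normal((n, n))
mu_pert = np.linalg.eig(A + A_tilde)[0]

for k in range(n):
    first_order = T[k, :] @ A_tilde @ Tinv[:, k]    # (A.57): mu~_k = T_k. A~ T^-1_.k
    exact = mu_pert[np.argmin(np.abs(mu_pert - mu[k]))] - mu[k]
    print(abs(first_order - exact))                 # O(||A~||^2), hence tiny
```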

Note that $\Gamma_\alpha T^{-1} = \Gamma_\alpha^d$ is the extended observability matrix of the diagonal realization. Hence,

$$T\big(J_1\Gamma_\alpha\big)^\dagger = \big(J_1\Gamma_\alpha T^{-1}\big)^\dagger = \big(J_1\Gamma_\alpha^d\big)^\dagger, \tag{A.59}$$

where Lemma A.1 has been used in the first equality. Since $\Gamma_\alpha = W^{-1/2}Q_s$, we have $W^{-1/2}Q_sT^{-1} = \Gamma_\alpha^d$, which gives an expression for $T^{-1}$:

$$T^{-1} = Q_s^TW^{1/2}\Gamma_\alpha^d. \tag{A.60}$$

Using (A.59) and (A.60) in (A.58) leads to

$$\tilde{\mu}_k \simeq \big(J_1\Gamma_\alpha^d\big)^\dagger_{k\bullet}\big(J_2 - J_1\mu_k\big)\hat{\Gamma}_\alpha Q_s^TW^{1/2}\big(\Gamma_\alpha^d\big)_{\bullet k} = f_k^*W^{-1/2}\hat{Q}_sQ_s^TW^{1/2}\big(\Gamma_\alpha^d\big)_{\bullet k},$$

where $f_k^*$ is defined in (5.44). Observe that

$$f_k^*\Gamma_\alpha^d = \big(J_1\Gamma_\alpha^d\big)^\dagger_{k\bullet}\big(J_2\Gamma_\alpha^d - J_1\Gamma_\alpha^d\mu_k\big) = \big(J_1\Gamma_\alpha^d\big)^\dagger_{k\bullet}\big(J_1\Gamma_\alpha^dA_d - J_1\Gamma_\alpha^d\mu_k\big) = 0. \tag{A.61}$$

Thus, $f_k^*$ is orthogonal to $\Gamma_\alpha^d$, $\Gamma_\alpha$ and $W^{-1/2}Q_s$. Viberg, Ottersten, Wahlberg and Ljung (see [VWO97] and also [VOWL91, VOWL93]) have shown that $\sqrt{N}f_k^*W^{-1/2}\hat{Q}_s$ has a limiting zero-mean Gaussian distribution, with covariances given by sums over a lag variable $\tau$. The resulting asymptotic covariance of the pole estimates involves the vectors

$$b_k = HQ_s^TW^{1/2}\big(\Gamma_\alpha^d\big)_{\bullet k}. \tag{A.64}$$

Inserting the definition of $H$ (A.63) in (A.64) gives

$$b_k = R_HV_sS_s^{-1}Q_s^TW^{1/2}\big(\Gamma_\alpha^d\big)_{\bullet k}. \tag{A.65}$$

Next, recall that $Q_sS_sV_s^T = \lim_{N\to\infty}\hat{Q}\hat{S}\hat{V}^T$ is given by the limit in (5.42). This implies that

$$V_sS_s^{-1}Q_s^T = \big(W^{1/2}\Gamma_\alpha^dR^d\big)^\dagger = R^{d\dagger}\big(W^{1/2}\Gamma_\alpha^d\big)^\dagger, \tag{A.66}$$

where Lemma A.1 has been used in the last equality. Inserting (A.66) in (A.65) gives

$$b_k = R_HR^{d\dagger}\big(W^{1/2}\Gamma_\alpha^d\big)^\dagger W^{1/2}\big(\Gamma_\alpha^d\big)_{\bullet k} = \big(R_HR^{d\dagger}\big)_{\bullet k}, \tag{A.67}$$

which is exactly equal to $(\Omega_d)_{\bullet k}$ as defined in (5.45). We have thus proven Theorem 5.1.

A.12 Proof of Theorem 5.3

The proof of Theorem 5.3 is divided into three parts. In part one we prove (5.51). In part two we show that (5.52) is the optimal choice of the weighting $\Lambda$. Finally, in part three the calculations leading to (5.53) are given.

Part 1: For convenience, introduce the notation

$$F = F(\theta) = W^{1/2}\Gamma_\alpha(\theta), \tag{A.68}$$
$$\Pi^\perp = \Pi^\perp_{W^{1/2}\Gamma_\alpha(\theta)} = \Pi_F^\perp = I - F\big(F^*F\big)^{-1}F^*. \tag{A.69}$$

The cost-function considered, (5.35), now reads

$$V(\theta) = \mathrm{vec}^T\big(\Pi^\perp\hat{Q}_s\big)\Lambda\,\mathrm{vec}\big(\Pi^\perp\hat{Q}_s\big). \tag{A.70}$$

Since $\hat{\theta}$ minimizes $V(\theta)$, a first order Taylor expansion of the gradient, $V'(\theta)$, around the true value, $\theta_0$, leads to

$$\big(\hat{\theta} - \theta_0\big) = -\big[\bar{V}''\big]^{-1}V'(\theta_0) + o\big(V'(\theta_0)\big), \tag{A.71}$$

where the limiting Hessian

$$\bar{V}'' = \lim_{N\to\infty} V''(\theta_0) \tag{A.72}$$

is assumed to be invertible. As shown below, $\sqrt{N}V'(\theta)$ is asymptotically Gaussian. Thus, the normalized parameter estimation error is also asymptotically Gaussian distributed,

$$\sqrt{N}\big(\hat{\theta} - \theta_0\big) \in \mathrm{AsN}(0,\Xi),$$

where

$$\Xi = \big[\bar{V}''\big]^{-1}\Upsilon\big[\bar{V}''\big]^{-1},$$

and where $\Upsilon$ is

$$\Upsilon = \lim_{N\to\infty} N\,E\big\{[V'(\theta_0)][V'(\theta_0)]^T\big\}. \tag{A.73}$$

Next we derive expressions for the matrices $\bar{V}''$ and $\Upsilon$. In the following, let a subscript $i$ denote partial derivative with respect to $\theta_i$. The partial derivative of (A.69) is given by

$$\Pi_i^\perp = -\Pi^\perp F_iF^\dagger - F^{\dagger*}F_i^*\Pi^\perp. \tag{A.74}$$

This gives for (A.70)

$$V_i = -2\,\mathrm{vec}^T\big(\Pi^\perp F_iF^\dagger\hat{Q}_s + F^{\dagger*}F_i^*\Pi^\perp\hat{Q}_s\big)\Lambda\,\mathrm{vec}\big(\Pi^\perp\hat{Q}_s\big)
\simeq -2\,\mathrm{vec}^T\big(\Pi^\perp F_iF^\dagger Q_s\big)\Lambda\,\mathrm{vec}\big(\Pi^\perp\hat{Q}_s\big),$$

where $\simeq$ denotes first order approximation (around $\theta_0$). Next, a result from [OV94] will be used, modified to the situation in this chapter. The elements of $\sqrt{N}\,\mathrm{vec}(\Pi^\perp\hat{Q}_s)$ have a limiting zero-mean Gaussian distribution with asymptotic covariance

$$\begin{aligned}
\Sigma &= \lim_{N\to\infty} N\,E\big\{\mathrm{vec}\big(\Pi^\perp\hat{Q}_s\big)\mathrm{vec}^T\big(\Pi^\perp\hat{Q}_s\big)\big\} \\
&= \sum_{\tau}\big(H^TR_\zeta(\tau)H\big) \otimes \big(\Pi^\perp W^{1/2}R_{n_\alpha}(\tau)W^{1/2}\Pi^\perp\big),
\end{aligned}\tag{A.75}$$

which is singular with a null space of dimension $n^2$ (the null space of $I_n \otimes \Pi^\perp$) and has rank equal to $n(\alpha l - n)$ (the dimension of the space onto which $I_n \otimes \Pi^\perp$ projects). (We have no proof of the exact rank of $\Sigma$: we do know that the rank is no more than $n(\alpha l - n)$, but we cannot state useful conditions on the data and on the system for the rank to equal $n(\alpha l - n)$; however, in all numerical examples we have performed, the rank was $n(\alpha l - n)$.) Inserting this in (A.73), the $(i,j)$th element of $\Upsilon$ is given by

$$\Upsilon^{(ij)} = \lim_{N\to\infty} N\,E\{V_iV_j\} = 4\,\mathrm{vec}^T\big(\Pi^\perp F_iF^\dagger Q_s\big)\Lambda\Sigma\Lambda\,\mathrm{vec}\big(\Pi^\perp F_jF^\dagger Q_s\big).$$

Since the second partial derivative of (A.70) reads

$$V_{ij} = 2\,\mathrm{vec}^T\big(\Pi_{ij}^\perp\hat{Q}_s\big)\Lambda\,\mathrm{vec}\big(\Pi^\perp\hat{Q}_s\big) + 2\,\mathrm{vec}^T\big(\Pi_i^\perp\hat{Q}_s\big)\Lambda\,\mathrm{vec}\big(\Pi_j^\perp\hat{Q}_s\big)$$

and $\hat{Q}_s \to Q_s$ w.p.1 (possible non-uniqueness of the SVD components may affect "local" results in this chapter, but only unique parts appear in all final results; $Q_s$ is, for example, not unique if several singular values in $S_s$ are equal), the $(i,j)$th element of the limiting Hessian (A.72) is given by

$$\bar{V}''_{(ij)} = \lim_{N\to\infty} V_{ij} = 2\,\mathrm{vec}^T\big(\Pi_i^\perp Q_s\big)\Lambda\,\mathrm{vec}\big(\Pi_j^\perp Q_s\big) = 2\,\mathrm{vec}^T\big(\Pi^\perp F_iF^\dagger Q_s\big)\Lambda\,\mathrm{vec}\big(\Pi^\perp F_jF^\dagger Q_s\big).$$

Introduce a matrix $G$ whose $i$th column is given by

$$G_{\bullet i} = \mathrm{vec}\big(\Pi^\perp F_iF^\dagger Q_s\big) = \mathrm{vec}\Big(\Pi^\perp_{W^{1/2}\Gamma_\alpha}W^{1/2}\Gamma_{\alpha i}\big(W^{1/2}\Gamma_\alpha\big)^\dagger Q_s\Big),$$

where all expressions should be evaluated at $\theta_0$. Using the identity

$$\mathrm{vec}(ABC) = \big(C^T \otimes A\big)\mathrm{vec}(B) \tag{A.76}$$

for matrices with compatible dimensions, $G$ can be written

$$G = \Big(\big(\big(W^{1/2}\Gamma_\alpha\big)^\dagger Q_s\big)^T \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}W^{1/2}\Big)\big[\mathrm{vec}(\Gamma_{\alpha_1})\ \ \mathrm{vec}(\Gamma_{\alpha_2})\ \ \ldots\ \ \mathrm{vec}(\Gamma_{\alpha_d})\big], \tag{A.77}$$

which is (5.47). Remember that $d$ is the number of parameters in $\theta$. If $\Gamma_\alpha(\theta)$ is properly parameterized, the matrix $G$ will have full rank. With this notation, the matrices $\Upsilon$ and $\bar{V}''$ may be written in the following compact form:

$$\Upsilon = 4G^T\Lambda\Sigma\Lambda G, \qquad \bar{V}'' = 2G^T\Lambda G.$$

The asymptotic covariance matrix is thus given by

$$\Xi = \big[\bar{V}''\big]^{-1}\Upsilon\big[\bar{V}''\big]^{-1} = \big(G^T\Lambda G\big)^{-1}\big(G^T\Lambda\Sigma\Lambda G\big)\big(G^T\Lambda G\big)^{-1},$$

which is (5.51).

Part 2: In this part we show that the covariance matrix (5.51), as a function of the weighting matrix $\Lambda$, has a lower bound, and that this bound is attained for the choice made in (5.52). The rank of $\Sigma$ is $n(\alpha l - n)$, which is equal to the rank of $I \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}$. Studying the expression for $\Sigma$ in (A.75), it is seen that the range space of $\Sigma$ is given by the range space of $I \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}$. Similarly, it can be seen from (A.77) that the range space of $G$ lies in the range space of $I \otimes \Pi^\perp_{W^{1/2}\Gamma_\alpha}$, i.e., in the range space of $\Sigma$. Hence, the projector onto the range space of $\Sigma$ has no effect when operating on $G$, i.e., $\Sigma\Sigma^\dagger G = G$.

Next we show that $G^T\Sigma^\dagger G$ is positive definite. This can be proven by using the orthogonal decomposition of the Euclidean space

$$\mathbb{R}^{n\alpha l} = \mathcal{R}(\Sigma) \oplus \mathcal{N}(\Sigma^T),$$

where $\mathcal{N}(\cdot)$ denotes the null space of $(\cdot)$ and $\oplus$ denotes direct sum; see e.g. [SS89]. Since, for any $x$, $Gx$ is in $\mathcal{R}(\Sigma)$, it cannot be in $\mathcal{N}(\Sigma^T) = \mathcal{N}(\Sigma) = \mathcal{N}(\Sigma^\dagger)$, and thus $G^T\Sigma^\dagger G > 0$.

Study the following matrix:

$$\begin{bmatrix}G^T\Lambda\Sigma\Lambda G & G^T\Lambda G \\ G^T\Lambda G & G^T\Sigma^\dagger G\end{bmatrix}
= \begin{bmatrix}G^T\Lambda\Sigma\Lambda G & G^T\Lambda\Sigma\Sigma^\dagger G \\ G^T\Sigma^\dagger\Sigma\Lambda G & G^T\Sigma^\dagger\Sigma\Sigma^\dagger G\end{bmatrix}
= \begin{bmatrix}G^T\Lambda \\ G^T\Sigma^\dagger\end{bmatrix}\Sigma\begin{bmatrix}\Lambda G & \Sigma^\dagger G\end{bmatrix} \geq 0, \tag{A.78}$$

where the last inequality follows since $\Sigma \geq 0$. The matrix inequality should be interpreted to mean that the matrix is positive semi-definite. Since the matrix (A.78) is positive semi-definite, the same holds for the Schur complement of $G^T\Sigma^\dagger G$ [SS89], i.e.,

$$G^T\Lambda\Sigma\Lambda G - G^T\Lambda G\big(G^T\Sigma^\dagger G\big)^{-1}G^T\Lambda G \geq 0. \tag{A.79}$$

The matrix $G^T\Lambda G$ was in part one of this appendix shown to be the limiting Hessian of the cost-function (5.34). We have previously assumed it to be invertible for the analysis to hold; this sets some constraint on the choice of the weighting $\Lambda$, but is needed for parameter identifiability. Multiplying (A.79) from left and right with $(G^T\Lambda G)^{-1}$ gives the desired result:

$$\big(G^T\Sigma^\dagger G\big)^{-1} \leq \big(G^T\Lambda G\big)^{-1}\big(G^T\Lambda\Sigma\Lambda G\big)\big(G^T\Lambda G\big)^{-1}.$$

Thus, the asymptotic covariance matrix is bounded from below by $(G^T\Sigma^\dagger G)^{-1}$, and the weighting matrix in (5.52) attains this bound.
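The lower bound argument can be illustrated numerically. In the sketch below (Python/NumPy), $\Sigma$ is constructed singular with $\mathcal{R}(G) \subset \mathcal{R}(\Sigma)$, mirroring the assumptions above; all matrices are random stand-ins of compatible sizes, not the quantities of Theorem 5.3.

```python
import numpy as np

rng = np.random.default_rng(10)
p, r, d = 10, 7, 3                         # ambient dim, rank of Sigma, parameters

U = np.linalg.qr(rng.standard_normal((p, r)))[0]   # orthonormal basis of R(Sigma)
S = U @ np.diag(rng.uniform(1, 3, r)) @ U.T        # singular PSD Sigma of rank r
G = U @ rng.standard_normal((r, d))                # ensures R(G) inside R(Sigma)
Sp = np.linalg.pinv(S)

def sandwich(Lam):
    # (G^T Lam G)^-1 (G^T Lam Sigma Lam G) (G^T Lam G)^-1, cf. (5.51)
    Hi = np.linalg.inv(G.T @ Lam @ G)
    return Hi @ (G.T @ Lam @ S @ Lam @ G) @ Hi

bound = np.linalg.inv(G.T @ Sp @ G)
gap = sandwich(np.eye(p)) - bound                  # a sub-optimal weighting
print(np.linalg.eigvalsh(gap).min() >= -1e-10)     # PSD gap: bound is a lower bound
print(np.allclose(sandwich(Sp), bound))            # equality at Lambda = Sigma^dagger
```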

Part 3: In this part the expression (5.53) is derived. The first equality in (5.53) follows trivially from (5.51) and (5.52), since $\Sigma^\dagger\Sigma\Sigma^\dagger = \Sigma^\dagger$. The remainder of this appendix proves the second equality in (5.53) (and thus Corollary 5.4).

First, note that

$$\big(W^{1/2}\Gamma_\alpha\big)^\dagger Q_s = \big(Q_sT^{-1}\big)^\dagger Q_s = TQ_s^TQ_s = T, \tag{A.80}$$

where $T$ is the similarity transformation matrix describing the fact that the limiting estimate of the observability matrix may correspond to a different realization than the one corresponding to $\Gamma_\alpha$. Remember that $\Gamma_\alpha$ is the parameterized extended observability matrix of some realization. Here, $\hat{Q}_s \to Q_s = W^{1/2}\Gamma_\alpha T$ as $N$ tends to infinity; the matrix $T$ depends on $W$. Let

$$\Pi^\perp_{W^{1/2}\Gamma_\alpha} = EE^T \tag{A.81}$$

be an orthonormal factorization, i.e., $E$ is $\alpha l \times (\alpha l - n)$ and $E^TE = I$. Such a factorization is possible since a projection matrix is symmetric and has eigenvalues equal to one or zero. Observe that this implies that $E^\dagger = E^T$. In the following we make extensive use of the following rules for Kronecker products:

$$(A \otimes B)(C \otimes D) = AC \otimes BD, \qquad (A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)^\dagger = A^\dagger \otimes B^\dagger.$$

Rewrite (5.48) as

$$\Sigma = \big(I_n \otimes E\big)\Big(\sum_{\tau}\big(H^TR_\zeta(\tau)H\big) \otimes E^TW^{1/2}R_{n_\alpha}(\tau)W^{1/2}E\Big)\big(I_n \otimes E^T\big). \tag{A.82}$$

Carrying this factorization through $G^T\Sigma^\dagger G$ leads to the expressions (A.83) and (A.84), where (A.80), (A.66) and Lemma A.1 have been used. Hence, $HT^{-1}$ is equal to $\Omega_{\theta_0}$ defined in (5.49). Finally, the factor $W^{1/2}E\big(W^{1/2}E\big)^\dagger$ in (A.84) is a projection matrix onto $\mathcal{R}[W^{1/2}E]$. Now, the columns of $E$ form a basis for the null-space of $\Gamma_\alpha^*W^{1/2}$, denoted $\mathcal{N}[\Gamma_\alpha^*W^{1/2}]$. This implies that the columns of $W^{1/2}E$ form a basis for $\mathcal{N}[\Gamma_\alpha^*]$. Thus, a projector projecting onto $\mathcal{R}[W^{1/2}E]$ in fact projects onto $\mathcal{N}[\Gamma_\alpha^*]$. To conclude, the projector $W^{1/2}E\big(W^{1/2}E\big)^\dagger$ can be written as

$$\Pi^\perp_{\Gamma_\alpha} = I - \Gamma_\alpha\big(\Gamma_\alpha^*\Gamma_\alpha\big)^{-1}\Gamma_\alpha^*,$$

and the expression (A.83) simplifies to (5.53), which was to be proved.

A.13 Proof of Theorem 5.5

In this appendix it is shown that (5.55) and (5.56) are asymptotically equivalent estimators. Let

ˆR = W^{1/2} Y_α Π⊥_α P_β^T (P_β Π⊥_α P_β^T)^{-1} P_β Π⊥_α Y_α^T W^{1/2}
   = ˆQ_s ˆS_s^2 ˆQ_s^T + ˆQ_n ˆS_n^2 ˆQ_n^T

and let F be as in (A.68). Also recall the expression (A.69) and the corresponding derivative (A.74). Introduce the two cost functions

V(θ) = Tr{Π⊥_F ˆR},   (A.85)
J(θ) = Tr{Π⊥_F ˆQ_s ˆS_s^2 ˆQ_s^T}.   (A.86)

Observe that ˆR is real and thus there exist real singular vectors; F may, on the other hand, be complex-valued. Since the minimizers of (A.85) and (A.86) are consistent estimates of the true parameters, it suffices, under mild conditions, to show that [Lju87, SS89]

V_i = J_i + o_p(1/√N),   (A.87)
V_ij = J_ij + o_p(1),   (A.88)

where V_i denotes the partial derivative with respect to θ_i and, similarly, V_ij denotes the second partial derivative (cf. (A.71)). The notation o_p(·) is the in-probability version of the corresponding deterministic notation: a sequence x_N of random variables is o_p(a_N) if the probability limit of a_N^{-1} x_N is zero. Perturbation theory of subspaces gives

Π⊥_F ˆR Q_s = Π⊥_F ˆQ_s S_s^2 + o_p(1/√N);


see Section 2.3.4 or [CTO89]. The following holds for the limiting principal singular vectors:

Q_s T = F

for some full-rank T. The scene is now set to verify (A.87) and (A.88) by straightforward calculations. Consider (A.87):

V_i = Tr{(Π⊥_F)_i ˆR}
    = −2 Re Tr{Π⊥_F F_i (F^* F)^{-1} F^* ˆR}
    = −2 Re Tr{F_i (F^* F)^{-1} T^* Q_s^* ˆR Π⊥_F}
    = −2 Re Tr{F_i (F^* F)^{-1} T^* S_s^2 ˆQ_s^T Π⊥_F} + o_p(1/√N),

J_i = Tr{(Π⊥_F)_i ˆQ_s ˆS_s^2 ˆQ_s^T}
    = −2 Re Tr{Π⊥_F F_i (F^* F)^{-1} F^* ˆQ_s ˆS_s^2 ˆQ_s^T}
    = −2 Re Tr{Π⊥_F F_i (F^* F)^{-1} (Q_s T)^* Q_s S_s^2 ˆQ_s^T} + o_p(1/√N)
    = −2 Re Tr{F_i (F^* F)^{-1} T^* S_s^2 ˆQ_s^T Π⊥_F} + o_p(1/√N).

Hence, (A.87) holds true. Now turn to (A.88). Since

ˆR = R + o_p(1) = Q_s S_s^2 Q_s^T + o_p(1),

the result is immediate.
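The mechanism can also be illustrated numerically: V and J differ exactly by the noise-subspace term Tr{Π⊥_F ˆQ_n ˆS_n^2 ˆQ_n^T}, which vanishes as ˆR approaches the rank-n limit R. The NumPy sketch below is illustrative only; the dimensions and the O(1/√N) perturbation model are assumptions and do not reproduce the estimator of Theorem 5.5.

import numpy as np

# Illustrative sketch only: the gap between the costs V and J equals the
# noise-subspace term and shrinks as Rhat tends to a rank-n matrix R.
rng = np.random.default_rng(2)
p, n = 8, 2
Qs, _ = np.linalg.qr(rng.standard_normal((p, n)))
R = Qs @ np.diag([4.0, 1.0]) @ Qs.T            # rank-n limit of Rhat
F = rng.standard_normal((p, n))                # hypothetical stand-in for F(theta)
P_perp = np.eye(p) - F @ np.linalg.pinv(F)

for N in (1e2, 1e4, 1e6):
    Z = rng.standard_normal((p, p)) / np.sqrt(N)
    Rhat = R + (Z + Z.T) / 2                   # assumed O(1/sqrt(N)) perturbation
    U, s, _ = np.linalg.svd(Rhat)
    V_cost = np.trace(P_perp @ Rhat)
    J_cost = np.trace(P_perp @ U[:, :n] @ np.diag(s[:n]) @ U[:, :n].T)
    print(int(N), abs(V_cost - J_cost))        # decays roughly like 1/sqrt(N)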

A.14 Rank of Coefficient Matrix

Lemma A.2. Let A be an n × n matrix with all eigenvalues inside the unit circle and let B be an n × m matrix. Define the matrix polynomial F^u(q^{-1}) as in (6.17). Then the coefficient matrix

[F^u_n … F^u_1]

has the same rank as the reachability matrix of (A, B).

Proof. The result follows from a simple construction. Let a(q^{-1}) be as in (6.18); then

F^u(q^{-1}) = Adj{qI − A} B / q^n = a(q^{-1}) (qI − A)^{-1} B
            = (a_0 + a_1 q^{-1} + … + a_n q^{-n}) q^{-1} (I + q^{-1} A + q^{-2} A^2 + …) B.

Comparison of coefficients leads to

[F^u_n … F^u_1] = [A^{n-1}B  A^{n-2}B  …  B]
  × [ a_0 I_m          0             0      …      0
      a_1 I_m       a_0 I_m          0      …      0
        ⋮                            ⋱
      a_{n-1} I_m   a_{n-2} I_m             …   a_0 I_m ]

where I_m is the m × m identity matrix. The last factor is clearly nonsingular (since a_0 = 1), so the lemma is proven.

The lemma shows that the first block row of S as defined in (6.19) has full row rank if (A, [B K]) is reachable. This implies that S has full row rank (since a_0 = 1).
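A hedged numerical check of the lemma is sketched below, with arbitrary sizes n = 4, m = 2 and a randomly drawn stable pair (A, B); np.poly(A) returns the characteristic-polynomial coefficients [1, a_1, ..., a_n] of A, i.e., the coefficients of a(q^{-1}) with a_0 = 1.

import numpy as np

# A sketch verifying Lemma A.2 on random test data (sizes are arbitrary).
rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))  # eigenvalues inside unit circle
B = rng.standard_normal((n, m))
a = np.poly(A)                                   # a[0] = 1

# Factor [A^{n-1}B  A^{n-2}B ... B] and the block lower-triangular Toeplitz
# factor with blocks a_{i-j} I_m, as in the proof above.
C_rev = np.hstack([np.linalg.matrix_power(A, n - 1 - k) @ B for k in range(n)])
T = np.zeros((n * m, n * m))
for i in range(n):
    for j in range(i + 1):
        T[i*m:(i+1)*m, j*m:(j+1)*m] = a[i - j] * np.eye(m)

F_coeffs = C_rev @ T                             # [F^u_n ... F^u_1]
reach = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
assert np.linalg.matrix_rank(F_coeffs) == np.linalg.matrix_rank(reach)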

A.15 Rank of Sylvester Matrix

Consider the Sylvester matrix S_y in (6.31) generalized to the multi-input case. This means that every a_i is replaced by a_i I_m and that every F^y_i is now a matrix of dimension l × m. The dimension of S_y becomes ((α+β)m + βl) × ((α+β+n)m). Observe that the relation (6.30) holds. Define

z = [ y^d_β(t); u_β(t); u_α(t+β) ]

and express z in the following two equivalent ways:

z = S_y (1/a(q^{-1})) u_{α+β+n}(t − n),

and

z = [ Γ_β, Φ_β, 0; 0, I_{βm}, 0; 0, 0, I_{αm} ] [ x^d(t); u_β(t); u_α(t+β) ] ≜ Ξ [ x^d(t); u_β(t); u_α(t+β) ].

Next, study

Ē{z z^T} = S_y Ē{ (1/a(q^{-1})) u_{α+β+n}(t−n) (1/a(q^{-1})) u^T_{α+β+n}(t−n) } S_y^T   (A.89)
         = Ξ Ē{ [x^d(t); u_β(t); u_α(t+β)] [x^d(t); u_β(t); u_α(t+β)]^T } Ξ^T.   (A.90)

From the proof of Theorem 6.2 we know that the covariance matrix in the middle of (A.90) is positive definite if u(t) is at least PE(α+β+n) and if (A, B) is reachable. If rank{Γ_β} = n, then Ξ has full column rank and it follows that

rank{Ē{z z^T}} = n + (α+β)m.   (A.91)

Furthermore, if u(t) is PE(α+β+n) then

Ē{ (1/a(q^{-1})) u_{α+β+n}(t−n) (1/a(q^{-1})) u^T_{α+β+n}(t−n) }

is positive definite [SS83]. Notice that we assume that the system is stable. This assumption implies that a(z) has all roots outside the unit circle. Comparing (A.91) and (A.89), it can be concluded that

rank{S_y} = n + (α+β)m.

This result holds if (A, B) is reachable, if (A, C) is observable, and if β is greater than or equal to the observability index. The observability and reachability assumptions are equivalent to requiring that the polynomials F^y(z) and a(z) do not share any common zeros.
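The positive-definiteness step can be illustrated with a small simulation: a white-noise input is PE of any order, and filtering it through 1/a(q^{-1}) with a(z) having all roots outside the unit circle keeps the covariance of the stacked filtered input positive definite. The scalar-input NumPy sketch below is illustrative only; the polynomial, the stack depth k and the sample size are arbitrary assumptions.

import numpy as np

# A sketch of the positive-definiteness argument for the scalar-input case.
rng = np.random.default_rng(4)
a = np.array([1.0, -0.5, 0.06])        # 1/a(q^{-1}) is a stable AR(2) filter
N, k = 100_000, 5
u = rng.standard_normal(N)

y = np.zeros(N)                         # y(t) = u(t) - a_1 y(t-1) - a_2 y(t-2)
for t in range(N):
    for i in (1, 2):
        if t - i >= 0:
            y[t] -= a[i] * y[t - i]
    y[t] += u[t]

Y = np.column_stack([y[i:N - k + 1 + i] for i in range(k)])  # stacked windows
cov = Y.T @ Y / Y.shape[0]
print(np.linalg.eigvalsh(cov).min())    # strictly positive, as [SS83] predicts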


Appendix B

Notations

In general, boldface uppercase letters are used for matrices and boldface lowercase letters denote vectors. Scalars are set in normal typeface.

R, C              The sets of real and complex numbers, respectively.
m × n             The dimension of a matrix with m rows and n columns.
X ∈ R^{m×n}       X is an m × n matrix of real numbers.
X ∈ C^{m×n}       X is an m × n matrix of complex numbers.
A_ij              The (i,j)th element of the matrix A.
θ_k               The kth element of the vector θ.
A_{•k}            The kth column of A.
A_{k•}            The kth row of A.
A^T, A^*, A^c     The transpose, complex conjugate transpose and complex conjugate of the matrix A, respectively.
A^{-T}            The transpose of the inverse of A. That is, A^{-T} = (A^{-1})^T.
A^†               The Moore-Penrose pseudo-inverse of A.
Tr{A}             The trace operator [Gra81].
‖x‖_2             The 2-norm of the vector x (‖x‖_2^2 = x^* x).
‖A‖_F             The Frobenius norm of the matrix A.
vec{A}            The vectorization operator [Gra81].
⊗                 Kronecker product [Gra81, Bre78].
⊙                 The Hadamard or Schur product (elementwise multiplication).
◦                 The Khatri-Rao product (columnwise Kronecker product) [Bre78].
I_n               The n × n identity matrix. The subscript is often omitted.
0_{m×n}           An m × n matrix of zeros. The subscript is typically omitted.
diag{x}           A diagonal matrix with the elements of the vector x on the diagonal.
Re{A}, Im{A}      The real and imaginary part of A, respectively.
A > 0 (A ≥ 0)     The Hermitian matrix A = A^* is positive definite (positive semidefinite). For two Hermitian matrices A and B, the notation A > B (A ≥ B) means that A − B is positive definite (positive semidefinite).
rank(A)           The rank of the matrix A.
N{A}              The null-space of the matrix A.
R{A}              The range-space of A.
⊕                 Direct orthogonal sum of vector spaces.
arg min_x f(x)    The value of x that minimizes f(x).
ˆθ                An estimate of the quantity θ.
O(x)              An arbitrary function such that |O(x)/x| ≤ c < ∞ as x → 0.
o(x)              An arbitrary function such that |o(x)/x| → 0 as x → 0.
O_p(x), o_p(x)    The in-probability versions of O(x) and o(x), respectively [MKB79].
δ_{i,j}           Kronecker's delta function (δ_{i,j} = 1 if i = j and zero otherwise).
⌊k⌋               The largest integer less than or equal to k ∈ R.
s ∈ (r,t]         Half-open interval, r < s ≤ t.
q, q^{-1}         The forward and backward unit shift operators, respectively (q u(t) = u(t+1) and q^{-1} u(t) = u(t−1)).
log{·}            The natural logarithm.
det{·}            The determinant function.
E{·}              Mathematical expectation.
Ē{x(t)}           lim_{N→∞} (1/N) ∑_{t=1}^{N} E{x(t)}.
x ∈ N(m, P)       The vector x is (complex) Gaussian distributed with mean m and covariance P.
x ∈ AsN(m, P)     The vector x is asymptotically (complex) Gaussian distributed with mean m and covariance P.
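The three product notations above are easy to confuse; a small NumPy illustration on arbitrary 2 × 2 test matrices is given below (the matrices are hypothetical examples only).

import numpy as np

# Hadamard, Kronecker, and Khatri-Rao products on arbitrary test matrices.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

hadamard = A * B                      # A ⊙ B, elementwise; shape (2, 2)
kron = np.kron(A, B)                  # A ⊗ B; shape (4, 4)
khatri_rao = np.column_stack(         # A ◦ B, columnwise Kronecker; shape (4, 2)
    [np.kron(A[:, k], B[:, k]) for k in range(A.shape[1])]
)
assert khatri_rao.shape == (4, 2)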


Appendix C

Abbreviations and Acronyms

4SID      State-space subspace system identification
ABC       Asymptotically best consistent
AR        Autoregressive
ARMA      Autoregressive moving average
ARX       Autoregressive model with exogenous inputs
CRB       Cramér Rao lower bound
CVA       Canonical variate analysis
DEUCE     Direction estimator for uncorrelated emitters
DOA       Direction of arrival
ESPRIT    Estimation of signal parameters by rotational invariance techniques
EVD       Eigenvalue decomposition
FB        Forward backward
FS        Finite samples
GWSF      Generalized weighted subspace fitting
IV        Instrumental variable
ME        Model errors
ML        Maximum likelihood
MLE       Maximum likelihood estimate
MODE      Method of direction estimation
MOESP     Multi-input multi-output output error state-space model realization
MUSIC     Multiple signal classification
N4SID     Numerical algorithms for subspace state space system identification
PEM       Prediction error method
SNR       Signal to noise ratio
SVD       Singular value decomposition
ULA       Uniform linear array
w.p.1     With probability one
WSF       Weighted subspace fitting


Bibliography

[AJ76] B. D. O. Anderson and E. I. Jury. "Generalized Bezoutian and Sylvester Matrices in Multivariable Linear Control". IEEE Transactions on Automatic Control, AC-21:551–556, Aug. 1976.
[AK90] K. S. Arun and S. Y. Kung. "Balanced Approximation of Stochastic Systems". SIAM J. Matrix Analysis and Applications, 11:42–68, 1990.
[Aka74a] H. Akaike. "A New Look at the Statistical Model Identification". IEEE Trans. Automat. Contr., AC-19:716–723, Dec. 1974.
[Aka74b] H. Akaike. "Stochastic Theory of Minimal Realization". IEEE Trans. Automat. Contr., AC-19(6):667–674, Dec. 1974.
[Aka75] H. Akaike. "Markovian Representation of Stochastic Processes by Canonical Variables". SIAM Journal on Control, 13(1):162–173, 1975.
[Alb72] A. Albert. Regression and the Moore-Penrose Pseudoinverse. Academic Press, New York, 1972.
[And84] T. W. Anderson. An Introduction to Multivariate Statistical Analysis, 2nd edition. John Wiley & Sons, New York, 1984.
[Aok87] M. Aoki. State Space Modeling of Time Series. Springer Verlag, Berlin, 1987.
[Ban71] W. J. Bangs. Array Processing with Generalized Beamformers. PhD thesis, Yale University, New Haven, CT, 1971.
[Bar83] A. J. Barabell. "Improving the Resolution Performance of Eigenstructure-Based Direction-Finding Algorithms". In Proc. ICASSP 83, pages 336–339, Boston, MA, 1983.
[Bay92] D. Bayard. "An Algorithm for State-space Frequency Domain Identification Without Windowing Distortions". In Proc. 31st IEEE Conf. on Decision and Control, pages 1707–1712, Tucson, AZ, 1992.
[BDS97] D. Bauer, M. Deistler, and W. Scherrer. "The Analysis of the Asymptotic Variance of Subspace Algorithms". In Proc. 11th IFAC Symposium on System Identification, pages 1087–1091, Fukuoka, Japan, 1997.
[Bel70] R. Bellman. Introduction to Matrix Analysis, 2nd edition. McGraw-Hill, New York, 1970.
[Bie79] G. Bienvenue. "Influence of the Spatial Coherence of the Background Noise on High Resolution Passive Methods". In Proc. ICASSP'79, pages 306–309, Washington, DC, 1979.
[BKAK78] R. R. Bitmead, S. Y. Kung, B. D. O. Anderson, and T. Kailath. "Greatest Common Divisors via Generalized Sylvester and Bezout Matrices". IEEE Transactions on Automatic Control, 23(6):1043–1047, Dec. 1978.
[BM86] Y. Bresler and A. Macovski. "Exact Maximum Likelihood Parameter Estimation of Superimposed Exponential Signals in Noise". IEEE Trans. ASSP, ASSP-34:1081–1089, Oct. 1986.
[Böh86] J. F. Böhme. "Estimation of Spectral Parameters of Correlated Signals in Wavefields". Signal Processing, 10:329–337, 1986.
[Boh91] T. Bohlin. Interactive System Identification: Prospects and Pitfalls. Springer-Verlag, 1991.
[Bre78] J. W. Brewer. "Kronecker Products and Matrix Calculus in System Theory". IEEE Trans. on Circuits and Systems, CAS-25(9):772–781, September 1978.
[CK95] Y. M. Cho and T. Kailath. "Fast Subspace-based System Identification: An Instrumental Variable Approach". Automatica, 31(6):903–905, 1995.
[Com82] R. T. Compton. "The Effect of Random Steering Vector Errors in the Applebaum Adaptive Array". IEEE Trans. on Aero. and Elec. Sys., AES-18(5):392–400, Sept. 1982.
[Cra46] H. Cramér. Mathematical Methods of Statistics. Princeton University Press, 1946.
[CS96] M. Cedervall and P. Stoica. "System Identification from Noisy Measurements by Using Instrumental Variables and Subspace Fitting". Circuits, Systems and Signal Processing, 15(2):275–290, 1996.
[CTO89] H. Clergeot, S. Tressens, and A. Ouamri. "Performance of High Resolution Frequencies Estimation Methods Compared to the Cramér-Rao Bounds". IEEE Trans. ASSP, 37(11):1703–1720, November 1989.
[CXK94] Y. M. Cho, G. Xu, and T. Kailath. "Fast Recursive Identification of State Space Models via Exploitation of Displacement Structure". Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):45–59, Jan. 1994.
[De 88] B. De Moor. Mathematical Concepts and Techniques for Modeling of Static and Dynamic Systems. PhD thesis, Kath. Universiteit Leuven, Heverlee, Belgium, 1988.
[DPS95] M. Deistler, K. Peternell, and W. Scherrer. "Consistency and Relative Efficiency of Subspace Methods". Automatica, special issue on trends in system identification, 31(12):1865–1875, Dec. 1995.
[DVVM88] B. De Moor, J. Vandewalle, L. Vandenberghe, and P. Van Mieghem. "A Geometrical Strategy for the Identification of State Space Models of Linear Multivariable Systems with Singular Value Decomposition". In Proc. IFAC 88, pages 700–704, Beijing, China, 1988.
[Eyk74] P. Eykhoff. System Identification, Parameter and State Estimation. Wiley, 1974.
[Fau76] P. L. Faurre. "Stochastic realization algorithms". In R. K. Mehra and D. G. Lainiotis, editors, System Identification: Advances and Case Studies. Academic Press, NY, 1976.
[FFLC95] A. Flieller, A. Ferreol, P. Larzabal, and H. Clergeot. "Robust Bearing Estimation in the Presence of Direction-Dependent Modelling Errors: Identifiability and Treatment". In Proc. ICASSP, pages 1884–1887, Detroit, MI, 1995.
[Fri90a] B. Friedlander. "A Sensitivity Analysis of the MUSIC Algorithm". IEEE Trans. on ASSP, 38(10):1740–1751, October 1990.
[Fri90b] B. Friedlander. "Sensitivity of the Maximum Likelihood Direction Finding Algorithm". IEEE Trans. on AES, 26(6):953–968, November 1990.
[Fuc92] J. J. Fuchs. "Estimation of the Number of Signals in the Presence of Unknown Correlated Sensor Noise". IEEE Trans. on Signal Processing, SP-40(5):1053–1061, May 1992.
[FW94] B. Friedlander and A. J. Weiss. "Effects of Model Errors on Waveform Estimation Using the MUSIC Algorithm". IEEE Trans. on SP, SP-42(1):147–155, Jan. 1994.
[GJO96] B. Göransson, M. Jansson, and B. Ottersten. "Spatial and Temporal Frequency Estimation of Uncorrelated Signals Using Subspace Fitting". In Proceedings of 8th IEEE Signal Processing Workshop on Statistical Signal and Array Processing, pages 94–96, Corfu, Greece, June 1996.
[GO97] B. Göransson and B. Ottersten. "Efficient Direction Estimation of Uncorrelated Sources Using Polynomial Rooting". In Proc. 13th Intl. Conf. DSP'97, pages 935–938, Santorini, Greece, 1997.
[Goo63] N. R. Goodman. "Statistical Analysis Based on a Certain Multivariate Complex Gaussian Distribution (An Introduction)". Annals Math. Stat., Vol. 34:152–176, 1963.
[Gop69] B. Gopinath. "On the Identification of Linear Time-Invariant Systems from Input-Output Data". The Bell System Technical Journal, Vol. 48(5):1101–1113, 1969.
[Gor97] A. Gorokhov. Blind Separation of Convolutive Mixtures: Second Order Methods. PhD thesis, Ecole Nationale Superieure des Telecommunications (ENST), Paris, 1997.
[GP77] G. C. Goodwin and R. L. Payne. Dynamic System Identification: Experiment Design and Data Analysis, volume 136 of Mathematics in Science and Engineering Series. Academic Press, 1977.
[Gra81] A. Graham. Kronecker Products and Matrix Calculus with Applications. Ellis Horwood Ltd, Chichester, UK, 1981.
[Gus97] T. Gustafsson. "System Identification Using Subspace-based Instrumental Variable Methods". In Proc. 11th IFAC Symposium on System Identification, pages 1119–1124, Fukuoka, Japan, 1997.
[GV96] G. H. Golub and C. F. Van Loan. Matrix Computations, third edition. Johns Hopkins University Press, Baltimore, MD, 1996.
[GW74] K. Glover and J. C. Willems. "Parameterizations of Linear Dynamical Systems: Canonical Forms and Identifiability". IEEE Trans. on Automatic Control, AC-19(6):640–646, Dec. 1974.
[GW84] M. Gevers and V. Wertz. "Uniquely Identifiable State-space and ARMA Parameterizations for Multivariable Linear Systems". Automatica, 20(3):333–347, 1984.
[Haa97a] M. Haardt. Efficient One-, Two-, and Multidimensional High-Resolution Array Signal Processing. PhD thesis, Technical University of Munich, Shaker Verlag, 1997.
[Haa97b] M. Haardt. "Structured Least Squares to Improve the Performance of ESPRIT-Type Algorithms". IEEE Trans. Signal Processing, 45(3):792–799, Mar. 1997.
[Hay91] S. Haykin. Advances in Spectrum Analysis and Array Processing, edited by S. Haykin, volume 2. Prentice-Hall, Englewood Cliffs, NJ, 1991.
[HD88] E. J. Hannan and M. Deistler. The Statistical Theory of Linear Systems. J. Wiley & Sons, N.Y., 1988.
[HK66] B. L. Ho and R. E. Kalman. "Efficient Construction of Linear State Variable Models from Input/Output Functions". Regelungstechnik, 14:545–548, 1966.
[Hot33] H. Hotelling. "Analysis of a Complex of Statistical Variables into Principal Components". J. Educ. Psych., 24:417–441, 498–520, 1933.
[Jaf88] A. G. Jaffer. "Maximum Likelihood Direction Finding of Stochastic Sources: A Separable Solution". In Proc. ICASSP 88, volume 5, pages 2893–2896, New York, April 1988.
[Jan95] M. Jansson. "On Performance Analysis of Subspace Methods in System Identification and Sensor Array Processing". Technical Report TRITA-REG-9503, ISSN 0347-1071, Automatic Control, Signals, Sensors & Systems, KTH, Stockholm, June 1995. Licentiate Thesis.
[JD93] D. H. Johnson and D. E. Dudgeon. Array Signal Processing – Concepts and Techniques. Prentice-Hall, Englewood Cliffs, NJ, 1993.
[JGO97a] M. Jansson, B. Göransson, and B. Ottersten. "A Subspace Method for Direction of Arrival Estimation of Uncorrelated Emitter Signals". Submitted to IEEE Trans. on Signal Processing, Aug. 1997.
[JGO97b] M. Jansson, B. Göransson, and B. Ottersten. "Analysis of a Subspace-Based Spatial Frequency Estimator". In Proceedings of ICASSP'97, pages 4001–4004, München, Germany, April 1997.
[JO95] M. Jansson and B. Ottersten. "Covariance Preprocessing in Weighted Subspace Fitting". In IEEE/IEE workshop on signal processing methods in multipath environments, pages 23–32, Glasgow, Scotland, 1995.
[Joh93] R. Johansson. System Modeling and Identification. Information and System Sciences Series. Prentice Hall, 1993.
[JP85] J. N. Juang and R. S. Pappa. "An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction". J. Guidance, Control and Dynamics, 8(5):620–627, 1985.
[JSO97] M. Jansson, A. L. Swindlehurst, and B. Ottersten. "Weighted Subspace Fitting for General Array Error Models". Submitted to IEEE Trans. on Signal Processing, Jul. 1997.
[JW95] M. Jansson and B. Wahlberg. "On Weighting in State-Space Subspace System Identification". In Proc. European Control Conference, ECC'95, pages 435–440, Roma, Italy, 1995.
[JW96a] M. Jansson and B. Wahlberg. "A Linear Regression Approach to State-Space Subspace System Identification". Signal Processing (EURASIP), Special Issue on Subspace Methods, Part II: System Identification, 52(2):103–129, July 1996.
[JW96b] M. Jansson and B. Wahlberg. "On Consistency of Subspace System Identification Methods". In Preprints of 13th IFAC World Congress, pages 181–186, San Francisco, California, 1996.
[JW97a] M. Jansson and B. Wahlberg. "Counterexample to General Consistency of Subspace System Identification Methods". In 11th IFAC Symposium on System Identification, pages 1677–1682, Fukuoka, Japan, 1997.
[JW97b] M. Jansson and B. Wahlberg. "On Consistency of Subspace Methods for System Identification". Submitted to Automatica, Jun. 1997.
[KAB83] S. Y. Kung, K. S. Arun, and D. V. Bhaskar Rao. "State-Space Singular Value Decomposition Based Method for the Harmonic Retrieval Problem". J. Opt. Soc. Amer., 73:1799–1811, Dec. 1983.
[KB86] M. Kaveh and A. J. Barabell. "The Statistical Performance of the MUSIC and the Minimum-Norm Algorithms in Resolving Plane Waves in Noise". IEEE Trans. ASSP, ASSP-34(2):331–341, April 1986. Correction in ASSP-34(3), June 1986, p. 633.
[KDS88] A. M. King, U. B. Desai, and R. E. Skelton. "A Generalized Approach to q-Markov Covariance Equivalent Realizations of Discrete Systems". Automatica, 24(4):507–515, 1988.
[KF93] M. Koerber and D. Fuhrmann. "Array Calibration by Fourier Series Parameterization: Scaled Principle Components Method". In Proc. ICASSP, pages IV340–IV343, Minneapolis, Minn., 1993.
[KP89] B. H. Kwon and S. U. Pillai. "A Self Inversive Array Processing Scheme for Angle-of-Arrival Estimation". Signal Processing, 17(3):259–277, Jul. 1989.
[KSS94] A. Kangas, P. Stoica, and T. Söderström. "Finite Sample and Modeling Error Effects on ESPRIT and MUSIC Direction Estimators". IEE Proc. Radar, Sonar, & Navig., 141:249–255, 1994.
[KSS96] A. Kangas, P. Stoica, and T. Söderström. "Large Sample Analysis of MUSIC and Min-Norm Direction Estimators in the Presence of Model Errors". Circ., Sys., and Sig. Proc., 15:377–393, 1996.
[Kun78] S. Y. Kung. "A New Identification and Model Reduction Algorithm via Singular Value Decomposition". In Proc. 12th Asilomar Conf. on Circuits, Systems and Computers, pages 705–714, Pacific Grove, CA, November 1978.
[Kun81] S. Y. Kung. "A Toeplitz Approximation Method and Some Applications". In Proc. Int. Symp. Mathematical Theory of Networks and Systems, pages 262–266, Santa Monica, CA, Aug. 1981.
[KV96] H. Krim and M. Viberg. "Two Decades of Array Signal Processing Research: The Parametric Approach". IEEE Signal Processing Magazine, 13(4):67–94, Jul. 1996.
[Lar83] W. E. Larimore. "System Identification, Reduced-Order Filtering and Modeling via Canonical Variate Analysis". In Proc. ACC, pages 445–451, June 1983.
[Lar90] W. E. Larimore. "Canonical Variate Analysis in Identification, Filtering and Adaptive Control". In Proc. 29th CDC, pages 596–604, Honolulu, Hawaii, December 1990.
[Lar94] W. E. Larimore. "The Optimality of Canonical Variate Identification by Example". In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 151–156, Copenhagen, Denmark, 1994.
[Law40] D. N. Lawley. "The estimation of factor loadings by the method of maximum likelihood". In Proceedings of the Royal Society of Edinburgh, Sec. A, volume 60, pages 64–82, 1940.
[Liu92] K. Liu. "Identification of Multi-Input and Multi-Output Systems by Observability Range Space Extraction". In Proc. CDC, pages 915–920, Tucson, AZ, 1992.
[Lju87] L. Ljung. System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs, NJ, 1987.
[Lju91] L. Ljung. "A Simple Start-up Procedure for Canonical Form State Space Identification, Based on Subspace Approximation". In Proc. 30th IEEE Conf. on Decision and Control, pages 1333–1336, Brighton, U.K., 1991.
[LM87] J. Lo and S. Marple. "Eigenstructure Methods for Array Sensor Localization". In Proc. ICASSP, pages 2260–2263, Dallas, TX, 1987.
[LS91] K. Liu and R. E. Skelton. "Identification and Control of NASA's ACES Structure". In Proc. American Control Conference, pages 3000–3006, Boston, MA, 1991.
[LV92] F. Li and R. Vaccaro. "Sensitivity Analysis of DOA Estimation Algorithms to Sensor Errors". IEEE Trans. AES, AES-28(3):708–717, July 1992.
[McK94a] T. McKelvey. "Fully Parametrized State-Space Models in System Identification". In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 373–378, Copenhagen, Denmark, 1994.
[McK94b] T. McKelvey. "On state-space models in system identification". Technical Report LiU-TEK-LIC-1994:33, Linköping University, May 1994. Licentiate Thesis.
[MDVV89] M. Moonen, B. De Moor, L. Vandenberghe, and J. Vandewalle. "On- and off-line identification of linear state-space models". Int. J. Control, 49(1):219–232, 1989.
[MKB79] K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, London, 1979.
[MR94] D. McArthur and J. Reilly. "A Computationally Efficient Self-Calibrating Direction of Arrival Estimator". In Proc. ICASSP, pages IV-201–IV-205, Adelaide, Australia, 1994.
[NK94] V. Nagesha and S. Kay. "On Frequency Estimation with the IQML Algorithm". IEEE Trans. on Signal Processing, 42(9):2509–2513, Sep. 1994.
[NN93] B. Ng and A. Nehorai. "Active Array Sensor Location Calibration". In Proc. ICASSP, pages IV21–IV24, Minneapolis, Minn., 1993.
[OAKP97] B. Ottersten, D. Asztély, M. Kristensson, and S. Parkvall. "A Statistical Approach to Subspace Based Estimation with Applications in Telecommunications". In S. Van Huffel, editor, 2nd International Workshop on TLS and Errors-in-Variables Modeling, pages 285–294, Leuven, Belgium, 1997. SIAM.
[ORK89] B. Ottersten, R. Roy, and T. Kailath. "Signal Waveform Estimation in Sensor Array Processing". In Proc. 23rd Asilomar Conf. Sig., Syst., Comput., pages 787–791, Nov. 1989.
[Ott89] B. Ottersten. Parametric Subspace Fitting Methods for Array Signal Processing. PhD thesis, Stanford University, Stanford, CA, Dec. 1989.
[OV94] B. Ottersten and M. Viberg. "A Subspace-Based Instrumental Variable Method for State-Space System Identification". In Proc. 10th IFAC Symp. on Syst. Id., volume 2, pages 139–144, Copenhagen, Denmark, 1994.
[OVK90] B. Ottersten, M. Viberg, and T. Kailath. "Asymptotic Robustness of Sensor Array Processing Methods". In Proc. ICASSP 90 Conf., pages 2635–2638, Albuquerque, NM, April 1990.
[OVSN93] B. Ottersten, M. Viberg, P. Stoica, and A. Nehorai. "Exact and Large Sample ML Techniques for Parameter Estimation and Detection in Array Processing". In Haykin, Litva, and Shepherd, editors, Radar Array Processing, pages 99–151. Springer-Verlag, Berlin, 1993.
[Pil89] S. U. Pillai. Array Signal Processing. New York: Springer-Verlag, 1989.
[Pis73] V. F. Pisarenko. "The Retrieval of Harmonics from a Covariance Function". Geophys. J. Roy. Astron. Soc., 33:347–366, 1973.
[PK85] A. Paulraj and T. Kailath. "Direction-of-Arrival Estimation by Eigenstructure Methods with Unknown Sensor Gain and Phase". In Proc. IEEE ICASSP, pages 17.7.1–17.7.4, Tampa, Fla., March 1985.
[PK89] S. U. Pillai and B. H. Kwon. "Performance Analysis of MUSIC-Type High Resolution Estimators for Direction Finding in Correlated and Coherent Scenes". IEEE Trans. on ASSP, 37(8):1176–1189, Aug. 1989.
[PPL90] D. Pearson, S. U. Pillai, and Y. Lee. "An Algorithm for Near-Optimal Placements of Sensor Elements". IEEE Transactions on Information Theory, 36(6):1280–1284, Nov. 1990.
[PRK86] A. Paulraj, R. Roy, and T. Kailath. "A Subspace Rotation Approach to Signal Parameter Estimation". Proceedings of the IEEE, pages 1044–1045, July 1986.
[PSD95] K. Peternell, W. Scherrer, and M. Deistler. "Statistical Analysis of Subspace Identification Methods". In Proc. European Control Conference, ECC'95, pages 1342–1347, Roma, Italy, 1995.
[PSD96] K. Peternell, W. Scherrer, and M. Deistler. "Statistical Analysis of Novel Subspace Identification Methods". Signal Processing (EURASIP), Special Issue on Subspace Methods, Part II: System Identification, 52(2):161–177, July 1996.
[Qua82] A. H. Quazi. "Array Beam Response in the Presence of Amplitude and Phase Fluctuations". J. Acoust. Soc. Amer., 72(1):171–180, July 1982.
[Qua84] N. Quang. "On the Uniqueness of the Maximum-Likelihood Estimate of Structured Covariance Matrices". IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-32(6):1249–1251, Dec. 1984.
[RA92] B. D. Rao and K. S. Arun. "Model Based Processing of Signals: A State Space Approach". Proceedings of the IEEE, 80(2):283–309, Feb. 1992.
[RH80] D. J. Ramsdale and R. A. Howerton. "Effect of Element Failure and Random Errors in Amplitude and Phase on the Sidelobe Level Attainable with a Linear Array". J. Acoust. Soc. Amer., 68(3):901–906, Sept. 1980.
[RH89] B. D. Rao and K. V. S. Hari. "Performance Analysis of ESPRIT and TAM in Determining the Direction of Arrival of Plane Waves in Noise". IEEE Trans. ASSP, ASSP-37(12):1990–1995, Dec. 1989.
[RH93] B. D. Rao and K. V. S. Hari. "Weighted Subspace Methods and Spatial Smoothing: Analysis and Comparison". IEEE Trans. ASSP, 41(2):788–803, Feb. 1993.
[RK89] R. Roy and T. Kailath. "ESPRIT – Estimation of Signal Parameters Via Rotational Invariance Techniques". IEEE Trans. ASSP, ASSP-37(7):984–995, Jul. 1989.
[RPK86] R. Roy, A. Paulraj, and T. Kailath. "ESPRIT – A Subspace Rotation Approach to Estimation of Parameters of Cisoids in Noise". IEEE Trans. on ASSP, ASSP-34(4):1340–1342, October 1986.
[RS87a] Y. Rockah and P. M. Schultheiss. "Array Shape Calibration Using Sources in Unknown Locations – Part I: Far-Field Sources". IEEE Trans. on ASSP, 35:286–299, March 1987.
[RS87b] Y. Rockah and P. M. Schultheiss. "Array Shape Calibration Using Sources in Unknown Locations – Part II: Near-Field Sources and Estimator Implementation". IEEE Trans. on ASSP, 35:724–735, June 1987.
[SCE95] P. Stoica, M. Cedervall, and A. Eriksson. "Combined Instrumental Variable and Subspace Fitting Approach to Parameter Estimation of Noisy Input-Output Systems". IEEE Trans. on Signal Processing, 43(10):2386–97, Oct. 1995.
[Sch79] R. O. Schmidt. "Multiple Emitter Location and Signal Parameter Estimation". In Proc. RADC Spectrum Estimation Workshop, pages 243–258, Rome, NY, 1979.
[Sch81] R. O. Schmidt. A Signal Subspace Approach to Multiple Emitter Location and Spectral Estimation. PhD thesis, Stanford Univ., Stanford, CA, Nov. 1981.
[SH92] V. C. Soon and Y. F. Huang. "An Analysis of ESPRIT Under Random Sensor Uncertainties". IEEE Trans. on Sig. Proc., SP-40(9):2353–2358, 1992.
[SJ97] P. Stoica and M. Jansson. "On Forward-Backward MODE for Array Signal Processing". Digital Signal Processing – A Review Journal, 1997. Accepted.
[SK90] A. Swindlehurst and T. Kailath. "On the Sensitivity of the ESPRIT Algorithm to Non-Identical Subarrays". Sādhanā, Academy Proc. in Eng. Sciences, 15(3):197–212, November 1990.
[SK92] A. Swindlehurst and T. Kailath. "A Performance Analysis of Subspace-Based Methods in the Presence of Model Errors – Part 1: The MUSIC Algorithm". IEEE Trans. on Sig. Proc., 40(7):1758–1774, July 1992.
[SK93] A. Swindlehurst and T. Kailath. "A Performance Analysis of Subspace-Based Methods in the Presence of Model Errors – Part 2: Multidimensional Algorithms". IEEE Trans. on Sig. Proc., 41(9):2882–2890, Sept. 1993.
[SM97] P. Stoica and R. Moses. Introduction to Spectral Analysis. Prentice Hall, Upper Saddle River, New Jersey, 1997.
[SN89a] P. Stoica and A. Nehorai. "MUSIC, Maximum Likelihood and Cramér-Rao Bound". IEEE Trans. ASSP, ASSP-37:720–741, May 1989.
[SN89b] P. Stoica and A. Nehorai. "MUSIC, Maximum Likelihood and Cramér-Rao Bound: Further Results and Comparisons". In ICASSP'89, 1989.
[SN90] P. Stoica and A. Nehorai. "Performance Study of Conditional and Unconditional Direction-of-Arrival Estimation". IEEE Trans. ASSP, ASSP-38:1783–1795, October 1990.
[SROK95] A. L. Swindlehurst, R. Roy, B. Ottersten, and T. Kailath. "A Subspace Fitting Method for Identification of Linear State Space Models". IEEE Trans. Autom. Control, 40(2):311–316, Feb. 1995.
[SS81] T. Söderström and P. Stoica. "Comparison of Some Instrumental Variable Methods – Consistency and Accuracy Aspects". Automatica, 17(1):101–115, 1981.
[SS83] T. Söderström and P. G. Stoica. Instrumental Variable Methods for System Identification. Springer-Verlag, Berlin, 1983.
[SS89] T. Söderström and P. Stoica. System Identification. Prentice-Hall International, Hemel Hempstead, UK, 1989.
[SS90a] P. Stoica and K. Sharman. "Maximum Likelihood Methods for Direction-of-Arrival Estimation". IEEE Trans. ASSP, ASSP-38:1132–1143, July 1990.
[SS90b] P. Stoica and K. Sharman. "Novel Eigenanalysis Method for Direction Estimation". IEE Proc. Part F, 137(1):19–26, Feb 1990.
[SS91] P. Stoica and T. Söderström. "Statistical Analysis of MUSIC and Subspace Rotation Estimates of Sinusoidal Frequencies". IEEE Trans. ASSP, ASSP-39:1836–1847, Aug. 1991.
[SV95] P. Stoica and M. Viberg. "Maximum-Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions". Technical Report CTH-TE-31, Chalmers University of Technology, Gothenburg, Sweden, July 1995. Submitted to IEEE Trans. SP.
[SV96] P. Stoica and M. Viberg. "Maximum-Likelihood Parameter and Rank Estimation in Reduced-Rank Multivariate Linear Regressions". IEEE Trans. on Signal Processing, 44(12):3069–3078, Dec. 1996.
[Swi96] A. Swindlehurst. "A Maximum a Posteriori Approach to Beamforming in the Presence of Calibration Errors". In Proc. 8th Workshop on Stat. Sig. and Array Proc., Corfu, Greece, June 1996.
[TO96] T. Trump and B. Ottersten. "Estimation of Nominal Direction of Arrival and Angular Spread Using an Array of Sensors". Signal Processing, 50(1-2):57–69, Apr. 1996.
[Tre71] H. L. Van Trees. Detection, Estimation and Modulation Theory, volume III. Wiley, New York, 1971.
[Van95] P. Van Overschee. Subspace Identification: Theory - Implementation - Application. PhD thesis, Katholieke Universiteit Leuven, Belgium, Feb. 1995.
[VB88] B. D. Van Veen and K. M. Buckley. "Beamforming: A Versatile Approach to Spatial Filtering". Signal Processing Magazine, pages 4–24, Apr. 1988.
[VD91] M. Verhaegen and E. Deprettere. "A Fast Subspace, Recursive MIMO State-space Model Identification Algorithm". In Proc. 30th IEEE Conf. on Decision and Control, pages 1349–1354, Brighton, U.K., 1991.
[VD92] M. Verhaegen and P. Dewilde. "Subspace Model Identification. Part I: The Output-Error State-Space Model Identification Class of Algorithms". Int. J. Control, 56:1187–1210, 1992.
[VD93] P. Van Overschee and B. De Moor. "Subspace Algorithms for the Stochastic Identification Problem". Automatica, 29(3):649–660, 1993.
[VD94a] P. Van Overschee and B. De Moor. "A Unifying Theorem for three Subspace System Identification Algorithms". In Proc. 10th IFAC Symp. on Syst. Id., Copenhagen, Denmark, 1994.
[VD94b] P. Van Overschee and B. De Moor. "N4SID: Subspace Algorithms for the Identification of Combined Deterministic-Stochastic Systems". Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):75–93, Jan. 1994.
[VD95a] P. Van Overschee and B. De Moor. "About the choice of State Space Basis in Combined Deterministic-Stochastic Subspace Identification". Accepted for publication in Automatica, 1995.
[VD95b] P. Van Overschee and B. De Moor. "Choice of state-space basis in combined deterministic-stochastic subspace identification". Automatica, Special issue on trends in system identification, 31(12):1877–1883, Dec. 1995.
[VD95c] P. Van Overschee and B. De Moor. "A unifying theorem for three subspace system identification algorithms". Automatica, Special issue on trends in system identification, 31(12):1853–1864, Dec. 1995.
[VD96] P. Van Overschee and B. De Moor. Subspace Identification for Linear Systems: Theory–Implementation–Applications. Kluwer Academic Publishers, 1996.
[VDS93] A.-J. Van Der Veen, E. F. Deprettere, and A. L. Swindlehurst. "Subspace-Based Signal Analysis Using Singular Value Decomposition". Proceedings of the IEEE, 81(9):1277–1308, Sept. 1993.
[Ver91] M. Verhaegen. "A Novel Non-iterative MIMO State Space Model Identification Technique". In Proc. 9th IFAC/IFORS Symp. on Identification and System Parameter Estimation, pages 1453–1458, Budapest, 1991.
[Ver93] M. Verhaegen. "Subspace Model Identification. Part III: Analysis of the ordinary output-error state space model identification algorithm". Int. J. Control, 58:555–586, 1993.
[Ver94] M. Verhaegen. "Identification of the Deterministic Part of MIMO State Space Models given in Innovations Form from Input-Output Data". Automatica, Special Issue on Statistical Signal Processing and Control, 30(1):61–74, Jan. 1994.
[Vib89] M. Viberg. Subspace Fitting Concepts in Sensor Array Processing. PhD thesis, Linköping University, Linköping, Sweden, Oct. 1989.
[Vib94] M. Viberg. "Subspace Methods in System Identification". In Proc. 10th IFAC Symp. on Syst. Id., volume 1, pages 1–12, Copenhagen, Denmark, 1994.
[Vib95] M. Viberg. "Subspace-based Methods for the Identification of Linear Time-invariant Systems". Automatica, Special issue on trends in system identification, 31(12):1835–1851, Dec. 1995.
[VL82] A. J. M. Van Overbeek and L. Ljung. "On-line Structure Selection for Multivariable State-space Models". Automatica, 18(5):529–543, 1982.
[VO91] M. Viberg and B. Ottersten. "Sensor Array Processing Based on Subspace Fitting". IEEE Trans. SP, SP-39(5):1110–1121, May 1991.
[VOK91a] M. Viberg, B. Ottersten, and T. Kailath. "Detection and Estimation in Sensor Arrays Using Weighted Subspace Fitting". IEEE Trans. SP, SP-39(11):2436–2449, Nov. 1991.
[VOK91b] M. Viberg, B. Ottersten, and T. Kailath. "Subspace Based Detection for Linear Structural Relations". J. of Combinatorics, Information & Systems Sciences, 16(2-3):170–189, 1991.
[VOWL91] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung. "A Statistical Perspective on State-Space Modeling Using Subspace Methods". In Proc. 30th IEEE Conf. on Decision & Control, pages 1337–1342, Brighton, England, Dec. 1991.
[VOWL93] M. Viberg, B. Ottersten, B. Wahlberg, and L. Ljung. "Performance of Subspace-Based System Identification Methods". In Proc. IFAC 93, Sydney, Australia, July 1993.
[VS94a] M. Viberg and A. L. Swindlehurst. "A Bayesian Approach to Auto-Calibration for Parametric Array Signal Processing". IEEE Trans. on Sig. Proc., 42(12):3495–3507, Dec. 1994.
[VS94b] M. Viberg and A. L. Swindlehurst. "Analysis of the Combined Effects of Finite Samples and Model Errors on Array Processing Performance". IEEE Trans. on Sig. Proc., 42(11):3073–3083, Nov. 1994.
[VWO97] M. Viberg, B. Wahlberg, and B. Ottersten. "Analysis of State Space System Identification Methods Based on Instrumental Variables and Subspace Fitting". Automatica, 1997. Accepted.
[Wax91] M. Wax. "Detection and Localization of Multiple Sources via the Stochastic Signals Model". IEEE Trans. on SP, 39(11):2450–2456, Nov. 1991.
[Wax92] M. Wax. "Detection and Localization of Multiple Sources in Noise with Unknown Covariance". IEEE Trans. on Signal Processing, SP-40(1):245–249, Jan. 1992.
[WF89] A. J. Weiss and B. Friedlander. "Array Shape Calibration Using Sources in Unknown Locations - A Maximum Likelihood Approach". IEEE Trans. on ASSP, 37(12):1958–1966, Dec. 1989.
[WJ94] B. Wahlberg and M. Jansson. "4SID Linear Regression". In Proc. IEEE 33rd Conf. on Decision and Control, pages 2858–2863, Orlando, USA, December 1994.
[WOV91] B. Wahlberg, B. Ottersten, and M. Viberg. "Robust Signal Parameter Estimation in the Presence of Array Perturbations". In Proc. IEEE ICASSP, pages 3277–3280, Toronto, Canada, 1991.
[WRM94] M. Wylie, S. Roy, and H. Messer. "Joint DOA Estimation and Phase Calibration of Linear Equispaced (LES) Arrays". IEEE Trans. on Sig. Proc., 42(12):3449–3459, Dec. 1994.
[WSW95] M. Wax, J. Sheinvald, and A. J. Weiss. "Localization of Correlated and Uncorrelated Signals in Colored Noise via Generalized Least Squares". In Proc. ICASSP 95, pages 2104–2107, Detroit, MI, 1995. IEEE.
[WWN88] K. M. Wong, R. S. Walker, and G. Niezgoda. "Effects of Random Sensor Motion on Bearing Estimation by the MUSIC Algorithm". IEE Proceedings, 135, Pt. F(3):233–250, June 1988.
[WZ89] M. Wax and I. Ziskind. "On Unique Localization of Multiple Sources by Passive Sensor Arrays". IEEE Trans. on ASSP, ASSP-37(7):996–1000, July 1989.
[YS95] J. Yang and A. Swindlehurst. "The Effects of Array Calibration Errors on DF-Based Signal Copy Performance". IEEE Trans. on Sig. Proc., 43(11):2724–2732, November 1995.
[YZ95] P. Yip and Y. Zhou. "A Self-Calibration Algorithm for Cyclostationary Signals and its Uniqueness Analysis". In Proc. ICASSP, pages 1892–1895, Detroit, MI, 1995.
[ZM74] H. P. Zeiger and A. J. McEwen. "Approximate Linear Realizations of Given Dimension via Ho's Algorithm". IEEE Trans. AC, AC-19:153, April 1974.
[ZW88] J. X. Zhu and H. Wang. "Effects of Sensor Position and Pattern Perturbations on CRLB for Direction Finding of Multiple Narrowband Sources". In Proc. 4th ASSP Workshop on Spectral Estimation and Modeling, pages 98–102, Minneapolis, MN, August 1988.
